From mjldehoon at yahoo.com Sun Mar 1 07:17:28 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 1 Mar 2009 04:17:28 -0800 (PST)
Subject: [Biopython-dev] ScanProsite
Message-ID: <704108.77040.qm@web62402.mail.re1.yahoo.com>

ScanProsite is a web tool to scan protein sequences against the PROSITE database (see http://www.expasy.org/tools/scanprosite/). Biopython contains code in Bio.Prosite to interact with ScanProsite. However, this code needs to be updated, as it does not work with the current ScanProsite web pages: Neither accessing ScanProsite nor extracting the hits from the HTML page works.

This problem is relatively easy to solve, since ExPASy nowadays allows programmatic access to ScanProsite (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This returns the Prosite hits in XML format, which can be parsed easily in Python.

The only issue now is how this should be presented to the user. The current (broken) way to access Prosite looks like this:

>>> from Bio import ExPASy
>>> handle = ExPASy.scanprosite1(seq=mysequence)

to get a handle to the raw HTML output, and

>>> from Bio import Prosite
>>> hits = Prosite.scan_sequence_expasy(seq=mysequence)

which returns the hits as a Python list.

One possibility is to have a ScanProsite module under Bio.Prosite or Bio.ExPASy for interaction with ScanProsite. Something like this:

>>> from Bio.ExPASy import ScanProsite
>>> handle = ScanProsite.search(seq=mysequence)
>>> hits = ScanProsite.read(handle)

Another option is to have a scan function in the Bio.Prosite module that accesses the ScanProsite web tool and parses the results:

>>> from Bio import Prosite
>>> hits = Prosite.scan(seq=mysequence)

This is more straightforward, but on the other hand people may want to save the XML search results in an XML file, and for that purpose we'd need a function that does the parsing only.

Any opinions?

--Michiel From bugzilla-daemon at portal.open-bio.org Sun Mar 1 12:00:36 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Mar 2009 12:00:36 -0500 Subject: [Biopython-dev] [Bug 2495] parse element symbols for ATOM/HETATM records (Bio.PDB.PDBParser) In-Reply-To: Message-ID: <200903011700.n21H0alo006588@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2495 ------- Comment #1 from barry_finzel at yahoo.com 2009-03-01 12:00 EST ------- IO.save should also write these element types on an output PDB file -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Mar 1 12:06:54 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Mar 2009 12:06:54 -0500 Subject: [Biopython-dev] [Bug 2292] Bio.PDBIO writes TER records without any required fields In-Reply-To: Message-ID: <200903011706.n21H6sJp007165@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2292 barry_finzel at yahoo.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |barry_finzel at yahoo.com ------- Comment #2 from barry_finzel at yahoo.com 2009-03-01 12:06 EST ------- IO.save is also writing TER cards at the end of chains, rather than at the end of polypeptide chains. TER cards should never follow HETATM atom records. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Mar 1 12:22:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Mar 2009 12:22:28 -0500 Subject: [Biopython-dev] [Bug 2774] New: Bio.PDBIO.save doesn't write the required END record Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2774 Summary: Bio.PDBIO.save doesn't write the required END record Product: Biopython Version: Not Applicable Platform: All OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: barry_finzel at yahoo.com According to the PDB format specification (http://www.wwpdb.org/documentation/format32/sect1.html) All PDB files must be terminated with a record containing just "END\n". Easy to fix in PDBIO.save() -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Mar 2 05:26:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Mar 2009 10:26:38 +0000 Subject: [Biopython-dev] ScanProsite In-Reply-To: <704108.77040.qm@web62402.mail.re1.yahoo.com> References: <704108.77040.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com> On Sun, Mar 1, 2009 at 12:17 PM, Michiel de Hoon wrote: > ScanProsite is a web tool to scan protein sequences against the PROSITE > database (see http://www.expasy.org/tools/scanprosite/). Biopython contains > code in Bio.Prosite to interact with ScanProsite. However, this code needs to > be updated, as it does not work with the current ScanProsite web pages: > Neither accessing ScanProsite nor extracting the hits from the HTML page works. 
> > This problem is relatively easy to solve, since ExPASy nowadays allows > programmatic access to ScanProsite > (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This > returns the Prosite hits in XML format, which can be parsed easily in Python. > > The only issue now is how this should be presented to the user. ... > ... > This is more straightforward, but on the other hand people may want to save the > XML search results in an XML file, and for that purpose we'd need a function that > does the parsing only. > > Any opinions? I would definitely have two functions, one returning a handle to the XML, and one for parsing XML from a handle. This would be more consistent with Bio.Entrez and other parsers, and more flexible. For example, the user can opt to save the XML to disk, and they can also use our parser on files or the remote site - plus of course they can use any other XML parser they may prefer. I like your suggestion to have a REST XML based module under Bio.ExPASy, which means we can deprecate the HTML based Bio.Prosite module and in the process make the top level list of modules in Biopython a bit shorter. In the long term I think that will help people find functionality. 
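
As a rough sketch of the two-function design described above (one function hands back a handle to the XML, a second one parses XML from any handle), the parsing half could look something like the following. The XML layout, the element names (matchset, match, sequence_ac, start, stop, signature_ac), the sample values and the function name read_hits are all assumptions made up for illustration; they are not taken from the real ScanProsite output or from any Biopython API. The handle is an in-memory io.StringIO rather than a network connection.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML standing in for what a search function would return.
SAMPLE_XML = """<matchset>
  <match>
    <sequence_ac>USERSEQ1</sequence_ac>
    <start>16</start>
    <stop>98</stop>
    <signature_ac>PS50948</signature_ac>
  </match>
  <match>
    <sequence_ac>USERSEQ1</sequence_ac>
    <start>120</start>
    <stop>131</stop>
    <signature_ac>PS00001</signature_ac>
  </match>
</matchset>"""


def read_hits(handle):
    """Parse hits from any file-like handle (parsing only, no network access)."""
    root = ET.parse(handle).getroot()
    hits = []
    for match in root.iter("match"):
        hits.append({
            "sequence_ac": match.findtext("sequence_ac"),
            "start": int(match.findtext("start")),
            "stop": int(match.findtext("stop")),
            "signature_ac": match.findtext("signature_ac"),
        })
    return hits


# The handle could equally be a file the user saved to disk earlier.
handle = io.StringIO(SAMPLE_XML)
hits = read_hits(handle)
print(hits)
```

Because the parser only needs a handle, the same function would serve for saved files and for live results, which is the flexibility argued for above.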
Peter

From bugzilla-daemon at portal.open-bio.org Mon Mar 2 10:22:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Mar 2009 10:22:53 -0500
Subject: [Biopython-dev] [Bug 2776] New: Bio.pairwise2 returns non-optimal alignment in at least some cases
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2776

Summary: Bio.pairwise2 returns non-optimal alignment in at least some cases
Product: Biopython
Version: 1.49
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: klaus.kopec at tuebingen.mpg.de

At least in some cases, Bio.pairwise2 returns an alignment that is not the one with the highest score for the input parameters. This occurs in localXX and globalXX. Yet, I only encountered the problem with large mismatch values (which I use as I need mismatch free alignments).

Simple example (the bug also occurred for longer sequences):

>>> from Bio import pairwise2
>>> sequence1 = 'GKG'
>>> sequence2 = 'GWG'
>>> A = pairwise2.align.globalms(sequence1, sequence2, 5, -100, -5, -5)[0]
>>> A[0]
'GKG--'
>>> A[1]
'--GWG'
>>> A[2]
-15.0

whereas

'GK-G'
'G-WG'

would get a score of 0

System: Kubuntu 8.10 64Bit, Python 2.6.1, Biopython 1.49 (my pairwise2.py is identical to the current CVS version of it)

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 07:41:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 07:41:33 -0500
Subject: [Biopython-dev] [Bug 2777] New: [Solution is one line change!] Entity sorting altered by detach_child() calls
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

Summary: [Solution is one line change!]
Entity sorting altered by detach_child() calls Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P1 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: klaus.kopec at tuebingen.mpg.de detach_child(self, id) in Bio.PDB.Entity changes the order of self.child_list. This bug is caused by line 71, where self.child_list is set to self.child_dict.values() which are values of an unordered(!) dict: self.child_list=self.child_dict.values() Solution: Replace line 71 by: self.child_list.remove(child) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 4 07:48:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Mar 2009 07:48:19 -0500 Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls In-Reply-To: Message-ID: <200903041248.n24CmJSZ008104@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2777 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-04 07:48 EST ------- Have you got a short example to demonstrate the original problem? It would be useful to evaluate your change, and could be made into a unit test too. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 4 08:58:41 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Mar 2009 08:58:41 -0500 Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] 
Entity sorting altered by detach_child() calls
In-Reply-To:
Message-ID: <200903041358.n24Dwfjk015027@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

------- Comment #2 from klaus.kopec at tuebingen.mpg.de 2009-03-04 08:58 EST -------
Created an attachment (id=1253)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1253&action=view)
example PDB file that can be used to see the bug

## Python Code to see the bug:
import os
from Bio.PDB.PDBParser import PDBParser

p = PDBParser(PERMISSIVE=1)
filename = os.path.expanduser("entity_detach_order_bug_example.pdb")
s = p.get_structure('Entity.py bug example: detach changes order', filename)

print('order before detach:')
for r in s[0]['A'].child_list:
    print(r.id)

detach_me = s[0]['A'].child_list[-1]  ## this is independent of the chosen entry in the list
s[0]['A'].detach_child(detach_me.id)

print('order after detach:')
for r in s[0]['A'].child_list:
    print(r.id)

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 09:18:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:18:28 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls
In-Reply-To:
Message-ID: <200903041418.n24EISvd016743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

------- Comment #3 from klaus.kopec at tuebingen.mpg.de 2009-03-04 09:18 EST -------
the output of the code in my Comment #2 is:

order before detach:
('H_PCA', 1, ' ')
(' ', 2, ' ')
(' ', 3, ' ')
(' ', 4, ' ')
order after detach:
(' ', 2, ' ')
(' ', 3, ' ')
('H_PCA', 1, ' ')

I forgot to mention that the line "self.child_list.sort(self._sort)" needs to be commented out as well for the fix to work (as hetatms are otherwise sorted to the end).

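
The difference between the two approaches can be sketched without Bio.PDB at all. In the toy version below, children are represented only by their residue ids, a plain tuple sort stands in for the entity's own _sort, and the helper names detach_rebuild and detach_remove are made up for the illustration. Note also that Python dicts keep insertion order since Python 3.7, so here the reordering comes from the sort alone; under Python 2 the dict values themselves came back in arbitrary order.

```python
# Children are represented by their residue ids, in file order.
ids = [("H_PCA", 1, " "), (" ", 2, " "), (" ", 3, " "), (" ", 4, " ")]


def detach_rebuild(child_dict, key):
    """Old approach: rebuild the list from the dict, then sort it."""
    del child_dict[key]
    return sorted(child_dict.values())


def detach_remove(child_list, child_dict, key):
    """Suggested approach: remove only the detached child from the list."""
    child = child_dict.pop(key)
    child_list.remove(child)
    return child_list


old_order = detach_rebuild({i: i for i in ids}, ids[-1])
new_order = detach_remove(list(ids), {i: i for i in ids}, ids[-1])

print(old_order)  # the hetero residue ends up last
print(new_order)  # original file order is kept
```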
hmmm... it just came to me, that this probably breaks the Parser for some other PDB files, where residues are unsorted. These changes do not break any existing unit tests for the PDB module, so maybe it's still a step in the right direction.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 09:37:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:37:34 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls
In-Reply-To:
Message-ID: <200903041437.n24EbYhj018545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-04 09:37 EST -------
Created an attachment (id=1254)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1254&action=view)
Patch for Bio/PDB/Entity.py based on Klaus Kopec's suggestion

I've attached a patch which makes the suggested change. I'm hoping to get Thomas (the original author) to comment but otherwise I see no reason not to commit this fix soon.

The old code did this:

    def detach_child(self, id):
        "Remove a child."
        child=self.child_dict[id]
        child.detach_parent()
        del self.child_dict[id]
        self.child_list=self.child_dict.values()
        self.child_list.sort(self._sort)

It used a sort which should have preserved the order - but that only works if the child_list is always kept sorted. Looking at the add method, this isn't true:

    def add(self, entity):
        "Add a child to the Entity."
        entity_id=entity.get_id()
        if self.has_id(entity_id):
            raise PDBConstructionException( \
                "%s defined twice" % str(entity_id))
        entity.set_parent(self)
        self.child_list.append(entity)
        #self.child_list.sort(self._sort)
        self.child_dict[entity_id]=entity

Interestingly the sort was commented out in the original version first committed to Biopython's CVS, so this change predates the integration into Biopython.

I haven't checked to see if there are any other ways the child_list could become unsorted - that doesn't really matter.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 11:17:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 11:17:31 -0500
Subject: [Biopython-dev] [Bug 2774] Bio.PDBIO.save doesn't write the required END record
In-Reply-To:
Message-ID: <200903041617.n24GHVd1029752@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2774

thamelry at binf.ku.dk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from thamelry at binf.ku.dk 2009-03-04 11:17 EST -------
save method now has option 'write_end':

io.save(fp, write_end=1)

if 1, END is written. The reason this is not done by default is that one sometimes calls 'save' multiple times, for example when concatenating files. So always writing END is not a good approach.

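
The concatenation argument can be illustrated with a small stand-in that does not use Bio.PDB. The save function below is a hypothetical helper written only for this sketch (it is not the PDBIO.save implementation), and the ATOM lines are abbreviated placeholders rather than valid PDB records. The point is only that END should appear once, after the last part has been written.

```python
import io


def save(records, handle, write_end=False):
    """Write records to handle; append END only when asked to (hypothetical helper)."""
    for record in records:
        handle.write(record + "\n")
    if write_end:
        handle.write("END\n")


out = io.StringIO()
# First part: more records will follow, so no END yet.
save(["ATOM      1  N   GLY A   1"], out)
# Last part: this call terminates the concatenated file.
save(["ATOM      2  N   ALA B   1"], out, write_end=True)

text = out.getvalue()
print(text)
```

Had END been written unconditionally, the combined output would contain an END record in the middle of the file.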
From bugzilla-daemon at portal.open-bio.org Wed Mar 4 14:10:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:10:37 -0500
Subject: [Biopython-dev] [Bug 2778] New: Efficiency improvement in function Bio.SeqUtils.GC()
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

Summary: Efficiency improvement in function Bio.SeqUtils.GC()
Product: Biopython
Version: 1.48
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P5
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: wscott at chem.ubc.ca

Bio.SeqUtils.GC recalculates the gc variable in a loop using a dictionary whereas it could simply be calculated after the loop. The following code is suggested to replace the function:

def ScoGC(seq):
    """ calculates G+C content """
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    return gc*100.0/len(seq)

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 14:12:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:12:27 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC()
In-Reply-To:
Message-ID: <200903041912.n24JCR2U014353@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

wscott at chem.ubc.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wscott at chem.ubc.ca

------- Comment #1 from wscott at chem.ubc.ca 2009-03-04 14:12 EST -------
of course, rename ScoGC to GC...

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 17:03:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Mar 2009 17:03:59 -0500 Subject: [Biopython-dev] [Bug 2779] New: Seq.count() docstring should note unexpected behaviour Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2779 Summary: Seq.count() docstring should note unexpected behaviour Product: Biopython Version: 1.49 Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: baoilleach at gmail.com The Seq.count() method has the following docs: "Count method, like that of a python string." This is a cop-out as it does not tell the user anything. In particular, it does not lead the user to expect that Seq("GGG").count("GG")==1. This might make sense for Python strings, but it's incorrect for sequences. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
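
To make the two behaviours concrete, here is a minimal sketch that contrasts the non-overlapping count of a Python string with an overlapping count. Plain strings stand in for Seq objects, and count_overlapping is a hypothetical helper name used only for this illustration.

```python
def count_overlapping(seq, sub):
    """Count occurrences of sub in seq, allowing matches to overlap."""
    if not sub:
        raise ValueError("empty search string")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Resume one position after the start of the previous match,
        # so that overlapping occurrences are found too.
        start = seq.find(sub, start + 1)
    return count


non_overlapping = "GGG".count("GG")
overlapping = count_overlapping("GGG", "GG")
print(non_overlapping)  # 1, Python string behaviour
print(overlapping)      # 2, overlapping matches counted
```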
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 04:19:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 04:19:40 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200903050919.n259Je8d016299@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ruzzo at cs.washington.edu changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ruzzo at cs.washington.edu ------- Comment #8 from ruzzo at cs.washington.edu 2009-03-05 04:19 EST ------- I'm new to biopython, so I may be doing something else wrong, but in attempting to efetch a pubmed record tonight I see similar errors which seem to be fixed by downloading & installing several (new) DTD's: nlmmedline_090101.dtd nlmmedlinecitation_090101.dtd nlmsharedcatcit_090101.dtd nlmcommon_090101.dtd and possibly pubmed_090101.dtd -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 04:23:31 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 04:23:31 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903050923.n259NV4S016627@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #1 from lpritc at scri.sari.ac.uk 2009-03-05 04:23 EST ------- I think that's a good point about expected behaviour for count() in a biological sequence. Presumably, we all expect that Seq('GGG').count('GG') should find all overlapping matches, and return the value 2, in order to make intuitive 'biological' sense. There are, after all, two 'GG's in that sequence. 
This doesn't correspond to string count()ing behaviour, or to standard re module behaviour. The obvious way round it, that I've used before, is to compile the search string as a regular expression, and iterate regular expression matches from one symbol after the start of the preceding match (if any):

>>> import re
>>> startpos = 0
>>> seq = 'GGGG'
>>> motif = 'GG'
>>> motif_re = re.compile(motif)
>>> matches = []
>>> while True:
...     m = motif_re.search(seq, startpos)
...     if m is None:
...         break
...     startpos = m.start() + 1
...     matches.append(m)
...
>>> matches
[<_sre.SRE_Match object at 0x68f38>, <_sre.SRE_Match object at 0x96ac60>, <_sre.SRE_Match object at 0x96a950>]
>>> [(m.start(), m.group()) for m in matches]
[(0, 'GG'), (1, 'GG'), (2, 'GG')]

This could probably be done more efficiently. Is something like this already implemented in Bio.Motif?

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 04:24:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:24:43 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour
In-Reply-To:
Message-ID: <200903050924.n259OhYw016750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779

------- Comment #2 from lpritc at scri.sari.ac.uk 2009-03-05 04:24 EST -------
D'oh! There isn't a Bio.Motif. My bad.

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 04:43:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 04:43:09 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903050943.n259h9XG018545@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #3 from baoilleach at gmail.com 2009-03-05 04:43 EST ------- Thanks for the workaround but could you replace the current count by that code? Can you imagine any existing code that would break because of correction of buggy behaviour? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 05:16:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:16:52 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051016.n25AGqSW021680@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-03-05 05:16 EST ------- Created an attachment (id=1255) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1255&action=view) Patch to Seq.py that modified count behaviour for Seq and MutableSeq objects to return correct counts for substrings of length > 1 (In reply to comment #3) > Thanks for the workaround but could you replace the current count by that code? I don't have access to CVS ;) It would be nice to get consensus that the behaviour that this code would produce is the desired behaviour for everyone, that we've got an acceptable way of implementing it, and that it doesn't break anything downstream. There's bound to be, at best, a lag time. 
I've attached a proposed patch based on the above code, though it's not necessarily the best way to solve this problem. > Can you imagine any existing code that would break because of correction of > buggy behaviour? That should come out in the testing. And it turns out that there is a Bio.Motif, but it's in CVS. D'oh! again... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 05:22:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:22:40 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051022.n25AMeIt022121@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 05:22 EST ------- Prior to Biopython 1.45, the count method only worked with single letter search strings. I changed this just over a year ago for Biopython 1.45 as Bug 2386, but unfortunately at the time none of us considered this overlapping/non-overlapping behaviour. With hindsight we should have had this debate then. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py.diff?r1=1.19&r2=1.20&cvsroot=biopython We should either: (a) stick with the python string compatible behaviour (which has been a general principle for the Seq class), but document this issue more clearly as a non-overlapping search does run counter to biological usage. or, (b) Or change the behaviour as Leighton suggests to do an overlapping search. This could break any code relying on the old python string-like behaviour. I agree we need to have a discussion of this over on the main mailing list, as making the change could break people's code. 

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 05:42:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:42:27 -0500
Subject: [Biopython-dev] [Bug 2780] New: PDB file HETATMs cannot be alternative location of a residue that is an ATOM
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2780

Summary: PDB file HETATMs cannot be alternative location of a residue that is an ATOM
Product: Biopython
Version: 1.49
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: klaus.kopec at tuebingen.mpg.de

In PDB files where HETATMs and ATOMs are altlocs of each other (e.g. 1RR2, residue 184), they are treated as two separate residues.

An obvious solution is to add an "else" case to the "if" in StructureBuilder.py line 115 (method init_residue(...)) that introduces some kind of mixed (HETATM as well as ATOM) DisorderedResidue.

The main problem with that: the hetero field of the residue ids will differ between the residues, therefore the whole access-over-ids mechanism will most likely not work with these MixedDisorderedResidues as straightforwardly as it does so far. Sadly, I could not come up with a good solution for this. Maybe some __getattr__ magic that alters the way Chains access their residues might work by allowing access to residues by only using the second and third component of the id 3-tuple?!

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 05:44:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:44:12 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200903051044.n25AiCH9023924@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 05:44 EST ------- (In reply to comment #8) > I'm new to biopython, so I may be doing something else wrong, but in > attempting to efetch a pubmed record tonight I see similar errors which > seem to be fixed by downloading & installing several (new) DTD's: > > nlmmedline_090101.dtd > nlmmedlinecitation_090101.dtd > nlmsharedcatcit_090101.dtd > nlmcommon_090101.dtd > and possibly > pubmed_090101.dtd > Those have been added to CVS, and will be installed with Biopython 1.50 - perhaps we should hurry up our release plans. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Entrez/DTDs/?cvsroot=biopython -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 05:46:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:46:09 -0500 Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative location of a residue that is an ATOM In-Reply-To: Message-ID: <200903051046.n25Ak9DH024105@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2780 ------- Comment #1 from klaus.kopec at tuebingen.mpg.de 2009-03-05 05:46 EST ------- Created an attachment (id=1256) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1256&action=view) PDB file slice with 2 residues, that can be used to see the bug. 
slice of PDB file 1RR2 (example mentioned in my bug submission) showing two altloc residues where one is a HETATM and the other an ATOM. They are treated as two residues in Biopython.

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 05:56:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:56:39 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC()
In-Reply-To:
Message-ID: <200903051056.n25AudjU024927@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 05:56 EST -------
I've checked that in, but with the existing code to catch a zero length sequence and return 0 instead of raising a ZeroDivisionError.

def GC(seq):
    """Calculates G+C content, ..."""
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    if gc == 0: return 0
    return gc*100.0/len(seq)

The old code had been modified several times - it originally calculated the GC% as the CG count divided by the ATCG count, thus it had to count all the bases. You are right, this is much cleaner. Thanks.

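
For anyone wanting to try the behaviour outside Biopython, a standalone sketch along the same lines is given below. The name gc_percent is made up for this example, plain strings stand in for sequence objects, and the empty-sequence case is checked on the length directly, which is a choice made for this sketch and not the checked-in code.

```python
def gc_percent(seq):
    """Return the G+C percentage of seq; the ambiguity code S (G or C) counts too."""
    if len(seq) == 0:
        # Avoid ZeroDivisionError for an empty sequence.
        return 0.0
    gc = sum(seq.count(base) for base in "GCgcSs")
    return gc * 100.0 / len(seq)


print(gc_percent("ATGC"))  # 50.0
print(gc_percent("ATAT"))  # 0.0
print(gc_percent(""))      # 0.0
```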
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 06:18:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 06:18:33 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051118.n25BIXdp026743@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #6 from baoilleach at gmail.com 2009-03-05 06:18 EST ------- Sorry - could you clarify which mailing list you mean by the "main mailing list", the dev list or the discuss list? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 07:27:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 07:27:49 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051227.n25CRnmA001571@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 07:27 EST ------- (In reply to comment #6) > Sorry - could you clarify which mailing list you mean by the "main mailing > list", the dev list or the discuss list? I was thinking the main discussion list, and we should focus on the desired behaviour rather than how we might implement it. See: http://lists.open-bio.org/pipermail/biopython/2009-March/004960.html Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 07:31:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 07:31:50 -0500 Subject: [Biopython-dev] [Bug 2781] New: Bio.PDB Structure instances cannot be deepcopied Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2781 Summary: Bio.PDB Structure instances cannot be deepcopied Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P3 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: klaus.kopec at tuebingen.mpg.de For some reason, copy.deepcopy() of a Structure instance results in: Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in ignored for most PDB files I tried. Maybe implementing some __deepcopy__ methods might help, but I am unsure, as I did not perform profound research concerning this bug. My system: Kubuntu 8.10 64-Bit, Python 2.6.1 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Mar 5 07:40:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 5 Mar 2009 12:40:16 +0000 Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utilities requirements updated In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov> Message-ID: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com> This email was sent out a few weeks ago, but it took a while before the NCBI webpage was actually updated (maybe a caching issue) so I didn't rush to relax our rules immediately. Under the new rules we must make no more than three requests every second. 
We could track the times of the last two requests in order to enforce this as worded, but I think it would be simpler just to switch from using a minimum 3 second pause between Bio.Entrez requests to just a minimum 0.33334 second pause. This is a much simpler code change and will comply with the new relaxed rules. Unless anyone has a counter suggestion, I will update Bio.Entrez and the tutorial shortly. Peter ---------- Forwarded message ---------- From: Date: Thu, Feb 26, 2009 at 6:55 PM Subject: [Utilities-announce] NCBI E-Utilities requirements updated To: utilities-announce at ncbi.nlm.nih.gov NCBI E-Utilities users, E-Utilities system use requirements have been modified from no more than 1 request every 3 seconds to no more than 3 requests every second. The online documentation has been updated to reflect this change: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html Thank you. NCBI/NLM/NIH _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From bugzilla-daemon at portal.open-bio.org Thu Mar 5 07:58:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 07:58:40 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051258.n25Cwe9p004288@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #8 from barwil at gmail.com 2009-03-05 07:58 EST ------- (In reply to comment #4) > This could probably be done more efficiently. Is something like this already > implemented in Bio.Motif > In Bio.Motif you can do: m=Bio.Motif.Motif() m.add_instance(Seq("GG",m.alphabet)) for i in m.search_instances(Seq("GGGG",m.alphabet)): print i This should give you overlapping hits. There is Bio.Motif in CVS, but the same implementation is in Bio.AlignAce.Motif (now obsoleted).
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Mar 5 07:58:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 5 Mar 2009 12:58:40 +0000 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> Message-ID: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com> On Thu, Feb 19, 2009 at 10:25 AM, Peter wrote: > > Since this thread last year, there have been no objections. Following > a recent question on the main mailing list about how to determine the > version of Biopython this seems worth doing before the next release. > Again, any objections or comments on the implementation details? > Otherwise I'll make this change shortly. > Changes made in CVS, and updated the release instructions: http://biopython.org/wiki/Building_a_release In between releases, should we leave the __version__ as is, or explicitly update it to be something like "1.49+" just after releasing 1.49? This only affects people installing Biopython from CVS, so they should be technically inclined...
Peter From bugzilla-daemon at portal.open-bio.org Thu Mar 5 09:47:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 09:47:30 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200903051447.n25ElU37014276@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 09:47 EST ------- We seem to have reached agreement on the mailing list, so checking this patch in, and marking this issue as fixed. Note we may want to review the choice of name for the new per-letter-annotations attribute (as long as this happens before the Biopython 1.50 release), currently this is letter_annotations as per a brief discussion on the mailing list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 09:47:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 09:47:43 -0500 Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g. align[1:2, 5:-5] In-Reply-To: Message-ID: <200903051447.n25ElhAb014302@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2551 Bug 2551 depends on bug 2507, which changed state. 
Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing http://bugzilla.open-bio.org/show_bug.cgi?id=2507 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 09:47:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 09:47:44 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200903051447.n25EliM6014314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 Bug 2767 depends on bug 2507, which changed state. Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing http://bugzilla.open-bio.org/show_bug.cgi?id=2507 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:31:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 10:31:17 -0500 Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC() In-Reply-To: Message-ID: <200903051531.n25FVHOq018242@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2778 ------- Comment #3 from bsouthey at gmail.com 2009-03-05 10:31 EST ------- (In reply to comment #2) > I've checked that in, but with the existing code to catch a zero length > sequence and return 0 instead of raising a ZeroDivisionError. > > def GC(seq): > """Calculates G+C content, ...""" > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > if gc == 0: return 0 > return gc*100.0/len(seq) > I think that it is clearer to check that the sequence length is not zero rather than assuming that if the sum is zero then the sequence length is also zero. def GC(seq): """Calculates G+C content, ...""" gc=sum(map(seq.count,['G','C','g','c','S','s'])) if len(seq) > 0: return gc*100.0/len(seq) else: return 0 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:51:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 10:51:20 -0500 Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC() In-Reply-To: Message-ID: <200903051551.n25FpKGf020282@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2778 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-03-05 10:51 EST ------- (In reply to comment #3) > (In reply to comment #2) > > I've checked that in, but with the existing code to catch a zero length > > sequence and return 0 instead of raising a ZeroDivisionError. 
> > > > def GC(seq): > > """Calculates G+C content, ...""" > > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > > if gc == 0: return 0 > > return gc*100.0/len(seq) > > > > I think that it is clearer to check that the sequence length is not zero rather > than assuming that if the sum is zero then the sequence length is also zero. > > def GC(seq): > """Calculates G+C content, ...""" > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > if len(seq) > 0: > return gc*100.0/len(seq) > else: > return 0 It would probably be clearest, quickest and most efficient to comment that particular line of the code to point out that it does elegant double-duty as a check for zero sequence length ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:56:38 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 10:56:38 -0500 Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC() In-Reply-To: Message-ID: <200903051556.n25Fuc13020807@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2778 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 10:56 EST ------- (In reply to comment #3) > I think that it is clearer to check that the sequence length is > not zero rather than assuming that if the sum is zero then the > sequence length is also zero. I agree, but had chosen to keep the old code. 
> def GC(seq): > """Calculates G+C content, ...""" > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > if len(seq) > 0: > return gc*100.0/len(seq) > else: > return 0 > Your length test isn't very elegant, this is much nicer/more pythonic I think: if seq : gc = sum(map(seq.count,['G','C','g','c','S','s'])) return gc*100.0/len(seq) else : return 0 However, given most of the time the sequence will not be empty, this should be faster: try : gc = sum(map(seq.count,['G','C','g','c','S','s'])) return gc*100.0/len(seq) except ZeroDivisionError : return 0 CVS updated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 11:04:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 11:04:07 -0500 Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g. align[1:2, 5:-5] In-Reply-To: Message-ID: <200903051604.n25G471v021470@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2551 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 11:04 EST ------- Created an attachment (id=1257) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1257&action=view) Patch for Bio/Align/Generic.py to support array like access This requires the patch to the SeqRecord __getitem__ method just committed to CVS for Bug 2507. This includes an extended doctest which tries to illustrate the typical usage I expect. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Thu Mar 5 11:59:08 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 5 Mar 2009 10:59:08 -0600 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com> Message-ID: On Thu, Mar 5, 2009 at 6:58 AM, Peter wrote: > On Thu, Feb 19, 2009 at 10:25 AM, Peter wrote: >> >> Since this thread last year, there have been no objections. Following >> a recent question on the main mailing list about how to determine the >> version of Biopython this seems worth doing before the next release. >> Again, any objections or comments on the implementation details? >> Otherwise I'll make this change shortly. >> > > Changes made in CVS, and updated the release instructions: > http://biopython.org/wiki/Building_a_release > > In between releases, should we leave the __version__ as is, or > explicitly update it to be something like "1.49+" just after releasing > 1.49? This only affects people installing Biopython from CVS, so they > should be technically inclined... > > Peter > I agree that it would be helpful to distinguish between an official release and a build from the CVS. Furthermore, it would then be important to know when the build from CVS was done at least relative to the official releases. So I think you tending to have a numbering scheme like: 1.49 is an official release 1.49+ (or similar) is CVS after the 1.49 official release but before the next official release 1.50.
1.50 will be an official release 1.50+ (or similar) is the CVS after the 1.50 official release but before the next official release whatever number it will be. If so the release instructions should also include an instruction to change the CVS numbering in the version in __init__.py files after release has been made. Also, after looking at the release instructions shouldn't BioSQL and Doc also have version-related information? Ideally the Biopython BioSQL code should have some connection to the main version of BioSQL - I don't use it so it is not an issue for me (yet). Bruce From biopython at maubp.freeserve.co.uk Thu Mar 5 12:50:04 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 5 Mar 2009 17:50:04 +0000 Subject: [Biopython-dev] determining the version In-Reply-To: References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com> Message-ID: <320fb6e00903050950k4d0cce9i1fe1442e15cf9cf7@mail.gmail.com> On Thu, Mar 5, 2009 at 4:59 PM, Bruce Southey wrote: > > I agree that it would be helpful to distinguish between an official > release and a build from the CVS. Furthermore, it would then be > important to know when the build from CVS was done at least relative > to the official releases. > > So I think you tending to have a numbering scheme like: > 1.49 is an official release > 1.49+ (or similar) is CVS after the 1.49 official release but before > the next official release 1.50. > 1.50 will be an official release > 1.50+ (or similar) is the CVS after the 1.50 official release but > before the next official release whatever number it will be. That is one of the two suggestions I was putting forward. 
The other was just leaving the version number as that of the most recent release - people should know if they are running CVS as this has to be done deliberately. One tiny downside is the "+" gets turned into an underscore for filenames (e.g. egg files, and I assume a windows installer), but we won't be releasing those so that doesn't matter. > If so the release instructions should also include an instruction to > change the CVS numbering in the version in __init__.py files after > release has been made. Yes - assuming people are happy with this suggested scheme. Note that if we switch to SVN, something automated with the SVN revision number might be possible. > Also, after looking at the release instructions shouldn't BioSQL and > Doc also have version-related information? > Ideally the Biopython BioSQL code should have some connection to the > main version of BioSQL - I don't use it so it is not an issue for me > (yet). Because Bio/* and BioSQL/* are always shipped and packaged together, to my mind they together make up Biopython and share the same version number. As to why BioSQL is top level rather than being Bio.BioSQL, it was long ago and I have no idea. For the documentation, recent releases of the tutorial have included the target version of Biopython together with the date. Again, this should be in the release instructions. 
Peter From bugzilla-daemon at portal.open-bio.org Thu Mar 5 12:54:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 12:54:01 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200903051754.n25Hs1cW030546@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1251 is|0 |1 obsolete| | ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 12:54 EST ------- Created an attachment (id=1258) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1258&action=view) Read/write support for FASTQ and QUAL files, using the letter_annotations dict Small update to earlier version, with minor comment changes. Also includes explicit rounding of scores to the nearest integer when writing out PHRED scores in Solexa format (and vice versa). This conversion still needs verifying against real world examples. I've been testing with real world PHRED based files only so far. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Mar 6 11:08:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Mar 2009 11:08:50 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903061608.n26G8oL9003353@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 11:08 EST ------- I have updated the docstrings in CVS to stress that, like the Python string method, a non-overlapping count is used, and am marking this bug as fixed. From the mailing list discussion, having an overlapping count available would be a welcome enhancement, perhaps as a separate method, e.g. overlapping_count. Leighton's patch or Sebastian's code in Bio/SeqUtils/MeltingTemp.py could be used for the implementation. We can do this on a new enhancement bug, once a consensus is reached on the mailing list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
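To illustrate the difference between the two kinds of count discussed on this bug, here is a minimal sketch on plain strings. The function name overlapping_count is taken from the suggestion above, but this implementation is only an assumption about how it might work; it is neither Leighton's patch nor the MeltingTemp code.

```python
def overlapping_count(seq, sub):
    """Count occurrences of sub in seq, allowing overlaps.

    Illustrative sketch only; not the implementation from any patch.
    """
    if not sub:
        raise ValueError("sub must not be empty")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Advance one letter only, so overlapping matches are found too.
        start = seq.find(sub, start + 1)
    return count


print("GGGG".count("GG"))             # non-overlapping, like str.count
print(overlapping_count("GGGG", "GG"))  # overlapping
```

On "GGGG" the built-in count sees two matches of "GG", while the overlapping version sees three.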
From bugzilla-daemon at portal.open-bio.org Fri Mar 6 12:34:58 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Mar 2009 12:34:58 -0500 Subject: [Biopython-dev] [Bug 2783] New: Using alternative start codons in Bio.Seq translate method/function Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2783 Summary: Using alternative start codons in Bio.Seq translate method/function Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This bug covers an issue originally raised on Bug 2381. This bug is specifically for how to translate a CDS using a non-standard start codon (a codon which doesn't normally encode methionine). In computing, we often blindly translate without worrying about start codons. For example, you might translate a whole genome (in all six frames) as part of looking for open reading frames. Translating a partial CDS where the start is missing is another example. The current Bio.Seq translation functionality supports these usages. In real biology however, translation from RNA to amino acids always starts at an initiation/start codon (typically AUG) which becomes the methionine at the start of the protein. In eukaryotes, usually the only start codon is AUG, and it normally encodes methionine, so this doesn't seem special. However, in many organisms there are lots of genes with alternative start/initiation codons which do NOT normally encode methionine. However, when they are used as a start/initiation codon they DO get translated as methionine! For example, there are 418 annotated genes in E. coli K12 with non-standard start codons - which you might want to translate into proteins (which *should* start with a methionine).
For example, using the following NCBI FASTA file of CDS sequences, ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655 Here is the CDS for gene yaaX: >ref|NC_000913.2|:5234-5530 GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA This starts GTG which is a valid bacterial start codon. I'd like to be able to translate this and get the actual biologically relevant protein as given in the GenBank file NC_000913.gbk (with or without the stop symbol at the end), which starts with "M" not "V": CDS 5234..5530 /gene="yaaX" /locus_tag="b0005" /codon_start=1 /transl_table=11 /product="predicted protein" /protein_id="NP_414546.1" /db_xref="ASAP:ABE-0000015" /db_xref="UniProtKB/Swiss-Prot:P75616" /db_xref="GI:16127999" /db_xref="ECOCYC:G6081" /db_xref="EcoGene:EG14384" /db_xref="GeneID:944747" /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR" Without any non-standard start codon support, my translations start with a V (rather than the desired M): >>> from Bio.Seq import Seq >>> yaaX = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA" ... "GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT" ... "AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT" ... "TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT" ... 
"AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA") >>> print yaaX.translate(table=11) VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR* >>> print yaaX.translate(table=11, to_stop=True) VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR These start with "V", while in this situation I want an "M" because I know this is a full CDS and the first codon is a start codon. I therefore want to add an optional argument to the Seq object's translate method (and the Bio.Seq.translate function) so that I can obtain the desired results (both with and without the terminator stop symbol). I want an option to tell Biopython that this sequence commences with a start/initiation codon: >>> print yaaX.translate(table=11, with_start_codon=True) MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR* >>> print yaaX.translate(table=11, to_stop=True, with_start_codon=True) MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR I have in the above example called this new argument "with_start_codon", but I am open to naming suggestions. If False (default), then nothing changes. If the new argument is True, this indicates that the first codon should be a valid start/initiation codon (in the declared translation table), and that it should be translated as a methionine. I will upload a patch implementing this in a moment... This proposal is NOT about an option to have the translate function/method search the sequence for the first valid start codon (either in frame or not). This proposal is NOT about an option to check the sequence is a valid CDS (i.e. starts with a start codon, ends with an in frame stop codon, and has no internal premature stop codons), and then translating it. While this makes sense (and BioPerl does this), this would prevent certain uses. e.g. 
a partial CDS sequence where the 3' end is missing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 6 12:36:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Mar 2009 12:36:24 -0500 Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function In-Reply-To: Message-ID: <200903061736.n26HaOWH012440@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2783 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 12:36 EST ------- Created an attachment (id=1259) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view) Patch for Bio/Seq.py to support non-standard start codons in translation Patch implementing my proposed change, based my earlier patch attachment 1040 on Bug 2381. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
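The behaviour proposed in Bug 2783 can be sketched without Biopython as follows. This is not the attached patch: the translate function below is a hypothetical standalone helper on plain strings, the argument name with_start_codon follows the proposal, the codon table is built from the standard NCBI layout (table 11 uses the same amino acid assignments as the standard table), and the set of start codons is the one listed for NCBI table 11. Raising ValueError for an invalid first codon is an assumption of the sketch.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI order (first base varies slowest, third fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Start codons listed for NCBI translation table 11 (bacterial).
START_CODONS_11 = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}


def translate(seq, to_stop=False, with_start_codon=False):
    """Translate a DNA string; hypothetical sketch of the proposal."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if i == 0 and with_start_codon:
            # The first codon must be a valid start codon, and is then
            # translated as methionine whatever it normally encodes.
            if codon not in START_CODONS_11:
                raise ValueError("%s is not a valid start codon" % codon)
            protein.append("M")
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


# First six codons of yaaX from the bug report.
print(translate("GTGAAAAAGATGCAATCT"))
print(translate("GTGAAAAAGATGCAATCT", with_start_codon=True))
```

With the default arguments nothing changes, which matches the proposal that the new option is off unless requested.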
From bugzilla-daemon at portal.open-bio.org Fri Mar 6 12:38:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Mar 2009 12:38:39 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200903061738.n26Hcd04012626@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #55 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 12:38 EST ------- I'm closing this bug as basic translate and transcribe methods were included with Biopython 1.49. I have filed Bug 2783 for "Using alternative start codons in Bio.Seq translate method/function". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 6 12:43:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Mar 2009 12:43:25 -0500 Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function In-Reply-To: Message-ID: <200903061743.n26HhPRX013186@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2783 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 12:43 EST ------- (In reply to comment #1) > Created an attachment (id=1259) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view) [details] > Patch for Bio/Seq.py to support non-standard start codons in translation > > Patch implementing my proposed change, based my earlier patch > attachment 1040 [details] on Bug 2381. 
Actually, it was based on the patch in attachment 1032 (not 1040) on Bug 2381. Other names proposed for this new argument included: init - rejected as potentially confusing force_methionine - possible, but implies any codon would be allowed even something that isn't a valid start codon alt_start - perhaps confusing? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 6 14:54:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Mar 2009 14:54:17 -0500 Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function In-Reply-To: Message-ID: <200903061954.n26JsHK4026141@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2783 ------- Comment #3 from eric.talevich at gmail.com 2009-03-06 14:54 EST ------- (In reply to comment #2) How about require_start? Or require_met, if you don't mind how strange it looks as English. The name with_start_codon seems like it would take a codon or alternate table as the argument. I also see two choices being made by using this parameter: (1) Check that the sequence starts with a valid start codon, and if not, raise an exception; (2) Use a set of alternate genetic codes for looking up the initial methionine. >From the other bug's discussion it seems like there are a number of boolean options that could reasonably be used with the translate() method, but adding them all as keyword args would clutter up the API. What about using a bitmask in Bio.Seq that can be used with translate()? The re module takes a bitmask as the last parameter for most functions, for example, and it looks pretty clean compared to a series of boolean keyword args. 
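To make the bitmask idea concrete, here is a small sketch. The first part uses the real re module flags; the second part uses made-up flag names (REQUIRE_START, TO_STOP) and a hypothetical helper, describe_options, purely for illustration. None of this is an existing or proposed Bio.Seq API.

```python
import re

# Real example: the re module combines its options with bitwise OR.
pattern = re.compile(r"^atg", re.IGNORECASE | re.MULTILINE)
matches = pattern.findall("ATGAAA\natgccc")

# Hypothetical flags for a translate-like function (illustration only,
# these names are assumptions and do not exist in Bio.Seq).
REQUIRE_START = 1
TO_STOP = 2


def describe_options(flags=0):
    """Return the names of the options switched on in ``flags``."""
    names = []
    if flags & REQUIRE_START:
        names.append("require_start")
    if flags & TO_STOP:
        names.append("to_stop")
    return names
```

A call would then read describe_options(REQUIRE_START | TO_STOP), whereas the keyword style would spell out each option as its own boolean argument.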
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sun Mar 8 08:03:31 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 8 Mar 2009 05:03:31 -0700 (PDT) Subject: [Biopython-dev] ScanProsite In-Reply-To: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com> Message-ID: <956971.84123.qm@web62404.mail.re1.yahoo.com> --- On Mon, 3/2/09, Peter wrote: > I like your suggestion to have a REST XML based module > under Bio.ExPASy, which means we can deprecate the HTML based > Bio.Prosite module and in the process make the top level list of > modules in Biopython a bit shorter. In the long term I think that > will help people find functionality. > Then, how about the following code organization: Bio/ExPASy/__init__.py contains get_prodoc_entry Interface to the get-prodoc-entry CGI script. get_prosite_entry Interface to the get-prosite-entry CGI script. get_prosite_raw Interface to the get-prosite-raw CGI script. get_sprot_raw Interface to the get-sprot-raw CGI script. sprot_search_ful Interface to the sprot-search-ful CGI script. sprot_search_de Interface to the sprot-search-de CGI script. (currently in Bio/ExPASy.py) Bio/ExPASy/Prosite.py contains read(), parse(), Record for Prosite files (currently in Bio/Prosite/__init__.py), as well as a Pattern class to handle Prosite patterns (currently in Bio/Prosite/Pattern.py, but this seems to be unused). Bio/ExPASy/Prodoc.py contains read(), parse(), Record for Prosite documentation files (currently in Bio/Prosite/Prodoc.py) Bio/ExPASy/ScanProsite contains scan(), read(), Record to interact with ScanProsite (currently a broken version to access ScanProsite and parse its results exists in Bio/ExPASy.py and Bio/Prosite/__init__.py). I have a simplified version of the Prosite and Prodoc parsers. 
If we use the scheme above, I'll put the new version in Bio/ExPASy/Prosite.py and Bio/ExPASy/Prodoc.py, and deprecate Bio.Prosite. --Michiel. From biopython at maubp.freeserve.co.uk Tue Mar 10 16:29:54 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Mar 2009 20:29:54 +0000 Subject: [Biopython-dev] [Utilities-announce] NCBI E-Utilities requirements updated In-Reply-To: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov> <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com> Message-ID: <320fb6e00903101329i69e40fc0i6a2b13332df55e7a@mail.gmail.com> On Thu, Mar 5, 2009 at 12:40 PM, Peter wrote: > This email was sent out a few weeks ago, but it took a while before > the NCBI webpage was actually updated (maybe a caching issue) so I > didn't rush to relax our rules immediately. > > Under the new rules we must make no more than three requests every > second. We could track the times of the last two requests in order to > enforce this as worded, but I think it would be simpler just to switch > from using a minimum 3 second pause between Bio.Entrez requests to > just a minimum 0.33334 second pause. This is a much simpler code > change and will comply with the new relaxed rules. > > Unless anyone has a counter suggestion, I will update Bio.Entrez and > the tutorial shortly. Change made in CVS, including the tutorial. 
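As a rough illustration of the simpler option, a fixed minimum pause between requests could be enforced along the following lines. This is only a sketch and not the actual Bio.Entrez code; the class name RequestThrottle and its clock/sleep arguments are hypothetical, chosen so the logic can be exercised without really sleeping.

```python
import time

MIN_DELAY = 0.33334  # seconds, i.e. at most three requests per second


class RequestThrottle(object):
    """Hypothetical helper: enforce a minimum pause between requests."""

    def __init__(self, min_delay=MIN_DELAY, clock=time.monotonic,
                 sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._previous = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_delay has passed since the last call."""
        now = self._clock()
        if self._previous is not None:
            remaining = self.min_delay - (now - self._previous)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._previous = now
```

Calling wait() immediately before each request is enough; tracking only the time of the previous request avoids having to remember the last two or three request times.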
Peter From bugzilla-daemon at portal.open-bio.org Tue Mar 10 16:36:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Mar 2009 16:36:28 -0400 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200903102036.n2AKaSje008217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-10 16:36 EST ------- For anyone following this bug, Brad has some related code posted on his blog - see this mailing list discussion: http://lists.open-bio.org/pipermail/biopython/2009-March/004983.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Mar 10 16:49:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Mar 2009 16:49:30 -0400 Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function In-Reply-To: Message-ID: <200903102049.n2AKnUoD009300@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2783 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-10 16:49 EST ------- On comment #3, Eric wrote: > > How about require_start? Or require_met, if you don't mind how strange > it looks as English. The name with_start_codon seems like it would take > a codon or alternate table as the argument. I think "require_start" is OK. Or "require_start_codon". > I also see two choices being made by using this parameter: > (1) Check that the sequence starts with a valid start codon, and > if not, raise an exception; That is what my patch does. Plus of course translating the valid start codon as a methionine. > (2) Use a set of alternate genetic codes for looking up the initial > methionine. 
I'm unsure what you mean here. If you mean actually having the translate method search for the first valid start codon, I am really not keen on this at all. This is complicated, and verges on gene/ORF finding, which I specifically wanted to avoid: Peter wrote in comment #0: >> This proposal is NOT about an option to have the translate >> function/method search the sequence for the first valid >> start codon (either in frame or not). On comment #3, Eric wrote: > From the other bug's discussion it seems like there are a number of boolean > options that could reasonably be used with the translate() method, but adding > them all as keyword args would clutter up the API. What about using a bitmask > in Bio.Seq that can be used with translate()? The re module takes a bitmask as > the last parameter for most functions, for example, and it looks pretty clean > compared to a series of boolean keyword args. I agree that there is a risk of confusion with too many arguments. But I don't think a bitmask would help - I think it makes it worse! I'm not saying it's a good thing, but we have lots of functions/methods in Biopython already with lots of arguments (e.g. the standalone BLAST wrappers, or the Bio.Entrez functions). On the other hand, I can't immediately think of a single Python function which uses a bitmask. From biopython at maubp.freeserve.co.uk Tue Mar 10 19:40:29 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Mar 2009 23:40:29 +0000 Subject: [Biopython-dev] Bio.Entrez catching more errors Message-ID: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com> Hi All, It occurred to me that the Bio.Entrez._open function can look at the retmode argument (if present) and spot if there is a mismatch between the requested format (e.g.
XML, HTML, text or asn.1) and the actual data the NCBI returned. Something along the following lines could be added to the end of the _open function in Bio/Entrez/__init__.py to acheive this: elif "retmode" in params and params["retmode"].lower()=="html" \ and not data.lower().startswith(">> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read() >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read() Similarly, these give an XML like fragment (which is not a valid XML file in itself - arguably an NCBI bug; some databases like "protein" are better behaved in this respect): >>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read() >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read() >>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read() >>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read() My suggested change to Bio.Entrez would also catch the following examples (using an invalid database) where the NCBI ignore the retmode and return an HTML help page: >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read() >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read() In a less clear cut example, this would flag the following as an error as the NCBI seem to return ASN.1 text instead of HTML here:: >>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read() Overall, I think this change should catch lots of errors which otherwise may not be detected until later (e.g. while trying to parse the file). -------------------------------------------------------------------------------------------------- On another point, should we catch these responses as errors:? >>> efetch(db="snp", id="123456").read() 'PmFetch response\n
\n1:
id: 123456 Error occurred: cannot get document
summary\n
' >>> efetch(db="snp", id="123456", retmode="html").read() 'PmFetch response\n
\n1:
id: 123456 Error occurred: cannot get document
summary\n
' >>> efetch(db="snp", id="123456", retmode="xml").read() '\n1: id: 123456 Error occurred: cannot get document summary\n\n' >>> efetch(db="snp", id="123456", retmode="text").read() '1: id: 123456 Error occurred: cannot get document summary\n' and, >>> print efetch(db="homologene", retmode="html", id="fake").read()

Error occurred: Empty id list - nothing todo

... Looking for the string "Error occurred: " looks fairly safe here, and should cover a range of entries. Of course, you can imagine false positives too, e.g. a valid PUBMED plain text record for a tutorial article with a title like "Yikes! An Error Occurred: A beginner's Guide To Defensive Programming." could match. Peter From lorena.carlo at gmail.com Wed Mar 11 11:58:24 2009 From: lorena.carlo at gmail.com (Lorena Carló) Date: Wed, 11 Mar 2009 09:58:24 -0600 Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs Message-ID: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com> Hi all, I would like to know if there is an implemented function in Biopython that allows getting the PDB ID from a UniProtKB ID? Thanks, Lorena From biopython at maubp.freeserve.co.uk Wed Mar 11 12:12:36 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Mar 2009 16:12:36 +0000 Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs In-Reply-To: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com> References: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com> Message-ID: <320fb6e00903110912g717ccb52q4242a6ff169b5d1f@mail.gmail.com> On Wed, Mar 11, 2009 at 3:58 PM, Lorena Carló wrote: > Hi all, > > I would like to know if there is an implemented function in Biopython that > allows getting the PDB id from a Uniprotkb ID?. > > Thanks, > Lorena There isn't a simple one-to-one mapping from a UniProtKB/Swiss-Prot ID to a PDB ID, see http://www.uniprot.org/faq/2 Are you working from UniProtKB/Swiss-Prot files? How about something like this:

# This assumes you have downloaded the following file
# to your working directory:
# http://www.uniprot.org/uniprot/P00734.txt
from Bio import SeqIO
record = SeqIO.read(open("P00734.txt"), "swiss")
# Database cross-references look like "PDB:xxxx"; print just the PDB IDs
for xref in record.dbxrefs:
    if xref.startswith("PDB:"):
        print(xref.split(":", 1)[1])

Peter P.S.
This is more a question for the main discussion list, rather than Biopython development From bugzilla-daemon at portal.open-bio.org Wed Mar 11 19:39:02 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 11 Mar 2009 19:39:02 -0400 Subject: [Biopython-dev] [Bug 2788] New: Bio.Nexus.Trees newick parser crash Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2788 Summary: Bio.Nexus.Trees newick parser crash Product: Biopython Version: 1.49 Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: matzke at berkeley.edu The newick files I have been working with seem to open fine in several different programs/packages (Dendroscope, R's APE package, phylocom, python alfacinha module), but not the newick parser in Bio.Nexus.Trees. Rather than upload a file I've got the full newick string hard-coded below: ============ from Bio.Nexus.Trees import Tree tree_str = 
'(((((((((((((((((Sambucus:43.136024,Viburnum:43.136040)Adoxaceae:53.892513,(Acanthopanax:34.719704,Aralia:34.719727,Dendropanax:34.719727,Evodiopanax:34.719727,Kalopanax:34.719727,Schefflera:34.719727)Araliaceae:62.308830):7.045975,Ilex:104.074516):3.056864,((((((Catalpa:22.623766,Paulownia:22.623785)Bignoniaceae:22.623766,(Clerodendrum:19.864199,Premna:19.864218)Verbenaceae:25.383331):22.378326,(Chionanthus:29.443968,Forestiera:29.443979,Fraxinus:29.443979,Ligustrum:29.443979,Osmanthus:29.443979,Syringa:29.443979)Oleaceae:38.181892):19.113832,(Adina:38.252457,Cephalanthus:38.252472,Emmenopterys:38.252472,Pinckneya:38.252472,Randia:38.252472)Rubiaceae:48.487236):2.360018,Ehretia:89.099709):13.495450,Eucommia:102.595161):4.536214):0.905059,((((Clethra:78.134140,((Cliftonia:38.402752,Cyrilla:38.402775)Cyrillaceae:38.402752,(Arbutus:38.402752,Elliottia:38.402775,Enkianthus:38.402775,Kalmia:38.402775,Lyonia:38.402775,Oxydendrum:38.402775,Rhododendron:38.402775,Vaccinium:38.402775)Ericaceae:38.402752):1.328631):12.980787,(((Halesia:30.391993,Pterostyrax:30.392012,Styrax:30.392012)Styracaceae:51.775261,Symplocos:82.167252):0.000000,(Camellia:41.083626,Franklinia:41.083649,Gordonia:41.083649,Stewartia:41.083649,Ternstroemia:41.083649)Theaceae:41.083626):8.947675):0.000149,Diospyros:91.115099):2.023849,((Ardisia:18.344650,Myrsine:18.344666)Myrsinaceae:74.794174,Bumelia:93.138824):0.000101):14.897509):1.462594,((Alangium:48.167362,Aucuba:48.167370,Cornus:48.167370,Macrocarpium:48.167370,Torricellia:48.167370)Cornaceae:53.025345,(Hydrangea:97.032310,(Davidia:48.516151,Nyssa:48.516167)Nyssaceae:48.516151):4.160399):8.306321):7.064716,Schoepfia:116.563736):0.000000,((((Altingia:50.813206,Liquidambar:50.813213)Altingiaceae:50.813206,(Disanthus:50.813206,Distylium:50.813213,Fortuneria:50.813213,Hamamelis:50.813213,Loropetalum:50.813213,Sinowilsonia:50.813213)Hamamelidaceae:50.813206):0.000131,(Cercidiphyllum:87.828712,Daphniphyllum:87.828712):13.797829):13.247040,(((((((Choerosp
ondias:21.440735,Cotinus:21.440742,Pistacia:21.440742,Rhus:21.440742,Toxicodendron:21.440742)Anacardiaceae:37.304596,(Acer:29.372665,Aesculus:29.372681,Dipteronia:29.372681,Koelreuteria:29.372681,Sapindus:29.372681)Sapindaceae:29.372665):0.000114,((Cedrela:49.350353,(Ailanthus:24.675177,Leitneria:24.675188,Picrasma:24.675188)Simaroubaceae:24.675177):4.016092,(Evodia:26.683222,Phellodendron:26.683233,Ptelea:26.683233,Zanthoxylum:26.683233)Rutaceae:26.683222):5.379002):29.842871,(Firmiana:32.917126,Tilia:32.917149)Malvaceae:55.671188):12.661992,(Lagerstroemia:84.110847,Szyzygium:84.110847):17.139463):2.612011,((((((Alnus:16.609535,Betula:16.609543,Carpinus:16.609543,Corylus:16.609543,Ostrya:16.609543)Betulaceae:37.306709,((Carya:25.504854,Cyclocarya:25.504866,Juglans:25.504866,Engelhardtia:25.504866,Platycarya:25.504866,Pterocarya:25.504866)Juglandaceae:25.504854,Myrica:51.009708):2.906531):9.893459,(Castanea:31.904850,Castanopsis:31.904873,Cyclobalanopsis:31.904873,Fagus:31.904873,Lithocarpus:31.904873,Quercus:31.904873)Fagaceae:31.904850):21.681023,(((((Celtis:20.739927,Pteroceltis:20.739939)Cannabaceae:20.739927,((Broussounetia:12.614990,Cudrania:12.615005,Maclura:12.615005,Morus:12.615005)Moraceae:12.614990,Oreocnide:25.229980):16.249876):10.909924,(Aphananthe:26.194889,Hemiptelea:26.194897,Planera:26.194897,Ulmus:26.194897,Zelkova:26.194897)Ulmaceae:26.194889):11.649286,(Hovenia:32.019470,Rhamnus:32.019493,Ziziphus:32.019493)Rhamnaceae:32.019596):8.938065,(((Amelanchier:36.488564,(Crataegus:36.488586,Mespilus:36.488586):0.000000):0.000000,Chaenomeles:36.488586,Eriobotrya:36.488586,Malus:36.488586,Photinia:36.488586,Pyrus:36.488586,Sorbus:36.488586):0.000000,Prunus:36.488586)Rosaceae:36.488564):12.513593):4.616908,(Albizia:31.901920,Cercis:31.901943,Cladrastis:31.901943,Dalbergia:31.901943,Erythrina:31.901943,Gleditsia:31.901943,Gymnocladus:31.901943,Laburnum:31.901943,Maackia:31.901943,Ormosia:31.901943,Robinia:31.901943,Sophora:31.901943)Fabaceae:58.205711):4
.139401,((Euonymus:90.433327,Sloanea:90.433327):0.000101,((Mallotus:28.689901,Sapium:28.689920)Euphorbiaceae:50.330055,(Idesia:29.019764,Poliothyrsis:29.019779,Populus:29.019779,Salix:29.019779,Xylosma:29.019779)Salicaceae:50.000195):11.413469):3.813607):9.615288):0.000000,(Staphylea:21.372393,Tapiscia:21.372404,Turpinia:21.372404)Staphyleaceae:82.489929):11.011259):1.690163):7.829397,Buxus:124.393143):0.000000,Tetracentron:124.393143):2.763555,Meliosma:127.156693):1.664427,Platanus:128.821121):2.029122,Euptelea:130.850250):11.447736,((Asimina:95.972672,(Liriodendron:47.125092,Magnolia:47.125114,Manglieita:47.125114,Michelia:47.125114)Magnoliaceae:48.847580):46.325292,(Actinodaphne:49.903526,Cinnamomum:49.903542,Lindera:49.903542,Litsea:49.903542,Machilus:49.903542,Neolitsea:49.903542,Nothaphoebe:49.903542,Persea:49.903542,Phoebe:49.903542,Sassafras:49.903542,Umbellularia:49.903542)Lauraceae:92.394188):0.000257):1.840266,(Yucca:110.138222,((Sabal:100.000000,(Serenoa:95.000000,Trachycarpus:95.000000)ST:5.000000)Arecaceae:10.000000,(Arundinaria:20.476601,Phyllostachys:20.476624,Semiarundinaria:20.476624)Poaceae:89.661629):0.000000):34):30.861772,Illicium:175.000000)aus2ast:175.000000,(((((Cephalotaxus:125.000000,(Taxus:100.000000,Torreya:100.000000)TT1:25.000000)Taxaceae:90.000000,((((((((Calocedrus:85.000000,Platycladus:85.000000)CP:5.000000,(Cupressus:85.000000,Juniperus:85.000000)CJ:5.000000)CJCP:5.000000,Chamaecyparis:95.000000)CCJCP:5.000000,(Thuja:7.870000,Thujopsis:7.870000)TT2:92.13)CJCPTT:30.000000,((Cryptomeria:120.000000,Taxodium:120.000000)CT:5.000000,Glyptostrobus:125.000000)CTG:5.000000)CupCallTax:5.830000,((Metasequoia:125.000000,Sequoia:125.000000)MS:5.000000,Sequoiadendron:130.000000)Sequoioid:5.830000)STCC:49.060001,Taiwania:184.889999)Taw+others:15.110000,Cunninghamia:200.000000)nonSci:15.000000)Tax+nonSci:10.000000,Sciadopitys:225.000000):25.000000,(((Abies:106.000000,Keteleeria:106.000000)AK:54.000000,(Pseudolarix:156.000000,Tsuga:156.000000)NT
P:4.000000)NTPAK:24.000000,((Larix:87.000000,Pseudotsuga:87.000000)LP:81.000000,(Picea:155.000000,Pinus:155.000000)PPC:13.000000)Pinoideae:16.000000)Pinaceae:66.000000)Coniferales:25.000000,Ginkgo:275.000000)gymnosperm:75.000000)seedplant:50.000000;'
tree_obj = Tree(tree_str)
print(tree_obj)
============
This brings up the following error for "tree_obj = Tree(tree_str)":
========
ValueError: invalid literal for float(): seedplant
========
It looks like it is looking for a floating point number where "seedplant" is. From bugzilla-daemon at portal.open-bio.org Thu Mar 12 06:17:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Mar 2009 06:17:01 -0400 Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser crash In-Reply-To: Message-ID: <200903121017.n2CAH13S012060@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2788 ------- Comment #1 from cymon.cox at gmail.com 2009-03-12 06:17 EST ------- (In reply to comment #0) > The newick files I have been working with seem to open fine in several > different programs/packages (Dendroscope, R's APE package, phylocom, python > alfacinha module), but not the newick parser in Bio.Nexus.Trees. [a big tree] > tree_obj = Tree(tree_str) > > print(tree_obj) > ============ > > > This brings up the following error for "tree_obj = Tree(tree_str)": > ======== > ValueError: invalid literal for float(): seedplant > ======== > > It looks like it is looking for a floating point number where "seedplant" is. Your tree is decorated with node labels, which the parser cannot handle. This came up recently (within the last year?) but I can't find the bug/message. Should probably catch this and return an informative error - or implement node labels... C.
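Until the parser supports them, one possible workaround is to strip the internal node labels from the Newick string before parsing. The sketch below is an assumption-laden illustration, not part of Bio.Nexus: the function name strip_internal_labels is hypothetical, and it assumes labels are unquoted and contain no parentheses, commas, colons, semicolons or square brackets. Note that it would also discard numeric support values written directly after a closing parenthesis.

```python
import re

# Anything sitting between a closing parenthesis and the next
# structural character is treated as an internal node label.
_INTERNAL_LABEL = re.compile(r"\)[^:;,()\[\]]+")


def strip_internal_labels(newick):
    """Return the Newick string with internal node labels removed.

    Hypothetical helper; branch lengths (":1.23") are left untouched.
    """
    return _INTERNAL_LABEL.sub(")", newick)


labelled = "((A:1,B:2)AB:3,C:4)root:5;"
plain = strip_internal_labels(labelled)
```

The resulting string keeps the same topology and branch lengths, so it may then be accepted by Tree(); the label information itself is of course lost.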
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 12 06:38:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Mar 2009 06:38:59 -0400 Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser does not support internal node labels In-Reply-To: Message-ID: <200903121038.n2CAcxMR014167@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2788 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |enhancement OS/Version|Mac OS |All Platform|Macintosh |All Summary|Bio.Nexus.Trees newick |Bio.Nexus.Trees newick |parser crash |parser does not support | |internal node labels ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-12 06:38 EST ------- I thought it looked familiar, but I must have only searched the currently open bugs. This looks *very* similar to Bug 2543 which dealt with internal node names, which was fixed for Biopython 1.49 (and 1.49 beta). Frank wrote: > Nexus.Trees has been extended to deal with internal node names, or "special > comments" in the format [& blablalba]. Such comments comments can appear > directly after the taxon label, after the closing parentheses, or between > branchlength / support values attached to a node or a taxon labels, ... i.e. On Bug 2543, Frank didn't go as far as the enhancement to cope with "naked" node labels, just those in the square brackets. 
Consider this smaller example Cymon gave on Bug 2543: >>> from Bio.Nexus.Trees import Tree >>> tree_str2 = "(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);" >>> tree_obj = Tree(tree_str2) Traceback (most recent call last): ... ValueError: invalid literal for float(): A I've retitled this and marked it as an enhancement. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 12 06:41:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Mar 2009 06:41:30 -0400 Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors In-Reply-To: Message-ID: <200903121041.n2CAfUwH014362@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2543 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-12 06:41 EST ------- On comment #5 Frank wrote: > In my opinion, naming nodes is a feature, and I would not regard the lack of > this feature as a bug. But I'll have a look at the code and see how easy > this can be changed. It would actually be nice if P4 and Bio.Nexus, both > being python programs, could read each other's trees. This enhancement is now covered by Bug 2788. It appears that now several other programs support this Newick tree variant, making it a bit more important. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From chris.lasher at gmail.com Thu Mar 12 17:07:21 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Thu, 12 Mar 2009 17:07:21 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> Message-ID: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> On Thu, Feb 26, 2009 at 10:00 AM, Peter wrote: > Another option to consider would be to switch to running git on > biopython.org, but use the git-cvsserver tool to provide an emulated > CVS server on top of the git repository. This sounds possible in > theory, and would be nice for any "old fashioned" biopython developers > because it should be fairly transparent - they can continue to treat > it as CVS and just work on the main trunk. This would require someone > competent to do the conversion and alter the server setup - we'd have > to talk to the OBF team about this. However, if anyone has first hand > experience on git-cvsserver perhaps they could comment on whether this > sounds like a good plan or not. I must be missing something, Peter. Why would BioPython continue to operate with CVS? I suppose I just really hope to see BioPython running with something other than CVS, and I'd really like to see it go either under Bazaar or Git.
Chris From bartek at rezolwenta.eu.org Thu Mar 12 19:20:23 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 13 Mar 2009 00:20:23 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> Message-ID: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher wrote: > > I must be missing something, Peter. Why would BioPython continue to > operate with CVS? I suppose I just really hope to see BioPython > running with something other than CVS, and I'd really like to see it > go either under Bazaar or Git. > Hi Chris, The idea is to do the switch in two steps:
- First, we keep the main branch in CVS while git and/or bzr branches are synchronized with it for people to branch and contribute.
- If this works nicely, we will switch to one of these systems completely (while possibly keeping the other branch in sync, but this is not yet decided).
The first step is to some extent operational (I'm currently busy with other stuff, but I'll get around to it, hopefully this weekend), but the second step requires a decision on our side (git or bzr?) and action on the side of OBF (there is no git or Bazaar installed on the OBF servers).
cheers -- Bartek Wilczynski From biopython at maubp.freeserve.co.uk Fri Mar 13 08:21:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Mar 2009 12:21:14 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> Message-ID: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski wrote: > On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher wrote: >> On Thu, Feb 26, 2009 at 10:00 AM, Peter wrote: >>> Another option to consider would be to switch to running git on >>> biopython.org, but use the git-cvsserver tool to provide an emulated >>> CVS server on top of the git repository. This sounds possible in >>> theory, and would be nice for any "old fashioned" biopython developers >>> because is should be fairly transparent - they can continue to treat >>> it as CVS and just work on the main trunk. This would require someone >>> competent to do the conversion and alter the server setup - we'd have >>> to talk to the OBF team about this. However, if anyone has first hand >>> experience on git-cvsserver perhaps they could comment on weather this >>> sounds like a good plan or not. >> >> I must be missing something, Peter. Why would BioPython continue to >> operate with CVS? 
I suppose I just really hope to see BioPython >> running with something other than CVS, and I'd really like to see it >> go either under Bazaar or Git. I'm warming to the idea of git, and had noticed git includes the optional git-cvsserver tool which emulates a CVS server while using git underneath. I was wondering if anyone had first hand experience of this. If we did move from CVS to git (still hosted on biopython.org), this would seem to offer a nice migration path for our "old school" CVS developers - they can carry on as usual. Of course, if none of us care about having to learn a new interface, then a simple switch would be less hassle to set up. For the server side of things, we'll need to talk to the OBF team about any such move - as far as I know they've only managed CVS to SVN migrations in the past. Peter > Hi Chris, > > The idea is to do the switch in two steps: > - first we still have the main branch in CVS while we have git and/or > bzr branches synchronized with it for people to branch and contribute > - If this works nicely, we will switch to one of these systems > completely (while possibly keeping the other branch in sync, but this > is not yet decided) That does seem like a good plan. Of course, there is the related issue of where we host the official repository: externally (e.g. on GitHub or Launchpad) or in house (on biopython.org). I favour keeping the official repository on biopython.org but this will require OBF technical support (do we have the expertise within Biopython? Bartek? Chris?). > The first step is to some extent operational (I'm currently busy with > other stuff, but I'll get around to it, hopefully this weekend), but the > second step requires decision on our side (git or bzr?) and action on > the side of OBF (there is no git or Bazaar installed on OBF servers). There is also the previously semi-agreed solution of switching from CVS to SVN on biopython.org, but this would be only a gradual improvement.
I gather there are mature tools for using git+svn together, so it should be better than using git+cvs together. Other than meaning all the OBF hosted projects are on SVN (I think we are the last still on CVS), this is beginning to seem a bit pointless. Peter From bugzilla-daemon at portal.open-bio.org Fri Mar 13 11:48:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Mar 2009 11:48:39 -0400 Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative location of a residue that is an ATOM In-Reply-To: Message-ID: <200903131548.n2DFmdZ6015899@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2780 ------- Comment #2 from klaus.kopec at tuebingen.mpg.de 2009-03-13 11:48 EST ------- PDB IDs of some more occurances (simply search the file for "HETATM" and look for a HETATM record that is followed by a ATOM with the same residue number and a different altloc). 1din 1k4q 1k55 - multiple occurances 1k56 1rqh 1rr2 1xpk 1xpl - multiple occurances 1xpm - multiple occurances -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jblanca at btc.upv.es Fri Mar 13 11:59:01 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 13 Mar 2009 16:59:01 +0100 Subject: [Biopython-dev] library to create gel image In-Reply-To: <200902271157.49948.jblanca@btc.upv.es> References: <200902261612.54306.jblanca@btc.upv.es> <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> <200902271157.49948.jblanca@btc.upv.es> Message-ID: <200903131659.01590.jblanca@btc.upv.es> Hi: I've fishished a first version of a program that reads a list of Applied Biosystems fsa files and draws a virtual gel. It does not reads the sequence because my users are interested in fragment analysis, but the basic infraestructure is in place to do it. It does what my users need. 
It's quite slow, though, and I'm not investing time in optimizing it.
If anybody wants to take a look at the code, it is at:
http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/
I distribute it under the GPL licence. If you think that any part of the code
could be of any use for the Biopython project I would be very pleased to give
it to the community.
Best regards,

Jose Blanca

On Friday 27 February 2009 11:57:49 Jose Blanca wrote:
> On Friday 27 February 2009 11:45:59 Peter wrote:
> > On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca wrote:
> > That's much clearer - is the Genographer software showing the actual
> > image (zoomed as required, with the colours adjusted as required), or
> > an artificial recreation?
>
> It is an artificial recreation, the same as I'm trying to accomplish. I just
> want more resolution and an automated process (genographer is a GUI
> application)
>
> > Are you trying to create this figure for illustrative purposes only?
> > I mean would a slightly cartoon like recreation be fine, or are you
> > trying to make it as realistic as possible?
>
> I want to analyze it.
>
> > I see you are having to reverse engineer their file format. I guess
> > other people have tried this in the past so there may be more clues
> > out on the internet. Have you tried emailing the company to see if
> > they would publish the file format specifications (unlikely I fear,
> > but worth asking).
>
> Fortunately the ABIF was reverse engineered by people more clever than me.
> And a couple of years ago Applied published a specification.
> http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/ABIF_File_Format.pdf
> You can't believe everything in that specification, but it is a good
> start. Reading an abif file is not a problem, drawing the gel with as
> little coding as possible is another thing.
> Regards,
>
> Jose Blanca
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From biopython at maubp.freeserve.co.uk Fri Mar 13 12:12:12 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 13 Mar 2009 16:12:12 +0000
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <200903131659.01590.jblanca@btc.upv.es>
References: <200902261612.54306.jblanca@btc.upv.es> <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> <200902271157.49948.jblanca@btc.upv.es> <200903131659.01590.jblanca@btc.upv.es>
Message-ID: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>

On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca wrote:
> Hi:
> I've finished a first version of a program that reads a list of Applied
> Biosystems fsa files and draws a virtual gel. It does not read the sequence
> because my users are interested in fragment analysis, but the basic
> infrastructure is in place to do it.
> It does what my users need. It's quite slow though, but I'm not investing time
> in optimizing it.

Do you have any example images online for people to look at?

Peter

From jblanca at btc.upv.es Fri Mar 13 12:16:46 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 13 Mar 2009 17:16:46 +0100
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>
References: <200902261612.54306.jblanca@btc.upv.es> <200903131659.01590.jblanca@btc.upv.es> <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>
Message-ID: <200903131716.46413.jblanca@btc.upv.es>

Here you have one:
http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/out.png

Jose Blanca

On Friday 13 March 2009 17:12:12 Peter wrote:
> On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca wrote:
> > Hi:
> > I've finished a first version of a program that reads a list of Applied
> > Biosystems fsa files and draws a virtual gel. It does not read the
> > sequence because my users are interested in fragment analysis, but the
> > basic infrastructure is in place to do it.
> > It does what my users need. It's quite slow though, but I'm not investing
> > time in optimizing it.
>
> Do you have any example images online for people to look at?
>
> Peter

--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From chris.lasher at gmail.com Sun Mar 15 01:43:34 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sun, 15 Mar 2009 01:43:34 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
Message-ID: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>

On Fri, Mar 13, 2009 at 8:21 AM, Peter wrote:
> On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski wrote:
>> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher wrote:
>>> On Thu, Feb 26, 2009 at 10:00 AM, Peter wrote:
>>>> Another option to consider would be to switch to running git on
>>>> biopython.org, but use the git-cvsserver tool to provide an emulated
>>>> CVS server on top of the git repository. This sounds possible in
>>>> theory, and would be nice for any "old fashioned" biopython developers
>>>> because it should be fairly transparent - they can continue to treat
>>>> it as CVS and just work on the main trunk. This would require someone
>>>> competent to do the conversion and alter the server setup - we'd have
>>>> to talk to the OBF team about this. However, if anyone has first hand
>>>> experience on git-cvsserver perhaps they could comment on whether this
>>>> sounds like a good plan or not.
>>>
>>> I must be missing something, Peter. Why would BioPython continue to
>>> operate with CVS? I suppose I just really hope to see BioPython
>>> running with something other than CVS, and I'd really like to see it
>>> go either under Bazaar or Git.
>
> I'm warming to the idea of git, and had noticed git includes the
> optional git-cvsserver tool which emulates a CVS server while using
> git underneath. I was wondering if anyone had first hand experience
> of this. If we did move from CVS to git (still hosted on
> biopython.org), this would seem to offer a nice migration path for
> our "old school" CVS developers - they can carry on as usual. Of
> course, if none of us care about having to learn a new interface, then
> a simple switch would be less hassle to set up. For the server side of
> things, we'll need to talk to the OBF team about any such move - as
> far as I know they've only managed CVS to SVN migrations in the past.
>
> Peter
>
>> Hi Chris,
>>
>> The idea is to do the switch in two steps:
>> - first we still have the main branch in CVS while we have git and/or
>> bzr branches synchronized with it for people to branch and contribute
>> - If this works nicely, we will switch to one of these systems
>> completely (while possibly keeping the other branch in sync, but this
>> is not yet decided)
>
> That does seem like a good plan. Of course, there is the related
> issue of where we host the official repository (externally, e.g. on
> GitHub or Launchpad) or in house (on biopython.org). I favour keeping
> the official repository on biopython.org but this will require OBF
> technical support (do we have the expertise within Biopython? Bartek?
> Chris?).
>
>> The first step is to some extent operational (I'm currently busy with
>> other stuff, but I'll get around to it hopefully this weekend), but the
>> second step requires decision on our side (git or bzr?) and action on
>> the side of OBF (there is no git or Bazaar installed on OBF servers).
>
> There is also the previously semi-agreed solution of switching from
> CVS to SVN on biopython.org, but this would be only a gradual
> improvement. I gather there are mature tools for using git+svn
> together, so it should be better than using git+cvs together. Other
> than meaning all the OBF hosted projects are on SVN (I think we are
> the last still on CVS), this is beginning to seem a bit pointless.
>
> Peter
>

Peter et al.,

I started off writing an email about why I think hosting at GitHub or
Launchpad is a better idea, but it got a bit verbose, so I just wrote
up a blog post instead. (Besides, links and images are more fun, and
make the intarwebs go 'round.) Please see
http://igotgenes.blogspot.com/2009/03/why-biopython-needs-to-move-to-github.html
or
http://tinyurl.com/a9o7ae

Chris

From mjldehoon at yahoo.com Sun Mar 15 06:24:11 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 15 Mar 2009 03:24:11 -0700 (PDT)
Subject: [Biopython-dev] Bio.ExPASy
Message-ID: <76595.11423.qm@web62404.mail.re1.yahoo.com>

Hi everybody,

As discussed previously, I have moved the Bio.Prosite code to Bio.ExPASy, and
I've added a ScanProsite module to Bio.ExPASy. I guess Bio.Enzyme should also
move to Bio.ExPASy.

See http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html
for the documentation of Biopython as currently in CVS.

--Michiel.
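
[Editorial sketch] The retmode consistency check proposed in the Bio.Entrez message forwarded next can be tried out as a standalone helper. This is only a minimal sketch under stated assumptions, not the code that went into Biopython: the names check_retmode and preview are hypothetical, the 60-character preview limit is an arbitrary choice, and stripping leading whitespace before the prefix test is an addition that is not in the original proposal. It does follow the suggested refinement of appending "..." only when text was actually cut.

```python
def _starts_like_html(data):
    """True if the text begins like an HTML page (tag or doctype)."""
    start = data.lstrip().lower()
    return start.startswith("<html") or start.startswith("<!doctype html")


def _starts_like_xml(data):
    """True if the text begins with an XML declaration."""
    return data.lstrip().lower().startswith("<?xml")


def preview(data, limit=60):
    """Return at most `limit` characters, adding "..." only if text was cut."""
    if len(data) > limit:
        return data[:limit] + "..."
    return data


def check_retmode(params, data):
    """Raise TypeError if `data` does not look like the requested retmode.

    Hypothetical helper for illustration; it is not a Bio.Entrez function.
    """
    retmode = (params.get("retmode") or "").lower()
    if not retmode:
        return  # no explicit format requested, nothing to compare against
    got_html = _starts_like_html(data)
    got_xml = _starts_like_xml(data)
    if retmode == "html" and not got_html:
        raise TypeError("Requested HTML, but didn't get it: %s" % preview(data))
    if retmode == "xml" and not got_xml:
        raise TypeError("Requested XML, but didn't get it: %s" % preview(data))
    if retmode != "xml" and got_xml:
        raise TypeError("Didn't request XML, but got it: %s" % preview(data))
    if retmode != "html" and got_html:
        raise TypeError("Didn't request HTML, but got it: %s" % preview(data))


# Usage with made-up response strings (no network access involved).
check_retmode({"db": "pubmed", "retmode": "xml"}, '<?xml version="1.0"?><doc/>')
try:
    check_retmode({"db": "homologene", "retmode": "text"},
                  "<html><body>Bad Gateway</body></html>")
except TypeError as err:
    print(err)
```

Keeping the check in one small function makes it easy to call either from the function that opens the URL or from a parser, which is the design question raised in the forwarded message.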

From mjldehoon at yahoo.com Sun Mar 15 08:53:28 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 15 Mar 2009 05:53:28 -0700 (PDT)
Subject: [Biopython-dev] Fw: Re: Bio.Entrez catching more errors
Message-ID: <722257.11611.qm@web62401.mail.re1.yahoo.com>

--- On Sun, 3/15/09, Michiel de Hoon wrote:
> Whereas I think it's a good idea if Bio.Entrez catches
> more errors, I think the parser is a more suitable place to
> check for errors. See Bio.ExPASy.ScanProsite for an example
> of catching errors with an XML parser; this avoids using a
> File.UndoHandle.
>
> --Michiel
>
> --- On Tue, 3/10/09, Peter wrote:
>
> > From: Peter
> > Subject: [Biopython-dev] Bio.Entrez catching more errors
> > To: "BioPython-Dev Mailing List"
> > Date: Tuesday, March 10, 2009, 7:40 PM
> > Hi All,
> >
> > It occurred to me that the Bio.Entrez._open function can look at the
> > retmode argument (if present) and spot if there is a mismatch between
> > the requested format (e.g. XML, HTML, text or asn.1) and the actual
> > data the NCBI returned. Something along the following lines could be
> > added to the end of the _open function in Bio/Entrez/__init__.py to
> > achieve this:
> >
> >     elif "retmode" in params and params["retmode"].lower()=="html" \
> >     and not data.lower().startswith("<html") \
> >     and not data.lower().startswith("<!doctype html") :
> >         raise TypeError("Requested HTML, but didn't get it: %s..." % data)
> >     elif "retmode" in params and params["retmode"].lower()=="xml" \
> >     and not data.lower().startswith("<?xml") :
> >         raise TypeError("Requested XML, but didn't get it: %s..." % data)
> >     elif "retmode" in params and params["retmode"] \
> >     and params["retmode"].lower()!="xml" \
> >     and data.lower().startswith("<?xml") :
> >         raise TypeError("Didn't request XML, but got it: %s..." % data)
> >     elif "retmode" in params and params["retmode"] \
> >     and params["retmode"].lower()!="html" \
> >     and (data.lower().startswith("<html") or \
> >          data.lower().startswith("<!doctype html")):
> >         #Expected for some error pages (e.g. the Bad Gateway caught above)
> >         raise TypeError("Didn't request HTML, but got it: %s..." % data)
> >
> > I'm sure my XML/HTML detection could be made more robust here - I hope
> > the principle is clear. My motivation is that I have noticed the NCBI
> > can return HTML error pages, and while we do catch some of these
> > explicitly (e.g. Bad Gateway, or Service Unavailable), I think any
> > HTML page when the user asked for XML, text or asn.1 should be
> > treated as an error. Similarly, not getting XML when you ask for it etc.
> >
> > Note that by raising the exception including the message text it
> > should be much easier to diagnose these failures. As a tiny
> > refinement to the above code, we should only add the "..." if there is
> > more text to follow - this isn't always the case.
> >
> > e.g.
> > The following give an HTML error page (while some databases like
> > "protein" are better behaved in this respect):
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()
> >
> > Similarly, these give an XML like fragment (which is not a valid XML
> > file in itself - arguably an NCBI bug; some databases like "protein"
> > are better behaved in this respect):
> > >>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()
> >
> > My suggested change to Bio.Entrez would also catch the following
> > examples (using an invalid database) where the NCBI ignore the retmode
> > and return an HTML help page:
> > >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
> > >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()
> >
> > In a less clear cut example, this would flag the following as an error
> > as the NCBI seem to return ASN.1 text instead of HTML here:
> > >>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()
> >
> > Overall, I think this change should catch lots of errors which
> > otherwise may not be detected until later (e.g. while trying to parse
> > the file).
> >
> > --------------------------------------------------------------------------------------------------
> >
> > On another point, should we catch these responses as errors?
> >
> > >>> efetch(db="snp", id="123456").read()
> > [HTML page "PmFetch response" containing: 1: id: 123456 Error occurred: cannot get document summary]
> > >>> efetch(db="snp", id="123456", retmode="html").read()
> > [HTML page "PmFetch response" containing: 1: id: 123456 Error occurred: cannot get document summary]
> > >>> efetch(db="snp", id="123456", retmode="xml").read()
> > [XML document using the http://www.ncbi.nlm.nih.gov/SNP/docsum schema, containing: 1: id: 123456 Error occurred: cannot get document summary]
> > >>> efetch(db="snp", id="123456", retmode="text").read()
> > '1: id: 123456 Error occurred: cannot get document summary\n'
> >
> > and,
> > >>> print efetch(db="homologene", retmode="html", id="fake").read()
> > [HTML page containing: Error occurred: Empty id list - nothing todo ...]
> >
> > Looking for the string "Error occurred: " looks fairly safe here, and
> > should cover a range of entries. Of course, you can imagine false
> > positives too, e.g. a valid PUBMED plain text record for a tutorial
> > article with a title like "Yikes! An Error Occurred: A beginner's
> > Guide To Defensive Programming." could match.
> >
> > Peter
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython-dev

From chapmanb at 50mail.com Sun Mar 15 14:54:43 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 15 Mar 2009 14:54:43 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
Message-ID: <20090315185443.GA30296@kunkel>

Hi all;
It is good to see the discussion around revision control systems;
Chris and Paulo's posts make some nice points. Source code
management is an important issue that influences perception of
Biopython and barriers to contributing.

My two cents on what we should do is:

- Pick a distributed source code management system. My preference
  is Git, only because it currently has more steam behind it.
  Git/Bazaar will likely end up being like the VHS/Beta debate.

- Test drive use of Git on an official GitHub repository. This would
  involve a few things:

  = Bartek and Giovanni: Can you coordinate on a single GitHub
    Biopython instance and remove the others to eliminate confusion?
  = Write up documentation for contributors. This is where we could use
    some volunteers from those interested to update the web pages.
    The two main places that need updating are:

    http://biopython.org/wiki/Contributing
    http://biopython.org/wiki/CVS

    I think we should ensure people are clear on what is being done
    and where you can contribute.

- Ensure GitHub can be synced with current CVS. Bartek, it sounds
  like you have a handle on this.

- Evaluate the success of Git. This is easy to measure in terms of
  new contributors, increased happiness, and what not. At the same
  time we can monitor how GitHub evolves over time.

- If successful, talk to the OpenBio team about hosting Git locally.

Peter, Michiel, et al -- how do you feel?

I think being cautious with the transition, as Peter recommends, is
important. I am old enough to remember Sourceforge being new and
everyone saying how it was stupid not to move there; then over time
Sourceforge got slow with all the users and people moved
away from it. This is just to say -- no one knows how GitHub (or
Launchpad) will evolve. OpenBio is a stable, small, nice community
and to the extent we can use their resources I believe we should.

Overall, the specifics of the above proposal aren't as important as
just doing something unambiguous and then evaluating how it works.
Right now things are a bit confusing, which I think could put off
new developers, who are always welcome.

Looking forward to talking about code instead of revision control,
Brad

From bartek at rezolwenta.eu.org Sun Mar 15 16:12:46 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sun, 15 Mar 2009 21:12:46 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090315185443.GA30296@kunkel>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel>
Message-ID: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>

Hi all,

On Sun, Mar 15, 2009 at 7:54 PM, Brad Chapman wrote:
>
> - Pick a distributed source code management system. My preference
>   is Git, only because it currently has more steam behind it.
>   Git/Bazaar will likely end up being like the VHS/Beta debate.
>
> - Test drive use of Git on an official GitHub repository. This would
>   involve a few things:
>
>   = Bartek and Giovanni: Can you coordinate on a single GitHub
>     Biopython instance and remove the others to eliminate confusion?
>   = Write up documentation for contributors. This is where we could use
>     some volunteers from those interested to update the web pages.
>     The two main places that need updating are:
>
>     http://biopython.org/wiki/Contributing
>     http://biopython.org/wiki/CVS
>
>     I think we should ensure people are clear on what is being done
>     and where you can contribute.
>
> - Ensure GitHub can be synced with current CVS. Bartek, it sounds
>   like you have a handle on this.
>
> - Evaluate the success of Git. This is easy to measure in terms of
>   new contributors, increased happiness, and what not. At the same
>   time we can monitor how GitHub evolves over time.
>

I think there are some important points brought up by Brad (and others).

- From the technical point of view, I don't see any serious problems:
  - I can set up a new branch in github (the current one includes some
    testing changes done by Giovanni)
  - it will be synchronized daily with changes from CVS
  - I'll set up a script to also save a backup of the official branch at
    the OBF server (to ensure that we do not depend on github)
- I can write some (short) documentation on how to contribute.

I don't know whether anyone besides me is still interested in test-driving
launchpad/bzr as an alternative. If there are no other people, I'll
close the current testing branches on launchpad.

>
> Peter, Michiel, et al -- how do you feel?

I would also very happily hear from other developers. Especially if
there are any people who would be unhappy if we finally moved away
from CVS.

I'll post when I have a running setup of the cvs2git conversion.

cheers
Bartek

From bartek at rezolwenta.eu.org Sun Mar 15 19:14:07 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 00:14:07 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
Message-ID: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>

Hi all,

I've now set up a script on my machine to update the biopython git branch
on github once every hour. (thanks to Giovanni for creating and setting up
the account) It's created using the git fast-import script because of its
speed.

You can find it here:
http://github.com/biopython/biopython/

It's a different branch than the one created earlier by Giovanni. The old
one is now called biopython_old and will soon disappear from github (there
were some temporary changes in it).

The script also leaves a copy of the repository on dev.open-bio.org, just
in case :)

I've written a short guide on the wiki:
http://biopython.org/wiki/GitMigration

Please correct or give me comments if you don't like something or if
you feel something is missing.

I'm going to a conference, so I might be slow in responding to emails
next week...

cheers
Bartek

From dalloliogm at gmail.com Mon Mar 16 05:49:29 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 16 Mar 2009 10:49:29 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com> <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
Message-ID: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>

On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> Hi all,
>
> I've written a short guide on the wiki :
> http://biopython.org/wiki/GitMigration

I also have a draft for some documentation... I can contribute it later this
morning (now I don't have time).

p.s. the biopython website seems to be offline at the moment...

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk Mon Mar 16 07:05:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 11:05:38 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com> <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com> <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>
Message-ID: <320fb6e00903160405p5337f8b1m16d3c3d891950fd6@mail.gmail.com>

On Mon, Mar 16, 2009 at 9:49 AM, Giovanni Marco Dall'Olio wrote:
> On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
>> Hi all,
>>
>> I've written a short guide on the wiki :
>> http://biopython.org/wiki/GitMigration
>
> I also have a draft for some documentation... I can contribute it later this
> morning (now I don't have time).

In the meantime, I have updated the following pages accordingly:

http://biopython.org/wiki/CVS
http://biopython.org/wiki/SVN
http://biopython.org/wiki/Subversion_migration
http://biopython.org/wiki/Git #place holder, will be important if we do fully move to git
http://biopython.org/wiki/GitMigration #Fixing biopython to Biopython etc

Peter

> p.s. the biopython website seems to be offline at the moment...

All the OBF pages were out for a bit this morning (e.g. OBF helpdesk
#332), but it is back now.
From biopython at maubp.freeserve.co.uk Mon Mar 16 07:30:12 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 11:30:12 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090315185443.GA30296@kunkel> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> Message-ID: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman wrote: > Hi all; > It is good to see the discussion around revision control systems; > Chris and Paulo's posts make some nice points. Source code > management is an important issue that influences perception of > Biopython and barriers to contributing. > > My two cents on what we should do is: > > - Pick a distributed source code management system. My preference > is Git, only because it currently has more steam behind it. > Git/Bazaar will likely end up being like the VHS/Beta debate. I would agree git has more mind share, but I have no technical reason to choose one over the other. In terms of read only access, having a mirrored trunk branch on both git (e.g. github) and bazaar (e.g. launchpad) is possible for evaluation purposes. > - Test drive use of Git on an official GitHub repository. This would > involve a few things ... Giovanni has shared the github "Biopython" user information so we (i.e. Biopython) can use that for any official presence on github - which is great.
Bartek and Giovanni seem to have this working OK. I think having the latest CVS trunk in Launchpad automatically is stalled because they (launchpad) can't cope with a simple username/password for accessing a remote CVS server. Is that right Bartek? > - Evaluate the success of Git. This is easy to measure in terms of > new contributors, increased happiness, and what not. At the same > time we can monitor how GitHub evolves over time. It may not be that easy to measure in practice... > - If successful, talk to the OpenBio team about hosting Git locally. I have contacted the OBF to ask who we should talk to about this idea (given it will probably involve server access to install new software and perhaps changing firewall/port settings). > Peter, Michiel, et al -- how do you feel? I'm happy in principle with a switch to git, ideally hosted on biopython.org (see below). > I think being cautious with the transition, as Peter recommends, is > important. I am old enough to remember Sourceforge being new and > everyone saying how it was stupid not to move there; then over time > Sourceforge got slow with all the users and people moved > away from it. This is just to say -- no one knows how GitHub (or > Launchpad) will evolve. OpenBio is a stable, small, nice community > and to the extent we can use their resources I believe we should. I did have that same example in mind - having to depend on a third party like GitHub, LaunchPad or Sourceforge is great until things go wrong. The Open Bio Foundation is much smaller, and while they don't have 100% uptime either, they are normally very responsive to issues because they only support a small number of projects. Of course, ideally we might have both - an OBF hosted (git) repository on biopython.org, synced to github for people to enjoy its collaborative additions. > Overall, the specifics of the above proposal aren't as important as > just doing something unambiguous and then evaluating how it works.
> Right now things are a bit confusing, which I think could put off > new developers, who are always welcome. > > Looking forward to talking about code instead of revision control, That would be nice :) Peter From biopython at maubp.freeserve.co.uk Mon Mar 16 08:16:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 12:16:06 +0000 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) Message-ID: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> Hi All, I think we should probably do another release soon - for one thing the NCBI updated their DTD files, and it would be great if Biopython shipped with them included (see discussion on Bug 2678). We still need to work on the documentation for Bio.Graphics.GenomeDiagram (Bug 2671) and Bio.Motif (Bug 2694), but in the meantime I think it would be sensible to do a Biopython 1.50 beta release in the next couple of weeks. I'd like to include the following changes as part of the beta, but it would be sensible to have someone else try these out first. Any volunteers? Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g. align[1:2,5:-5] Any other nominations for Biopython 1.50? I'd also like to resolve Bug 2597 (Enforce alphabet letters in Seq objects), but that might deserve an alpha release given the higher chance of breaking existing scripts...
Peter From biopython at maubp.freeserve.co.uk Mon Mar 16 09:18:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 13:18:19 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> Message-ID: <320fb6e00903160618g2b5b6acs6695fab5ef432bc7@mail.gmail.com> Hi all, I'm thinking a news post on http://news.open-bio.org/news/category/obf-projects/biopython/ about version control would be a good idea at this point. How about this - keywords like git, subversion and the other project names would be links: Title: Biopython and version control systems Originally, all the OBF hosted projects used CVS for their source code repositories. At the start of 2008, BioPerl and BioJava moved over to Subversion (SVN), followed by BioSQL. Biopython was originally going to do the same, but this didn't actually happen. Having all the Bio* projects using the same version control system would have simplified server administration for the OBF, but wouldn't have actually made that much difference to Biopython development. Discussion has since shifted towards next generation distributed version control systems like git or bazaar. Quote from Linus Torvalds, The slogan of Subversion for a while was "CVS done right", or something like that, and if you start with that kind of slogan, there's nowhere you can go.
There is no way to do CVS right. In addition to creating the Linux kernel, Linus Torvalds more recently wrote git, a prominent example of a distributed version control system. Rather than switching from CVS to SVN, the BioRuby project chose instead to use git, hosted on github. Biopython is considering doing something similar - using a distributed version control system like git should make it easier for potential Biopython contributors to manage their own local copies of Biopython under version control. Initially for evaluation purposes only, Giovanni and Bartek have set up a Biopython branch on GitHub, which will automatically be updated from the OBF hosted Biopython CVS repository [Link to wiki page]. If this is favorably received, then moving Biopython from CVS to git seems likely at some point this year. Peter on behalf of the Biopython developers I hope this has everyone's approval... if not please reply here so we can revise this before it gets posted. Note that I've avoided getting into specifics here, such as hosting arrangements, as the details will go out of date.
Peter From bartek at rezolwenta.eu.org Mon Mar 16 10:24:42 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 16 Mar 2009 15:24:42 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> Message-ID: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> On Mon, Mar 16, 2009 at 12:30 PM, Peter wrote: > On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman wrote: >> - Pick a distributed source code management system. My preference >> is Git, only because it currently has more steam behind it. >> Git/Bazaar will likely end up being like the VHS/Beta debate. > > I would agree git has more mind share, but I have no technical reason > to choose one over the other. > > In terms of read only access, having a mirrored trunk branch on both > git (e.g. github) and bazaar (e.g. launchpad) is possible for > evaluation purposes. It is possible, but I don't know if we should do this. To some extent having too much choice might be problematic... We've done some tests on both bzr and git and it seems that both can do the job for us. I assume that the purpose of "test-driving" instead of directly switching to git is to give us a possibility to go back in case things go really bad. But I don't think it's a likely event.
Bigger projects are using git (or bzr) and doing fine, so we shouldn't have problems either. On the other hand I don't expect that having the possibility to test-drive two options is going to make the decision any easier. I don't expect too many people to try both options and even if it happens I don't think there will be a clear acclamation that one is better than the other. Honestly, we can't expect that all developers will learn two tools just to help us choose... Even though I was myself one of the proponents of switching to bzr I think that we should focus on one option and git seems to be the one with bigger mind share among biopythonistas. So I would vote for dropping the discussion on bzr and focusing on making sure that no one is left behind with their problems during the (possibly not too long) transition to git. > >> - Test drive use of Git on an official GitHub repository. This would >> involve a few things ... > > Giovanni has shared the github "Biopython" user information so we > (i.e. Biopython) can use that for any official presence on github - > which is great. Bartek and Giovanni seem to have this working OK. > > I think having the latest CVS trunk in Launchpad automatically is > stalled because they (launchpad) can't cope with a simple > username/password for accessing a remote CVS server. Is that right > Bartek? > Yes, we now have the biopython branch on github synchronized with CVS on an hourly basis. There is no problem with synchronizing a branch on launchpad in the same script, but I didn't do it for reasons explained above. >> - Evaluate the success of Git. This is easy to measure in terms of >> new contributors, increased happiness, and what not. At the same >> time we can monitor how GitHub evolves over time. > > It may not be that easy to measure in practice... > Well, if everyone is able to use git I'd say it's a success. We don't need a perfect solution. We want to move to _a_ distributed version control system.
> I did have that same example in mind - having to depend on a third > party like GitHub, LaunchPad or Sourceforge is great until things go > wrong. The Open Bio Foundation is much smaller, and while they don't > have 100% uptime either, they are normally very responsive to issues > because they only support a small number of projects. Of course, > ideally we might have both - an OBF hosted (git) repository on > biopython.org, synced to github for people to enjoy its collaborative > additions. > There is one difference between moving to sourceforge and moving to git. With git, it is much less of a problem to switch hosting. The fundamental idea is that every branch (including all local developer branches) can be a "master" branch. So switching to a different hosting location is a matter of an e-mail on the developer mailing list telling people to update the location of the "master" in their branches. So I think that we need to worry less about git hosting than we would need to worry about cvs (or svn for that matter).
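As a rough sketch of the point about switching hosting (an illustration only, not a documented Biopython procedure): in an existing clone, pointing "origin" at a new location is a single configuration change, and local history is left untouched. The two bare repositories below are throwaway local directories standing in for a hypothetical old and new host; a real move would use whatever URL gets announced on the list.

```shell
set -e
# Hypothetical stand-ins for the old and new hosting locations.
work=$(mktemp -d)
git init --quiet --bare "$work/old-host.git"
git init --quiet --bare "$work/new-host.git"

# A developer's existing clone, still tracking the old location.
# (Cloning an empty repository prints a warning on stderr; it is harmless here.)
git clone --quiet "$work/old-host.git" "$work/clone" 2>/dev/null

# Repoint "origin" at the new location; commits and local branches are unchanged.
git -C "$work/clone" config remote.origin.url "$work/new-host.git"

# Show where "origin" points now.
git -C "$work/clone" config --get remote.origin.url
```

Each developer would run only the `git config remote.origin.url ...` step in their own clone; the rest of the script just builds a sandbox to try it in.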
cheers Bartek From biopython at maubp.freeserve.co.uk Mon Mar 16 11:00:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 15:00:16 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> Message-ID: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski wrote: > > On Mon, Mar 16, 2009 at 12:30 PM, Peter wrote: >> On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman wrote: >>> - Pick a distributed source code management system. My preference >>> is Git, only because it currently has more steam behind it. >>> Git/Bazaar will likely end up being like the VHS/Beta debate. >> >> I would agree git has more mind share, but I have no technical reason >> to choose one over the other. >> >> In terms of read only access, having a mirrored trunk branch on both >> git (e.g. github) and bazaar (e.g. launchpad) is possible for >> evaluation purposes. > > It is possible, but I don't know if we should do this. To some extent > having too much choice might be problematic... True. > We've done some tests on both bzr and git and it seems that both > can do the job for us. I assume that the purpose of "test-driving" > instead of directly switching to git is to give us a possibility to go > back in case things go really bad.
But I don't think it's a likely > event. Bigger projects are using git (or bzr) and doing fine, so > we shouldn't have problems either. Well yes, having a fall back plan during this migration is essential. I do think there is a separate need for "test driving" for those of us with Biopython CVS access who don't have personal experience with git (or github). Making the switch before then would be a very bad idea. I personally need to make time to play with git and github, doing a couple of *real* branches and merges. I hope to do so this week, some of the changes I'd like to do for Biopython 1.50 would make good candidates... but this is time that might otherwise be spent on bug fixes, documentation etc. And there is of course my real job too... ;) Related to this, what OS and version of git are you (Bartek and Giovanni) using? > On the other hand I don't expect that having the possibility to > test-drive two options is going to make the decision any easier. > I don't expect too many people to try both options and even if it > happens I don't think there will be a clear acclamation that one > is better than the other. I agree. > Honestly, we can't expect that all developers will learn two tools > just to help us choose... Even though I was myself one of the > proponents of switching to bzr. > I think that we should focus on one option and git seems to be the one > with bigger mind share among biopythonistas. > So I would vote for dropping the discussion on bzr and focusing on > making sure that no one is left behind with their > problems during the (possibly not too long) transition to git. I'm happy with dropping discussion on bzr, in favour of git. (As an aside I always liked the term biopythoneers, but biopythonistas is fun too.) >> Giovanni has shared the github "Biopython" user information so we >> (i.e. Biopython) can use that for any official presence on github - >> which is great. Bartek and Giovanni seem to have this working OK.
>> >> I think having the latest CVS trunk in Launchpad automatically is >> stalled because they (launchpad) can't cope with a simple >> username/password for accessing a remote CVS server. Is that right >> Bartek? > > Yes, we now have the biopython branch on github synchronized with CVS > on an hourly basis. > There is no problem with synchronizing a branch on launchpad in the > same script, but I didn't do it for reasons explained above. OK. Do you want to make sure your Launchpad branch is clearly labeled as not current? > Well, if everyone is able to use git I'd say it's a success. We > don't need a perfect solution. We want to move to _a_ distributed > version control system. Well, I suspect there are some silent contributors who don't care either way - it's not perfect, but CVS works well enough. Better the devil you know ... ;) > ... > There is one difference between moving to sourceforge and moving to git. > With git, it is much less of a problem to switch hosting... So I think that we > need to worry less about git hosting than we would need to worry about > cvs (or svn for that matter). That is another good reason to pick git.
Peter From bartek at rezolwenta.eu.org Mon Mar 16 12:55:40 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 16 Mar 2009 17:55:40 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> Message-ID: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> On Mon, Mar 16, 2009 at 4:00 PM, Peter wrote: > > I do think there is a separate need for "test driving" for those of us with > Biopython CVS access who don't have personal experience with git > (or github). Making the switch before then would be a very bad idea. > > I personally need to make time to play with git and github, doing a > couple of *real* branches and merges. I hope to do so this week, some > of the changes I'd like to do for Biopython 1.50 would make good > candidates... but this is time that might otherwise be spent on bug > fixes, documentation etc. And there is of course my real job too... ;) > > Related to this, what OS and version of git are you (Bartek and Giovanni) using? > I'm currently using the binary installations on mac (intel) and ubuntu (8.10). I haven't experienced any problems, which is to be expected on unix-like systems. It would be interesting to hear about people's experiences on windows. > > OK. Do you want to make sure your Launchpad branch is clearly labeled > as not current?
> I've removed the bzr branches from launchpad, so there should be no more confusion. cheers Bartek From nuin at genedrift.org Mon Mar 16 12:58:26 2009 From: nuin at genedrift.org (Paulo Nuin) Date: Mon, 16 Mar 2009 12:58:26 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> Message-ID: <49BE8532.9040701@genedrift.org> No problem on Vista. Git (version 1.5.6.1-preview20080701) Paulo Bartek Wilczynski wrote: > On Mon, Mar 16, 2009 at 4:00 PM, Peter wrote: > >> I do think there is a separate need for "test driving" for those of us with >> Biopython CVS access who don't have personal experience with git >> (or github). Making the switch before then would be a very bad idea. >> >> I personally need to make time to play with git and github, doing a >> couple of *real* branches and merges. I hope to do so this week, some >> of the changes I'd like to do for Biopython 1.50 would make good >> candidates... but this is time that might otherwise be spent on bug >> fixes, documentation etc. And there is of course my real job too... ;) >> >> Related to this, what OS and version of git are you (Bartek and Giovanni) using? >> >> > I'm currently using the binary installations on mac (intel) and ubuntu > (8.10).
I haven't > experienced any problems, which is to be expected on unix-like systems. > It would be > interesting to hear about people's experiences on windows. > > >> OK. Do you want to make sure your Launchpad branch is clearly labeled >> as not current? >> >> > > I've removed the bzr branches from launchpad, so there should be no > more confusion. > > cheers > Bartek > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Mon Mar 16 13:07:18 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 17:07:18 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <49BE8532.9040701@genedrift.org> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> <49BE8532.9040701@genedrift.org> Message-ID: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com> On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin wrote: > > No problem on Vista. > > Git (version 1.5.6.1-preview20080701) > > Paulo Hi Paulo, Could you be a bit more precise about which version you are using and where you got it from? i.e. Are you using cygwin or the Windows native port, http://code.google.com/p/msysgit/ And did you mean in general you have no problems with git on Windows Vista, or have you also tried fetching Biopython from github, building, testing (and installing it)? For example, are there any new line issues from the unit tests?
This is one area where CVS and git may differ slightly... Thanks, Peter From dalloliogm at gmail.com Mon Mar 16 15:57:38 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 16 Mar 2009 20:57:38 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> Message-ID: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> On Mon, Mar 16, 2009 at 4:00 PM, Peter wrote: > On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski > wrote: > > Related to this, what OS and version of git are you (Bartek and Giovanni) > using? I am using git 1.5.4.3 on an Ubuntu 8.04 distribution. At home, I am using a git binary distribution on an Ubuntu 8.10. At the moment I am having some strange problems, related to the fact that I previously had a branch named 'biopython' in my account, so github doesn't seem to cope well with the fact that the old branch has been renamed. For example, I don't have the 'Fork' button... but it must be a temporary problem, I have already contacted github's tech support.
> Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bartek at rezolwenta.eu.org Mon Mar 16 17:04:57 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 16 Mar 2009 22:04:57 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> Message-ID: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com> Hi, On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio wrote: > > At the moment I am having some strange problems, related to the fact that I > previously had a branch named 'biopython' in my account, so github > doesn't seem to cope well with the fact that the old branch has been renamed. > For example, I don't have the 'Fork' button... but it must be a temporary > problem, I have already contacted github's tech support. > This is connected with the change I made in the repository. Namely I renamed the branch created by Giovanni to biopuython-old and created a new one (the "official" one) called biopython again.
The "rename" feature was flagged as experimental, and I don't expect we will need to use it again. There were warnings that it can affect the branches forked from the branch previously created by Giovanni. These two branches were incompatible, since they were made with different scripts (different revision numbers). So if you need to retain some changes you made to the old branch, please export them from your local copy as changesets and apply these back to the new forks made from the new repository. I'm sorry for the inconvenience. cheers Bartek From chapmanb at 50mail.com Mon Mar 16 18:42:40 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Mar 2009 18:42:40 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> Message-ID: <20090316224240.GA57054@sobchak.mgh.harvard.edu> Hey everyone; Wow, y'all are quick. Bartek, Giovanni and Peter -- thanks for all the hard work and organization. Consolidating a couple of threads below... > >> I've written a short guide on the wiki : > >> http://biopython.org/wiki/GitMigration > > > > I also have a draft for some documentation... I can contribute it later this > > morning (now I don't have time). > > In the meantime, I have updated the following pages accordingly: The documentation looks awesome.
My only suggestion would be to change the navigation link that currently points to CVS to point to a generic page like SourceCode. Then that landing page could link to the current CVS and explain we are working to transition to Git, with links to those pages. Currently, the Git docs are a bit buried from the front page. Peter, I don't appear to have wiki permissions to edit the navigation bar; do you? Peter: > I'm thinking a news post on > http://news.open-bio.org/news/category/obf-projects/biopython/ about > version control would be a good idea at this point. How about this - This is great, and I would move the last paragraph describing the Git repository to the beginning; start with what we are doing and then describe the rationale. This should help for those with ADD, and also give more prominent credit to Bartek, Giovanni and you for the work that went into this. > > - Evaluate the success of Git. This is easy to measure in terms of > > new contributors, increased happiness, and what not. At the same > > time we can monitor how GitHub evolves over time. > > It may not be that easy to measure in practice... How about these two metrics: - How do current developers like it? Beyond the initial learning curve, does it work at least as well as CVS for day to day stuff? - Does it lower the entry barriers to contributing to Biopython? The main reason to do this is to ease the initial work for coders who feel CVS/Patches/Bugzilla is too much. If we find new contributors through this, it's a win. Modest expectations are good. If either of these fail miserably, then we can re-evaluate.
Brad From chapmanb at 50mail.com Mon Mar 16 18:55:58 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Mar 2009 18:55:58 -0400 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> Message-ID: <20090316225558.GC57054@sobchak.mgh.harvard.edu> Peter; > I think we should probably do another release soon Good call. +1 from me. > I'd like to include the following changes as part of the beta, but it > would be sensible to have someone else try these out first. Any > volunteers? > > Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files The code for this looked good when I reviewed it earlier. I will test it out with some solexa reads from here this week; any reason not to check the patch and files into CVS? Then I can fire up my coal-powered revision control system, feed two punch cards into the mouth of the machine, hope the vacuum tubes don't burn out again, and check it out locally. 
Brad From tiagoantao at gmail.com Mon Mar 16 20:11:50 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 17 Mar 2009 00:11:50 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> I've been reading this thread and mainly staying silent but there is one question that is not clear in my mind but I believe it is important: How is the "official" biopython trunk controlled? Currently what is on CVS is the gospel and Peter and Michiel essencially have control of what is there and what is labelled as a "biopython distribution". How will this work now? The second question, related to the first is how will different branches (of different persons) be managed? I am seeing people starting working on the same code in different directions and then having problems merging everything together. Maybe these questions stem from my ignorance of distributed version control. But, if not, I think they should be resolved before advancing. My suggestion: write (or at least informally agree) the policy before advancing. While distributed version control seems a good idea (no opposition), it also seems a good way to create new problems. 
BTW, I would be tempted to suggest that a labelled release would be a good starting point for a distributed revision control bootstrap. On Mon, Mar 16, 2009 at 10:42 PM, Brad Chapman wrote: > Hey everyone; > Wow, y'all are quick. Bartek, Giovanni and Peter -- thanks for all > the hard work and organization. Consolidating a couple of threads > below... > >> >> I've written a short guide on the wiki : >> >> http://biopython.org/wiki/GitMigration >> > >> > I also have a draft for some documentation... I can contribute it later this >> > morning (now I don't have time). >> >> In the meantime, I have updated the following pages accordingly: > > The documentation looks awesome. My only suggestion would be to > change the navigation link that current points to CVS to point to a > generic page like SourceCode. Then that landing page could link > to the current CVS and explain we are working to transition to > Git, with links to those pages. Currently, the Git docs are a > bit buried from the front page. > > Peter, I don't appear to have wiki permissions to edit the navigation > bar; do you? > > Peter: >> I'm thinking a news post on >> http://news.open-bio.org/news/category/obf-projects/biopython/ about >> version control would be a good idea at this point. ?How about this - > > This is great, and I would move the last paragraph describing > the Git repository to the beginning; start with what we are doing and > then describe the rationale. This should help for those with ADD, and > also give more prominent credit to Bartek, Giovanni and you for the > work that went into this. > >> > - Evaluate the success of Git. This is easy to measure in terms of >> > ?new contributors, increased happiness, and what not. At the same >> > ?time we can monitor how GitHub evolves over time. >> >> It may not be that easy to measure in practice... > > How about these two metrics: > > - How do current developers like it? 
Beyond the initial learning > ?curve, does it work at least as good as CVS for day to day stuff? > > - Does it lower the entry barriers to contributing to Biopython? The > ?main reason to do this is to ease the initial work for coders who > ?feel CVS/Patches/Bugzilla is too much. If we find new contributors > ?through this, it's a win. > > Modest expectations are good. If either of these fail miserably, then > we can re-evaluate. > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- "A man who dares to waste one hour of time has not discovered the value of life" - Charles Darwin From dschruth at u.washington.edu Mon Mar 16 19:15:39 2009 From: dschruth at u.washington.edu (David Schruth) Date: Mon, 16 Mar 2009 16:15:39 -0700 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <20090316225558.GC57054@sobchak.mgh.harvard.edu> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> <20090316225558.GC57054@sobchak.mgh.harvard.edu> Message-ID: <49BEDD9B.6030905@u.washington.edu> I've got some 454 and Solid data you could test it on too. Has anybody else looked into how these other two Next Gen formats might complicate things? Brad Chapman wrote: > Peter; > > >> I think we should probably do another release soon >> > > Good call. +1 from me. > > >> I'd like to include the following changes as part of the beta, but it >> would be sensible to have someone else try these out first. Any >> volunteers? >> >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files >> > > The code for this looked good when I reviewed it earlier. I will > test it out with some solexa reads from here this week; any reason > not to check the patch and files into CVS? 
Then I can fire up my > coal-powered revision control system, feed two punch cards into the > mouth of the machine, hope the vacuum tubes don't burn out again, > and check it out locally. > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -------------- next part -------------- A non-text attachment was scrubbed... Name: dschruth.vcf Type: text/x-vcard Size: 450 bytes Desc: not available URL: From bugzilla-daemon at portal.open-bio.org Mon Mar 16 20:40:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Mar 2009 20:40:01 -0400 Subject: [Biopython-dev] [Bug 2790] New: Genepop parser creates a full representation of the file on memory Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2790 Summary: Genepop parser creates a full representation of the file on memory Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: PopGen AssignedTo: biopython-dev at biopython.org ReportedBy: tiagoantao at gmail.com The genepop parser creates a full representation of the file on memory. This is fine for most users (like with 100/200 individuals and 100 markers) but, more and more people appear now with thousands of individuals and/or thousands of loci. In some cases the whole file doesn't fit memory. An alternative (iterator based) interface has to be created which only maintains a subset of the file in memory -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
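A note on the iterator idea in Bug 2790: the following is only a minimal sketch of what an iterator-based reader might look like, not the existing Bio.PopGen code. The function names `read_genepop_header` and `iter_individuals` are hypothetical, and the sketch assumes a simplified Genepop layout: a comment line, locus names (one per line or comma-separated), `Pop` separator lines, and one individual per line with the name before the first comma.

```python
import io


def read_genepop_header(handle):
    """Read the comment line and locus names, stopping at the first 'Pop' line.

    Hypothetical helper for illustration; assumes the simplified layout
    described above.
    """
    comment = handle.readline().rstrip("\n")
    loci = []
    for line in iter(handle.readline, ""):
        stripped = line.strip()
        if stripped.lower() == "pop":
            break
        # Locus names may be one per line or comma-separated on one line.
        loci.extend(name.strip() for name in stripped.split(",") if name.strip())
    return comment, loci


def iter_individuals(handle):
    """Yield (population_index, name, genotypes) one individual at a time.

    Only the current line is held in memory, so the file size does not
    matter. Assumes read_genepop_header() already consumed the first 'Pop'.
    """
    pop_index = 0
    for line in iter(handle.readline, ""):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.lower() == "pop":
            pop_index += 1
            continue
        # Assumption: the individual name is everything before the first comma.
        name, _, genotype_text = stripped.partition(",")
        yield pop_index, name.strip(), genotype_text.split()


sample = io.StringIO(
    "Example data set\n"
    "Loc1\n"
    "Loc2\n"
    "Pop\n"
    "ind1 , 0101 0202\n"
    "ind2 , 0102 0201\n"
    "Pop\n"
    "ind3 , 0202 0101\n"
)
comment, loci = read_genepop_header(sample)
individuals = list(iter_individuals(sample))
```

In real use the caller would loop over `iter_individuals(handle)` directly rather than building a list, which is what keeps memory use flat for files with thousands of individuals or loci.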
From idoerg at gmail.com Mon Mar 16 20:49:39 2009 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 16 Mar 2009 17:49:39 -0700 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <49BEDD9B.6030905@u.washington.edu> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> <20090316225558.GC57054@sobchak.mgh.harvard.edu> <49BEDD9B.6030905@u.washington.edu> Message-ID: <1237250979.20135.5.camel@lafa> I have. For one thing, GenBank has some new files that break the current parser. LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007 This is a typical header for an environmental sequence (notice the ENV). Note that this does not necessarily have to be a next-gen sequence. It can also be Sanger. The point is, it's not genome associated, but obtained using metagenomic methods. To our business: the "rc" breaks the parser. The file itself is attached. Note that in the end it does not have a sequence, but rather a WGS field that points to sequence files. I'll actually be happy to take this one. ./I On Mon, 2009-03-16 at 16:15 -0700, David Schruth wrote: > I've got some 454 and Solid data you could test it on too. > > Has anybody else looked into how these other two Next Gen formats might > complicate things? > > Brad Chapman wrote: > > Peter; > > > > > >> I think we should probably do another release soon > >> > > > > Good call. +1 from me. > > > > > >> I'd like to include the following changes as part of the beta, but it > >> would be sensible to have someone else try these out first. Any > >> volunteers? > >> > >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files > >> > > > > The code for this looked good when I reviewed it earlier. I will > > test it out with some solexa reads from here this week; any reason > > not to check the patch and files into CVS? 
Then I can fire up my > > coal-powered revision control system, feed two punch cards into the > > mouth of the machine, hope the vacuum tubes don't burn out again, > > and check it out locally. > > > > Brad > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Iddo Friedberg, Ph.D. CALIT2 Atkinson Hall MC #0446 University of California San Diego 9500 Gilman Drive La Jolla, CA 92093-0446 USA +1 (858) 534-0570 http://iddo-friedberg.org -------------- next part -------------- LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007 DEFINITION Termite gut metagenome, whole genome shotgun sequencing project. ACCESSION ABDH00000000 VERSION ABDH00000000.1 GI:161074815 PROJECT GenomeProject:19107 DBLINK Project:19107 KEYWORDS WGS. SOURCE termite gut metagenome ORGANISM termite gut metagenome unclassified sequences; metagenomes; organismal metagenomes. REFERENCE 1 (bases 1 to 55108) AUTHORS Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M., Richardson,T.H., Stege,J.T., Cayouette,M., McHardy,A.C., Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Martin,H.G., Kunin,V., Dalevi,D., Madejska,J., Kirton,E., Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N.C., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.D., Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and Leadbetter,J.R. 
TITLE Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite JOURNAL Nature 450 (7169), 560-565 (2007) PUBMED 18033299 REFERENCE 2 (bases 1 to 55108) AUTHORS Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M., Richardson,T.H., Stege,J.T., Cayouette,M., Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Garcia Martin,H., Kunin,V., McHardy,A.C., Dalevi,D., Madejska,J., Kirton,E., Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B., Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and Leadbetter,J.R. TITLE Direct Submission JOURNAL Submitted (27-JUN-2007) Microbial Ecology Program, US DOE Joint Genome Institute, 2800 Mitchell Drive B100, Walnut Creek, CA 94598-1698, USA COMMENT The termite gut metagenome whole genome shotgun (WGS) project has the project accession ABDH00000000. This version of the project (01) has the accession number ABDH01000000, and consists of sequences ABDH01000001-ABDH01055108. URL -- http://www.jgi.doe.gov JGI Project ID:4001605 Contact: Philip Hugenholtz (PHugenholtz at lbl.gov) sampling site latitude: N10.11.260; sampling site longitude: W083.51.345; sampling site altitude: 310 m AMSL; sample type: lumen content; host species: Nasutitermes sp.; anatomic site: gut, proctodeal segment 3, lumen; association type: symbiosis; sample treatment and preservation: termites were collected, transported to laboratory alive within 36 hours, P3 gut lumen fluid was extracted and stored frozen in buffered saline solution until DNA extraction. The JGI and collaborators endorse the principles for the distribution and use of large scale sequencing data adopted by the larger genome sequencing community and urge users of this data to follow them. 
It is our intention to publish the work of this project in a timely fashion and we welcome collaborative interaction on the project and analysis. (http://www.genome.gov/page.cfm?pageID=10506376). FEATURES Location/Qualifiers source 1..55108 /organism="termite gut metagenome" /mol_type="genomic DNA" /isolation_source="Nasutitermes sp. proctodeal segment 3 gut lumen" /db_xref="taxon:433724" /environmental_sample /country="Costa Rica" /lat_lon="10.1877 N 83.8558 W" /note="metagenomic" WGS ABDH01000001-ABDH01055108 // From chris.lasher at gmail.com Mon Mar 16 23:45:33 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Mon, 16 Mar 2009 23:45:33 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> Message-ID: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> 2009/3/16 Tiago Ant?o > I've been reading this thread and mainly staying silent but there is > one question that is not clear in my mind but I believe it is > important: > > How is the "official" biopython trunk controlled? Currently what is on > CVS is the gospel and Peter and Michiel essencially have control of > what is there and what is labelled as a "biopython distribution". How > will this work now? In a distributed workflow, there is no technical official repository. The "official repository" is socially enforced. 
Technically, there is no official repository of the Linux kernel anymore. However, there is an "official" version, which is Linus Torvalds's repository. It is socially enforced. I think Michiel and Peter still head the Biopython project--at least they have the most clout, I would say. Therefore, we will probably look to one of their branches as the "official" branch of Biopython. When one of them wants to step down from that duty, we will socially pass the torch on to the next taker. See "6.3 Using gatekeepers" at http://doc.bazaar-vcs.org/latest/en/user-guide/index.html#team-collaboration-distributed-style See also http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/ > The second question, related to the first is how will different > branches (of different persons) be managed? I am seeing people > starting working on the same code in different directions and then > having problems merging everything together. People are supposed to work in different directions; this is the point of distributed workflows. Merging tends not to be so difficult, and compared to centralized models like CVS and SVN, it's a cinch. We will help provide documentation for proper merging habits (e.g., merge early, merge often, and no rebasing after pushing). There are also screencasts popping up (in particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that we will link to for educational purposes. And of course, other developers will be around to help out in tricky merges. 
Chris From bugzilla-daemon at portal.open-bio.org Tue Mar 17 00:11:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Mar 2009 00:11:34 -0400 Subject: [Biopython-dev] [Bug 2791] New: GenBank Scanner does not parse environmental (ENV) files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2791 Summary: GenBank Scanner does not parse environmental (ENV) files Product: Biopython Version: 1.49 Platform: All OS/Version: All Status: NEW Severity: major Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: idoerg at gmail.com CC: idoerg at gmail.com GenBank Scanner does not parse environmental (ENV) files. Breaks on the 'rc' characters in the LOCUS lines. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Mar 17 00:14:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Mar 2009 00:14:50 -0400 Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse environmental (ENV) files In-Reply-To: Message-ID: <200903170414.n2H4Eoit008338@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2791 idoerg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
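To make the failure in Bug 2791 concrete, here is a small sketch of a whitespace-tokenising LOCUS reader that tolerates "rc" in the position where "bp" or "aa" usually appears. This is not the Bio.GenBank.Scanner code; `parse_locus_line` and `KNOWN_UNITS` are hypothetical names, and the field layout is assumed only from the example LOCUS line quoted in this thread.

```python
# Unit tokens accepted after the length field. "rc" is the token seen in
# the environmental (ENV) record reported in this thread; the set itself
# is an assumption made for this sketch.
KNOWN_UNITS = {"bp", "aa", "rc"}


def parse_locus_line(line):
    """Split a GenBank LOCUS line into name, size, unit and remaining tokens.

    Hypothetical helper for illustration; relies on whitespace-separated
    fields rather than fixed column positions.
    """
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)
    name, size, unit = tokens[1], tokens[2], tokens[3]
    if unit not in KNOWN_UNITS:
        raise ValueError("Unexpected unit %r in LOCUS line" % unit)
    return {
        "name": name,
        "size": int(size),
        "unit": unit,
        # Molecule type, topology, division and date, left uninterpreted.
        "rest": tokens[4:],
    }


locus = parse_locus_line(
    "LOCUS       ABDH01000000   55108 rc    DNA     linear   ENV 26-NOV-2007"
)
```

Whether the real scanner should accept further unit tokens, or treat such records specially because they carry a WGS line instead of sequence, is a separate question this sketch does not try to answer.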
From bugzilla-daemon at portal.open-bio.org Tue Mar 17 00:32:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Mar 2009 00:32:30 -0400 Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse environmental (ENV) files In-Reply-To: Message-ID: <200903170432.n2H4WUQn009490@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2791 idoerg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|biopython-dev at biopython.org |idoerg at gmail.com Status|ASSIGNED |NEW -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Mar 17 04:46:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 08:46:03 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> Message-ID: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher wrote: > 2009/3/16 Tiago Ant?o > >> I've been reading this thread and mainly staying silent but there is >> one question that is not clear in 
my mind but I believe it is >> important: >> >> How is the "official" biopython trunk controlled? Currently what is on >> CVS is the gospel and Peter and Michiel essencially have control of >> what is there and what is labelled as a "biopython distribution". How >> will this work now? > > In a distributed workflow, there is no technical official repository. The > "official repository" is socially enforced. Technically, there is no > official repository of the Linux kernel anymore. However, there is an > "official" version, which is Linus Torvald's repository. It is socially > enforced. I think Michiel and Peter still head the Biopython project--at > least they have the most clout, I would say. Therefore, we will probably > look to one of their branches as the "official" branch of Biopython. When > one of them wants to step down in duty, we will socially pass the torch on > to the next taker. I think it is essential we have a clearly labeled official trunk (perhaps with branches for releases), which will be used for all the official releases (tar balls, zip files and windows installers). Our main webpage should make this very clear. We could potentially continue to have a shared official branch (e.g. belonging to the generic github biopython user), and give all the existing CVS contributors write access - and continue to manage this as before. So for example, if Frank wanted to check in some minor changes to Bio.Nexus he could just do it. Future contributors patches/branches might get taken up by a developer on a personal branch for testing, before being merged into the official branch. i.e. We can initially continue as before - right now I don't have a feel for how much work the role of an official branch maintainer would be, and it is difficult to guess without more hands on experience using the new tools. >> The second question, related to the first is how will different >> branches (of different persons) be managed? 
I am seeing people >> starting working on the same code in different directions and then >> having problems merging everything together. > > People are supposed to work in different directions; this is the point of > distributed workflows. Merging tends not to be so difficult, and compared to > centralized models like CVS and SVN, it's a cinch. We will help provide > documentation for proper merging habits (e.g., merge early, merge often, and > no rebasing after pushing, etc.). There are also screencasts popping up (in > particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that > we will link to for educational purposes. And of course, other developers > will be around to help out in tricky merges. Well, yes, in theory we have the same problem now with CVS - and while the tools may make merging easier, some communication is essential when working on the key modules which impact large parts of the code base. Peter From biopython at maubp.freeserve.co.uk Tue Mar 17 04:58:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 08:58:00 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903170158o757a4fc4naae80f83850d6093@mail.gmail.com> > > The documentation looks awesome. 
My only suggestion would be to > change the navigation link that current points to CVS to point to a > generic page like SourceCode. Then that landing page could link > to the current CVS and explain we are working to transition to > Git, with links to those pages. Currently, the Git docs are a > bit buried from the front page. > > Peter, I don't appear to have wiki permissions to edit the navigation > bar; do you? I'm not sure how to do it (although I probably have the relevant permissions). I can probably give you admin rights - you use the "Chapmanb" username on the wiki, right? > Peter: >> I'm thinking a news post on >> http://news.open-bio.org/news/category/obf-projects/biopython/ about >> version control would be a good idea at this point. How about this - > > This is great, and I would move the last paragraph describing > the Git repository to the beginning; start with what we are doing and > then describe the rationale. This should help for those with ADD, and > also give more prominent credit to Bartek, Giovanni and you for the > work that went into this. OK. New version, with the markup for the links included: Initially for evaluation purposes only, Giovanni and Bartek have set up a mirror of Biopython on GitHub, which is automatically updated from the OBF-hosted Biopython CVS repository. See our git migration wiki page for details. If this is favorably received, then moving Biopython from CVS to git seems likely at some point this year. Originally, all the OBF-hosted projects used CVS for their source code repositories. At the start of 2008, BioPerl and BioJava moved over to Subversion (SVN), followed by BioSQL. Biopython was originally going to do the same, but this didn't actually happen. Having all the Bio* projects using the same version control system would have simplified server administration for the OBF, but using SVN wouldn't really have made that much difference to Biopython development. 
Discussion on the Biopython development mailing list has since shifted towards next-generation distributed version control systems like git or Bazaar. Quote from Linus Torvalds,
The slogan of Subversion for a while was "CVS done right", or something like that, and if you start with that kind of slogan, there's nowhere you can go. There is no way to do CVS right.
In addition to creating the Linux kernel, Linus Torvalds more recently wrote git, a prominent example of a distributed version control system. Rather than switching from CVS to SVN, the BioRuby project chose instead to use git, hosted on github (see the BioRuby repository). Biopython is considering doing something similar - using a distributed version control system like git should make it easier for potential Biopython contributors to manage their own local copies of Biopython under version control. Peter, on behalf of the Biopython developers From biopython at maubp.freeserve.co.uk Tue Mar 17 05:06:31 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 09:06:31 +0000 Subject: [Biopython-dev] history on github - where are the tags? Message-ID: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> Hi Bartek et al, I've just been looking over the github mirror of CVS, and wanted to see how it presented the history of individual files. For example, this page looks at the Bio/SeqRecord.py history using ViewCVS: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython For comparison, in GitHub, http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py As you can see, all the comments and changes are there - which is great. But I can't see the CVS tag information, which I assume would be converted into git tags. Is this information present in the git repository, but not shown by github, or was it lost during the migration? This might seem like a little thing, but I have found it incredibly important for tracing bugs reported in older releases, for example in narrowing down when something changed. 
Peter From biopython at maubp.freeserve.co.uk Tue Mar 17 05:41:22 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 09:41:22 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com> Message-ID: <320fb6e00903170241i5b4a122ax1f33ff18450771df@mail.gmail.com> On Mon, Mar 16, 2009 at 9:04 PM, Bartek Wilczynski wrote: > Hi, > On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio > wrote: >> >> At the moment I am having some strange problems, relative to the fact that I >> had a branch previously named as 'biopython' in my account, so it seems >> don't understand well the fact that the old branch has been renamed. >> For example, I don't have the 'Fork' button.... but it must be a temporary >> problem, I already contacted the github's tech support. > > This is connected with the change I made in the repository. Namely I > renamed the branch created by Giovanni to biopuython-old and created > a new one (the "official" one) called biopython again. > > The "rename" feature was flagged as experimental, and I don't think we > would expect to use it anymore, and there were warnings that it can affect > the branches forked from the branched previously created by Giovanni. We may need to do another rename, if we have to repeat the CVS to git migration. 
For example, see my other email about the CVS tags (missing?). Another potential question is can you re-map the CVS usernames as part of the migration? e.g. Can you somehow replace CVS users "bartek", "peterc", ... with guthub users "barwil", "peterjc", ...? Not essential, but it would be nice. I would suggest as a precaution we rename it sooner rather than later (while only a few people will be inconvenienced), going from biopython to biopython-cvs-mirror (or similar). If this does end up being the actual trunk branch, we can just fork it under a new branch name like "biopython" or "biopython-official" etc. Peter From lpritc at scri.ac.uk Tue Mar 17 05:59:32 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Tue, 17 Mar 2009 09:59:32 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> Message-ID: Hi all, This has been an occasionally frustrating thread to read... On 17/03/2009 08:46, "Peter" wrote: > On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher wrote: >> 2009/3/16 Tiago Ant?o >> >>> How is the "official" biopython trunk controlled? Currently what is on >>> CVS is the gospel and Peter and Michiel essencially have control of >>> what is there and what is labelled as a "biopython distribution". How >>> will this work now? >> In a distributed workflow, there is no technical official repository. The >> "official repository" is socially enforced. That was true before. Unless I misread the Biopython licencing, there was no real barrier to putting a branched copy of the code on your own server/site, with your own modifications. What git does is provide tools to make merging of that sort of code easier (along with a number of of other nice features, such as authentication of contributions). The presence of git does not ensure that your changes, or anyone else's, will be merged with any other repository, and nor does it ensure the quality of contributed code. 
Git, while nice, and ideal for a number of tasks, is no magic bullet. To an extent, the 'official' repository is, pragmatically, the one that is most stable and well-tested. If my hypothetical branched version had become more stable and widely-used than the 'official' trunk, and become the most frequently downloaded and implemented, and received new contributions in its own right, it might then be considered de facto 'the distribution'; nasty online spats with the original authors notwithstanding. The 'social enforcement' of politeness (i.e. *I* don't take credit for *your* work) prevents this to an extent, as it ought to under any versioning system. There's a competing tendency to consider that the coders who spent the most time creating the code understand it the best, and are in the best position to maintain it directly. This is true to a large degree, and entirely applicable to Biopython's contributed modules. git can potentially facilitate that sort of contribution to the 'official' trunk in a way that CVS can't, due to its permissions bottleneck. However, the mechanics of incorporating that contributed code are more or less the same: the people with control of the 'official' trunk review the code and decide whether to include it. This is true whether the code is submitted as a patch to Bugzilla, emailed to a developer, put up on public CVS on your site, or in a forked git repository. The same is true of your own git repository - you don't have to include someone else's forked code if you don't want to. What possibly needs to change is not the version control system, but the way in which people think about their contribution. Contributions can be made productively under any versioning system, and the key questions remain the same in all cases: Does the new code work (are there tests)? Does the new code break any old code? Is there documentation? Is the API consistent? "What version control system are we using?" 
is a minor detail, unless it is inherently broken, hinders any of the above, or causes some other deal-breaking issue (for Linus Torvalds, this included speed issues for merges). >> I think Michiel and Peter still head the Biopython project--at >> least they have the most clout, I would say. Therefore, we will probably >> look to one of their branches as the "official" branch of Biopython. When >> one of them wants to step down in duty, we will socially pass the torch on >> to the next taker. It has always been thus. Now, instead of passing on the user authentication to the CVS server at OBF, the user authentication to the biopython github account will be passed on, instead: > I think it is essential we have a clearly labeled official trunk > (perhaps with branches for releases), which will be used for all the > official releases (tar balls, zip files and windows installers). Our > main webpage should make this very clear. > > We could potentially continue to have a shared official branch (e.g. > belonging to the generic github biopython user), and give all the > existing CVS contributors write access - and continue to manage this > as before. So for example, if Frank wanted to check in some minor > changes to Bio.Nexus he could just do it. Future contributors > patches/branches might get taken up by a developer on a personal > branch for testing, before being merged into the official branch. > > i.e. We can initially continue as before - right now I don't have a > feel for how much work the role of an official branch maintainer would > be, and it is difficult to guess without more hands on experience > using the new tools. Plus ca change (avec git)... >>> The second question, related to the first is how will different >>> branches (of different persons) be managed? I am seeing people >>> starting working on the same code in different directions and then >>> having problems merging everything together. 
>> >> People are supposed to work in different directions; this is the point of >> distributed workflows. I may have a different understanding of 'different directions' than you mean, but I don't think that it's good for a community project if people work in different directions. I also don't think that that is the point of distributed workflows; on the contrary, I think that they are intended to make it easier to work independently towards a common goal. Even if that is by working on loosely- or non-interacting parts of the whole. >> Merging tends not to be so difficult, and compared to >> centralized models like CVS and SVN, it's a cinch. We will help provide >> documentation for proper merging habits (e.g., merge early, merge often, and >> no rebasing after pushing, etc.). There are also screencasts popping up (in >> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that >> we will link to for educational purposes. >> And of course, other developers will be around to help out in tricky merges. This characterises one of the frustrating aspects of this thread (not getting at you personally, Chris) - the occasional implicit assumption that 'things will be inherently *better* if we use git'. Developers are around to help now, even using CVS (which also has clear, long-standing stable documentation - and even an O'Reilly book). Several people don't seem to think that that - and the way that code is reviewed and incorporated into the main distribution - is good enough, and I don't think that this will change just because the version control system has changed. Nor will changing revision control system generate significant free time to write, test and document code. But we may have the recession to do that last one for us. > Well, yes, in theory we have the same problem now with CVS - and while > the tools may make merging easier, some communication is essential > when working on the key modules which impact large parts of the code > base. 
I would put it more strongly than that: communication is essential in all aspects of the project. A number of related blog posts make statements along the lines of "I don't use Biopython, or post to the mailing lists, but I think that they're doing *this* wrong", or "I submitted code, but it didn't get taken up immediately". Now, venting and ranting on a blog is fine, but it's not really *communicating*, any more than it was when I thought that the BioSQL GenBank upload code was broken, fixed it (for my purposes) and told no-one. Git won't change the communication issue (in either direction) any more than it changes the code review process. FWIW, I think that git looks like a good way to go, and that it could help encourage people to make local modifications of Biopython for their own benefit and in their own interests and expert area, in a way that is visible to the core distribution (unlike the patch submission process that is now implemented). In that way it could facilitate more rapid expansion of the core distribution. However, the bottlenecks of ensuring code quality, testing and documentation will only ease if that is taken up by the individuals/groups making those contributions, in addition to the core developers. And yes, I know I'm late with the new GenomeDiagram docs... ;) L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. 
DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________

From bartek at rezolwenta.eu.org Tue Mar 17 06:06:33 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 17 Mar 2009 11:06:33 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com>
Message-ID: <8b34ec180903170306ocf4b9e7s6d34cacdfb7e423b@mail.gmail.com>

Hi,

I'll look into this. I'm now heading for a plane, so I can't do it now.

cheers
Bartek

On Tue, Mar 17, 2009 at 11:02 AM, Bartek Wilczynski wrote:
> Hi,
>
> I'll look into this. I'm now heading for a plane, so I can't do it now.
>
> cheers
> Bartek
>
> On Tue, Mar 17, 2009 at 10:06 AM, Peter wrote:
>> Hi Bartek et al,
>>
>> I've just been looking over the github mirror of CVS, and wanted to
>> see it presented the history of individual files.
>> For example, this
>> page looks at the Bio/SeqRecord.py history using ViewCVS:
>> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython
>>
>> For comparison, in GitHub,
>> http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py
>>
>> As you can see, all the comments and changes are there - which is
>> great. But I can't see the CVS tag information, which I assume would
>> be converted into git tags. Is this information present in the git
>> repository, but not shown by github, or was it lost during the
>> migration? This might seem like a little thing, but I have found it
>> incredibly important for tracing bugs reported in older releases, for
>> example in narrowing down when something changed.
>>
>> Peter
>>
>
>
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
>

--
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From biopython at maubp.freeserve.co.uk Tue Mar 17 06:17:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:17:25 +0000
Subject: [Biopython-dev] gitignore file for github
Message-ID: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>

Hi all,

I think we should add a .gitignore file to the github mirror copy repository, which should ignore:

* the build subdirectory and all its contents
* all *.pyc files (recursively, e.g. for the unit tests)
* all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)

Is there anything else this should include? There are a few output files created by the unit tests that we might want to include...

Otherwise all these files show up as "unstaged" to use git's terminology, and there is a risk of someone accidentally committing them.
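
As a rough illustration of the three ignore rules listed above, here is a minimal sketch of how one might preview which files such patterns would catch. Everything specific in it is an assumption made for illustration: the pattern spellings are only one guess at how the bullets could be written, the file paths are made up, and Python's fnmatch only approximates git's own matching rules.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, one guess at how the three bullets might be spelled.
# Real .gitignore matching has extra rules (anchoring, negation, directories)
# that fnmatch does not reproduce.
IGNORE_PATTERNS = ["build/*", "*.pyc", "Doc/*.aux", "Doc/*.log"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if the relative path matches at least one pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Made-up paths standing in for a working copy after a build and a test run.
candidates = [
    "build/lib/Bio/Seq.py",
    "Tests/test_SeqIO.pyc",
    "Doc/Tutorial.aux",
    "Bio/SeqRecord.py",
]

ignored = [path for path in candidates if is_ignored(path)]
for path in ignored:
    print("would be ignored:", path)
```

Note that in fnmatch a `*` also matches `/`, which is why the simple patterns here reach into subdirectories; git treats slashes in patterns differently, so a real .gitignore should be checked with git itself.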
Peter

From biopython at maubp.freeserve.co.uk Tue Mar 17 06:57:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:57:37 +0000
Subject: [Biopython-dev] gitignore file for github
In-Reply-To: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
References: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
Message-ID: <320fb6e00903170357s14a20each59f50f5e155298b0@mail.gmail.com>

On Tue, Mar 17, 2009 at 10:17 AM, Peter wrote:
> Hi all,
>
> I think we should add a .gitignore file to the github mirror copy
> repository, which should ignore:
>
> * the build subdirectory and all its contents
> * all *.pyc files (recursively, e.g. for the unit tests)
> * all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)
>
> Is there anything else this should include? There are a few output
> files created by the unit tests that we might want to include...

This seems to work pretty well:

#Ignore the build directory (and its sub-directories):
build

#Ignore backup files from some Unix editors,
*~

#Ignore all compiled python files (e.g. from running the unit tests):
*.pyc

#The graphics unit tests produce output files for human inspection
#(at the time of writing, only PDF files are created but I expect
#this to change).
Tests/Graphics/*.pdf
Tests/Graphics/*.eps
Tests/Graphics/*.svg
Tests/Graphics/*.png

I've uploaded this as part of one of my test branches on github,
http://github.com/peterjc/biopython-seqio-quality/tree/master

Peter

From bugzilla-daemon at portal.open-bio.org Tue Mar 17 06:59:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 06:59:22 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files
In-Reply-To:
Message-ID: <200903171059.n2HAxMms006144@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767

------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-17 06:59 EST -------
I've made these changes available on a test github branch,
http://github.com/peterjc/biopython-seqio-quality/tree/master

This doesn't include all the example files for the unit tests yet.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From tiagoantao at gmail.com Tue Mar 17 07:18:52 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 11:18:52 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
Message-ID: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>

On Tue, Mar 17, 2009 at 8:46 AM, Peter wrote:
> I think it is essential we have a clearly labeled official trunk
> (perhaps with branches for releases), which will be used for all the
> official releases (tar balls, zip files and windows installers). Our
> main webpage should make this very clear.

I agree.

I would like to take this opportunity just to make my opinion clear (I normally tend to list hypotheses and refrain from giving my own opinions).

1. I don't think there is a pressing need to go from CVS to whatever. While CVS is not perfect I don't think it has been a big hurdle. But if people want to go in that direction, I have no strong feelings against it either.

2. The hurdle was that _policy_ was too conservative: Some time ago it was not acceptable even to consider a development branch. That stifles things (although it ensures stability, which is good). Fortunately things are more negotiable now. The point is: the main issues are policy, not technology.

3.
Like it or not, different mechanisms (i.e. centralized versus distributed VCSs) facilitate different policies. Distributed version control facilitates branching to a massive degree.

4. I think a middle ground is a good idea: While there is an official distribution (e.g. the one that is labelled biopython 1.50 and that will end up on most users' computers) which is aggressively controlled, there should be space for people to try out new things.

5. People that try out new things should be aware (to avoid disappointment) that their new code might not be accepted on the official trunk, for many reasons: not enough documentation, no test cases, design not acceptable, poorly-commented code, whatever. It would be very sad if people started working on something and spent lots of time on their branch, just to see their code refused for the "official" trunk.

So, in my view things work like this:

A. The "official" version on biopython.org is controlled by a "head honcho", currently Peter with input from biopython-dev. This is the version that most users will ever see in practice.

B. The official version has a lot of quality enforcement on top.

C. People should be free to branch away and try new things.

D. People that branch away should be aware that their stuff might not be accepted on the official distribution. If they want it accepted they should come to biopython-dev and have a cup of tea with the community.

E. Maybe some contact points should be defined for modules?

F. People who want their code included in the "official" distribution should seriously think about branching from the "official" branch and not from any other.

I would really like to see an "official" git branch, which should be created, in my opinion, from a stable release and either by Peter or Michiel (or any other long term CVS-write user).

In my case I would branch to maintain some of the PopGen code.
Tiago

From lpritc at scri.ac.uk Tue Mar 17 08:19:28 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 12:19:28 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID:

On 17/03/2009 11:18, "Tiago Antão" wrote:

> On Tue, Mar 17, 2009 at 8:46 AM, Peter
> wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers). Our
>> main webpage should make this very clear.
>
> I agree.
>
> I would like to take this opportunity just to make my opinion clear (I
> normally tend to list hypotheses and refrain from giving my own opinions).

[...]

+1 for Tiago's opinion.

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From biopython at maubp.freeserve.co.uk Tue Mar 17 08:44:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 12:44:05 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>

2009/3/17 Tiago Antão :
> On Tue, Mar 17, 2009 at 8:46 AM, Peter wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers). Our
>> main webpage should make this very clear.
>
> I agree.
>
> I would like to take this opportunity just to make my opinion clear (I
> normally tend to list hypotheses and refrain from giving my own opinions).
>
> 1. I don't think there is a pressing need to go from CVS to whatever.
> While CVS is not perfect I don't think it has been a big hurdle.
> But
> if people want to go in that direction, I have no strong feelings
> against it either.

On a purely pragmatic level, yes, CVS has been enough. This is one real reason why there hasn't been a great deal of pressure on us to move - it wasn't "broken" for how Biopython worked, although it does make branching non-trivial. Moving from CVS to a distributed version control system (DVCS) won't make much difference for those of us with CVS access - the big benefit as I see it is for potential contributors who can easily make a branch to try out their ideas, and keep it in sync with the master branch. This could transform how new modules or bug fixes get contributed, hopefully for the better.

> 2. The hurdle was that _policy_ was too conservative: Some time ago it
> was not acceptable even to consider a development branch. That
> stifles things (although it ensures stability, which is good).
> Fortunately things are more negotiable now. The point is: the main
> issues are policy, not technology.

Historically Biopython has worked from a single stable branch (Brad - can you comment about the history of this effective policy?). I recall saying something in the last year or so about not wanting to do any branching in CVS while the SVN migration seemed imminent, but this was primarily to avoid any complication in the migration itself, rather than any deep objection to branches themselves.

> 3. Like it or not, different mechanisms (i.e. centralized versus
> distributed VCSs) facilitate different policies. Distributed version
> control facilitates branching to a massive degree.

True.

> 4. I think a middle ground is a good idea: While there is an official
> distribution (e.g. the one that is labelled biopython 1.50 and that
> will end up on most users' computers) which is aggressively controlled,
> there should be space for people to try out new things.

I'm not quite sure what you mean by aggressively controlled.
Moving to a DVCS really should make public experimental branches much easier.

> 5. People that try out new things should be aware (to avoid
> disappointment) that their new code might not be accepted on the
> official trunk, for many reasons: not enough documentation, no test
> cases, design not acceptable, poorly-commented code, whatever. It
> would be very sad if people started working on something and spent
> lots of time on their branch, just to see their code refused for
> the "official" trunk.

That is a risk - especially if anyone were to go off and work in complete isolation without even posting anything to this mailing list.

> So, in my view things work like this:
> A. The "official" version on biopython.org is controlled by a "head
> honcho", currently Peter with input from biopython-dev. This is the
> version that most users will ever see in practice.

That could work - although having anyone as a single bottleneck is a risk, assuming you get someone to agree to the role in the first place ;)

I am generally happy with the current arrangement where module owners have a degree of autonomy over their modules. I wouldn't want to have to approve every single minor change you (Tiago) make to Bio.PopGen - but I suppose occasional review and merging of code from Tiago's branch on request wouldn't be too onerous.

> B. The official version has a lot of quality enforcement on top.

What does that mean? e.g. a strict policy about unit tests before anything goes into the main branch?

> C. People should be free to branch away and try new things.

Given the Biopython license (as Leighton pointed out) this is already the case with CVS. It's just that using a DVCS should make this easier, especially for keeping branches in sync with the official branch, and hopefully for any merges back.

> D. People that branch away should be aware that their stuff might not
> be accepted on the official distribution.
> If they want it accepted
> they should come to biopython-dev and have a cup of tea with the
> community.

I agree. I like tea.

> E. Maybe some contact points should be defined for modules?

Do you mean something more explicit about documenting who currently maintains each module?

> F. People who want their code included in the "official" distribution
> should seriously think about branching from the "official" branch and
> not from any other.

I agree.

> I would really like to see an "official" git branch, which should be
> created, in my opinion, from a stable release and either by Peter or
> Michiel (or any other long term CVS-write user).

I think we'll have that - and in the short term the CVS mirror on github can be used.

> In my case I would branch to maintain some of the PopGen code.

Great.

Peter

From chapmanb at 50mail.com Tue Mar 17 08:49:30 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 17 Mar 2009 08:49:30 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To:
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
Message-ID: <20090317124930.GE57054@sobchak.mgh.harvard.edu>

Hi everyone;

Nice to see the discussion around trying out git. Leighton and Tiago, you both brought up some definite concerns in moving to a distributed version control system.

Git aims to help solve the problem of a them versus us community. When you read posts critical of Biopython, you will find a lot of complaints about "they didn't do this." This is confusing, as anyone using, coding with, interested in, or contributing to Biopython is a member of the community. CVS can help create this division, since it appears as a walled off repository only the core developers can access. Git frees up the source code and lowers this barrier to contributing.
Now instead of saying "why didn't the developers integrate the code I sent to the mailing list and write tests and documentation for it," we can all turn the question back on ourselves and ask why we didn't create a branch with our new contribution and do it, soliciting help from others in Biopython. With solving the problems come potential concerns. This coincidental blog post from yesterday intelligently covers a lot of the issues: http://www.pointy-stick.com/blog/2009/03/16/dark-side-distributed-version-control/ The one we should be most concerned about is fragmentation. The community of Python coders in bioinformatics is too small to be split up; surely we are better served by resolving any differences and producing one high quality reusable code base. Tiago's assessment of how things should work practically looks exactly right. Hard working core developers, like Peter and Michiel, will be maintaining the trunk which we roll releases off of. Contributors can either submit patches as now, or create short branches which get merged back in. The advantage of branches is that others can test and develop the branched code, and that the software should help deal with some of the pain of merging. There is a lot of good material in this thread for new potential developers. Tiago, it would make sense to condense what you've written and include it with the Contributing guide: http://biopython.org/wiki/Contributing We should also create a place on the wiki from the developer documentation: http://biopython.org/wiki/Documentation#Documentation_for_Developers that describes active development branches and their goals (called, say, ActiveBranches). Tiago, I thought you did a page for PopGen earlier like this but I can't find it right now. We should keep communication at a high level to avoid confusing fragmentation. This is a difficult change in terms of how things work; we are asking the right questions to create a good environment for improvement. 
Brad
> If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on
> this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.
>
> Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
> ______________________________________________________
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From tiagoantao at gmail.com Tue Mar 17 09:10:18 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 13:10:18 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
    <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
    <20090315185443.GA30296@kunkel>
    <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
    <20090316224240.GA57054@sobchak.mgh.harvard.edu>
    <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
    <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
    <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
    <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
    <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
Message-ID: <6d941f120903170610g161342f0ief365d68f25707c1@mail.gmail.com>

Hi,

> I'm not quite sure what you mean by aggressively controlled. Moving to
> a DVCS really should make public experimental branches much easier.

I mean that the official release is very controlled (a good thing!).
Development branches should be more free.

> That is a risk - especially if anyone were to go off and work in
> complete isolation without even posting anything to this mailing list.

I think our obligation is to inform people of the issue. If people then
go away and don't communicate, it becomes their problem. I think just a
couple of sentences on the Contributing page on the wiki would be more
than enough.

> That could work - although having anyone as a single bottleneck is a
> risk, assuming you get someone to agree to the role in the first place
> ;) I am generally happy with the current arrangement where module
> owners have a degree of autonomy over their modules. I wouldn't want
> to have to approve every single minor change you (Tiago) make to
> Bio.PopGen - but I suppose occasional review and merging of code from
> Tiago's branch on request wouldn't be too onerous.

I agree. I am just trying to make this an explicit policy, so that
everybody knows the rules of the game. If people don't agree, then that
should be discussed and changed. But the point is that these kinds of
management issues should be written down somewhere in a transparent way.

>> B. The official version has a lot of quality enforcement on top.
>
> What does that mean? e.g. a strict policy about unit tests before
> anything goes into the main branch?

I was reading http://biopython.org/wiki/Contributing and the main stuff
is already there (the "submitting code" section). But the point is: the
official version should be stable and reliable (as it is now, IMHO).

>> E. Maybe some contact points should be defined for modules?
>
> Do you mean something more explicit about documenting who currently
> maintains each module?

That is my point. Does that make sense?

From chapmanb at 50mail.com Tue Mar 17 09:04:53 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 17 Mar 2009 09:04:53 -0400 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <49BEDD9B.6030905@u.washington.edu> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> <20090316225558.GC57054@sobchak.mgh.harvard.edu> <49BEDD9B.6030905@u.washington.edu> Message-ID: <20090317130453.GF57054@sobchak.mgh.harvard.edu> Hi David; > I've got some 454 and Solid data you could test it on too. > > Has anybody else looked into how these other two Next Gen formats might > complicate things? Sweet. We definitely want to support output from them as well; it is great to have someone on board who is working with data from other machines. Peter did a pretty thorough investigation of the different formats and wrote it up in the docs to the proposed QualityIO module: http://github.com/peterjc/biopython-seqio-quality/blob/6fdf27393cb7318b229ff8587721e83544da968d/Bio/SeqIO/QualityIO.py Does this make sense with your experience? If you feel comfortable with git, Peter set up a new branch with his code for this: http://github.com/peterjc/biopython-seqio-quality/tree/master and we'd be more than happy to have you testing it. Alternatively, if you want to submit some smaller data files we can use in testing, you could attach them to the current enhancement request: http://bugzilla.open-bio.org/show_bug.cgi?id=2767 Thanks for the help, Brad > > Brad Chapman wrote: > > Peter; > > > > > >> I think we should probably do another release soon > >> > > > > Good call. +1 from me. > > > > > >> I'd like to include the following changes as part of the beta, but it > >> would be sensible to have someone else try these out first. Any > >> volunteers? > >> > >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files > >> > > > > The code for this looked good when I reviewed it earlier. 
I will > > test it out with some solexa reads from here this week; any reason > > not to check the patch and files into CVS? Then I can fire up my > > coal-powered revision control system, feed two punch cards into the > > mouth of the machine, hope the vacuum tubes don't burn out again, > > and check it out locally. > > > > Brad > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > begin:vcard > fn:David Schruth > n:Schruth;David > org:University of Washington, Department of Oceanography;The Center for Environmental Genomics > adr;dom:616 NE Northlake Place;;Benjamin Hall IRB, Room 306;Seattle;WA;98105 > email;internet:dschruth at u.washington.edu > title:Bioinformatics Research Consultant > tel;work:(206) 328-7381 > tel;cell:(206) 250-9110 > x-mozilla-html:FALSE > url:http://armbrustlab.ocean.washington.edu/people/schruth > version:2.1 > end:vcard > From tiagoantao at gmail.com Tue Mar 17 09:19:38 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 17 Mar 2009 13:19:38 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman wrote: > There is a lot of good material in this thread for new potential > developers. Tiago, it would make sense to condense what you've > written and include it with the Contributing guide: > > http://biopython.org/wiki/Contributing I can go ahead and try to put a summary of our discussions on that page, if nobody opposes. The change can be rewritten afterwards or deleted anyway. 
The only issue is that I can only do that on the weekend and not before
(travelling abroad from Wednesday to Friday).

What I think is needed is actually a final decision on how things will
progress. Will there be an official git branch? Will the official
repository still be CVS? Where will it be hosted? These are a lot of
important questions, but I think there has been enough discussion to
arrive at a decision.

> (called, say, ActiveBranches). Tiago, I thought you did a page for PopGen
> earlier like this but I can't find it right now. We should keep
> communication at a high level to avoid confusing fragmentation.

Coincidentally I was editing that page today. I took the liberty of
creating a link from the documentation page to it. So it should be
reachable now.

Tiago

From p.j.a.cock at googlemail.com Tue Mar 17 10:44:08 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 17 Mar 2009 14:44:08 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
    <20090317124930.GE57054@sobchak.mgh.harvard.edu>
    <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
Message-ID: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>

2009/3/17 Tiago Antão :
> I can go ahead and try to put a summary of our discussions on that
> page, if nobody opposes. The change can be rewritten afterwards or
> deleted anyway. The only issue is that I can only do that on the
> weekend and not before (travelling abroad from Wednesday to Friday).

Sure - by the weekend I hope we'll have come to a consensus.

> What I think is needed is actually a final decision on how things will
> progress. Will there be an official git branch? Will the official
> repository still be CVS? Where will it be hosted? These are a lot of
> important questions, but I think there has been enough discussion to
> arrive at a decision.

I think it is still too early for a final decision, but here is my
suggested plan:

In the short term (at least until Biopython 1.50 beta is out, perhaps
until Biopython 1.50 proper is out), CVS will remain the official
repository. Bartek will continue automatically updating the mirrored
copy on github, which will otherwise be treated as READ ONLY. If need
be, he may have to reimport the whole history (the tag issue troubles
me - see the other thread), so there may be some bumps along this road.
Contributions/bug fixes can continue via bugzilla with a patch, and
contributors can also try providing a URL to their own git branch if
they prefer. During this period I hope most (ideally all) our active
developers with CVS access will create an account on github, and try
out forking from the CVS mirror, creating their own branches, checking
in some changes, and doing some simple merges - for example pulling
code from other Biopython developers' public branches. This should give
us the confidence to trust git and github enough to use it for real.

i.e. For roughly the next month, we will continue as before with CVS
for the real work, but will also try out github.

Once Biopython 1.50 final is out (hopefully by the end of April 2009,
probably sooner), we need to decide if we will actually make the move
to git on github. At this point, I would expect this to happen by
declaring CVS read only, a static archive (and emergency fallback).
Bartek would turn off his automatic syncing. We would then continue
working on the github branch with the full CVS history, with a core of
Biopython developers having write access to the "official" branch,
doing new work under their own personal branches for eventual merging
into the main trunk.

I'd still like to have a copy of the "official" git repository running
on biopython.org, but this may not be that easy without some technical
expertise in house to do this.
From initial discussion with the OBF team about the idea of running git
on their servers, my impression is that if we can do it ourselves, we
may. Jason Stajich actually suggested we use github independently.

Peter

P.S. Could you all update your entry on the wiki participants page
(and if you have one, your wiki user page) to include a link to your
github account: http://biopython.org/wiki/Participants

From biopython at maubp.freeserve.co.uk Tue Mar 17 10:46:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 14:46:53 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <49BEDD9B.6030905@u.washington.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
    <20090316225558.GC57054@sobchak.mgh.harvard.edu>
    <49BEDD9B.6030905@u.washington.edu>
Message-ID: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com>

2009/3/16 David Schruth :
> I've got some 454 and Solid data you could test it on too.
>
> Has anybody else looked into how these other two Next Gen formats might
> complicate things?

Roche 454 sequencers produce their own binary SFF files (Standard
Flowgram Format), but they provide tools which turn these into standard
Sanger style files using PHRED qualities. In theory, we might be able
to parse the SFF files directly, see for example
http://blog.malde.org/index.php/2008/11/14/454-sequencing-and-parsing-the-sff-binary-format/
and the links given.

In practice, most sequencing centers using Roche 454 will be happy to
provide FASTQ or FASTA+QUAL files, and the code on Bug 2767 (or the
associated experimental branch on github) should work fine on these.
http://bugzilla.open-bio.org/show_bug.cgi?id=2767

You are free to try out the proposed code yourself now, but if you have
some particular 454 files you'd like me to check, please email me (off
the mailing list).
If you can share some real data which we could include in Biopython for a unit test that would also be great (but unless you tell me this explicitly, I'll only make sure we can parse your files). Regarding SOLiD files, they work in colour space and I am under the impression that it doesn't make sense to convert them to sequence space until after doing the assembly or genome mapping (in colour space). See for example http://solidsoftwaretools.com/gf/project/mapreads/ i.e. It may not be appropriate to parse SOLiD reads into Biopython SeqRecord objects, and thus wouldn't belong in Bio.SeqIO. That isn't to say we wouldn't want a parser elsewhere in Biopython, perhaps under Bio.Sequencing would be best. Peter From biopython at maubp.freeserve.co.uk Tue Mar 17 10:57:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 14:57:49 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903170757s183f6f59x40549f7e3a853f06@mail.gmail.com> On Mon, Mar 16, 2009 at 10:42 PM, Brad Chapman wrote: > Peter wrote: >> I'm thinking a news post on >> http://news.open-bio.org/news/category/obf-projects/biopython/ about >> version control would be a good idea at this point. 
>> How about this -
>
> This is great, and I would move the last paragraph describing
> the Git repository to the beginning; start with what we are doing and
> then describe the rationale. This should help for those with ADD, and
> also give more prominent credit to Bartek, Giovanni and you for the
> work that went into this.

Good idea about the reordering - done, and published:
http://news.open-bio.org/news/2009/03/biopython-and-version-control-systems/

It will also show up on http://biopython.org/wiki/News via the RSS feed.

Peter

From rodrigo_faccioli at uol.com.br Tue Mar 17 11:30:48 2009
From: rodrigo_faccioli at uol.com.br (Rodrigo faccioli)
Date: Tue, 17 Mar 2009 12:30:48 -0300
Subject: [Biopython-dev] PDB Parser error
Message-ID: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>

I built a relational database in PostgreSQL. This database stores some
information from PDB files: the sequence, atoms and sbonds. Now I am
building a parser for this database, and I want to load its contents
into a Biopython PDB parser structure. The idea is to keep all of my
source code based on the Biopython PDB parser, because I will need to
do some operations with this information.

So I studied the Bio.PDB directory and read, in the PDBParser.py file,
the _parse_coordinates method, which calls several methods that
initialise the structure. I call the same methods in my code. However,
it shows the message below.

Traceback (most recent call last):
  File "src/testefcfrpPDB.py", line 32, in <module>
    main()
  File "src/testefcfrpPDB.py", line 30, in main
    structure = FcfrpPDB.getPDBFile(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDB.py", line 67, in getPDBFile
    return fcfrpPDBParser.loadStructureFromDatabase(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDBParser.py", line 48, in loadStructureFromDatabase
    self._structure_builder.init_atom(D_Atoms[i].get_id(), D_Atoms[i].get_coord(), D_Atoms[i].get_bfactor(),D_Atoms[i].get_occupancy() ,D_Atoms[i].get_altloc(), D_Atoms[i].get_fullname(), D_Atoms[i].get_serial_number())
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/StructureBuilder.py", line 182, in init_atom
    if residue.has_id(name):
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", line 96, in has_id
    return self.child_dict.has_key(id)
TypeError: list objects are unhashable

This is my first post to the Biopython developers' list, and I don't
know what the process is for sending code.

Thanks for any help.

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218

From lpritc at scri.ac.uk Tue Mar 17 11:42:55 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 15:42:55 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com>
Message-ID:

Hi,

On 17/03/2009 14:46, "Peter" wrote:

> 2009/3/16 David Schruth :
>> I've got some 454 and Solid data you could test it on too.
>>
>> Has anybody else looked into how these other two Next Gen formats might
>> complicate things?

> Regarding SOLiD files, they work in colour space and I am under the
> impression that it doesn't make sense to convert them to sequence
> space until after doing the assembly or genome mapping (in colour
> space). See for example
> http://solidsoftwaretools.com/gf/project/mapreads/ i.e. It may not be
> appropriate to parse SOLiD reads into Biopython SeqRecord objects, and
> thus wouldn't belong in Bio.SeqIO. That isn't to say we wouldn't want
> a parser elsewhere in Biopython, perhaps under Bio.Sequencing would be
> best.

That's my understanding and practical experience, too. For lurkers'
benefit SOLiD data looks like this:

>4_48_57_F3
T33111210002200023033000000211000101
>4_48_89_F3
T22002312223133113013303322223322223
>4_48_95_F3
T22300102100203322101021130203000201

where each of the four colour values (0, 1, 2, 3) encodes a pair of
adjacent bases, i.e. one of the 16 dimers (AA, AC, AG, AT, CA, ...), so
each colour value is degenerate for four possible dimers. This system
is described at
http://www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_057559.pdf.

The use of an appropriate colour->dimer mapping makes it possible, in
principle, to go from colour space to nucleotide sequence, so long as a
single base of the sequence is known. In reality a single colour space
read error silently makes every base decoded after it in that SOLiD
read incorrect. Practical use of SOLiD data involves mapping the
sequence reads to a reference sequence (either by converting the
reference to colour space, or by dynamic programming) prior to
conversion to 'base space'.

The mapping process is probably better handled by dedicated
applications, and I think the role for Biopython in this is to parse
their output. GFF is, awkwardly enough, a popular output format for
this kind of analysis.

L.

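
To make the colour->dimer mapping above concrete, here is a minimal sketch of two-base decoding. It assumes the usual numbering A=0, C=1, G=2, T=3, under which the colour of a dimer is the XOR of its two base numbers; the function names are made up for illustration and are not part of Biopython. The leading base of the read is simply kept as the known starting base.

```python
# Sketch of SOLiD two-base ("colour space") encoding and decoding.
# Assumption: bases are numbered A=0, C=1, G=2, T=3, so that the colour of a
# dimer is the XOR of its two base numbers (AA, CC, GG, TT -> 0, and so on).
BASES = "ACGT"


def encode(sequence):
    """Return the colour-space read for a nucleotide sequence.

    The first base is kept as the known starting base, followed by one
    colour per dimer.
    """
    colours = [
        str(BASES.index(a) ^ BASES.index(b))
        for a, b in zip(sequence, sequence[1:])
    ]
    return sequence[0] + "".join(colours)


def decode(read):
    """Return the nucleotide sequence for a colour-space read such as 'T3311'."""
    base = BASES.index(read[0])
    decoded = [read[0]]
    for colour in read[1:]:
        # Each colour, combined with the previous base, fixes the next base.
        base ^= int(colour)
        decoded.append(BASES[base])
    return "".join(decoded)


if __name__ == "__main__":
    read = "T33111210002200023033000000211000101"
    print(decode(read))
    # Change a single colour call: every base from that point on differs.
    miscalled = read[:5] + "0" + read[6:]
    print(decode(miscalled))
```

The second print shows why naive conversion is fragile: one wrong colour shifts all downstream bases, which is the reason for mapping in colour space first.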
--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From biopython at maubp.freeserve.co.uk Tue Mar 17 12:01:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 16:01:25 +0000
Subject: [Biopython-dev] PDB Parser error
In-Reply-To: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>
References: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>
Message-ID: <320fb6e00903170901v6533910bl57ddd534dc05cf51@mail.gmail.com>

On Tue, Mar 17, 2009 at 3:30 PM, Rodrigo faccioli wrote:
> I built a relational database in PostgreSQL. This database stores some
> information from PDB files: the sequence, atoms and sbonds. Now I am
> building a parser for this database, and I want to load its contents
> into a Biopython PDB parser structure. The idea is to keep all of my
> source code based on the Biopython PDB parser, because I will need to
> do some operations with this information.
>
> So I studied the Bio.PDB directory and read, in the PDBParser.py file,
> the _parse_coordinates method, which calls several methods that
> initialise the structure. I call the same methods in my code. However,
> it shows the message below.
> Traceback (most recent call last):
>   File "src/testefcfrpPDB.py", line 32, in <module>
> ...
>   File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", line 96, in has_id
>     return self.child_dict.has_key(id)
> TypeError: list objects are unhashable
>
> This is my first post to the Biopython developers' list, and I don't
> know what the process is for sending code.

It's hard to say without seeing your full code (and even then, without
the database it would be difficult to reproduce it). As you have a
TypeError, I suspect you have something with the wrong datatype - maybe
a list that should be a string or something.

If you want to share the full file testefcfrpPDB.py you could post it
on http://pastebin.com/ or something (do you have your own website?).
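
For what it's worth, the error itself is easy to reproduce without Bio.PDB or a database. The sketch below is only an illustration of the suspected cause: the dictionary is a hypothetical stand-in for a residue's child dictionary, and normalise_name is an invented helper that assumes the unwanted list wraps a single value (as a database row with one column might). It is written for current Python rather than the Python 2.5 in the traceback, so the exact error text differs.

```python
# Hypothetical stand-in for a residue's child dictionary, keyed by atom name.
children = {"CA": "alpha carbon", "N": "backbone nitrogen"}


def has_id(child_dict, key):
    """Membership test on a dict; the key must be hashable."""
    return key in child_dict


def normalise_name(name):
    """Turn a one-element list such as ['CA'] into the string 'CA'.

    Assumption: the unwanted list wraps a single value; anything else is
    returned unchanged.
    """
    if isinstance(name, (list, tuple)) and len(name) == 1:
        return name[0]
    return name


if __name__ == "__main__":
    try:
        has_id(children, ["CA"])  # a list cannot be used as a dict key
    except TypeError as err:
        print("TypeError:", err)
    # Passing a plain string works as expected.
    print(has_id(children, normalise_name(["CA"])))
```

So it would be worth printing the type of each argument passed to init_atom, the atom name in particular, before the call.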

Peter

From biopython at maubp.freeserve.co.uk Tue Mar 17 13:59:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 17:59:43 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
    <20090317124930.GE57054@sobchak.mgh.harvard.edu>
    <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
    <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
Message-ID: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>

I wrote:
> In the short term (at least until Biopython 1.50 beta is out, perhaps
> until Biopython 1.50 proper is out), CVS will remain the official
> repository. ... During this period I hope most (ideally all) our active
> developers with CVS access will create an account on github, and
> try out forking from the CVS mirror, creating their own branches,
> checking in some changes, and doing some simple merges - for
> example pulling code from other Biopython developers' public
> branches. This should give us the confidence to trust git and
> github enough to use it for real.

Brad and I have been trying this out in practice, and it seems to work OK.

I started a fork to test the patches for Bug 2767, adding quality
parsers to Bio.SeqIO,
http://github.com/peterjc/biopython-seqio-quality/tree/master
I made a few incremental checkins, pushed to github one by one.

Brad then took a fork of this in order to make some minor changes and
fix a typo in the documentation:
http://github.com/chapmanb/biopython-seqio-quality/tree/master

At this point the "network" diagrams showed up the two branches as
diverging. Brad then sent me a "pull" request, suggesting I might
want to pull his work into my branch.

Using the git command line tool, I was able to pull and merge Brad's
changes (as I had made no changes in the meantime this could be done
automatically), and then push the merged version back up to github on
my branch. At this point my branch and Brad's agreed once again, and
the "network" diagram no longer shows both. Note that my branch now
includes a commit from Brad.

At this point, Brad may choose to delete his branch, or perhaps make
further changes.

Now all this worked, but I was wondering if the github web interface
could have simplified any of this, if only I'd known where to click.
For example, does github offer any way to view a diff between two
branches? Or, as I suspect, do they simply expect you to use the git
tools directly for this?

Peter

From bugzilla-daemon at portal.open-bio.org Tue Mar 17 14:06:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 14:06:00 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files
In-Reply-To:
Message-ID: <200903171806.n2HI60op012464@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767

------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-17 14:06 EST -------
(In reply to comment #10)
> I've made these changes available on a test github branch,
> http://github.com/peterjc/biopython-seqio-quality/tree/master
>
> This doesn't include all the example files for the unit tests yet.
>

I've now checked this into CVS. The extra example files will follow
later... leaving this bug open until that is done.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From dalloliogm at gmail.com Tue Mar 17 14:35:04 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 17 Mar 2009 19:35:04 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
    <20090317124930.GE57054@sobchak.mgh.harvard.edu>
    <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
    <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
    <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
Message-ID: <5aa3b3570903171135nb49de80h6c6ee0930c147d29@mail.gmail.com>

On Tue, Mar 17, 2009 at 6:59 PM, Peter wrote:
>
> Brad and I have been trying this out in practice, and it seems to work OK.
>
> I started a fork to test the patches for Bug 2767, adding quality
> parsers to Bio.SeqIO,
> http://github.com/peterjc/biopython-seqio-quality/tree/master
> I made a few incremental checkins, pushed to github one by one.
>
> Brad then took a fork of this in order to make some minor changes and
> fix a typo in the documentation:
> http://github.com/chapmanb/biopython-seqio-quality/tree/master

Yes, basically this is the way it should work.
Usually I do something similar, but I follow the procedure explained here:
- http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html
  (section 'Using git for collaboration')

I fetch the other branch with a refspec such as
master:otheruser-incoming, then compare the two branches with gitk or
with git log master..otheruser-incoming.

> At this point the "network" diagrams showed up the two branches as
> diverging. Brad then sent me a "pull" request, suggesting I might
> want to pull his work into my branch.
>
> Using the git command line tool, I was able to pull and merge Brad's
> changes (as I had made no changes in the meantime this could be done
> automatically),

If you go to 'Fork Queue' on github, it should show other people's commits.
However, I don't trust doing this with a web interface... moreover, it sometimes seems not to work properly (it is not clear how it decides whether a commit will 'apply cleanly' or not). On the same page, there is a 'pull merge request' button, which (I have never tried it) should send a merge request to the selected recipients. > and then push the merged version back up to github on > my branch. At this point my branch and Brad's agreed once again, and > the "network" diagram no longer shows both. Note that my branch now > includes a commit from Brad. Yes, this is right. The graph only shows the commits which differ, so it included your two branches as a single one. If you feel comfortable with the git mechanisms, maybe later you could create a second branch in the 'biopython/biopython' repository, and call it 'accepted-github-changes', or something like that, which will collect all the changes that can be submitted to the cvs. > At this point, Brad may choose to delete his branch, or perhaps make > further changes. I wonder if a good strategy for this is to create branches only to test specific changes, and then delete them. If Brad keeps his branch, later he will have to remember to update it, which maybe is less trouble than deleting a branch and creating it when necessary. > Now all this worked, but I was wondering if the github web interface > could have simplified any of this, if I'd only known where to click. > For example, does github offer any way to view a diff between two > branches? Or, as I suspect, do they simply expect you to use the git > tools directly for this? To my knowledge, there are no such tools :-(. You must rely on the commit messages to identify the differences between different branches. Maybe they will implement such a feature at some point. 
> > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Tue Mar 17 14:36:24 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 17 Mar 2009 19:36:24 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> Message-ID: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> 2009/3/17 Peter Cock > 2009/3/17 Tiago Antão : > > I'd still like to have a copy of the "official" git repository running > on biopython.org, but this may not be that easy without some technical > expertise in house to do this. From initial discussion with the OBF > team about the idea of running git on their servers, my impression is > if we can do it ourselves, we may. Jason Stajich actually suggested > we use github independently. Well, basically it is not strictly necessary to have git installed on their computers to create a mirror. You can just create the clone on your computer, copy the raw files there, and then you will be able to push new changes over ssh access. Since git is a distributed source control system, it doesn't require configuring a server part as cvs does :-) To my knowledge, the pygr project (also a bioinformatics suite in python) has an official repository hosted on gitorious, and a mirror on github to collect patches from there. 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From tiagoantao at gmail.com Tue Mar 17 15:09:13 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 17 Mar 2009 19:09:13 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> Message-ID: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> OK, in order to exercise and try github development I have forked a branch to work on the PopGen code. The idea of the branch is to serve as a platform for merging with the "official" branch. So, the idea is: 1. Official branch - The stable thingy 2. PopGen stabilizer branch - The place to merge contributions from PopGen development branches. The idea is that people can go crazy on their own branches and this intermediate one serves as a point to stabilize (unit test, documentation, QA, ...) before the commit to the official one 3. Crazy branches - Develop your crazy idea. I have 3 ideas myself: One for Jason's structure code, one for my LDNe code and another for statistics. Many more welcomed.... The development procedure would be like this: A. People would have all the fun on their development branches B. When they felt confident they would submit their code to the stabilizer branch, where we would check that all the important things were there: unit test, code comments, QA, documentation C. When things were in good shape, we would propose changes to the official branch And, by the way, bug fixes of existing production would also be done on the stabilizer branch. Does this make any sense? 
In my view, with things like git, a policy like this encourages both innovation while preserving stability and robustness of the official branch. Tiago On Tue, Mar 17, 2009 at 6:36 PM, Giovanni Marco Dall'Olio wrote: > > > 2009/3/17 Peter Cock >> >> 2009/3/17 Tiago Antão : >> >> I'd still like to have a copy of the "official" git repository running >> on biopython.org, but this may not be that easy without some technical >> expertise in house to do this. From initial discussion with the OBF >> team about the idea of running git on their servers, my impression is >> if we can do it ourselves, we may. Jason Stajich actually suggested >> we use github independently. > > Well, basically it is not strictly necessary to have git installed on their > computers to create a mirror. > You can just create the clone on your computer, raw-ly copy the files there, > and then you will be able to push the new changes with an ssh access. > Since git is a distributed source control system, it doesn't require to > configure a server part as with cvs :-) > > To my knowledge, the pygr project (also a bioinformatics suite in python) > have an official repository hosted in gitorious, and a mirror in github to > collect patches from there. 
> > > > > -- > > My blog on bioinformatics (now in English): http://bioinfoblog.it > -- "A man who dares to waste one hour of time has not discovered the value of life" - Charles Darwin From mailinglist.honeypot at gmail.com Tue Mar 17 15:21:57 2009 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 17 Mar 2009 15:21:57 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> Message-ID: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> Hi, I really just lurk around here, but a slight correction/point: > A. People would have all the fun on their development branches > B. When they felt confident they would submit their code to the > stabilizer branch, where we would check that all the important things > were there: unit test, code comments, QA, documentation > C. When things were in good shape, we would propose changes to the > official branch I'm very much a git noob, and from having followed this thread a bit, it seems that many of us are, so for the noobs: I think somewhere around B, the person wanting to commit new code would have to rebase[1] their branch against the official "stabilizer branch" (that they had originally forked from). This would put the onus of fixing any breaks, and of keeping track of developments on the target branch since you originally branched, on the person who is writing the new code. This makes it easier for the "official keepers of the one true branch" to accept new patches, since they know the patch will work on the latest version. 
Anyway, I think I just wanted to point out that rebase was there since I don't think there's anything really equivalent in the CVS/SVN world. -steve [1] rebase : http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html From tiagoantao at gmail.com Tue Mar 17 15:27:10 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 17 Mar 2009 19:27:10 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> Message-ID: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com> 2009/3/17 Steve Lianoglou : > I think somewhere around B, the person wanting to commit new code would have > to rebase[1] their branch against the official "stabilizer branch" (that So, if I understand correctly, anyone wanting to submit a change to the official version would be responsible for rebasing, right? PS - being a git noob and a longtime cvs/svn user and manager I much appreciated Randal Schwartz's Google talk at: http://www.youtube.com/watch?v=8dhZ9BXQgc4 Especially at around 30 minutes it is really informative. 
From mailinglist.honeypot at gmail.com Tue Mar 17 15:34:11 2009 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 17 Mar 2009 15:34:11 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com> Message-ID: <711E86ED-F220-4E97-84BC-9E94753E111A@gmail.com> On Mar 17, 2009, at 3:27 PM, Tiago Ant?o wrote: > 2009/3/17 Steve Lianoglou : >> I think somewhere around B, the person wanting to commit new code >> would have >> to rebase[1] their branch against the official "stabilizer >> branch" (that > > So, if I understand well, anyone wanting to submit a change to the > official version would be responsible for rebasing, right? And if I understand it well, then I think you're right. I think that's a reasonable policy. That puts the responsibility to ensure that any new code I write works with whatever has been approved already on me, and not you. While this may require a bit extra responsibility on the committer, I'd be surprised if it would be enough to deter any new would-be committers from taking a shot at contributing code (maybe it would? I guess it's debatable). > PS - being a git noob and a longtime cvs/svn user and manager I much > appreciated Randal Schwartz google talk at: > http://www.youtube.com/watch?v=8dhZ9BXQgc4 Especially aroung 30 > minutes it is really informative. Sweet. 
To be honest, the only video I ever saw of git was Linus' SVN-bash google talk, which somehow put me off from considering git longer than I should have, so this is a good link to have :-) Thanks, -steve -- Steve Lianoglou Graduate Student: Physiology, Biophysics and Systems Biology Weill Medical College of Cornell University http://cbio.mskcc.org/~lianos From biopython at maubp.freeserve.co.uk Tue Mar 17 16:21:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 20:21:45 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> Message-ID: <320fb6e00903171321y4b94f220h7d2d1172ee085e15@mail.gmail.com> 2009/3/17 Tiago Ant?o : > OK, in order to exercise and try github development I have forked a > branch to work on the PopGen code. The idea of the branch is to serve > as a platform for merging with the "official" branch. So, the idea is: > > 1. Official branch - The stable thingy > 2. PopGen stabilizer branch - The place to merge contributions from > PopGen development branches. The idea is that people can go crazy on > their own branches and this intermediate one serves as a point to > stabilize (unit test, documentation, QA, ...) before the commit to the > official one > 3. Crazy branches - Develop your crazy idea. I have 3 ideas myself: > One for Jason's structure code, one for my LDNe code and another for > statistics. Many more welcomed.... > > The development procedure would be like this: > A. People would have all the fun on their development branches > B. 
When they felt confident they would submit their code to the > stabilizer branch, where we would check that all the important things > were there: unit test, code comments, QA, documentation > C. When things were in good shape, we would propose changes to the > official branch > > And, by the way, bug fixes of existing production would also be done > on the stabilizer branch. > > Does this make any sense? Totally. But keep in mind the current "official" git branch (the one being updated from CVS) may get nuked if we have to redo the import to fix the missing version tags - so I would suggest you name your branches with "test" or "provisional" or something temporary in the text for now. > In my view, with things like git, a policy like this encourages both > innovation while preserving stability and robustness of the official > branch. Yes - and this seems like the right approach for Bio.PopGen, with you acting as the gatekeeper. Peter From chapmanb at 50mail.com Tue Mar 17 17:34:14 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 17 Mar 2009 17:34:14 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> Message-ID: <20090317213414.GK57054@sobchak.mgh.harvard.edu> Hi Peter; > Using the git command line tool, I was able to pull and merge Brad's > changes (as I had made no changes in the meantime this could be done > automatically), and then push the merged version back up to github on > my branch. At this point my branch and Brad's agreed once again, and > the "network" diagram no longer shows both. Note that my branch now > includes a commit from Brad. Sweet. Glad that worked. 
I deleted my branch (edit->delete repository). While doing so, I noticed that there is also a 'Repository Collaborators' section within the 'edit' page. So, another working model is to have multiple users simultaneously editing one forked revision. If you are already communicating on the work through the mailing list or wiki, this is more like CVS/SVN than the branching model. > Now all this worked, but I was wondering if the github web interface > could have simplified any of this, if I'd only known where to click. > For example, does github offer any way to view a diff between two > branches? Or, as I suspect, do they simply expect you to use the git > tools directly for this? What was the command you used for this? git diff is still befuddling to me. Brad From bugzilla-daemon at portal.open-bio.org Wed Mar 18 10:18:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Mar 2009 10:18:39 -0400 Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls In-Reply-To: Message-ID: <200903181418.n2IEIdIm003158@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2777 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-18 10:18 EST ------- Fix checked into CVS as Bio/PDB/Entity.py revision 1.26, marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Mar 18 11:07:42 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Mar 2009 15:07:42 +0000 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> Message-ID: <320fb6e00903180807u4a0f7a5aqaa91f20b40891ca4@mail.gmail.com> On Mon, Mar 16, 2009 at 12:16 PM, Peter wrote: > Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files That's in CVS now, Brad and I have used it a bit, but further testing before the beta wouldn't hurt. > Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g. > align[1:2,5:-5] Anyone want to try this out? http://bugzilla.open-bio.org/show_bug.cgi?id=2551 > Any other nominations for Biopython 1.50? Other candidates with patches that have since come to mind: Bug 2733 - Running unit tests where Biopython wasn't built from source http://bugzilla.open-bio.org/show_bug.cgi?id=2733 This patch seemed OK from both my and Bruce's testing. Bug 2738 - Speed up GenBank parsing, in particular location parsing http://bugzilla.open-bio.org/show_bug.cgi?id=2738 I would want to run some tests with EMBL files before committing this. Bug 2745 - Bio.GenBank.LocationParserError with a GenBank CON file http://bugzilla.open-bio.org/show_bug.cgi?id=2745 I'd like to change CONTIG line parsing to just use a string (or a list of strings). 
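As a rough illustration of what Bug 2551 is asking for, here is a minimal sketch of two-dimensional indexing on a toy alignment class. ToyAlignment and its rows attribute are made up for this example and are not the Biopython alignment API; the actual patch attached to the bug may well behave differently (for instance in what it returns for a single column).

```python
class ToyAlignment(object):
    """Hypothetical stand-in for an alignment: a list of equal-length strings."""

    def __init__(self, rows):
        self.rows = list(rows)
        if len(set(len(row) for row in self.rows)) > 1:
            raise ValueError("All rows must have the same length")

    def __len__(self):
        # Number of rows (sequences) in the alignment
        return len(self.rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            # align[1] gives a single row
            return self.rows[index]
        if isinstance(index, slice):
            # align[1:3] gives a sub-alignment of whole rows
            return ToyAlignment(self.rows[index])
        if isinstance(index, tuple) and len(index) == 2:
            row_index, col_index = index
            if isinstance(row_index, int):
                # align[0, 5:-5] gives part of one row (a string)
                return self.rows[row_index][col_index]
            selected = self.rows[row_index]
            if isinstance(col_index, int):
                # align[:, 0] gives one column as a string
                return "".join(row[col_index] for row in selected)
            # align[1:2, 5:-5] gives a sub-alignment (rows and columns)
            return ToyAlignment([row[col_index] for row in selected])
        raise TypeError("Invalid index type")


align = ToyAlignment(["ACGTACGTACGTAC",
                      "ACGAACGTTCGTAC",
                      "TCGTACGAACGTAC"])
sub = align[1:2, 5:-5]
print(sub.rows)
```

Here align[1:2,5:-5] keeps only the second row and drops five columns from each end. Returning a new alignment for slices but plain strings for single rows or columns is just one possible design choice.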
Peter From nuin at genedrift.org Wed Mar 18 15:50:28 2009 From: nuin at genedrift.org (Paulo Nuin) Date: Wed, 18 Mar 2009 15:50:28 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> <49BE8532.9040701@genedrift.org> <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com> Message-ID: <49C15084.8040208@genedrift.org> Peter wrote: > On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin wrote: > >> No problem on Vista. >> >> Git (version 1.5.6.1-preview20080701) >> >> Paulo >> > > Hi Paulo, > > Could you be a bit more precise about the version are you using and > where got it from? i.e. Are you using cygwin or the Windows native > port, http://code.google.com/p/msysgit/ > I'm using msysgit version 1.5.6. > And did you mean in general you have no problems with git on Windows > Vista, or have you also tried fetching Biopython from github, > building, testing (and installing it)? For example, are there any new > line issues from the unit tests? This is one area where CVS and git > may differ slightly... > I'm using Github to store a couple of projects and this version is working great. Also Eclipse addon is also fine. I cloned BioPython but haven't tried installing or building it. 
Paulo From bugzilla-daemon at portal.open-bio.org Thu Mar 19 09:42:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Mar 2009 09:42:23 -0400 Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not support the output file argument In-Reply-To: Message-ID: <200903191342.n2JDgN3p016978@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2654 yvan.strahm at bccs.uib.no changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |yvan.strahm at bccs.uib.no -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 19 13:08:16 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Mar 2009 13:08:16 -0400 Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not support the output file argument In-Reply-To: Message-ID: <200903191708.n2JH8GqS032350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2654 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-19 13:08 EST ------- Fixed in Bio/Blast/NCBIStandalone.py CVS revision 1.86 http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython Note that the three tools themselves all use -o (lower case) for the output file, but refer to it slightly differently: $ ./rpsblast --help | grep " -o " -o Output File for Alignment [File Out] Optional $ ./blastpgp --help | grep " -o " -o Output File for Alignment [File Out] Optional $ ./blastall --help | grep " -o " -o BLAST report 
Output File [File Out] Optional

Our function for rpsblast already supported this argument under the name "align_outfile" which I have therefore also used for blastpgp (this is a good name as blastpgp outputs more than one type of file). For blastall "align_outfile" doesn't seem entirely appropriate, and although it is inconsistent I have gone for "outfile" instead.

Example usage:

#imports and setting up input parameters omitted
out_handle, err_handle = NCBIStandalone.blastall(blastall_exe, "blastp",
                                                 blastdb_nr, query_file,
                                                 expectation=0.000001,
                                                 nprocessors=1, filter="F",
                                                 outfile=output_file,
                                                 alignments=5, descriptions=5)
assert "" == err_handle.read()
assert "" == out_handle.read() #Important so we wait for BLAST to finish!
err_handle.close()
out_handle.close()
assert os.path.isfile(output_file)
count = 0
for blast_record in NCBIXML.parse(open(output_file)) :
    count += 1
print "Found %i BLAST results" % count

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Thu Mar 19 15:00:51 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Mar 2009 19:00:51 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090317213414.GK57054@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman wrote: > Hi Peter; > >> Using the git command line tool, I was able to pull and merge Brad's >> changes (as I had made no changes in the meantime this could be done >> automatically), and then push the merged version back up to github on >> my branch. ?At this point my branch and brad's agreed once again, and >> the "network" diagram no longer shows both. ?Note that my branch now >> includes a commit from Brad. > > Sweet. Glad that worked. I deleted my branch (edit->delete > repository). How long did it take to process? I deleted mine (after attempting to merge against the CVS mirror). The delete was still in progress over 12 hours later! > While doing so, I noticed that there is also a 'Repository > Collaborators' section within the 'edit' page. So, another working > model is to have multiple users simultaneously editing one forked > revision. If you are already communicating on the work through the > mailing list or wiki, this is more like CVS/SVN then the branching > model. Yes, this should be a fairly simple way to give all our current CVS developers direct access to a master branch on github. >> Now all this worked, but I was wondering if the github web interface >> could have simplified any of this, if I'd only know where to click. 
>> For example, does github offer any way to view a diff between to >> branches? ?Or, as I suspect, do they simply expect you to use the git >> tools directly for this? > > What was the command you used for this? git diff is still befuddling > to me. I didn't actually figure that out (how to do a diff between two branches on github). And this afternoon github seems to be down, so I haven't played with it any more. Peter From chris.lasher at gmail.com Fri Mar 20 00:52:49 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 20 Mar 2009 00:52:49 -0400 Subject: [Biopython-dev] Help pages in Biopython wiki Message-ID: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> Would it be possible to get the help documentation installed for the Biopython wiki? http://biopython.org/wiki/Help Chris From lpritc at scri.ac.uk Fri Mar 20 04:42:44 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Fri, 20 Mar 2009 08:42:44 +0000 Subject: [Biopython-dev] Help pages in Biopython wiki In-Reply-To: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> Message-ID: Hi Chris, That page doesn't exist, yet (click on the 'page' tab to see this), and no pages link to it (see here: http://biopython.org/wiki/Special:WhatLinksHere/Help) What help were you expecting to see there? L. On 20/03/2009 04:52, "Chris Lasher" wrote: > Would it be possible to get the help documentation installed for the > Biopython wiki? > > http://biopython.org/wiki/Help > > Chris > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > ______________________________________________________________________ > This email has been scanned by the MessageLabs Email Security System. 
> For more information please visit http://www.messagelabs.com/email > ______________________________________________________________________ -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________ From biopython at maubp.freeserve.co.uk Fri Mar 20 06:41:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Mar 2009 10:41:49 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> Message-ID: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> On Thu, Mar 19, 2009 at 7:00 PM, Peter wrote: > On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman wrote: >> Sweet. Glad that worked. I deleted my branch (edit->delete >> repository). > > How long did it take to process? I deleted mine (after attempting to > merge against the CVS mirror). The delete was still in progress over > 12 hours later! And the branch delete is still on-going :( > ... And this afternoon github seems to be down, so I haven't played with it any more. It's back online again, but right now for me github is a bit of a damp squid [*]. As my initial branch/fork of biopython still exists but is being deleted, it seems in the meantime I can't create a new branch of biopython. Odd, and rather frustrating. Hopefully it will sort itself out shortly, and I can have another play with merging branches... Peter [*] For the benefit of non-native English speakers, or anyone whose sense of humour works differently to mine, this was a pun, based on the English phrase "damp squib" for a disappointing event, and the fact that github's error page has some kind of cartoon squid/octopus-cat creature on it. 
From dalloliogm at gmail.com Fri Mar 20 07:15:21 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 20 Mar 2009 12:15:21 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> Message-ID: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> On Fri, Mar 20, 2009 at 11:41 AM, Peter wrote: > On Thu, Mar 19, 2009 at 7:00 PM, Peter wrote: >> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman wrote: >>> Sweet. Glad that worked. I deleted my branch (edit->delete >>> repository). >> >> How long did it take to process? ?I deleted mine (after attempting to >> merge against the CVS mirror). ?The delete was still in progress over >> 12 hours later! > > And the branch delete is still on-going :( > >> ... ?And this afternoon github seems to be down, so I haven't played with it any more. > > Its back online again, but right now for me github is a bit of a damp squid [*]. > As my initial branch/fork of biopython still exists but is being > deleted, it seems > in the meantime I can't create a new branch of biopython. mmm are you referring to this: - http://github.com/peterjc/biopython-seqio-quality/network ? I can see it, and also fetch/pull changes from it.. I see that you have renamed your fork as seqio-quality. Ok, but I think it is better to keep the fork's name as 'biopython', and then create many branches inside it. 
For example:

git clone
cd biopython

# make some commits to your master branch:
touch testfile.txt
git add testfile.txt
git commit -a -m 'test file added'

# push the changes to your github repository ('origin' refers to github;
# see $(CWD)/biopython/.git/config)
git push origin master

# create a branch called 'experimental-seqio-quality', and switch to it:
# without arguments, git branch shows the list of branches and the current one:
git branch
# create the experimental-seqio-quality branch:
git branch experimental-seqio-quality
# switch to it:
git checkout experimental-seqio-quality
# check that experimental-seqio-quality is the current working branch:
git branch

# now you are working in the branch called 'experimental-seqio-quality'.
# All the changes you commit here will not be saved in the 'master' branch
# or the others, as long as you don't merge them:
touch seqio-parser
git add seqio-parser
git commit -a -m 'added seqioparser'
git push origin experimental-seqio-quality

# after pushing, git will create a new branch in github. Look for example at my fork here:
# - http://github.com/biopython/biopython/network

############

Here is how you can merge and compare your branch with someone else's or with
the biopython one:

# add a reference to biopython official branch
git remote add biopython git://github.com/biopython/biopython.git

# obtain the set of changes from the biopython branch, and merge them
# (run the merge while on your own master branch)
git fetch biopython
git log master biopython/master
git diff master biopython/master
git merge biopython/master

git remote add peter git://github.com/peterjc/biopython-seqio-quality.git
git fetch peter
# there should be a way to do this without having to fetch
git diff master peter/master

For reference, look at this guide:
http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo

> Odd, and rather
> frustrating. Hopefully it will sort itself out shortly, and I can
> have another play
> with merging branches...
> > Peter > > [*] For the benefit of non-native English speakers, or or anyone whose sense > of humour works differently to mine, this was a pun, based on the English phrase > "damp squib" for a disappointing event, and the fact that github's > error page has > some kind of cartoon squid/octopus-cat creature on it. > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From cymon.cox at googlemail.com Fri Mar 20 07:16:27 2009 From: cymon.cox at googlemail.com (Cymon Cox) Date: Fri, 20 Mar 2009 11:16:27 +0000 Subject: [Biopython-dev] Test - ignore Message-ID: <7265d4f0903200416o7c8135ddrfae4aad723bd17b7@mail.gmail.com> From biopython at maubp.freeserve.co.uk Fri Mar 20 07:32:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Mar 2009 11:32:15 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> Message-ID: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> >> As my initial branch/fork of biopython still exists but is being >> deleted, it seems in the meantime I can't create a new branch >> of biopython. > > mmm are you referring to this: > - http://github.com/peterjc/biopython-seqio-quality/network > ? 
>
> I can see it, and also fetch/pull changes from it..

True, the network page is still there for me. But
http://github.com/peterjc/biopython-seqio-quality/ which redirects to
http://github.com/peterjc/biopython-seqio-quality/tree/master
shows me just a "This repository is being deleted" page.

> I see that you have renamed your fork as seqio-quality. Ok, but I
> think it is better to keep the fork's name as 'biopython', and then
> create many branches inside it.

I don't think I had entirely understood github's use of fork versus branch.
I'll have to do some more reading and try again once my account has
settled down. Thanks for the details in your email.

Peter

From bugzilla-daemon at portal.open-bio.org Fri Mar 20 08:18:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 08:18:53 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files
In-Reply-To:
Message-ID: <200903201218.n2KCIrSX026346@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678

------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2009-03-20 08:18 EST -------
(In reply to comment #7)
> (In reply to comment #6)
> > If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
> > it from there. If not, it tries to download it. This may fail if the servers
> > are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
> > Biopython is installed), you won't run into this problem.
>
> I was just looking at this on my Windows XP Python 2.3 machine, and when it
> tried to download missing DTD files it was just using a filename as the URL.

In hindsight, I wonder if trying to download missing DTD files is really a
good idea. Suppose a user does a large number of Entrez queries, and saves
the results as XML files. Then, he tries to parse each of those XML files.
If a DTD file is missing, then Bio.Entrez will try to download the same DTD file for each XML file it is trying to parse. This is not only wasteful, but also bypasses Entrez's rule of no more than three accesses per second. In addition, this is fragile. The XML files typically contain a full url to the needed DTD. But many of Entrez's DTD files contain references to other DTD files, and those references can be relative. When Bio.Entrez gets such a relative path to where the DTD file is located, it is difficult to figure out the absolute path to the DTD. Now we are looking for it in http://www.ncbi.nlm.nih.gov/dtd/, but this does not seem to contain all required DTDs. It may therefore make sense not to download the DTD file, but to raise an Exception with a helpful error message, specifying which DTD file is missing, where it can possibly be found, and where the DTD file can be installed. It requires some more effort from the user, but it is more robust, won't break Entrez' rules, and is more efficient. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
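The proposal in comment #10 (look for the DTD locally and, if it is absent, raise an error that names the file instead of downloading it) can be sketched roughly as below. This is only an illustration under stated assumptions: `MissingDTDError` and `locate_dtd` are hypothetical names, not part of Bio.Entrez, and the directory argument stands in for wherever the DTDs are installed (Bio/Entrez/DTDs in the discussion above).

```python
import os


class MissingDTDError(Exception):
    """Hypothetical exception; not an existing Bio.Entrez class."""


def locate_dtd(system_id, dtd_dir):
    """Return the local path of a DTD, or raise instead of downloading it.

    system_id is the DTD reference found in the XML file (a full URL or a
    relative path); dtd_dir is the local directory holding installed DTDs.
    """
    # Only the file name is used for the local lookup, so relative and
    # absolute references to the same DTD resolve to the same local file.
    filename = system_id.rsplit("/", 1)[-1]
    path = os.path.join(dtd_dir, filename)
    if os.path.isfile(path):
        return path
    raise MissingDTDError(
        "Missing DTD file %s. The XML file refers to it as %s; "
        "download it once and save it in %s."
        % (filename, system_id, dtd_dir)
    )
```

Because nothing is fetched during parsing, parsing many saved XML files never touches the NCBI servers, which is the efficiency and access-limit point made in the comment.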
From chapmanb at 50mail.com Fri Mar 20 08:55:18 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 20 Mar 2009 08:55:18 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> References: <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> Message-ID: <20090320125518.GA351@sobchak.mgh.harvard.edu> Hi all; > >> As my initial branch/fork of biopython still exists but is being > >> deleted, it seems in the meantime I can't create a new branch > >> of biopython. [...] > True, the network page is still there for me. But > http://github.com/peterjc/biopython-seqio-quality/ which redirects to > http://github.com/peterjc/biopython-seqio-quality/tree/master > shows me just a "This repository is being deleted" page. Peter, the repository deletion was very quick for me, so it looks like it got stuck somewhere with the GitHub downtime. Does this help for getting it removed: http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/ > > I see that you have renamed your fork as seqio-quality. Ok, but I > > think it is better to keep the fork's name as 'biopython', and then > > create many branches inside it. > > I don't think I had entirely understood github's use of fork versus branch. > I'll have so do some more reading and try again once my account has > settled down. Thanks for the details in your email. Wow, now I am mad confused. I thought forks and branches were conceptually the same. 
Giovanni, it seems like you are suggesting one branch (the GitHub fork) and then a second branch (the git branch command). We were thinking of a standard case as: 1. Fork the Biopython trunk at GitHub. Name this something so it makes sense what the fork/branch is for. 2. Work on the fork/branch. If you want, invite others to work on it with you. 3. When finished, be sure you are up to date with the master Biopython trunk. 4. Submit the fork/branch for inclusion in Biopython. 5. Once included, delete the fork/branch. Which parts of this fall out of "standard" git practice? In general, we should strive to keep this as simple as possible. If using Git is complicated then we are losing a lot of our advantage over CVS/patches. Giovanni, the example commands were very helpful; I added details to the Git page on how to see diffs of branches: http://biopython.org/wiki/GitMigration#Evaluating_changes Brad From bugzilla-daemon at portal.open-bio.org Fri Mar 20 09:57:00 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 09:57:00 -0400 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200903201357.n2KDv0JJ001146@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 09:57 EST ------- (In reply to comment #10) > > In hindsight, I wonder if trying to download missing DTD files is really a > good idea. Suppose a user does a large number of Entrez queries, and saves > the results as XML files. Then, he tries to parse each of those XML files. > If a DTD file is missing, then Bio.Entrez will try to download the same DTD > file for each XML file it is trying to parse. This is not only wasteful, but > also bypasses Entrez's rule of no more than three accesses per second. Very true. We should be able to enforce the access limit here without too much trouble. 
More generally, it would make sense for the DTD file to be saved - ideally to
the python site-packages but as we may not have write access, at least to a
cache.

> In addition, this is fragile. The XML files typically contain a full url to
> the needed DTD. But many of Entrez's DTD files contain references to other
> DTD files, and those references can be relative. When Bio.Entrez gets such a
> relative path to where the DTD file is located, it is difficult to figure out
> the absolute path to the DTD. Now we are looking for it in
> http://www.ncbi.nlm.nih.gov/dtd/, but this does not seem to contain all
> required DTDs.

When I looked into the DTD URLs, I didn't see the NCBI using any relative
links, but they may have changed things since. Additionally the NCBI have
a (different but overlapping) set of DTD files at:
http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/

Can we get some python XML/DTD library to resolve these links for us?

> It may therefore make sense not to download the DTD file, but to raise an
> Exception with a helpful error message, specifying which DTD file is missing,
> where it can possibly be found, and where the DTD file can be installed. It
> requires some more effort from the user, but it is more robust, won't break
> Entrez' rules, and is more efficient.

Biopython 1.49 generally failed to download missing DTD files. Right now the
current code in CVS does much better at coping with missing DTD files, but in
a very wasteful way. In either version, it does at least issue warnings,
indicating something is not right.

As a user, I would prefer Bio.Entrez to download missing DTD files on demand
AND SAVE THEM. As a developer I can see this is rather complicated, and you
are right Michiel - a simple error message with instructions is much more
straightforward. Note that the error might also suggest upgrading to the
latest Biopython, or reporting the issue to us - but it would then be a very
long error message!

If you want to switch to a helpful error message for missing DTD files, I'm OK with that. We could also ship the current code for Biopython 1.50. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Fri Mar 20 10:25:41 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 20 Mar 2009 15:25:41 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu> References: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> <20090320125518.GA351@sobchak.mgh.harvard.edu> Message-ID: <5aa3b3570903200725p1437ceem6a538af640c52ced@mail.gmail.com> On Fri, Mar 20, 2009 at 1:55 PM, Brad Chapman wrote: > Hi all; > >> >> As my initial branch/fork of biopython still exists but is being >> >> deleted, it seems in the meantime I can't create a new branch >> >> of biopython. > [...] >> True, the network page is still there for me. But >> http://github.com/peterjc/biopython-seqio-quality/ which redirects to >> http://github.com/peterjc/biopython-seqio-quality/tree/master >> shows me just a "This repository is being deleted" page. > > Peter, the repository deletion was very quick for me, so it looks like it > got stuck somewhere with the GitHub downtime. 
Does this help for getting it
> removed:
>
> http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/
>
>> > I see that you have renamed your fork as seqio-quality. Ok, but I
>> > think it is better to keep the fork's name as 'biopython', and then
>> > create many branches inside it.
>>
>> I don't think I had entirely understood github's use of fork versus branch.
>> I'll have so do some more reading and try again once my account has
>> settled down. Thanks for the details in your email.
>
> Wow, now I am mad confused. I thought forks and branches were
> conceptually the same.

Consider that the term "fork" is specific to github, and has nothing
to do with git. There is no 'git fork' command.

When you do a 'fork' in github, what it does is create a personal
'space' on your account on github, to host all your personalizations,
including new commits and also new branches of development.
It is a kind of 'working space' that holds all the work you have done.
I understand it seems a bit complicated at first :-( but I think that,
without using github, it is even more difficult to understand these things.

In your account you can have more than one experimental branch.
For example, I can create a branch called 'experimental-xyz-parser',
another called 'personal modifications', and keep the master branch as
it is (or rename it).

If you want to contribute to my 'xyz parser', you can fetch this
branch into your space, with a command like:

$: git remote add giovanni
$: git fetch giovanni experimental-xyz-parser:experimental-xyz-parser # (not sure about this last command)

This should create a branch called 'experimental-xyz-parser' on your
computer, so you can work with it, make modifications, and later push
it to github (where it will appear in the network graph).

> Giovanni, it seems like you are suggesting one
> branch (the GitHub fork) and then a second branch (the git branch
> command). We were thinking of a standard case as:
>
> 1.
Fork the Biopython trunk at GitHub. Name this something so it > makes sense what the fork/branch is for. > 2. Work on the fork/branch. If you want, invite others to work on it > with you. > 3. When finished, be sure you are up to date with the master > Biopython trunk. > 4. Submit the fork/branch for inclusion in Biopython. > 5. Once included, delete the fork/branch. > > Which parts of this fall out of "standard" git practice? In general, > we should strive to keep this as simple as possible. If using Git is > complicated then we are losing a lot of our advantage over CVS/patches. > > Giovanni, the example commands were very helpful; I added details to the Git > page on how to see diffs of branches: > > http://biopython.org/wiki/GitMigration#Evaluating_changes > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Fri Mar 20 10:50:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 10:50:49 -0400 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200903201450.n2KEonrB005712@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 10:50 EST ------- Code is in CVS with unit tests. Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Mar 20 10:53:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 10:53:37 -0400 Subject: [Biopython-dev] [Bug 2770] suggestion: raise a warning if Entrez.email is not set In-Reply-To: Message-ID: <200903201453.n2KErbfO006014@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2770 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 10:53 EST ------- Resolved as won't fix (unless the NCBI change their guidelines). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 20 11:49:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 11:49:52 -0400 Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) In-Reply-To: Message-ID: <200903201549.n2KFnqs8011031@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2718 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 11:49 EST ------- (In reply to comment #0) > (1) All the Bio.Graphics "write to file/handle" functions to accept any of the > supported file formats (like Bio.Graphics.GenomeDiagram), which would require > renderPM at run time for the bitmap formats (see Bug 2710). They should share > some code for mapping format names to ReportLab rendering module. 
This would
> be easy to do without changing the existing mix of method names.

That should be working in CVS now.

> (2) Update the docstrings for the "write to file/handle" functions to make it
> clear they can accept a filename OR a handle (a result of the underlying
> reportlab renderer's drawToFile function's behaviour - see note below).

This was done in CVS some time ago (comment 2)

> (3) Standardise on the method naming (and perhaps deprecate the old methods).
> Using "write" seems to be a sensible choice based on the current names used in
> Bio.Graphics.

This one is more difficult. GenomeDiagram uses a two step system - draw then
write, where draw creates the ReportLab drawing object, and write saves it to
a file. I'm going to leave this for another day...

Marking bug as fixed.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Mar 20 13:32:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:32:50 -0400
Subject: [Biopython-dev] [Bug 2795] New: Add commit, rollback, close to DBServer object
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2795

           Summary: Add commit, rollback, close to DBServer object
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

The DBServer object is defined in file BioSQL/BioSeqDatabase.py and it might
make sense to add the following methods to it:

    def commit(self):
        """Commits the current transaction to the database."""
        return self.adaptor.commit()

    def rollback(self):
        """Rolls back the current transaction."""
        return self.adaptor.rollback()

    def close(self):
        """Close the connection. No further activity possible."""
        return self.adaptor.close()

I think the adaptor is intended to hide internal implementation details, so we
shouldn't be forcing people to use it directly for transaction support.

Consider this example from http://www.biopython.org/wiki/BioSQL currently:

    from Bio import Entrez
    from Bio import SeqIO
    from BioSQL import BioSeqDatabase
    server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                                          passwd="", host="localhost",
                                          db="bioseqdb")
    db = server["orchids"]
    handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
                           rettype="genbank")
    db.load(SeqIO.parse(handle, "genbank"))
    server.adaptor.commit()

The last line would become just:

    server.commit()

This seems cleaner. Patch to follow...

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Mar 20 13:34:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:34:14 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback, close to DBServer object
In-Reply-To:
Message-ID: <200903201734.n2KHYEZR018864@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 13:34 EST -------
Created an attachment (id=1263)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view)
BioSQL patch

Patch to implement the change described. Tested with MySQL only.

Cymon - what do you think of this? And does it work on PostgreSQL?

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

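The delegation described in Bug 2795 can be tried out without a database using a stand-in adaptor. The sketch below is not the attached patch: `RecordingAdaptor` and `load_in_transaction` are hypothetical helpers invented for illustration, and the `DBServer` class here is a stripped-down model that only carries the three proposed methods.

```python
class RecordingAdaptor:
    """Hypothetical stand-in for the BioSQL adaptor; it only logs calls."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


class DBServer:
    """Minimal model of the proposed API: the server forwards to its adaptor."""

    def __init__(self, adaptor):
        self.adaptor = adaptor

    def commit(self):
        """Commits the current transaction to the database."""
        return self.adaptor.commit()

    def rollback(self):
        """Rolls back the current transaction."""
        return self.adaptor.rollback()

    def close(self):
        """Close the connection. No further activity possible."""
        return self.adaptor.close()


def load_in_transaction(server, load):
    """Run load(); commit on success, roll back and re-raise on failure."""
    try:
        load()
    except Exception:
        server.rollback()
        raise
    server.commit()
```

With methods on the server object, calling code such as `load_in_transaction` never needs to reach into `server.adaptor`, which is the encapsulation argument made in the bug report.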
From bugzilla-daemon at portal.open-bio.org Fri Mar 20 13:59:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 13:59:14 -0400 Subject: [Biopython-dev] [Bug 2795] Add commit, rollback, close to DBServer object In-Reply-To: Message-ID: <200903201759.n2KHxENC020654@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2795 ------- Comment #2 from cymon.cox at gmail.com 2009-03-20 13:59 EST ------- (In reply to comment #1) > Created an attachment (id=1263) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view) [details] > BioSQL patch > > Patch to implement the change described. Tested with MySQL only. > > Cymon - what do you think of this? And does it work on PostgreSQL? I think it makes sense, and works on PostgreSQL with the psycopg2 driver. C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 20 14:07:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 14:07:55 -0400 Subject: [Biopython-dev] [Bug 2795] Add commit, rollback, close to DBServer object In-Reply-To: Message-ID: <200903201807.n2KI7t37021424@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2795 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 14:07 EST ------- (In reply to comment #2) > I think it makes sense, and works on PostgreSQL with the psycopg2 driver. > C. Great, checked in, marking as fixed. We should update the wiki once Biopython 1.50 is out... 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 20 14:52:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 14:52:44 -0400 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200903201852.n2KIqiBO024589@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #10 from eric.talevich at gmail.com 2009-03-20 14:52 EST ------- Here's the github branch where I'm working on this bug: http://github.com/etal/biopython/tree/master I've applied the two patches attached here and converted the test script from print-and-compare to unittest. The tests pass now, but I haven't added checks for specific parsing errors, just the general PDBConstructionError raised when parsing the example file with PERMISSIVE=0. The warnings are hidden during tests, as expected, but in this branch the PDBParser warnings are noticeably more annoying during normal use. Fixing this will require more tweaking in Bio/PDB/PDBParser.py -- I'll do that in the same branch, since I don't think you'd want to merge one fix without the other. Same goes for the __debug__ protection in StructureBuilder.py. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
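One way to get the behaviour discussed in Bug 2754 (warnings on stderr in normal use, hidden but still checkable in unit tests, and an exception in strict mode) is Python's standard `warnings` module. The following is a rough sketch, not the code in the branch mentioned above: the two classes are local stand-ins defined here, and `report_problem` and `collect_warnings` are hypothetical helper names.

```python
import warnings


class PDBConstructionError(Exception):
    """Local stand-in for the error raised in strict (PERMISSIVE=0) mode."""


class PDBConstructionWarning(UserWarning):
    """Local stand-in warning category for recoverable parse problems."""


def report_problem(message, permissive):
    """Warn in permissive mode, raise in strict mode."""
    if permissive:
        # warnings.warn writes to stderr by default, not stdout.
        warnings.warn(message, PDBConstructionWarning)
    else:
        raise PDBConstructionError(message)


def collect_warnings(func, *args):
    """Call func(*args) and return the warning messages it issued.

    Nothing is printed while recording, so a test run stays quiet
    but can still assert on what was reported.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args)
    return [str(w.message) for w in caught]
```

A unit test can then compare the list returned by `collect_warnings` against the expected messages instead of comparing printed output.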
From bugzilla-daemon at portal.open-bio.org Fri Mar 20 16:08:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 16:08:37 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout
In-Reply-To:
Message-ID: <200903202008.n2KK8bpj029413@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2754

------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 16:08 EST -------
(In reply to comment #10)
> Here's the github branch where I'm working on this bug:
> http://github.com/etal/biopython/tree/master

I've had a quick look on github, and this looks interesting and I hope we can
get it into Biopython proper before too long.

Peter

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Fri Mar 20 16:44:34 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Mar 2009 20:44:34 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu>
References: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
 <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
 <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
 <20090317213414.GK57054@sobchak.mgh.harvard.edu>
 <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
 <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
 <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
 <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
 <20090320125518.GA351@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com>

On Fri, Mar 20, 2009 at 12:55 PM, Brad Chapman wrote:
>
> Peter, the repository deletion was very quick for me, so it looks like it
> got stuck somewhere with the
GitHub downtime. They've fixed it - I picked a bad day to delete a "fork". Giovanni wrote: >> > I see that you have renamed your fork as seqio-quality. Ok, but I >> > think it is better to keep the fork's name as 'biopython', and then >> > create many branches inside it. Agreed - when I did that, I hadn't appreciated github's distinction between branches and forks. Peter wrote: >> I don't think I had entirely understood github's use of fork versus branch. >> I'll have so do some more reading and try again once my account has >> settled down. Thanks for the details in your email. Brad wrote: > Wow, now I am mad confused. I thought forks and branches were > conceptually the same. Giovanni, it seems like you are suggesting one > branch (the GitHub fork) and then a second branch (the git branch > command). We were thinking of a standard case as: > > 1. Fork the Biopython trunk at GitHub. Name this something so it > makes sense what the fork/branch is for. > 2. Work on the fork/branch. If you want, invite others to work on it > with you. > 3. When finished, be sure you are up to date with the master > Biopython trunk. > 4. Submit the fork/branch for inclusion in Biopython. > 5. Once included, delete the fork/branch. If I understand correctly, a potential contributor does this: 1. Fork Biopython trunk at GitHub, which will give you your own public repository (aka a "fork" in github's terminology), called by default contributorname/biopython, containing initially a single master branch, e.g. http://github.com/peterjc/biopython/tree/master 2. Using the git command line tool, create a branch within your repository to work on a problem, say bug2551, and upload this branch to your github account. e.g. http://github.com/peterjc/biopython/tree/bug2551 (I presume) 3. Work on your code, and commit changes to your bug2551 branch and push these up to your github account. 4. 
Once you are happy, submit this bug2551 branch for inclusion in Biopython (in the short term via Bugzilla, but if/when we have moved to github fully, as a pull request to the main biopython master, or if appropriate the master of the maintainer of that module). 5. Once the changes are in the main Biopython, you can delete the bug2551 branch (but not the whole "fork" which may contain other branches). Almost the same... I'll try this shortly (maybe Monday). Peter From bugzilla-daemon at portal.open-bio.org Sat Mar 21 00:13:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Mar 2009 00:13:10 -0400 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200903210413.n2L4DAgf028509@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2009-03-21 00:13 EST ------- (In reply to comment #11) I've changed Parser.py to show an informative error message about the missing DTD file, where it can most likely be found, and where to install it. Since this is probably the best we can do, I'm marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Mar 21 00:24:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Mar 2009 00:24:43 -0400 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200903210424.n2L4OhOA029253@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2009-03-21 00:24 EST ------- (In reply to comment #0) > >>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml') > >>> cont = handle.read() > >>> print cont > ' > > ... > > With Bio.Entrez currently in CVS, Entrez.read does not raise an exception, but simply returns an empty record. The problem is that EFetch from the SNP database uses an XML Schema instead of a DTD to describe the contents of the XML file, as shown in the first few lines of the XML file: The last url shows the XML Schema. All other Entrez Utilities I've seen so far use a DTD instead of an XML Schema. Hence, Entrez.read only has a DTD parser to find out how to interpret the XML file. In principle, Bio.Entrez can be modified to add an XML Schema parser. While this is not trivial, it is probably not super difficult. Marco, would you be willing to write such a parser? If you have a parser for the XML Schema, I can show you how to integrate it with Bio.Entrez. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
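The DTD-versus-XML-Schema distinction described in the Bug 2771 comment above can be checked before any parsing is attempted. What follows is only a minimal sketch and not part of Bio.Entrez: the function name describe_grammar is hypothetical, and it assumes the schema is announced on the root element through the conventional "xsi" prefix (xsi:schemaLocation or xsi:noNamespaceSchemaLocation), as in the dbSNP output quoted in this thread. It uses the standard library expat bindings.

```python
import xml.parsers.expat


def describe_grammar(xml_text):
    """Return 'dtd', 'schema' or 'unknown' for an XML document string.

    Hypothetical helper: 'dtd' means a DOCTYPE declaration was seen,
    'schema' means the root element carries an xsi:*schemaLocation
    attribute (assuming the usual 'xsi' prefix).
    """
    found = {"kind": "unknown", "seen_root": False}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # A DOCTYPE declaration points at a DTD.
        found["kind"] = "dtd"

    def on_start(name, attrs):
        # Only the first (root) element is inspected.
        if found["seen_root"]:
            return
        found["seen_root"] = True
        if found["kind"] == "unknown" and (
            "xsi:schemaLocation" in attrs
            or "xsi:noNamespaceSchemaLocation" in attrs
        ):
            found["kind"] = "schema"

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found["kind"]
```

A caller could use such a check to report "this file uses an XML Schema, which is not supported yet" instead of silently returning an empty record.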
From mjldehoon at yahoo.com Sat Mar 21 00:47:07 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Mar 2009 21:47:07 -0700 (PDT) Subject: [Biopython-dev] Bio.Entrez catching more errors In-Reply-To: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com> Message-ID: <334920.51680.qm@web62402.mail.re1.yahoo.com> I think it is good if we catch more errors in Bio.Entrez, but I think the error catching should be done by the parser, not when retrieving. As you show, NCBI Entrez returns error messages in various different formats: plain text, HTML, incorrect XML, broken XML. Since there are many ways to access NCBI Entrez, there may be other styles of error messages that we don't know about. Then there is the added complication of accessing NCBI Entrez to get information in formats other than XML, e.g. GenBank files. And all this may be changed over time by NCBI. Since the error message is ill-defined, code trying to identify error messages won't be robust. On the other hand, the format of files expected by a given parser is well-defined: Either the file agrees with the format expected by the parser, or it doesn't; if it doesn't, then that's an error. We may not be able to extract the exact error message returned by NCBI, but a parser for format XYZ can tell you that the file is not in format XYZ. Maybe the XML parser can say it doesn't look like an XML file, but that's about it. Once NCBI Entrez starts to return errors in a uniform format, we can modify our parsers to find out the exact error message. Until that happens, trying to do so on our side will not be robust. --Michiel --- On Tue, 3/10/09, Peter wrote: > From: Peter > Subject: [Biopython-dev] Bio.Entrez catching more errors > To: "BioPython-Dev Mailing List" > Date: Tuesday, March 10, 2009, 7:40 PM > Hi All, > > It occured to me that the Bio.Entrez._open function can > look at the > retmode argument (if present) and spot if there is a > mismatch between > the requested format (e.g. 
XML, HTML, text or asn.1) and > the actual > data the NCBI returned. Something along the following > lines could be > added to the end of the _open function in > Bio/Entrez/__init__.py to > acheive this: > > elif "retmode" in params and > params["retmode"].lower()=="html" \ > and not data.lower().startswith(" \ > and not data.lower().startswith(" html") : > raise TypeError("Requested HTML, but > didn't get it: %s..." % data) > elif "retmode" in params and > params["retmode"].lower()=="xml" \ > and not data.lower().startswith(" raise TypeError("Requested XML, but didn't > get it: %s..." % data) > elif "retmode" in params and > params["retmode"] \ > and > params["retmode"].lower()!="xml" \ > and data.lower().startswith(" raise TypeError("Didn't request XML, but > got it: %s..." % data) > elif "retmode" in params and > params["retmode"] \ > and > params["retmode"].lower()!="html" \ > and (data.lower().startswith(" \ > data.lower().startswith(" html")): > #Expected for some error pages (e.g. the Bad > Gateway caught above) > raise TypeError("Didn't request HTML, but > got it: %s..." % data) > > I'm sure my XML/HTML detection could be made more > robust here - I hope > the principle is clear. My motivation is that I have > noticed the NCBI > can return HTML error pages, and while we do catch some of > these > explicitly (e.g. Bad Gateway, or Service Unavailable), I > think any > HTML page when the user asked from XML, text or asn.1 > should be > treated as error. Similarly, not getting XML when you ask > for it etc. > > Note that by raising the exception including the message > text it > should be much easier to diagnose these failures. As a > tiny > refinement to the above code, we should only add the > "..." if there is > more text to follow - this isn't always the case. > > e.g. 
The following give an HTML error page (while some > databases like > "protein" are better behaved in this respect): > >>> print Entrez.efetch(db="homologene", > id="nonexistant", retmode="text").read() > >>> print Entrez.efetch(db="homologene", > id="nonexistant", > retmode="asn.1").read() > > Similarly, these give an XML like fragment (which is not a > valid XML > file in itself - arguably an NCBI bug; some databases like > "protein" > are better behaved in this respect): > >>> print Entrez.efetch(db="pubmed", > id="nonexistant", retmode="xml").read() > >>> print Entrez.efetch(db="homologene", > id="nonexistant", retmode="xml").read() > >>> print Entrez.efetch(db="cdd", > id="nonexistant", retmode="xml").read() > >>> print Entrez.efetch(db="taxonomy", > id="nonexistant", retmode="xml").read() > > My suggested change to Bio.Entrez would also catch the > following > examples (using an invalid database) where the NCBI ignore > the retmode > and return an HTML help page: > >>> print > Entrez.efetch(db="nonexistant", > id="123456", retmode="xml").read() > >>> print > Entrez.efetch(db="nonexistant", > id="123456", retmode="text").read() > > In a less clear cut example, this would flag the following > as an error > as the NCBI seem to return ASN.1 text instead of HTML > here:: > >>> print Entrez.efetch(db="nucleotide", > retmode="html", id="123456").read() > > Overall, I think this change should catch lots of errors > which > otherwise may not be detected until later (e.g. while > trying to parse > the file). > > -------------------------------------------------------------------------------------------------- > > On another point, should we catch these responses as > errors:? > > >>> efetch(db="snp", > id="123456").read() > 'PmFetch > response\n
\n1:
> id: 123456 Error occurred: cannot get document
> summary\n
' > >>> efetch(db="snp", > id="123456", retmode="html").read() > 'PmFetch > response\n
\n1:
> id: 123456 Error occurred: cannot get document
> summary\n
' > >>> efetch(db="snp", > id="123456", retmode="xml").read() > ' version="1.0"?>\n xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1: > id: 123456 Error occurred: cannot get document > summary\n\n' > >>> efetch(db="snp", > id="123456", retmode="text").read() > '1: id: 123456 Error occurred: cannot get document > summary\n' > > and, > >>> print efetch(db="homologene", > retmode="html", id="fake").read() > > >

Error occurred: Empty id list - > nothing todo

... > > Looking for the string "Error occurred: " looks > fairly safe here, and > should cover a range of entries. Of course, you can > imagine false > positives too, e.g. a valid PUBMED plain text record for a > tutorial > article with a title like "Yikes! An Error Occurred: A > beginner's > Guide To Defensive Programming." could match. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mjldehoon at yahoo.com Sat Mar 21 00:54:08 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Mar 2009 21:54:08 -0700 (PDT) Subject: [Biopython-dev] Bio.Enzyme (was: Re: Bio.ExPASy) In-Reply-To: <76595.11423.qm@web62404.mail.re1.yahoo.com> Message-ID: <517737.76119.qm@web62403.mail.re1.yahoo.com> I've created a simplified version of the parser in Bio.Enzyme in Bio.ExPASy.Enzyme. The idea behind it is to collect all parsers related to ExPASy databases in Bio.ExPASy so that they can be found more easily by users. Bio.ExPASy.Enzyme works essentially the same as Bio.Enzyme, but I've done a few things a bit differently. The biggest change is probably that Bio.Enzyme stores information as attributes to a record, whereas Bio.ExPASy.Enzyme has a Record derived from a dictionary, and stores information in the dictionary (same as Bio.Medline). Does anybody have any objection if Bio.ExPASy.Enzyme becomes the "official" parser for ExPASy's Enzyme database? If not, I'll modify the documentation and tests accordingly, and start the deprecation process for Bio.Enzyme. --Michiel --- On Sun, 3/15/09, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: [Biopython-dev] Bio.ExPASy > To: biopython-dev at biopython.org > Date: Sunday, March 15, 2009, 6:24 AM > Hi everybody, > > As discussed previously, I have moved the Bio.Prosite code > to Bio.ExPASy, and I've added a ScanProsite module to > Bio.ExPASy. 
I guess Bio.Enzyme should also move to > Bio.ExPASy. See > > http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html > > for the documentation of Biopython as currently in CVS. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Sat Mar 21 01:05:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Mar 2009 01:05:19 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200903210505.n2L55Jb0031713@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #8 from eric.talevich at gmail.com 2009-03-21 01:05 EST ------- Marco & Peter, have either of you applied these patches to a git branch yet? My branch for Bug 2754 and related changes also converts test_PDB.py to unittest. (I silence the warnings by calling warnings.simplefilter('ignore') in the setUp method.) I'd like to try cherry-picking this commit if it's available on github. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat Mar 21 01:33:42 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Mar 2009 22:33:42 -0700 (PDT) Subject: [Biopython-dev] biopython on github In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu> Message-ID: <587027.97686.qm@web62408.mail.re1.yahoo.com> > Which parts of this fall out of "standard" git > practice? In general, > we should strive to keep this as simple as possible. If > using Git is > complicated then we are losing a lot of our advantage over > CVS/patches. 
I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though. --Michiel. From idoerg at gmail.com Sat Mar 21 01:55:36 2009 From: idoerg at gmail.com (Iddo Friedberg) Date: Fri, 20 Mar 2009 22:55:36 -0700 Subject: [Biopython-dev] It's out! Message-ID: <49C48158.9060004@gmail.com> I'm first to announce this.... hehehe http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp163v1 -- Iddo Friedberg Ph.D. Atkinson Hall MC 0446 University of California San Diego 9500 Gilman Dr. La Jolla, CA 92093-0446 USA http://iddo-friedberg.net From dalloliogm at gmail.com Sat Mar 21 09:57:54 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 21 Mar 2009 14:57:54 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com> References: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> <20090320125518.GA351@sobchak.mgh.harvard.edu> <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com> Message-ID: <5aa3b3570903210657v46b1b1bbj80c013b83ff635e3@mail.gmail.com> On Fri, Mar 20, 2009 at 9:44 PM, Peter wrote: > If I understand correctly, a potential contributor does this: > 1. 
Fork Biopython trunk at GitHub, which will give you your own > public repository (aka a "fork" in github's terminology), called > by default contributorname/biopython, containing initially a > single master branch, e.g. > http://github.com/peterjc/biopython/tree/master > 2. Using the git command line tool, create a branch within your > repository to work on a problem, say bug2551, and upload this > branch to your github account. e.g. > http://github.com/peterjc/biopython/tree/bug2551 (I presume) > 3. Work on your code, and commit changes to your bug2551 branch > and push these up to your github account. > 4. Once you are happy, submit this bug2551 branch for inclusion in > Biopython (in the short term via Bugzilla, but if/when we have moved > to github fully, as a pull request to the main biopython master, > or if appropriate the master of the maintainer of that module). > 5. Once the changes are in the main Biopython, you can delete > the bug2551 branch (but not the whole "fork" which may contain > other branches). Yes, I think this is the procedure. It is a good idea to create a branch named after the bug, so more people can work at the same time on the same fix. -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Sat Mar 21 10:32:41 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Mar 2009 10:32:41 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200903211432.n2LEWfXP000985@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #9 from dalloliogm at gmail.com 2009-03-21 10:32 EST ------- (In reply to comment #8) > Marco & Peter, have either of you applied these patches to a git branch yet? My > branch for Bug 2754 and related changes also converts test_PDB.py to unittest. > (I silence the warnings by calling warnings.simplefilter('ignore') in the setUp > method.) 
I'd like to try cherry-picking this commit if it's available on > github. ok... Is your branch this one: - http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 ? This was my proposal: - http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py I have structured the unittest in a different way, so every test case represents a pdb file with some known values for PDB exposure etc..: but the result should be the same. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Sat Mar 21 10:40:05 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 21 Mar 2009 15:40:05 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com> References: <20090320125518.GA351@sobchak.mgh.harvard.edu> <587027.97686.qm@web62408.mail.re1.yahoo.com> Message-ID: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com> On Sat, Mar 21, 2009 at 6:33 AM, Michiel de Hoon wrote: > >> Which parts of this fall out of "standard" git >> practice? In general, >> we should strive to keep this as simple as possible. If >> using Git is >> complicated then we are losing a lot of our advantage over >> CVS/patches. > > I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. ok, but I assure you if you don't want to learn the advanced features it can be used as you did with cvs. The only difference, maybe, is that you work with a local copy (offline) and push the changes only when you are sure about them. 
If you keep a mirror on github to collect patches and enhancements, it has some advantages: - more than one person can work on a patch at the same time - it is a lot easier to create customized branches of biopython. So if someone needs to create a custom version of biopython for their own purposes, it will always be easy to keep it compatible with the official code. - people can play with the code and propose enhancements, without having to ask for write access. This means that more people can become familiar with biopython's code and propose fixes. Have a look at this video, which shows that the Ruby On Rails project grew more quickly after it moved to github: - http://python.genedrift.org/2009/03/15/ror-commits/ (the jump should be at minute 5.10 or so) > I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though. Let's say I want to propose a patch to biopython. One of you developers will probably need to look at it and propose some changes to adapt it to the rest of biopython. Isn't this the situation you are describing (multiple developers working on interrelated parts of the code)? Another example is the popgen module. Since it is a pretty big module, and independent from the rest, an 'experimental popgen branch' of biopython was created, based on what was the latest biopython cvs at the time. However, in the time that has passed since this branch was created, biopython's cvs has changed: so maybe now the experimental popgen branch is no longer compatible with the official code, if some module or convention has been changed. So, git and github make it easier to create a new branch of development and keep it compatible with the original one. > --Michiel. 
> > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From eric.talevich at gmail.com Sat Mar 21 11:23:56 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 21 Mar 2009 11:23:56 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <200903211432.n2LEWfXP000985@portal.open-bio.org> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> Message-ID: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> On Sat, Mar 21, 2009 at 10:32 AM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2759 > > > ------- Comment #9 from dalloliogm at gmail.com 2009-03-21 10:32 EST ------- > (In reply to comment #8) > > Marco & Peter, have either of you applied these patches to a git branch > yet? My > > branch for Bug 2754 and related changes also converts test_PDB.py to > unittest. > > (I silence the warnings by calling warnings.simplefilter('ignore') in the > setUp > > method.) I'd like to try cherry-picking this commit if it's available on > > github. > > ok... Is your branch this one: > - > > http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 > ? > > > This was my proposal: > - > > http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py > > > I have structured the unittest in a different way, so every test case > represents a pdb file with some known values for PDB exposure etc..: but > the > result should be the same. > > Oh, I see now that these are meant to be separate files. Yes, that's my branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the NeighborSearch test moved elsewhere. In that case, there's no merging problem here, and the only change needed in test_PDBexposure.py is to silence the warnings... right? 
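The warning-silencing approach discussed above (calling warnings.simplefilter('ignore') in setUp) might look roughly like the sketch below. This is an illustration under stated assumptions, not the code from either branch: ExposureTest, noisy_parse and the warning text are hypothetical stand-ins for the real PDB parsing step, and warnings.catch_warnings requires Python 2.6 or later, so it would not run on the Python 2.3 targeted by Biopython 1.50.

```python
import unittest
import warnings


def noisy_parse():
    """Hypothetical stand-in for a parser that warns about its input."""
    warnings.warn("chain A is discontinuous", UserWarning)
    return "structure"


class ExposureTest(unittest.TestCase):
    def setUp(self):
        # Save the current warning filters and start recording warnings,
        # then install an 'ignore' filter so nothing is shown or recorded.
        self._catcher = warnings.catch_warnings(record=True)
        self.caught = self._catcher.__enter__()
        warnings.simplefilter("ignore")
        # Reparse in every test so each test starts from a clean object.
        self.structure = noisy_parse()

    def tearDown(self):
        # Restore the warning filters that were active before the test.
        self._catcher.__exit__(None, None, None)

    def test_structure_loaded(self):
        self.assertEqual(self.structure, "structure")
        # The 'ignore' filter suppressed the parser warning entirely.
        self.assertEqual(self.caught, [])
```

Restoring the filters in tearDown keeps the 'ignore' setting from leaking into other test modules that run in the same process.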
From dalloliogm at gmail.com Sat Mar 21 12:14:45 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 21 Mar 2009 17:14:45 +0100 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> Message-ID: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich wrote: > On Sat, Mar 21, 2009 at 10:32 AM, wrote: > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 >> >> >> ------- Comment #9 from dalloliogm at gmail.com 2009-03-21 10:32 EST ------- >> (In reply to comment #8) >> > Marco & Peter, have either of you applied these patches to a git branch >> yet? My >> > branch for Bug 2754 and related changes also converts test_PDB.py to >> unittest. >> > (I silence the warnings by calling warnings.simplefilter('ignore') in the >> setUp >> > method.) I'd like to try cherry-picking this commit if it's available on >> > github. >> >> ok... Is your branch this one: >> - >> >> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 >> ? >> >> >> This was my proposal: >> - >> >> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py >> >> >> I have structured the unittest in a different way, so every test case >> represents a pdb file with some known values for PDB exposure etc..: but >> the >> result should be the same. >> >> > > Oh, I see now that these are meant to be separate files. Yes, that's my > branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the > NeighborSearch test moved elsewhere. In that case, there's no merging > problem here, and the only change needed in test_PDBexposure.py is to > silence the warnings... right? Well, it also depends on what Peter thinks. 
Mine was only a proof of concept to see if the unittest could be refactored in that way. In principle, it should be equivalent to the original one and execute the same tests. If you want to use it, the problem is that it makes use of a decorator function (@classmethod) which is not supported by earlier versions of python. This can be resolved by moving all the instructions in setUpAll into setUp, like here: - http://github.com/dalloliogm/biopython/commit/83864b8a1269aaf52ac193d7bf9ed9ca5edc5a30 (however, this way the setUp instructions - like opening and parsing the PDB file - will be repeated for every test). > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From eric.talevich at gmail.com Sat Mar 21 13:13:52 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 21 Mar 2009 13:13:52 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> Message-ID: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com> On Sat, Mar 21, 2009 at 12:14 PM, Giovanni Marco Dall'Olio < dalloliogm at gmail.com> wrote: > On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich > wrote: > > On Sat, Mar 21, 2009 at 10:32 AM, >wrote: > > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 > >> > >> > >> ok... Is your branch this one: > >> - > >> > http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 > >> ? 
> >> > >> > >> This was my proposal: > >> - > >> > http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py > >> > > > If you want to use it, the problem is that it make use of a decorator > function (@classmethod) which is not supported by earlier versions of > python. > > Decorators and @classmethod were added in Python 2.4. Since support for Python 2.3 is being dropped after the release of BioPython 1.50 (I believe), it should be safe to apply the decorator to post-1.50 branches. If this needs to be in 1.50, the older way of "mymethod = classmethod(mymethod)" would work fine in Py2.3, although I personally would just move the PDB loading steps to setUp, since the parser is pretty quick and the code for that is easy to read. I'll finish up my work on Bug 2754 and merge/rebase it before trying to integrate this code -- that should bring the parse warnings under control and make it easier for Peter to dispatch this bug. From biopython at maubp.freeserve.co.uk Sat Mar 21 17:16:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 21 Mar 2009 21:16:43 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com> References: <20090320125518.GA351@sobchak.mgh.harvard.edu> <587027.97686.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00903211416r457e303bnc0515b576bbe6c9a@mail.gmail.com> On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon wrote: > I haven't been following this topic closely, and as an > "outsider" using git seems more complicated than using > cvs or svn. And to be honest, I don't know if Biopython > actually needs the branching and forking stuff. I think > that this is more useful for bigger projects, where > multiple developers may be working on interrelated > parts of code at the same time. That hardly ever > happens in Biopython, though. Certainly git and github are much more powerful, and therefore more complicated. There is no denying that. 
However, if we move to git on github, I would expect those of us with CVS access to all be given write access to the official Biopython branch (probably using the collaborators feature). If that is done, I think you won't find things so different from now. i.e. Initially at least, it would be business as usual - our core official developers would be trusted to work directly on the main branch as now (with discussions before commits as appropriate), and do not have to worry about forking/branching etc (unless they want to). In terms of the actual command(s) you'd have to type in at the terminal to commit a change to the online repository, this goes from one step: cvs commit -m "Comment here" file1.py file2.py ... to two steps. First you have to commit changes locally (to git on your personal machine) and then push them to the main Biopython branch on the public server (on github). Once I'm back at work where I have git installed, I'll write this up on the wiki - assuming Brad doesn't beat me to it ;) The big change is for non-core developers, i.e. potential contributors (like Eric who is currently trying some Bio.PDB changes). For them, using git allows them to work on their changes and keep in sync with the master repository with much more ease. Peter From chris.lasher at gmail.com Sat Mar 21 22:33:11 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Sat, 21 Mar 2009 22:33:11 -0400 Subject: [Biopython-dev] Help pages in Biopython wiki In-Reply-To: References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> Message-ID: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com> On Fri, Mar 20, 2009 at 4:42 AM, Leighton Pritchard wrote: > Hi Chris, > > That page doesn't exist, yet (click on the 'page' tab to see this), and no > pages link to it (see here: > http://biopython.org/wiki/Special:WhatLinksHere/Help) > > What help were you expecting to see there? 
Hi Leighton, I'm fairly certain there are pages one can install with a MediaWiki instance that provide the standard help. They look like this: http://www.mediawiki.org/wiki/Help:Contents They contain the standard documentation about how to edit, format, create new pages, etc. Useful things for new community members and people like me who forget the nuances of each wiki software's markup language from time to time. :-) Chris From biopython at maubp.freeserve.co.uk Sun Mar 22 06:18:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 10:18:49 +0000 Subject: [Biopython-dev] Help pages in Biopython wiki In-Reply-To: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com> References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com> Message-ID: <320fb6e00903220318g7e214c8bmf1e6012e5db505fd@mail.gmail.com> On Sun, Mar 22, 2009 at 2:33 AM, Chris Lasher wrote: > Hi Leighton, > > I'm fairly certain there are pages one can install with a MediaWiki instance > that provide the standard help. They look like this: > http://www.mediawiki.org/wiki/Help:Contents > > They contain the standard documentation about how to edit, format, create > new pages, etc. Useful things for new community members and people like me > who forget the nuances of each wiki software's markup language from time to > time. :-) > > Chris I'm glad Leighton asked - otherwise I would have. Would it suffice to create a manual help page, saying this is a wiki and we are happy for people to create their own account to fix any minor errors they spot, and just link to http://www.mediawiki.org/wiki/Help:Contents for help? 
Peter From biopython at maubp.freeserve.co.uk Sun Mar 22 06:51:17 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 10:51:17 +0000 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com> Message-ID: <320fb6e00903220351u53563f03m4c54359278c5b7f0@mail.gmail.com> On Sat, Mar 21, 2009 at 5:13 PM, Eric Talevich wrote: > Giovanni wrote: >> If you want to use it, the problem is that it make use of a decorator >> function (@classmethod) which is not supported by earlier versions of >> python. > > Decorators and @classmethod were added in Python 2.4. Since support for > Python 2.3 is being dropped after the release of BioPython 1.50 (I believe), > it should be safe to apply the decorator to post-1.50 branches. If this > needs to be in 1.50, the older way of "mymethod = classmethod(mymethod)" > would work fine in Py2.3, although I personally would just move the PDB > loading steps to setUp, since the parser is pretty quick and the code for > that is easy to read. Extra PDB unit tests would be nice to have in Biopython 1.50, which means they must work on Python 2.3, so no decorators please. I agree with Eric that it is simpler just to use setUp for PDB file parsing. Yes, it is slower as for each test method the PDB file is reloaded - but you also make sure it is a clean object structure, which is important as some operations we will be testing may change the object. e.g. 
HSExposure: http://bugzilla.open-bio.org/show_bug.cgi?id=2759#c4 Peter From biopython at maubp.freeserve.co.uk Sun Mar 22 06:44:42 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 10:44:42 +0000 Subject: [Biopython-dev] Bio.Entrez catching more errors In-Reply-To: <334920.51680.qm@web62402.mail.re1.yahoo.com> References: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com> <334920.51680.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com> On Sat, Mar 21, 2009 at 4:47 AM, Michiel de Hoon wrote: > > I think it is good if we catch more errors in Bio.Entrez, but I think > the error catching should be done by the parser, not when > retrieving. We could do that - maybe some common functions for checking the first line to see if it looks like HTML or XML would help. It means lots of changes to lots of parsers, but would help outside the use case of Bio.Entrez - so this is perhaps worth doing anyway. What about the fairly common situation (at least, it's something I've done fairly often) where Bio.Entrez.efetch() is used to fetch records which are saved directly to file without verification - e.g. to be parsed by another program? Unless the error is caught in Bio.Entrez.efetch() it may be out of our control. > As you show, NCBI Entrez returns error messages in various > different formats: plain text, HTML, incorrect XML, broken XML. > Since there are many ways to access NCBI Entrez, there may > be other styles of error messages that we don't know about. > Then there is the added complication of accessing NCBI Entrez > to get information in formats other than XML, e.g. GenBank files. > And all this may be changed over time by NCBI. > > Since the error message is ill-defined, code trying to identify > error messages won't be robust. All very true. But the main point in my original email was on something slightly different... 
> On the other hand, the format of files expected by a given > parser is well-defined: Either the file agrees with the format > expected by the parser, or it doesn't; if it doesn't, then that's > an error. It's not that simple - we are often dealing with loosely defined file formats, and you may be able to reasonably interpret one file in several different formats (giving different/incorrect data). Some parsers are very tolerant at the moment, for example GenBank files can have a legitimate free format comment before the records, so the parser skips anything until it recognizes a GenBank locus id line. > We may not be able to extract the exact error message > returned by NCBI, but a parser for format XYZ can tell > you that the file is not in format XYZ. Some parsers may be able to do this, but not all. > Maybe the XML parser can say it doesn't look like an > XML file, but that's about it. This is an easy case because XML is so strictly defined. Spotting a non-XML file is pretty trivial. > Once NCBI Entrez starts to return errors in a uniform > format, we can modify our parsers to find out the > exact error message. Until that happens, trying to do > so on our side will not be robust. I agree that pulling out error messages (the second half of my original email in the thread) is error prone. You might argue that catching any errors is still worthwhile, as long as there are no false positives. The first half of the email (the main point) was based on a special case: HTML and XML are pretty easy to identify. If you ask for HTML and don't get it, it is an error (and vice versa). If you ask for XML and don't get it, it is an error (and vice versa). The fact that the NCBI currently often return an HTML or XML error page when a plain text format was requested is then easily detected as an error (simply from the file type). This will still work even if the NCBI do change their error formats or wording - it should be pretty robust. 
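To make the file-type check concrete, here is a rough sketch of the kind of common function meant above. The names sniff_format and check_response are made up for illustration and are not part of Bio.Entrez; the prefixes tested are also just an assumption about what typical XML and HTML responses start with, not a complete detector.

```python
def sniff_format(first_chunk):
    """Guess whether the start of a response is 'xml', 'html' or 'text'.

    Hypothetical helper for illustration only (not a Bio.Entrez function).
    """
    head = first_chunk.lstrip().lower()
    # Check HTML first, since an HTML doctype also starts with "<!doctype".
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    if head.startswith("<?xml") or head.startswith("<!doctype"):
        return "xml"
    return "text"


def check_response(first_chunk, expected):
    """Raise ValueError if the data is not of the type that was requested."""
    found = sniff_format(first_chunk)
    if found != expected:
        raise ValueError("Asked for %s but got %s back" % (expected, found))
    return found
```

Because this only compares the requested type with the observed type, it never tries to interpret the wording of the error page, which is why it should survive changes to the error messages themselves.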
Peter From bugzilla-daemon at portal.open-bio.org Sun Mar 22 07:36:38 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 22 Mar 2009 07:36:38 -0400 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200903221136.n2MBacSc000608@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-22 07:36 EST ------- I had a thought last night about this - how about we keep PERMISSIVE=1 as the default but offer a "very permissive" mode: PERMISSIVE=2 (or more), silently ignore problems, continue parsing. PERMISSIVE=1 (or True), use stderr via the warnings module, continue parsing. PERMISSIVE=0 (or False), raise exceptions, halt parsing. It would offer an alternative way to silence the warnings in the unit tests, and could be controlled at the level of individual tests - for example where we want to make sure certain errors are caught. It might also be useful in ordinary scripts. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Sun Mar 22 07:50:50 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Sun, 22 Mar 2009 11:50:50 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com> References: <20090320125518.GA351@sobchak.mgh.harvard.edu> <587027.97686.qm@web62408.mail.re1.yahoo.com> Message-ID: <6d941f120903220450y4005b63bvd23dcb4981edec7b@mail.gmail.com> On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon wrote: > I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. 
I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though. I would actually take this argument and reverse it: The reason why biopython has been a small project, and above all, slow to develop and innovate is excessive centralization. Using a distributed technology allows for people to try new ideas and to get things moving (while still maintaining an official rock stable version with maybe glacial policies). Let's not kid ourselves: biopython lacks a lot of stuff that is fundamental in modern computational biology. The current status quo is essentially maintaining a frozen set of functionality (most new code is really just code cleanup and optimization). While I would be cautious with a distributed environment and would agree that checks have to be put in place to assure that the official product is rock solid, has documentation and is reasonably future proof, I nonetheless warmly welcome this new development. It is also good, for a change, to have an active discussion on the list: Now this actually seems like a proper, live community. 
Tiago From eric.talevich at gmail.com Sun Mar 22 11:25:23 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 22 Mar 2009 11:25:23 -0400 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: <200903221136.n2MBacSc000608@portal.open-bio.org> References: <200903221136.n2MBacSc000608@portal.open-bio.org> Message-ID: <3f6baf360903220825g2b871432yba5749dab4c2ba34@mail.gmail.com> On Sun, Mar 22, 2009 at 7:36 AM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2754 > > > > ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-22 07:36 EST ------- > I have a thought last night about this - how about we keep PERMISSIVE=1 as > the > default but offer a "very permissive" mode: > > PERMISSIVE=2 (or more), silently ignore problems, continue parsing. > PERMISSIVE=1 (or True), use stderr via the warning module, continue > parsing. > PERMISSIVE=0 (or False), raise exceptions, halt parsing. > > It would ofter an alternative way to silence the warnings in the unit > tests, > and could be controlled at the level of individual tests - for example > where we > want to make sure certain errors are caught. > > It might also be useful in ordinary scripts. > > I like the idea. I still have to comb through the documentation for the warnings module some more, but I think it should be possible to do all of this through that API -- loading PERMISSIVE=0 turns the warnings into full exceptions, =1 makes them messages on stderr, and =2 switches them off. At some point I'd like to make a script called something like pdbtidy.py which parses a potentially not-quite-conformant PDB file in a permissive mode, lists all complaints (including things like discontinuously-numbered residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed version of the file. The model for this is HTML Tidy. Do you think this would have a place in the Biopython distribution? 
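As a rough illustration of mapping the three PERMISSIVE levels onto the warnings API, something along these lines might work. This is only a sketch: the function handle_problems and the warning class defined here are made up for the example rather than taken from Bio.PDB, and warnings.catch_warnings needs Python 2.6 or later, so it would not run on the Python 2.3 that Biopython 1.50 still has to support.

```python
import warnings


class PDBConstructionWarning(UserWarning):
    """Warning category defined here purely for this sketch."""


# PERMISSIVE level -> action understood by warnings.simplefilter()
_ACTIONS = {0: "error", 1: "always", 2: "ignore"}


def handle_problems(problems, permissive=1):
    """Report each problem message according to the PERMISSIVE level.

    0 (False): the first warning is raised as an exception, halting.
    1 (True):  warnings are shown (on stderr by default), parsing continues.
    2 or more: warnings are silently dropped, parsing continues.
    """
    level = max(0, min(int(permissive), 2))
    with warnings.catch_warnings():
        # The filter only lasts for the duration of this block.
        warnings.simplefilter(_ACTIONS[level], PDBConstructionWarning)
        for message in problems:
            warnings.warn(message, PDBConstructionWarning)
    return len(problems)
```

One consequence of this approach is that with PERMISSIVE=0 the exception raised is the warning class itself, so callers would catch PDBConstructionWarning rather than a separate exception type.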
From biopython at maubp.freeserve.co.uk Sun Mar 22 11:53:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 15:53:21 +0000 Subject: [Biopython-dev] PDB tidy script, was: [Bug 275 Message-ID: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com> On Bug 2754 comment 12, I wrote: http://bugzilla.open-bio.org/show_bug.cgi?id=2754#c12 >> I have a thought last night about this - how about we keep PERMISSIVE=1 >> as the default but offer a "very permissive" mode: >> >> PERMISSIVE=2 (or more), silently ignore problems, continue parsing. >> PERMISSIVE=1 (or True), use stderr via the warning module, continue >> parsing. >> PERMISSIVE=0 (or False), raise exceptions, halt parsing. >> >> It would ofter an alternative way to silence the warnings in the unit >> tests, and could be controlled at the level of individual tests - for >> example where we want to make sure certain errors are caught. >> >> It might also be useful in ordinary scripts. Eric replied: > I like the idea. I still have to comb through the documentation for the > warnings module some more, but I think it should be possible to do all of > this through that API -- loading PERMISSIVE=0 turns the warnings into full > exceptions, =1 makes them messages on stderr, and =2 switches them off. It doesn't really matter - all the PDB construction warnings/errors go through _handle_PDB_exception() so this would be the least invasive way to implement this. > At some point I'd like to make a script called something like pdbtidy.py > which parses a potentially not-quite-conformant PDB file in a permissive > mode, lists all complaints (including things like discontinuously-numbered > residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed > version of the file. The model for this is HTML Tidy. Do you think this > would have a place in the Biopython distribution? 
It sounds useful to me, it can probably go in the scripts subdirectory, along with the PDB surface exposure script. 
One drawback is that currently Bio.PDB's header parsing leaves a lot to be desired, and very little of the header is output when saving a PDB file (Thomas' focus is/was very much on the 3D data). Peter From lpritc at scri.ac.uk Mon Mar 23 05:02:53 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Mon, 23 Mar 2009 09:02:53 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com> Message-ID: On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" wrote: > Have a look at this video, where it shows that the Ruby On Rails > project has grown quicker when it has moved to github: > > - http://python.genedrift.org/2009/03/15/ror-commits/ > > (the jump should be on minute 5.10 or so) I've seen this argument a couple of times, now - mostly on blogs - and I'm not sure that it's all that clear-cut. The RoR video shows a greater number of individual names associated with commits, after the move to github. However, it's not clear whether this is because a large number of individuals have suddenly decided to contribute to the project, or whether the move to a version control system in which author attribution remains with contributed code - as opposed to the bottleneck of having to be submitted with the id of someone with write access - is responsible. I don't think there's enough evidence to say 'the move to github caused an increase in contributions'. As a counter-example, the number of people who have recorded contributions to Biopython code is 46 (from the CONTRIB file on CVS). I don't think that there are that many ids associated with committing the codebase on there. My name's only associated with GenomeDiagram in the commit comments, not as an author/committer of the code - at least, as far as the CVS application is concerned - for example. This might change with git. Of course, I might be misunderstanding git's attribution model, or how the stats for that RoR video were compiled... L. 
-- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From p.j.a.cock at googlemail.com Mon Mar 23 06:14:10 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 Mar 2009 10:14:10 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: References: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com> Message-ID: <320fb6e00903230314y212be042gfd2f0b86f8738f2d@mail.gmail.com> On Mon, Mar 23, 2009 at 9:02 AM, Leighton Pritchard wrote: > On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" > wrote: > >> Have a look at this video, where it shows that the Ruby On Rails >> project has grown quicker when it has moved to github: >> >> - http://python.genedrift.org/2009/03/15/ror-commits/ >> >> (the jump should be on minute 5.10 or so) > > I've seen this argument a couple of times, now - mostly on blogs - and I'm > not sure that it's all that clear-cut. > > The RoR video shows a greater number of individual names associated with > commits, after the move to github. However, it's not clear whether this is > because a large number of individuals have suddenly decided to contribute to > the project, or whether the move to a version control system in which author > attribution remains with contributed code - as opposed to the bottleneck of > having to be submitted with the id of someone with write access - is > responsible. I don't think there's enough evidence to say 'the move to > github caused an increase in contributions'. > > As a counter-example, the number of people who have recorded contributions > to Biopython code is 46 (from the CONTRIB file on CVS). I don't think that > there are that many ids associated with committing the codebase on there. > My name's only associated with GenomeDiagram in the commit comments, not as > an author/committer of the code - at least, as far as the CVS application is > concerned - for example. This might change with git. 
Of course, I might be > misunderstanding git's attribution model, or how the stats for that RoR > video were compiled... Leighton has a good point about the attribution, and the dangers in over-interpreting such a video. With git/github it will make it easier to see who contributed patches (if a patch is pulled into another branch, both the person doing the merge and the person who originally checked in the patch get recorded), and that may indirectly encourage more contributions. As Leighton points out, we do try and give credit now in CVS commit comments, but these are checked in by a core developer. I imagine this happened with RoR, but compiling this information for that video would probably have been too much work. As well as switching tools, you are also changing the metric. Something else to consider is how you are measuring activity: the git and github documentation and press encourage people to commit more often - for example while working on a bug fix or a new feature, I might make three incremental commits on my local copy of the repository, before I am happy enough to push this to the online repository. This would then show as three commits, wouldn't it - but on CVS it would probably be just one. i.e. On CVS I suspect you naturally tend to get a smaller number of larger commits than with git. (This difference will probably vary from person to person - I haven't counted or anything, but with CVS I think I tend to commit lots of smaller changes, while Michiel for example tends to make fewer but larger commits). i.e. If the RoR video shows a sudden jump in the number of commits, that doesn't mean more code changes. Scaling by number of lines changed would be another metric which is perhaps more robust - but has drawbacks of its own. 
Peter From eric.talevich at gmail.com Mon Mar 23 16:39:05 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 23 Mar 2009 16:39:05 -0400 Subject: [Biopython-dev] PDB tidy script, was: [Bug 275 In-Reply-To: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com> References: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com> Message-ID: <3f6baf360903231339i22438e3bia554a0b7bdda7a5d@mail.gmail.com> On Sun, Mar 22, 2009 at 11:53 AM, Peter wrote: > > One drawback is that currently Bio.PDB's header parsing leaves a lot to > be desired, and very little of the header is output when saving a PDB file > (Thomas' focus is/was very much on the 3D data). > > Peter > I haven't been on this list long enough to know -- is Thomas still supporting the PDB module? If so, would he give his blessing to some more invasive changes to the PDB module, such as unifying PDBParser and parse_pdb_header? That separation has always seemed curiously vestigial to me. Now that github gives us some flexibility with public branches, it would be nice to have a discussion on some longer-term plans for Bio.PDB. I do a fair amount of work with PDB files and PyMol at my lab, and if the Biopython core devs are open to it, I can start merging enhancements into my public branch on github. However, if there's already a plan for the module, it's obviously best for me not to publish a divergent branch. -Eric From biopython at maubp.freeserve.co.uk Mon Mar 23 17:05:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Mar 2009 21:05:21 +0000 Subject: [Biopython-dev] PDB tidy script Message-ID: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com> On Mon, Mar 23, 2009 at 8:39 PM, Eric Talevich wrote: > On Sun, Mar 22, 2009 at 11:53 AM, Peter wrote: > >> >> One drawback is that currently Bio.PDB's header parsing leaves a lot to >> be desired, and very little of the header is output when saving a PDB file >> (Thomas' focus is/was very much on the 3D data). 
>> >> Peter > > I haven't been on this list long enough to know -- is Thomas still > supporting the PDB module? If so, would he give his blessing to some more > invasive changes to the PDB module, such as unifying PDBParser and > parse_pdb_header? That separation has always seemed curiously vestigal to > me. > Now that github gives us some flexibility with public branches, it would > be nice to have a discussion on some longer-term plans for Bio.PDB. I do a > fair amount of work with PDB files and PyMol at my lab, and if the Biopython > core devs are open to it, I can start merging enhancements into my public > branch on github. However, if there's already a plan for the module, it's > obviously best for me not to publish a divergent branch. If you look back over the history, there initially was no header parsing, it was a contribution from Kristian Rother, and I would agree, it is rather disjoint from the rest of the code. One thing I personally wanted last time I was working with PDB files was to have secondary structure information (from the alpha and beta sheet lines in the header) mapped onto the residue objects automatically. And yes, Thomas is supporting the PDB module, but his time has been rather limited of late. When I asked him about some of the open enhancement requests in bugzilla recently (off list) he said we needed "a separate class to parse all the info in the header, not a slew of additions to the core parser class (which is designed to deal with the 3D data only)." I would suggest you try and get Thomas involved now for his input on the design (before you start coding), but if need be press ahead anyway for your own use, and he can always comment on your public branch. I hope the two of you can work together on this, and if/when Thomas does stand down (or delegate), you could then be in an excellent position to take over as the Bio.PDB maintainer if that's what you wanted. 
Peter From sbassi at clubdelarazon.org Tue Mar 24 02:24:38 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Tue, 24 Mar 2009 03:24:38 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files Message-ID: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> I have a .fasta file and its corresponding .qual file. I ran seqclean on the fasta file and I got a shorter .fasta file as output (that is expected). Using the .cln file from seqclean, I want to "trim" the .qual file the same way my new fasta is trimmed. I can read the cln and parse the information of "where to trim". For example, in one original sequence of 1000 bp, I may need to trim from 150 to 800. The problem is that I can't modify qual values using the new SeqIO qual parser (at least the size of the list can't be modified). I read the example in the doc, where it is cut doing something like: sub_rec = fullrec[150:800] But this works only when there is a sequence (so, when it is read as "fastq"); it doesn't work when the record is read as "qual", because there is no sequence and in this case I can't modify the length of the list in letter_annotations['phred_quality']. It is true that I can modify qual values in the list, but I want to modify the list size. Here is the error: Traceback (most recent call last): File "/home/sbassi/bioinfo/INTA/qualparser.py", line 18, in <module> s.letter_annotations['phred_quality'] = [0,0,0,0,10,1] File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py", line 33, in __setitem__ "strings) of length %i." % self._length) TypeError: We only allow python sequences (lists, tuples or strings) of length 5. (Note: 5 was the size of the original qual record; when I tried to set it to [0,0,0,0,10,1], I got this). So my question is: Does it make sense to allow the user to modify the size of the list in letter_annotations['phred_quality'] in qual sequences? 
I think this is a nice feature for qual SeqIO.parse. If I can modify the list size, then I can save the modified version with SeqIO.write(x,fh,"qual") and have a qual file with a new size. I am using 1.49 with new files from CVS. -- Sebastián Bassi. Diplomado en Ciencia y Tecnología. From biopython at maubp.freeserve.co.uk Tue Mar 24 05:49:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 09:49:33 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> Message-ID: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> On Tue, Mar 24, 2009 at 6:24 AM, Sebastian Bassi wrote: > I have a .fasta file and its corresponding .qual file. > I run seqclean on the fasta file and I got a shorter .fasta file as > output (that is expected). Whose seqclean script are you using? If it doesn't output the trimmed qual file, can it work with FASTQ output instead? > Using the .cln file from seqclean, I want to "trim" the .qual file the > same way my new fasta is trimmed. > I can read the cln and parse the information of "where to trim". > For example, in one original sequence of 1000 bp, I may need to trim > from 150 to 800. 
> The problem is that I can't modify qual values using the new SeqIO > qual parser (at least the size of the list can't be modified). I read > the example in the doc, where it is cut doing something like: > sub_rec = fullrec[150:800] > But, this works only when there is a sequence (so, when read it as > "fastq"), but it doesn't work when the sequence is read as "qual" > (because there is no sequence ... > So my question is: Does it make sense to allow the user to modify the > size of the list in letter_annotations['phred_quality'] in qual > sequences? I think this is a nice feature for qual SeqIO.parse. This was one area of the new SeqRecord slicing I was a little unsure about - slicing a qual file's SeqRecord (or any SeqRecord with a None for the sequence). I hadn't done anything about it immediately as I couldn't think of a use case for it - so that's solved ;) One solution would be to introduce an UnknownSeq object, which would be much nicer to deal with than a None object, as it would have a length and support slicing. I've mentioned this idea before, but haven't yet put forward any actual code. This seems most elegant. Another option would be to special case handle slicing a SeqRecord with a None sequence, where we'd slice its per-letter-annotation. For now, you can force this with the current code by: # Not recommended, short-term hack s.letter_annotations._length = 6 s.letter_annotations['phred_quality'] = [0,0,0,0,10,1] Right now, without changing Biopython, I have another workaround for you: Use the paired reader in Bio.SeqIO.QualityIO on the untrimmed FASTA and QUAL files, which will give you SeqRecords with both the sequence and the quality - and trim these by slicing the SeqRecord. 
Peter From sbassi at clubdelarazon.org Tue Mar 24 10:59:51 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Tue, 24 Mar 2009 11:59:51 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> Message-ID: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> On Tue, Mar 24, 2009 at 6:49 AM, Peter wrote: > Whose seqclean script are you using? If it doesn't output the trimmed > qual file, can it work with FASTQ output instead? I am using the seqclean found here: http://compbio.dfci.harvard.edu/tgi/software/ It doesn't output a trimmed qual file because seqclean accepts only fasta as input. Oh, wait!!! Looking at my seqclean directory I found a cln2qual script. So I looked at the README to see what it is, and I found: "If after seqclean one needs to trim the corresponding quality values too, according to the new coordinates or trash codes found by seqclean, the utility script "cln2qual" is included (see the usage message). It expects a fasta-like file containing space delimited quality values for each nucleotide of the original sequences. It should be run after the seqclean, as it parses the trimming ("clear range") coordinates and trash codes from the cleaning report and applies them to the quality records." So this utility does what I was about to do with Biopython. But anyway, regarding this: > This was one area of the new SeqRecord slicing I was a little unsure > about - slicing a qual file's SeqRecord (or any SeqRecord with a None > for the sequence). I hadn't done anything about it immediately as I > couldn't think of a use case for it - so that's solved ;) > One solution would be to introduce an UnknownSeq object, which .... 
I agree with the need for an UnknownSeq object to modify the size of the qual file. Best, SB. From biopython at maubp.freeserve.co.uk Tue Mar 24 11:13:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 15:13:40 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> Message-ID: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> On Tue, Mar 24, 2009 at 2:59 PM, Sebastian Bassi wrote: > But anyway, regarding this: > >> This was one area of the new SeqRecord slicing I was a little unsure >> about - slicing a qual file's SeqRecord (or any SeqRecord with a None >> for the sequence). I hadn't done anything about it immediately as I >> couldn't think of a use case for it - so that's solved ;) >> One solution would be to introduce an UnknownSeq object, which >> .... > > I agree with the need of an UnknownSeq object for modify the size of > the qual file. Suppose you read in a qual file (or a GenBank file with no sequence, just a CONTIG line), and instead of None, the SeqRecord object(s) had a new UnknownSeq object saying they were made up of a given number of "N" characters using a DNA alphabet. What would you expect to get if you used Bio.SeqIO to write out the file in FASTA format? To my mind there are two sensible options - write out the file using the "NNN....N" sequence, or raise an error. 
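To give the discussion something concrete to poke at, here is a bare-bones sketch of what such an object could look like. It is purely illustrative: no code has been proposed on the list yet, so the class body, the constructor arguments and the choice of "N" as the default character are all assumptions, and there is no alphabet handling at all.

```python
class UnknownSeq(object):
    """Sketch of a sequence with a known length but unknown letters."""

    def __init__(self, length, character="N"):
        if length < 0:
            raise ValueError("Length must not be negative")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        # Only builds the full string when somebody actually asks for it.
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() clips start/stop/step to the current length.
            new_length = len(range(*index.indices(self._length)))
            return UnknownSeq(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("index out of range")
        return self._character
```

With something like this in place of None, slicing a 1000 letter record as [150:800] would give a 650 letter placeholder to go with the 650 sliced quality scores, and the "write NNN....N versus raise an error" question becomes a decision for the FASTA writer rather than for the record.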
Peter From biopython at maubp.freeserve.co.uk Tue Mar 24 11:23:20 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 15:23:20 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> Message-ID: <320fb6e00903240823o53267d8bn36908f001708f974@mail.gmail.com> On Tue, Mar 24, 2009 at 9:49 AM, Peter wrote: > > This was one area of the new SeqRecord slicing I was a little unsure > about - slicing a qual file's SeqRecord (or any SeqRecord with a None > for the sequence). I hadn't done anything about it immediately as I > couldn't think of a use case for it - so that's solved ;) > > One solution would be to introduce an UnknownSeq object, which > would be much nicer to deal with than a None object, as it would have > a length and support slicing. I've mentioned this idea before, but > haven't yet put forward any actual code. This seems most elegant. > > Another option would be to special case handle slicing a SeqRecord > with a None sequence, where we'd slice its per-letter-annotation. That should now be working with the change I've just checked into CVS, but the combination of slicing per-letter-annotation while the sequence is None is a real pain. I'm almost tempted to back out the qual parser for the next release (FASTQ support is fine), but let's see if we can reach a consensus on a new UnknownSeq class instead (see my earlier email on this - what would you expect to happen if you read in a QUAL file and tried to save it as a FASTA file?).
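
For anyone following along, the special case being described can be sketched roughly as below. This is not the change checked into CVS: QualRecord is a made-up stand-in for SeqRecord, and the "phred_quality" key is used only as an example annotation name.

```python
class QualRecord(object):
    """Hypothetical stand-in for a SeqRecord read from a QUAL file."""

    def __init__(self, seq, letter_annotations):
        self.seq = seq  # may be None when only the qualities are known
        self.letter_annotations = letter_annotations

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slices")
        # Special case: a missing sequence stays None after slicing
        seq = None if self.seq is None else self.seq[index]
        # The per-letter annotations are always sliced
        sliced = dict(
            (key, values[index])
            for key, values in self.letter_annotations.items()
        )
        return QualRecord(seq, sliced)


if __name__ == "__main__":
    record = QualRecord(None, {"phred_quality": [10, 20, 30, 40]})
    print(record[1:3].letter_annotations)
```

The awkward part is that every method touching the sequence needs the same None check, which is the argument for an object that knows its own length.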
Peter From sbassi at clubdelarazon.org Tue Mar 24 11:33:56 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Tue, 24 Mar 2009 12:33:56 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> Message-ID: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> On Tue, Mar 24, 2009 at 12:13 PM, Peter wrote: .... > characters using a DNA alphabet. What would you expect to get if you > used Bio.SeqIO to write out the file in FASTA format? To my mind there > are two sensible options - write out the file using the "NNN....N" > sequence, or raise an error. "N" is OK (with the same length as the qual file); that is what ABI does when the QV is low. This is not the same case, but I always think of "N" as "unknown". Raising an error is not bad, because I don't see the need to go from a non-sequence qual to a fasta (it doesn't make sense). But the fact that I don't see the need doesn't mean someone else may not have a reason. Best, -- Sebastián Bassi. Diplomado en Ciencia y Tecnología. Non standard disclaimer: READ CAREFULLY. By reading this email, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From bugzilla-daemon at portal.open-bio.org Tue Mar 24 14:25:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 24 Mar 2009 14:25:17 -0400 Subject: [Biopython-dev] [Bug 2799] New: UnknownSeq object (e.g. for QUAL files) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2799 Summary: UnknownSeq object (e.g. for QUAL files) Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Sometimes we want to represent an unknown sequence with a known length, e.g. "N"*length for nucleotides. This enhancement is about adding an UnknownSeq object to Biopython which would have the following init arguments: * length * alphabet * character (single letter string, defaulting to "X" for protein and "N" for nucleotides, "?" otherwise) Currently the Bio.SeqIO "qual" parser produces SeqRecord objects where the seq is None, yet there is a known length. This can also occur in GenBank files where there is a CONTIG line but no sequence. This makes supporting slicing (Bug 2507) complicated. Adding a new UnknownSeq class would solve this elegantly. In general, the UnknownSeq object should act as a Seq object whose sequence is the character*length. Slicing or adding UnknownSeq objects should give a new UnknownSeq object. Complement, reverse complement, transcribe and back transcribe can also return new UnknownSeq objects of the same length (alphabet permitting). Translation can return an UnknownSeq object using "X" and a protein alphabet (with the length roughly one third of the nucleotide length - whatever is consistent with the Seq translate method).
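
As a rough illustration of the behaviour described so far (not the patch for this bug), a class along these lines would cover length, slicing and addition. The name UnknownSeqSketch is invented, the alphabet argument is left out for brevity, and returning a plain string when adding anything other than a matching unknown sequence is only one possible choice.

```python
class UnknownSeqSketch(object):
    """Illustrative only: an unknown sequence of known length."""

    def __init__(self, length, character="N"):
        if length < 0:
            raise ValueError("length must not be negative")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        # Acts like a sequence made of character * length
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() clips the slice to this length; the size
            # of the resulting range is the length of the new object
            start, stop, step = index.indices(self._length)
            new_length = len(range(start, stop, step))
            return UnknownSeqSketch(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character

    def __add__(self, other):
        if (isinstance(other, UnknownSeqSketch)
                and other._character == self._character):
            # Unknown + unknown stays unknown
            return UnknownSeqSketch(len(self) + len(other), self._character)
        # Assumption: anything else falls back to a plain string
        return str(self) + str(other)


if __name__ == "__main__":
    gap = UnknownSeqSketch(10)
    print(len(gap[2:5]), str(gap[2:5]))
```

Nothing here stores the full string, so a placeholder for a very long contig costs the same memory as a short one until str() is called.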
Adding an UnknownSeq object to a Seq would have to give a new Seq object (or an error?). One use-case example here would be joining together contigs with unknown regions of a given length (strings of N's). This bug is a placeholder for patches or pointers to possible implementations (e.g. I intend to try some ideas on a branch on github). I expect most of the discussion to be on the (dev) mailing list, rather than bugzilla. From tiagoantao at gmail.com Tue Mar 24 14:42:56 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 24 Mar 2009 18:42:56 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> Hi, On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman wrote: > There is a lot of good material in this thread for new potential > developers. Tiago, it would make sense to condense what you've > written and include it with the Contributing guide: Just a followup on this: I think it makes no sense to put up much of the new content before there is an official step of moving to github. What I am doing is just putting up, for test purposes, a framework to see how these suggestions might work. I've created a fork http://github.com/tiagoantao/biopython-popgen-test/ with several branches. The proposed idea is: 1. The master branch should be a clearing house and stability point for things to be suggested for submission to the official branch. All code here should have unit tests, all unit tests should pass and documentation should exist.
It is also a place to correct bugs that are discovered in the official trunk (if these are simple to correct and don't require the creation of a temporary branch to sort them out) 2. There is a stats branch to work on Bio.PopGen.Stats. If you want to work on statistics you can follow/fork from the statistics branch. Any code that people might have should be discussed if they want it to make it into the official release. 3. Less interesting to others, I will personally create a genepop branch to make an enhancement to the existing parser and to add the ability to call the genepop binary. So: People work on their very personal branches (like my genepop one). Development branches that might have shared interests (like the stats one) should be forked/shared commit and people interested should discuss among themselves. Whenever some content is deemed ready it is then put on the popgen master branch (along with tests and documentation). When the master branch is in a stable state, then the changes are proposed to the official one. In my view, this protects the people working on the official thing from the potential chaos of new developments, while creating a framework which allows people to test innovations... Tiago From biopython at maubp.freeserve.co.uk Tue Mar 24 14:54:28 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 18:54:28 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> Message-ID: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> 2009/3/24 Tiago Antão : > Hi, > > On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman wrote: >> There is a lot of good material in this thread for new potential >> developers.
Tiago, it would make sense to condense what you've >> written and include it with the Contributing guide: > > Just a followup on this: I think it makes no sense to put much of the > new content before there is an official step of moving to github. True - but we do need enough pointers for people to help try things out. > What I am doing, is just to put, for test purposes a framework to see > how these suggestions might work.... > In my view, this protects the people working on the official thing > from the potential chaos of new developments, while creating a > framework which allows people to test innovations... That sounds great, and a good model for other (self contained) modules under active development. I'm thinking along similar lines for Bio.SeqIO and AlignIO (and by implication, the SeqRecord and the Alignment classes). I would assume (although you didn't say this) you would also pull changes to the official trunk into your branches periodically - at the very least after each official Biopython release. Peter From bartek at rezolwenta.eu.org Tue Mar 24 19:58:30 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Mar 2009 00:58:30 +0100 Subject: [Biopython-dev] history on github - where are the tags? In-Reply-To: <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> Message-ID: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> Hi all, Sorry for being quiet all that time, but the conference (+jet lag both ways) proved to be more engaging than I thought. For the tags, they were not pushed to github before, because I didn't know I need to specifically do it with git push --tags. Now they are pushed to the repository and you can fetch them to local copies by git pull -t in any local directory which resulted from cloning the official branch.
They probably won't get automatically transferred to derived branches, I guess you need to pull them from the original (official) branch. cheers Bartek On Tue, Mar 17, 2009 at 10:06 AM, Peter wrote: > Hi Bartek et al, > > I've just been looking over the github mirror of CVS, and wanted to > see how it presented the history of individual files. For example, this > page looks at the Bio/SeqRecord.py history using ViewCVS: > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython > > For comparison, in GitHub, > http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py > > As you can see, all the comments and changes are there - which is > great. But I can't see the CVS tag information, which I assume would > be converted into git tags. Is this information present in the git > repository, but not shown by github, or was it lost during the > migration? This might seem like a little thing, but I have found it > incredibly important for tracing bugs reported in older releases, for > example in narrowing down when something changed. > > Peter -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Wed Mar 25 06:01:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 10:01:45 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> Message-ID: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> On Tue, Mar 24, 2009 at 3:33 PM, Sebastian Bassi wrote: > On Tue, Mar 24, 2009 at 12:13 PM, Peter wrote: > .... >> characters using a DNA alphabet. What would you expect to get if you >> used Bio.SeqIO to write out the file in FASTA format? To my mind there >> are two sensible options - write out the file using the "NNN....N" >> sequence, or raise an error. > > "N" is OK (with the same length as the qual file); that is what ABI > does when the QV is low. This is not the same case, but I always think > of "N" as "unknown". > Raising an error is not bad, because I don't see the need to go from a > non-sequence qual to a fasta (it doesn't make sense). But the fact that I don't > see the need doesn't mean someone else may not have a reason.
> Best, I've filed an enhancement bug for the possible enhancement to add an UnknownSeq object, perhaps as part of the Bio.Seq module, Bug 2799 http://bugzilla.open-bio.org/show_bug.cgi?id=2799 I've done an initial patch (which I plan to upload on Bugzilla) which is available now on github on a new branch: http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq Note this doesn't do anything special (yet) when writing output files, so they will by default record a string of whatever unknown sequence character was used. It would make sense for GenBank/EMBL in SeqIO to also take advantage of the UnknownSeq object, because here the sequence is essentially optional (consider files with just a CONTIG line), but should always have a length. Sebastian - could you have a quick play with this github code (using the new UnknownSeq class), and the current CVS code (using None), and make sure both support the slicing operations you were trying earlier? Thanks. Peter From biopython at maubp.freeserve.co.uk Wed Mar 25 06:28:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 10:28:46 +0000 Subject: [Biopython-dev] history on github - where are the tags? In-Reply-To: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> Message-ID: <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> On Tue, Mar 24, 2009 at 11:58 PM, Bartek Wilczynski wrote: > > Hi all, > > Sorry for being quiet all that time, but the conference (+jet lag both > ways) proved to be more engaging than I thought. That's fine - sleep is important ;) > For the tags, they were not pushed to github before, because I didn't > know I need to specifically do it with git push --tags. I assume you've updated your cron job so this will happen automatically in future (e.g.
when we do Biopython 1.50 beta). > Now they are pushed to the repository and you can fetch them to local > copies by git pull -t in any local directory which resulted from > cloning the official branch. Yes, I've checked and I can get the tags with: git pull -t ... or, git pull --tags ... They also show up in github (near the top, drop down menu next to branches) and in gitx (and I assume other GUI clients). They have commit comments like "This commit was manufactured by cvs2svn to create tag 'biopython-146'", which is fine. However, all the tags seem to have associated with them the deletion of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd. If you can work out how this happened, would it be trivial to back these tags out and redo it? > They probably won't get automatically transfered to derived branches, > I guess you need to pull them from the original (official) branch. That makes sense. Peter From mjldehoon at yahoo.com Wed Mar 25 07:47:59 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 25 Mar 2009 04:47:59 -0700 (PDT) Subject: [Biopython-dev] Bio.Entrez catching more errors In-Reply-To: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com> Message-ID: <559251.50851.qm@web62401.mail.re1.yahoo.com> > What about the fairly common situation (at, its something > I've done fairly often) where Bio.Entrez.efetch() is used > to fetch records which are saved directly to file without > verification - e.g. to be parsed by another program? > Unless the error is caught in Bio.Entrez.efetch() > it may be out of our control. That is easy: just run the output returned by NCBI through the appropriate parser. If the parser is happy, proceed to save the NCBI output in a file. > The first half of the email (the main point) was based > on a special case: HTML and XML are pretty easy to > identify. If you ask for HTML and don't get it, it is > an error (and vice versa). If you ask for XML and don't > get it, it is an error (and vice versa). 
The fact that > the NCBI currently often return an HTML or XML error > page when a plain text format was requested is then > easily detected as an error (simply from the file type). > This will still work even if the NCBI do change their > error formats or wording - it should be pretty robust. Have a look at serialset.xml in the Bio.Entrez test cases ... this is the output obtained from NCBI using efetch from the journals database with retmode='xml'. The file looks like XML, but it doesn't start with " References: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com> <559251.50851.qm@web62401.mail.re1.yahoo.com> Message-ID: <320fb6e00903250515vd885b34s629dd9253d4f9186@mail.gmail.com> On Wed, Mar 25, 2009 at 11:47 AM, Michiel de Hoon wrote: > >> What about the fairly common situation (at, its something >> I've done fairly often) where Bio.Entrez.efetch() is used >> to fetch records which are saved directly to file without >> verification - e.g. to be parsed by another program? >> Unless the error is caught in Bio.Entrez.efetch() >> it may be out of our control. > > That is easy: just run the output returned by NCBI through > the appropriate parser. If the parser is happy, proceed to > save the NCBI output in a file. Possible, but you'd need to cache the handle's data in order to be able to save it after parsing. The UndoHandle doesn't do this. You could save the data to a file, and then check the parser can read it back - however, this would be complicated if you are downloading data in batches to go into a single file. >> The first half of the email (the main point) was based >> on a special case: HTML and XML are pretty easy to >> identify. ?If you ask for HTML and don't get it, it is >> an error (and vice versa). ?If you ask for XML and don't >> get it, it is an error (and vice versa). 
The fact that >> the NCBI currently often return an HTML or XML error >> page when a plain text format was requested is then >> easily detected as an error (simply from the file type). >> This will still work even if the NCBI do change their >> error formats or wording - it should be pretty robust. > > Have a look at serialset.xml in the Bio.Entrez test cases ... this > is the output obtained from NCBI using efetch from the journals > database with retmode='xml'. The file looks like XML, but it > doesn't start with " correctly, so while it's not pretty to me this would not count as > an error. I do concede my sample code for detecting XML or HTML could be improved, and this provides a good test case for a difficult XML file. Maybe when we expect XML (or HTML), all we should check is the file starts with "<"? e.g. elif "retmode" in params and params["retmode"].lower()=="html" \ and not data.lower().startswith("<") : raise TypeError("Requested HTML, but didn't get it: %s..." % data) elif "retmode" in params and params["retmode"].lower()=="xml" \ and not data.lower().startswith("<") : raise TypeError("Requested XML, but didn't get it: %s..." % data) elif "retmode" in params and params["retmode"] \ and params["retmode"].lower()!="xml" \ and data.lower().startswith(" References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> Message-ID: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> On Wed, Mar 25, 2009 at 11:28 AM, Peter wrote: > I assume you've updated your cron job so this will happen > automatically in future (e.g. when we do Biopython 1.50 beta). Yes, naturally. > > However, all the tags seem to have associated with them the deletion > of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
> If you can work out how this happened, would it be trivial to back > these tags out and redo it? > That's really odd. I don't know exactly where it comes from, but I've done some detective work and here are my findings: For the AUTHORS file, it was indeed deleted in a commit by Jeff Chang (2001): http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65 Which "renames" the AUTHORS file into CONTRIB file. The AUTHORS file is in the biopython tags prior to 1.00a1 and then it should not be there anymore (it's in CVS's attic). I don't know how it came back... Similarly, the Bio/UniGene/UniGene.py file was removed by Jeff in a commit: http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2 And similarly, UniGene.py is no longer in CVS repo (but it's still in the attic). What these files have in common is that there are some commits to them after they've been moved to Attic (sic!) http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py http://github.com/biopython/biopython/commits/master/AUTHORS I don't know exactly how this could happen, but this inconsistency in CVS might be causing cvs2git to actually include these guys. I'll increase the verbosity of the log messages in my cron script, so maybe I'll see some indication of a problem. If nobody has a reason for these files to be included in the current trunk, I'll go ahead and remove them from git. cheers Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Wed Mar 25 08:20:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 12:20:05 +0000 Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> Message-ID: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> >> However, all the tags seem to have associated with them the deletion >> of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd. >> If you can work out how this happened, would it be trivial to back >> these tags out and redo it? >> > That's really odd. I don't know exactly where it comes from, but I've > done some detective work and here are my findings: > > For the AUTHORS file, it was indeed deleted in a commit by Jeff Chang (2001): > http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65 > Which "renames" the AUTHORS file into CONTRIB file. > > The AUTHORS file is in the biopython tags prior to 1.00a1 and then it > should not be there anymore (it's in CVS's attic). > I don't know how it came back... > > Similarly, the Bio/UniGene/UniGene.py file was removed by Jeff in a commit: > http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2 > > And similarly, UniGene.py is no longer in CVS repo (but it's still in > the attic). > > What these files have in common is that there are some commits to > them after they've been moved to Attic (sic!) > > http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py > http://github.com/biopython/biopython/commits/master/AUTHORS > > I don't know exactly how this could happen, but this inconsistency in > CVS might be causing cvs2git to actually include these guys. It does sound like a hidden hiccup in our CVS repository... very strange.
Peter From bartek at rezolwenta.eu.org Wed Mar 25 08:43:00 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Mar 2009 13:43:00 +0100 Subject: [Biopython-dev] history on github - where are the tags? In-Reply-To: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> Message-ID: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com> On Wed, Mar 25, 2009 at 1:20 PM, Peter wrote: >> I don't know exactly how this could happen, but this inconsistency in >> CVS might be causing cvs2git to actually include these guys. > > It does sound like a hidden hiccup in our CVS repository... very strange. I would rather call it a glitch in the transition. I was actually quite surprised that the transition went so smoothly. Now we can see that actually some things did not transfer too well... I did a thorough check to compare checkouts from current CVS and git trunks to see that there are also some other differences: As you can see below, apart from these two files present only in git, a number of directories are missing in git. I've checked: they are all empty directories left over because you cannot delete a directory from CVS (some of them, like Bio.Tools, actually have a number of directories in them, but they are all empty). I think that it's actually a desired behavior (removing empty directories) but if anyone is missing any of these dirs, please let me know.
The diff:
Only in git_branch/: AUTHORS
Only in biopython/Bio: Ais
Only in biopython/Bio: CDD
Only in biopython/Bio: cmmCIF
Only in biopython/Bio: config
Only in biopython/Bio: dbdefs
Only in biopython/Bio: ECell
Only in biopython/Bio: expressions
Only in biopython/Bio: formatdefs
Only in biopython/Bio: Gobase
Only in biopython/Bio: iodefs
Only in biopython/Bio: Kabat
Only in biopython/Bio: LocusLink
Only in biopython/Bio: MultiProc
Only in biopython/Bio/PDB: mmCIF_lex
Only in biopython/Bio: Rebase
Only in biopython/Bio/SCOP: tests
Only in biopython/Bio: sources
Only in biopython/Bio: Tools
Only in git_branch/Bio/UniGene: UniGene.py
Only in biopython/Doc/cookbook: biopython_test
Only in biopython/Doc/cookbook: genbank_to_fasta
Only in biopython/Doc/cookbook: LogisticRegression
Only in biopython: Experimental
Only in git_branch/: .git
Only in biopython/Martel: examples
Only in biopython/Tests: CDD
Only in biopython/Tests: ECell
Only in biopython/Tests: Gobase
Only in biopython/Tests: Kabat
Only in biopython/Tests: LocusLink
Only in biopython/Tests: Ndb
Only in biopython/Tests: UnitTests
Only in biopython/Tests: WIT
cheers Bartek From biopython at maubp.freeserve.co.uk Wed Mar 25 08:47:02 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 12:47:02 +0000 Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com> Message-ID: <320fb6e00903250547s7d88a1b3h8c52dd852047edb6@mail.gmail.com> On Wed, Mar 25, 2009 at 12:43 PM, Bartek Wilczynski wrote: > I did a thorough check to compare checkouts from current CVS and git > trunks to see that there are also some other differences: > As you can see below, apart from these two files present only in > git, a number of directories are missing in git. I've checked: > they are all empty directories left over because you cannot delete a > directory from CVS (some of them, like Bio.Tools, actually have a > number of directories in them, but they are all empty). > > I think that it's actually a desired behavior (removing empty > directories) but if anyone is missing any of these dirs, please let me > know. I don't care about the missing empty directories - if/once we move to git, we would have deleted them anyway. So if that has been done automatically, that's fine in my opinion.
Peter From tiagoantao at gmail.com Wed Mar 25 11:39:42 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Wed, 25 Mar 2009 15:39:42 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> Message-ID: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> On Tue, Mar 24, 2009 at 6:54 PM, Peter wrote: >> In my view, this protects the people working on the official thing >> from the potential chaos of new developments, while creating a >> framework which allow for people to test innovations... > > That sounds great, and a good model for other (self contained) modules under Just a minor point. Any development branches should be seen as highly unstable. I say this just because I am restarting to work on statistics and I am seeing massive refactoring going on. So if people track development branches, they should be prepared for chaos ;) . Which is exactly the opposite of what they should expect from the official branch ;) From biopython at maubp.freeserve.co.uk Wed Mar 25 11:45:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 15:45:00 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> Message-ID: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> 2009/3/25 Tiago Antão : > Just a minor point.
any development branches should be seen as highly > unstable. I say this just because I am restarting to work on > statistics and I am seeing massive refactoring going on. So if people > track development branches, they should be prepared for chaos ;) . > Which is exactly the opposite of what they should expect from the official > branch ;) We should probably all write something on the wiki page for our personal forks, describing what you're using it for, what are the main branches likely to be of interest etc. Peter From bartek at rezolwenta.eu.org Wed Mar 25 12:33:13 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Mar 2009 17:33:13 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> Message-ID: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com> 2009/3/25 Peter : > > We should probably all write something on the wiki page for our > personal forks, describing what you're using it for, what are the main > branches likely to be of interest etc. Hi, I'll be happy to write some draft version of guidelines for developers and contributors to the wiki. It just seems that currently there are some problems with the biopython wiki. Does anyone know what the problem is? Is it some kind of internal OBF issue or is it because of increased interest in biopython after the application note was published? Do we have access to any access statistics for the website? 
cheers Bartek From biopython at maubp.freeserve.co.uk Wed Mar 25 12:41:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 16:41:00 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com> Message-ID: <320fb6e00903250941o6e99e06egb672b62f2d661e15@mail.gmail.com> On Wed, Mar 25, 2009 at 4:33 PM, Bartek Wilczynski wrote: > > 2009/3/25 Peter : >> >> We should probably all write something on the wiki page for our >> personal forks, describing what you're using it for, what are the main >> branches likely to be of interest etc. > > Hi, > > I'll be happy to write some draft version of guidelines for developers > and contributors to the wiki. Certainly add a section to the git migration page. > It just seems that currently there are some problems with the biopython > wiki. Does anyone know what the problem is? > Is it some kind of internal OBF issue or is it because of increased > interest in biopython after the application note was > published? Do we have access to any access statistics for the website? It seems to be all the OBF pages (e.g. bioperl.org too), and it's been more than an hour so I'll drop their support team an email. 
Peter From sbassi at clubdelarazon.org Wed Mar 25 12:59:28 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Wed, 25 Mar 2009 13:59:28 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> Message-ID: <9e2f512b0903250959h26081e4ak3246252d02be2ee0@mail.gmail.com> On Wed, Mar 25, 2009 at 7:01 AM, Peter wrote: .... > Sebastian - could you have a quick play with this github code (using the new > UnknownSeq class), and the current CVS code (using None), and make sure > both support the slicing operations you were trying earlier? Thanks. OK, I'll try both today and report back to the list. From eric.talevich at gmail.com Wed Mar 25 17:44:30 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 25 Mar 2009 17:44:30 -0400 Subject: [Biopython-dev] PDB tidy script In-Reply-To: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com> References: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com> Message-ID: <3f6baf360903251444l3064963bp788750ed7a67e4d4@mail.gmail.com> On Mon, Mar 23, 2009 at 5:05 PM, Peter wrote: > > If you look back over the history, there initially was no header parsing, > it was a contribution from Kristian Rother, and I would agree, it is rather > disjoint from the rest of the code. One thing I personally wanted last > time I was working with PDB files was to have secondary structure > information (for them alpha and beta sheet lines in the header) > mapped onto the residue objects automatically. 
> > And yes, Thomas is supporting the PDB module, but his time has > been rather limited of late. When I asked him about some of the > open enhancement requests in bugzilla recently (off list) he > said we needed "a separate class to parse all the info in the header, > not a slew of additions to the core parser class (which is designed > to deal with the 3D data only)." > > I can understand both those wishes. Looking at the features currently available in the module, the best approach might be to leave the 3D parser and PDB.Entity-derived classes alone and add another wrapper class containing the header, sequence (maybe), secondary and tertiary structure as separate attributes. When working in the REPL, I've wished for a simpler function to load PDB files by path and figure out the name automatically; this would be an easy way to do it without violating Thomas's parser -- just use parse_pdb_header() in the wrapper, and use the name from there as the first argument to PDB.get_structure(). For example (quick & dirty):

    import os
    from Bio.PDB import PDBParser
    from Bio.PDB.parse_pdb_header import parse_pdb_header

    class PDBLoader:
        def __init__(self, path):
            # Expose every header field as an attribute of the loader
            self.__dict__ = parse_pdb_header(path)
            if not self.name:
                # Fall back to the file name, minus its extension
                self.name = os.path.basename(path).split('.')[0]
            parse_3d = PDBParser()
            self.structure = parse_3d.get_structure(self.name, path)
            # self.secondary = ?
            # link 1/2/3ary data in various ways
            ...

    >>> pdb = PDBLoader('a_structure.pdb')
    >>> dir(pdb)
    ['__doc__', '__init__', '__module__', 'author', 'compound', 'deposition_date', 'head', 'journal_reference', 'name', 'release_date', 'resolution', 'source', 'structure', 'structure_method', 'structure_reference']

In that case, it would be reasonable to let get_structure and parse_pdb_header take an open file-like object as an alternative to the PDB file's path to avoid opening and closing the same file repeatedly. There's also some cleanup to do in parse_pdb_header.py alongside this. Does this sound reasonable? 
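The "path or open file-like object" idea can be sketched without touching Bio.PDB at all. What follows is a minimal, stdlib-only sketch under stated assumptions: open_or_passthrough and parse_both are hypothetical helper names, and header_parser / structure_parser stand in for any two callables that read from an open handle. Nothing here is existing Biopython API.

```python
from contextlib import contextmanager


@contextmanager
def open_or_passthrough(source, mode="r"):
    """Yield a readable handle for either a path or an already open handle.

    Hypothetical helper: a handle passed in by the caller is left open,
    while a path is opened here and closed again on exit.
    """
    if hasattr(source, "read"):
        yield source
    else:
        with open(source, mode) as handle:
            yield handle


def parse_both(source, header_parser, structure_parser):
    """Run two parsers over the same file while opening it only once.

    header_parser and structure_parser are assumed to take an open handle;
    the handle is rewound between the two passes.
    """
    with open_or_passthrough(source) as handle:
        header = header_parser(handle)
        handle.seek(0)  # the second parser starts from the top again
        structure = structure_parser(handle)
    return header, structure
```

One consequence of this design is that the caller decides who owns the handle, so the same code path also works for in-memory data such as io.StringIO objects, provided the handle is seekable.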
-Eric From chapmanb at 50mail.com Wed Mar 25 17:55:48 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 25 Mar 2009 17:55:48 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> Message-ID: <20090325215548.GB21577@sobchak.mgh.harvard.edu> Hey all; Good discussion on this; I touch on a few points from different threads below. Michiel: > I haven't been following this topic closely, and as an "outsider" > using git seems more complicated than using cvs or svn. And to be > honest, I don't know if Biopython actually needs the branching and > forking stuff. I think that this is more useful for bigger projects, > where multiple developers may be working on interrelated parts of code > at the same time. That hardly ever happens in Biopython, though. Tiago: > I would actually take this argument and reverse it: [...] > Using a distributed technology allows for people to try new ideas and > to get things moving (while still maintaining an official rock stable > version with maybe glacial policies). I fall in between these two viewpoints. Git has more complications and, unless we manage those, we risk introducing additional barriers to contribution. Imagine looking at biopython on git hub and seeing 10 different branches for different users, many of which may be old and out of date. This could lead to the impression that we are not organized toward a single goal. If you are still interested, how do you know which ones could use your help and what they are for? The solution to this is documentation on the wiki. 
We rely too much on the mailing list and expect people to keep up. Peter read my mind on this: Peter: > We should probably all write something on the wiki page for our > personal forks, describing what you're using it for, what at the main > branches likely to be of interest etc. I started a page over the weekend doing this: http://biopython.org/wiki/Active_projects It's a skeleton so add or subtract away. My idea for this is that it is for longer projects that could use outside help. It's not reasonable to spend time writing up things you'll be finishing in a week or so; for that bugzilla does fine keeping interested parties up to date. Another idea on this page is a specific wish list of libraries for future work. This is a starting point for anyone who comes into Biopython fresh and would like to take something on. Also, it encourages people who have developed external libraries to deal with problems we are interested in to consider folding them into Biopython. Me: > > There is a lot of good material in this thread for new potential > > developers. Tiago, it would make sense to condense what you've > > written and include it with the Contributing guide: Tiago: > Just a followup on this: I think it makes no sense to put much of the > new content before there is an official step of moving to github. We are serious about moving to Git and need to have the documentation in place so others can learn it. You wrote up a lot of good stuff, and it will be lost on the mailing list. Brad From bugzilla-daemon at portal.open-bio.org Wed Mar 25 18:43:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 25 Mar 2009 18:43:57 -0400 Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. 
for QUAL files) In-Reply-To: Message-ID: <200903252243.n2PMhvoT007523@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2799 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-25 18:43 EST ------- I've made my first attempt at this available as a personal branch on github, http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at clubdelarazon.org Wed Mar 25 19:15:05 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Wed, 25 Mar 2009 20:15:05 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> Message-ID: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> On Wed, Mar 25, 2009 at 7:01 AM, Peter wrote: > Sebastian - could you have a quick play with this github code (using the new > UnknownSeq class), and the current CVS code (using None), and make sure > both support the slicing operations you were trying earlier? Thanks. First I tried the CVS code (with None in seq), it worked. Then I tried the git code and it also worked. One thing I noticed is that I got "?" instead of "N" the "sequence" of the UnknownSeq. 
From a practical point of view, both versions are the same, but the concept of UnknownSeq looks more solid than None, because if I didn't know about biopython internals, I would never try to slice a None seq. With "None", len(s) returns:

    Traceback (most recent call last):
      File "/home/sbassi/bioinfo/INTA/qualparser.py", line 21, in <module>
        print len(s)
      File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py", line 481, in __len__
        return len(self.seq)
    TypeError: object of type 'NoneType' has no len()

So I would never try to do:

    new_s = s[10:30]

But with the UnknownSeq object, len(s) returns an actual length, so it is more intuitive that it can be sliced. I liked the github interface, may I set up my own repository? Best, -- Sebastián Bassi. Diplomado en Ciencia y Tecnología. Non standard disclaimer: READ CAREFULLY. By reading this email, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
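To make the contrast above concrete, here is a minimal stand-alone sketch of a placeholder sequence that knows its length but not its letters. The class name UnknownSeqSketch and its "?" default are assumptions made for illustration only; this is not the UnknownSeq code from the github branch, and it is written for Python 3 rather than the Python 2.5 used in the thread.

```python
class UnknownSeqSketch:
    """Placeholder for a sequence of known length but unknown content."""

    def __init__(self, length, character="?"):
        if length < 0:
            raise ValueError("length must not be negative")
        self._length = length
        self._character = character

    def __len__(self):
        # Unlike None, the placeholder can report a length
        return self._length

    def __str__(self):
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing yields another placeholder of the right size
            new_length = len(range(*index.indices(self._length)))
            return UnknownSeqSketch(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character
```

With an object like this, s[10:30] behaves the way a user would expect from any sequence, whereas a None placeholder fails as soon as len() is called on it.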
From biopython at maubp.freeserve.co.uk Wed Mar 25 19:30:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 23:30:14 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> Message-ID: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com> On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi: >> Sebastian - could you have a quick play with this github code (using the new >> UnknownSeq class), and the current CVS code (using None), and make sure >> both support the slicing operations you were trying earlier? Thanks. > > First I tried the CVS code (with None in seq), it worked. OK, good. That will do in the very short term - the UnknownSeq needs some more testing and general approval before I'd check that in. > Then I tried the git code and it also worked. One thing I noticed is > that I got "?" instead of "N" as the "sequence" of the UnknownSeq. I felt we shouldn't use an "N" unless we are confident the sequence is nucleotides. In practice, this is probably a safe assumption for FASTQ and QUAL files - unless anyone can think of a counter example? Do you think it is safe to assume FASTQ and QUAL files are just for nucleotides? I mean, you could translate a CDS from transcriptome sequencing, and for the sake of argument give each amino acid a quality score from the three nucleotide quality scores, and then save this as a protein FASTQ file. 
But I've never heard of anyone actually doing this ;) > From a practical point of view, both versions are the same, but the > concept of UnknownSeq looks solid than None, because if I don't know > about about biopython internals, I would never try to slice a None > seq. With "None": > len(s) returns: > > Traceback (most recent call last): > ... > TypeError: object of type 'NoneType' has no len() > > So I would never try to do: > new_s = s[10:30] > > But with the UnknownSeq object, len(s) returns an actual length, so it > is more intuitive that it can be sliced. I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord __getitem__ code nicer, and it means you can do len(SeqRecord) too, which was problematic if the sequence was None. > > I liked the github interface, may I setup my own repository? > Yes - this is one of the nice things about git, it makes it easy for anyone to make their own local branch of Biopython, but keep it under version control and pull in changes from the master branch (or another git user) quite easily. It should also make it easy to offer changes back to the main project (assuming we do switch to hosting it on git, for now it is still being done via CVS). However, bear in mind this is still only a test migration, and it is still possible we'll have to redo the CVS to git migration. 
There is a long (and on going) thread on this mailing list about all this already, with an evolving wiki page: http://biopython.org/wiki/GitMigration Peter From bartek at rezolwenta.eu.org Wed Mar 25 21:02:59 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 26 Mar 2009 02:02:59 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> Message-ID: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> On Wed, Mar 25, 2009 at 10:55 PM, Brad Chapman wrote: > Hey all; > Good discussion on this; I touch on a few points from different > threads below. > Indeed, I'm very happy that we got the ball rolling and more people now take part in the discussion. > I fall in between these two viewpoints. Git has more complications and, > unless we manage those, we risk introducing additional barriers to > contribution. Imagine looking at biopython on git hub and seeing 10 > different branches for different users, many of which may be old and > out of date. This could lead to the impression that we are not > organized toward a single goal. If you are still interested, how > do you know which ones could use your help and what they are for? > > The solution to this is documentation on the wiki. We rely too much on > the mailing list and expect people to keep up. Peter read my mind on > this: > > Peter: >> We should probably all write something on the wiki page for our >> personal forks, describing what you're using it for, what at the main >> branches likely to be of interest etc. 
> > I started a page over the weekend doing this: > > http://biopython.org/wiki/Active_projects > > It's a skeleton so add or subtract away. My idea for this is that it > is for longer projects that could use outside help. It's not reasonable > to spend time writing up things you'll be finishing in a week or so; for > that bugzilla does fine keeping interested parties up to date. > > Another idea on this page is a specific wish list of libraries for > future work. This is a starting point for anyone who comes into > Biopython fresh and would like to take something on. Also, it encourages > people who have developed external libraries to deal with problems we > are interested in to consider folding them into Biopython. Great ideas. I fully agree that we need clear documentation if we want more people to contribute. > > Me: >> > There is a lot of good material in this thread for new potential >> > developers. Tiago, it would make sense to condense what you've >> > written and include it with the Contributing guide: > > Tiago: >> Just a followup on this: I think it makes no sense to put much of the >> new content before there is an official step of moving to github. > > We are serious about moving to Git and need to have the documentation in > place so others can learn it. You wrote up a lot of good stuff, and it > will be lost on the mailing list. Continuing on that topic. I think there are three (more or less separate) issues here: 1) Describing git usage technically, to make sure all developers have a smooth transition to git from CVS 2) Describing typical ways to use git in biopython. This is very important to clarify how we are going to use cool features of git/github in biopython. I'm not advocating writing it very precisely and I'm fully aware that it's going to change over time as we learn to use things better, but writing things up will help us understand how we want to use git/github. 
3) General contributing guide with coding style and testing framework etc. I think that point 3 is quite well separated from the other two points, which are more git related. I think it is also nicely handled by the current wiki page: http://biopython.org/wiki/Contributing. It might be mildly adapted to include some info on git branches, but these will be minor things. Points 1 and 2 are not so easily separable, but I don't think it's a major problem. The current version of the http://biopython.org/wiki/GitMigration page touches upon them, but it is meant as temporary info, so it does not describe how things should be done after we really make the switch. I think we need to separate these issues (temporary arrangements vs. final desired procedures), so I made a new wiki page: http://biopython.org/wiki/GitUsage which is meant as an early draft of such guidelines. This page is meant to serve as a technical tutorial describing typical tasks in biopython development. Please feel free to modify/expand this page and/or send comments to the mailing list. I've tried to keep it close to our current development model, but there is a lot of room for discussion and I'm very open to new ideas. cheers Bartek From lpritc at scri.ac.uk Thu Mar 26 07:21:26 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 26 Mar 2009 11:21:26 +0000 Subject: [Biopython-dev] Biopython on Twitter Message-ID: Hi all, There's a fair old bit of chatter on the latest bandwagon: Twitter, about Biopython (http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython). 
Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it might be useful to have a Biopython Twitter account as a way of getting news out automatically (there's a python-twitter API: http://code.google.com/p/python-twitter/), and as a way of facilitating conversation or community around Biopython - suitable representatives of the official edifice/holders of the password no doubt to be discussed ;) Anyhoo, to avoid it being squatted in the interim, I've set up an account in Biopython's name, with Peter's email account (thanks, Peter) - he also knows the password. If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of Gopher and OS/2 Warp in short order, it can just die on the vine - but given the number of tweets mentioning Biopython, it would be a shame for that to happen too soon ;) The Biopython Twitter home page is at http://twitter.com/Biopython L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. 
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________ From tiagoantao at gmail.com Thu Mar 26 08:13:20 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 26 Mar 2009 12:13:20 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903260513v734b5dd8kd8d148bebec9674b@mail.gmail.com> Hi, On Wed, Mar 25, 2009 at 9:55 PM, Brad Chapman wrote: > The solution to this is documentation on the wiki. We rely too much on > the mailing list and expect people to keep up. Peter read my mind on > this: I fully agree on this. There is lots of implicit policy that is either not documented at all or only to be read here on the mailing list. All should be on the wiki. Clear, transparent, explicit, for everybody to see (at least that is my personal opinion). > We are serious about moving to Git and need to have the documentation in > place so others can learn it. You wrote up a lot of good stuff, and it > will be lost on the mailing list. I am planning on changing http://biopython.org/wiki/PopGen_dev and "GITify" it completely. 
I will draft a document with a policy for updates (just as a starting point, please feel free to disagree), the currently existing branches and so on. I will include a set of tips on how to pull stuff from GIT. Regarding this part I note: a. maybe this can be moved, in the future, to the general biopython documentation b. I am far from being a git specialist. Corrections will surely be needed and encouraged. I will write back here when the changes are done. Tiago From jblanca at btc.upv.es Thu Mar 26 08:24:59 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 26 Mar 2009 13:24:59 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com> Message-ID: <200903261324.59655.jblanca@btc.upv.es> First of all, sorry for sending the last mail to the BioPython general list. On Thursday 26 March 2009 13:05:25 Peter wrote: > Can you give me an example of where you want to pull out a single > character from a SeqRecord, and its quality? I would consider things > like this quite elegant:
>
> for letter, quality in zip(record.seq,
>                            record.letter_annotations["phred_quality"]):
>     # do stuff

I'm implementing a Contig class similar to the Alignment class but with the added capability of supporting sequences that do not start and end at the same position and with the capability of masking the sequences. I'm implementing the __getitem__ method. When I request a column, I take an int slice from each sequence and return the result of adding them all. I could solve the problem as you suggest. The problem is that this Contig class can also work with Seqs and strs (to simplify its use when we don't need a full SeqRecord). If SeqRecord behaved more like a Seq or a str, I wouldn't need to check for the special SeqRecord case in the Contig.__getitem__ method. Best regards, -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From chapmanb at 50mail.com Thu Mar 26 08:57:07 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 26 Mar 2009 08:57:07 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> Message-ID: <20090326125707.GE21577@sobchak.mgh.harvard.edu> Hi all; Bartek: > Continuing on that topic. I think there are three (more or less > separate) issues here: > 1) Describing git usage technically, to make sure all developers have > a smooth transition to git from CVS > 2) Describing typical ways to use git in biopython. [...] > 3) General contributing guide with coding style and testing framework etc. > > I think that point 3 is quite well separated from the other two > points, which are more git related. I think it is also nicely handled > by the current wiki page: http://biopython.org/wiki/Contributing. [...] > Points 1 and 2 are not so easily separable, but I don't think it's a > major problem. Current version of the > http://biopython.org/wiki/GitMigration > touches upon them, but it is meant as a temporary info, so it does > not describe how things should be done after we really make the > switch. I think we need to spearate these issues (temporary > arrangements vs. 
final desired procedures), so I made a new wiki page: > http://biopython.org/wiki/GitUsage > which is meant as an early draft of such guidelines. This page is > meant to serve as a technical tutorial describing typical tasks in > biopython development. Great writeup, and I agree with you on everything up until the last point. Why do we need two pages with overlapping information? This means we have to do more work to keep them in sync and creates confusion. GitMigration is/was our documentation page. If it is the name that makes it seem temporary, we should kill GitMigration and re-route all wiki links to GitUsage. Then we can continue forward with getting the documentation up to par on GitUsage. Having the disclaimer that the page and migration is in process is enough of a warning. When we move to git permanently, we can just remove the warnings, update the final links and we will be done. Brad From tiagoantao at gmail.com Thu Mar 26 09:09:31 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 26 Mar 2009 13:09:31 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> <20090326125707.GE21577@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903260609q247ad2b0o4c810fa7afda7449@mail.gmail.com> I've added some text regarding git on http://biopython.org/wiki/PopGen_dev (see "Code and Contributing" and "Existing Development branches"). Feel free to criticise. 
I've included a link to the wonderful GitUsage page. Giovanni: if you feel that I've deleted/changed something I should not have, please say. On Thu, Mar 26, 2009 at 12:57 PM, Brad Chapman wrote: > Hi all; > > Bartek: >> Continuing on that topic. I think there are three (more or less >> separate) issues here: >> 1) Describing git usage technically, to make sure all developers have >> a smooth transition to git from CVS >> 2) Describing typical ways to use git in biopython. > [...] >> 3) General contributing guide with coding style and testing framework etc. >> >> I think that point 3 is quite well separated from the other two >> points, which are more git related. I think it is also nicely handled >> by the current wiki page: http://biopython.org/wiki/Contributing. > [...] >> Points 1 and 2 are not so easily separable, but I don't think it's a >> major problem. Current version of the >> http://biopython.org/wiki/GitMigration >> touches upon them, but it is meant as temporary info, so it does >> not describe how things should be done after we really make the >> switch. I think we need to separate these issues (temporary >> arrangements vs. final desired procedures), so I made a new wiki page: >> http://biopython.org/wiki/GitUsage >> which is meant as an early draft of such guidelines. This page is >> meant to serve as a technical tutorial describing typical tasks in >> biopython development. > > Great writeup, and I agree with you on everything up until the last > point. Why do we need two pages with overlapping information? This > means we have to do more work to keep them in sync and creates confusion. > GitMigration is/was our documentation page. If it is the name that > makes it seem temporary, we should kill GitMigration and re-route all > wiki links to GitUsage. Then we can continue forward with getting > the documentation up to par on GitUsage. > > Having the disclaimer that the page and migration is in process is > enough of a warning. 
When we move to git permanently, we can just > remove the warnings, update the final links and we will be done. > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- "A man who dares to waste one hour of time has not discovered the value of life" - Charles Darwin From bartek at rezolwenta.eu.org Thu Mar 26 10:49:54 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 26 Mar 2009 15:49:54 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> <20090326125707.GE21577@sobchak.mgh.harvard.edu> Message-ID: <8b34ec180903260749q2b59594fo1d34cd1f721ff3b7@mail.gmail.com> Hi, On Thu, Mar 26, 2009 at 1:57 PM, Brad Chapman wrote: > Great writeup, and I agree with you on everything up until the last > point. Why do we need two pages with overlapping information? This > means we have to do more work to keep them in sync and creates confusion. > GitMigration is/was our documentation page. If it is the name that > makes it seem temporary, we should kill GitMigration and re-route all > wiki links to GitUsage. Then we can continue forward with getting > the documentation up to par on GitUsage. > > Having the disclaimer that the page and migration is in process is > enough of a warning. When we move to git permanently, we can just > remove the warnings, update the final links and we will be done. 
> I agree that two pages with mostly the same stuff is too much. My original idea was to first extract the "non-temporary" info from the GitMigration page and expand it into the GitUsage page. It needs a lot of work but at least the extraction part is done. Now I would suggest not to kill the GitMigration, but to remove most things from it and just leave the stuff relevant for the (hopefully not too long) transitional period. After a second of thought I decided to go ahead and change the GitMigration so that it does not overlap with GitUsage. See for yourself here: http://biopython.org/wiki/GitMigration We can revert the changes if people don't like it. cheers Bartek From biopython at maubp.freeserve.co.uk Thu Mar 26 11:07:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Mar 2009 15:07:33 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <200903261324.59655.jblanca@btc.upv.es> References: <200903261248.02279.jblanca@btc.upv.es> <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com> <200903261324.59655.jblanca@btc.upv.es> Message-ID: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca wrote: > On Thursday 26 March 2009 13:05:25 Peter wrote: >> Can you give me an example of where you want to pull out a single >> character from a SeqRecord, and its quality? I would consider things >> like this quite elegant: >> >> for letter, quality in zip(record.seq, >> record.letter_annotations["phred_quality"]): >> # do stuff > > I'm implementing a Contig class similar to the Alignment class but with the > added capability of supporting sequences that do not start and end at the > same position and with the capability of masking the sequences. > I'm implementing the __getitem__ method. > When I request a column I get for all sequences an int slice and I return the > result of adding them all. I could solve the problem as you suggest. 
The > problem is that this Contig class can work also with Seqs and strs (to > simplify its use when we don't need a full SeqRecord). If SeqRecord behaves > more like a Seq or a str I wouldn't need to check for the special SeqRecord > case in the Contig.__getitem__ method. > Best regards, If you pull out a column from a Seq or string based alignment, there is no annotation to worry about, and you can return the column as a Seq or string. As things stand, if it was a SeqRecord based alignment, having my_string[i], my_seq[i] and my_seqrecord[i] all return a single letter string is actually rather nice for generic code - as long as you are happy returning a Seq or a string for the column. However, if I understand you, when pulling a column from a SeqRecord based alignment in addition to the column's sequence you'd like the get the per-letter-annotations as well. This assumes that all the SeqRecord objects in the alignment have the same per-letter-annotation present - some might have quality and others might not! But how would you want to store this new column object? Using a string or a Seq doesn't support any annotation - you *could* use a SeqRecord with no id, name, description, features, annotation - just a sequence and any common per-letter-annotation. Is this what you had in mind? 
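A minimal sketch of the column idea described above, written against plain Python stand-ins rather than real Biopython objects. The row layout (a dict with a `seq` string and a `letter_annotations` dict) and the function name `column_with_annotations` are assumptions made for illustration only. Per-letter-annotation keys are kept only when every row provides them, which is one possible answer to the "some might have quality and others might not" case:

```python
def column_with_annotations(rows, index):
    """Return (letters, annotations) for one alignment column.

    `rows` is a list of stand-in records with this hypothetical layout:
    {"seq": "ACGT", "letter_annotations": {"phred_quality": [40, 30, 20, 10]}}
    """
    # The column's sequence: one letter taken from each row.
    letters = "".join(row["seq"][index] for row in rows)

    # Keep only the annotation keys that every row provides.
    common = set(rows[0]["letter_annotations"])
    for row in rows[1:]:
        common &= set(row["letter_annotations"])

    # For each surviving key, collect the value at this column from each row.
    annotations = {
        key: [row["letter_annotations"][key][index] for row in rows]
        for key in sorted(common)
    }
    return letters, annotations


# Made-up example data: the second row has an extra "mask" annotation.
rows = [
    {"seq": "ACGT", "letter_annotations": {"phred_quality": [40, 30, 20, 10]}},
    {"seq": "AGGT", "letter_annotations": {"phred_quality": [35, 25, 15, 5],
                                           "mask": [0, 0, 1, 1]}},
]
letters, annotations = column_with_annotations(rows, 1)
print(letters)      # CG
print(annotations)  # {'phred_quality': [30, 25]}
```

The "mask" entry is dropped from the column because only one of the two rows carries it.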
Peter From jblanca at btc.upv.es Thu Mar 26 11:14:13 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 26 Mar 2009 16:14:13 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261324.59655.jblanca@btc.upv.es> <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> Message-ID: <200903261614.13454.jblanca@btc.upv.es> On Thursday 26 March 2009 16:07:33 Peter wrote: > On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca wrote: > > On Thursday 26 March 2009 13:05:25 Peter wrote: > However, if I understand you, when pulling a column from a SeqRecord > based alignment in addition to the column's sequence you'd like the get the > per-letter-annotations as well. This assumes that all the SeqRecord > objects in the alignment have the same per-letter-annotation present - some > might have quality and others might not! But how would you want to store > this new column object? Using a string or a Seq doesn't support any > annotation - you *could* use a SeqRecord with no id, name, description, > features, annotation - just a sequence and any common > per-letter-annotation. Is this what you had in mind? Yes, that's exactly what I have in mind. Do you see any problem with that approach? -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Thu Mar 26 11:32:23 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Mar 2009 15:32:23 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <200903261614.13454.jblanca@btc.upv.es> References: <200903261248.02279.jblanca@btc.upv.es> <200903261324.59655.jblanca@btc.upv.es> <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> <200903261614.13454.jblanca@btc.upv.es> Message-ID: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> On Thu, Mar 26, 2009 at 3:14 PM, Jose Blanca wrote: > On Thursday 26 March 2009 16:07:33 Peter wrote: >> However, if I understand you, when pulling a column from a SeqRecord >> based alignment in addition to the column's sequence you'd like to get the >> per-letter-annotations as well. This assumes that all the SeqRecord >> objects in the alignment have the same per-letter-annotation present - some >> might have quality and others might not! But how would you want to store >> this new column object? Using a string or a Seq doesn't support any >> annotation - you *could* use a SeqRecord with no id, name, description, >> features, annotation - just a sequence and any common >> per-letter-annotation. Is this what you had in mind? > > Yes, that's exactly what I have in mind. Do you see any problem with that > approach? Well yes. For your code to work on SeqRecord objects (based on the verbal description earlier), it needs at least the following changes to the SeqRecord: The SeqRecord __getitem__ would have to return a SeqRecord when given a single integer index, holding a single letter sequence. What about the name/id/description and annotations (e.g. 
organism) - do they really apply to a single letter from the sequence? Technically writing the code to offer this isn't such a problem, but I am unconvinced this is the best behaviour for normal usage. Also closely related to this, what would you expect __iter__ to iterate over? Currently it acts like iteration over the record's sequence. You'd also want the SeqRecord to support __add__ (and __radd__) so that two SeqRecord objects can be added together. I have thought about this before, and it is a *much* more complicated issue due to the meta data. In general the only safe and unambiguous choice is to exclude it from the combined record: * sequence - just add (using normal rules for adding Seq objects) * name/id/description - if the two agree, use that? Otherwise default to a blank value? * annotations - for each keyed value, you could combine the entries? Or just throwing them all away? * letter_annotations - if an entry is present in both you can combine it. Otherwise throw them away? * features - these could be combined, adjusting the locations for one record's features as appropriate I'm not ruling out adding SeqRecord addition, but I don't want to rush it while we are trying to get Biopython 1.50 done. Peter From biopython at maubp.freeserve.co.uk Thu Mar 26 11:49:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Mar 2009 15:49:49 +0000 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: References: Message-ID: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> On Thu, Mar 26, 2009 at 11:21 AM, Leighton Pritchard wrote: > Hi all, > > There's a fair old bit of chatter on the latest bandwagon: Twitter, about > Biopython > (http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython). 
> Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it > might be useful to have a Biopython Twitter account as a way of getting news > out automatically (there's a python-twitter API: > http://code.google.com/p/python-twitter/), and as a way of facilitating > conversation or community around Biopython - suitable representatives of the > official edifice/holders of the password no doubt to be discussed ;) > > Anyhoo, to avoid it being squatted in the interim, I've set up an account in > Biopython's name, with Peter's email account (thanks, Peter) - he also knows > the password. > > If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of > Gopher and OS/2 Warp in short order, it can just die on the vine - but given > the number of tweets mentioning Biopython, it would be a shame for that to > happen too soon ;) > > The Biopython Twitter home page is at http://twitter.com/Biopython Quite a few people have started following this already - which is fun. I see the OBF news page entries are automatically pushed to their twitter account, http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed to http://twitter.com/bioperl - I'll get in touch to see how they did it so we can have the Biopython news feed automatically echoed to twitter as well. This serves as a good point to remind/inform you that there are RSS, Atom etc feeds for the Biopython news - links on http://biopython.org/wiki/News e.g. http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2 http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom We could probably also echo the CVS (or git) RSS feed into twitter, but I suspect that would drown out any more interesting tweets. 
The RSS feed is listed on http://biopython.org/wiki/CVS and shown on the wiki too at: http://biopython.org/wiki/Tracking_CVS_commits (not sure how often this gets updated). The feed itself is here: http://biopython.open-bio.org/CVS2RSS/biopython.rss Peter From lpritc at scri.ac.uk Thu Mar 26 12:31:07 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 26 Mar 2009 16:31:07 +0000 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> Message-ID: Hi all, It's great to see that people have picked up on the Biopython Twitter account already - I hope that it proves useful in the longer term. Regarding the social etiquette of Twitter, and the ease with which 'following' can be taken to imply 'approval' I wonder if it would be a good policy to restrict the Twitter accounts that Biopython follows only to those representing organisations or groups. Following some individuals and not others might be seen to privilege a self-selecting group, cabal or 'elite', even the accidental suggestion of which I think would be best avoided. On 26/03/2009 15:49, "Peter" wrote: > > Quite a few people have started following this already - which is fun. I see > the OBF news page entries are automatically pushed to their twitter account, > http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed > to http://twitter.com/bioperl - I'll get in touch to see how they did > it so we can [...] > We could probably also echo the CVS (or git) RSS feed into twitter, but I > suspect that would drown out any more interesting tweets. Signal to noise is apparently not an issue that bothers very many Tweeters, but I see no harm in starting a trend ;) L. 
-- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________ From jblanca at btc.upv.es Fri Mar 27 04:22:27 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Mar 2009 09:22:27 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> Message-ID: <200903270922.27152.jblanca@btc.upv.es> On Thursday 26 March 2009 16:32:23 Peter wrote: > The SeqRecord __getitem__ would have to return a SeqRecord when given > a single integer index, holding a single letter sequence. What about > the name/id/description and annotations (e.g. organism) - do they > really apply to a single letter from the sequence? Technically > writing the code to offer this isn't such a problem, but I am > unconvinced this is the best behaviour for normal usage. You're right, I was not thinking on the rest of the properties because I don't need them. They're a problem when slicing and adding SeqRecords. But they're also a problem in standard slicing. Should the annotations be kept when the SeqRecord is sliced? Are they still relevant? None of the behaviours will be ok for all the cases. > Also closely related to this, what would you expect __iter__ to > iterate over? Currently it acts like iteration over the record's > sequence. The SeqRecord can already hold a sequence of length one, so we have the same problem. In fact I could do seq_rec[n:n+1] and I would obtain the SeqRecord that I want. > You'd also want the SeqRecord to support __add__ (and __radd__) so > that two SeqRecord objects can be added together. I have thought > about this before, and it is a *much* more complicated issue due to > the meta data. 
In general the only safe and unambiguous choice is to > exclude it from the combined record: > * sequence - just add (using normal rules for adding Seq objects) > * name/id/description - if the two agree, use that? Otherwise default > to a blank value? > * annotations - for each keyed value, you could combine the entries? > Or just throwing them all away? > * letter_annotations - if an entry is present in both you can combine > it. Otherwise throw them away? > * features - these could be combined, adjusting the locations for one > record's features as appropriate As I said before I think that the same problem is presented when you do a slice. If I have the sequence of a gene named X with some annotations and I slice a part, is still be named geneX? Should the annotations be kept? > I'm not ruling out adding SeqRecord addition, but I don't want to rush > it while we are trying to get Biopython 1.50 done. That's quite sensible. I think that is a good thing to discuss all this issues, I keep learning a lot from you. Best regards, -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Fri Mar 27 06:29:10 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 10:29:10 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <200903270922.27152.jblanca@btc.upv.es> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> Message-ID: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> On Fri, Mar 27, 2009 at 8:22 AM, Jose Blanca wrote: > On Thursday 26 March 2009 16:32:23 Peter wrote: > >> You'd also want the SeqRecord to support __add__ (and __radd__) so >> that two SeqRecord objects can be added together. I have thought >> about this before, and it is a *much* more complicated issue due to >> the meta data. In general the only safe and unambiguous choice is to >> exclude it from the combined record: >> * sequence - just add (using normal rules for adding Seq objects) >> * name/id/description - if the two agree, use that? Otherwise default >> to a blank value? >> * annotations - for each keyed value, you could combine the entries? >> Or just throwing them all away? >> * letter_annotations - if an entry is present in both you can combine >> it. Otherwise throw them away? >> * features - these could be combined, adjusting the locations for one >> record's features as appropriate > > As I said before I think that the same problem is presented when you do a > slice. If I have the sequence of a gene named X with some annotations and I > slice a part, is still be named geneX? Should the annotations be kept? 
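For reference, one possible reading of the addition rules quoted above, sketched with plain dicts standing in for SeqRecord objects. This is not Biopython code: the dict layout, the `(start, end, label)` feature tuples and the name `add_records` are all assumptions for illustration, and where the quoted list leaves a question open the sketch simply picks the conservative option.

```python
def add_records(a, b):
    """Combine two stand-in records (plain dicts, hypothetical layout)."""
    offset = len(a["seq"])  # how far the second record's features must shift
    combined = {"seq": a["seq"] + b["seq"]}

    # name/id/description: keep a value only when both records agree on it.
    for key in ("id", "name", "description"):
        combined[key] = a[key] if a[key] == b[key] else ""

    # annotations: the conservative choice, throw them all away.
    combined["annotations"] = {}

    # letter_annotations: concatenate entries present in both, drop the rest.
    combined["letter_annotations"] = {
        key: list(a["letter_annotations"][key]) + list(b["letter_annotations"][key])
        for key in a["letter_annotations"]
        if key in b["letter_annotations"]
    }

    # features: keep both sets, shifting the second record's locations.
    combined["features"] = list(a["features"]) + [
        (start + offset, end + offset, label)
        for start, end, label in b["features"]
    ]
    return combined


# Made-up example data.
left = {"seq": "ACGT", "id": "X1", "name": "geneX", "description": "first part",
        "annotations": {"source": "example"},
        "letter_annotations": {"phred_quality": [40, 40, 30, 30]},
        "features": [(0, 3, "first_codon")]}
right = {"seq": "TTGA", "id": "X1", "name": "geneX", "description": "second part",
         "annotations": {"source": "example"},
         "letter_annotations": {"phred_quality": [20, 20, 10, 10],
                                "mask": [0, 0, 0, 1]},
         "features": [(1, 4, "stop_codon")]}

merged = add_records(left, right)
print(merged["seq"])       # ACGTTTGA
print(merged["name"])      # geneX
print(merged["features"])  # [(0, 3, 'first_codon'), (5, 8, 'stop_codon')]
```

Here the descriptions differ, so the merged description is blank, and the "mask" per-letter annotation is dropped because only one record has it.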
The problems about the annotation when slicing a SeqRecord are similar, but I think things are worse when adding two SeqRecords together. For slicing, there are a few sub-cases:

- per-letter-annotation can be sliced too - easy.
- features - we retain only features fully inside the new sub-sequence (the borderline features which cross the slice boundary are a small problem - excluding them is the simplest solution to code and explain).
- id/name - debatable. Currently kept.
- description - debatable. Consider a description which says "whole genome", that doesn't really apply to a partial sequence. On the other hand, it may. Currently kept for the sub-record.
- annotations - again debatable. Without context information, we can't guess. The only sensible options are keep it all (as in CVS) or none of it.

I think it is worth keeping the id/name in general (consider typical use cases like cropping a domain from a gene, or cropping columns off an alignment). I would be OK with dropping the contents of the annotations dictionary and description in order to avoid ambiguity, but this would prevent certain tasks.

Peter

From sbassi at clubdelarazon.org Fri Mar 27 09:31:01 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Fri, 27 Mar 2009 10:31:01 -0300 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> Message-ID: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>

On Fri, Mar 27, 2009 at 7:29 AM, Peter wrote:
....
> - id/name - debatable. Currently kept.
> - description - debatable. Consider a description which says "whole genome",
> that doesn't really apply to a partial sequence. On the other hand, it may.
> Currently kept for the sub-record. I think it is up to the user to keep the id/name/description fields updated when slicing a sequence. ..... > I would be OK with dropping the contents of the annotations dictionary and > description in order to avoid ambiguity, but this would prevent certain tasks. Another option is to make this behavior optional (I mean, select to keep or to drop the annotations, but by default I would drop them). From biopython at maubp.freeserve.co.uk Fri Mar 27 09:57:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 13:57:30 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> Message-ID: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> On Fri, Mar 27, 2009 at 1:31 PM, Sebastian Bassi wrote: > I think it is up to the user to keep the id/name/description fields > updated when slicing a sequence. If you make a new SeqRecord by first slicing a Seq object (which is how you have to do it with Biopython 1.49 or older), then dealing with ALL the annotation is explicitly in the hands of the user. Or are you saying when slicing a SeqRecord you wouldn't expect the id/name/description to be preserved for the sub-record? > ..... >> I would be OK with dropping the contents of the annotations >> dictionary and description in order to avoid ambiguity, but >> this would prevent certain tasks. > > Another option is to make this behavior optional (I mean, select to > keep or to drop the annotations, but by default I would drop them). How would you make it optional? As an extra non-standard argument to __getitem__? 
e.g. something like my_record[10:50, annotation=False]? That seems nasty. I am sympathetic to dropping the annotations dictionary when creating a "child" SeqRecord when slicing its parent. There is also the database cross reference list (which I forgot in my last email). Again, I wouldn't object to dropping this for a sliced sub-record. If we did drop the annotations and dbxrefs when slicing, the user can manually choose to explicitly copy them from the parent object if they do want them. Peter From jblanca at btc.upv.es Fri Mar 27 10:02:57 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Mar 2009 15:02:57 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> Message-ID: <200903271502.57872.jblanca@btc.upv.es> On Friday 27 March 2009 14:57:30 Peter wrote: > How would you make it optional? As an extra non-standard argument > to __getitem__? e.g. something like my_record[10:50, annotation=False]? > That seems nasty. That's very nasty, not pythonic, and adds complexity to the API. > I am sympathetic to dropping the annotations dictionary when creating > a "child" SeqRecord when slicing its parent. There is also the database > cross reference list (which I forgot in my last email). Again, I wouldn't > object to dropping this for a sliced sub-record. > > If we did drop the annotations and dbxrefs when slicing, the user can > manually choose to explicitly copy them from the parent object if they > do want them. I also think that dropping all that stuff when slicing or adding is the best behaviour. -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From sbassi at clubdelarazon.org Fri Mar 27 10:17:55 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Fri, 27 Mar 2009 11:17:55 -0300 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> Message-ID: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> On Fri, Mar 27, 2009 at 10:57 AM, Peter wrote: > How would you make it optional? As an extra non-standard argument > to __getitem__? e.g.something like my_record[10:50, annotation=False]? > That seems nasty. Yes it is nasty this way, I never meant to do it in __getitem__. Anyway I can't think a nice and intuitive way to do it. > If we did drop the annotations and dbxrefs when slicing, the user can > manually choose to explicitly copy them from the parent object if the > do want them. Yes, that is OK. 
From biopython at maubp.freeserve.co.uk Fri Mar 27 10:24:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 14:24:13 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> Message-ID: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi wrote: > On Fri, Mar 27, 2009 at 10:57 AM, Peter wrote: >> How would you make it optional? As an extra non-standard argument >> to __getitem__? e.g. something like my_record[10:50, annotation=False]? >> That seems nasty. > > Yes it is nasty this way, I never meant to do it in __getitem__. > Anyway I can't think a nice and intuitive way to do it. Me neither right now. >> If we did drop the annotations and dbxrefs when slicing, the user can >> manually choose to explicitly copy them from the parent object if they >> do want them. > > Yes, that is OK. Jose agrees, so that makes a mini consensus (at least amongst everyone who has tried the CVS code and posted to this thread). I've made that change in CVS, see Bio/SeqRecord.py revision 1.31. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython As I said before, I want to preserve the id and name - preserving these would be key for cross referencing the sub-record back to its parent. Do either of you think we should also discard the description? 
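The slicing behaviour agreed in this thread can be summarised in a rough sketch. It uses a plain dict as a stand-in for a SeqRecord, so the layout, the `(start, end, label)` feature tuples and the name `slice_record` are assumptions for illustration and not the code in Bio/SeqRecord.py. It also assumes a simple non-negative `start:end` slice with no step, and that retained features are renumbered relative to the slice.

```python
def slice_record(record, start, end):
    """Slice a stand-in record (plain dict, hypothetical layout)."""
    return {
        "seq": record["seq"][start:end],
        # id and name are kept so the sub-record can be traced to its parent.
        "id": record["id"],
        "name": record["name"],
        # description is kept here; whether to drop it is still an open question.
        "description": record["description"],
        # annotations and dbxrefs are NOT carried over.
        "annotations": {},
        "dbxrefs": [],
        # per-letter annotations are sliced along with the sequence.
        "letter_annotations": {
            key: values[start:end]
            for key, values in record["letter_annotations"].items()
        },
        # only features fully inside the slice survive, shifted to the
        # sub-record's coordinates (the shift is an assumption of this sketch).
        "features": [
            (f_start - start, f_end - start, label)
            for f_start, f_end, label in record["features"]
            if start <= f_start and f_end <= end
        ],
    }


# Made-up example data.
parent = {
    "seq": "ACGTACGTAC", "id": "X1", "name": "geneX",
    "description": "complete sequence",
    "annotations": {"topology": "circular"},
    "dbxrefs": ["Project:10638"],
    "letter_annotations": {"phred_quality": list(range(10))},
    "features": [(0, 3, "a"), (4, 8, "b"), (6, 10, "c")],
}
child = slice_record(parent, 3, 9)
print(child["seq"])                 # TACGTA
print(child["features"])            # [(1, 5, 'b')]
print(child["letter_annotations"])  # {'phred_quality': [3, 4, 5, 6, 7, 8]}
print(child["annotations"], child["dbxrefs"])  # {} []
```

Features "a" and "c" cross the slice boundaries and are excluded, which is the simple rule described earlier in the thread.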
Peter From eric.talevich at gmail.com Fri Mar 27 11:16:19 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 27 Mar 2009 11:16:19 -0400 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> Message-ID: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com> On Fri, Mar 27, 2009 at 10:24 AM, Peter wrote: > On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi > wrote: > > On Fri, Mar 27, 2009 at 10:57 AM, Peter > wrote: > >> How would you make it optional? As an extra non-standard argument > >> to __getitem__? e.g.something like my_record[10:50, annotation=False]? > >> That seems nasty. > > > > Yes it is nasty this way, I never meant to do it in __getitem__. > > Anyway I can't think a nice and intuitive way to do it. > > Me neither right now. > > >> If we did drop the annotations and dbxrefs when slicing, the user can > >> manually choose to explicitly copy them from the parent object if the > >> do want them. > > > > Yes, that is OK. > > One way to allow non-default options for adding and slicing is to provide a couple of functions at the class or module level (classmethod, staticmethod, plain ol' function) that have the necessary keyword arguments. These functions would do the same thing by default as the corresponding syntax, and the syntax-friendly magic methods would just pass their arguments straight to these functions. 
This makes the syntax pretty for the common cases, and makes the nonstandard stuff visually obvious. Examples: my_record.slice(10, 50) == my_record[10:50] my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated annotations my_record.add(other_record) == my_record + other_record my_record.add(other_record, annotation=True) == my_record + other_record, keeping annotations my_record.slice(10, 50, annotation=True).add( my_record.slice(100, 200, annotation=True), annotation=True) == my_record[10:50] + my_record[100:200], keeping all annotations (a pain otherwise) From biopython at maubp.freeserve.co.uk Fri Mar 27 11:51:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 15:51:53 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com> Message-ID: <320fb6e00903270851i47db9121p6d272b5f7095a5d3@mail.gmail.com> On Fri, Mar 27, 2009 at 3:16 PM, Eric Talevich wrote: > One way to allow non-default options for adding and slicing is to provide a > couple of functions at the class or module level (classmethod, staticmethod, > plain ol' function) that have the necessary keyword arguments. These > functions would do the same thing by default as the corresponding syntax, > and the syntax-friendly magic methods would just pass their arguments > straight to these functions. 
This makes the syntax pretty for the common
> cases, and makes the nonstandard stuff visually obvious.
>
> Examples:
>
> my_record.slice(10, 50) == my_record[10:50]
> my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated
> annotations
> ...

I think I understand your idea, but I'm not very keen on adding slice and add methods as alternatives to __getitem__ and __add__. As things stand (with CVS after the change an hour ago), if you want the annotations dictionary copied with a slice you must do this explicitly:

>>> from Bio import SeqIO
>>> my_record = SeqIO.read(open("NC_005816.gb"),"genbank")
>>> my_record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.', dbxrefs=['Project:10638'])
>>> len(my_record)
9609
>>> len(my_record.features)
29
>>> len(my_record.annotations)
11
>>> len(my_record.dbxrefs)
1

Doing a slice will not copy/preserve the annotations dict or dbxrefs list:

>>> sub_record = my_record[1000:2000]
>>> sub_record
SeqRecord(seq=Seq('GAAAAAAGAGTATGACGTGCATCTTGATGAAAATCTGGTGAACTTCGACAAACA...GGA', IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.', dbxrefs=[])
>>> len(sub_record)
1000
>>> len(sub_record.features)
2
>>> assert not sub_record.annotations and not sub_record.dbxrefs

You can then choose to blindly reuse the annotations and dbxrefs if you want to:

>>> sub_record.annotations = my_record.annotations #shares the dict
>>> sub_record.dbxrefs = my_record.dbxrefs #shares the list

or as a simple copy:

>>> sub_record.annotations = my_record.annotations.copy()
>>> sub_record.dbxrefs = my_record.dbxrefs[:]

The good thing about this is it makes you think about the annotations, and which (if any) are appropriate to transfer to the sub-record.
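
For readers without the GenBank file to hand, the slicing behaviour described above can be mimicked with a toy class. This is only a sketch: MiniRecord is a hypothetical stand-in written for illustration, not Bio.SeqRecord, and it assumes nothing beyond what the session above shows (id, name and description kept on a slice; annotations and dbxrefs left empty until copied by hand).

```python
class MiniRecord:
    """Toy stand-in for a sequence record (hypothetical, not Bio.SeqRecord)."""

    def __init__(self, seq, id, name, description,
                 annotations=None, dbxrefs=None):
        self.seq = seq
        self.id = id
        self.name = name
        self.description = description
        self.annotations = annotations if annotations is not None else {}
        self.dbxrefs = dbxrefs if dbxrefs is not None else []

    def __getitem__(self, index):
        if not isinstance(index, slice):
            # A single position just returns that letter.
            return self.seq[index]
        # Keep id, name and description so the slice can be traced back
        # to its parent; annotations and dbxrefs start out empty.
        return MiniRecord(self.seq[index], self.id, self.name,
                          self.description)


parent = MiniRecord("ACGTACGTAC", "NC_005816.1", "NC_005816", "demo record",
                    annotations={"organism": "Yersinia pestis"},
                    dbxrefs=["Project:10638"])

sub = parent[2:6]
was_empty = not sub.annotations and not sub.dbxrefs

# Explicit, independent copies (changing one does not affect the other).
sub.annotations = parent.annotations.copy()
sub.dbxrefs = parent.dbxrefs[:]
```

Using copy() and [:] rather than plain assignment means later edits to the sub-record's annotations do not leak back into the parent.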
As per my earlier email, maybe we should do the same with the description? Peter From chapmanb at 50mail.com Sat Mar 28 21:06:52 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Sat, 28 Mar 2009 21:06:52 -0400 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> Message-ID: <20090329010652.GA914@kunkel> Hi all; It is great we are exploring getting news out about Biopython in additional ways. One thing this can really help with is recognizing contributions to Biopython. Another is pointing out interesting discussion threads on the mailing lists and getting others involved. Do you think it would be worthwhile to "advertise" on the main list for someone interested in coordinating news and communication? They could do things like: - Send updates through twitter on day to day activities, like: Bartek and Tiago cleaned up documentation on Git submissions (link to wiki page) Peter, Jose and Sebastian are discussing slicing on SeqRecords (link to mailing list discussion) - Send out monthly news reports on new items in Biopython, in the style of Peter's update recently: http://news.open-bio.org/news/2009/03/biopython-next-gen-sequencing/ (but it should also give credit to the fine people who coded it) Perhaps there are members who are interested in Biopython and follow what is going on but aren't coders. This would be a way to get involved, and also take some of the burden off Peter. What do y'all think? Brad > > It's great to see that people have picked up on the Biopython Twitter > account already - I hope that it proves useful in the longer term. > > Regarding the social etiquette of Twitter, and the ease with which > 'following' can be taken to imply 'approval' I wonder if it would be a good > policy to restrict the Twitter accounts that Biopython follows only to those > representing organisations or groups. 
Following some individuals and not
> others might be seen to privilege a self-selecting group, cabal or 'elite',
> even the accidental suggestion of which I think would be best avoided.
>
> On 26/03/2009 15:49, "Peter" wrote:
> >
> > Quite a few people have started following this already - which is fun. I see
> > the OBF news page entries are automatically pushed to their twitter account,
> > http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
> > to http://twitter.com/bioperl - I'll get in touch to see how they did
> > it so we can
>
> [...]
>
> > We could probably also echo the CVS (or git) RSS feed into twitter, but I
> > suspect that would drown out any more interesting tweets.
>
> Signal to noise is apparently not an issue that bothers very many Tweeters,
> but I see no harm in starting a trend ;)
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From biopython at maubp.freeserve.co.uk Sun Mar 29 18:58:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Mar 2009 23:58:47 +0100
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <20090329010652.GA914@kunkel>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> <20090329010652.GA914@kunkel>
Message-ID: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>

On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman wrote:
> Hi all;
> It is great we are exploring getting news out about Biopython in
> additional ways. One thing this can really help with is recognizing
> contributions to Biopython. Another is pointing out interesting
> discussion threads on the mailing lists and getting others involved.

Do you think the recent release notes and NEWS file entries have been a bit too impersonal? We can certainly be a bit more explicit if people think that is a good thing. For example, should we mention Bartek by name in the paragraph on the new Bio.Motif module? This is linked to from the wiki's news page BTW:

http://biopython.open-bio.org/SRC/biopython/NEWS
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython

> Do you think it would be worthwhile to "advertise" on the main list
> for someone interested in coordinating news and communication?
> ...
Perhaps there are members who are interested in Biopython and > follow what is going on but aren't coders. This would be a way to > get involved, ... Are you up for the job yourself Brad? From your own blog we know you can and do write regularly anyway ;) Would you like an account on the OBF news server? Email me off list and we can sort that out. In terms of micro-blogging via twitter, you sound like you have a better feel for this than me - I don't even have a personal twitter account. Monthly news posts (perhaps cc'd to the announcement email list) would be a nice idea - especially if we can encourage more lurkers to speak up. For a while BioPerl had something like this going (digest emails or something), but it needs a pretty dedicated person or team. In the meantime as you've noticed I've started making more use of our news facility myself... Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 06:26:09 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 11:26:09 +0100 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 Message-ID: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> Hi all, NumPy 1.3 is about to be released, so we should try and make sure the forthcoming Biopython 1.50 release works with it. Of particular interest, this will be the first version of NumPy to support Python 2.6 on Windows, so we will hopefully be able to include a Python 2.6 Windows installer for Biopython 1.50 :) There is a release candidate out for NumPy 1.3, but so far no Windows installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3 beta release instead. The good news is everything seems to compile with MinGW, but unfortunately test_Cluster.py is failing on the second line of Bio/Cluster/__init__.py, "from cluster import *". This could be a hiccup with NumPy itself - I am using their beta after all, or perhaps they have changed something. 
To try and narrow down the problem, has anyone else tried NumPy 1.3 (beta or release candidate) with the latest Biopython from CVS (on any platform)?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk Mon Mar 30 06:29:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:29:02 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
Message-ID: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:26 AM, Peter wrote:
> Hi all,
>
> NumPy 1.3 is about to be released, so we should try and make sure the
> forthcoming Biopython 1.50 release works with it. Of particular interest,
> this will be the first version of NumPy to support Python 2.6 on Windows,
> so we will hopefully be able to include a Python 2.6 Windows installer
> for Biopython 1.50 :)
>
> There is a release candidate out for NumPy 1.3, but so far no Windows
> installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3
> beta release instead.

David Cournapeau has just updated sourceforge - so I will try again with the actual release candidate instead of just the beta...

Peter

From biopython at maubp.freeserve.co.uk Mon Mar 30 06:38:58 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:38:58 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
Message-ID: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:29 AM, Peter wrote:
> David Cournapeau has just updated sourceforge - so I will try again with
> the actual release candidate instead of just the beta...
Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows XP, Python 2.6 using the python.org installer, with Biopython compiled with cygwin mingw32 as normal, same error - test_Cluster.py is failing on the second line of Bio/Cluster/__init__.py, "from cluster import *".

So the question stands - has anyone else tried Biopython (from CVS) with NumPy 1.3 (beta or release candidate) on any platform? I should be able to check it tonight on a Linux machine myself without too much trouble... but a few more data points wouldn't hurt ;)

Peter

From biopython at maubp.freeserve.co.uk Mon Mar 30 07:15:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 12:15:06 +0100
Subject: [Biopython-dev] test_Nexus.py and NamedTemporaryFile mode
Message-ID: <320fb6e00903300415i350610c0i4c2aeed1834011da@mail.gmail.com>

I've been running the test suite again on Windows, and was reminded of this open issue with NamedTemporaryFile on Windows...

On Fri, Feb 13, 2009 at 5:02 PM, Peter wrote:
> On Tue, Feb 10, 2009 at 11:25 AM, Michiel de Hoon wrote:
>>
>>> The test_Nexus tearDown used to make sure the temp output
>>> files were removed. This is important on Windows which
>>> does not do this automatically. I see you now allocate
>>> "random" filenames using tempfile.NamedTemporaryFile(...)
>>> so presumably we would need to record these so that the
>>> tearDown method knows what temp files to remove.
>>
>> From reading the Python documentation, the file created by
>> tempfile.NamedTemporaryFile is removed automatically
>> when the file handle is closed, even on Windows.
>
> That's good to know. On a related point, I've just found
> test_Nexus.py is failing on Windows XP with Python 2.6 (but is fine
> with Python 2.3, 2.4 and 2.5):
>
> C:\repository\biopython\Tests>c:\python26\python test_Nexus.py
> Test Nexus module ... ERROR
> Test Tree module. ... ok
>
> ======================================================================
> ERROR: Test Nexus module
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Nexus.py", line 114, in test_NexusTest1
>     f1=tempfile.NamedTemporaryFile(mode='r+w+b')
>   File "c:\python26\lib\tempfile.py", line 445, in NamedTemporaryFile
>     file = _os.fdopen(fd, mode, bufsize)
> OSError: [Errno 22] Invalid argument
>
> ----------------------------------------------------------------------
> Ran 2 tests in 0.016s
>
> FAILED (errors=1)

You can recreate this at the python 2.6 prompt with the one line:

f1=tempfile.NamedTemporaryFile(mode='r+w+b')

I couldn't solve this from looking at the Python documentation, but after some Google searching the answer seems to be just to use the default mode (w+b):

f1=tempfile.NamedTemporaryFile()

This works on Windows with Python 2.3 to 2.6, and also works on Mac OS X and Linux too (only one version of Python tested here). Fix checked into CVS.

Peter

From cy at cymon.org Mon Mar 30 07:42:00 2009
From: cy at cymon.org (Cymon Cox)
Date: Mon, 30 Mar 2009 12:42:00 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
Message-ID: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>

Hi Folks,

I've been trying to formalize a bunch of randomly scattered bits of code to support the use of the alignment programme Muscle (http://www.drive5.com/muscle/). I prefer to use this software in preference to Clustalw - subjectively, it seems to give the most accurate alignments. (Whether Biopython would want to support a second alignment programme/external dependency is another matter...)

Anyway, while doing so, I realised just how awkward the current interface to Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.
Currently, if we have a bunch of SeqRecords, say after downloading from GenBank or being pulled from a BioSQL db, we have to write them to disk and call clustalw on the file: >>> from Bio import Clustalw >>> from Bio.Clustalw import MultipleAlignCL >>> cline = MultipleAlignCL("f002", command="clustalw") >>> align = Clustalw.do_alignment(cline) It seems to me more appropriate to be able to call clustalw directly on a bunch of SeqRecords: eg (suggested implementation) >>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) >>> from Bio.Align import MultipleAlignment >>> align = MultipleAlignment(records, executable="clustalw") Secondly, the biopython interface does not support calling Clustalw to perform profile alignments, (suggested implementation) # The scaffold alignment: >>> align = AlignIO.read(open("blah.nex", "r"), "nexus") # The sequences we want to add to it: >>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) >>> from Bio.Align import ProfileAlignment >>> align = ProfileAlignment(align, records, executable="clustalw") Calls to MultipleAlignment and ProfileAlignment would take a **options parameter to collect any additional command line options. Thirdly, should an alignment object have a Alignment.refine_alignment(executable="clustalw") method? Any thoughts? Cheers, C. -- ____________________________________________________________________ Cymon J. 
Cox Centro de Ciencias do Mar Faculdade de Ciencias do Mar e Ambiente (FCMA) Universidade do Algarve Campus de Gambelas 8005-139 Faro Portugal Phone: +0351 289800909 ext 7909 Fax: +0351 289800051 Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com HomePage : http://biology.duke.edu/bryology/cymon.html -8.63/-6.77 From chapmanb at 50mail.com Mon Mar 30 09:00:27 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 30 Mar 2009 09:00:27 -0400 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com> <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> Message-ID: <20090330130027.GB36526@sobchak.mgh.harvard.edu> Hi Peter; Things work on FreeBSD 7.1 with python2.5 and the numpy release candidate: > python2.5 Python 2.5.4 (r254:67916, Feb 18 2009, 08:20:57) [GCC 4.2.1 20070719 [FreeBSD]] on freebsd7 >>> import numpy >>> numpy.__version__ '1.3.0rc1' > python2.5 test_Cluster.py test_clusterdistance (__main__.TestCluster) ... ok test_distancematrix_kmedoids (__main__.TestCluster) ... ok test_kcluster (__main__.TestCluster) ... ok test_matrix_parse (__main__.TestCluster) ... ok test_median_mean (__main__.TestCluster) ... ok test_somcluster (__main__.TestCluster) ... ok test_treecluster (__main__.TestCluster) ... ok ---------------------------------------------------------------------- Ran 7 tests in 0.009s OK The whole test suite passes as well. Maybe this is a windows issue? Brad > On Mon, Mar 30, 2009 at 11:29 AM, Peter wrote: > > David Cournapeau has just updated sourceforge - so I will try again with > > the actual release candidate instead of just the beta... 
> > Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows > XP, Python 2.6 using the python.org installer, with Biopython compiled > with cygwin mingw32 as normal, same error - test_Cluster.py is failing > on the second line of Bio/Cluster/__init__.py, "from cluster import > *". > > So the question stands - has anyone else tried Biopython (from CVS) > with NumPy 1.3 (beta or release candidate) on any platform? I should > be able to check it tonight on a Linux machine myself without too much > trouble... but a few more data points wouldn't hurt ;) > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Mon Mar 30 09:23:31 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 14:23:31 +0100 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <20090330130027.GB36526@sobchak.mgh.harvard.edu> References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com> <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> <20090330130027.GB36526@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com> On Mon, Mar 30, 2009 at 2:00 PM, Brad Chapman wrote: > Hi Peter; > Things work on FreeBSD 7.1 with python2.5 and the numpy release > candidate: > ... > The whole test suite passes as well. Maybe this is a windows issue? > Brad Thanks Brad - nice to know we have Biopython being tested on a fourth major OS being tested (FreeBSD, in addition to Linux, Mac OS X and Windows XP). I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and test_Cluster and the rest of the Biopython tests passed. This looks like a Windows and/or Python 2.6 problem - I should be able to try a Linux machine with Python 2.6 tonight... 
Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 10:37:18 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 15:37:18 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> Message-ID: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox wrote: > > Hi Folks, > > I've been trying to formalize a bunch of randomly scattered bits of code to > support the use of the alignment programme Muscle > (http://www.drive5.com/muscle/). I prefer to use this software in preference > to Clustalw - subjectively, it seems to give the most accurate alignments. > (Whether Biopython would want to support a second alignment programme > /external dependency is another matter...) A wrapper for MUSCLE wouldn't hurt - although there is scope for some rearrangement of our command line tool wrappers rather than adding more and more top level modules. Maybe under Bio.Align, and move the Clustalw wrapper there too. > Anyway, while doing so, I realised just how awkward the current interface to > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well. What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm: (1) use SeqIO to prepare the FASTA input file. (2) run the command line tool (e.g. MUSCLE). (3) use AlignIO (or SeqIO) to read the alignment output file. Actually I think that Bio.Clustalw interface is now a bit out of place, as it hides some of this from you. (Note that Bio.Clustalw predates Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly tool neutral). 
> Currently, if we have a bunch of SeqRecords, say after downloading from > GenBank or being pulled from a BioSQL db, we have to write them to disk > and call clustalw on the file: > >>>> from Bio import Clustalw >>>> from Bio.Clustalw import MultipleAlignCL >>>> cline = MultipleAlignCL("f002", command="clustalw") >>>> align = Clustalw.do_alignment(cline) Well yes. Typically for any alignment tool you'd have to write the unaligned records in FASTA format. Some tools may let handle this via standard input, so you may be able to use a pipe instead of a file - but the issues are similar. > It seems to me more appropriate to be able to call clustalw directly on a > bunch of SeqRecords: > > eg (suggested implementation) >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) >>>> from Bio.Align import MultipleAlignment >>>> align = MultipleAlignment(records, executable="clustalw") i.e. Have a Biopython wrapper use a temp file to record the given records to in a format appropriate for the command line tool selected, and capturing the output? In the case of ClustalW or MUSCLE this means making a temp FASTA input file. For ClustalW we'd then have to open the output file, read it, and then delete it. For other tools we may be able to just capture its output on stdout and not have to clean up a temp output file. All the possible command line tools have their own arguments, range of file formats, behaviour with respect to default filenames etc. Trying to capture all this in a single wrapper seems rather ambitious. For example, how would you handle gap penalties? Keep in mind that different tools may use the same name for a gap extension penalty but interpret the values differently. Also, while I can see this might be nice for short alignments (which are quick to run), its rather implicit or magic. I personally prefer to have to deal with the files explicitly myself - but then I have been dealing with large alignments which I want to keep on disk. 
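
The temp-file approach described above might look roughly like the following in present-day Python 3. This is a sketch only: run_with_temp_fasta and write_fasta are hypothetical helpers, the records are plain (name, sequence) pairs rather than SeqRecord objects, and the "tool" is a stand-in Python one-liner that simply echoes its input file. A real aligner such as ClustalW or MUSCLE would need its own arguments and output handling, which is exactly the part that is hard to generalise.

```python
import os
import subprocess
import sys
import tempfile


def write_fasta(records, handle):
    """Write (name, sequence) pairs to an open handle in FASTA format."""
    for name, seq in records:
        handle.write(">%s\n%s\n" % (name, seq))


def run_with_temp_fasta(records, command):
    """Run command + [temp FASTA path] and return the tool's stdout."""
    # delete=False lets us close the file before the tool opens it
    # (needed on Windows); we remove it ourselves in the finally block.
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".fasta", delete=False)
    try:
        write_fasta(records, tmp)
        tmp.close()
        result = subprocess.run(command + [tmp.name], capture_output=True,
                                text=True, check=True)
        return result.stdout
    finally:
        tmp.close()  # harmless if already closed
        os.remove(tmp.name)


# Stand-in for a real aligner: prints the input file back unchanged.
echo_tool = [sys.executable, "-c",
             "import sys; sys.stdout.write(open(sys.argv[1]).read())"]

output = run_with_temp_fasta([("seq1", "ACGT"), ("seq2", "ACGA")], echo_tool)
```

Note how the caller never sees a filename: convenient for small jobs, but it is the "implicit or magic" behaviour the message above is wary of for large alignments that should stay on disk.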
> Secondly, the biopython interface does not support calling > Clustalw to perform profile alignments, > > (suggested implementation) > # The scaffold alignment: >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus") > # The sequences we want to add to it: >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) >>>> from Bio.Align import ProfileAlignment >>>> align = ProfileAlignment(align, records, executable="clustalw") > > Calls to MultipleAlignment and ProfileAlignment would take a > **options parameter to collect any additional command line options. > > Thirdly, should an alignment object have a > Alignment.refine_alignment(executable="clustalw") > method? > > Any thoughts? I may have misunderstood you, but the ideas you've sketched out seem very very broad/ambitious - and actually take us further away from the SeqIO/AlignIO interface by hiding all the filenames and handles from the user. I think these should be kept explicit. Peter From eric.talevich at gmail.com Mon Mar 30 14:34:09 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 30 Mar 2009 14:34:09 -0400 Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser project Message-ID: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com> Hi folks, I noticed earlier this month that several Biopython developers had signed up as potential mentors in OBF's Summer of Code application. Although OBF apparently wasn't selected as a mentoring organization this year, some other bioinformatics-related groups were -- in particular, the National Evolutionary Synthesis Center's page mentions involvement with the Bio* projects: http://socghop.appspot.com/org/show/google/gsoc2009/nescent The project I'd like to work on is a phyloXML parser for Biopython. NESCent's idea list includes a similar entry for BioRuby (links below). 
I asked the mentor, Christian Zmasek, if it would be acceptable to do the project with Biopython instead of BioRuby, and he said it would, but he'd prefer to have a Biopython specialist on board as another mentor. Would any of you be interested in being a mentor for this project? I imagine it would have some things in common with the existing Nexus parser, as a starting point. http://www.phyloxml.org/ https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby Thanks, Eric From chapmanb at 50mail.com Mon Mar 30 17:00:07 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 30 Mar 2009 17:00:07 -0400 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> Message-ID: <20090330210007.GC72956@sobchak.mgh.harvard.edu> Cymon; I wrote a bunch of the Clustalw stuff a long while ago, and it sounds like Peter has a good handle on integrating it with AlignIO so I will leave that to him. On the choosing aligners side of things, have you tried MAFFT? http://align.bmr.kyushu-u.ac.jp/mafft/software/ It's updated regularly and seems to have good buzz in the community. I haven't had to do lots of multiple alignments recently, but it's worked well for the few I've done. Having support for multiple aligners is good stuff; I second Peter's suggestion of having these live under Bio.Align. Brad > On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox wrote: > > > > Hi Folks, > > > > I've been trying to formalize a bunch of randomly scattered bits of code to > > support the use of the alignment programme Muscle > > (http://www.drive5.com/muscle/). I prefer to use this software in preference > > to Clustalw - subjectively, it seems to give the most accurate alignments. 
> > (Whether Biopython would want to support a second alignment programme > > /external dependency is another matter...) > > A wrapper for MUSCLE wouldn't hurt - although there is scope for some > rearrangement of our command line tool wrappers rather than adding more > and more top level modules. Maybe under Bio.Align, and move the Clustalw > wrapper there too. > > > Anyway, while doing so, I realised just how awkward the current interface to > > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well. > > What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm: > (1) use SeqIO to prepare the FASTA input file. > (2) run the command line tool (e.g. MUSCLE). > (3) use AlignIO (or SeqIO) to read the alignment output file. > > Actually I think that Bio.Clustalw interface is now a bit out of place, > as it hides some of this from you. (Note that Bio.Clustalw predates > Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly > tool neutral). > > > Currently, if we have a bunch of SeqRecords, say after downloading from > > GenBank or being pulled from a BioSQL db, we have to write them to disk > > and call clustalw on the file: > > > >>>> from Bio import Clustalw > >>>> from Bio.Clustalw import MultipleAlignCL > >>>> cline = MultipleAlignCL("f002", command="clustalw") > >>>> align = Clustalw.do_alignment(cline) > > Well yes. Typically for any alignment tool you'd have to write the > unaligned records in FASTA format. Some tools may let handle > this via standard input, so you may be able to use a pipe instead > of a file - but the issues are similar. > > > It seems to me more appropriate to be able to call clustalw directly on a > > bunch of SeqRecords: > > > > eg (suggested implementation) > >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) > >>>> from Bio.Align import MultipleAlignment > >>>> align = MultipleAlignment(records, executable="clustalw") > > i.e. 
Have a Biopython wrapper use a temp file to record the > given records to in a format appropriate for the command line > tool selected, and capturing the output? In the case of > ClustalW or MUSCLE this means making a temp FASTA input > file. For ClustalW we'd then have to open the output file, read > it, and then delete it. For other tools we may be able to just > capture its output on stdout and not have to clean up a temp > output file. > > All the possible command line tools have their own arguments, > range of file formats, behaviour with respect to default filenames > etc. Trying to capture all this in a single wrapper seems rather > ambitious. For example, how would you handle gap penalties? > Keep in mind that different tools may use the same name for > a gap extension penalty but interpret the values differently. > > Also, while I can see this might be nice for short alignments > (which are quick to run), its rather implicit or magic. I personally > prefer to have to deal with the files explicitly myself - but then I > have been dealing with large alignments which I want to keep > on disk. > > > Secondly, the biopython interface does not support calling > > Clustalw to perform profile alignments, > > > > (suggested implementation) > > # The scaffold alignment: > >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus") > > # The sequences we want to add to it: > >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) > >>>> from Bio.Align import ProfileAlignment > >>>> align = ProfileAlignment(align, records, executable="clustalw") > > > > Calls to MultipleAlignment and ProfileAlignment would take a > > **options parameter to collect any additional command line options. > > > > Thirdly, should an alignment object have a > > Alignment.refine_alignment(executable="clustalw") > > method? > > > > Any thoughts? 
> > I may have misunderstood you, but the ideas you've sketched out > seem very very broad/ambitious - and actually take us further away > from the SeqIO/AlignIO interface by hiding all the filenames and > handles from the user. I think these should be kept explicit. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From chapmanb at 50mail.com Mon Mar 30 17:14:48 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 30 Mar 2009 17:14:48 -0400 Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser project In-Reply-To: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com> References: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com> Message-ID: <20090330211448.GF72956@sobchak.mgh.harvard.edu> Hi Eric; I would be happy to help with mentoring. I have been helping another student with his application and could definitely give you feedback on yours. Based on good ones coming through the list, it should be detailed with a week by week description of what you plan to be working on and specific deliverables. They also have a short description of the motivation and your qualifications. This is my first time doing this, so I don't know much about the selection process. If more than one Biopython project was selected, I couldn't realistically mentor both; I am not even sure if that is a possibility. Either way, Google recommends having two mentors per student so it would be good to have someone else step up as well. Let me know if you have any specific questions while you are getting things together this week, Brad > Hi folks, > > I noticed earlier this month that several Biopython developers had signed up > as potential mentors in OBF's Summer of Code application. 
Although OBF > apparently wasn't selected as a mentoring organization this year, some other > bioinformatics-related groups were -- in particular, the National > Evolutionary Synthesis Center's page mentions involvement with the Bio* > projects: > > http://socghop.appspot.com/org/show/google/gsoc2009/nescent > > The project I'd like to work on is a phyloXML parser for Biopython. > NESCent's idea list includes a similar entry for BioRuby (links below). I > asked the mentor, Christian Zmasek, if it would be acceptable to do the > project with Biopython instead of BioRuby, and he said it would, but he'd > prefer to have a Biopython specialist on board as another mentor. > > Would any of you be interested in being a mentor for this project? I imagine > it would have some things in common with the existing Nexus parser, as a > starting point. > > http://www.phyloxml.org/ > https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby > > Thanks, > Eric > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From chapmanb at 50mail.com Mon Mar 30 17:33:17 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 30 Mar 2009 17:33:17 -0400 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com> References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> <20090329010652.GA914@kunkel> <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com> Message-ID: <20090330213317.GG72956@sobchak.mgh.harvard.edu> Hi Peter; Thanks for the feedback. I was definitely not being critical of your postings, or fishing for extra jobs for myself. On the contrary, I was inspired by the news items and brainstorming some ways to get additional people involved. 
People who express an interest in Biopython and don't get involved often list the following reasons: - Not feeling like they are technically able to contribute. Perhaps they are just learning Python, or don't feel comfortable with the Biopython library itself. - Traditional academics doesn't offer recognition for contributing to open source projects. While we can't change academics, we can try and come up with ways to improve the visibility of contributors and make sure they are recognized in the larger bioinformatics community. My thought was that a "news coordinator" would give one or more interested people a chance to help the community, learn more about Biopython by being involved, and also increase name recognition for everyone coding, bug fixing and discussing. In terms of how it is done, those were only my random suggestions. Certainly if someone took it up they could be as creative as they want about how to go about it. Brad > On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman wrote: > > Hi all; > > It is great we are exploring getting news out about Biopython in > > additional ways. One thing this can really help with is recognizing > > contributions to Biopython. Another is pointing out interesting > > discussion threads on the mailing lists and getting others involved. > > Do you think the recent release notes and NEWS file entries have been > a bit too impersonal? We can certainly be a bit more explicit if people > think that is a good thing. For example, should we mention Bartek by name > in the paragraph on the new Bio.Motif module? > > This is linked to from the wiki's news page BTW: > http://biopython.open-bio.org/SRC/biopython/NEWS > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython > > > Do you think it would be worthwhile to "advertise" on the main list > > for someone interested in coordinating news and communication? > > ...
Perhaps there are members who are interested in Biopython and > > follow what is going on but aren't coders. This would be a way to > > get involved, ... > > Are you up for the job yourself Brad? From your own blog we know > you can and do write regularly anyway ;) Would you like an account > on the OBF news server? Email me off list and we can sort that out. > > In terms of micro-blogging via twitter, you sound like you have a better > feel for this than me - I don't even have a personal twitter account. > > Monthly news posts (perhaps cc'd to the announcement email list) > would be a nice idea - especially if we can encourage more lurkers > to speak up. For a while BioPerl had something like this going > (digest emails or something), but it needs a pretty dedicated person > or team. In the meantime as you've noticed I've started making > more use of our news facility myself... > > Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 17:58:52 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 22:58:52 +0100 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: <20090330213317.GG72956@sobchak.mgh.harvard.edu> References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> <20090329010652.GA914@kunkel> <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com> <20090330213317.GG72956@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903301458s7216ec97gc4ac71a03d0fd350@mail.gmail.com> On Mon, Mar 30, 2009 at 10:33 PM, Brad Chapman wrote: > Hi Peter; > Thanks for the feedback. I was definitely not being critical of your > postings, ... I hadn't had that impression, but that's still nice to hear ;) > ... or fishing for extra jobs for myself. Darn - I thought you'd be an excellent choice. > On the contrary, I was inspired by the news items and > brainstorming some ways to get additional people involved. 
Well, unless anyone already lurking on the dev mailing list steps forward (*hint hint*), do you (Brad) want to try asking on the main discussion list to see if there are any takers? > People who express an interest in Biopython and don't get > involved often list the following reasons: > > - Not feeling like they are technically able to contribute. Perhaps > they are just learning Python, or don't feel comfortable with the > Biopython library itself. I find once they get over any shyness, even just having beginners asking questions can be valuable in itself. It shows us potential blind spots, or areas of the documentation which need clarification (or writing) - plus of course it can bring about discussions etc. > - Traditional academics doesn't offer recognition for contributing to > open source projects. While we can't change academics, we can try > and come up with ways to improve the visibility of contributors and > make sure they are recognized in the larger bioinformatics > community. > > My thought was that a "news coordinator" would give one or more > interested people a chance to help the community, learn more about > Biopython by being involved, and also increase name recognition for > everyone coding, bug fixing and discussing. Some of us are very aware of this issue (academic recognition for contributions to projects like Biopython), and different employers will take different attitudes here. In some cases making our contributors more visible won't always be a good idea... In my case work on Biopython was a definite plus point in landing my current job, but there are of course still limits to how much work time I can reasonably spend on this (and limits to how much time I spend out of work - like right now on this email). > In terms of how it is done, those were only my random suggestions. > Certainly if someone took it up they could be as creative as they > want about how to go about it.
> > Brad It's certainly worth a go :) Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 18:35:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 23:35:05 +0100 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com> References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com> <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> <20090330130027.GB36526@sobchak.mgh.harvard.edu> <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com> Message-ID: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com> On Mon, Mar 30, 2009 at 2:23 PM, Peter wrote: > I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and > test_Cluster and the rest of the Biopython tests passed. This looks like > a Windows and/or Python 2.6 problem - I should be able to try a Linux > machine with Python 2.6 tonight... I've just tried it on Ubuntu Jaunty (Alpha 6), with Python 2.6.1+ (already installed), the wise and clustalw packages installed, NumPy 1.3.0rc1 installed from source, and Biopython CVS installed from source. Again, test_Cluster.py and the rest of our tests pass (ignoring those with additional external dependencies like BioSQL, fdist, simcoal2). So, whatever is going wrong on test_Cluster.py seems to be specific to Windows (XP) and Python 2.6 - and possibly just my Windows development machine.
Peter From mjldehoon at yahoo.com Mon Mar 30 20:08:34 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 30 Mar 2009 17:08:34 -0700 (PDT) Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com> Message-ID: <730606.962.qm@web62408.mail.re1.yahoo.com> > So, whatever is going wrong on test_Cluster.py seems to be > specific to > Windows (XP) and Python 2.6 - and possibly just my Windows > development > machine. > I believe that the problem is that msvcr90.dll is missing. This is the C runtime from Microsoft. Earlier Pythons used msvcr71.dll, if I'm not mistaken. --Michiel From biopython at maubp.freeserve.co.uk Tue Mar 31 05:12:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 10:12:21 +0100 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <730606.962.qm@web62408.mail.re1.yahoo.com> References: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com> <730606.962.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00903310212o29bba163ma9d68a901eabc2c9@mail.gmail.com> On Tue, Mar 31, 2009 at 1:08 AM, Michiel de Hoon wrote: > >> So, whatever is going wrong on test_Cluster.py seems to be >> specific to Windows (XP) and Python 2.6 - and possibly just >> my Windows development machine. >> > I believe that the problem is that msvcr90.dll is missing. This > is the C runtime from Microsoft. Earlier Pythons used > msvcr71.dll, if I'm not mistaken. You may be right - there is some stuff on the numpy mailing list about this and manifest files etc when using mingw32. It may be simplest to try the appropriate MS compiler instead... 
Peter From biopython at maubp.freeserve.co.uk Tue Mar 31 06:28:35 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 11:28:35 +0100 Subject: [Biopython-dev] Python's new DVCS chosen Message-ID: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> Hi all, This might be of interest (although I'm sure some of you already know). Earlier this month on the python-dev mailing list, Guido van Rossum wrote: > Dear Python developers, > > The decision is made! I've selected a DVCS to use for Python. > We're switching to Mercurial (Hg). > > The implementation and schedule is still up in the air -- I am > hoping that we can switch before the summer. > ... http://mail.python.org/pipermail/python-dev/2009-March/087931.html See also PEP-374, http://www.python.org/dev/peps/pep-0374/ Interestingly, Mercurial (Hg) didn't get much of a mention in our discussions here. Peter From bartek at rezolwenta.eu.org Tue Mar 31 07:05:07 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 31 Mar 2009 13:05:07 +0200 Subject: [Biopython-dev] Python's new DVCS chosen In-Reply-To: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> Message-ID: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> Hi, On Tue, Mar 31, 2009 at 12:28 PM, Peter wrote: > Hi all, > > This might be of interest (although I'm sure some of you already > know). Earlier this month on the python-dev mailing list, Guido > van Rossum wrote: >> We're switching to Mercurial (Hg). > Interestingly, Mercurial (Hg) didn't get much of a mention in our > discussions here. Their evaluation of different options (in PEP 374) was mentioned on the list by Bruce, so everyone was able to form their opinions. As Guido explains in another paragraph: >It's hard to explain my reasons for choosing -- like most language >decisions (especially the difficult ones) it's mostly a matter of gut >feelings.
One thing I know is that it's better to decide now than to >spend another year discussing the pros and cons. All that could be >said has been said, pretty much, and my mind is made up. He seems to find all the candidates good enough. It's a matter then of a consensus between developers. Git happened to have many antagonists on python-dev list, but it happened to have more protagonists on biopython-dev. I think we have made a consensus decision to try out git/github and I think it's extremely counter-productive to re-open the discussion on our choice now. I'm not a git fanboy, but because there are _no_ universal criteria to choose between git vs. bzr vs. Hg we should not spend more time on this issue. cheers Bartek From cy at cymon.org Tue Mar 31 07:25:27 2009 From: cy at cymon.org (Cymon Cox) Date: Tue, 31 Mar 2009 12:25:27 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> Message-ID: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> Hi Peter, 2009/3/30 Peter > On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox wrote: > > > > Hi Folks, > > > > I've been trying to formalize a bunch of randomly scattered bits of code > to > > support the use of the alignment programme Muscle > > (http://www.drive5.com/muscle/). I prefer to use this software in > preference > > to Clustalw - subjectively, it seems to give the most accurate > alignments. > > (Whether Biopython would want to support a second alignment programme > > /external dependency is another matter...) > > A wrapper for MUSCLE wouldn't hurt - although there is scope for some > rearrangement of our command line tool wrappers rather than adding more > and more top level modules. Maybe under Bio.Align, and move the Clustalw > wrapper there too. 
Agreed - it would seem more appropriate to have the alignment interfaces in Bio.Align. > > Anyway, while doing so, I realised just how awkward the current interface > to > > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well. > > What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm: > (1) use SeqIO to prepare the FASTA input file. > (2) run the command line tool (e.g. MUSCLE). > (3) use AlignIO (or SeqIO) to read the alignment output file. Well, yes - we can always not use the biopython interface. > Actually I think that Bio.Clustalw interface is now a bit out of place, > as it hides some of this from you. (Note that Bio.Clustalw predates > Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly > tool neutral). > > > Currently, if we have a bunch of SeqRecords, say after downloading from > > GenBank or being pulled from a BioSQL db, we have to write them to disk > > and call clustalw on the file: > > > >>>> from Bio import Clustalw > >>>> from Bio.Clustalw import MultipleAlignCL > >>>> cline = MultipleAlignCL("f002", command="clustalw") > >>>> align = Clustalw.do_alignment(cline) > > Well yes. Typically for any alignment tool you'd have to write the > unaligned records in FASTA format. Some tools may let handle > this via standard input, so you may be able to use a pipe instead > of a file - but the issues are similar. > > > It seems to me more appropriate to be able to call clustalw directly on a > > bunch of SeqRecords: > > > > eg (suggested implementation) > >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) > >>>> from Bio.Align import MultipleAlignment > >>>> align = MultipleAlignment(records, executable="clustalw") > > i.e. Have a Biopython wrapper use a temp file to record the > given records to in a format appropriate for the command line > tool selected, and capturing the output? In the case of > ClustalW or MUSCLE this means making a temp FASTA input > file. 
For ClustalW we'd then have to open the output file, read > it, and then delete it. Yes, that's what I'm suggesting. Here's my reasoning: it seems to me the input and output formats of the data required by a particular alignment tool are incidental and should be hidden from the user. At present the Clustalw interface forces you to write a fasta formatted file of your records to disk, and then has Clustalw write an aligned matrix to disk in a format specified by the user. If the latter is Clustal format, then the record is parsed and an alignment object is returned, else None is returned. In either case, an output file(s) remains on disk. So, say we have a bunch of sequences in pir format and we'd like them aligned and saved in stockholm format: from Bio import SeqIO from Bio import AlignIO from Bio import Clustalw from Bio.Clustalw import MultipleAlignCL records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir") AlignIO.write([records], open("temp.fasta", "w"), "fasta") cline = MultipleAlignCL("temp.fasta", command="clustalw") align = Clustalw.do_alignment(cline) AlignIO.write([align], open("temp.sth", "w"), "stockholm") we end up with 4 output files on disk: temp.aln, temp.dnd, temp.fasta, temp.sth - 3 of which are incidental. (BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir" in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the subprocess to return, which it never does: pid, sts = os.waitpid(self.pid, 0)) As I say, I'd like to see this: >>> from Bio.Align import MultipleAlignment >>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")) >>> align = MultipleAlignment(records, executable="clustalw") >>> AlignIO.write([align], open("temp.sth", "w"), "stockholm") ie resulting in one file temp.sth, which we've explicitly written to disk. > For other tools we may be able to just > capture its output on stdout and not have to clean up a temp > output file. 
> > All the possible command line tools have their own arguments, > range of file formats, behaviour with respect to default filenames > etc. Trying to capture all this in a single wrapper seems rather > ambitious. For example, how would you handle gap penalties? > Keep in mind that different tools may use the same name for > a gap extension penalty but interpret the values differently. Sorry, I wasn't very clear about what I intended: MultipleAlignment(records, executable="clustalw", ) returns Clustalw.do_alignment(records, ) and MultipleAlignment(records, executable="muscle", ) returns Muscle.do_alignments(records, ) I'm not suggesting unifying all programme options into a single interface, just wrap the individual alignment tool modules in a common call, MultipleAlignment(), align_records(), or whatever... As for the keyword options, at present the Clustalw interface supports the manual setting of some attributes to the MultipleAlignCL instance, but there is no type or value checking. I think as many options as possible should be supported through keyword arguments - tedious, but doable. > Also, while I can see this might be nice for short alignments > (which are quick to run), it's rather implicit or magic. Not sure what you mean here? Why would the size of alignment matter? And as for it being magic, it seems to me it does, and only does, what it says on the label - aligns the data. > I personally > prefer to have to deal with the files explicitly myself - but then I > have been dealing with large alignments which I want to keep > on disk. I tend to build many (small - <100 taxa) single gene alignments - in one use-case, 280 of them...
> Secondly, the biopython interface does not support calling > > Clustalw to perform profile alignments, > > > > (suggested implementation) > > # The scaffold alignment: > >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus") > > # The sequences we want to add to it: > >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) > >>>> from Bio.Align import ProfileAlignment > >>>> align = ProfileAlignment(align, records, executable="clustalw") > > > > Calls to MultipleAlignment and ProfileAlignment would take a > > **options parameter to collect any additional command line options. > I'm very keen to see profile alignments supported - be it either in Clustalw or Muscle, or both. > > > Thirdly, should an alignment object have a > > Alignment.refine_alignment(executable="clustalw") > > method? > > > > Any thoughts? > > I may have misunderstood you, but the ideas you've sketched out > seem very very broad/ambitious - and actually take us further away > from the SeqIO/AlignIO interface by hiding all the filenames and > handles from the user. I think these should be kept explicit. OK, well having had my say, I'm quite happy to write the Muscle module in the style of the current Clustalw interface, or whatever style is most appropriate for exposing the filename handles. But I'm not sure what that would be - perhaps you could elaborate on this a bit... Cheers, C. -- ____________________________________________________________________ Cymon J. 
Cox Centro de Ciencias do Mar Faculdade de Ciencias do Mar e Ambiente (FCMA) Universidade do Algarve Campus de Gambelas 8005-139 Faro Portugal Phone: +0351 289800909 ext 7909 Fax: +0351 289800051 Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com HomePage : http://biology.duke.edu/bryology/cymon.html -8.63/-6.77 From biopython at maubp.freeserve.co.uk Tue Mar 31 07:27:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 12:27:07 +0100 Subject: [Biopython-dev] Python's new DVCS chosen In-Reply-To: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> Message-ID: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com> On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski wrote: > I think we have made a consensus decision to try out git/github and I > think it's extremely counter-productive to re-open the discussion on > our choice now. I'm not a git fanboy, but because there are _no_ > universal criteria to choose between git vs. bzr vs. Hg we should not > spend more time on this issue. I hadn't intended to reopen the debate - it was just a post for interests sake. As you can probably tell from looking at the biopython network graph on github (which I got to work on Linux but only with Adobe's flash plugin - gnash etc didn't seem to cope), I've been getting to grips with git (and github). Peter From biopython at maubp.freeserve.co.uk Tue Mar 31 08:56:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 13:56:21 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... 
In-Reply-To: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> Message-ID: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox wrote: >> What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm: >> (1) use SeqIO to prepare the FASTA input file. >> (2) run the command line tool (e.g. MUSCLE). >> (3) use AlignIO (or SeqIO) to read the alignment output file. > > Well, yes - we can always not use the biopython interface. Ideally step (2) in the above would be handled via a Biopython command line wrapper, offering keyword arguments etc. >> i.e. Have a Biopython wrapper use a temp file to record the >> given records to in a format appropriate for the command line >> tool selected, and capturing the output? In the case of >> ClustalW or MUSCLE this means making a temp FASTA input >> file. For ClustalW we'd then have to open the output file, read >> it, and then delete it. > > Yes, that's what I'm suggesting. > > Here's my reasoning: it seems to me the input and output formats of the data > required by a particular alignment tool are incidental and should be hidden > from the user. OK - I see this as doing some implicit behind the scenes magic. Arguably this kind of thing is still nice to have if it makes things simpler for the user. I may overuse this mantra, but "Explicit is better than implicit", from the Zen of Python. http://www.python.org/dev/peps/pep-0020/ > At present the Clustalw interface forces you to write a fasta > formatted file of your records to disk, and then has Clustalw > write an aligned matrix to disk in a format specified by the user.
The Clustalw tool only takes FASTA formatted input, so if you have a bunch of sequences in memory you are forced to convert them into FASTA format to use them as input. The question is where this conversion takes place - explicitly by the user, or implicitly by a wrapper. > If the latter is Clustal format, then the record is parsed and an alignment > object is returned, else None is returned. In either case, an output file(s) > remains on disk. It should be a fairly simple enhancement to look at the arguments to see if another output format we can parse (e.g. PHYLIP) was selected, and also parse that. Do you think that would be a sensible addition to Bio.Clustalw.do_alignment? It's never been an issue for me, as if you are using the Bio.Clustalw.do_alignment interface you probably don't care about the output file format. > So, say we have a bunch of sequences in pir format and we'd like them > aligned and saved in stockholm format: > > from Bio import SeqIO > from Bio import AlignIO > from Bio import Clustalw > from Bio.Clustalw import MultipleAlignCL > records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir") > AlignIO.write([records], open("temp.fasta", "w"), "fasta") The above line is wrong - it should be: SeqIO.write(records, open("temp.fasta", "w"), "fasta") At this point your PIR sequences are not yet aligned, so they'll (probably) have different lengths, so shouldn't be treated as an alignment. If it doesn't raise an error maybe it should... Also you don't explicitly close the handle this way. > cline = MultipleAlignCL("temp.fasta", command="clustalw") > align = Clustalw.do_alignment(cline) > AlignIO.write([align], open("temp.sth", "w"), "stockholm") > we end up with 4 output files on disk: temp.aln, temp.dnd, temp.fasta, > temp.sth - 3 of which are incidental. Yes - but as ClustalW doesn't read in PIR files, and doesn't output Stockholm files on its own, this has to happen.
It's just a question of who does it (the user, or the wrapper code). > (BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir" > in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the > subprocess to return, which it never does: pid, sts = os.waitpid(self.pid, > 0)) I would guess this is because you never properly closed the temp.fasta file, so it may not have been flushed to disk when the Clustalw tool was called. > As I say, I'd like to see this: >>>> from Bio.Align import MultipleAlignment >>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")) >>>> align = MultipleAlignment(records, executable="clustalw") >>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm") > > ie resulting in one file temp.sth, which we've explicitly written to disk. So you'd like the wrapper to take care of creating and deleting the temp input FASTA file, and also deleting the temp output ClustalW file after parsing it. This can probably be done quite cleanly using Python's tempfile.NamedTemporaryFile object. >> For other tools we may be able to just capture its output on >> stdout and not have to clean up a temp output file. >> >> All the possible command line tools have their own arguments, >> range of file formats, behaviour with respect to default filenames >> etc. Trying to capture all this in a single wrapper seems rather >> ambitious. For example, how would you handle gap penalties? >> Keep in mind that different tools may use the same name for >> a gap extension penalty but interpret the values differently.
> > Sorry, I wasn't very clear about what I intended: > > MultipleAlignment(records, executable="clustalw", ) > returns Clustalw.do_alignment(records, ) > and > MultipleAlignment(records, executable="muscle", ) > returns Muscle.do_alignments(records, ) > > I'm not suggesting unifying all programme options into a single interface, > just wrap the individual alignment tool modules in a common call, > MulitpleAlignment(), align_records(), or whatever... I see. > As for the keyword options, at present the Clustalw interface supports the > manual setting of some attributes to the MultipleAlignCL instance, but there > is no type or value checking. I think as many options as possible should be > supported through keyword arguments - tedious, but doable. > >> Also, while I can see this might be nice for short alignments >> (which are quick to run), its rather implicit or magic. > > Not sure what you mean here? Why would the size of alignment matter? Size of alignment influences the compute time, and therefore is an issue for anyone doing things at the python prompt. Moreover, if the alignments are big and slow, you generally want to make sure the output file is kept on disk, as you'll probably want to read it more than once. > And as for it being magic, its seems to me it does, and only does, what > it says on the label - aligns the data. The magic is the behind the scenes creation/deletion of the input/output files, and the conversion between file formats. >> I personally prefer to have to deal with the files explicitly myself >> - but then I have been dealing with large alignments which I want >> to keep on disk. > > I tend to build many (small - <100 taxa) single gene alignments - in one > use-case, 280 of them... In your case I would assume the alignment takes minutes to run. 
You tend to care more about preserving the output files if they take hours to create ;) >> > Secondly, the biopython interface does not support calling >> > Clustalw to perform profile alignments, That is something we should probably add. > OK, well having had my say, I'm quite happy to write the Muscle module in > the style of the current Clustalw interface, or whatever style is most > appropriate for exposing the filename handles. But I'm not sure what that > would be - perhaps you could elaborate on this a bit... I've elaborated, perhaps too much? ;) Basically you seem to be thinking about a high level abstraction for multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO module), while I am more focused on the low level abstraction for wrapping any command line tool. This isn't to say we can't have both, but to me it makes sense to start with the low level stuff first. We (unfortunately) have several styles of command line tool wrappers in Biopython already - this is a wart that has been on my mental to do list for some time. I think we should focus on dealing with command line strings, and keep this separate from how the tools are invoked (e.g. subprocess or os.system), preparation of input files, and how any output is parsed. As long as this core is in place, more advanced wrappers are possible on top of this basic infrastructure (Tiago may have some comments here from his Bio.PopGen work). Essentially all our command line wrappers start by building a command line string. In some cases this command line string is exposed to the user (e.g. Bio.EMBOSS), and they can choose how they want to invoke it. For example, they can explicitly opt to use the Python subprocess module and pipes if they want to - or use a standard invocation from Bio.Applications (we may want to add a couple of variations to this module). Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool for you. 
In the case of Bio.Blast.NCBIStandalone, if you don't want
the handles because you've told Blast to save its output to a file,
our wrapper still returns the standard output and standard error
handles - it is forced on you (see Bug 2654). Also, there is no easy
way to see what the actual command line string was, which can make
debugging hard, and also prevents certain things (e.g. submitting the
command line as a task to a cluster of workstations). At least
Bio.Clustalw offers a command line string object (MultipleAlignCL),
it's just the do_alignment helper function I'm not so keen on.

The Bio.Clustalw.do_alignment wrapper is rather unusual in that it
automatically parses the output - while most of our wrappers don't.
Decoupling the parsing is more modular - it makes it easy for the user
to use any parser for the output from a command line tool (either
using stdout, or by reading an output file). I like this, and it fits
with the handle based approach in most of our parsers.

So, I would suggest we think about adding new wrappers under Bio.Align
(e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
perhaps all together in Bio.Align.Applications or something) based on
the Bio.Application module as used in Bio.EMBOSS. We could then
deprecate Bio.Clustalw, which should also help tidy up the top level
name space. Initially at least, I wouldn't include any clever wrapper
code at all.

Once we have the basic command line objects done, these could be used
to later add another layer on top implementing Cymon's ideas for
multiple alignment wrappers taking care of intermediate files and
inter-converting file formats on the fly, although I remain to be
convinced about the value of this. If you can pull it off (cross
platform, on several versions of Python) then a user friendly high
level interface would be impressive.
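
As a rough illustration of the "command line string first" idea above, something along these lines would do. This is only a sketch: the class name CommandLine and the "-name=value" option style are assumptions for the example, not the existing Bio.Application API. The object only builds the command; how it is invoked and how the output is parsed are left to the caller.

```python
import shlex


class CommandLine(object):
    """Hypothetical minimal command line builder (not Bio.Application)."""

    def __init__(self, program, **options):
        self.program = program
        # Only options the caller sets are stored; no defaults are added,
        # so the tool's own default values always apply.
        self.options = dict(options)

    def args(self):
        """Return the command as an argument list, e.g. for subprocess."""
        result = [self.program]
        for name in sorted(self.options):
            result.append("-%s=%s" % (name, self.options[name]))
        return result

    def __str__(self):
        """Return the command as a shell-quoted string, e.g. for logging."""
        return " ".join(shlex.quote(arg) for arg in self.args())


cline = CommandLine("clustalw", infile="temp.fasta", output="phylip")
print(str(cline))
```

Because the string is always available, it can be logged, handed to subprocess, or submitted to a cluster queue without the wrapper caring which.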
Peter From bartek at rezolwenta.eu.org Tue Mar 31 09:14:39 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 31 Mar 2009 15:14:39 +0200 Subject: [Biopython-dev] Python's new DVCS chosen In-Reply-To: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com> References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com> Message-ID: <8b34ec180903310614k1fe4a08bkac19c2cc96b36fad@mail.gmail.com> On Tue, Mar 31, 2009 at 1:27 PM, Peter wrote: > On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski > wrote: >> I think we have made a consensus decision to try out git/github and I >> think it's extremely counter-productive to re-open the discussion on >> our choice now. I'm not a git fanboy, but because there are _no_ >> universal criteria to choose between git vs. bzr vs. Hg we should not >> spend more time on this issue. > > I hadn't intended to reopen the debate - it was just a post for interests sake. > That's relieving. Maybe I'm becoming overly sensitive on the subject. > As you can probably tell from looking at the biopython network graph > on github (which I got to work on Linux but only with Adobe's flash > plugin - gnash etc didn't seem to cope), I've been getting to grips > with git (and github). > I haven't checked for a while, but it seem's that we've got quite a number of people making changes on different branches. That's cool. I'd like to encourage people to share their impressions of git+github with others on the list. If there are any issues, it's better to discuss them early. cheers Bartek From biopython at maubp.freeserve.co.uk Tue Mar 31 10:10:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 15:10:00 +0100 Subject: [Biopython-dev] Easy Git - git for mere mortals? 
Message-ID: <320fb6e00903310710x693527f2k25b49d958543939d@mail.gmail.com> Hi all, Have any of you tried out easygit (eg)? If it is as good as it sounds on their website, it might be a sensible option for those migrating from CVS/SVN to git for the first time. http://www.gnome.org/~newren/eg/ Reading the easygit documentation, it sounds like git gives the user plenty of ways to shoot themselves in the foot (especially if used to CVS/SVN), and a lot of what easygit does is catch some of these potential mistakes. They also stress you can mix and match git and easy git, so it can act as a stepping stone to using git directly. This presentation seems a fairly gentle introduction (with plenty of for interest stuff in the second half that can be ignored), http://www.gnome.org/~newren/eg/presentations/git-introduction.pdf There are quite a few other wrappers for git too - all referred to as "porcelain", which apparently follows from Linux's division of end user commands in git into external "porcelain" and internal "plumbing". The "porcelain" are the bits of a bathroom the end user sees (like the sink), while they normally only interact with the "ugly plumbing" when something goes wrong (like dropping an ear ring down the sink). This kind of quirky language doesn't really make the documentation any clearer in my opinion, still I'm sure things are improving gradually (or at least, I hope they are). For the moment I've come to the conclusion the git man pages are not really suitable for beginners. Peter P.S. For the moment, let's keep the wiki page focused on using git itself directly - too many choices will confuse things. From cy at cymon.org Tue Mar 31 10:49:20 2009 From: cy at cymon.org (Cymon Cox) Date: Tue, 31 Mar 2009 15:49:20 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... 
In-Reply-To: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> Message-ID: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com> Hi Peter, 2009/3/31 Peter > On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox wrote:# > > > At present the Clustalw interface forces you to write a fasta > > formatted file of your records to disk, and then has Clustalw > > write an aligned matrix to disk in a format specified by the user. > > The Clustalw tool only takes FASTA formatted input, so if you have > a bunch of sequences in memory you are forced to convert them > into FASTA format to use them as input. The question is where > does this conversion take place - explicitly by the user, or implicitly > by a wrapper. Agreed - that's the question... > > If the latter is Clustal format, then the record is parsed and an > alignment > > object is returned, else None is returned. In either case, an output > file(s) > > remains on disk. > > It should be a fairly simple enhancement to look at the arguments > to see if another output format we can parse was selected, e.g. > PHYLIP?) and also parse that. Do you think that would be a > sensible addition to Bio.Clustalw.do_alignment? No - I dont think there should be any output file (of any format) at all, an alignment object should always be returned and the user explicitly write to format they want using AlignIO. (But I think this becomes clearer below...) > Its never been > an issue for me as if you are using the Bio.Clustalw.do_alignment > interface you probably don't care about the output file format. Quite. (Unless you are trying to write to a format not supported by biopython e.g. GCG, GDE, of course.) 
> > So, say we have a bunch of sequences in pir format and we'd like them > > aligned and saved in stockholm format: > > > > from Bio import SeqIO > > from Bio import AlignIO > > from Bio import Clustalw > > from Bio.Clustalw import MultipleAlignCL > > records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir") > > AlignIO.write([records], open("temp.fasta", "w"), "fasta") > > The above line is wrong Doh! Grrr... Yeah, perhaps it should have raised an error - I'll follow this up elsewhere - but even with the corrected line and explicitly opening and closing the file handles, I still can get clustalw to align this file... (later...) > we end up with 4 output files on disk: temp.aln, temp.dnd, temp.fasta, > > temp.sth - 3 of which are incidental. > > Yes - but as the ClustalW doesn't read in PIR files, and doesn't output > Stockholm files on its own, so this has to happen. It's just a question > of who does it (the user, or the wrapper code). Yep... > > As I say, I'd like to see this: > >>>> from Bio.Align import MultipleAlignment > >>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), > "pir")) > >>>> align = MultipleAlignment(records, executable="clustalw") > >>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm") > > > > ie resulting in one file temp.sth, which we've explicitly written to > disk. > > So you'd like the wrapper to take care of creating and deleting the > temp input FASTA file, and also deleting the temp output ClustalW > file after parsing it. This can probably be done quite cleanly using > python's NamedTemporaryFile object. > Yep. > >> Also, while I can see this might be nice for short alignments > >> (which are quick to run), its rather implicit or magic. > > > > Not sure what you mean here? Why would the size of alignment matter? > > Size of alignment influences the compute time, and therefore is an issue > for > anyone doing things at the python prompt. 
Moreover, if the alignments are > big and slow, you generally want to make sure the output file is kept on > disk, > as you'll probably want to read it more than once. Agreed, but should the call to align the data (ie to clustalw) be writing the output to disk or should the user be making an explicit call using AlignIO? > > And as for it being magic, its seems to me it does, and only does, what > > it says on the label - aligns the data. > > The magic is the behind the scenes creation/deletion of the input/output > files, and the conversion between file formats. Fair enough - then magic it be... :) > > OK, well having had my say, I'm quite happy to write the Muscle module in > > the style of the current Clustalw interface, or whatever style is most > > appropriate for exposing the filename handles. But I'm not sure what that > > would be - perhaps you could elaborate on this a bit... > > I've elaborated, perhaps too much? ;) > > Basically you seem to be thinking about a high level abstraction for > multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO > module), while I am more focused on the low level abstraction for > wrapping any command line tool. This isn't to say we can't have both, > but to me it makes sense to start with the low level stuff first. > > We (unfortunately) have several styles of command line tool wrappers > in Biopython already - this is a wart that has been on my mental to do > list for some time. I think we should focus on dealing with command > line strings, and keep this separate from how the tools are invoked > (e.g. subprocess or os.system), preparation of input files, and how > any output is parsed. As long as this core is in place, more advanced > wrappers are possible on top of this basic infrastructure (Tiago may > have some comments here from his Bio.PopGen work). > > Essentially all our command line wrappers start by building a command > line string. 
In some cases this command line string is exposed to the > user (e.g. Bio.EMBOSS), and they can choose how they want to invoke > it. For example, they can explicitly opt to use the Python subprocess > module and pipes if they want to - or use a standard invocation from > Bio.Applications (we may want to add a couple of variations to this > module). > > Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool > for you. In the case of Bio.Blast.NCBIStandalone, if you don't want > the handles because you've told Blast to save its output to a file, > our wrapper still returns the standard output and standard error > handles - it is forced on you (see Bug 2654). Also, there is no easy > way to see what the actual command line string was, which can make > debugging hard, and also prevents certain things (e.g. submitting the > command line as a task to a cluster of workstations). At least > Bio.Clustalw offers a command line string object (MultipleAlignCL), > its just the do_alignment helper function I'm not so keen on. > > The Bio.Clustalw.do_alignment wrapper is rather unusual in that it > automatically parses the output - while most of our wrappers don't. > Decoupling the parsing is more modular - it makes it easy for the user > to use any parser for the output from a command line tool (either > using stdout, or by reading an output file). I like this, and it fits > with the handle based approach in most of our parsers. Thanks for your thoughts on this, it helps clarify some things... > So, I would suggest we think about adding new wrappers under Bio.Align > (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or > perhaps all together in Bio.Align.Applications or something) based on > the Bio.Application module as used in Bio.EMBOSS. We could then > deprecate Bio.Clustalw, which should also help tidy up the top level > name space. Initially at least, I wouldn't include any clever wrapper > code at all. 
OK, I'll aim for this with the Muscle code...

Cheers, C.

--
____________________________________________________________________
Cymon J. Cox
Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal
Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77

From biopython at maubp.freeserve.co.uk Tue Mar 31 11:24:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 16:24:32 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>
Message-ID: <320fb6e00903310824v6fb0e1d2gff32b3effccd00b1@mail.gmail.com>

On Tue, Mar 31, 2009 at 3:49 PM, Cymon Cox wrote:
>>>
>>> If the latter is Clustal format, then the record is parsed and an
>>> alignment object is returned, else None is returned. In either
>>> case, an output file(s) remains on disk.
>>
>> It should be a fairly simple enhancement to look at the arguments
>> to see if another output format we can parse was selected (e.g.
>> PHYLIP?) and also parse that. Do you think that would be a
>> sensible addition to Bio.Clustalw.do_alignment?
>
> No - I dont think there should be any output file (of any format) at all, an
> alignment object should always be returned and the user explicitly write to
> format they want using AlignIO. (But I think this becomes clearer below...)

Well there must be an output file, since ClustalW won't write its
output alignment to stdout.
Of course, you would have a wrapper which deletes the output file
after it has been parsed into an Alignment object. However, we
shouldn't change the existing Bio.Clustalw.do_alignment function to
do this (or to delete the .dnd guide tree), since people may be using
the call for these "side effects".

>> It's never been
>> an issue for me as if you are using the Bio.Clustalw.do_alignment
>> interface you probably don't care about the output file format.
>
> Quite. (Unless you are trying to write to a format not supported by
> biopython e.g. GCG, GDE, of course.)

What I was saying was Bio.Clustalw.do_alignment knows the requested
output format, and if it is ClustalW it automatically parses the
output file and returns the alignment. Since this code was written,
Bio.AlignIO was added and could potentially be used to parse PHYLIP
(etc) output from the Clustalw tool. And one day maybe GCG etc too.

i.e. Right now Bio.Clustalw.do_alignment will return an alignment if
it is in ClustalW format, or None if it isn't. I'm suggesting
Bio.Clustalw.do_alignment could return an alignment when Bio.AlignIO
can parse the requested file format, or None if it can't. This would
only be a small enhancement, and may not be worth bothering with if
we are thinking about deprecating Bio.Clustalw with a replacement
under Bio.Align.

>> Size of alignment influences the compute time, and therefore is an issue
>> for anyone doing things at the python prompt. Moreover, if the alignments
>> are big and slow, you generally want to make sure the output file is kept
>> on disk, as you'll probably want to read it more than once.
>
> Agreed, but should the call to align the data (ie to clustalw) be writing
> the output to disk or should the user be making an explicit call using
> AlignIO?

The command line tool ClustalW will itself write the output to disk.
I don't recall off hand, but other tools like Muscle may give the
option of writing to a file or to stdout.
In either case, the tool writes to a handle, and the user may want to
*read* this handle using Bio.AlignIO. If I want the tool's output to
go straight to a file, I'd get the tool to do it. The only reason I
can see to be *writing* the alignment with Bio.AlignIO would be for
file conversion (or after manipulating the alignment), and that would
be done by the user's Python code.

If you are talking about the data preparation (i.e. the input file
rather than the output file), then I think it is up to the user's
code to prepare a suitable input FASTA file (e.g. from SeqRecord
objects with Bio.SeqIO) before calling the command line tool.

>>> And as for it being magic, its seems to me it does, and only does, what
>>> it says on the label - aligns the data.
>>
>> The magic is the behind the scenes creation/deletion of the input/output
>> files, and the conversion between file formats.
>
> Fair enough - then magic it be... :)

:)

>> > OK, well having had my say, I'm quite happy to write the Muscle module in
>> > the style of the current Clustalw interface, or whatever style is most
>> > appropriate for exposing the filename handles. But I'm not sure what that
>> > would be - perhaps you could elaborate on this a bit...
>>
>> I've elaborated, ...
>
> Thanks for your thoughts on this, it helps clarify some things...

Oh good. If you don't agree with any of that, do say so by the way.

>> So, I would suggest we think about adding new wrappers under Bio.Align
>> (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
>> perhaps all together in Bio.Align.Applications or something) based on
>> the Bio.Application module as used in Bio.EMBOSS. We could then
>> deprecate Bio.Clustalw, which should also help tidy up the top level
>> name space. Initially at least, I wouldn't include any clever wrapper
>> code at all.
>
> OK, I'll aim for this with the Muscle code...

That sounds good.
Now can I tempt you into trying out github at the same time, so we
can see your proposed code evolve in public?

Could I add at this point that I don't think the wrapper should set
any default arguments - leave that up to the command line tool itself.
Otherwise you can get the situation where the Biopython defaults get
out of sync with the tool's own default values (an issue with our
online qblast wrapper as the NCBI change their default settings over
time).

As an aside, I have used Muscle with Biopython thanks to its option
for strict Clustal output, which can be parsed by Bio.AlignIO fine.
For this I just generated my own command line on the fly, but I was
only using a couple of the command line arguments.

Peter

From bugzilla-daemon at portal.open-bio.org Tue Mar 31 13:05:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 31 Mar 2009 13:05:50 -0400
Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. for QUAL files)
In-Reply-To:
Message-ID: <200903311705.n2VH5oKe025136@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2799

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-31 13:05 EST -------
Checked into CVS from
http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq

Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py
new revision: 1.67; previous revision: 1.66
done
Checking in Bio/SeqRecord.py;
/home/repository/biopython/biopython/Bio/SeqRecord.py,v <-- SeqRecord.py
new revision: 1.32; previous revision: 1.31
done
Checking in Bio/GenBank/__init__.py;
/home/repository/biopython/biopython/Bio/GenBank/__init__.py,v <-- __init__.py
new revision: 1.106; previous revision: 1.105
done
Checking in Bio/SeqIO/InsdcIO.py;
/home/repository/biopython/biopython/Bio/SeqIO/InsdcIO.py,v <-- InsdcIO.py new revision: 1.9; previous revision: 1.8 done Checking in Bio/SeqIO/QualityIO.py; /home/repository/biopython/biopython/Bio/SeqIO/QualityIO.py,v <-- QualityIO.py new revision: 1.8; previous revision: 1.7 done Checking in Tests/test_SeqIO.py; /home/repository/biopython/biopython/Tests/test_SeqIO.py,v <-- test_SeqIO.py new revision: 1.50; previous revision: 1.49 done Checking in Tests/output/test_GenBank; /home/repository/biopython/biopython/Tests/output/test_GenBank,v <-- test_GenBank new revision: 1.41; previous revision: 1.40 done Checking in Tests/output/test_SeqIO; /home/repository/biopython/biopython/Tests/output/test_SeqIO,v <-- test_SeqIO new revision: 1.36; previous revision: 1.35 done -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Mar 31 13:12:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 18:12:37 +0100 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com> Message-ID: <320fb6e00903311012y393761dev975a39464ab82043@mail.gmail.com> On Thu, Mar 26, 2009 at 12:30 AM, Peter wrote: > On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi: >>> Sebastian - could you have a 
quick play with this github code (using the new
>>> UnknownSeq class), and the current CVS code (using None), and make sure
>>> both support the slicing operations you were trying earlier? Thanks.
>>
>> ...
>>
>> From a practical point of view, both versions are the same, but the
>> concept of UnknownSeq looks more solid than None, because if I don't know
>> about biopython internals, I would never try to slice a None
>> seq. With "None" ...
>> But with the UnknownSeq object, len(s) returns an actual length, so it
>> is more intuitive that it can be sliced.
>
> I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord
> __getitem__ code nicer, and it means you can do len(SeqRecord) too,
> which was problematic if the sequence was None.

I've checked this into CVS after this discussion (and a little off
thread). I wasn't comfortable with using None for a sequence, and
doing this while also wanting to support len(...) and slicing on such
SeqRecord objects was basically horrible.

>> Then I tried the git code and it also worked. One thing I noticed is
>> that I got "?" instead of "N" as the "sequence" of the UnknownSeq.
>
> I felt we shouldn't use an "N" unless we are confident the sequence
> is nucleotides. In practice, this is probably a safe assumption for
> FASTQ and QUAL files - unless anyone can think of a counter example?
> Do you think it is safe to assume FASTQ and QUAL files are just for
> nucleotides?
>
> I mean, you could translate a CDS from transcriptome sequencing,
> and for the sake of argument give each amino acid a quality score
> from the three nucleotide quality scores, and then save this as a protein
> FASTQ file. But I've never heard of anyone actually doing this ;)

So, should we assume QUAL files (and perhaps FASTQ files) are
nucleotides when reading them in, and enforce this when writing them
out? This would mean the QUAL files' UnknownSeq objects would use the
letter "N" instead of "?".
Or is it more generic to leave it as it is, and not make or force any
assumptions about the nature of the sequence?

Peter

From biopython at maubp.freeserve.co.uk Tue Mar 31 17:38:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 22:38:48 +0100
Subject: [Biopython-dev] Plan for Biopython 1.50 (beta)
Message-ID: <320fb6e00903311438g6fb0813bt18a035d485a6bb99@mail.gmail.com>

Hi all,

OK guys, after a brief chat off the mailing list, I'm hoping to do the
Biopython 1.50 beta release roughly this weekend, somewhere between
Friday 4 and Monday 6 April. Until then please consider CVS "frozen"
for anything other than documentation changes or unit test additions,
or at a push really tiny changes. Once I'm ready to actually do the
release, I'll send out an email requesting no further CVS commits.

Those of you that have committed changes, please check the NEWS file
and DEPRECATED file are up to date - thanks.

After the release of Biopython 1.50 beta, we'll reopen CVS again for
small changes and documentation. While the beta is being tested by
our user base, I'd like us to push to finish any missing documentation
- in particular for new modules Bio.Motif (Bartek) and
Bio.Graphics.GenomeDiagram (me and/or Leighton), plus the new
SeqRecord slicing and UnknownSeq class (me).

Depending on the feedback from the beta, I'd hope we can do the final
release of Biopython 1.50 well before the end of April, and then
reopen CVS for new code. That would also be a good point to evaluate
moving from CVS to git.
In the meantime, while CVS is (semi) frozen you can all try using github for keeping your pending submissions under version control ;) Peter From bugzilla-daemon at portal.open-bio.org Sun Mar 1 17:22:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Mar 2009 12:22:28 -0500 Subject: [Biopython-dev] [Bug 2774] New: Bio.PDBIO.save doesn't write the required END record Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2774 Summary: Bio.PDBIO.save doesn't write the required END record Product: Biopython Version: Not Applicable Platform: All OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: barry_finzel at yahoo.com According to the PDB format specification (http://www.wwpdb.org/documentation/format32/sect1.html) All PDB files must be terminated with a record containing just "END\n".
Easy to fix in PDBIO.save() -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Mar 2 10:26:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Mar 2009 10:26:38 +0000 Subject: [Biopython-dev] ScanProsite In-Reply-To: <704108.77040.qm@web62402.mail.re1.yahoo.com> References: <704108.77040.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com> On Sun, Mar 1, 2009 at 12:17 PM, Michiel de Hoon wrote: > ScanProsite is a web tool to scan protein sequences against the PROSITE > database (see http://www.expasy.org/tools/scanprosite/). Biopython contains > code in Bio.Prosite to interact with ScanProsite. However, this code needs to > be updated, as it does not work with the current ScanProsite web pages: > Neither accessing ScanProsite nor extracting the hits from the HTML page works. > > This problem is relatively easy to solve, since ExPASy nowadays allows > programmatic access to ScanProsite > (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This > returns the Prosite hits in XML format, which can be parsed easily in Python. > > The only issue now is how this should be presented to the user. ... > ... > This is more straightforward, but on the other hand people may want to save the > XML search results in an XML file, and for that purpose we'd need a function that > does the parsing only. > > Any opinions? I would definitely have two functions, one returning a handle to the XML, and one for parsing XML from a handle. This would be more consistent with Bio.Entrez and other parsers, and more flexible. For example, the user can opt to save the XML to disk, and they can also use our parser on files or the remote site - plus of course they can use any other XML parser they may prefer. 
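A minimal sketch of the parse-only half of that split. The XML layout below is made up for illustration (the element names `match`, `sequence_ac`, `start`, `stop` and `signature_ac`, the sample values, and the function name `read_hits` are all assumptions, not the actual ScanProsite schema or an existing Biopython API). The point is only that a function taking a handle works the same on a saved file and on a live response.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout, for illustration only; the real ScanProsite
# output may well use different element names.
SAMPLE_XML = """<matchset>
  <match>
    <sequence_ac>SEQ1</sequence_ac>
    <start>10</start>
    <stop>25</stop>
    <signature_ac>SIG_A</signature_ac>
  </match>
  <match>
    <sequence_ac>SEQ1</sequence_ac>
    <start>40</start>
    <stop>47</stop>
    <signature_ac>SIG_B</signature_ac>
  </match>
</matchset>"""


def read_hits(handle):
    """Parse hits from any file-like handle (saved file or live response)."""
    tree = ET.parse(handle)
    hits = []
    for match in tree.getroot().iter("match"):
        hits.append({
            "sequence_ac": match.findtext("sequence_ac"),
            "start": int(match.findtext("start")),
            "stop": int(match.findtext("stop")),
            "signature_ac": match.findtext("signature_ac"),
        })
    return hits


# An in-memory handle stands in for either an open file or a web response.
hits = read_hits(io.StringIO(SAMPLE_XML))
for hit in hits:
    print(hit)
```

Because the parser never opens a URL itself, saving the raw XML to disk and parsing it later needs no extra code path.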
I like your suggestion to have a REST XML based module under Bio.ExPASy, which means we can deprecate the HTML based Bio.Prosite module and in the process make the top level list of modules in Biopython a bit shorter. In the long term I think that will help people find functionality.

Peter

From bugzilla-daemon at portal.open-bio.org Mon Mar 2 15:22:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Mar 2009 10:22:53 -0500
Subject: [Biopython-dev] [Bug 2776] New: Bio.pairwise2 returns non-optimal alignment in at least some cases
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2776

           Summary: Bio.pairwise2 returns non-optimal alignment in at least some cases
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de

At least in some cases, Bio.pairwise2 returns an alignment that is not the one with the highest score for the input parameters. This occurs in localXX and globalXX. So far I have only encountered the problem with large mismatch penalties (which I use because I need mismatch-free alignments).

Simple example (the bug also occurred for longer sequences):

>>> from Bio import pairwise2
>>> sequence1 = 'GKG'
>>> sequence2 = 'GWG'
>>> A = pairwise2.align.globalms(sequence1, sequence2, 5, -100, -5, -5)[0]
>>> A[0]
'GKG--'
>>> A[1]
'--GWG'
>>> A[2]
-15.0

whereas

'GK-G'
'G-WG'

would get a score of 0.

System: Kubuntu 8.10 64Bit, Python 2.6.1, Biopython 1.49 (my pairwise2.py is identical to the current CVS version of it)

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
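The two scores quoted in the report can be checked by hand. The sketch below assumes the affine gap convention in which the first position of a gap costs the open penalty and each further position of the same gap costs the extend penalty; that convention reproduces the reported -15. `score_alignment` is a hypothetical helper written for this check, not part of Bio.pairwise2.

```python
def score_alignment(a, b, match, mismatch, gap_open, gap_extend):
    """Score two gapped strings of equal length, column by column."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    total = 0
    prev_gap = None  # which string had a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gap_in = "a" if x == "-" else "b"
            # Continuing a gap in the same string costs the extend
            # penalty; anything else starts a new gap.
            total += gap_extend if gap_in == prev_gap else gap_open
            prev_gap = gap_in
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total


# Parameters from the report: match 5, mismatch -100, open -5, extend -5.
reported = score_alignment("GKG--", "--GWG", 5, -100, -5, -5)
expected = score_alignment("GK-G", "G-WG", 5, -100, -5, -5)
print(reported, expected)  # -15 0
```

Under these assumptions the alignment with one-position gaps scores strictly higher than the one returned, which is the discrepancy the report describes.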
From bugzilla-daemon at portal.open-bio.org Wed Mar 4 12:41:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Mar 2009 07:41:33 -0500 Subject: [Biopython-dev] [Bug 2777] New: [Solution is one line change!] Entity sorting altered by detach_child() calls Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2777 Summary: [Solution is one line change!] Entity sorting altered by detach_child() calls Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P1 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: klaus.kopec at tuebingen.mpg.de detach_child(self, id) in Bio.PDB.Entity changes the order of self.child_list. This bug is caused by line 71, where self.child_list is set to self.child_dict.values() which are values of an unordered(!) dict: self.child_list=self.child_dict.values() Solution: Replace line 71 by: self.child_list.remove(child) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 4 12:48:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Mar 2009 07:48:19 -0500 Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls In-Reply-To: Message-ID: <200903041248.n24CmJSZ008104@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2777 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-04 07:48 EST ------- Have you got a short example to demonstrate the original problem? It would be useful to evaluate your change, and could be made into a unit test too. 
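A self-contained sketch of what such a test might exercise. `MiniEntity` is a hypothetical stand-in written for this illustration, not Bio.PDB.Entity; its children are just id tuples. Rebuilding the list from the dict values does not promise the original order (here the rebuilt list is sorted by id to make the reordering reproducible), whereas `list.remove` leaves the remaining children where they were.

```python
class MiniEntity:
    """Hypothetical stand-in for an entity holding ordered children by id."""

    def __init__(self, ids):
        self.child_list = list(ids)
        self.child_dict = {i: i for i in ids}

    def detach_rebuild(self, id):
        # Rebuild approach: regenerate the list from the dict values.
        # sorted() is an assumption standing in for any reordering.
        del self.child_dict[id]
        self.child_list = sorted(self.child_dict.values())

    def detach_remove(self, id):
        # Proposed approach: remove only the detached child.
        child = self.child_dict.pop(id)
        self.child_list.remove(child)


# PDB-style residue ids: (hetero flag, sequence number, insertion code).
ids = [("H_PCA", 1, " "), (" ", 2, " "), (" ", 3, " "), (" ", 4, " ")]

old = MiniEntity(ids)
old.detach_rebuild((" ", 4, " "))

new = MiniEntity(ids)
new.detach_remove((" ", 4, " "))

print(old.child_list)  # hetero residue has moved to the end
print(new.child_list)  # original order kept
```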
-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 13:58:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 08:58:41 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls
In-Reply-To:
Message-ID: <200903041358.n24Dwfjk015027@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

------- Comment #2 from klaus.kopec at tuebingen.mpg.de 2009-03-04 08:58 EST -------
Created an attachment (id=1253)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1253&action=view)
example PDB file that can be used to see the bug

## Python Code to see the bug:
import os
from Bio.PDB.PDBParser import PDBParser

p=PDBParser(PERMISSIVE=1)
filename=os.path.expanduser("entity_detach_order_bug_example.pdb")
s=p.get_structure('Entity.py bug example: detach changes order', filename)

print 'order before detach:'
for r in s[0]['A'].child_list:
    print r.id

detach_me = s[0]['A'].child_list[-1] ## this is independent of the chosen entry in the list
s[0]['A'].detach_child(detach_me.id)

print 'order after detach:'
for r in s[0]['A'].child_list:
    print r.id

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 14:18:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:18:28 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls
In-Reply-To:
Message-ID: <200903041418.n24EISvd016743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

------- Comment #3 from klaus.kopec at tuebingen.mpg.de 2009-03-04 09:18 EST -------
the output of the code in my Comment #2 is:

order before detach:
('H_PCA', 1, ' ')
(' ', 2, ' ')
(' ', 3, ' ')
(' ', 4, ' ')
order after detach:
(' ', 2, ' ')
(' ', 3, ' ')
('H_PCA', 1, ' ')

I forgot to mention, that the line "self.child_list.sort(self._sort)" needs to be commented out as well for the fix to work (as hetatms are otherwise sorted to the end).

hmmm... it just came to me, that this probably breaks the Parser for some other PDB files, where residues are unsorted. These changes do not break any existing unit tests for the PDB module, so maybe it's still a step in the right direction.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 14:37:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:37:34 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls
In-Reply-To:
Message-ID: <200903041437.n24EbYhj018545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-04 09:37 EST -------
Created an attachment (id=1254)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1254&action=view)
Patch for Bio/PDB/Entity.py based on Klaus Kopec's suggestion

I've attached a patch which makes the suggested change. I'm hoping to get Thomas (the original author) to comment but otherwise I see no reason not to commit this fix soon.

The old code did this:

    def detach_child(self, id):
        "Remove a child."
        child=self.child_dict[id]
        child.detach_parent()
        del self.child_dict[id]
        self.child_list=self.child_dict.values()
        self.child_list.sort(self._sort)

It used a sort which should have preserved the order - but that only works if the child_list is always kept sorted. Looking at the add method, this isn't true:

    def add(self, entity):
        "Add a child to the Entity."
        entity_id=entity.get_id()
        if self.has_id(entity_id):
            raise PDBConstructionException( \
                "%s defined twice" % str(entity_id))
        entity.set_parent(self)
        self.child_list.append(entity)
        #self.child_list.sort(self._sort)
        self.child_dict[entity_id]=entity

Interestingly the sort was commented out in the original version first committed to Biopython's CVS, so this change predates the integration into Biopython.

I haven't checked to see if there are any other ways the child_list could become unsorted - that doesn't really matter.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 16:17:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 11:17:31 -0500
Subject: [Biopython-dev] [Bug 2774] Bio.PDBIO.save doesn't write the required END record
In-Reply-To:
Message-ID: <200903041617.n24GHVd1029752@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2774

thamelry at binf.ku.dk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from thamelry at binf.ku.dk 2009-03-04 11:17 EST -------
save method now has option 'write_end':

io.save(fp, write_end=1)

if 1, END is written.

The reason this is not done by default is that one sometimes calls 'save' multiple times, for example when concatenating files. So always writing END is not a good approach.
-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 19:10:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:10:37 -0500
Subject: [Biopython-dev] [Bug 2778] New: Efficiency improvement in function Bio.SeqUtils.GC()
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

           Summary: Efficiency improvement in function Bio.SeqUtils.GC()
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wscott at chem.ubc.ca

Bio.SeqUtils.GC recalculates the gc variable in a loop using a dictionary whereas it could simply be calculated after the loop. The following code is suggested to replace the function:

def ScoGC(seq):
    """ calculates G+C content """
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    return gc*100.0/len(seq)

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Mar 4 19:12:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:12:27 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC()
In-Reply-To:
Message-ID: <200903041912.n24JCR2U014353@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

wscott at chem.ubc.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wscott at chem.ubc.ca

------- Comment #1 from wscott at chem.ubc.ca 2009-03-04 14:12 EST -------
of course, rename ScoGC to GC...
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Mar 4 22:03:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Mar 2009 17:03:59 -0500 Subject: [Biopython-dev] [Bug 2779] New: Seq.count() docstring should note unexpected behaviour Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2779 Summary: Seq.count() docstring should note unexpected behaviour Product: Biopython Version: 1.49 Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: baoilleach at gmail.com The Seq.count() method has the following docs: "Count method, like that of a python string." This is a cop-out as it does not tell the user anything. In particular, it does not lead the user to expect that Seq("GGG").count("GG")==1. This might make sense for Python strings, but it's incorrect for sequences. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
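The behaviour in question is easy to see with plain Python strings, which is what the docstring defers to: `str.count` counts non-overlapping occurrences, scanning left to right. A quick check (plain `str` is used here instead of a Seq object, going by the report's statement that Seq.count behaves like the string method):

```python
seq = "GGG"
motif = "GG"

# str.count skips past each match before looking for the next one.
non_overlapping = seq.count(motif)

# Every start position at which the motif occurs, overlaps included.
starts = [i for i in range(len(seq) - len(motif) + 1)
          if seq.startswith(motif, i)]

print(non_overlapping)  # 1
print(len(starts))      # 2
print(starts)           # [0, 1]
```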
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 09:19:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 04:19:40 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200903050919.n259Je8d016299@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ruzzo at cs.washington.edu changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ruzzo at cs.washington.edu ------- Comment #8 from ruzzo at cs.washington.edu 2009-03-05 04:19 EST ------- I'm new to biopython, so I may be doing something else wrong, but in attempting to efetch a pubmed record tonight I see similar errors which seem to be fixed by downloading & installing several (new) DTD's: nlmmedline_090101.dtd nlmmedlinecitation_090101.dtd nlmsharedcatcit_090101.dtd nlmcommon_090101.dtd and possibly pubmed_090101.dtd -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 09:23:31 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 04:23:31 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903050923.n259NV4S016627@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #1 from lpritc at scri.sari.ac.uk 2009-03-05 04:23 EST ------- I think that's a good point about expected behaviour for count() in a biological sequence. Presumably, we all expect that Seq('GGG').count('GG') should find all overlapping matches, and return the value 2, in order to make intuitive 'biological' sense. There are, after all, two 'GG's in that sequence. 
This doesn't correspond to string count()ing behaviour, or to standard re module behaviour. The obvious way round it, which I've used before, is to compile the search string as a regular expression, and iterate regular expression matches from one symbol after the start of the preceding match (if any):

>>> import re
>>> startpos = 0
>>> seq = 'GGGG'
>>> motif = 'GG'
>>> motif_re = re.compile(motif)
>>> matches = []
>>> while True:
...     m = motif_re.search(seq, startpos)
...     if m is None:
...         break
...     startpos = m.start() + 1
...     matches.append(m)
...
>>> matches
[<_sre.SRE_Match object at 0x68f38>, <_sre.SRE_Match object at 0x96ac60>, <_sre.SRE_Match object at 0x96a950>]
>>> [(m.start(), m.group()) for m in matches]
[(0, 'GG'), (1, 'GG'), (2, 'GG')]

This could probably be done more efficiently. Is something like this already implemented in Bio.Motif?

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 09:24:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:24:43 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour
In-Reply-To:
Message-ID: <200903050924.n259OhYw016750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779

------- Comment #2 from lpritc at scri.sari.ac.uk 2009-03-05 04:24 EST -------
D'oh! There isn't a Bio.Motif. My bad.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
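On the efficiency question raised in comment #1, two shorter ways to get the same overlapping positions with the standard library alone, sketched under the assumption that the motif is a literal string (`find_overlapping` and `find_overlapping_re` are hypothetical names, not Biopython functions). The first restarts `str.find` one position after each hit; the second wraps the motif in a zero-width lookahead so that `finditer` does not consume the match.

```python
import re


def find_overlapping(seq, motif):
    """Start positions of all occurrences, overlaps included (str.find)."""
    starts = []
    pos = seq.find(motif)
    while pos != -1:
        starts.append(pos)
        # Resume one symbol after the start of the previous hit.
        pos = seq.find(motif, pos + 1)
    return starts


def find_overlapping_re(seq, motif):
    """Same result using a zero-width lookahead."""
    pattern = re.compile("(?=%s)" % re.escape(motif))
    return [m.start() for m in pattern.finditer(seq)]


print(find_overlapping("GGGG", "GG"))     # [0, 1, 2]
print(find_overlapping_re("GGGG", "GG"))  # [0, 1, 2]
```

An overlapping count is then just the length of either list.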
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 09:43:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 04:43:09 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903050943.n259h9XG018545@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #3 from baoilleach at gmail.com 2009-03-05 04:43 EST ------- Thanks for the workaround but could you replace the current count by that code? Can you imagine any existing code that would break because of correction of buggy behaviour? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:16:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:16:52 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051016.n25AGqSW021680@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-03-05 05:16 EST ------- Created an attachment (id=1255) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1255&action=view) Patch to Seq.py that modified count behaviour for Seq and MutableSeq objects to return correct counts for substrings of length > 1 (In reply to comment #3) > Thanks for the workaround but could you replace the current count by that code? I don't have access to CVS ;) It would be nice to get consensus that the behaviour that this code would produce is the desired behaviour for everyone, that we've got an acceptable way of implementing it, and that it doesn't break anything downstream. There's bound to be, at best, a lag time. 
I've attached a proposed patch based on the above code, though it's not necessarily the best way to solve this problem. > Can you imagine any existing code that would break because of correction of > buggy behaviour? That should come out in the testing. And it turns out that there is a Bio.Motif, but it's in CVS. D'oh! again... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:22:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:22:40 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051022.n25AMeIt022121@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 05:22 EST ------- Prior to Biopython 1.45, the count method only worked with single letter search strings. I changed this just over a year ago for Biopython 1.45 as Bug 2386, but unfortunately at the time none of us considered this overlapping/non-overlapping behaviour. With hindsight we should have had this debate then. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py.diff?r1=1.19&r2=1.20&cvsroot=biopython We should either: (a) stick with the python string compatible behaviour (which has been a general principle for the Seq class), but document this issue more clearly as a non-overlapping search does run counter to biological usage. or, (b) Or change the behaviour as Leighton suggests to do an overlapping search. This could break any code relying on the old python string-like behaviour. I agree we need to have a discussion of this over on the main mailing list, as making the change could break people's code. 
-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:42:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:42:27 -0500
Subject: [Biopython-dev] [Bug 2780] New: PDB file HETATMs cannot be alternative location of a residue that is an ATOM
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2780

           Summary: PDB file HETATMs cannot be alternative location of a residue that is an ATOM
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de

In PDB files where HETATMs and ATOMs are altlocs of each other (e.g. 1RR2, residue 184), they are treated as two separate residues.

An obvious solution is to add an "else" case to the "if" in StructureBuilder.py line 115 (method init_residue(...)) that introduces some kind of mixed (HETATM as well as ATOM) DisorderedResidue.

The main problem with that: the hetero field of the residue ids will differ between the residues, so the whole access-over-ids mechanism will most likely not work with these MixedDisorderedResidues as straightforwardly as it does so far. Sadly, I could not come up with a good solution for this. Maybe some __getattr__ magic that alters the way Chains access their residues might work, by allowing access to residues using only the second and third components of the id 3-tuple?!

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
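One way to picture the id-lookup idea at the end of the report, as a sketch only. `ChainIndex` is a hypothetical class, not part of Bio.PDB, and the hetero flag "H_XXX" and the residue values are placeholders. Residues are stored under the full 3-tuple id, with a secondary index on the second and third components, so a HETATM and an ATOM that share a sequence number and insertion code are found together.

```python
from collections import defaultdict


class ChainIndex:
    """Hypothetical container indexing residues by full id and by position."""

    def __init__(self):
        self.by_full_id = {}
        self.by_position = defaultdict(list)

    def add(self, full_id, residue):
        hetero_flag, resseq, icode = full_id
        if full_id in self.by_full_id:
            raise ValueError("%s defined twice" % (full_id,))
        self.by_full_id[full_id] = residue
        # Secondary key ignores the hetero flag.
        self.by_position[(resseq, icode)].append(full_id)

    def at_position(self, resseq, icode=" "):
        """All residues at this position, whatever their hetero flag."""
        ids = self.by_position.get((resseq, icode), [])
        return [self.by_full_id[i] for i in ids]


chain = ChainIndex()
chain.add((" ", 183, " "), "standard residue")
chain.add((" ", 184, " "), "ATOM altloc")       # placeholder value
chain.add(("H_XXX", 184, " "), "HETATM altloc")  # placeholder hetero flag

print(chain.at_position(184))
```

Full-id access keeps working unchanged, which is the property the report worries about losing.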
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:44:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:44:12 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200903051044.n25AiCH9023924@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 05:44 EST ------- (In reply to comment #8) > I'm new to biopython, so I may be doing something else wrong, but in > attempting to efetch a pubmed record tonight I see similar errors which > seem to be fixed by downloading & installing several (new) DTD's: > > nlmmedline_090101.dtd > nlmmedlinecitation_090101.dtd > nlmsharedcatcit_090101.dtd > nlmcommon_090101.dtd > and possibly > pubmed_090101.dtd > Those have been added to CVS, and will be installed with Biopython 1.50 - perhaps we should hurry up our release plans. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Entrez/DTDs/?cvsroot=biopython -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:46:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 05:46:09 -0500 Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative location of a residue that is an ATOM In-Reply-To: Message-ID: <200903051046.n25Ak9DH024105@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2780 ------- Comment #1 from klaus.kopec at tuebingen.mpg.de 2009-03-05 05:46 EST ------- Created an attachment (id=1256) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1256&action=view) PDB file slice with 2 residues, that can be used to see the bug. 
slice of PDB file 1RR2 (example mentioned in my bug submission) showing two altloc residues where one is a HETATM and the other an ATOM. They are treated as two residues in Biopython.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 10:56:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:56:39 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC()
In-Reply-To:
Message-ID: <200903051056.n25AudjU024927@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 05:56 EST -------
I've checked that in, but with the existing code to catch a zero length sequence and return 0 instead of raising a ZeroDivisionError.

def GC(seq):
    """Calculates G+C content, ..."""
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    if gc == 0:
        return 0
    return gc*100.0/len(seq)

The old code had been modified several times - it originally calculated the GC% as the CG count divided by the ATCG count, thus it had to count all the bases. You are right, this is much cleaner. Thanks.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
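A quick check of the committed logic on plain strings. This is a sketch: the function body follows the one quoted above but is given the hypothetical name `gc_percent`, and plain `str` stands in for a Seq object. 'S', the IUPAC code for "G or C", is counted as well, and an empty sequence takes the early return instead of dividing by zero.

```python
def gc_percent(seq):
    """Percentage of G, C and S (G or C) symbols, case-insensitive."""
    gc = sum(map(seq.count, ['G', 'C', 'g', 'c', 'S', 's']))
    if gc == 0:
        # Covers the empty sequence as well as sequences with no G/C/S.
        return 0
    return gc * 100.0 / len(seq)


print(gc_percent("ACGT"))  # 50.0
print(gc_percent("acgs"))  # 75.0
print(gc_percent("ATAT"))  # 0
print(gc_percent(""))      # 0
```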
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 11:18:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 06:18:33 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051118.n25BIXdp026743@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #6 from baoilleach at gmail.com 2009-03-05 06:18 EST ------- Sorry - could you clarify which mailing list you mean by the "main mailing list", the dev list or the discuss list? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 12:27:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 07:27:49 -0500 Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour In-Reply-To: Message-ID: <200903051227.n25CRnmA001571@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2779 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 07:27 EST ------- (In reply to comment #6) > Sorry - could you clarify which mailing list you mean by the "main mailing > list", the dev list or the discuss list? I was thinking the main discussion list, and we should focus on the desired behaviour rather than how we might implement it. See: http://lists.open-bio.org/pipermail/biopython/2009-March/004960.html Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 12:31:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 07:31:50 -0500 Subject: [Biopython-dev] [Bug 2781] New: Bio.PDB Structure instances cannot be deepcopied Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2781 Summary: Bio.PDB Structure instances cannot be deepcopied Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P3 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: klaus.kopec at tuebingen.mpg.de For some reason, copy.deepcopy() of a Structure instance results in: Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in ignored for most PDB files I tried. Maybe implementing some __deepcopy__ methods might help, but I am unsure, as I did not perform profound research concerning this bug. My system: Kubuntu 8.10 64-Bit, Python 2.6.1 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Mar 5 12:40:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 5 Mar 2009 12:40:16 +0000 Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utilities requirements updated In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov> Message-ID: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com> This email was sent out a few weeks ago, but it took a while before the NCBI webpage was actually updated (maybe a caching issue) so I didn't rush to relax our rules immediately. Under the new rules we must make no more than three requests every second. 
We could track the times of the last two requests in order to
enforce this as worded, but I think it would be simpler just to switch
from using a minimum 3 second pause between Bio.Entrez requests to
just a minimum 0.33334 second pause. This is a much simpler code
change and will comply with the new relaxed rules.

Unless anyone has a counter suggestion, I will update Bio.Entrez and
the tutorial shortly.

Peter

---------- Forwarded message ----------
From:
Date: Thu, Feb 26, 2009 at 6:55 PM
Subject: [Utilities-announce] NCBI E-Utilities requirements updated
To: utilities-announce at ncbi.nlm.nih.gov

NCBI E-Utilities users,

E-Utilities system use requirements have been modified from no more
than 1 request every 3 seconds to no more than 3 requests every second.

The online documentation has been updated to reflect this change:
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

Thank you.
NCBI/NLM/NIH
_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From bugzilla-daemon at portal.open-bio.org Thu Mar 5 12:58:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 07:58:40 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour
In-Reply-To:
Message-ID: <200903051258.n25Cwe9p004288@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779

------- Comment #8 from barwil at gmail.com 2009-03-05 07:58 EST -------
(In reply to comment #4)
> This could probably be done more efficiently. Is something like this already
> implemented in Bio.Motif
>

In Bio.Motif you can do:

m=Bio.Motif.Motif()
m.add_instance(Seq("GG",m.alphabet))
for i in m.search_instances(Seq("GGGG",m.alphabet)):
    print i

This should give you overlapping hits.

Bio.Motif is in CVS, but the same implementation is in
Bio.AlignAce.Motif (now obsolete).
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Thu Mar 5 12:58:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Mar 2009 12:58:40 +0000
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
 <63700.34226.qm@web62405.mail.re1.yahoo.com>
 <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
 <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
 <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
Message-ID: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>

On Thu, Feb 19, 2009 at 10:25 AM, Peter wrote:
>
> Since this thread last year, there have been no objections. Following
> a recent question on the main mailing list about how to determine the
> version of Biopython this seems worth doing before the next release.
> Again, an objections or comments on the implementation details?
> Otherwise I'll make this change shortly.
>

Changes made in CVS, and updated the release instructions:
http://biopython.org/wiki/Building_a_release

In between releases, should we leave the __version__ as is, or
explicitly update it to be something like "1.49+" just after releasing
1.49? This only affects people installing Biopython from CVS, so they
should be technically inclined...
Peter From bugzilla-daemon at portal.open-bio.org Thu Mar 5 14:47:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 09:47:30 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200903051447.n25ElU37014276@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 09:47 EST ------- We seem to have reached agreement on the mailing list, so checking this patch in, and marking this issue as fixed. Note we may want to review the choice of name for the new per-letter-annotations attribute (as long as this happens before the Biopython 1.50 release), currently this is letter_annotations as per a brief discussion on the mailing list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 14:47:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 09:47:43 -0500 Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g. align[1:2, 5:-5] In-Reply-To: Message-ID: <200903051447.n25ElhAb014302@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2551 Bug 2551 depends on bug 2507, which changed state. 
Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing http://bugzilla.open-bio.org/show_bug.cgi?id=2507 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 14:47:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 09:47:44 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200903051447.n25EliM6014314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 Bug 2767 depends on bug 2507, which changed state. Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing http://bugzilla.open-bio.org/show_bug.cgi?id=2507 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Mar 5 15:31:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 10:31:17 -0500 Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC() In-Reply-To: Message-ID: <200903051531.n25FVHOq018242@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2778 ------- Comment #3 from bsouthey at gmail.com 2009-03-05 10:31 EST ------- (In reply to comment #2) > I've checked that in, but with the existing code to catch a zero length > sequence and return 0 instead of raising a ZeroDivisionError. > > def GC(seq): > """Calculates G+C content, ...""" > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > if gc == 0: return 0 > return gc*100.0/len(seq) > I think that it is clearer to check that the sequence length is not zero rather than assuming that if the sum is zero then the sequence length is also zero. def GC(seq): """Calculates G+C content, ...""" gc=sum(map(seq.count,['G','C','g','c','S','s'])) if len(seq) > 0: return gc*100.0/len(seq) else: return 0 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 15:51:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 10:51:20 -0500 Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC() In-Reply-To: Message-ID: <200903051551.n25FpKGf020282@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2778 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-03-05 10:51 EST ------- (In reply to comment #3) > (In reply to comment #2) > > I've checked that in, but with the existing code to catch a zero length > > sequence and return 0 instead of raising a ZeroDivisionError. 
> > > > def GC(seq): > > """Calculates G+C content, ...""" > > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > > if gc == 0: return 0 > > return gc*100.0/len(seq) > > > > I think that it is clearer to check that the sequence length is not zero rather > than assuming that if the sum is zero then the sequence length is also zero. > > def GC(seq): > """Calculates G+C content, ...""" > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > if len(seq) > 0: > return gc*100.0/len(seq) > else: > return 0 It would probably be clearest, quickest and most efficient to comment that particular line of the code to point out that it does elegant double-duty as a check for zero sequence length ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 15:56:38 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 10:56:38 -0500 Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function Bio.SeqUtils.GC() In-Reply-To: Message-ID: <200903051556.n25Fuc13020807@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2778 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 10:56 EST ------- (In reply to comment #3) > I think that it is clearer to check that the sequence length is > not zero rather than assuming that if the sum is zero then the > sequence length is also zero. I agree, but had chosen to keep the old code. 
> def GC(seq): > """Calculates G+C content, ...""" > gc=sum(map(seq.count,['G','C','g','c','S','s'])) > if len(seq) > 0: > return gc*100.0/len(seq) > else: > return 0 > Your length test isn't very elegant, this is much nicer/more pythonic I think: if seq : gc = sum(map(seq.count,['G','C','g','c','S','s'])) return gc*100.0/len(seq) else : return 0 However, given most of the time the sequence will not be empty, this should be faster: try : gc = sum(map(seq.count,['G','C','g','c','S','s'])) return gc*100.0/len(seq) except ZeroDivisionError : return 0 CVS updated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 5 16:04:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 11:04:07 -0500 Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g. align[1:2, 5:-5] In-Reply-To: Message-ID: <200903051604.n25G471v021470@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2551 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 11:04 EST ------- Created an attachment (id=1257) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1257&action=view) Patch for Bio/Align/Generic.py to support array like access This requires the patch to the SeqRecord __getitem__ method just committed to CVS for Bug 2507. This includes an extended doctest which tries to illustrate the typical usage I expect. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
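The Bug 2778 thread above turns on whether the `gc == 0` guard and the explicit empty-sequence handling behave the same. The following is a minimal sketch for checking that, assuming plain Python strings as input; the names gc_guard and gc_try are hypothetical and this is not necessarily the code that went into CVS.

```python
def gc_guard(seq):
    """G+C percentage using the original ``gc == 0`` guard."""
    # Count G, C and the ambiguity code S (G or C), in either case.
    gc = sum(seq.count(letter) for letter in "GCgcSs")
    if gc == 0:
        # Also covers the empty sequence, where len(seq) == 0.
        return 0
    return gc * 100.0 / len(seq)


def gc_try(seq):
    """G+C percentage using the try/except variant from comment #5."""
    gc = sum(seq.count(letter) for letter in "GCgcSs")
    try:
        return gc * 100.0 / len(seq)
    except ZeroDivisionError:
        # Only reached for an empty sequence.
        return 0


# Both variants should agree, including on the empty sequence.
for example in ["", "ATAT", "GGCC", "acgtS"]:
    assert gc_guard(example) == gc_try(example)
```

The guard version never divides when there is nothing to count, so an empty sequence can never reach the division; the try/except version states the empty-sequence case explicitly.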
From bsouthey at gmail.com Thu Mar 5 16:59:08 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 5 Mar 2009 10:59:08 -0600
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
 <63700.34226.qm@web62405.mail.re1.yahoo.com>
 <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
 <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
 <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
 <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
Message-ID:

On Thu, Mar 5, 2009 at 6:58 AM, Peter wrote:
> On Thu, Feb 19, 2009 at 10:25 AM, Peter wrote:
>>
>> Since this thread last year, there have been no objections. Following
>> a recent question on the main mailing list about how to determine the
>> version of Biopython this seems worth doing before the next release.
>> Again, an objections or comments on the implementation details?
>> Otherwise I'll make this change shortly.
>>
>
> Changes made in CVS, and updated the release instructions:
> http://biopython.org/wiki/Building_a_release
>
> In between releases, should we leave the __version__ as is, or
> explicitly update it to be something like "1.49+" just after releasing
> 1.49? This only affects people installing Biopython from CVS, so they
> should be technically inclined...
>
> Peter
>

I agree that it would be helpful to distinguish between an official
release and a build from the CVS. Furthermore, it would then be
important to know when the build from CVS was done, at least relative
to the official releases.

So I think you are tending toward a numbering scheme like:
1.49 is an official release
1.49+ (or similar) is CVS after the 1.49 official release but before
the next official release 1.50.
1.50 will be an official release 1.50+ (or similar) is the CVS after the 1.50 official release but before the next official release whatever number it will be. If so the release instructions should also include an instruction to change the CVS numbering in the version in __init__.py files after release has been made. Also, after looking at the release instructions shouldn't BioSQL and Doc also have version-related information? Ideally the Biopython BioSQL code should have some connection to the main version of BioSQL - I don't use it so it is not an issue for me (yet). Bruce From biopython at maubp.freeserve.co.uk Thu Mar 5 17:50:04 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 5 Mar 2009 17:50:04 +0000 Subject: [Biopython-dev] determining the version In-Reply-To: References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com> Message-ID: <320fb6e00903050950k4d0cce9i1fe1442e15cf9cf7@mail.gmail.com> On Thu, Mar 5, 2009 at 4:59 PM, Bruce Southey wrote: > > I agree that it would be helpful to distinguish between an official > release and a build from the CVS. Furthermore, it would then be > important to know when the build from CVS was done at least relative > to the official releases. > > So I think you tending to have a numbering scheme like: > 1.49 is an official release > 1.49+ (or similar) is CVS after the 1.49 official release but before > the next official release 1.50. > 1.50 will be an official release > 1.50+ (or similar) is the CVS after the 1.50 official release but > before the next official release whatever number it will be. That is one of the two suggestions I was putting forward. 
The other was just leaving the version number as that of the most recent release - people should know if they are running CVS as this has to be done deliberately. One tiny downside is the "+" gets turned into an underscore for filenames (e.g. egg files, and I assume a windows installer), but we won't be releasing those so that doesn't matter. > If so the release instructions should also include an instruction to > change the CVS numbering in the version in __init__.py files after > release has been made. Yes - assuming people are happy with this suggested scheme. Note that if we switch to SVN, something automated with the SVN revision number might be possible. > Also, after looking at the release instructions shouldn't BioSQL and > Doc also have version-related information? > Ideally the Biopython BioSQL code should have some connection to the > main version of BioSQL - I don't use it so it is not an issue for me > (yet). Because Bio/* and BioSQL/* are always shipped and packaged together, to my mind they together make up Biopython and share the same version number. As to why BioSQL is top level rather than being Bio.BioSQL, it was long ago and I have no idea. For the documentation, recent releases of the tutorial have included the target version of Biopython together with the date. Again, this should be in the release instructions. 
Peter From bugzilla-daemon at portal.open-bio.org Thu Mar 5 17:54:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Mar 2009 12:54:01 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200903051754.n25Hs1cW030546@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1251 is|0 |1 obsolete| | ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-05 12:54 EST ------- Created an attachment (id=1258) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1258&action=view) Read/write support for FASTQ and QUAL files, using the letter_annotations dict Small update to earlier version, with minor comment changes. Also includes explicit rounding of scores to the nearest integer when writing out PHRED scores in Solexa format (and vice versa). This conversion still needs verifying against real world examples. I've been testing with real world PHRED based files only so far. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
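The comment above mentions rounding to the nearest integer when converting between PHRED and Solexa scores. A minimal sketch of that conversion follows, assuming the usual definitions (PHRED Q = -10 log10(p), Solexa Q = -10 log10(p / (1 - p))). The function names are hypothetical, rejecting PHRED scores of zero or below is an assumption made here, and none of this is taken from the attached patch.

```python
from math import log10


def solexa_to_phred(solexa):
    """Convert a Solexa quality score to the nearest integer PHRED score."""
    return int(round(10 * log10(10 ** (solexa / 10.0) + 1)))


def phred_to_solexa(phred):
    """Convert a positive PHRED quality score to the nearest integer Solexa score."""
    if phred <= 0:
        # PHRED 0 means an error probability of 1, which has no finite
        # Solexa equivalent; how to handle it is left open in this sketch.
        raise ValueError("PHRED score must be positive")
    return int(round(10 * log10(10 ** (phred / 10.0) - 1)))
```

At high quality the two scales are nearly identical, so the rounding only changes values at the low end, which is where checks against real-world files would matter most.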
From bugzilla-daemon at portal.open-bio.org Fri Mar 6 16:08:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 11:08:50 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note unexpected behaviour
In-Reply-To:
Message-ID: <200903061608.n26G8oL9003353@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 11:08 EST -------
I have updated the docstrings in CVS to stress that, like the Python string
method, a non-overlapping count is used. Marking this bug as fixed.

From the mailing list discussion, having an overlapping count available would
be a welcome enhancement, perhaps as a separate method, e.g. overlapping_count.
Leighton's patch or Sebastian's code in Bio/SeqUtils/MeltingTemp.py could be
used for the implementation. We can do this on a new enhancement bug, once a
consensus is reached on the mailing list.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
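To illustrate the difference between the non-overlapping count and the overlapping count suggested in the comment above, here is a minimal sketch on plain Python strings. The function name overlapping_count follows the suggestion in the comment, but the implementation is only an assumption. It is not Leighton's patch or the MeltingTemp code.

```python
def overlapping_count(seq, sub):
    """Count occurrences of sub in seq, allowing matches to overlap."""
    if not sub:
        raise ValueError("empty search string")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Advance by one position only, so overlapping matches are found.
        start = seq.find(sub, start + 1)
    return count


# str.count() skips past each match, so it reports 2 here, not 3.
assert "GGGG".count("GG") == 2
assert overlapping_count("GGGG", "GG") == 3
```

The same loop should work on any object offering a string-like find(sub, start) method.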
From bugzilla-daemon at portal.open-bio.org Fri Mar 6 17:34:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:34:58 -0500
Subject: [Biopython-dev] [Bug 2783] New: Using alternative start codons in Bio.Seq translate method/function
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2783

           Summary: Using alternative start codons in Bio.Seq translate
                    method/function
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

This bug covers an issue originally raised on Bug 2381. This bug is
specifically about how to translate a CDS using a non-standard start codon
(a codon which doesn't normally encode methionine).

In computing, we often blindly translate without worrying about start codons.
For example, you might translate a whole genome (in all six frames) as part of
looking for open reading frames. Translating a partial CDS where the start is
missing is another example. The current Bio.Seq translation functionality
supports these usages.

In real biology however, translation from RNA to amino acids always starts at
an initiation/start codon (typically AUG) which becomes the methionine at the
start of the protein. In eukaryotes, usually the only start codon is AUG, and
it normally encodes methionine, so this doesn't seem special. However, in many
organisms there are lots of genes with alternative start/initiation codons
which do NOT normally encode methionine. When they are used as a
start/initiation codon, though, they DO get translated as methionine!

For example, there are 418 annotated genes in E. coli K12 with non-standard
start codons - which you might want to translate into proteins (which *should*
start with a methionine).
For example, using the following NCBI FASTA file of CDS sequences,
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655

Here is the CDS for gene yaaX:

>ref|NC_000913.2|:5234-5530
GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA
GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT
AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT
TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT
AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA

This starts GTG which is a valid bacterial start codon. I'd like to be able to
translate this and get the actual biologically relevant protein as given in
the GenBank file NC_000913.gbk (with or without the stop symbol at the end),
which starts with "M" not "V":

     CDS             5234..5530
                     /gene="yaaX"
                     /locus_tag="b0005"
                     /codon_start=1
                     /transl_table=11
                     /product="predicted protein"
                     /protein_id="NP_414546.1"
                     /db_xref="ASAP:ABE-0000015"
                     /db_xref="UniProtKB/Swiss-Prot:P75616"
                     /db_xref="GI:16127999"
                     /db_xref="ECOCYC:G6081"
                     /db_xref="EcoGene:EG14384"
                     /db_xref="GeneID:944747"
                     /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY
                     YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR"

Without any non-standard start codon support, my translations start with a V
(rather than the desired M):

>>> from Bio.Seq import Seq
>>> yaaX = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA"
...            "GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT"
...            "AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT"
...            "TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT"
...            "AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
>>> print yaaX.translate(table=11)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print yaaX.translate(table=11, to_stop=True)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

These start with "V", while in this situation I want an "M" because I know
this is a full CDS and the first codon is a start codon.

I therefore want to add an optional argument to the Seq object's translate
method (and the Bio.Seq.translate function) so that I can obtain the desired
results (both with and without the terminator stop symbol). I want an option
to tell Biopython that this sequence commences with a start/initiation codon:

>>> print yaaX.translate(table=11, with_start_codon=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print yaaX.translate(table=11, to_stop=True, with_start_codon=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

I have in the above example called this new argument "with_start_codon", but I
am open to naming suggestions. If False (default), then nothing changes. If
the new argument is True, this indicates that the first codon should be a
valid start/initiation codon (in the declared translation table), and that it
should be translated as a methionine.

I will upload a patch implementing this in a moment...

This proposal is NOT about an option to have the translate function/method
search the sequence for the first valid start codon (either in frame or not).

This proposal is NOT about an option to check the sequence is a valid CDS
(i.e. starts with a start codon, ends with an in frame stop codon, and has no
internal premature stop codons), and then translate it. While this makes sense
(and BioPerl does this), this would prevent certain uses, e.g.
a partial CDS sequence where the 3' end is missing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 6 17:36:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Mar 2009 12:36:24 -0500 Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function In-Reply-To: Message-ID: <200903061736.n26HaOWH012440@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2783 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 12:36 EST ------- Created an attachment (id=1259) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view) Patch for Bio/Seq.py to support non-standard start codons in translation Patch implementing my proposed change, based my earlier patch attachment 1040 on Bug 2381. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
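The behaviour proposed in Bug 2783 above can be sketched without touching Bio.Seq, as a post-processing step on an ordinary translation. This is only an illustration under stated assumptions, not the attached patch: the helper name apply_start_codon is hypothetical, and the start codon set is what I understand NCBI translation table 11 to list, which should be checked against the table itself.

```python
# Start codons assumed for NCBI translation table 11 (bacterial); verify
# against the NCBI genetic code tables before relying on this set.
TABLE_11_START_CODONS = frozenset(
    ["TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"]
)


def apply_start_codon(nucleotides, protein, start_codons=TABLE_11_START_CODONS):
    """Return protein with its first residue replaced by methionine.

    Raises ValueError if the first codon of nucleotides is not in
    start_codons, mirroring the check described in the proposal.
    """
    first_codon = nucleotides[:3].upper().replace("U", "T")
    if first_codon not in start_codons:
        raise ValueError("First codon %r is not a valid start codon" % first_codon)
    return "M" + protein[1:]


# First four codons of yaaX: GTG AAA AAG ATG, blindly translated as VKKM.
assert apply_start_codon("GTGAAAAAGATG", "VKKM") == "MKKM"
```

Like the proposal, the check only looks at the first codon and does not validate the rest of the CDS, so a partial sequence with a missing 3' end is still accepted.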
From bugzilla-daemon at portal.open-bio.org Fri Mar 6 17:38:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:38:39 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200903061738.n26Hcd04012626@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #55 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 12:38 EST -------
I'm closing this bug as basic translate and transcribe methods were included
with Biopython 1.49.

I have filed Bug 2783 for "Using alternative start codons in Bio.Seq translate
method/function".

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Mar 6 17:43:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:43:25 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function
In-Reply-To:
Message-ID: <200903061743.n26HhPRX013186@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-06 12:43 EST -------
(In reply to comment #1)
> Created an attachment (id=1259) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view) [details]
> Patch for Bio/Seq.py to support non-standard start codons in translation
>
> Patch implementing my proposed change, based my earlier patch
> attachment 1040 [details] on Bug 2381.

Actually, it was based on the patch in attachment 1032 (not 1040) on Bug 2381.

Other names proposed for this new argument included:

init - rejected as potentially confusing
force_methionine - possible, but implies any codon would be allowed
                   even something that isn't a valid start codon
alt_start - perhaps confusing?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Mar 6 19:54:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 14:54:17 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function
In-Reply-To:
Message-ID: <200903061954.n26JsHK4026141@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783

------- Comment #3 from eric.talevich at gmail.com 2009-03-06 14:54 EST -------
(In reply to comment #2)

How about require_start? Or require_met, if you don't mind how strange it
looks as English. The name with_start_codon seems like it would take a codon
or alternate table as the argument.

I also see two choices being made by using this parameter:
(1) Check that the sequence starts with a valid start codon, and if not,
    raise an exception;
(2) Use a set of alternate genetic codes for looking up the initial
    methionine.

From the other bug's discussion it seems like there are a number of boolean
options that could reasonably be used with the translate() method, but adding
them all as keyword args would clutter up the API. What about using a bitmask
in Bio.Seq that can be used with translate()? The re module takes a bitmask as
the last parameter for most functions, for example, and it looks pretty clean
compared to a series of boolean keyword args.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sun Mar 8 12:03:31 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 8 Mar 2009 05:03:31 -0700 (PDT) Subject: [Biopython-dev] ScanProsite In-Reply-To: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com> Message-ID: <956971.84123.qm@web62404.mail.re1.yahoo.com> --- On Mon, 3/2/09, Peter wrote: > I like your suggestion to have a REST XML based module > under Bio.ExPASy, which means we can deprecate the HTML based > Bio.Prosite module and in the process make the top level list of > modules in Biopython a bit shorter. In the long term I think that > will help people find functionality. > Then, how about the following code organization: Bio/ExPASy/__init__.py contains get_prodoc_entry Interface to the get-prodoc-entry CGI script. get_prosite_entry Interface to the get-prosite-entry CGI script. get_prosite_raw Interface to the get-prosite-raw CGI script. get_sprot_raw Interface to the get-sprot-raw CGI script. sprot_search_ful Interface to the sprot-search-ful CGI script. sprot_search_de Interface to the sprot-search-de CGI script. (currently in Bio/ExPASy.py) Bio/ExPASy/Prosite.py contains read(), parse(), Record for Prosite files (currently in Bio/Prosite/__init__.py), as well as a Pattern class to handle Prosite patterns (currently in Bio/Prosite/Pattern.py, but this seems to be unused). Bio/ExPASy/Prodoc.py contains read(), parse(), Record for Prosite documentation files (currently in Bio/Prosite/Prodoc.py) Bio/ExPASy/ScanProsite contains scan(), read(), Record to interact with ScanProsite (currently a broken version to access ScanProsite and parse its results exists in Bio/ExPASy.py and Bio/Prosite/__init__.py). I have a simplified version of the Prosite and Prodoc parsers. 
If we use the scheme above, I'll put the new version in Bio/ExPASy/Prosite.py and Bio/ExPASy/Prodoc.py, and deprecate Bio.Prosite.

--Michiel.

From biopython at maubp.freeserve.co.uk Tue Mar 10 20:29:54 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Mar 2009 20:29:54 +0000
Subject: [Biopython-dev] [Utilities-announce] NCBI E-Utilities requirements updated
In-Reply-To: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov> <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>
Message-ID: <320fb6e00903101329i69e40fc0i6a2b13332df55e7a@mail.gmail.com>

On Thu, Mar 5, 2009 at 12:40 PM, Peter wrote:
> This email was sent out a few weeks ago, but it took a while before
> the NCBI webpage was actually updated (maybe a caching issue) so I
> didn't rush to relax our rules immediately.
>
> Under the new rules we must make no more than three requests every
> second. We could track the times of the last two requests in order to
> enforce this as worded, but I think it would be simpler just to switch
> from using a minimum 3 second pause between Bio.Entrez requests to
> just a minimum 0.33334 second pause. This is a much simpler code
> change and will comply with the new relaxed rules.
>
> Unless anyone has a counter suggestion, I will update Bio.Entrez and
> the tutorial shortly.

Change made in CVS, including the tutorial.
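
The simpler approach described above (a fixed minimum pause between requests) can be sketched as follows. This is not the code committed to Bio.Entrez; the class name MinimumPause and its injectable clock and sleep parameters are assumptions made so that the example is self-contained and testable.

```python
import time


class MinimumPause:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper for illustration; not the Bio.Entrez implementation.
    """

    def __init__(self, delay=0.33334, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay      # minimum seconds between two requests
        self._clock = clock     # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None       # time of the previous request, if any

    def wait(self):
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

Calling wait() immediately before each request keeps the rate at or below three per second. Tracking the last two request times, as worded in the NCBI rule, would permit short bursts, at the cost of a little more bookkeeping.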
Peter

From bugzilla-daemon at portal.open-bio.org Tue Mar 10 20:36:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Mar 2009 16:36:28 -0400
Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO
In-Reply-To:
Message-ID: <200903102036.n2AKaSje008217@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2762

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-10 16:36 EST -------
For anyone following this bug, Brad has some related code posted on his blog - see this mailing list discussion:
http://lists.open-bio.org/pipermail/biopython/2009-March/004983.html

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Mar 10 20:49:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Mar 2009 16:49:30 -0400
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function
In-Reply-To:
Message-ID: <200903102049.n2AKnUoD009300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-10 16:49 EST -------
On comment #3, Eric wrote:
>
> How about require_start? Or require_met, if you don't mind how strange
> it looks as English. The name with_start_codon seems like it would take
> a codon or alternate table as the argument.

I think "require_start" is OK. Or "require_start_codon".

> I also see two choices being made by using this parameter:
> (1) Check that the sequence starts with a valid start codon, and
> if not, raise an exception;

That is what my patch does. Plus of course translating the valid start codon as a methionine.

> (2) Use a set of alternate genetic codes for looking up the initial
> methionine.

I'm unsure what you mean here. If you mean actually having the translate method search for the first valid start codon, I am really not keen on this at all. This is complicated, and verges on gene/ORF finding, which I specifically wanted to avoid:

Peter wrote in comment #0:
>> This proposal is NOT about an option to have the translate
>> function/method search the sequence for the first valid
>> start codon (either in frame or not).

On comment #3, Eric wrote:
> From the other bug's discussion it seems like there are a number of boolean
> options that could reasonably be used with the translate() method, but adding
> them all as keyword args would clutter up the API. What about using a bitmask
> in Bio.Seq that can be used with translate()? The re module takes a bitmask as
> the last parameter for most functions, for example, and it looks pretty clean
> compared to a series of boolean keyword args.

I agree that there is a risk of confusion with too many arguments. But I don't think a bitmask would help - I think it makes it worse! I'm not saying it's a good thing, but we have lots of functions/methods in Biopython already with lots of arguments (e.g. the standalone BLAST wrappers, or the Bio.Entrez functions). On the other hand, I can't immediately think of a single python function which uses a bitmask.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Tue Mar 10 23:40:29 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Mar 2009 23:40:29 +0000
Subject: [Biopython-dev] Bio.Entrez catching more errors
Message-ID: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com>

Hi All,

It occurred to me that the Bio.Entrez._open function can look at the retmode argument (if present) and spot if there is a mismatch between the requested format (e.g.
XML, HTML, text or asn.1) and the actual data the NCBI returned.

Something along the following lines could be added to the end of the _open function in Bio/Entrez/__init__.py to achieve this:

    elif "retmode" in params and params["retmode"].lower()=="html" \
    and not data.lower().startswith("
    [...]

>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()

Similarly, these give an XML like fragment (which is not a valid XML file in itself - arguably an NCBI bug; some databases like "protein" are better behaved in this respect):

>>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()

My suggested change to Bio.Entrez would also catch the following examples (using an invalid database) where the NCBI ignore the retmode and return an HTML help page:

>>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
>>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()

In a less clear cut example, this would flag the following as an error as the NCBI seem to return ASN.1 text instead of HTML here:

>>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()

Overall, I think this change should catch lots of errors which otherwise may not be detected until later (e.g. while trying to parse the file).

--------------------------------------------------------------------------------------------------

On another point, should we catch these responses as errors?

>>> efetch(db="snp", id="123456").read()
'PmFetch response\n
\n1:
id: 123456 Error occurred: cannot get document
summary\n
'
>>> efetch(db="snp", id="123456", retmode="html").read()
'PmFetch response\n
\n1:
id: 123456 Error occurred: cannot get document
summary\n
'
>>> efetch(db="snp", id="123456", retmode="xml").read()
'\n1: id: 123456 Error occurred: cannot get document summary\n\n'
>>> efetch(db="snp", id="123456", retmode="text").read()
'1: id: 123456 Error occurred: cannot get document summary\n'

and,

>>> print efetch(db="homologene", retmode="html", id="fake").read()

Error occurred: Empty id list - nothing todo

...

Looking for the string "Error occurred: " looks fairly safe here, and should cover a range of entries. Of course, you can imagine false positives too, e.g. a valid PUBMED plain text record for a tutorial article with a title like "Yikes! An Error Occurred: A beginner's Guide To Defensive Programming." could match.

Peter

From lorena.carlo at gmail.com Wed Mar 11 15:58:24 2009
From: lorena.carlo at gmail.com (Lorena Carló)
Date: Wed, 11 Mar 2009 09:58:24 -0600
Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs
Message-ID: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>

Hi all,

I would like to know if there is an implemented function in Biopython that allows getting the PDB id from a Uniprotkb ID?

Thanks,
Lorena

From biopython at maubp.freeserve.co.uk Wed Mar 11 16:12:36 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 11 Mar 2009 16:12:36 +0000
Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs
In-Reply-To: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>
References: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>
Message-ID: <320fb6e00903110912g717ccb52q4242a6ff169b5d1f@mail.gmail.com>

On Wed, Mar 11, 2009 at 3:58 PM, Lorena Carló wrote:
> Hi all,
>
> I would like to know if there is an implemented function in Biopython that
> allows getting the PDB id from a Uniprotkb ID?
>
> Thanks,
> Lorena

There isn't a simple one-to-one mapping from a UniProtKB/Swiss-Prot ID to a PDB ID, see http://www.uniprot.org/faq/2

Are you working from UniProtKB/Swiss-Prot files? How about something like this:

# This assumes you have downloaded the following file
# to your working directory:
# http://www.uniprot.org/uniprot/P00734.txt
from Bio import SeqIO
record = SeqIO.read(open("P00734.txt"),"swiss")
for xref in record.dbxrefs :
    if xref.startswith("PDB:") :
        print xref.split(":",1)[1]

Peter

P.S.
This is more a question for the main discussion list, rather than Biopython development

From bugzilla-daemon at portal.open-bio.org Wed Mar 11 23:39:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Mar 2009 19:39:02 -0400
Subject: [Biopython-dev] [Bug 2788] New: Bio.Nexus.Trees newick parser crash
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2788

Summary: Bio.Nexus.Trees newick parser crash
Product: Biopython
Version: 1.49
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: matzke at berkeley.edu

The newick files I have been working with seem to open fine in several different programs/packages (Dendroscope, R's APE package, phylocom, python alfacinha module), but not the newick parser in Bio.Nexus.Trees.

Rather than upload a file I've got the full newick string hard-coded below:

============
from Bio.Nexus.Trees import Tree

tree_str = 
'(((((((((((((((((Sambucus:43.136024,Viburnum:43.136040)Adoxaceae:53.892513,(Acanthopanax:34.719704,Aralia:34.719727,Dendropanax:34.719727,Evodiopanax:34.719727,Kalopanax:34.719727,Schefflera:34.719727)Araliaceae:62.308830):7.045975,Ilex:104.074516):3.056864,((((((Catalpa:22.623766,Paulownia:22.623785)Bignoniaceae:22.623766,(Clerodendrum:19.864199,Premna:19.864218)Verbenaceae:25.383331):22.378326,(Chionanthus:29.443968,Forestiera:29.443979,Fraxinus:29.443979,Ligustrum:29.443979,Osmanthus:29.443979,Syringa:29.443979)Oleaceae:38.181892):19.113832,(Adina:38.252457,Cephalanthus:38.252472,Emmenopterys:38.252472,Pinckneya:38.252472,Randia:38.252472)Rubiaceae:48.487236):2.360018,Ehretia:89.099709):13.495450,Eucommia:102.595161):4.536214):0.905059,((((Clethra:78.134140,((Cliftonia:38.402752,Cyrilla:38.402775)Cyrillaceae:38.402752,(Arbutus:38.402752,Elliottia:38.402775,Enkianthus:38.402775,Kalmia:38.402775,Lyonia:38.402775,Oxydendrum:38.402775,Rhododendron:38.402775,Vaccinium:38.402775)Ericaceae:38.402752):1.328631):12.980787,(((Halesia:30.391993,Pterostyrax:30.392012,Styrax:30.392012)Styracaceae:51.775261,Symplocos:82.167252):0.000000,(Camellia:41.083626,Franklinia:41.083649,Gordonia:41.083649,Stewartia:41.083649,Ternstroemia:41.083649)Theaceae:41.083626):8.947675):0.000149,Diospyros:91.115099):2.023849,((Ardisia:18.344650,Myrsine:18.344666)Myrsinaceae:74.794174,Bumelia:93.138824):0.000101):14.897509):1.462594,((Alangium:48.167362,Aucuba:48.167370,Cornus:48.167370,Macrocarpium:48.167370,Torricellia:48.167370)Cornaceae:53.025345,(Hydrangea:97.032310,(Davidia:48.516151,Nyssa:48.516167)Nyssaceae:48.516151):4.160399):8.306321):7.064716,Schoepfia:116.563736):0.000000,((((Altingia:50.813206,Liquidambar:50.813213)Altingiaceae:50.813206,(Disanthus:50.813206,Distylium:50.813213,Fortuneria:50.813213,Hamamelis:50.813213,Loropetalum:50.813213,Sinowilsonia:50.813213)Hamamelidaceae:50.813206):0.000131,(Cercidiphyllum:87.828712,Daphniphyllum:87.828712):13.797829):13.247040,(((((((Choerosp
ondias:21.440735,Cotinus:21.440742,Pist! acia:21. 440742,Rhus:21.440742,Toxicodendron:21.440742)Anacardiaceae:37.304596,(Acer:29.372665,Aesculus:29.372681,Dipteronia:29.372681,Koelreuteria:29.372681,Sapindus:29.372681)Sapindaceae:29.372665):0.000114,((Cedrela:49.350353,(Ailanthus:24.675177,Leitneria:24.675188,Picrasma:24.675188)Simaroubaceae:24.675177):4.016092,(Evodia:26.683222,Phellodendron:26.683233,Ptelea:26.683233,Zanthoxylum:26.683233)Rutaceae:26.683222):5.379002):29.842871,(Firmiana:32.917126,Tilia:32.917149)Malvaceae:55.671188):12.661992,(Lagerstroemia:84.110847,Szyzygium:84.110847):17.139463):2.612011,((((((Alnus:16.609535,Betula:16.609543,Carpinus:16.609543,Corylus:16.609543,Ostrya:16.609543)Betulaceae:37.306709,((Carya:25.504854,Cyclocarya:25.504866,Juglans:25.504866,Engelhardtia:25.504866,Platycarya:25.504866,Pterocarya:25.504866)Juglandaceae:25.504854,Myrica:51.009708):2.906531):9.893459,(Castanea:31.904850,Castanopsis:31.904873,Cyclobalanopsis:31.904873,Fagus:31.904873,Lithocarpus:31.904873,Quercus:31.904873)Fagaceae:31.904850):21.681023,(((((Celtis:20.739927,Pteroceltis:20.739939)Cannabaceae:20.739927,((Broussounetia:12.614990,Cudrania:12.615005,Maclura:12.615005,Morus:12.615005)Moraceae:12.614990,Oreocnide:25.229980):16.249876):10.909924,(Aphananthe:26.194889,Hemiptelea:26.194897,Planera:26.194897,Ulmus:26.194897,Zelkova:26.194897)Ulmaceae:26.194889):11.649286,(Hovenia:32.019470,Rhamnus:32.019493,Ziziphus:32.019493)Rhamnaceae:32.019596):8.938065,(((Amelanchier:36.488564,(Crataegus:36.488586,Mespilus:36.488586):0.000000):0.000000,Chaenomeles:36.488586,Eriobotrya:36.488586,Malus:36.488586,Photinia:36.488586,Pyrus:36.488586,Sorbus:36.488586):0.000000,Prunus:36.488586)Rosaceae:36.488564):12.513593):4.616908,(Albizia:31.901920,Cercis:31.901943,Cladrastis:31.901943,Dalbergia:31.901943,Erythrina:31.901943,Gleditsia:31.901943,Gymnocladus:31.901943,Laburnum:31.901943,Maackia:31.901943,Ormosia:31.901943,Robinia:31.901943,Sophora:31.901943)Fabaceae:58.205711):4
.139401,((Euonymus:90.433327,Sloanea:90.433327):0.000101,((Mallotus:28.689901,Sapium:28.6! 89920)Eu phorbiaceae:50.330055,(Idesia:29.019764,Poliothyrsis:29.019779,Populus:29.019779,Salix:29.019779,Xylosma:29.019779)Salicaceae:50.000195):11.413469):3.813607):9.615288):0.000000,(Staphylea:21.372393,Tapiscia:21.372404,Turpinia:21.372404)Staphyleaceae:82.489929):11.011259):1.690163):7.829397,Buxus:124.393143):0.000000,Tetracentron:124.393143):2.763555,Meliosma:127.156693):1.664427,Platanus:128.821121):2.029122,Euptelea:130.850250):11.447736,((Asimina:95.972672,(Liriodendron:47.125092,Magnolia:47.125114,Manglieita:47.125114,Michelia:47.125114)Magnoliaceae:48.847580):46.325292,(Actinodaphne:49.903526,Cinnamomum:49.903542,Lindera:49.903542,Litsea:49.903542,Machilus:49.903542,Neolitsea:49.903542,Nothaphoebe:49.903542,Persea:49.903542,Phoebe:49.903542,Sassafras:49.903542,Umbellularia:49.903542)Lauraceae:92.394188):0.000257):1.840266,(Yucca:110.138222,((Sabal:100.000000,(Serenoa:95.000000,Trachycarpus:95.000000)ST:5.000000)Arecaceae:10.000000,(Arundinaria:20.476601,Phyllostachys:20.476624,Semiarundinaria:20.476624)Poaceae:89.661629):0.000000):34):30.861772,Illicium:175.000000)aus2ast:175.000000,(((((Cephalotaxus:125.000000,(Taxus:100.000000,Torreya:100.000000)TT1:25.000000)Taxaceae:90.000000,((((((((Calocedrus:85.000000,Platycladus:85.000000)CP:5.000000,(Cupressus:85.000000,Juniperus:85.000000)CJ:5.000000)CJCP:5.000000,Chamaecyparis:95.000000)CCJCP:5.000000,(Thuja:7.870000,Thujopsis:7.870000)TT2:92.13)CJCPTT:30.000000,((Cryptomeria:120.000000,Taxodium:120.000000)CT:5.000000,Glyptostrobus:125.000000)CTG:5.000000)CupCallTax:5.830000,((Metasequoia:125.000000,Sequoia:125.000000)MS:5.000000,Sequoiadendron:130.000000)Sequoioid:5.830000)STCC:49.060001,Taiwania:184.889999)Taw+others:15.110000,Cunninghamia:200.000000)nonSci:15.000000)Tax+nonSci:10.000000,Sciadopitys:225.000000):25.000000,(((Abies:106.000000,Keteleeria:106.000000)AK:54.000000,(Pseudolarix:156.000000,Tsuga:156.000000)NT
P:4.000000)NTPAK:24.000000,((Larix:87.000000,Pseudotsuga:87.000000)LP:81.000000,(Picea:155.000000,Pinus:155.000000)PPC:13.000000)Pinoideae:! 16.00000 0)Pinaceae:66.000000)Coniferales:25.000000,Ginkgo:275.000000)gymnosperm:75.000000)seedplant:50.000000;' tree_obj = Tree(tree_str) print tree_obj ============ This brings up the follow error for "tree_obj = Tree(tree_str)": ======== ValueError: invalid literal for float(): seedplant ======== It looks like it is looking for a floating point number where "seedplant" is. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 12 10:17:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Mar 2009 06:17:01 -0400 Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser crash In-Reply-To: Message-ID: <200903121017.n2CAH13S012060@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2788 ------- Comment #1 from cymon.cox at gmail.com 2009-03-12 06:17 EST ------- (In reply to comment #0) > The newick files I have been working with seem to open fine in several > different programs/packages (Dendroscope, R's APE package, phylocom, python > alfacinha module), but not the newick parser in Bio.Nexus.Trees. [a big tree] > tree_obj = Tree(tree_str) > > print tree_obj > ============ > > > This brings up the follow error for "tree_obj = Tree(tree_str)": > ======== > ValueError: invalid literal for float(): seedplant > ======== > > It looks like it is looking for a floating point number where "seedplant" is. Your tree is decorated with node labels, which the parser cannot handle. This came up recently (within the last year?) but I can't find the bug/message. Should probably catch this and return an informative error - or implement node labels... C. 
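
Until the parser handles them, one possible workaround is to strip the bare internal node labels before building the tree. The sketch below is an illustration under stated assumptions, not part of Bio.Nexus: strip_internal_labels is a hypothetical helper, it assumes unquoted labels, and it leaves numeric values after a closing parenthesis alone on the guess that they are support values.

```python
import re

# Text that directly follows ")" and runs up to the next Newick delimiter.
_LABEL_AFTER_PAREN = re.compile(r"\)([^():,;\[\]]+)")


def _is_number(text):
    """Return True if text can be parsed as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def strip_internal_labels(newick):
    """Remove non-numeric labels that follow a closing parenthesis.

    Hypothetical helper: quoted labels and bracketed comments are not handled.
    """
    def replace(match):
        label = match.group(1).strip()
        # Keep numbers (probably support values); drop node names.
        return match.group(0) if _is_number(label) else ")"

    return _LABEL_AFTER_PAREN.sub(replace, newick)
```

The cleaned string could then be passed to Tree(), at the cost of losing the node names.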
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Mar 12 10:38:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:38:59 -0400
Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser does not support internal node labels
In-Reply-To:
Message-ID: <200903121038.n2CAcxMR014167@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2788

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
         OS/Version|Mac OS                      |All
           Platform|Macintosh                   |All
            Summary|Bio.Nexus.Trees newick      |Bio.Nexus.Trees newick
                   |parser crash                |parser does not support
                   |                            |internal node labels

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-12 06:38 EST -------
I thought it looked familiar, but I must have only searched the currently open bugs. This looks *very* similar to Bug 2543 which dealt with internal node names, which was fixed for Biopython 1.49 (and 1.49 beta).

Frank wrote:
> Nexus.Trees has been extended to deal with internal node names, or "special
> comments" in the format [& blablalba]. Such comments can appear
> directly after the taxon label, after the closing parentheses, or between
> branchlength / support values attached to a node or a taxon labels, ... i.e.

On Bug 2543, Frank didn't go as far as the enhancement to cope with "naked" node labels, just those in the square brackets.

Consider this smaller example Cymon gave on Bug 2543:

>>> from Bio.Nexus.Trees import Tree
>>> tree_str2 = "(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);"
>>> tree_obj = Tree(tree_str2)
Traceback (most recent call last):
...
ValueError: invalid literal for float(): A

I've retitled this and marked it as an enhancement.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Mar 12 10:41:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:41:30 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors
In-Reply-To:
Message-ID: <200903121041.n2CAfUwH014362@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-12 06:41 EST -------
On comment #5 Frank wrote:
> In my opinion, naming nodes is a feature, and I would not regard the lack of
> this feature as a bug. But I'll have a look at the code and see how easy
> this can be changed. It would actually be nice if P4 and Bio.Nexus, both
> being python programs, could read each other's trees.

This enhancement is now covered by Bug 2788. It appears that now several other programs support this Newick tree variant, making it a bit more important.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From chris.lasher at gmail.com Thu Mar 12 21:07:21 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu, 12 Mar 2009 17:07:21 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
Message-ID: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>

On Thu, Feb 26, 2009 at 10:00 AM, Peter wrote:
> Another option to consider would be to switch to running git on
> biopython.org, but use the git-cvsserver tool to provide an emulated
> CVS server on top of the git repository. This sounds possible in
> theory, and would be nice for any "old fashioned" biopython developers
> because it should be fairly transparent - they can continue to treat
> it as CVS and just work on the main trunk. This would require someone
> competent to do the conversion and alter the server setup - we'd have
> to talk to the OBF team about this. However, if anyone has first hand
> experience on git-cvsserver perhaps they could comment on whether this
> sounds like a good plan or not.

I must be missing something, Peter. Why would BioPython continue to operate with CVS? I suppose I just really hope to see BioPython running with something other than CVS, and I'd really like to see it go either under Bazaar or Git.

Chris

From bartek at rezolwenta.eu.org Thu Mar 12 23:20:23 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 13 Mar 2009 00:20:23 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
Message-ID: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>

On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher wrote:
>
> I must be missing something, Peter. Why would BioPython continue to
> operate with CVS? I suppose I just really hope to see BioPython
> running with something other than CVS, and I'd really like to see it
> go either under Bazaar or Git.
>

Hi Chris,

The idea is to do the switch in two steps:
- first we still have the main branch in CVS while we have git and/or bzr branches synchronized with it for people to branch and contribute
- If this works nicely, we will switch to one of these systems completely (while possibly keeping the other branch in sync, but this is not yet decided)

The first step is to some extent operational (I'm currently busy with other stuff, but I'll get around to it hopefully this weekend), but the second step requires a decision on our side (git or bzr?) and action on the side of OBF (there is no git or bazaar installed on the OBF servers).

cheers
--
Bartek Wilczynski

From biopython at maubp.freeserve.co.uk Fri Mar 13 12:21:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 13 Mar 2009 12:21:14 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
Message-ID: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>

On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski wrote:
> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher wrote:
>> On Thu, Feb 26, 2009 at 10:00 AM, Peter wrote:
>>> Another option to consider would be to switch to running git on
>>> biopython.org, but use the git-cvsserver tool to provide an emulated
>>> CVS server on top of the git repository. This sounds possible in
>>> theory, and would be nice for any "old fashioned" biopython developers
>>> because it should be fairly transparent - they can continue to treat
>>> it as CVS and just work on the main trunk. This would require someone
>>> competent to do the conversion and alter the server setup - we'd have
>>> to talk to the OBF team about this. However, if anyone has first hand
>>> experience on git-cvsserver perhaps they could comment on whether this
>>> sounds like a good plan or not.
>>
>> I must be missing something, Peter. Why would BioPython continue to
>> operate with CVS?
>> I suppose I just really hope to see BioPython
>> running with something other than CVS, and I'd really like to see it
>> go either under Bazaar or Git.

I'm warming to the idea of git, and had noticed git includes the optional git-cvsserver tool which emulates a CVS server while using git underneath. I was wondering if anyone had first hand experience of this. If we did move from CVS to git (still hosted on biopython.org), this would seem to offer a nice migration path for our "old school" CVS developers - they can carry on as usual. Of course, if none of us care about having to learn a new interface, then a simple switch would be less hassle to set up.

For the server side of things, we'll need to talk to the OBF team about any such move - as far as I know they've only managed CVS to SVN migrations in the past.

Peter

> Hi Chris,
>
> The idea is to do the switch in two steps:
> - first we still have the main branch in CVS while we have git and/or
> bzr branches synchronized with it for people to branch and contribute
> - If this works nicely, we will switch to one of these systems
> completely (while possibly keeping the other branch in sync, but this
> is not yet decided)

That does seem like a good plan. Of course, there is the related issue of where we host the official repository: externally (e.g. on github or Launchpad) or in house (on biopython.org). I favour keeping the official repository on biopython.org but this will require OBF technical support (do we have the expertise within Biopython? Bartek? Chris?).

> The first step is to some extent operational (I'm currently busy with
> other stuff, but I'll get around to it hopefully this weekend), but the
> second step requires a decision on our side (git or bzr?) and action on
> the side of OBF (there is no git or bazaar installed on the OBF servers).

There is also the previously semi-agreed solution of switching from CVS to SVN on biopython.org, but this would be only a gradual improvement.
I gather there are mature tools for using git+svn together, so it should be better than using git+cvs together. Other than meaning all the OBF hosted projects are on SVN (I think we are the last still on CVS), this is beginning to seem a bit pointless.

Peter

From bugzilla-daemon at portal.open-bio.org Fri Mar 13 15:48:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Mar 2009 11:48:39 -0400
Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative location of a residue that is an ATOM
In-Reply-To:
Message-ID: <200903131548.n2DFmdZ6015899@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2780

------- Comment #2 from klaus.kopec at tuebingen.mpg.de 2009-03-13 11:48 EST -------
PDB IDs of some more occurrences (simply search the file for "HETATM" and look for a HETATM record that is followed by an ATOM with the same residue number and a different altloc).

1din
1k4q
1k55 - multiple occurrences
1k56
1rqh
1rr2
1xpk
1xpl - multiple occurrences
1xpm - multiple occurrences

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jblanca at btc.upv.es Fri Mar 13 15:59:01 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 13 Mar 2009 16:59:01 +0100
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <200902271157.49948.jblanca@btc.upv.es>
References: <200902261612.54306.jblanca@btc.upv.es> <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> <200902271157.49948.jblanca@btc.upv.es>
Message-ID: <200903131659.01590.jblanca@btc.upv.es>

Hi:
I've finished a first version of a program that reads a list of Applied Biosystems fsa files and draws a virtual gel. It does not read the sequence because my users are interested in fragment analysis, but the basic infrastructure is in place to do it.
It does what my users need.
It's quite slow, though, and I'm not investing time in optimizing it.
If anybody wants to take a look, the code is at:
http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/
I distribute it under the GPL licence.
If you think that any part of the code could be of use to the Biopython
project, I would be very pleased to give it to the community.

Best regards,

Jose Blanca

On Friday 27 February 2009 11:57:49 Jose Blanca wrote:
> On Friday 27 February 2009 11:45:59 Peter wrote:
> > On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca wrote:
> > That's much clearer - is the Genographer software showing the actual
> > image (zoomed as required, with the colours adjusted as required), or
> > an artificial recreation?
>
> It is an artificial recreation, the same as I'm trying to accomplish. I just
> want more resolution and an automated process (Genographer is a GUI
> application).
>
> > Are you trying to create this figure for illustrative purposes only?
> > I mean would a slightly cartoon like recreation be fine, or are you
> > trying to make it as realistic as possible?
>
> I want to analyze it.
>
> > I see you are having to reverse engineer their file format. I guess
> > other people have tried this in the past so there may be more clues
> > out on the internet. Have you tried emailing the company to see if
> > they would publish the file format specifications (unlikely I fear,
> > but worth asking).
>
> Fortunately the ABIF format was reverse engineered by people more clever
> than me. And a couple of years ago Applied published a specification:
> http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/ABIF_File_Format.pdf
> You can't believe everything in that specification, but it is a good
> start. Reading an ABIF file is not a problem; drawing the gel with as
> little coding as possible is another thing.
> Regards, > > Jose Blanca > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Jose M. Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Fri Mar 13 16:12:12 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Mar 2009 16:12:12 +0000 Subject: [Biopython-dev] library to create gel image In-Reply-To: <200903131659.01590.jblanca@btc.upv.es> References: <200902261612.54306.jblanca@btc.upv.es> <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> <200902271157.49948.jblanca@btc.upv.es> <200903131659.01590.jblanca@btc.upv.es> Message-ID: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com> On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca wrote: > Hi: > I've fishished a first version of a program that reads a list of Applied > Biosystems fsa files and draws a virtual gel. It does not reads the sequence > because my users are interested in fragment analysis, but the basic > infraestructure is in place to do it. > It does what my users need. It's quite slow though, but I'm not investing time > in optimizing it. Do you have any example images online for people to look at? 
Peter From jblanca at btc.upv.es Fri Mar 13 16:16:46 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 13 Mar 2009 17:16:46 +0100 Subject: [Biopython-dev] library to create gel image In-Reply-To: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com> References: <200902261612.54306.jblanca@btc.upv.es> <200903131659.01590.jblanca@btc.upv.es> <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com> Message-ID: <200903131716.46413.jblanca@btc.upv.es> Here you have one: http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/out.png Jose Blanca On Friday 13 March 2009 17:12:12 Peter wrote: > On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca wrote: > > Hi: > > I've fishished a first version of a program that reads a list of Applied > > Biosystems fsa files and draws a virtual gel. It does not reads the > > sequence because my users are interested in fragment analysis, but the > > basic infraestructure is in place to do it. > > It does what my users need. It's quite slow though, but I'm not investing > > time in optimizing it. > > Do you have any example images online for people to look at? > > Peter -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From chris.lasher at gmail.com Sun Mar 15 05:43:34 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 15 Mar 2009 01:43:34 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> Message-ID: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> On Fri, Mar 13, 2009 at 8:21 AM, Peter wrote: > On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski > wrote: >> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher wrote: >>> On Thu, Feb 26, 2009 at 10:00 AM, Peter wrote: >>>> Another option to consider would be to switch to running git on >>>> biopython.org, but use the git-cvsserver tool to provide an emulated >>>> CVS server on top of the git repository. ?This sounds possible in >>>> theory, and would be nice for any "old fashioned" biopython developers >>>> because is should be fairly transparent - they can continue to treat >>>> it as CVS and just work on the main trunk. ?This would require someone >>>> competent to do the conversion and alter the server setup - we'd have >>>> to talk to the OBF team about this. 
?However, if anyone has first hand >>>> experience on git-cvsserver perhaps they could comment on weather this >>>> sounds like a good plan or not. >>> >>> I must be missing something, Peter. Why would BioPython continue to >>> operate with CVS? I suppose I just really hope to see BioPython >>> running with something other than CVS, and I'd really like to see it >>> go either under Bazaar or Git. > > I'm warming to the idea of git, and had noticed git includes the > optional git-cvsserver tool which emulates a CVS server while using > git underneath. ?I was wondering if anyone had first hand experience > of this. ?If we did move from CVS to git (still hosted on > biopython.org), this would seem to offer a nice migration path for of > our "old school" CVS developers - they can carry on as usual. ?Of > course, if none of us care about having to learn a new interface, then > a simple switch would be less hassle to setup. ?For the server side of > things, we'll need to talk to the OBF team about any such move - as > far as I know they've only managed CVS to SVN migrations in the past. > > Peter > >> Hi Chris, >> >> The idea is to do the switch in two steps: >> - first we still have the main branch in CVS while we have git and/or >> bzr branches synchronized with it for people to branch and contribute >> - If this works nicely, we will switch to one of these systems >> completely (while possibly keeping the other branch in sync, but this >> is not yet decided) > > That does seem like a good plan. ?Of course, there is the related > issue of where we host the official repository (externally, e.g. on > github or lauchpad) or in house (on biopython.org). ?I favour keeping > the official repository on biopython.org but this will require OBF > technical support (do we have the expertise within Biopython? Bartek? > Chris?). 
> >> The first step is to some extent operational (I'm currently busy with >> other stuff, but I'll get arround it hopefully this weekend), but the >> second step requires decision on our side (git or bzr?) and action on >> the side of OBF (there is no git or bazar installed on obf servers). > > There is also the previously semi-agreed solution of switching from > CVS to SVN on biopython.org, but this would be only a gradual > improvement. ?I gather there are mature tools for using git+svn > together, so it should be better than using git+cvs together. ?Other > than meaning all the OBF hosted projects are on SVN (I think we are > the last still on CVS), this is beginning to seem a bit pointless. > > Peter > Peter et al., I started off writing an email about why I think hosting at GitHub or Launchpad is a better idea, but it got a bit verbose, so I just wrote up a blog post instead. (Besides, links and images are more fun, and make the intarwebs go 'round.) Please see http://igotgenes.blogspot.com/2009/03/why-biopython-needs-to-move-to-github.html or http://tinyurl.com/a9o7ae Chris From mjldehoon at yahoo.com Sun Mar 15 10:24:11 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 15 Mar 2009 03:24:11 -0700 (PDT) Subject: [Biopython-dev] Bio.ExPASy Message-ID: <76595.11423.qm@web62404.mail.re1.yahoo.com> Hi everybody, As discussed previously, I have moved the Bio.Prosite code to Bio.ExPASy, and I've added a ScanProsite module to Bio.ExPASy. I guess Bio.Enzyme should also move to Bio.ExPASy. See http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html for the documentation of Biopython as currently in CVS. --Michiel. 
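[Editorial sketch, not part of the original post.] The ScanProsite module
announced above reads XML search results. The snippet below is only a rough
illustration of the "parse the XML, and fail loudly if it is not what was
expected" idea, using the Python standard library. The sample document, the
element names (matchset, match, start, stop, ...) and the helper name
read_hits are assumptions made for this example; they are not taken from the
Biopython source or from verified ScanProsite output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element names are assumptions for illustration,
# not a verified copy of real ScanProsite output.
SAMPLE = """<?xml version="1.0"?>
<matchset n_match="1">
  <match>
    <sequence_ac>USERSEQ1</sequence_ac>
    <start>16</start>
    <stop>98</stop>
    <signature_ac>PS50948</signature_ac>
  </match>
</matchset>
"""


def read_hits(text):
    """Parse ScanProsite-style XML text into a list of dictionaries.

    Raises ValueError if the text is not well-formed XML or if the root
    element is not the one we expect (e.g. an HTML error page).
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        raise ValueError("Expected XML, got something else: %s" % err)
    if root.tag != "matchset":
        raise ValueError("Unexpected root element: %r" % root.tag)
    hits = []
    for match in root.findall("match"):
        # One dictionary per hit, keyed by child element name.
        hit = {child.tag: (child.text or "").strip() for child in match}
        # Positions are more useful as integers than as strings.
        for key in ("start", "stop"):
            if key in hit:
                hit[key] = int(hit[key])
        hits.append(hit)
    return hits


if __name__ == "__main__":
    for hit in read_hits(SAMPLE):
        print(hit)
```

Keeping the check inside the parser means callers who saved the XML to a
file earlier can still get the same error handling when they read it back.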
From mjldehoon at yahoo.com Sun Mar 15 12:53:28 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 15 Mar 2009 05:53:28 -0700 (PDT) Subject: [Biopython-dev] Fw: Re: Bio.Entrez catching more errors Message-ID: <722257.11611.qm@web62401.mail.re1.yahoo.com> --- On Sun, 3/15/09, Michiel de Hoon wrote: > Whereas I think it's a good idea if Bio.Entrez catches > more errors, I think the parser is a more suitable place to > check for errors. See Bio.ExPASy.ScanProsite for an example > of catching errors with an XML parser; this avoids using a > File.UndoHandle. > > --Michiel > > --- On Tue, 3/10/09, Peter > wrote: > > > From: Peter > > Subject: [Biopython-dev] Bio.Entrez catching more > errors > > To: "BioPython-Dev Mailing List" > > > Date: Tuesday, March 10, 2009, 7:40 PM > > Hi All, > > > > It occured to me that the Bio.Entrez._open function > can > > look at the > > retmode argument (if present) and spot if there is a > > mismatch between > > the requested format (e.g. XML, HTML, text or asn.1) > and > > the actual > > data the NCBI returned. Something along the following > > lines could be > > added to the end of the _open function in > > Bio/Entrez/__init__.py to > > acheive this: > > > > elif "retmode" in params and > > params["retmode"].lower()=="html" > \ > > and not > data.lower().startswith(" > \ > > and not data.lower().startswith(" > html") : > > raise TypeError("Requested HTML, but > > didn't get it: %s..." % data) > > elif "retmode" in params and > > params["retmode"].lower()=="xml" > \ > > and not > data.lower().startswith(" > raise TypeError("Requested XML, but > didn't > > get it: %s..." % data) > > elif "retmode" in params and > > params["retmode"] \ > > and > > params["retmode"].lower()!="xml" > \ > > and data.lower().startswith(" : > > raise TypeError("Didn't request XML, > but > > got it: %s..." 
% data) > > elif "retmode" in params and > > params["retmode"] \ > > and > > params["retmode"].lower()!="html" > \ > > and (data.lower().startswith(" or > > \ > > data.lower().startswith(" > html")): > > #Expected for some error pages (e.g. the Bad > > Gateway caught above) > > raise TypeError("Didn't request HTML, > but > > got it: %s..." % data) > > > > I'm sure my XML/HTML detection could be made more > > robust here - I hope > > the principle is clear. My motivation is that I have > > noticed the NCBI > > can return HTML error pages, and while we do catch > some of > > these > > explicitly (e.g. Bad Gateway, or Service Unavailable), > I > > think any > > HTML page when the user asked from XML, text or asn.1 > > should be > > treated as error. Similarly, not getting XML when you > ask > > for it etc. > > > > Note that by raising the exception including the > message > > text it > > should be much easier to diagnose these failures. As > a > > tiny > > refinement to the above code, we should only add the > > "..." if there is > > more text to follow - this isn't always the case. > > > > e.g. 
The following give an HTML error page (while some > > databases like > > "protein" are better behaved in this > respect): > > >>> print > Entrez.efetch(db="homologene", > > id="nonexistant", > retmode="text").read() > > >>> print > Entrez.efetch(db="homologene", > > id="nonexistant", > > retmode="asn.1").read() > > > > Similarly, these give an XML like fragment (which is > not a > > valid XML > > file in itself - arguably an NCBI bug; some databases > like > > "protein" > > are better behaved in this respect): > > >>> print > Entrez.efetch(db="pubmed", > > id="nonexistant", > retmode="xml").read() > > >>> print > Entrez.efetch(db="homologene", > > id="nonexistant", > retmode="xml").read() > > >>> print Entrez.efetch(db="cdd", > > id="nonexistant", > retmode="xml").read() > > >>> print > Entrez.efetch(db="taxonomy", > > id="nonexistant", > retmode="xml").read() > > > > My suggested change to Bio.Entrez would also catch the > > following > > examples (using an invalid database) where the NCBI > ignore > > the retmode > > and return an HTML help page: > > >>> print > > Entrez.efetch(db="nonexistant", > > id="123456", retmode="xml").read() > > >>> print > > Entrez.efetch(db="nonexistant", > > id="123456", > retmode="text").read() > > > > In a less clear cut example, this would flag the > following > > as an error > > as the NCBI seem to return ASN.1 text instead of HTML > > here:: > > >>> print > Entrez.efetch(db="nucleotide", > > retmode="html", > id="123456").read() > > > > Overall, I think this change should catch lots of > errors > > which > > otherwise may not be detected until later (e.g. while > > trying to parse > > the file). > > > > > -------------------------------------------------------------------------------------------------- > > > > On another point, should we catch these responses as > > errors:? > > > > >>> efetch(db="snp", > > id="123456").read() > > 'PmFetch > > > response\n
\n1: id: 123456 Error occurred: cannot get document summary\n
' > > >>> efetch(db="snp", > > id="123456", > retmode="html").read() > > 'PmFetch > > > response\n
\n1: id: 123456 Error occurred: cannot get document summary\n
' > > >>> efetch(db="snp", > > id="123456", retmode="xml").read() > > ' > version="1.0"?>\n > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1: > > id: 123456 Error occurred: cannot get document > > summary\n\n' > > >>> efetch(db="snp", > > id="123456", > retmode="text").read() > > '1: id: 123456 Error occurred: cannot get document > > summary\n' > > > > and, > > >>> print efetch(db="homologene", > > retmode="html", id="fake").read() > > > > > >

Error occurred: Empty id list - nothing todo

... > > > > Looking for the string "Error occurred: " > looks > > fairly safe here, and > > should cover a range of entries. Of course, you can > > imagine false > > positives too, e.g. a valid PUBMED plain text record > for a > > tutorial > > article with a title like "Yikes! An Error > Occurred: A > > beginner's > > Guide To Defensive Programming." could match. > > > > Peter > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev From chapmanb at 50mail.com Sun Mar 15 18:54:43 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 15 Mar 2009 14:54:43 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> Message-ID: <20090315185443.GA30296@kunkel> Hi all; It is good to see the discussion around revision control systems; Chris and Paulo's posts make some nice points. Source code management is an important issue that influences perception of Biopython and barriers to contributing. My two cents on what we should do is: - Pick a distributed source code management system. My preference is Git, only because it currently has more steam behind it. Git/Bazaar will likely end up being like the VHS/Beta debate. 
- Test drive use of Git on an official GitHub repository. This would involve a few things: = Bartek and Giovanni: Can you coordinate on a single GitHub Biopython instance and remove the others to eliminate confusion? = Write up documentation for contributors. This is where we could use some volunteers from those interested to update the web pages. The two main places that need updating are: http://biopython.org/wiki/Contributing http://biopython.org/wiki/CVS I think we should ensure people are clear on what is being done and where you can contribute. - Ensure GitHub can be synced with current CVS. Bartek, it sounds like you have a handle on this. - Evaluate the success of Git. This is easy to measure in terms of new contributors, increased happiness, and what not. At the same time we can monitor how GitHub evolves over time. - If successful, talk to the OpenBio team about hosting Git locally. Peter, Michiel, et al -- how do you feel? I think being cautious with the transition, as Peter recommends, is important. I am old enough to remember Sourceforge being new and everyone saying how it was stupid not to move there; then over time Sourceforge got slow with all the users and people moved away from it. This is just to say -- no one knows how GitHub (or Launchpad) will evolve. OpenBio is a stable, small, nice community and to the extent we can use their resources I believe we should. Overall, the specifics of the above proposal aren't as important as just doing something unambiguous and then evaluating how it works. Right now things are a big confusing, which I think could put off new developers, who are always welcome. 
Looking forward to talking about code instead of revision control, Brad > On Fri, Mar 13, 2009 at 8:21 AM, Peter wrote: > > On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski > > wrote: > >> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher wrote: > >>> On Thu, Feb 26, 2009 at 10:00 AM, Peter wrote: > >>>> Another option to consider would be to switch to running git on > >>>> biopython.org, but use the git-cvsserver tool to provide an emulated > >>>> CVS server on top of the git repository. ?This sounds possible in > >>>> theory, and would be nice for any "old fashioned" biopython developers > >>>> because is should be fairly transparent - they can continue to treat > >>>> it as CVS and just work on the main trunk. ?This would require someone > >>>> competent to do the conversion and alter the server setup - we'd have > >>>> to talk to the OBF team about this. ?However, if anyone has first hand > >>>> experience on git-cvsserver perhaps they could comment on weather this > >>>> sounds like a good plan or not. > >>> > >>> I must be missing something, Peter. Why would BioPython continue to > >>> operate with CVS? I suppose I just really hope to see BioPython > >>> running with something other than CVS, and I'd really like to see it > >>> go either under Bazaar or Git. > > > > I'm warming to the idea of git, and had noticed git includes the > > optional git-cvsserver tool which emulates a CVS server while using > > git underneath. ?I was wondering if anyone had first hand experience > > of this. ?If we did move from CVS to git (still hosted on > > biopython.org), this would seem to offer a nice migration path for of > > our "old school" CVS developers - they can carry on as usual. ?Of > > course, if none of us care about having to learn a new interface, then > > a simple switch would be less hassle to setup. 
?For the server side of > > things, we'll need to talk to the OBF team about any such move - as > > far as I know they've only managed CVS to SVN migrations in the past. > > > > Peter > > > >> Hi Chris, > >> > >> The idea is to do the switch in two steps: > >> - first we still have the main branch in CVS while we have git and/or > >> bzr branches synchronized with it for people to branch and contribute > >> - If this works nicely, we will switch to one of these systems > >> completely (while possibly keeping the other branch in sync, but this > >> is not yet decided) > > > > That does seem like a good plan. ?Of course, there is the related > > issue of where we host the official repository (externally, e.g. on > > github or lauchpad) or in house (on biopython.org). ?I favour keeping > > the official repository on biopython.org but this will require OBF > > technical support (do we have the expertise within Biopython? Bartek? > > Chris?). > > > >> The first step is to some extent operational (I'm currently busy with > >> other stuff, but I'll get arround it hopefully this weekend), but the > >> second step requires decision on our side (git or bzr?) and action on > >> the side of OBF (there is no git or bazar installed on obf servers). > > > > There is also the previously semi-agreed solution of switching from > > CVS to SVN on biopython.org, but this would be only a gradual > > improvement. ?I gather there are mature tools for using git+svn > > together, so it should be better than using git+cvs together. ?Other > > than meaning all the OBF hosted projects are on SVN (I think we are > > the last still on CVS), this is beginning to seem a bit pointless. > > > > Peter > > > > Peter et al., > > I started off writing an email about why I think hosting at GitHub or > Launchpad is a better idea, but it got a bit verbose, so I just wrote > up a blog post instead. (Besides, links and images are more fun, and > make the intarwebs go 'round.) 
Please see > http://igotgenes.blogspot.com/2009/03/why-biopython-needs-to-move-to-github.html > or > http://tinyurl.com/a9o7ae > > Chris > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From bartek at rezolwenta.eu.org Sun Mar 15 20:12:46 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sun, 15 Mar 2009 21:12:46 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090315185443.GA30296@kunkel> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> Message-ID: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com> Hi all, On Sun, Mar 15, 2009 at 7:54 PM, Brad Chapman wrote: > > - Pick a distributed source code management system. My preference > ?is Git, only because it currently has more steam behind it. > ?Git/Bazaar will likely end up being like the VHS/Beta debate. > > - Test drive use of Git on an official GitHub repository. This would > ?involve a few things: > > ?= Bartek and Giovanni: Can you coordinate on a single GitHub > ? ?Biopython instance and remove the others to eliminate confusion? > ?= Write up documentation for contributors. This is where we could use > ? ?some volunteers from those interested to update the web pages. > ? ?The two main places that need updating are: > > ? ?http://biopython.org/wiki/Contributing > ? ?http://biopython.org/wiki/CVS > > ? 
I think we should ensure people are clear on what is being done
>    and where you can contribute.
>
> - Ensure GitHub can be synced with current CVS. Bartek, it sounds
>   like you have a handle on this.
>
> - Evaluate the success of Git. This is easy to measure in terms of
>   new contributors, increased happiness, and what not. At the same
>   time we can monitor how GitHub evolves over time.
>

I think there are some important points brought up by Brad (and others).

- From the technical point of view, I don't see any serious problems:
  - I can set up a new branch on GitHub (the current one includes some
    testing changes done by Giovanni)
  - it will be synchronized daily with changes from CVS
  - I'll set up a script to also save a backup of the official branch
    on the OBF server (to ensure that we do not depend on GitHub)
  - I can write some (short) documentation on how to contribute.

I don't know whether anyone besides me is still interested in
test-driving Launchpad/bzr as an alternative. If there are no other
people, I'll close the current testing branches on Launchpad.

>
> Peter, Michiel, et al -- how do you feel?

I would also be very happy to hear from other developers, especially
if there are any people who would be unhappy if we finally moved away
from CVS.

I'll post when I have a running setup of the cvs2git conversion.
cheers
Bartek

From bartek at rezolwenta.eu.org Sun Mar 15 23:14:07 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 00:14:07 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
 <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
 <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
 <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
 <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
 <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
 <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
 <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
 <20090315185443.GA30296@kunkel>
 <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
Message-ID: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>

Hi all,

I've now set up a script on my machine to update the Biopython git
branch on GitHub once every hour (thanks to Giovanni for creating and
setting up the account). It's created using the git fast-import script
because of its speed. You can find it here:

http://github.com/biopython/biopython/

It's a different branch from the one created earlier by Giovanni. The
old one is now called biopython_old and will soon disappear from GitHub
(there were some temporary changes in it).

The script also leaves a copy of the repository on dev.open-bio.org,
just in case :)

I've written a short guide on the wiki:
http://biopython.org/wiki/GitMigration

Please correct it or send me comments if you don't like something or if
you feel something is missing.

I'm going to a conference, so I might be slow in responding to emails
next week...
cheers Bartek From dalloliogm at gmail.com Mon Mar 16 09:49:29 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 16 Mar 2009 10:49:29 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com> <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com> Message-ID: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com> On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski < bartek at rezolwenta.eu.org> wrote: > Hi all, > > I've written a short guide on the wiki : > http://biopython.org/wiki/GitMigration I also have a draft for some documentation... I can contribute it later this morning (now I don't have time). p.s. the biopython website seems to be offline at the moment... 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Mon Mar 16 11:05:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 11:05:38 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com> <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com> <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com> Message-ID: <320fb6e00903160405p5337f8b1m16d3c3d891950fd6@mail.gmail.com> On Mon, Mar 16, 2009 at 9:49 AM, Giovanni Marco Dall'Olio wrote: > On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski < > bartek at rezolwenta.eu.org> wrote: >> Hi all, >> >> I've written a short guide on the wiki : >> http://biopython.org/wiki/GitMigration > > I also have a draft for some documentation... I can contribute it later this > morning (now I don't have time). In the meantime, I have updated the following pages accordingly: http://biopython.org/wiki/CVS http://biopython.org/wiki/SVN http://biopython.org/wiki/Subversion_migration http://biopython.org/wiki/Git #place holder, will be important if we do fully move to git http://biopython.org/wiki/GitMigration #Fixing biopython to Biopython etc Peter > p.s. the biopython website seems to be offline at the moment... All the OBF pages were out for bit this morning (e.g. OBF helpdesk #332), but it is back now. 
From biopython at maubp.freeserve.co.uk Mon Mar 16 11:30:12 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 11:30:12 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090315185443.GA30296@kunkel>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
 <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
 <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
 <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
 <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
 <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
 <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
 <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
 <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
 <20090315185443.GA30296@kunkel>
Message-ID: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>

On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman wrote:
> Hi all;
> It is good to see the discussion around revision control systems;
> Chris and Paulo's posts make some nice points. Source code
> management is an important issue that influences perception of
> Biopython and barriers to contributing.
>
> My two cents on what we should do is:
>
> - Pick a distributed source code management system. My preference
>   is Git, only because it currently has more steam behind it.
>   Git/Bazaar will likely end up being like the VHS/Beta debate.

I would agree git has more mind share, but I have no technical reason
to choose one over the other. In terms of read only access, having a
mirrored trunk branch on both git (e.g. github) and bazaar (e.g.
launchpad) is possible for evaluation purposes.

> - Test drive use of Git on an official GitHub repository. This would
>   involve a few things ...

Giovanni has shared the github "Biopython" user information so we
(i.e. Biopython) can use that for any official presence on github -
which is great.
Bartek and Giovanni seem to have this working OK. I think automatically mirroring the latest CVS trunk to Launchpad is stalled because they (launchpad) can't cope with a simple username/password for accessing a remote CVS server. Is that right, Bartek? > - Evaluate the success of Git. This is easy to measure in terms of > new contributors, increased happiness, and what not. At the same > time we can monitor how GitHub evolves over time. It may not be that easy to measure in practice... > - If successful, talk to the OpenBio team about hosting Git locally. I have contacted the OBF to ask who we should talk to about this idea (given it will probably involve server access to install new software and perhaps changing firewall/port settings). > Peter, Michiel, et al -- how do you feel? I'm happy in principle with a switch to git, ideally hosted on biopython.org (see below). > I think being cautious with the transition, as Peter recommends, is > important. I am old enough to remember Sourceforge being new and > everyone saying how it was stupid not to move there; then over time > Sourceforge got slow with all the users and people moved > away from it. This is just to say -- no one knows how GitHub (or > Launchpad) will evolve. OpenBio is a stable, small, nice community > and to the extent we can use their resources I believe we should. I did have that same example in mind - depending on a third party like GitHub, LaunchPad or Sourceforge is great until things go wrong. The Open Bio Foundation is much smaller, and while they don't have 100% uptime either, they are normally very responsive to issues because they only support a small number of projects. Of course, ideally we might have both - an OBF hosted (git) repository on biopython.org, synced to github for people to enjoy its collaborative additions. > Overall, the specifics of the above proposal aren't as important as > just doing something unambiguous and then evaluating how it works. 
> Right now things are a bit confusing, which I think could put off > new developers, who are always welcome. > > Looking forward to talking about code instead of revision control, That would be nice :) Peter From biopython at maubp.freeserve.co.uk Mon Mar 16 12:16:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 12:16:06 +0000 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) Message-ID: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> Hi All, I think we should probably do another release soon - for one thing, the NCBI has updated their DTD files, and it would be great if Biopython shipped with them included (see discussion on Bug 2678). We still need to work on the documentation for Bio.Graphics.GenomeDiagram (Bug 2671) and Bio.Motif (Bug 2694), but in the meantime I think it would be sensible to do a Biopython 1.50 beta release in the next couple of weeks. I'd like to include the following changes as part of the beta, but it would be sensible to have someone else try these out first. Any volunteers? Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g. align[1:2,5:-5] Any other nominations for Biopython 1.50? I'd also like to resolve Bug 2597 (Enforce alphabet letters in Seq objects), but that might deserve an alpha release given the higher chance of breaking existing scripts... 
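For anyone wanting to try out the idea behind Bug 2551 before testing the real patch, here is a minimal sketch of how an expression like align[1:2,5:-5] reaches __getitem__ as a (rows, columns) tuple. The TinyAlignment class and its behaviour are assumptions made up for illustration only; this is not the Biopython alignment object, and the actual patch may behave differently.

```python
class TinyAlignment:
    """Hypothetical stand-in for an alignment: equal-length strings as rows."""

    def __init__(self, rows):
        rows = list(rows)
        if len({len(row) for row in rows}) > 1:
            raise ValueError("all rows must have the same length")
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        # align[i] or align[i:j] arrives as an int or a slice: rows only.
        if not isinstance(index, tuple):
            if isinstance(index, slice):
                return TinyAlignment(self.rows[index])
            return self.rows[index]
        # align[rows, cols] arrives as a 2-tuple.
        if len(index) != 2:
            raise TypeError("expected align[rows, columns]")
        row_index, col_index = index
        if isinstance(row_index, slice):
            # Several rows: apply the column index to each selected row.
            return TinyAlignment(row[col_index] for row in self.rows[row_index])
        # A single row: plain string indexing or slicing.
        return self.rows[row_index][col_index]


align = TinyAlignment(["ACGTACGTAC", "ACGTTCGTAC", "ACCTACGAAC"])
sub = align[0:2, 2:-2]    # first two rows, columns 2 up to (not including) -2
print(sub.rows)
print(align[2, 1])        # one letter from the third row
```

The design choice in this sketch is that a row slice always gives back another alignment-like object, while a single row index falls back to ordinary string behaviour.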
Peter From biopython at maubp.freeserve.co.uk Mon Mar 16 13:18:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 13:18:19 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> Message-ID: <320fb6e00903160618g2b5b6acs6695fab5ef432bc7@mail.gmail.com> Hi all, I'm thinking a news post on http://news.open-bio.org/news/category/obf-projects/biopython/ about version control would be a good idea at this point. How about this - keywords like git, subversion and the other project names would be links: Title: Biopython and version control systems Originally, all the OBF hosted projects used CVS for their source code repositories. At the start of 2008, BioPerl and BioJava moved over to Subversion (SVN), followed by BioSQL. Biopython was originally going to do the same, but this didn't actually happen. Having all the Bio* projects using the same version control system would have simplified server administration for the OBF, but wouldn't have actually made that much difference to Biopython development. Discussion has since shifted towards next generation distributed version control systems like git or bazaar. Quote from Linus Torvalds: The slogan of Subversion for a while was "CVS done right", or something like that, and if you start with that kind of slogan, there's nowhere you can go. 
There is no way to do CVS right. In addition to creating the Linux kernel, Linus Torvalds more recently wrote git, a prominent example of a distributed version control system. Rather than switching from CVS to SVN, the BioRuby project chose instead to use git, hosted on github. Biopython is considering doing something similar - using a distributed version control system like git should make it easier for potential Biopython contributors to manage their own local copies of Biopython under version control. Initially for evaluation purposes only, Giovanni and Bartek have set up a Biopython branch on GitHub, which will automatically be updated from the OBF hosted Biopython CVS repository [Link to wiki page]. If this is favorably received, then moving Biopython from CVS to git seems likely at some point this year. Peter on behalf of the Biopython developers I hope this has everyone's approval... if not, please reply here so we can revise this before it gets posted. Note that I've avoided getting into specifics here, such as hosting arrangements, as the details will go out of date. 
Peter From bartek at rezolwenta.eu.org Mon Mar 16 14:24:42 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 16 Mar 2009 15:24:42 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> Message-ID: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> On Mon, Mar 16, 2009 at 12:30 PM, Peter wrote: > On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman wrote: >> - Pick a distributed source code management system. My preference >> is Git, only because it currently has more steam behind it. >> Git/Bazaar will likely end up being like the VHS/Beta debate. > > I would agree git has more mind share, but I have no technical reason > to choose one over the other. > > In terms of read only access, having a mirrored trunk branch on both > git (e.g. github) and bazaar (e.g. launchpad) is possible for > evaluation purposes. It is possible, but I don't know if we should do this. To some extent having too much choice might be problematic... We've done some tests on both bzr and git and it seems that both can do the job for us. I assume that the purpose of "test-driving" instead of directly switching to git is to give us a possibility to go back in case things go really badly. But I don't think it's a likely event. 
Bigger projects are using git (or bzr) and doing fine, so we shouldn't have problems either. On the other hand, I don't expect that having the possibility to test-drive two options is going to make the decision any easier. I don't expect too many people to try both options, and even if it happens I don't think there will be a clear consensus that one is better than the other. Honestly, we can't expect that all developers will learn two tools just to help us choose... Even though I was myself one of the proponents of switching to bzr, I think that we should focus on one option, and git seems to be the one with the bigger mind share among biopythonistas. So I would vote for dropping the discussion on bzr and focusing on making sure that no one is left behind with their problems during the (possibly not too long) transition to git. > >> - Test drive use of Git on an official GitHub repository. This would >> involve a few things ... > > Giovanni has shared the github "Biopython" user information so we > (i.e. Biopython) can use that for any official presence on github - > which is great. Bartek and Giovanni seem to have this working OK. > > I think having the latest CVS trunk in Launchpad automatically is > stalled because they (launchpad) can't cope with a simple > username/password for accessing a remote CVS server. Is that right > Bartek? > Yes, we now have the biopython branch on github synchronized with CVS on an hourly basis. There is no problem with synchronizing a branch on launchpad in the same script, but I didn't do it for reasons explained above. >> - Evaluate the success of Git. This is easy to measure in terms of >> new contributors, increased happiness, and what not. At the same >> time we can monitor how GitHub evolves over time. > > It may not be that easy to measure in practice... > Well, if everyone is able to use git I'd say it's a success. We don't need a perfect solution. We want to move to _a_ distributed version control system. 
> I did have that same example in mind - having to depend on a third > party like GitHub, LaunchPad or Sourceforge is great until things go > wrong. The Open Bio Foundation is much smaller, and while they don't > have 100% uptime either, they are normally very responsive to issues > because they only support a small number of projects. Of course, > ideally we might have both - an OBF hosted (git) repository on > biopython.org, synced to github for people to enjoy its collaborative > additions. > There is one difference between moving to sourceforge and moving to git. With git, it is much less of a problem to switch hosting. The fundamental idea is that every branch (including all local developer branches) can be a "master" branch. So switching to a different hosting location is a matter of an e-mail on the developer mailing list telling people to update the location of the "master" in their branches. So I think that we need to worry less about git hosting than we would need to worry about cvs (or svn for that matter). 
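To make the "update the location" step concrete, here is a rough sketch of what each developer would run in an existing clone after such an e-mail. It is written in Python only to keep to the list's language; the function names, the remote name "origin" and the example URL are all assumptions for illustration, not an agreed procedure.

```python
import subprocess


def relocate_commands(new_url, remote="origin"):
    """Return the git commands that point an existing clone at a new host."""
    return [
        # Overwrite the URL stored for the remote in the clone's config.
        ["git", "config", "remote.%s.url" % remote, new_url],
        # Fetch from the new location to check that it is reachable.
        ["git", "fetch", remote],
    ]


def relocate(repo_dir, new_url, remote="origin"):
    """Run the commands inside repo_dir (needs git on the PATH)."""
    for argv in relocate_commands(new_url, remote):
        subprocess.run(argv, cwd=repo_dir, check=True)


# Hypothetical new hosting location, for illustration only.
commands = relocate_commands("git://example.org/biopython.git")
for argv in commands:
    print(" ".join(argv))
```

Local history and branches are untouched by this; only the stored address of the remote changes. Newer git releases also offer `git remote set-url` for the same purpose.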
cheers Bartek From biopython at maubp.freeserve.co.uk Mon Mar 16 15:00:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 15:00:16 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> Message-ID: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski wrote: > > On Mon, Mar 16, 2009 at 12:30 PM, Peter wrote: >> On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman wrote: >>> - Pick a distributed source code management system. My preference >>> ?is Git, only because it currently has more steam behind it. >>> ?Git/Bazaar will likely end up being like the VHS/Beta debate. >> >> I would agree git has more mind share, but I have no technical reason >> to choose one over the other. >> >> In terms of read only access, having a mirrored trunk branch on both >> git (e.g. github) and bazaar (e.g. launchpad) is possible for >> evaluation purposes. > > It is possible, but I don't know if we should do this. To some extent > having too much choice might be problematic... True. > We've done some tests on both bzr and git and it seems that both > can do the job for us. I assume, that the purpose of "test-driving" > instead of directly switching to git is to give us a possibility to go > back in case things go really bad. 
But I don't think it's a likely > event. Bigger projects are using git (or bzr) and doing fine, so > we shouldn't have problems either. Well yes, having a fallback plan during this migration is essential. I do think there is a separate need for "test driving" for those of us with Biopython CVS access who don't have personal experience with git (or github). Making the switch before then would be a very bad idea. I personally need to make time to play with git and github, doing a couple of *real* branches and merges. I hope to do so this week; some of the changes I'd like to do for Biopython 1.50 would make good candidates... but this is time that might otherwise be spent on bug fixes, documentation etc. And there is of course my real job too... ;) Related to this, what OS and version of git are you (Bartek and Giovanni) using? > On the other hand I don't expect that having the possibility to > test-drive two options is going to make the decision any easier. > I don't expect too many people to try both options and even if it > happens I don't think there will be a clear acclamation that one > is better than the other. I agree. > Honestly, we can't expect that all developers will learn two tools > just to help us choose... Even though I was myself one of the > proponents of switching to bzr > I think that we should focus on one option and git seems to be the one > with bigger mind share among biopythonistas. > So I would vote for dropping the discussion on bzr and focusing on > making sure that no one is left behind with their > problems during the (possibly not too long) transition to git. I'm happy with dropping discussion on bzr, in favour of git. (As an aside I always liked the term biopythoneers, but biopythonistas is fun too.) >> Giovanni has shared the github "Biopython" user information so we >> (i.e. Biopython) can use that for any official presence on github - >> which is great. Bartek and Giovanni seem to have this working OK. 
>> >> I think having the latest CVS trunk in Launchpad automatically is >> stalled because they (launchpad) can't cope with a simple >> username/password for accessing a remote CVS server. Is that right >> Bartek? > > Yes, we have now the biopython branch on github synchronized with CVS > on an hourly basis. > There is no problem with synchronizing a branch on launchpad in the > same script, but I didn't do it for reasons explained above. OK. Do you want to make sure your Launchpad branch is clearly labeled as not current? > Well, If everyone will be able to use git I'd say it's a success. We > don't need a perfect solution. We want to move to _a_ distributed > version control system. Well, I suspect there are some silent contributors who don't care either way - it's not perfect, but CVS works well enough. Better the devil you know ... ;) > ... > There is one difference between moving to sourceforge and moving to git. > With git, it is much less of a problem to switch hosting... So I think that we > need to worry less about git hosting than we would need to worry about > cvs (or svn for that matter). That is another good reason to pick git. 
Peter From bartek at rezolwenta.eu.org Mon Mar 16 16:55:40 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 16 Mar 2009 17:55:40 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> Message-ID: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> On Mon, Mar 16, 2009 at 4:00 PM, Peter wrote: > > I do think there is a separate need for "test driving" for those of us with > Biopython CVS access how don't have personally experience with git > (or github). ?Making the switch before then would be a very bad idea. > > I personally need to make time to play with git and github, doing a > couple of *real* branches and merges. ?I hope to so this week, some > of the changes I'd like to do for Biopython 1.50 would make good > candidates... but this is time that might otherwise be spent on bug > fixes, documentation etc. ?And there is of course my real job too... ;) > > Related to this, what OS and version of git are you (Bartel and Giovanni) using? > I'm currently using the binary installations on mac (intel) and ubuntu (8.10). I haven't experienced any problems which is quite expected on unix-like systems. It would be interesting to hear from people's experiences on windows. > > OK. ?Do you want to make sure your Launchpad branch is clearly labeled > as not current? 
> I've removed the bzr branches from launchpad, so there should be no more confusion. cheers Bartek From nuin at genedrift.org Mon Mar 16 16:58:26 2009 From: nuin at genedrift.org (Paulo Nuin) Date: Mon, 16 Mar 2009 12:58:26 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> Message-ID: <49BE8532.9040701@genedrift.org> No problem on Vista. Git (version 1.5.6.1-preview20080701) Paulo Bartek Wilczynski wrote: > On Mon, Mar 16, 2009 at 4:00 PM, Peter wrote: > >> I do think there is a separate need for "test driving" for those of us with >> Biopython CVS access how don't have personally experience with git >> (or github). Making the switch before then would be a very bad idea. >> >> I personally need to make time to play with git and github, doing a >> couple of *real* branches and merges. I hope to so this week, some >> of the changes I'd like to do for Biopython 1.50 would make good >> candidates... but this is time that might otherwise be spent on bug >> fixes, documentation etc. And there is of course my real job too... ;) >> >> Related to this, what OS and version of git are you (Bartel and Giovanni) using? >> >> > I'm currently using the binary installations on mac (intel) and ubuntu > (8.10). 
I haven't > experienced any problems which is quite expected on unix-like systems. > It would be > interesting to hear from people's experiences on windows. > > >> OK. Do you want to make sure your Launchpad branch is clearly labeled >> as not current? >> >> > > I've removed the bzr branches from launchpad, so there should be no > more confusion. > > cheers > Bartek > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Mon Mar 16 17:07:18 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Mar 2009 17:07:18 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <49BE8532.9040701@genedrift.org> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> <49BE8532.9040701@genedrift.org> Message-ID: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com> On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin wrote: > > No problem on Vista. > > Git (version 1.5.6.1-preview20080701) > > Paulo Hi Paulo, Could you be a bit more precise about the version are you using and where got it from? i.e. Are you using cygwin or the Windows native port, http://code.google.com/p/msysgit/ And did you mean in general you have no problems with git on Windows Vista, or have you also tried fetching Biopython from github, building, testing (and installing it)? For example, are there any new line issues from the unit tests? 
This is one area where CVS and git may differ slightly... Thanks, Peter From dalloliogm at gmail.com Mon Mar 16 19:57:38 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 16 Mar 2009 20:57:38 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> Message-ID: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> On Mon, Mar 16, 2009 at 4:00 PM, Peter wrote: > On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski > wrote: > > Related to this, what OS and version of git are you (Bartek and Giovanni) > using? I am using git 1.5.4.3 on Ubuntu 8.04. At home, I am using a git binary distribution on Ubuntu 8.10. At the moment I am having some strange problems, related to the fact that I previously had a branch named 'biopython' in my account, so github does not seem to handle well the fact that the old branch has been renamed. For example, I don't have the 'Fork' button... but it must be a temporary problem; I have already contacted github's tech support. 
> Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bartek at rezolwenta.eu.org Mon Mar 16 21:04:57 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 16 Mar 2009 22:04:57 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> Message-ID: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com> Hi, On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio wrote: > > At the moment I am having some strange problems, relative to the fact that I > had a branch previously named as 'biopython' in my account, so it seems > don't understand well the fact that the old branch has been renamed. > For example, I don't have the 'Fork' button.... but it must be a temporary > problem, I already contacted the github's tech support. > This is connected with the change I made in the repository. Namely I renamed the branch created by Giovanni to biopuython-old and created a new one (the "official" one) called biopython again. 
The "rename" feature was flagged as experimental, and I don't expect we will use it again; there were warnings that it can affect the branches forked from the branch previously created by Giovanni. These two branches were incompatible, since they were done with different scripts (different revision numbers). So if you need to retain some changes you made to the old branch, please export them from your local copy as changesets and apply these back to the new forks made from the new repository. I'm sorry for the inconvenience. cheers Bartek From chapmanb at 50mail.com Mon Mar 16 22:42:40 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Mar 2009 18:42:40 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> Message-ID: <20090316224240.GA57054@sobchak.mgh.harvard.edu> Hey everyone; Wow, y'all are quick. Bartek, Giovanni and Peter -- thanks for all the hard work and organization. Consolidating a couple of threads below... > >> I've written a short guide on the wiki : > >> http://biopython.org/wiki/GitMigration > > > > I also have a draft for some documentation... I can contribute it later this > > morning (now I don't have time). > > In the meantime, I have updated the following pages accordingly: The documentation looks awesome. 
My only suggestion would be to change the navigation link that currently points to CVS to point to a generic page like SourceCode. Then that landing page could link to the current CVS and explain we are working to transition to Git, with links to those pages. Currently, the Git docs are a bit buried from the front page. Peter, I don't appear to have wiki permissions to edit the navigation bar; do you? Peter: > I'm thinking a news post on > http://news.open-bio.org/news/category/obf-projects/biopython/ about > version control would be a good idea at this point. How about this - This is great, and I would move the last paragraph describing the Git repository to the beginning; start with what we are doing and then describe the rationale. This should help for those with ADD, and also give more prominent credit to Bartek, Giovanni and you for the work that went into this. > > - Evaluate the success of Git. This is easy to measure in terms of > > new contributors, increased happiness, and what not. At the same > > time we can monitor how GitHub evolves over time. > > It may not be that easy to measure in practice... How about these two metrics: - How do current developers like it? Beyond the initial learning curve, does it work at least as well as CVS for day-to-day stuff? - Does it lower the entry barriers to contributing to Biopython? The main reason to do this is to ease the initial work for coders who feel CVS/Patches/Bugzilla is too much. If we find new contributors through this, it's a win. Modest expectations are good. If either of these fails miserably, then we can re-evaluate. 
Brad From chapmanb at 50mail.com Mon Mar 16 22:55:58 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 16 Mar 2009 18:55:58 -0400 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> Message-ID: <20090316225558.GC57054@sobchak.mgh.harvard.edu> Peter; > I think we should probably do another release soon Good call. +1 from me. > I'd like to include the following changes as part of the beta, but it > would be sensible to have someone else try these out first. Any > volunteers? > > Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files The code for this looked good when I reviewed it earlier. I will test it out with some solexa reads from here this week; any reason not to check the patch and files into CVS? Then I can fire up my coal-powered revision control system, feed two punch cards into the mouth of the machine, hope the vacuum tubes don't burn out again, and check it out locally. 
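For anyone else volunteering to test the FASTQ support (Bug 2767), the main thing to watch with Solexa reads is the quality encoding. Below is a small standalone sketch, not the Bio.SeqIO code from the patch: it assumes simple four-line records, Sanger-style files storing PHRED scores with an ASCII offset of 33, and older Solexa files storing Solexa scores with an offset of 64. All function names here are made up for illustration.

```python
import math


def read_fastq(lines):
    """Yield (name, sequence, quality_string) from simple 4-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4:
        raise ValueError("expected four lines per record")
    for i in range(0, len(lines), 4):
        title, seq, plus, qual = lines[i:i + 4]
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record number %d" % (i // 4 + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield title[1:], seq, qual


def sanger_phred(qual):
    """Decode a Sanger-style quality string (PHRED score = ASCII code - 33)."""
    return [ord(char) - 33 for char in qual]


def solexa_to_phred(qual):
    """Decode a Solexa-style string (offset 64) and convert to PHRED scores."""
    scores = []
    for char in qual:
        q_solexa = ord(char) - 64
        # Solexa scores use odds p/(1-p); PHRED scores use the probability p.
        scores.append(10 * math.log10(10 ** (q_solexa / 10.0) + 1))
    return scores


records = list(read_fastq(["@r1\n", "ACGT\n", "+\n", "!+5I\n"]))
name, seq, qual = records[0]
print(name, seq, sanger_phred(qual))
print(solexa_to_phred("@h"))
```

The two scales agree closely for high-quality bases and diverge at the low end, which is why reading a Solexa file as if it were Sanger (or the other way round) gives wrong scores without raising an error.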
Brad From tiagoantao at gmail.com Tue Mar 17 00:11:50 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 17 Mar 2009 00:11:50 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> I've been reading this thread and mainly staying silent, but there is one question that is not clear in my mind and that I believe is important: How is the "official" biopython trunk controlled? Currently what is on CVS is the gospel and Peter and Michiel essentially have control of what is there and what is labelled as a "biopython distribution". How will this work now? The second question, related to the first, is how will different branches (of different persons) be managed? I foresee people starting to work on the same code in different directions and then having problems merging everything together. Maybe these questions stem from my ignorance of distributed version control. But, if not, I think they should be resolved before advancing. My suggestion: write (or at least informally agree on) the policy before advancing. While distributed version control seems a good idea (no opposition), it also seems a good way to create new problems. 
BTW, I would be tempted to suggest that a labelled release would be a good starting point for a distributed revision control bootstrap. On Mon, Mar 16, 2009 at 10:42 PM, Brad Chapman wrote: > Hey everyone; > Wow, y'all are quick. Bartek, Giovanni and Peter -- thanks for all > the hard work and organization. Consolidating a couple of threads > below... > >> >> I've written a short guide on the wiki : >> >> http://biopython.org/wiki/GitMigration >> > >> > I also have a draft for some documentation... I can contribute it later this >> > morning (now I don't have time). >> >> In the meantime, I have updated the following pages accordingly: > > The documentation looks awesome. My only suggestion would be to > change the navigation link that current points to CVS to point to a > generic page like SourceCode. Then that landing page could link > to the current CVS and explain we are working to transition to > Git, with links to those pages. Currently, the Git docs are a > bit buried from the front page. > > Peter, I don't appear to have wiki permissions to edit the navigation > bar; do you? > > Peter: >> I'm thinking a news post on >> http://news.open-bio.org/news/category/obf-projects/biopython/ about >> version control would be a good idea at this point. ?How about this - > > This is great, and I would move the last paragraph describing > the Git repository to the beginning; start with what we are doing and > then describe the rationale. This should help for those with ADD, and > also give more prominent credit to Bartek, Giovanni and you for the > work that went into this. > >> > - Evaluate the success of Git. This is easy to measure in terms of >> > ?new contributors, increased happiness, and what not. At the same >> > ?time we can monitor how GitHub evolves over time. >> >> It may not be that easy to measure in practice... > > How about these two metrics: > > - How do current developers like it? 
Beyond the initial learning > ?curve, does it work at least as good as CVS for day to day stuff? > > - Does it lower the entry barriers to contributing to Biopython? The > ?main reason to do this is to ease the initial work for coders who > ?feel CVS/Patches/Bugzilla is too much. If we find new contributors > ?through this, it's a win. > > Modest expectations are good. If either of these fail miserably, then > we can re-evaluate. > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- "A man who dares to waste one hour of time has not discovered the value of life" - Charles Darwin From dschruth at u.washington.edu Mon Mar 16 23:15:39 2009 From: dschruth at u.washington.edu (David Schruth) Date: Mon, 16 Mar 2009 16:15:39 -0700 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <20090316225558.GC57054@sobchak.mgh.harvard.edu> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> <20090316225558.GC57054@sobchak.mgh.harvard.edu> Message-ID: <49BEDD9B.6030905@u.washington.edu> I've got some 454 and Solid data you could test it on too. Has anybody else looked into how these other two Next Gen formats might complicate things? Brad Chapman wrote: > Peter; > > >> I think we should probably do another release soon >> > > Good call. +1 from me. > > >> I'd like to include the following changes as part of the beta, but it >> would be sensible to have someone else try these out first. Any >> volunteers? >> >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files >> > > The code for this looked good when I reviewed it earlier. I will > test it out with some solexa reads from here this week; any reason > not to check the patch and files into CVS? 
Then I can fire up my > coal-powered revision control system, feed two punch cards into the > mouth of the machine, hope the vacuum tubes don't burn out again, > and check it out locally. > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -------------- next part -------------- A non-text attachment was scrubbed... Name: dschruth.vcf Type: text/x-vcard Size: 450 bytes Desc: not available URL: From bugzilla-daemon at portal.open-bio.org Tue Mar 17 00:40:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Mar 2009 20:40:01 -0400 Subject: [Biopython-dev] [Bug 2790] New: Genepop parser creates a full representation of the file on memory Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2790 Summary: Genepop parser creates a full representation of the file on memory Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: PopGen AssignedTo: biopython-dev at biopython.org ReportedBy: tiagoantao at gmail.com The genepop parser creates a full representation of the file on memory. This is fine for most users (like with 100/200 individuals and 100 markers) but, more and more people appear now with thousands of individuals and/or thousands of loci. In some cases the whole file doesn't fit memory. An alternative (iterator based) interface has to be created which only maintains a subset of the file in memory -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
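A note on Bug 2790 above: the iterator-based interface it asks for could look roughly like the sketch below. This is only an illustration, not the Bio.PopGen code. The function names read_loci and iter_individuals are hypothetical, and the sketch assumes the common Genepop layout: a title line, locus names (one per line or comma separated), "Pop" separator lines, and diploid genotypes written as even-length digit strings after the last comma on each individual's line.

```python
import io


def read_loci(handle):
    """Consume the title and locus list; stop after the first 'Pop' line."""
    next(handle)  # the first line is a free-text title
    loci = []
    for line in handle:
        line = line.strip()
        if line.upper() == "POP":
            break
        # Locus names may be one per line or comma separated on one line.
        loci.extend(name.strip() for name in line.split(",") if name.strip())
    return loci


def iter_individuals(handle):
    """Yield (pop_index, name, genotypes) for one individual at a time.

    Assumes the handle is positioned just after the first 'Pop' line,
    which is where read_loci() leaves it. Only the current line is held
    in memory, so file size does not matter.
    """
    pop_index = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.upper() == "POP":
            pop_index += 1
            continue
        name, _, data = line.rpartition(",")
        genotypes = []
        for token in data.split():
            half = len(token) // 2  # assumption: diploid, equal-width alleles
            genotypes.append((int(token[:half]), int(token[half:])))
        yield pop_index, name.strip(), genotypes


if __name__ == "__main__":
    demo = io.StringIO("Demo\nLoc1, Loc2\nPop\nind1 , 0101 0202\n")
    print(read_loci(demo))
    for record in iter_individuals(demo):
        print(record)
```

Because the generator keeps nothing between individuals, a caller who needs per-population statistics would have to accumulate them while iterating; that trade-off is the point of the request in the bug.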
From idoerg at gmail.com Tue Mar 17 00:49:39 2009 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 16 Mar 2009 17:49:39 -0700 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <49BEDD9B.6030905@u.washington.edu> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> <20090316225558.GC57054@sobchak.mgh.harvard.edu> <49BEDD9B.6030905@u.washington.edu> Message-ID: <1237250979.20135.5.camel@lafa> I have. For one thing, GenBank has some new files that break the current parser. LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007 This is a typical header for an environmental sequence (notice the ENV). Note that this does not necessarily have to be a next-gen sequence. It can also be Sanger. The point is, it's not genome associated, but obtained using metagenomic methods. To our business: the "rc" breaks the parser. The file itself is attached. Note that at the end it does not have a sequence, but rather a WGS field that points to sequence files. I'll actually be happy to take this one. ./I On Mon, 2009-03-16 at 16:15 -0700, David Schruth wrote: > I've got some 454 and Solid data you could test it on too. > > Has anybody else looked into how these other two Next Gen formats might > complicate things? > > Brad Chapman wrote: > > Peter; > > > > > >> I think we should probably do another release soon > >> > > > > Good call. +1 from me. > > > > > >> I'd like to include the following changes as part of the beta, but it > >> would be sensible to have someone else try these out first. Any > >> volunteers? > >> > >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files > >> > > > > The code for this looked good when I reviewed it earlier. I will > > test it out with some solexa reads from here this week; any reason > > not to check the patch and files into CVS?
Then I can fire up my > > coal-powered revision control system, feed two punch cards into the > > mouth of the machine, hope the vacuum tubes don't burn out again, > > and check it out locally. > > > > Brad > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Iddo Friedberg, Ph.D. CALIT2 Atkinson Hall MC #0446 University of California San Diego 9500 Gilman Drive La Jolla, CA 92093-0446 USA +1 (858) 534-0570 http://iddo-friedberg.org -------------- next part -------------- LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007 DEFINITION Termite gut metagenome, whole genome shotgun sequencing project. ACCESSION ABDH00000000 VERSION ABDH00000000.1 GI:161074815 PROJECT GenomeProject:19107 DBLINK Project:19107 KEYWORDS WGS. SOURCE termite gut metagenome ORGANISM termite gut metagenome unclassified sequences; metagenomes; organismal metagenomes. REFERENCE 1 (bases 1 to 55108) AUTHORS Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M., Richardson,T.H., Stege,J.T., Cayouette,M., McHardy,A.C., Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Martin,H.G., Kunin,V., Dalevi,D., Madejska,J., Kirton,E., Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N.C., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.D., Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and Leadbetter,J.R. 
TITLE Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite JOURNAL Nature 450 (7169), 560-565 (2007) PUBMED 18033299 REFERENCE 2 (bases 1 to 55108) AUTHORS Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M., Richardson,T.H., Stege,J.T., Cayouette,M., Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Garcia Martin,H., Kunin,V., McHardy,A.C., Dalevi,D., Madejska,J., Kirton,E., Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B., Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and Leadbetter,J.R. TITLE Direct Submission JOURNAL Submitted (27-JUN-2007) Microbial Ecology Program, US DOE Joint Genome Institute, 2800 Mitchell Drive B100, Walnut Creek, CA 94598-1698, USA COMMENT The termite gut metagenome whole genome shotgun (WGS) project has the project accession ABDH00000000. This version of the project (01) has the accession number ABDH01000000, and consists of sequences ABDH01000001-ABDH01055108. URL -- http://www.jgi.doe.gov JGI Project ID:4001605 Contact: Philip Hugenholtz (PHugenholtz at lbl.gov) sampling site latitude: N10.11.260; sampling site longitude: W083.51.345; sampling site altitude: 310 m AMSL; sample type: lumen content; host species: Nasutitermes sp.; anatomic site: gut, proctodeal segment 3, lumen; association type: symbiosis; sample treatment and preservation: termites were collected, transported to laboratory alive within 36 hours, P3 gut lumen fluid was extracted and stored frozen in buffered saline solution until DNA extraction. The JGI and collaborators endorse the principles for the distribution and use of large scale sequencing data adopted by the larger genome sequencing community and urge users of this data to follow them. 
It is our intention to publish the work of this project in a timely fashion and we welcome collaborative interaction on the project and analysis. (http://www.genome.gov/page.cfm?pageID=10506376). FEATURES Location/Qualifiers source 1..55108 /organism="termite gut metagenome" /mol_type="genomic DNA" /isolation_source="Nasutitermes sp. proctodeal segment 3 gut lumen" /db_xref="taxon:433724" /environmental_sample /country="Costa Rica" /lat_lon="10.1877 N 83.8558 W" /note="metagenomic" WGS ABDH01000001-ABDH01055108 // From chris.lasher at gmail.com Tue Mar 17 03:45:33 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Mon, 16 Mar 2009 23:45:33 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> Message-ID: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> 2009/3/16 Tiago Ant?o > I've been reading this thread and mainly staying silent but there is > one question that is not clear in my mind but I believe it is > important: > > How is the "official" biopython trunk controlled? Currently what is on > CVS is the gospel and Peter and Michiel essencially have control of > what is there and what is labelled as a "biopython distribution". How > will this work now? In a distributed workflow, there is no technical official repository. The "official repository" is socially enforced. 
Technically, there is no official repository of the Linux kernel anymore. However, there is an "official" version, which is Linus Torvalds's repository. It is socially enforced. I think Michiel and Peter still head the Biopython project--at least they have the most clout, I would say. Therefore, we will probably look to one of their branches as the "official" branch of Biopython. When one of them wants to step down, we will socially pass the torch on to the next taker. See "6.3 Using gatekeepers" at http://doc.bazaar-vcs.org/latest/en/user-guide/index.html#team-collaboration-distributed-style See also http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/ > The second question, related to the first is how will different > branches (of different persons) be managed? I am seeing people > starting working on the same code in different directions and then > having problems merging everything together. People are supposed to work in different directions; this is the point of distributed workflows. Merging tends not to be so difficult, and compared to centralized models like CVS and SVN, it's a cinch. We will help provide documentation for proper merging habits (e.g., merge early, merge often, and no rebasing after pushing, etc.). There are also screencasts popping up (in particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that we will link to for educational purposes. And of course, other developers will be around to help out in tricky merges.
Chris From bugzilla-daemon at portal.open-bio.org Tue Mar 17 04:11:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Mar 2009 00:11:34 -0400 Subject: [Biopython-dev] [Bug 2791] New: GenBank Scanner does not parse environmental (ENV) files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2791 Summary: GenBank Scanner does not parse environmental (ENV) files Product: Biopython Version: 1.49 Platform: All OS/Version: All Status: NEW Severity: major Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: idoerg at gmail.com CC: idoerg at gmail.com GenBank Scanner does not parse environmental (ENV) files. It breaks on the 'rc' characters in the LOCUS lines. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Mar 17 04:14:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Mar 2009 00:14:50 -0400 Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse environmental (ENV) files In-Reply-To: Message-ID: <200903170414.n2H4Eoit008338@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2791 idoerg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
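A note on Bug 2791 above: one way to see why the "rc" token matters is to parse the LOCUS line by whitespace-separated fields and treat the token after the length as a units field that may be "bp", "aa" or "rc". The sketch below is only an illustration under that assumption. It is not the Bio.GenBank Scanner code, the function name parse_locus_line and the ACCEPTED_UNITS set are hypothetical, and the field order is inferred solely from the example line quoted in this thread.

```python
ACCEPTED_UNITS = {"bp", "aa", "rc"}  # assumption: "rc" sits where "bp" normally does
TOPOLOGIES = {"linear", "circular"}


def parse_locus_line(line):
    """Split a GenBank LOCUS line into named fields, tolerating 'rc' units."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "LOCUS":
        raise ValueError("Not a recognisable LOCUS line: %r" % line)
    name, size, units = fields[1], fields[2], fields[3]
    if units not in ACCEPTED_UNITS:
        raise ValueError("Unexpected units token %r in LOCUS line" % units)
    # The date and division are taken from the end of the line; whatever is
    # left in between is the molecule type plus an optional topology.
    date, division = fields[-1], fields[-2]
    middle = fields[4:-2]
    topology = None
    if middle and middle[-1] in TOPOLOGIES:
        topology = middle.pop()
    return {
        "name": name,
        "size": int(size),
        "units": units,
        "molecule_type": " ".join(middle),
        "topology": topology,
        "division": division,
        "date": date,
    }


if __name__ == "__main__":
    example = "LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007"
    print(parse_locus_line(example))
```

A column-position parser would need a different fix; splitting on whitespace is used here only because it makes the role of the units token easy to see.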
From bugzilla-daemon at portal.open-bio.org Tue Mar 17 04:32:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Mar 2009 00:32:30 -0400 Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse environmental (ENV) files In-Reply-To: Message-ID: <200903170432.n2H4WUQn009490@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2791 idoerg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|biopython-dev at biopython.org |idoerg at gmail.com Status|ASSIGNED |NEW -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Mar 17 08:46:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 08:46:03 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> Message-ID: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher wrote: > 2009/3/16 Tiago Antão > >> I've been reading this thread and mainly staying silent but there is >> one question that is not clear in
my mind but I believe it is >> important: >> >> How is the "official" biopython trunk controlled? Currently what is on >> CVS is the gospel and Peter and Michiel essencially have control of >> what is there and what is labelled as a "biopython distribution". How >> will this work now? > > In a distributed workflow, there is no technical official repository. The > "official repository" is socially enforced. Technically, there is no > official repository of the Linux kernel anymore. However, there is an > "official" version, which is Linus Torvald's repository. It is socially > enforced. I think Michiel and Peter still head the Biopython project--at > least they have the most clout, I would say. Therefore, we will probably > look to one of their branches as the "official" branch of Biopython. When > one of them wants to step down in duty, we will socially pass the torch on > to the next taker. I think it is essential we have a clearly labeled official trunk (perhaps with branches for releases), which will be used for all the official releases (tar balls, zip files and windows installers). Our main webpage should make this very clear. We could potentially continue to have a shared official branch (e.g. belonging to the generic github biopython user), and give all the existing CVS contributors write access - and continue to manage this as before. So for example, if Frank wanted to check in some minor changes to Bio.Nexus he could just do it. Future contributors patches/branches might get taken up by a developer on a personal branch for testing, before being merged into the official branch. i.e. We can initially continue as before - right now I don't have a feel for how much work the role of an official branch maintainer would be, and it is difficult to guess without more hands on experience using the new tools. >> The second question, related to the first is how will different >> branches (of different persons) be managed? 
I am seeing people >> starting working on the same code in different directions and then >> having problems merging everything together. > > People are supposed to work in different directions; this is the point of > distributed workflows. Merging tends not to be so difficult, and compared to > centralized models like CVS and SVN, it's a cinch. We will help provide > documentation for proper merging habits (e.g., merge early, merge often, and > no rebasing after pushing, etc.). There are also screencasts popping up (in > particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that > we will link to for educational purposes. And of course, other developers > will be around to help out in tricky merges. Well, yes, in theory we have the same problem now with CVS - and while the tools may make merging easier, some communication is essential when working on the key modules which impact large parts of the code base. Peter From biopython at maubp.freeserve.co.uk Tue Mar 17 08:58:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 08:58:00 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903170158o757a4fc4naae80f83850d6093@mail.gmail.com> > > The documentation looks awesome. 
My only suggestion would be to > change the navigation link that currently points to CVS to point to a > generic page like SourceCode. Then that landing page could link > to the current CVS and explain we are working to transition to > Git, with links to those pages. Currently, the Git docs are a > bit buried from the front page. > > Peter, I don't appear to have wiki permissions to edit the navigation > bar; do you? I'm not sure how to do it (although I probably have the relevant permissions). I can probably give you admin rights - you use the "Chapmanb" username on the wiki, right? > Peter: >> I'm thinking a news post on >> http://news.open-bio.org/news/category/obf-projects/biopython/ about >> version control would be a good idea at this point. How about this - > > This is great, and I would move the last paragraph describing > the Git repository to the beginning; start with what we are doing and > then describe the rationale. This should help for those with ADD, and > also give more prominent credit to Bartek, Giovanni and you for the > work that went into this. OK. New version, with the markup for the links included: Initially for evaluation purposes only, Giovanni and Bartek have set up a mirror of Biopython on GitHub, which is automatically updated from the OBF hosted Biopython CVS repository. See our git migration wiki page for details. If this is favorably received, then moving Biopython from CVS to git seems likely at some point this year. Originally, all the OBF hosted projects used CVS for their source code repositories. At the start of 2008, BioPerl and BioJava moved over to Subversion (SVN), followed by BioSQL. Biopython was originally going to do the same, but this didn't actually happen. Having all the Bio* projects using the same version control system would have simplified server administration for the OBF, but using SVN wouldn't really have made that much difference to Biopython development.
Discussion on the Biopython development mailing list has since shifted towards next-generation distributed version control systems like git or Bazaar. Quote from Linus Torvalds,
The slogan of Subversion for a while was "CVS done right", or something like that, and if you start with that kind of slogan, there's nowhere you can go. There is no way to do CVS right.
In addition to creating the Linux kernel, Linus Torvalds more recently wrote git, a prominent example of a distributed version control system. Rather than switching from CVS to SVN, the BioRuby project chose instead to use git, hosted on github (see the BioRuby repository). Biopython is considering doing something similar - using a distributed version control system like git should make it easier for potential Biopython contributors to manage their own local copies of Biopython under version control. Peter, on behalf of the Biopython developers From biopython at maubp.freeserve.co.uk Tue Mar 17 09:06:31 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 09:06:31 +0000 Subject: [Biopython-dev] history on github - where are the tags? Message-ID: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> Hi Bartek et al, I've just been looking over the github mirror of CVS, and wanted to see how it presented the history of individual files. For example, this page looks at the Bio/SeqRecord.py history using ViewCVS: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython For comparison, in GitHub, http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py As you can see, all the comments and changes are there - which is great. But I can't see the CVS tag information, which I assume would be converted into git tags. Is this information present in the git repository, but not shown by github, or was it lost during the migration? This might seem like a little thing, but I have found it incredibly important for tracing bugs reported in older releases, for example in narrowing down when something changed.
Peter From biopython at maubp.freeserve.co.uk Tue Mar 17 09:41:22 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 09:41:22 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com> <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com> Message-ID: <320fb6e00903170241i5b4a122ax1f33ff18450771df@mail.gmail.com> On Mon, Mar 16, 2009 at 9:04 PM, Bartek Wilczynski wrote: > Hi, > On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio > wrote: >> >> At the moment I am having some strange problems, relative to the fact that I >> had a branch previously named as 'biopython' in my account, so it seems >> don't understand well the fact that the old branch has been renamed. >> For example, I don't have the 'Fork' button.... but it must be a temporary >> problem, I already contacted the github's tech support. > > This is connected with the change I made in the repository. Namely I > renamed the branch created by Giovanni to biopuython-old and created > a new one (the "official" one) called biopython again. > > The "rename" feature was flagged as experimental, and I don't think we > would expect to use it anymore, and there were warnings that it can affect > the branches forked from the branched previously created by Giovanni. We may need to do another rename, if we have to repeat the CVS to git migration. 
For example, see my other email about the CVS tags (missing?). Another potential question is can you re-map the CVS usernames as part of the migration? e.g. Can you somehow replace CVS users "bartek", "peterc", ... with guthub users "barwil", "peterjc", ...? Not essential, but it would be nice. I would suggest as a precaution we rename it sooner rather than later (while only a few people will be inconvenienced), going from biopython to biopython-cvs-mirror (or similar). If this does end up being the actual trunk branch, we can just fork it under a new branch name like "biopython" or "biopython-official" etc. Peter From lpritc at scri.ac.uk Tue Mar 17 09:59:32 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Tue, 17 Mar 2009 09:59:32 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> Message-ID: Hi all, This has been an occasionally frustrating thread to read... On 17/03/2009 08:46, "Peter" wrote: > On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher wrote: >> 2009/3/16 Tiago Ant?o >> >>> How is the "official" biopython trunk controlled? Currently what is on >>> CVS is the gospel and Peter and Michiel essencially have control of >>> what is there and what is labelled as a "biopython distribution". How >>> will this work now? >> In a distributed workflow, there is no technical official repository. The >> "official repository" is socially enforced. That was true before. Unless I misread the Biopython licencing, there was no real barrier to putting a branched copy of the code on your own server/site, with your own modifications. What git does is provide tools to make merging of that sort of code easier (along with a number of of other nice features, such as authentication of contributions). The presence of git does not ensure that your changes, or anyone else's, will be merged with any other repository, and nor does it ensure the quality of contributed code. 
Git, while nice, and ideal for a number of tasks, is no magic bullet. To an extent, the 'official' repository is, pragmatically, the one that is most stable and well-tested. If my hypothetical branched version had become more stable and widely-used than the 'official' trunk, and become the most frequently downloaded and implemented, and received new contributions in its own right, it might then be considered de facto 'the distribution'; nasty online spats with the original authors notwithstanding. The 'social enforcement' of politeness (i.e. *I* don't take credit for *your* work) prevents this to an extent, as it ought to under any versioning system. There's a competing tendency to consider that the coders who spent the most time creating the code understand it the best, and are in the best position to maintain it directly. This is true to a large degree, and entirely applicable to Biopython's contributed modules. git can potentially facilitate that sort of contribution to the 'official' trunk in a way that CVS can't, due to its permissions bottleneck. However, the mechanics of incorporating that contributed code are more or less the same: the people with control of the 'official' trunk review the code and decide whether to include it. This is true whether the code is submitted as a patch to Bugzilla, emailed to a developer, put up on public CVS on your site, or in a forked git repository. The same is true of your own git repository - you don't have to include someone else's forked code if you don't want to. What possibly needs to change is not the version control system, but the way in which people think about their contribution. Contributions can be made productively under any versioning system, and the key questions remain the same in all cases: Does the new code work (are there tests)? Does the new code break any old code? Is there documentation? Is the API consistent? "What version control system are we using?" 
is a minor detail, unless it is inherently broken, hinders any of the above, or causes some other deal-breaking issue (for Linus Torvalds, this included speed issues for merges). >> I think Michiel and Peter still head the Biopython project--at >> least they have the most clout, I would say. Therefore, we will probably >> look to one of their branches as the "official" branch of Biopython. When >> one of them wants to step down in duty, we will socially pass the torch on >> to the next taker. It has always been thus. Now, instead of passing on the user authentication to the CVS server at OBF, the user authentication to the biopython github account will be passed on, instead: > I think it is essential we have a clearly labeled official trunk > (perhaps with branches for releases), which will be used for all the > official releases (tar balls, zip files and windows installers). Our > main webpage should make this very clear. > > We could potentially continue to have a shared official branch (e.g. > belonging to the generic github biopython user), and give all the > existing CVS contributors write access - and continue to manage this > as before. So for example, if Frank wanted to check in some minor > changes to Bio.Nexus he could just do it. Future contributors > patches/branches might get taken up by a developer on a personal > branch for testing, before being merged into the official branch. > > i.e. We can initially continue as before - right now I don't have a > feel for how much work the role of an official branch maintainer would > be, and it is difficult to guess without more hands on experience > using the new tools. Plus ca change (avec git)... >>> The second question, related to the first is how will different >>> branches (of different persons) be managed? I am seeing people >>> starting working on the same code in different directions and then >>> having problems merging everything together. 
>> >> People are supposed to work in different directions; this is the point of >> distributed workflows. I may have a different understanding of 'different directions' than you mean, but I don't think that it's good for a community project if people work in different directions. I also don't think that that is the point of distributed workflows; on the contrary, I think that they are intended to make it easier to work independently towards a common goal. Even if that is by working on loosely- or non-interacting parts of the whole. >> Merging tends not to be so difficult, and compared to >> centralized models like CVS and SVN, it's a cinch. We will help provide >> documentation for proper merging habits (e.g., merge early, merge often, and >> no rebasing after pushing, etc.). There are also screencasts popping up (in >> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that >> we will link to for educational purposes. >> And of course, other developers will be around to help out in tricky merges. This characterises one of the frustrating aspects of this thread (not getting at you personally, Chris) - the occasional implicit assumption that 'things will be inherently *better* if we use git'. Developers are around to help now, even using CVS (which also has clear, long-standing stable documentation - and even an O'Reilly book). Several people don't seem to think that that - and the way that code is reviewed and incorporated into the main distribution - is good enough, and I don't think that this will change just because the version control system has changed. Nor will changing revision control system generate significant free time to write, test and document code. But we may have the recession to do that last one for us. > Well, yes, in theory we have the same problem now with CVS - and while > the tools may make merging easier, some communication is essential > when working on the key modules which impact large parts of the code > base. 
I would put it more strongly than that: communication is essential in all aspects of the project. A number of related blog posts make statements along the lines of "I don't use Biopython, or post to the mailing lists, but I think that they're doing *this* wrong", or "I submitted code, but it didn't get taken up immediately". Now, venting and ranting on a blog is fine, but it's not really *communicating*, any more than it was when I thought that the BioSQL GenBank upload code was broken, fixed it (for my purposes) and told no-one. Git won't change the communication issue (in either direction) any more than it changes the code review process. FWIW, I think that git looks like a good way to go, and that it could help encourage people to make local modifications of Biopython for their own benefit and in their own interests and expert area, in a way that is visible to the core distribution (unlike the patch submission process that is now implemented). In that way it could facilitate more rapid expansion of the core distribution. However, the bottlenecks of ensuring code quality, testing and documentation will only ease if that is taken up by the individuals/groups making those contributions, in addition to the core developers. And yes, I know I'm late with the new GenomeDiagram docs... ;) L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. 
DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________ From bartek at rezolwenta.eu.org Tue Mar 17 10:06:33 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 17 Mar 2009 11:06:33 +0100 Subject: [Biopython-dev] history on github - where are the tags? In-Reply-To: <8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com> Message-ID: <8b34ec180903170306ocf4b9e7s6d34cacdfb7e423b@mail.gmail.com> Hi, I'll look into this. I'm now heading for a plane, so I can't do it now. cheers Bartek On Tue, Mar 17, 2009 at 11:02 AM, Bartek Wilczynski wrote: > Hi, > > I'll look into this. I'm now heading for a plane, so I can't do it now. > > cheers > ?Bartek > > On Tue, Mar 17, 2009 at 10:06 AM, Peter wrote: >> Hi Bartek et al, >> >> I've just been looking over the github mirror of CVS, and wanted to >> see it presented the history of individual files. 
For example, this
>> page looks at the Bio/SeqRecord.py history using ViewCVS:
>> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython
>>
>> For comparison, in GitHub,
>> http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py
>>
>> As you can see, all the comments and changes are there - which is
>> great. But I can't see the CVS tag information, which I assume would
>> be converted into git tags. Is this information present in the git
>> repository, but not shown by github, or was it lost during the
>> migration? This might seem like a little thing, but I have found it
>> incredibly important for tracing bugs reported in older releases, for
>> example in narrowing down when something changed.
>>
>> Peter
>>
>
>
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
>

--
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From biopython at maubp.freeserve.co.uk Tue Mar 17 10:17:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:17:25 +0000
Subject: [Biopython-dev] gitignore file for github
Message-ID: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>

Hi all,

I think we should add a .gitignore file to the github mirror copy
repository, which should ignore:

* the build subdirectory and all its contents
* all *.pyc files (recursively, e.g. for the unit tests)
* all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)

Is there anything else this should include? There are a few output
files created by the unit tests that we might want to include...

Otherwise all these files show up as "untracked", to use git's
terminology, and there is a risk of someone accidentally committing
them.
Peter

From biopython at maubp.freeserve.co.uk Tue Mar 17 10:57:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:57:37 +0000
Subject: [Biopython-dev] gitignore file for github
In-Reply-To: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
References: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
Message-ID: <320fb6e00903170357s14a20each59f50f5e155298b0@mail.gmail.com>

On Tue, Mar 17, 2009 at 10:17 AM, Peter wrote:
> Hi all,
>
> I think we should add a .gitignore file to the github mirror copy
> repository, which should ignore:
>
> * the build subdirectory and all its contents
> * all *.pyc files (recursively, e.g. for the unit tests)
> * all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)
>
> Is there anything else this should include? There are a few output
> files created by the unit tests that we might want to include...

This seems to work pretty well:

#Ignore the build directory (and its sub-directories):
build

#Ignore backup files from some Unix editors,
*~

#Ignore all compiled python files (e.g. from running the unit tests):
*.pyc

#The graphics unit tests produce output files for human inspection
#(at the time of writing, only PDF files are created but I expect
#this to change).
Tests/Graphics/*.pdf
Tests/Graphics/*.eps
Tests/Graphics/*.svg
Tests/Graphics/*.png

I've uploaded this as part of one of my test branches on github,
http://github.com/peterjc/biopython-seqio-quality/tree/master

Peter

From bugzilla-daemon at portal.open-bio.org Tue Mar 17 10:59:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 06:59:22 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files
In-Reply-To:
Message-ID: <200903171059.n2HAxMms006144@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767

------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-17 06:59 EST -------
I've made these changes available on a test github branch,
http://github.com/peterjc/biopython-seqio-quality/tree/master

This doesn't include all the example files for the unit tests yet.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
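The ignore rules proposed in the .gitignore message above can be sanity-checked without git. What follows is only a rough Python approximation, not git's actual matcher: the helper name is_ignored and the sample paths are made up for illustration, and only the simple pattern forms used in that file are handled (no negation, no "**", no trailing "/").

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the proposed .gitignore.
PATTERNS = [
    "build",
    "*~",
    "*.pyc",
    "Tests/Graphics/*.pdf",
    "Tests/Graphics/*.eps",
    "Tests/Graphics/*.svg",
    "Tests/Graphics/*.png",
]


def is_ignored(path, patterns=PATTERNS):
    """Return True if `path` would be ignored under the rules above.

    Assumptions (a simplification of git's behaviour):
    - a pattern without "/" is tested against every path component,
      so it applies at any depth and also covers directory contents;
    - a pattern with "/" is matched component by component against the
      whole path, relative to the repository root, so "*" never
      crosses a directory separator.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if "/" not in pattern:
            if any(fnmatchcase(part, pattern) for part in parts):
                return True
        else:
            pattern_parts = pattern.split("/")
            if len(pattern_parts) == len(parts) and all(
                fnmatchcase(part, pat)
                for part, pat in zip(parts, pattern_parts)
            ):
                return True
    return False


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise each rule.
    for sample in [
        "build/lib/Bio/Seq.py",
        "Tests/test_SeqIO.pyc",
        "Bio/Seq.py~",
        "Tests/Graphics/output.pdf",
        "Doc/Tutorial.pdf",
        "Bio/Seq.py",
    ]:
        print(sample, "ignored" if is_ignored(sample) else "tracked")
```

Note that the LaTeX temporary files under Doc mentioned in the first message are not covered by these patterns. On a real checkout, recent versions of git can report which rule (if any) matches a path via `git check-ignore -v <path>`, which is the authoritative check.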
From tiagoantao at gmail.com Tue Mar 17 11:18:52 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 11:18:52 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
Message-ID: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>

On Tue, Mar 17, 2009 at 8:46 AM, Peter wrote:
> I think it is essential we have a clearly labeled official trunk
> (perhaps with branches for releases), which will be used for all the
> official releases (tar balls, zip files and windows installers). Our
> main webpage should make this very clear.

I agree.

I would like to take this opportunity just to make my opinion clear (I
normally tend to list hypotheses and refrain from giving my own
opinions).

1. I don't think there is a pressing need to go from CVS to whatever.
While CVS is not perfect I don't think it has been a big hurdle. But
if people want to go in that direction, I have no strong feelings
against it either.

2. The hurdle was that _policy_ was too conservative: some time ago it
was not acceptable even to consider a development branch. That
stifles things (although it ensures stability, which is good).
Fortunately things are more negotiable now. The point is: the main
issues are policy, not technology.

3. Like it or not, different mechanisms (i.e. centralized versus
distributed VCSs) facilitate different policies. Distributed version
control facilitates branching to a massive degree.

4. I think a middle ground is a good idea: while there is an official
distribution (e.g. the one that is labelled Biopython 1.50 and that
will end up on most users' computers) which is aggressively controlled,
there should be space for people to try out new things.

5. People who try out new things should be aware (to avoid
disappointment) that their new code might not be accepted on the
official trunk, for many reasons: not enough documentation, no test
cases, design not acceptable, poorly commented code, whatever. It
would be very sad if people started working on something and spent
lots of time on their branch, just to see their code refused for
the "official" trunk.

So, in my view things work like this:

A. The "official" version on biopython.org is controlled by a "head
honcho", currently Peter, with input from biopython-dev. This is the
version that most users will ever see in practice.

B. The official version has a lot of quality enforcement on top.

C. People should be free to branch away and try new things.

D. People who branch away should be aware that their stuff might not
be accepted in the official distribution. If they want it accepted
they should come to biopython-dev and have a cup of tea with the
community.

E. Maybe some contact points should be defined for modules?

F. People who want their code included in the "official" distribution
should seriously think about branching from the "official" branch and
not from any other.

I would really like to see an "official" git branch, which should be
created, in my opinion, from a stable release and by either Peter or
Michiel (or any other long-term CVS-write user).

In my case I would branch to maintain some of the PopGen code.
Tiago

From lpritc at scri.ac.uk Tue Mar 17 12:19:28 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 12:19:28 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID:

On 17/03/2009 11:18, "Tiago Antão" wrote:

> On Tue, Mar 17, 2009 at 8:46 AM, Peter
> wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers). Our
>> main webpage should make this very clear.
>
> I agree.
>
> I would like to take this opportunity just to make my opinion clear (I
> normally tend to list hipothesis and refrain to give my own opinions).

[...]

+1 for Tiago's opinion.

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From biopython at maubp.freeserve.co.uk Tue Mar 17 12:44:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 12:44:05 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>

2009/3/17 Tiago Antão :
> On Tue, Mar 17, 2009 at 8:46 AM, Peter wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers). Our
>> main webpage should make this very clear.
>
> I agree.
>
> I would like to take this opportunity just to make my opinion clear (I
> normally tend to list hipothesis and refrain to give my own opinions).
>
> 1. I don't think there is a pressing need to go from CVS to whatever.
> While CVS is not perfect I don't think it has been a big hurdle.
But > if people want to go in that direction, I have no strong feelings > against it also. On a purely pragmatic level, yes, CVS has been enough. This is one real reason why there hasn't been a great deal of pressure on us to move - it wasn't "broken" for how Biopython worked, although it does make branching non-trivial. Moving from CVS to a distributed version control system (DVCS) won't make much difference for those of us with CVS access - the big benefit as I see it is for potential contributors who can easily make a branch to try out their ideas, and keep it in sync with the master branch. This could transform how new modules or bug fixes get contributed, hopefully for the better. > 2. The hurdle was that _policy_ was too conservative: Some time ago it > was not acceptable even to consider a development branch. That > stiffles things (although it ensures stability which is good). > Fortunately things are more negotiatable now. The point is: the main > issues are policy, not technology. Historically Biopython has worked from a single stable branch (Brad - can you comment about the history of this effective policy?). I recall saying something in the last year or so about not wanting to do any branching in CVS while the SVN migration seemed imminent, but this was primarily to avoid any complication in the migration itself, rather than any deep objection to branches themselves. > 3. Like it or not, different mechanisms (ie centralized versus > distributed VCSs) facilitate different policies. Distributed version > control facilitates branching to a massive degree. True. > 4. I think a middle ground is a good idea: While there is an official > distribution (eg that one that is labelled biopython 1.50 and that > will end up on most users computers) which is agressively controled, > there should be space for people to try out new things. I'm not quite sure what you mean by agressively controlled. 
Moving to a DVCS really should make public experimental branches much
easier.

> 5. People that try out new things should be aware (to avoid
> disappointment) that their new code might not be accepted, for many
> reasons on the official trunk: not enough documentation, no test
> cases, design not acceptable, poorly-commented code, whatever. It
> would be very sad that people would start working on something, spend
> lots of time on their branch just to see their code refused to be on
> the "official" trunk.

That is a risk - especially if anyone were to go off and work in
complete isolation without even posting anything to this mailing list.

> So, in my view things work like this:
> A. The "official" version on biopython.org is controlled by a "head
> honcho", currently Peter with input from biopython-dev. This is the
> version that most users will ever see in practice.

That could work - although having anyone as a single bottleneck is a
risk, assuming you get someone to agree to the role in the first place ;)

I am generally happy with the current arrangement where module owners
have a degree of autonomy over their modules. I wouldn't want to have
to approve every single minor change you (Tiago) make to Bio.PopGen -
but I suppose occasional review and merging of code from Tiago's
branch on request wouldn't be too onerous.

> B. The official version has a lot of quality enforcement on top.

What does that mean? For example, a strict policy about unit tests
before anything goes into the main branch?

> C. People should be free to branch away and try new things.

Given the Biopython license (as Leighton pointed out) this is already
the case with CVS. It's just that using a DVCS should make this
easier, especially for keeping branches in sync with the official
branch, and hopefully for any merges back.

> D. People that branch away should be aware that their stuff might not
> be accepted on the official distribution.
If they want it accepted > they should come to biopython-dev and have a cup of tea with the > community. I agree. I like tea. > E. Maybe some contact points should be defined for modules? Do you mean something more explicit about documenting who currently maintains each module? > F. People who want their code included in the "official" distribution > should seriously think in branching from the "official" branch and not > from any other. I agree. > I would really like to see an "official" git branch which should be > created, in my opinion from a stable release and either by Peter or > Michiel (or any other long term CVS-write user). I think we'll have that - and in the short term the CVS mirror on github can be used. > In my case I would branch to maintain some of the PopGen code. Great. Peter From chapmanb at 50mail.com Tue Mar 17 12:49:30 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 17 Mar 2009 08:49:30 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> Message-ID: <20090317124930.GE57054@sobchak.mgh.harvard.edu> Hi everyone; Nice to see the discussion around trying out git. Leighton and Tiago, you both brought up some definite concerns in moving to a distributed version control system. Git aims to help solve the problem of a them versus us community. When you read posts critical of Biopython, you will find a lot of complaints about "they didn't do this." This is confusing, as anyone using, coding with, interested in, or contributing to Biopython is a member of the community. CVS can help create this division, since it appears as a walled off repository only the core developers can access. Git frees up the source code and lowers this barrier to contributing. 
Now instead of saying "why didn't the developers integrate the code I sent to the mailing list and write tests and documentation for it," we can all turn the question back on ourselves and ask why we didn't create a branch with our new contribution and do it, soliciting help from others in Biopython. With solving the problems come potential concerns. This coincidental blog post from yesterday intelligently covers a lot of the issues: http://www.pointy-stick.com/blog/2009/03/16/dark-side-distributed-version-control/ The one we should be most concerned about is fragmentation. The community of Python coders in bioinformatics is too small to be split up; surely we are better served by resolving any differences and producing one high quality reusable code base. Tiago's assessment of how things should work practically looks exactly right. Hard working core developers, like Peter and Michiel, will be maintaining the trunk which we roll releases off of. Contributors can either submit patches as now, or create short branches which get merged back in. The advantage of branches is that others can test and develop the branched code, and that the software should help deal with some of the pain of merging. There is a lot of good material in this thread for new potential developers. Tiago, it would make sense to condense what you've written and include it with the Contributing guide: http://biopython.org/wiki/Contributing We should also create a place on the wiki from the developer documentation: http://biopython.org/wiki/Documentation#Documentation_for_Developers that describes active development branches and their goals (called, say, ActiveBranches). Tiago, I thought you did a page for PopGen earlier like this but I can't find it right now. We should keep communication at a high level to avoid confusing fragmentation. This is a difficult change in terms of how things work; we are asking the right questions to create a good environment for improvement. 
Brad

From tiagoantao at gmail.com Tue Mar 17 13:10:18 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 13:10:18 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com> <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com> <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com> <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
Message-ID: <6d941f120903170610g161342f0ief365d68f25707c1@mail.gmail.com>

Hi,

> I'm not quite sure what you mean by agressively controlled. Moving to
> a DVCS really should make public experimental branches much easier.

I mean that the official release is very controlled (a good thing!).
Development branches should be more free.
> That is a risk - especially if anyone were to go off and work in
> complete isolation without even posting anything to this mailing list.

I think our obligation is to inform people of the issue. If people then go away and don't communicate, it becomes their problem. I think just a couple of sentences on the Contributing page on the wiki would be more than enough.

> That could work - although having anyone as a single bottleneck is a
> risk, assuming you get someone to agree to the role in the first place
> ;) I am generally happy with the current arrangement where module
> owners have a degree of autonomy over their modules. I wouldn't want
> to have to approve every single minor change you (Tiago) make to
> Bio.PopGen - but I suppose occasional review and merging of code from
> Tiago's branch on request wouldn't be too onerous.

I agree. I am just trying to make this an "explicit" policy, so that everybody knows the rules of the game. If people don't agree, then that should be discussed and changed. But the point is that this kind of management issue should be written down somewhere in a transparent way.

>> B. The official version has a lot of quality enforcement on top.
>
> What does that mean? e.g. a strict policy about unit tests before
> anything goes into the main branch?

I was reading http://biopython.org/wiki/Contributing and the main stuff is already there (the "submitting code" section). But the point is: the official version should be stable and reliable (as it is now, IMHO).

>> E. Maybe some contact points should be defined for modules?
>
> Do you mean something more explicit about documenting who currently
> maintains each module?

That is my point. Does that make sense?
From chapmanb at 50mail.com Tue Mar 17 13:04:53 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 17 Mar 2009 09:04:53 -0400 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <49BEDD9B.6030905@u.washington.edu> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> <20090316225558.GC57054@sobchak.mgh.harvard.edu> <49BEDD9B.6030905@u.washington.edu> Message-ID: <20090317130453.GF57054@sobchak.mgh.harvard.edu> Hi David; > I've got some 454 and Solid data you could test it on too. > > Has anybody else looked into how these other two Next Gen formats might > complicate things? Sweet. We definitely want to support output from them as well; it is great to have someone on board who is working with data from other machines. Peter did a pretty thorough investigation of the different formats and wrote it up in the docs to the proposed QualityIO module: http://github.com/peterjc/biopython-seqio-quality/blob/6fdf27393cb7318b229ff8587721e83544da968d/Bio/SeqIO/QualityIO.py Does this make sense with your experience? If you feel comfortable with git, Peter set up a new branch with his code for this: http://github.com/peterjc/biopython-seqio-quality/tree/master and we'd be more than happy to have you testing it. Alternatively, if you want to submit some smaller data files we can use in testing, you could attach them to the current enhancement request: http://bugzilla.open-bio.org/show_bug.cgi?id=2767 Thanks for the help, Brad > > Brad Chapman wrote: > > Peter; > > > > > >> I think we should probably do another release soon > >> > > > > Good call. +1 from me. > > > > > >> I'd like to include the following changes as part of the beta, but it > >> would be sensible to have someone else try these out first. Any > >> volunteers? > >> > >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files > >> > > > > The code for this looked good when I reviewed it earlier. 
I will > > test it out with some solexa reads from here this week; any reason > > not to check the patch and files into CVS? Then I can fire up my > > coal-powered revision control system, feed two punch cards into the > > mouth of the machine, hope the vacuum tubes don't burn out again, > > and check it out locally. > > > > Brad > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > begin:vcard > fn:David Schruth > n:Schruth;David > org:University of Washington, Department of Oceanography;The Center for Environmental Genomics > adr;dom:616 NE Northlake Place;;Benjamin Hall IRB, Room 306;Seattle;WA;98105 > email;internet:dschruth at u.washington.edu > title:Bioinformatics Research Consultant > tel;work:(206) 328-7381 > tel;cell:(206) 250-9110 > x-mozilla-html:FALSE > url:http://armbrustlab.ocean.washington.edu/people/schruth > version:2.1 > end:vcard > From tiagoantao at gmail.com Tue Mar 17 13:19:38 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 17 Mar 2009 13:19:38 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman wrote: > There is a lot of good material in this thread for new potential > developers. Tiago, it would make sense to condense what you've > written and include it with the Contributing guide: > > http://biopython.org/wiki/Contributing I can go ahead and try to put a summary of our discussions on that page, if nobody opposes. The change can be rewritten afterwards or deleted anyway. 
The only issue is that I can only do that on the weekend and not before (travelling abroad from Wednesday to Friday).

What I think is needed is actually a final decision on how things will progress. Will there be an official git branch? Will the official repository still be CVS? Where will it be hosted? These are a lot of important questions, but I think there has been enough discussion to arrive at a decision.

> (called, say, ActiveBranches). Tiago, I thought you did a page for PopGen
> earlier like this but I can't find it right now. We should keep
> communication at a high level to avoid confusing fragmentation.

Coincidentally I was editing that page today. I took the liberty of creating a link from the documentation page to it, so it should be reachable now.

Tiago

From p.j.a.cock at googlemail.com Tue Mar 17 14:44:08 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 17 Mar 2009 14:44:08 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
Message-ID: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>

2009/3/17 Tiago Antão :
> I can go ahead and try to put a summary of our discussions on that
> page, if nobody opposes. The change can be rewritten afterwards or
> deleted anyway. The only issue is that I can only do that on the
> weekend and not before (travelling abroad from Wednesday to Friday).

Sure - by the weekend I hope we'll have come to a consensus.

> What I think is needed is actually a final decision on how things will
> progress. Will there be an official git branch? The official will
> still be cvs? Where will it be hosted? These are lots of important
> questions, but I think there is enough discussion to arrive at a
> decision.

I think it is still too early for a final decision, but here is my suggested plan:

In the short term (at least until Biopython 1.50 beta is out, perhaps until Biopython 1.50 proper is out), CVS will remain the official repository. Bartek will continue automatically updating the mirrored copy on github, which will otherwise be treated as READ ONLY. If need be, he may have to reimport the whole history (the tag issue troubles me - see the other thread), so there may be some bumps along this road. Contributions/bug fixes can continue via bugzilla with a patch, and contributors can also try providing a URL to their own git branch if they prefer.

During this period I hope most (ideally all) of our active developers with CVS access will create an account on github, and try out forking from the CVS mirror, creating their own branches, checking in some changes, and doing some simple merges - for example pulling code from other Biopython developers' public branches. This should give us the confidence to trust git and github enough to use it for real.

i.e. For roughly the next month, we will continue as before with CVS for the real work, but will also try out github.

Once Biopython 1.50 final is out (hopefully by the end of April 2009, probably sooner), we need to decide if we will actually make the move to git on github. At this point, I would expect this to happen by declaring CVS read only, a static archive (and emergency fallback). Bartek would turn off his automatic syncing. We would then continue working on the github branch with the full CVS history, with a core of Biopython developers having write access to the "official" branch, doing new work under their own personal branches for eventual merging into the main trunk.

I'd still like to have a copy of the "official" git repository running on biopython.org, but this may not be that easy without some technical expertise in house to do this.
From initial discussion with the OBF team about the idea of running git on their servers, my impression is if we can do it ourselves, we may. Jason Stajich actually suggested we use github independently. Peter P.S. Could you all update your entry on the wiki participants page (and if you have one, your wiki user page) to include a link to your github account: http://biopython.org/wiki/Participants From biopython at maubp.freeserve.co.uk Tue Mar 17 14:46:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 14:46:53 +0000 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <49BEDD9B.6030905@u.washington.edu> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> <20090316225558.GC57054@sobchak.mgh.harvard.edu> <49BEDD9B.6030905@u.washington.edu> Message-ID: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com> 2009/3/16 David Schruth : > I've got some 454 and Solid data you could test it on too. > > Has anybody else looked into how these other two Next Gen formats might > complicate things? Roche 454 sequencers produce their own binary SFF files (standing for sequence file format?), but they provide tools which turn these into standard Sanger style files using PHRED qualities. In theory, we might be able to parse the SFF files directly, see for example http://blog.malde.org/index.php/2008/11/14/454-sequencing-and-parsing-the-sff-binary-format/ and the links given. In practice, most sequencing centers using Roche 454 will be happy to provide FASTQ or FASTA+QUAL files, and the code on Bug 2767 (or the associated experimental branch on github) should work fine on these. http://bugzilla.open-bio.org/show_bug.cgi?id=2767 You are free to try out the proposed code yourself now, but if you have some particular 454 files you'd like me to check, please email me (off the mailing list). 
If you can share some real data which we could include in Biopython for a unit test that would also be great (but unless you tell me this explicitly, I'll only make sure we can parse your files). Regarding SOLiD files, they work in colour space and I am under the impression that it doesn't make sense to convert them to sequence space until after doing the assembly or genome mapping (in colour space). See for example http://solidsoftwaretools.com/gf/project/mapreads/ i.e. It may not be appropriate to parse SOLiD reads into Biopython SeqRecord objects, and thus wouldn't belong in Bio.SeqIO. That isn't to say we wouldn't want a parser elsewhere in Biopython, perhaps under Bio.Sequencing would be best. Peter From biopython at maubp.freeserve.co.uk Tue Mar 17 14:57:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 14:57:49 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu> References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <20090316224240.GA57054@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903170757s183f6f59x40549f7e3a853f06@mail.gmail.com> On Mon, Mar 16, 2009 at 10:42 PM, Brad Chapman wrote: > Peter wrote: >> I'm thinking a news post on >> http://news.open-bio.org/news/category/obf-projects/biopython/ about >> version control would be a good idea at this point. 
How about this -
>
> This is great, and I would move the last paragraph describing
> the Git repository to the beginning; start with what we are doing and
> then describe the rationale. This should help for those with ADD, and
> also give more prominent credit to Bartek, Giovanni and you for the
> work that went into this.

Good idea about the reordering - done, and published:
http://news.open-bio.org/news/2009/03/biopython-and-version-control-systems/

It will also show up on http://biopython.org/wiki/News via the RSS feed.

Peter

From rodrigo_faccioli at uol.com.br Tue Mar 17 15:30:48 2009
From: rodrigo_faccioli at uol.com.br (Rodrigo faccioli)
Date: Tue, 17 Mar 2009 12:30:48 -0300
Subject: [Biopython-dev] PDB Parser error
Message-ID: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>

I built a relational database in PostgreSQL. This database stores some information from PDB files: the sequence, atoms and sbonds. Now I'm building a parser for this database, and I want to load its contents into a Biopython PDB parser structure. The idea is to keep all of my source code based on the Biopython PDB parser, because I will need to do some operations with this information.

So, I studied the Bio.PDB directory and read the _parse_coordinates method in the PDBParser.py file, which calls several methods that initialise the structure. I call them in my code. However, it shows the message below.
Traceback (most recent call last):
  File "src/testefcfrpPDB.py", line 32, in <module>
    main()
  File "src/testefcfrpPDB.py", line 30, in main
    structure = FcfrpPDB.getPDBFile(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDB.py", line 67, in getPDBFile
    return fcfrpPDBParser.loadStructureFromDatabase(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDBParser.py", line 48, in loadStructureFromDatabase
    self._structure_builder.init_atom(D_Atoms[i].get_id(), D_Atoms[i].get_coord(), D_Atoms[i].get_bfactor(),D_Atoms[i].get_occupancy() ,D_Atoms[i].get_altloc(), D_Atoms[i].get_fullname(), D_Atoms[i].get_serial_number())
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/StructureBuilder.py", line 182, in init_atom
    if residue.has_id(name):
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", line 96, in has_id
    return self.child_dict.has_key(id)
TypeError: list objects are unhashable

This is my first post to the Biopython developers' list, and I don't know what the process is for sending code.

Thanks for any help.

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218

From lpritc at scri.ac.uk Tue Mar 17 15:42:55 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 15:42:55 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com>
Message-ID:

Hi,

On 17/03/2009 14:46, "Peter" wrote:

> 2009/3/16 David Schruth :
>> I've got some 454 and Solid data you could test it on too.
>>
>> Has anybody else looked into how these other two Next Gen formats might
>> complicate things?
> Regarding SOLiD files, they work in colour space and I am under the
> impression that it doesn't make sense to convert them to sequence
> space until after doing the assembly or genome mapping (in colour
> space). See for example
> http://solidsoftwaretools.com/gf/project/mapreads/ i.e. It may not be
> appropriate to parse SOLiD reads into Biopython SeqRecord objects, and
> thus wouldn't belong in Bio.SeqIO. That isn't to say we wouldn't want
> a parser elsewhere in Biopython, perhaps under Bio.Sequencing would be
> best.

That's my understanding and practical experience, too. For lurkers' benefit, SOLiD data looks like this:

>4_48_57_F3
T33111210002200023033000000211000101
>4_48_89_F3
T22002312223133113013303322223322223
>4_48_95_F3
T22300102100203322101021130203000201

where each of the four colour values (0, 1, 2, 3) stands for four of the 16 possible dimers (AA, AC, AG, AT, CA, ...), i.e. each colour value is degenerate for four possible dimers. This system is described at
http://www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_057559.pdf.

The use of an appropriate colour->dimer mapping makes it possible, in principle, to go from colour space to nucleotide sequence, so long as a single base of the sequence is known. In reality, a single colour space read error silently makes every base after it in the converted read incorrect. Practical use of SOLiD data involves mapping the sequence reads to a reference sequence (either by converting the reference to colour space, or by dynamic programming) prior to conversion to 'base space'.

The mapping process is probably better handled by dedicated applications, and I think the role for Biopython in this is to parse their output. GFF is, awkwardly enough, a popular output format for this kind of analysis.

L.
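
To make the error propagation described above concrete, here is a minimal sketch of colour space decoding. It assumes the usual published two-base encoding (0 = same base, 1 = A<->C or G<->T, 2 = A<->G or C<->T, 3 = A<->T or C<->G); the function name decode_colour_space and the 2-bit base codes are illustrative choices, not anything in Biopython, and it makes no attempt to handle missing colour calls.

```python
# Sketch only: with the illustrative codes A=0, C=1, G=2, T=3, the colour of
# a dimer under the assumed two-base encoding is the XOR of its base codes.
BASES = "ACGT"


def decode_colour_space(read):
    """Convert a read such as 'T0123' (primer base + colours) to bases."""
    current = BASES.index(read[0])  # known primer base, not part of the read
    decoded = []
    for colour in read[1:]:
        current ^= int(colour)  # next base code = previous code XOR colour
        decoded.append(BASES[current])
    return "".join(decoded)


good = decode_colour_space("T0123")
bad = decode_colour_space("T0023")  # same read with one miscalled colour
print(good)  # TGAT
print(bad)   # TTCG
```

The single changed colour alters every base from that position onwards, which is why conversion to base space is normally left until after mapping.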
--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
______________________________________________________ From biopython at maubp.freeserve.co.uk Tue Mar 17 16:01:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 16:01:25 +0000 Subject: [Biopython-dev] PDB Parser error In-Reply-To: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com> References: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com> Message-ID: <320fb6e00903170901v6533910bl57ddd534dc05cf51@mail.gmail.com> On Tue, Mar 17, 2009 at 3:30 PM, Rodrigo faccioli wrote: > I built a relational database in PostgreSQL. This database stores some > informations form PDB file. These informations are about its sequence, atoms > and sbonds. Now, I'm building a parser for this my database which I want to > load it in a biopython PDB parser structure. The idea is ?keep on whole my > souce-code ?based in biopython PDB parser, because will be necessary to do > some operations with these informations. > > So, I study the Bio.PDB directory and I read in the PDBPaerser.py file, its > _parse_coordinates method where there is some methods about initialization > structure. I run them in my code. However, is showing the message below. > Traceback (most recent call last): > ?File "src/testefcfrpPDB.py", line 32, in > ... > ?File > "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", > line 96, in has_id > ? ?return self.child_dict.has_key(id) > TypeError: list objects are unhashable > > This post is my first post in biopython developer's list and I don't know > what is the its process to send a code. Its hard to say without seeing your full code (and even then, without the database it would be difficult to reproduce it). As you have a TypeError, I suspect you have something as the wrong datatype - maybe a list that should be a string or something. If you want to share the full file testefcfrpPDB.py you could post it on http://pastebin.com/ or something (do you have your own website?). 
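
As a minimal sketch of the suspected failure mode (not the poster's actual code, which was not shown): Entity.has_id looks the atom name up in a dictionary, and a list can never be a dictionary key. The dictionary below is a hypothetical stand-in for child_dict, and the snippet is written for current Python rather than the Python 2.5 in the traceback, so the exact wording of the error differs.

```python
# Hypothetical stand-in for Entity.child_dict: a dict keyed by atom name.
child_dict = {"CA": "atom object"}

print("CA" in child_dict)  # a string key is hashable, so the lookup works

try:
    ["CA"] in child_dict  # a list key cannot be hashed
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```

If the values pulled from the database arrive as lists (or rows) rather than plain strings, converting the atom name to a string before calling init_atom would be the first thing to check.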
Peter

From biopython at maubp.freeserve.co.uk Tue Mar 17 17:59:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 17:59:43 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
Message-ID: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>

I wrote:
> In the short term (at least until Biopython 1.50 beta is out, perhaps
> until Biopython 1.50 proper is out), CVS will remain the official
> repository. ... During this period I hope most (ideally all) our active
> developers with CVS access will create an account on github, and
> try out forking from the CVS mirror, creating their own branches,
> checking in some changes, and doing some simple merges - for
> example pulling code from other Biopython developer's public
> branches. This should give us the confidence to trust git and
> github enough to use it for real.

Brad and I have been trying this out in practice, and it seems to work OK.

I started a fork to test the patches for Bug 2767, adding quality parsers to Bio.SeqIO,
http://github.com/peterjc/biopython-seqio-quality/tree/master
I made a few incremental checkins, pushed to github one by one.

Brad then took a fork of this in order to make some minor changes and fix a typo in the documentation:
http://github.com/chapmanb/biopython-seqio-quality/tree/master

At this point the "network" diagrams showed the two branches as diverging. Brad then sent me a "pull" request, suggesting I might want to pull his work into my branch.

Using the git command line tool, I was able to pull and merge Brad's changes (as I had made no changes in the meantime this could be done automatically), and then push the merged version back up to github on my branch. At this point my branch and Brad's agreed once again, and the "network" diagram no longer shows both. Note that my branch now includes a commit from Brad.

At this point, Brad may choose to delete his branch, or perhaps make further changes.

Now all this worked, but I was wondering if the github web interface could have simplified any of this, if I'd only known where to click. For example, does github offer any way to view a diff between two branches? Or, as I suspect, do they simply expect you to use the git tools directly for this?

Peter

From bugzilla-daemon at portal.open-bio.org Tue Mar 17 18:06:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 14:06:00 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files
In-Reply-To:
Message-ID: <200903171806.n2HI60op012464@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767

------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-17 14:06 EST -------
(In reply to comment #10)
> I've made these changes available on a test github branch,
> http://github.com/peterjc/biopython-seqio-quality/tree/master
>
> This doesn't include all the example files for the unit tests yet.
>

I've now checked this into CVS. The extra example files will follow later... leaving this bug open until that is done.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From dalloliogm at gmail.com Tue Mar 17 18:35:04 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 17 Mar 2009 19:35:04 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> Message-ID: <5aa3b3570903171135nb49de80h6c6ee0930c147d29@mail.gmail.com> On Tue, Mar 17, 2009 at 6:59 PM, Peter wrote: > > Brad and I have been trying this out in practice, and it seems to work OK. > > I started a fork to test the patches for Bug 2767, adding quality > parsers to Bio.SeqIO, > http://github.com/chapmanb/biopython-seqio-quality/tree/master > I made a few incremental checkins, pushed to github one by one. > > Brad then took a fork of this in order to make some minor changes and > fix a typo in the documentation : > http://github.com/chapmanb/biopython-seqio-quality/tree/master Yes, basically this is the way it should be working. Usually I do something similar, only I use more the procedure explained here: - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html (section 'Using git for collaboration') I fetch the other branch and call it as master:otheruser-incoming, then compare the two branches with gitk or with git log master..otheruser-incoming. > > > At this point the "network" diagrams showed up the two branches as > diverging. ?Brad then sent me a "pull" request, suggesting I might > want to pull his work into my branch. > > Using the git command line tool, I was able to pull and merge Brad's > changes (as I had made no changes in the meantime this could be done > automatically), If you go on 'Fork Queue' on github, it should show other people's commits. 
However, I don't trust doing this with a web interface... moreover, it sometimes seems not to work properly (it is not clear how it decides whether a commit will 'apply cleanly' or not). On the same page there is a 'pull merge request' button, which (I have never tried it) should send a merge request to the selected recipients.

> and then push the merged version back up to github on
> my branch. At this point my branch and brad's agreed once again, and
> the "network" diagram no longer shows both. Note that my branch now
> includes a commit from Brad.

Yes, this is right. The graph only shows the commits which differ, so it shows your two branches as a single one. If you feel comfortable with the git mechanisms, maybe later you could create a second branch in the 'biopython/biopython' repository, called 'accepted-github-changes' or something like that, which will collect all the changes that can be submitted to CVS.

> At this point, Brad may choose to delete his branch, or perhaps make
> further changes.

I wonder if a good strategy is to create branches only to test specific changes, and then delete them. If Brad keeps his branch, he will have to remember to update it later, which may be less trouble than deleting a branch and creating it again when necessary.

> Now all this worked, but I was wondering if the github web interface
> could have simplified any of this, if I'd only know where to click.
> For example, does github offer any way to view a diff between two
> branches? Or, as I suspect, do they simply expect you to use the git
> tools directly for this?

To my knowledge, there are no such tools :-(. You must rely on the commit messages to identify the differences between branches. Maybe they will implement such a feature at some point.
> > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Tue Mar 17 18:36:24 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 17 Mar 2009 19:36:24 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> Message-ID: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> 2009/3/17 Peter Cock > 2009/3/17 Tiago Ant?o : > > I'd still like to have a copy of the "official" git repository running > on biopython.org, but this may not be that easy without some technical > expertise in house to do this. From initial discussion with the OBF > team about the idea of running git on their servers, my impression is > if we can do it ourselves, we may. Jason Stajich actually suggested > we use github independently. Well, basically it is not strictly necessary to have git installed on their computers to create a mirror. You can just create the clone on your computer, raw-ly copy the files there, and then you will be able to push the new changes with an ssh access. Since git is a distributed source control system, it doesn't require to configure a server part as with cvs :-) To my knowledge, the pygr project (also a bioinformatics suite in python) have an official repository hosted in gitourious, and a mirror in github to collect patches from there. 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From tiagoantao at gmail.com Tue Mar 17 19:09:13 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 17 Mar 2009 19:09:13 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> Message-ID: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> OK, in order to exercise and try github development I have forked a branch to work on the PopGen code. The idea of the branch is to serve as a platform for merging with the "official" branch. So, the idea is: 1. Official branch - The stable thingy 2. PopGen stabilizer branch - The place to merge contributions from PopGen development branches. The idea is that people can go crazy on their own branches and this intermediate one serves as a point to stabilize (unit test, documentation, QA, ...) before the commit to the official one 3. Crazy branches - Develop your crazy idea. I have 3 ideas myself: One for Jason's structure code, one for my LDNe code and another for statistics. Many more welcomed.... The development procedure would be like this: A. People would have all the fun on their development branches B. When they felt confident they would submit their code to the stabilizer branch, where we would check that all the important things were there: unit test, code comments, QA, documentation C. When things were in good shape, we would propose changes to the official branch And, by the way, bug fixes of existing production would also be done on the stabilizer branch. Does this make any sense? 
In my view, with things like git, a policy like this encourages both innovation while preserving stability and robustness of the official branch. Tiago On Tue, Mar 17, 2009 at 6:36 PM, Giovanni Marco Dall'Olio wrote: > > > 2009/3/17 Peter Cock >> >> 2009/3/17 Tiago Ant?o : >> >> I'd still like to have a copy of the "official" git repository running >> on biopython.org, but this may not be that easy without some technical >> expertise in house to do this. ?From initial discussion with the OBF >> team about the idea of running git on their servers, my impression is >> if we can do it ourselves, we may. ?Jason Stajich actually suggested >> we use github independently. > > Well, basically it is not strictly necessary to have git installed on their > computers to create a mirror. > You can just create the clone on your computer, raw-ly copy the files there, > and then you will be able to push the new changes with an ssh access. > Since git is a distributed source control system, it doesn't require to > configure a server part as with cvs :-) > > To my knowledge, the pygr project (also a bioinformatics suite in python) > have an official repository hosted in gitourious, and a mirror in github to > collect patches from there. 
> > > > > -- > > My blog on bioinformatics (now in English): http://bioinfoblog.it > -- "A man who dares to waste one hour of time has not discovered the value of life" - Charles Darwin From mailinglist.honeypot at gmail.com Tue Mar 17 19:21:57 2009 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 17 Mar 2009 15:21:57 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> Message-ID: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> Hi, I really just loom around here, but a slight correction/point: > A. People would have all the fun on their development branches > B. When they felt confident they would submit their code to the > stabilizer branch, where we would check that all the important things > were there: unit test, code comments, QA, documentation > C. When things were in good shape, we would propose changes to the > official branch I'm very much a git noob, and from having been following this thread a bit, it seems that many of us are, so for the noobs: I think somewhere around B, the person wanting to commit new code would have to rebase[1] their branch against the official "stabilizer branch" (that they had originally forked from). This would put the onus of fixing any breaks and keeping track of recent developments on the branch you propose to merge into (since you originally branched), on the person who is writing the new code. This makes it easier for the "official keepers of the one true branch" to accept new patches, since they know the patch will work on the latest version. 
Anyway, I think I just wanted to point out that rebase was there since I don't think there's anything really equivalent in the CVS/SVN world. -steve [1] rebase : http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html From tiagoantao at gmail.com Tue Mar 17 19:27:10 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 17 Mar 2009 19:27:10 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> Message-ID: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com> 2009/3/17 Steve Lianoglou : > I think somewhere around B, the person wanting to commit new code would have > to rebase[1] their branch against the official "stabilizer branch" (that So, if I understand correctly, anyone wanting to submit a change to the official version would be responsible for rebasing, right? PS - being a git noob and a longtime cvs/svn user and manager I much appreciated Randal Schwartz's Google talk at: http://www.youtube.com/watch?v=8dhZ9BXQgc4 Especially around 30 minutes in, it is really informative.
From mailinglist.honeypot at gmail.com Tue Mar 17 19:34:11 2009 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 17 Mar 2009 15:34:11 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com> <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com> Message-ID: <711E86ED-F220-4E97-84BC-9E94753E111A@gmail.com> On Mar 17, 2009, at 3:27 PM, Tiago Ant?o wrote: > 2009/3/17 Steve Lianoglou : >> I think somewhere around B, the person wanting to commit new code >> would have >> to rebase[1] their branch against the official "stabilizer >> branch" (that > > So, if I understand well, anyone wanting to submit a change to the > official version would be responsible for rebasing, right? And if I understand it well, then I think you're right. I think that's a reasonable policy. That puts the responsibility to ensure that any new code I write works with whatever has been approved already on me, and not you. While this may require a bit extra responsibility on the committer, I'd be surprised if it would be enough to deter any new would-be committers from taking a shot at contributing code (maybe it would? I guess it's debatable). > PS - being a git noob and a longtime cvs/svn user and manager I much > appreciated Randal Schwartz google talk at: > http://www.youtube.com/watch?v=8dhZ9BXQgc4 Especially aroung 30 > minutes it is really informative. Sweet. 
To be honest, the only video I ever saw of git was Linus' SVN-bash google talk, which somehow put me off from considering git longer than I should have, so this is a good link to have :-) Thanks, -steve -- Steve Lianoglou Graduate Student: Physiology, Biophysics and Systems Biology Weill Medical College of Cornell University http://cbio.mskcc.org/~lianos From biopython at maubp.freeserve.co.uk Tue Mar 17 20:21:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 17 Mar 2009 20:21:45 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com> <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com> Message-ID: <320fb6e00903171321y4b94f220h7d2d1172ee085e15@mail.gmail.com> 2009/3/17 Tiago Ant?o : > OK, in order to exercise and try github development I have forked a > branch to work on the PopGen code. The idea of the branch is to serve > as a platform for merging with the "official" branch. So, the idea is: > > 1. Official branch - The stable thingy > 2. PopGen stabilizer branch - The place to merge contributions from > PopGen development branches. The idea is that people can go crazy on > their own branches and this intermediate one serves as a point to > stabilize (unit test, documentation, QA, ...) before the commit to the > official one > 3. Crazy branches - Develop your crazy idea. I have 3 ideas myself: > One for Jason's structure code, one for my LDNe code and another for > statistics. Many more welcomed.... > > The development procedure would be like this: > A. People would have all the fun on their development branches > B. 
When they felt confident they would submit their code to the > stabilizer branch, where we would check that all the important things > were there: unit test, code comments, QA, documentation > C. When things were in good shape, we would propose changes to the > official branch > > And, by the way, bug fixes of existing production would also be done > on the stabilizer branch. > > Does this make any sense? Totally. But keep in mind the current "official" git branch (the one being updated from CVS) may get nuked if we have to redo the import to fix the missing version tags - so I would suggest you name your branches with "test" or "provisional" or something temporary in the text for now. > In my view, with things like git, a policy like this encourages both > innovation while preserving stability and robustness of the official > branch. Yes - and this like the right approach for Bio.PopGen, with you acting as the gatekeeper. Peter From chapmanb at 50mail.com Tue Mar 17 21:34:14 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 17 Mar 2009 17:34:14 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> Message-ID: <20090317213414.GK57054@sobchak.mgh.harvard.edu> Hi Peter; > Using the git command line tool, I was able to pull and merge Brad's > changes (as I had made no changes in the meantime this could be done > automatically), and then push the merged version back up to github on > my branch. At this point my branch and brad's agreed once again, and > the "network" diagram no longer shows both. Note that my branch now > includes a commit from Brad. Sweet. Glad that worked. 
I deleted my branch (edit->delete repository). While doing so, I noticed that there is also a 'Repository Collaborators' section within the 'edit' page. So, another working model is to have multiple users simultaneously editing one forked revision. If you are already communicating on the work through the mailing list or wiki, this is more like CVS/SVN then the branching model. > Now all this worked, but I was wondering if the github web interface > could have simplified any of this, if I'd only know where to click. > For example, does github offer any way to view a diff between to > branches? Or, as I suspect, do they simply expect you to use the git > tools directly for this? What was the command you used for this? git diff is still befuddling to me. Brad From bugzilla-daemon at portal.open-bio.org Wed Mar 18 14:18:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Mar 2009 10:18:39 -0400 Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity sorting altered by detach_child() calls In-Reply-To: Message-ID: <200903181418.n2IEIdIm003158@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2777 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-18 10:18 EST ------- Fix checked into CVS as Bio/PDB/Entity.py revision 1.26, marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Mar 18 15:07:42 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Mar 2009 15:07:42 +0000 Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta) In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com> Message-ID: <320fb6e00903180807u4a0f7a5aqaa91f20b40891ca4@mail.gmail.com> On Mon, Mar 16, 2009 at 12:16 PM, Peter wrote: > Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files That's in CVS now, Brad and I have used it a bit, but further testing before the beta wouldn't hurt. > Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g. > align[1:2,5:-5] Anyone want to try this out? http://bugzilla.open-bio.org/show_bug.cgi?id=2551 > Any other nominations for Biopython 1.50? Other candidates with patches that have since come to mind: Bug 2733 - Running unit tests where Biopython wasn't built from source http://bugzilla.open-bio.org/show_bug.cgi?id=2733 This patch seemed OK from both my and Bruce's testing. Bug 2738 - Speed up GenBank parsing, in particular location parsing http://bugzilla.open-bio.org/show_bug.cgi?id=2738 I would want to run some tests with EMBL files before committing this. Bug 2745 - Bio.GenBank.LocationParserError with a GenBank CON file http://bugzilla.open-bio.org/show_bug.cgi?id=2745 I'd like to change CONTIG line parsing to just use a string (or a list of strings).
Peter From nuin at genedrift.org Wed Mar 18 19:50:28 2009 From: nuin at genedrift.org (Paulo Nuin) Date: Wed, 18 Mar 2009 15:50:28 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com> References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com> <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com> <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com> <20090315185443.GA30296@kunkel> <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com> <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com> <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com> <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com> <49BE8532.9040701@genedrift.org> <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com> Message-ID: <49C15084.8040208@genedrift.org> Peter wrote: > On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin wrote: > >> No problem on Vista. >> >> Git (version 1.5.6.1-preview20080701) >> >> Paulo >> > > Hi Paulo, > > Could you be a bit more precise about the version are you using and > where got it from? i.e. Are you using cygwin or the Windows native > port, http://code.google.com/p/msysgit/ > I'm using msysgit version 1.5.6. > And did you mean in general you have no problems with git on Windows > Vista, or have you also tried fetching Biopython from github, > building, testing (and installing it)? For example, are there any new > line issues from the unit tests? This is one area where CVS and git > may differ slightly... > I'm using Github to store a couple of projects and this version is working great. Also Eclipse addon is also fine. I cloned BioPython but haven't tried installing or building it. 
Paulo From bugzilla-daemon at portal.open-bio.org Thu Mar 19 13:42:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Mar 2009 09:42:23 -0400 Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not support the output file argument In-Reply-To: Message-ID: <200903191342.n2JDgN3p016978@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2654 yvan.strahm at bccs.uib.no changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |yvan.strahm at bccs.uib.no -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Mar 19 17:08:16 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Mar 2009 13:08:16 -0400 Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not support the output file argument In-Reply-To: Message-ID: <200903191708.n2JH8GqS032350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2654 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-19 13:08 EST ------- Fixed in Bio/Blast/NCBIStandalone.py CVS revision 1.86 http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython Note that the three tools themselves all use -o (lower case) for the output file, but refer to it slightly differently: $ ./rpsblast --help | grep " -o " -o Output File for Alignment [File Out] Optional $ ./blastpgp --help | grep " -o " -o Output File for Alignment [File Out] Optional $ ./blastall --help | grep " -o " -o BLAST report 
Output File [File Out] Optional

Our function for rpsblast already supported this argument under the name "align_outfile", which I have therefore also used for blastpgp (this is a good name, as blastpgp outputs more than one type of file). For blastall, "align_outfile" doesn't seem entirely appropriate, and although it is inconsistent I have gone for "outfile" instead.

Example usage:

    # imports and setting up input parameters omitted
    out_handle, err_handle = NCBIStandalone.blastall(blastall_exe, "blastp",
                                                     blastdb_nr, query_file,
                                                     expectation=0.000001,
                                                     nprocessors=1, filter="F",
                                                     outfile=output_file,
                                                     alignments=5,
                                                     descriptions=5)
    assert "" == err_handle.read()
    assert "" == out_handle.read()  # Important so we wait for BLAST to finish!
    err_handle.close()
    out_handle.close()
    assert os.path.isfile(output_file)
    count = 0
    for blast_record in NCBIXML.parse(open(output_file)):
        count += 1
    print "Found %i BLAST results" % count

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
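
[Editorial sketch] The "read the handles so we wait for BLAST to finish" step in the example above can be illustrated with the standard library alone. This is only a minimal sketch under stated assumptions: it targets Python 3 and subprocess rather than Bio.Blast.NCBIStandalone, the helper name run_and_wait is hypothetical, and the child command is a small stand-in script that writes a file, not a real BLAST binary.

```python
import os
import subprocess
import sys
import tempfile


def run_and_wait(command):
    """Run a command, drain stdout/stderr, and return (returncode, out, err).

    communicate() reads both pipes to the end and waits for the child to
    exit, so any output file the child writes is complete afterwards.
    (Hypothetical helper, not part of Biopython.)
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = process.communicate()
    return process.returncode, out, err


# Stand-in for a tool that writes its report to a file instead of stdout.
output_file = os.path.join(tempfile.mkdtemp(), "report.txt")
child_code = "import sys; open(sys.argv[1], 'w').write('done\\n')"
returncode, out, err = run_and_wait(
    [sys.executable, "-c", child_code, output_file]
)
```

Only after the pipes have been drained is it safe to check for and open the output file, which is the same ordering the blastall example relies on.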
From biopython at maubp.freeserve.co.uk Thu Mar 19 19:00:51 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Mar 2009 19:00:51 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090317213414.GK57054@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman wrote: > Hi Peter; > >> Using the git command line tool, I was able to pull and merge Brad's >> changes (as I had made no changes in the meantime this could be done >> automatically), and then push the merged version back up to github on >> my branch. ?At this point my branch and brad's agreed once again, and >> the "network" diagram no longer shows both. ?Note that my branch now >> includes a commit from Brad. > > Sweet. Glad that worked. I deleted my branch (edit->delete > repository). How long did it take to process? I deleted mine (after attempting to merge against the CVS mirror). The delete was still in progress over 12 hours later! > While doing so, I noticed that there is also a 'Repository > Collaborators' section within the 'edit' page. So, another working > model is to have multiple users simultaneously editing one forked > revision. If you are already communicating on the work through the > mailing list or wiki, this is more like CVS/SVN then the branching > model. Yes, this should be a fairly simple way to give all our current CVS developers direct access to a master branch on github. >> Now all this worked, but I was wondering if the github web interface >> could have simplified any of this, if I'd only know where to click. 
>> For example, does github offer any way to view a diff between to >> branches? ?Or, as I suspect, do they simply expect you to use the git >> tools directly for this? > > What was the command you used for this? git diff is still befuddling > to me. I didn't actually figure that out (how to do a diff between two branches on github). And this afternoon github seems to be down, so I haven't played with it any more. Peter From chris.lasher at gmail.com Fri Mar 20 04:52:49 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 20 Mar 2009 00:52:49 -0400 Subject: [Biopython-dev] Help pages in Biopython wiki Message-ID: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> Would it be possible to get the help documentation installed for the Biopython wiki? http://biopython.org/wiki/Help Chris From lpritc at scri.ac.uk Fri Mar 20 08:42:44 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Fri, 20 Mar 2009 08:42:44 +0000 Subject: [Biopython-dev] Help pages in Biopython wiki In-Reply-To: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> Message-ID: Hi Chris, That page doesn't exist, yet (click on the 'page' tab to see this), and no pages link to it (see here: http://biopython.org/wiki/Special:WhatLinksHere/Help) What help were you expecting to see there? L. On 20/03/2009 04:52, "Chris Lasher" wrote: > Would it be possible to get the help documentation installed for the > Biopython wiki? > > http://biopython.org/wiki/Help > > Chris > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > ______________________________________________________________________ > This email has been scanned by the MessageLabs Email Security System. 
> For more information please visit http://www.messagelabs.com/email > ______________________________________________________________________ -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________ From biopython at maubp.freeserve.co.uk Fri Mar 20 10:41:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Mar 2009 10:41:49 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> Message-ID: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> On Thu, Mar 19, 2009 at 7:00 PM, Peter wrote: > On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman wrote: >> Sweet. Glad that worked. I deleted my branch (edit->delete >> repository). > > How long did it take to process? I deleted mine (after attempting to > merge against the CVS mirror). The delete was still in progress over > 12 hours later! And the branch delete is still on-going :( > ... And this afternoon github seems to be down, so I haven't played with it any more. It's back online again, but right now for me github is a bit of a damp squid [*]. As my initial branch/fork of biopython still exists but is being deleted, it seems in the meantime I can't create a new branch of biopython. Odd, and rather frustrating. Hopefully it will sort itself out shortly, and I can have another play with merging branches... Peter [*] For the benefit of non-native English speakers, or anyone whose sense of humour works differently to mine, this was a pun, based on the English phrase "damp squib" for a disappointing event, and the fact that github's error page has some kind of cartoon squid/octopus-cat creature on it.
From dalloliogm at gmail.com Fri Mar 20 11:15:21 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 20 Mar 2009 12:15:21 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> Message-ID: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> On Fri, Mar 20, 2009 at 11:41 AM, Peter wrote: > On Thu, Mar 19, 2009 at 7:00 PM, Peter wrote: >> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman wrote: >>> Sweet. Glad that worked. I deleted my branch (edit->delete >>> repository). >> >> How long did it take to process? ?I deleted mine (after attempting to >> merge against the CVS mirror). ?The delete was still in progress over >> 12 hours later! > > And the branch delete is still on-going :( > >> ... ?And this afternoon github seems to be down, so I haven't played with it any more. > > Its back online again, but right now for me github is a bit of a damp squid [*]. > As my initial branch/fork of biopython still exists but is being > deleted, it seems > in the meantime I can't create a new branch of biopython. mmm are you referring to this: - http://github.com/peterjc/biopython-seqio-quality/network ? I can see it, and also fetch/pull changes from it.. I see that you have renamed your fork as seqio-quality. Ok, but I think it is better to keep the fork's name as 'biopython', and then create many branches inside it. 
For example:

git clone
cd biopython

# make some commits to your master branch:
touch testfile.txt
git add testfile.txt
git commit -a -m 'test file added'

# push the changes to your github repository ('origin' refers to github; see $(CWD)/biopython/.git/config)
git push origin master

# create a branch called 'experimental-seqio-quality', and switch to it:
# without arguments, git branch shows the list of branches and the current one:
git branch
# create the experimental-seqio-quality branch:
git branch experimental-seqio-quality
# switch to it:
git checkout experimental-seqio-quality
# check that experimental-seqio-quality is the current working branch:
git branch

# now you are working in the branch called 'experimental-seqio-quality'. All the changes you
# commit here will not be saved in the 'master' branch or the others, as long as you don't
# merge them:
touch seqio-parser
git add seqio-parser
git commit -a -m 'added seqioparser'
git push origin experimental-seqio-quality
# after pushing, git will create a new branch in github. Look for example at my fork here:
# - http://github.com/biopython/biopython/network

############

Here is how you can merge and compare your branch with someone else's or with the biopython one:

# add a reference to biopython official branch
git remote add biopython git://github.com/biopython/biopython.git
# obtain the set of changes from the biopython branch, and merge them
git fetch biopython
git log master biopython/master
git diff master biopython/master
git merge master biopython/master

git remote add peter git://github.com/peterjc/biopython-seqio-quality.git
git fetch peter
# there should be a way to do this without having to fetch
git diff master peter/master

For references, look at this guide: http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo

> Odd, and rather
> frustrating. Hopefully it will sort itself out shortly, and I can
> have another play
> with merging branches...
> > Peter > > [*] For the benefit of non-native English speakers, or or anyone whose sense > of humour works differently to mine, this was a pun, based on the English phrase > "damp squib" for a disappointing event, and the fact that github's > error page has > some kind of cartoon squid/octopus-cat creature on it. > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From cymon.cox at googlemail.com Fri Mar 20 11:16:27 2009 From: cymon.cox at googlemail.com (Cymon Cox) Date: Fri, 20 Mar 2009 11:16:27 +0000 Subject: [Biopython-dev] Test - ignore Message-ID: <7265d4f0903200416o7c8135ddrfae4aad723bd17b7@mail.gmail.com> From biopython at maubp.freeserve.co.uk Fri Mar 20 11:32:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Mar 2009 11:32:15 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> Message-ID: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> >> As my initial branch/fork of biopython still exists but is being >> deleted, it seems in the meantime I can't create a new branch >> of biopython. > > mmm are you referring to this: > - http://github.com/peterjc/biopython-seqio-quality/network > ? 
>
> I can see it, and also fetch/pull changes from it..

True, the network page is still there for me. But
http://github.com/peterjc/biopython-seqio-quality/ which redirects to
http://github.com/peterjc/biopython-seqio-quality/tree/master
shows me just a "This repository is being deleted" page.

> I see that you have renamed your fork as seqio-quality. Ok, but I
> think it is better to keep the fork's name as 'biopython', and then
> create many branches inside it.

I don't think I had entirely understood github's use of fork versus branch.
I'll have to do some more reading and try again once my account has
settled down. Thanks for the details in your email.

Peter

From bugzilla-daemon at portal.open-bio.org Fri Mar 20 12:18:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 08:18:53 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files
In-Reply-To:
Message-ID: <200903201218.n2KCIrSX026346@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678

------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2009-03-20 08:18 EST -------
(In reply to comment #7)
> (In reply to comment #6)
> > If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
> > it from there. If not, it tries to download it. This may fail if the servers
> > are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
> > Biopython is installed), you won't run into this problem.
>
> I was just looking at this on my Windows XP Python 2.3 machine, and when it
> tried to download missing DTD files it was just using a filename as the URL.

In hindsight, I wonder if trying to download missing DTD files is really a good idea. Suppose a user does a large number of Entrez queries, and saves the results as XML files. Then, he tries to parse each of those XML files.
If a DTD file is missing, then Bio.Entrez will try to download the same DTD file for each XML file it is trying to parse. This is not only wasteful, but also bypasses Entrez's rule of no more than three accesses per second. In addition, this is fragile. The XML files typically contain a full url to the needed DTD. But many of Entrez's DTD files contain references to other DTD files, and those references can be relative. When Bio.Entrez gets such a relative path to where the DTD file is located, it is difficult to figure out the absolute path to the DTD. Now we are looking for it in http://www.ncbi.nlm.nih.gov/dtd/, but this does not seem to contain all required DTDs. It may therefore make sense not to download the DTD file, but to raise an Exception with a helpful error message, specifying which DTD file is missing, where it can possibly be found, and where the DTD file can be installed. It requires some more effort from the user, but it is more robust, won't break Entrez' rules, and is more efficient. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
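The proposal in comment #10 (look only in the local DTD directory and fail with a helpful message) could be sketched roughly as below. This is an illustration under stated assumptions, not the code in Bio.Entrez: the names `MissingDTDError` and `find_local_dtd` are hypothetical, and the directory is passed in explicitly rather than discovered.

```python
import os


class MissingDTDError(Exception):
    """Hypothetical exception name; not taken from Bio.Entrez."""


def find_local_dtd(system_id, dtd_dir):
    """Return the path of a locally installed DTD, or raise MissingDTDError.

    system_id is the DTD reference found in the XML file. It may be a full
    URL or a relative path, so only its last component is used.
    """
    filename = system_id.rsplit("/", 1)[-1]
    path = os.path.join(dtd_dir, filename)
    if not os.path.isfile(path):
        # No download is attempted: report what is missing, where it was
        # referenced from, and where it should be saved.
        raise MissingDTDError(
            "DTD file %s is missing. It was referenced as %s. "
            "Please obtain the file and save it in %s."
            % (filename, system_id, dtd_dir)
        )
    return path
```

Because nothing is fetched at parse time, parsing many saved XML files cannot trigger repeated requests to the NCBI servers.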
From chapmanb at 50mail.com Fri Mar 20 12:55:18 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 20 Mar 2009 08:55:18 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> References: <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> Message-ID: <20090320125518.GA351@sobchak.mgh.harvard.edu> Hi all; > >> As my initial branch/fork of biopython still exists but is being > >> deleted, it seems in the meantime I can't create a new branch > >> of biopython. [...] > True, the network page is still there for me. But > http://github.com/peterjc/biopython-seqio-quality/ which redirects to > http://github.com/peterjc/biopython-seqio-quality/tree/master > shows me just a "This repository is being deleted" page. Peter, the repository deletion was very quick for me, so it looks like it got stuck somewhere with the GitHub downtime. Does this help for getting it removed: http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/ > > I see that you have renamed your fork as seqio-quality. Ok, but I > > think it is better to keep the fork's name as 'biopython', and then > > create many branches inside it. > > I don't think I had entirely understood github's use of fork versus branch. > I'll have so do some more reading and try again once my account has > settled down. Thanks for the details in your email. Wow, now I am mad confused. I thought forks and branches were conceptually the same. 
Giovanni, it seems like you are suggesting one branch (the GitHub fork) and then a second branch (the git branch command). We were thinking of a standard case as: 1. Fork the Biopython trunk at GitHub. Name this something so it makes sense what the fork/branch is for. 2. Work on the fork/branch. If you want, invite others to work on it with you. 3. When finished, be sure you are up to date with the master Biopython trunk. 4. Submit the fork/branch for inclusion in Biopython. 5. Once included, delete the fork/branch. Which parts of this fall out of "standard" git practice? In general, we should strive to keep this as simple as possible. If using Git is complicated then we are losing a lot of our advantage over CVS/patches. Giovanni, the example commands were very helpful; I added details to the Git page on how to see diffs of branches: http://biopython.org/wiki/GitMigration#Evaluating_changes Brad From bugzilla-daemon at portal.open-bio.org Fri Mar 20 13:57:00 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 09:57:00 -0400 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200903201357.n2KDv0JJ001146@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 09:57 EST ------- (In reply to comment #10) > > In hindsight, I wonder if trying to download missing DTD files is really a > good idea. Suppose a user does a large number of Entrez queries, and saves > the results as XML files. Then, he tries to parse each of those XML files. > If a DTD file is missing, then Bio.Entrez will try to download the same DTD > file for each XML file it is trying to parse. This is not only wasteful, but > also bypasses Entrez's rule of no more than three accesses per second. Very true. We should be able to enforce the access limit here without too much trouble. 
More generally, it would make sense for the DTD file to be saved - ideally to the Python site-packages directory but, as we may not have write access, at least to a cache.

> In addition, this is fragile. The XML files typically contain a full url to
> the needed DTD. But many of Entrez's DTD files contain references to other
> DTD files, and those references can be relative. When Bio.Entrez gets such a
> relative path to where the DTD file is located, it is difficult to figure out
> the absolute path to the DTD. Now we are looking for it in
> http://www.ncbi.nlm.nih.gov/dtd/, but this does not seem to contain all
> required DTDs.

When I looked into the DTD URLs, I didn't see the NCBI using any relative links, but they may have changed things since. Additionally the NCBI have a (different but overlapping) set of DTD files at: http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/

Can we get some python XML/DTD library to resolve these links for us?

> It may therefore make sense not to download the DTD file, but to raise an
> Exception with a helpful error message, specifying which DTD file is missing,
> where it can possibly be found, and where the DTD file can be installed. It
> requires some more effort from the user, but it is more robust, won't break
> Entrez' rules, and is more efficient.

Biopython 1.49 generally failed to download missing DTD files. Right now the current code in CVS does much better at coping with missing DTD files, but in a very wasteful way. In either version, it does at least issue warnings, indicating something is not right.

As a user, I would prefer Bio.Entrez to download missing DTD files on demand AND SAVE THEM. As a developer I can see this is rather complicated, and you are right Michiel - a simple error message with instructions is much more straightforward. Note that the error might also suggest upgrading to the latest Biopython, or reporting the issue to us - but it would then be a very long error message!
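For comparison, the "download on demand and save" alternative mentioned above might look something like the following. It is only a sketch: the `DTDCache` class, its `download` callable and the cache directory are all hypothetical, and the default interval simply encodes the "no more than three accesses per second" rule quoted earlier in this comment.

```python
import os
import time


class DTDCache(object):
    """Hypothetical helper: fetch each DTD at most once and keep a copy."""

    def __init__(self, cache_dir, download, min_interval=1.0 / 3):
        self.cache_dir = cache_dir
        self.download = download          # assumed callable: url -> bytes
        self.min_interval = min_interval  # minimum seconds between downloads
        self._last_download = None

    def get(self, url):
        """Return the local path of the DTD at url, downloading if needed."""
        path = os.path.join(self.cache_dir, url.rsplit("/", 1)[-1])
        if os.path.isfile(path):
            return path  # cached copy: no network access at all
        self._wait()
        data = self.download(url)
        with open(path, "wb") as handle:
            handle.write(data)
        return path

    def _wait(self):
        """Sleep so that successive downloads are min_interval apart."""
        if self._last_download is not None:
            delay = self.min_interval - (time.time() - self._last_download)
            if delay > 0:
                time.sleep(delay)
        self._last_download = time.time()
```

Passing the downloader in as a callable keeps the sketch testable without network access. It does not address the harder problem raised above of resolving relative DTD references.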
If you want to switch to a helpful error message for missing DTD files, I'm OK with that. We could also ship the current code for Biopython 1.50. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Fri Mar 20 14:25:41 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 20 Mar 2009 15:25:41 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu> References: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> <20090320125518.GA351@sobchak.mgh.harvard.edu> Message-ID: <5aa3b3570903200725p1437ceem6a538af640c52ced@mail.gmail.com> On Fri, Mar 20, 2009 at 1:55 PM, Brad Chapman wrote: > Hi all; > >> >> As my initial branch/fork of biopython still exists but is being >> >> deleted, it seems in the meantime I can't create a new branch >> >> of biopython. > [...] >> True, the network page is still there for me. But >> http://github.com/peterjc/biopython-seqio-quality/ which redirects to >> http://github.com/peterjc/biopython-seqio-quality/tree/master >> shows me just a "This repository is being deleted" page. > > Peter, the repository deletion was very quick for me, so it looks like it > got stuck somewhere with the GitHub downtime. 
Does this help for getting it
> removed:
>
> http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/
>
>> > I see that you have renamed your fork as seqio-quality. Ok, but I
>> > think it is better to keep the fork's name as 'biopython', and then
>> > create many branches inside it.
>>
>> I don't think I had entirely understood github's use of fork versus branch.
>> I'll have to do some more reading and try again once my account has
>> settled down. Thanks for the details in your email.
>
> Wow, now I am mad confused. I thought forks and branches were
> conceptually the same.

Consider that the term "fork" is specific to github, and has nothing to do with git. There is no 'git fork' command.
When you do a 'fork' in github, what it does is create a personal 'space' on your account on github, to host all your personalizations, including new commits and also new branches of development. It is a kind of 'working space' that holds all the work you have done.

I understand it seems a bit complicated at first :-( but I think that, without using github, it is even more difficult to understand these things.

In your account you can have more than one experimental branch. For example, I can create a branch called 'experimental-xyz-parser', another called 'personal modifications', and keep the master branch as it is (or rename it).

If you want to contribute to my 'xyz parser', you can fetch this branch into your space, with a command like:

$: git remote add giovanni
$: git pull giovanni master:experimental-xyz-parser # (not sure about this last command)

This should create a branch called 'experimental-xyz-parser' on your computer, so you can work with it, make modifications, and later push it to github (where it will appear in the network graph).

> Giovanni, it seems like you are suggesting one
> branch (the GitHub fork) and then a second branch (the git branch
> command). We were thinking of a standard case as:
>
> 1.
Fork the Biopython trunk at GitHub. Name this something so it > makes sense what the fork/branch is for. > 2. Work on the fork/branch. If you want, invite others to work on it > with you. > 3. When finished, be sure you are up to date with the master > Biopython trunk. > 4. Submit the fork/branch for inclusion in Biopython. > 5. Once included, delete the fork/branch. > > Which parts of this fall out of "standard" git practice? In general, > we should strive to keep this as simple as possible. If using Git is > complicated then we are losing a lot of our advantage over CVS/patches. > > Giovanni, the example commands were very helpful; I added details to the Git > page on how to see diffs of branches: > > http://biopython.org/wiki/GitMigration#Evaluating_changes > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Fri Mar 20 14:50:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 10:50:49 -0400 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200903201450.n2KEonrB005712@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 10:50 EST ------- Code is in CVS with unit tests. Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Mar 20 14:53:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 10:53:37 -0400 Subject: [Biopython-dev] [Bug 2770] suggestion: raise a warning if Entrez.email is not set In-Reply-To: Message-ID: <200903201453.n2KErbfO006014@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2770 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 10:53 EST ------- Resolved as won't fix (unless the NCBI change their guidelines). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 20 15:49:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 11:49:52 -0400 Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) In-Reply-To: Message-ID: <200903201549.n2KFnqs8011031@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2718 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 11:49 EST ------- (In reply to comment #0) > (1) All the Bio.Graphics "write to file/handle" functions to accept any of the > supported file formats (like Bio.Graphics.GenomeDiagram), which would require > renderPM at run time for the bitmap formats (see Bug 2710). They should share > some code for mapping format names to ReportLab rendering module. 
This would
> be easy to do without changing the existing mix of method names.

That should be working in CVS now.

> (2) Update the docstrings for the "write to file/handle" functions to make it
> clear they can accept a filename OR a handle (a result of the underlying
> reportlab renderer's drawToFile function's behaviour - see note below).

This was done in CVS some time ago (comment 2)

> (3) Standardise on the method naming (and perhaps deprecate the old methods).
> Using "write" seems to be a sensible choice based on the current names used in
> Bio.Graphics.

This one is more difficult. GenomeDiagram uses a two step system - draw then write, where draw creates the ReportLab drawing object, and write saves it to a file. I'm going to leave this for another day...

Marking bug as fixed.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Mar 20 17:32:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:32:50 -0400
Subject: [Biopython-dev] [Bug 2795] New: Add commit, rollback, close to DBServer object
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2795

           Summary: Add commit, rollback, close to DBServer object
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

The DBServer object is defined in file BioSQL/BioSeqDatabase.py and it might make sense to add the following methods to it:

    def commit(self):
        """Commits the current transaction to the database."""
        return self.adaptor.commit()

    def rollback(self):
        """Rolls back the current transaction."""
        return self.adaptor.rollback()

    def close(self):
        """Close the connection. No further activity possible."""
        return self.adaptor.close()

I think the adaptor is intended to hide internal implementation details, so we shouldn't be forcing people to use it directly for transaction support.

Consider this example from http://www.biopython.org/wiki/BioSQL currently:

    from Bio import Entrez
    from Bio import SeqIO
    from BioSQL import BioSeqDatabase
    server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                                          passwd = "", host = "localhost", db="bioseqdb")
    db = server["orchids"]
    handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank")
    db.load(SeqIO.parse(handle, "genbank"))
    server.adaptor.commit()

The last line would become just:

    server.commit()

This seems cleaner. Patch to follow...

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Mar 20 17:34:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:34:14 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback, close to DBServer object
In-Reply-To:
Message-ID: <200903201734.n2KHYEZR018864@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 13:34 EST -------
Created an attachment (id=1263)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view)
BioSQL patch

Patch to implement the change described. Tested with MySQL only.

Cymon - what do you think of this? And does it work on PostgreSQL?

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
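The delegation pattern proposed in Bug 2795 can be tried out without a BioSQL installation. The sketch below is not the attached patch: `Adaptor` and `DBServer` here are simplified stand-ins written for this illustration, and an in-memory SQLite database (an assumption; the report only mentions MySQL and PostgreSQL) takes the place of a real server.

```python
import sqlite3


class Adaptor(object):
    """Stand-in for the BioSQL adaptor: it owns the DB-API connection."""

    def __init__(self, conn):
        self.conn = conn
        self.cursor = conn.cursor()

    def commit(self):
        return self.conn.commit()

    def rollback(self):
        return self.conn.rollback()

    def close(self):
        return self.conn.close()


class DBServer(object):
    """Simplified stand-in for DBServer with the three proposed methods."""

    def __init__(self, conn):
        self.adaptor = Adaptor(conn)

    def commit(self):
        """Commit the current transaction to the database."""
        return self.adaptor.commit()

    def rollback(self):
        """Roll back the current transaction."""
        return self.adaptor.rollback()

    def close(self):
        """Close the connection. No further activity possible."""
        return self.adaptor.close()


server = DBServer(sqlite3.connect(":memory:"))
cur = server.adaptor.cursor
cur.execute("CREATE TABLE biodatabase (name TEXT)")
server.commit()
cur.execute("INSERT INTO biodatabase VALUES ('orchids')")
server.rollback()  # the uncommitted insert is discarded
cur.execute("SELECT COUNT(*) FROM biodatabase")
count = cur.fetchone()[0]
server.close()
```

Callers only touch `server.commit()`, `server.rollback()` and `server.close()`, so the adaptor stays an implementation detail, which is the point made in the report.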
From bugzilla-daemon at portal.open-bio.org Fri Mar 20 17:59:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 13:59:14 -0400 Subject: [Biopython-dev] [Bug 2795] Add commit, rollback, close to DBServer object In-Reply-To: Message-ID: <200903201759.n2KHxENC020654@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2795 ------- Comment #2 from cymon.cox at gmail.com 2009-03-20 13:59 EST ------- (In reply to comment #1) > Created an attachment (id=1263) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view) [details] > BioSQL patch > > Patch to implement the change described. Tested with MySQL only. > > Cymon - what do you think of this? And does it work on PostgreSQL? I think it makes sense, and works on PostgreSQL with the psycopg2 driver. C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 20 18:07:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 14:07:55 -0400 Subject: [Biopython-dev] [Bug 2795] Add commit, rollback, close to DBServer object In-Reply-To: Message-ID: <200903201807.n2KI7t37021424@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2795 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 14:07 EST ------- (In reply to comment #2) > I think it makes sense, and works on PostgreSQL with the psycopg2 driver. > C. Great, checked in, marking as fixed. We should update the wiki once Biopython 1.50 is out... 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Mar 20 18:52:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 14:52:44 -0400 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200903201852.n2KIqiBO024589@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #10 from eric.talevich at gmail.com 2009-03-20 14:52 EST ------- Here's the github branch where I'm working on this bug: http://github.com/etal/biopython/tree/master I've applied the two patches attached here and converted the test script from print-and-compare to unittest. The tests pass now, but I haven't added checks for specific parsing errors, just the general PDBConstructionError raised when parsing the example file with PERMISSIVE=0. The warnings are hidden during tests, as expected, but in this branch the PDBParser warnings are noticeably more annoying during normal use. Fixing this will require more tweaking in Bio/PDB/PDBParser.py -- I'll do that in the same branch, since I don't think you'd want to merge one fix without the other. Same goes for the __debug__ protection in StructureBuilder.py. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
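The behaviour discussed in Bug 2754 (parse warnings go to stderr rather than stdout, and can be silenced or captured in unit tests) is what Python's standard `warnings` module gives by default. The following is a minimal sketch, not the Bio.PDB code: `StructureParseWarning` and `parse_occupancy` are hypothetical names invented for this example.

```python
import warnings


class StructureParseWarning(UserWarning):
    """Hypothetical warning class, defined here to keep the sketch standalone."""


def parse_occupancy(field, permissive=True):
    """Parse a numeric field, warning instead of printing when it is bad."""
    try:
        return float(field)
    except ValueError:
        if not permissive:
            raise  # strict mode: let the error propagate
        # warnings.warn writes to stderr by default, never to stdout.
        warnings.warn("Invalid occupancy %r; using 0.0" % field,
                      StructureParseWarning)
        return 0.0


# In a unit test, capture the warnings instead of letting them reach stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = parse_occupancy("n/a")
```

In normal use a caller who finds the messages too noisy can filter them by category with `warnings.simplefilter("ignore", StructureParseWarning)`.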
From bugzilla-daemon at portal.open-bio.org Fri Mar 20 20:08:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Mar 2009 16:08:37 -0400 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200903202008.n2KK8bpj029413@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-20 16:08 EST ------- (In reply to comment #10) > Here's the github branch where I'm working on this bug: > http://github.com/etal/biopython/tree/master I've had a quick look on github, and this look interesting and I hope we can get it into Biopython proper before too long. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Mar 20 20:44:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Mar 2009 20:44:34 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu> References: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com> <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> <20090320125518.GA351@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com> On Fri, Mar 20, 2009 at 12:55 PM, Brad Chapman wrote: > > Peter, the repository deletion was very quick for me, so it looks like it > got stuck somewhere with the 
GitHub downtime. They've fixed it - I picked a bad day to delete a "fork". Giovanni wrote: >> > I see that you have renamed your fork as seqio-quality. Ok, but I >> > think it is better to keep the fork's name as 'biopython', and then >> > create many branches inside it. Agreed - when I did that, I hadn't appreciated github's distinction between branches and forks. Peter wrote: >> I don't think I had entirely understood github's use of fork versus branch. >> I'll have so do some more reading and try again once my account has >> settled down. Thanks for the details in your email. Brad wrote: > Wow, now I am mad confused. I thought forks and branches were > conceptually the same. Giovanni, it seems like you are suggesting one > branch (the GitHub fork) and then a second branch (the git branch > command). We were thinking of a standard case as: > > 1. Fork the Biopython trunk at GitHub. Name this something so it > makes sense what the fork/branch is for. > 2. Work on the fork/branch. If you want, invite others to work on it > with you. > 3. When finished, be sure you are up to date with the master > Biopython trunk. > 4. Submit the fork/branch for inclusion in Biopython. > 5. Once included, delete the fork/branch. If I understand correctly, a potential contributor does this: 1. Fork Biopython trunk at GitHub, which will give you your own public repository (aka a "fork" in github's terminology), called by default contributorname/biopython, containing initially a single master branch, e.g. http://github.com/peterjc/biopython/tree/master 2. Using the git command line tool, create a branch within your repository to work on a problem, say bug2551, and upload this branch to your github account. e.g. http://github.com/peterjc/biopython/tree/bug2551 (I presume) 3. Work on your code, and commit changes to your bug2551 branch and push these up to your github account. 4. 
Once you are happy, submit this bug2551 branch for inclusion in Biopython (in the short term via Bugzilla, but if/when we have moved to github fully, as a pull request to the main biopython master, or if appropriate the master of the maintainer of that module).
5. Once the changes are in the main Biopython, you can delete the bug2551 branch (but not the whole "fork" which may contain other branches).

Almost the same... I'll try this shortly (maybe Monday).

Peter

From bugzilla-daemon at portal.open-bio.org Sat Mar 21 04:13:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 00:13:10 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files
In-Reply-To:
Message-ID: <200903210413.n2L4DAgf028509@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2009-03-21 00:13 EST -------
(In reply to comment #11)

I've changed Parser.py to show an informative error message about the missing DTD file, where most likely it can be found, and where to install it. Since this is probably the best we can do, I'm marking this bug as fixed.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Mar 21 04:24:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Mar 2009 00:24:43 -0400 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200903210424.n2L4OhOA029253@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2009-03-21 00:24 EST ------- (In reply to comment #0) > >>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml') > >>> cont = handle.read() > >>> print cont > ' > > ... >
> With Bio.Entrez currently in CVS, Entrez.read does not raise an exception, but simply returns an empty record. The problem is that EFetch from the SNP database uses an XML Schema instead of a DTD to describe the contents of the XML file, as shown in the first few lines of the XML file: The last url shows the XML Schema. All other Entrez Utilities I've seen so far use a DTD instead of an XML Schema. Hence, Entrez.read only has a DTD parser to find out how to interpret the XML file. In principle, Bio.Entrez can be modified to add an XML Schema parser. While this is not trivial, it is probably not super difficult. Marco, would you be willing to write such a parser? If you have a parser for the XML Schema, I can show you how to integrate it with Bio.Entrez. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat Mar 21 04:47:07 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Mar 2009 21:47:07 -0700 (PDT) Subject: [Biopython-dev] Bio.Entrez catching more errors In-Reply-To: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com> Message-ID: <334920.51680.qm@web62402.mail.re1.yahoo.com> I think it is good if we catch more errors in Bio.Entrez, but I think the error catching should be done by the parser, not when retrieving. As you show, NCBI Entrez returns error messages in various different formats: plain text, HTML, incorrect XML, broken XML. Since there are many ways to access NCBI Entrez, there may be other styles of error messages that we don't know about. Then there is the added complication of accessing NCBI Entrez to get information in formats other than XML, e.g. GenBank files. And all this may be changed over time by NCBI. Since the error message is ill-defined, code trying to identify error messages won't be robust. 
On the other hand, the format of files expected by a given parser is well-defined: Either the file agrees with the format expected by the parser, or it doesn't; if it doesn't, then that's an error. We may not be able to extract the exact error message returned by NCBI, but a parser for format XYZ can tell you that the file is not in format XYZ. Maybe the XML parser can say it doesn't look like an XML file, but that's about it. Once NCBI Entrez starts to return errors in a uniform format, we can modify our parsers to find out the exact error message. Until that happens, trying to do so on our side will not be robust. --Michiel --- On Tue, 3/10/09, Peter wrote: > From: Peter > Subject: [Biopython-dev] Bio.Entrez catching more errors > To: "BioPython-Dev Mailing List" > Date: Tuesday, March 10, 2009, 7:40 PM > Hi All, > > It occured to me that the Bio.Entrez._open function can > look at the > retmode argument (if present) and spot if there is a > mismatch between > the requested format (e.g. XML, HTML, text or asn.1) and > the actual > data the NCBI returned. Something along the following > lines could be > added to the end of the _open function in > Bio/Entrez/__init__.py to > acheive this: > > elif "retmode" in params and > params["retmode"].lower()=="html" \ > and not data.lower().startswith(" \ > and not data.lower().startswith(" html") : > raise TypeError("Requested HTML, but > didn't get it: %s..." % data) > elif "retmode" in params and > params["retmode"].lower()=="xml" \ > and not data.lower().startswith(" raise TypeError("Requested XML, but didn't > get it: %s..." % data) > elif "retmode" in params and > params["retmode"] \ > and > params["retmode"].lower()!="xml" \ > and data.lower().startswith(" raise TypeError("Didn't request XML, but > got it: %s..." 
% data) > elif "retmode" in params and > params["retmode"] \ > and > params["retmode"].lower()!="html" \ > and (data.lower().startswith(" \ > data.lower().startswith(" html")): > #Expected for some error pages (e.g. the Bad > Gateway caught above) > raise TypeError("Didn't request HTML, but > got it: %s..." % data) > > I'm sure my XML/HTML detection could be made more > robust here - I hope > the principle is clear. My motivation is that I have > noticed the NCBI > can return HTML error pages, and while we do catch some of > these > explicitly (e.g. Bad Gateway, or Service Unavailable), I > think any > HTML page when the user asked from XML, text or asn.1 > should be > treated as error. Similarly, not getting XML when you ask > for it etc. > > Note that by raising the exception including the message > text it > should be much easier to diagnose these failures. As a > tiny > refinement to the above code, we should only add the > "..." if there is > more text to follow - this isn't always the case. > > e.g. 
The following give an HTML error page (while some > databases like > "protein" are better behaved in this respect): > >>> print Entrez.efetch(db="homologene", > id="nonexistant", retmode="text").read() > >>> print Entrez.efetch(db="homologene", > id="nonexistant", > retmode="asn.1").read() > > Similarly, these give an XML like fragment (which is not a > valid XML > file in itself - arguably an NCBI bug; some databases like > "protein" > are better behaved in this respect): > >>> print Entrez.efetch(db="pubmed", > id="nonexistant", retmode="xml").read() > >>> print Entrez.efetch(db="homologene", > id="nonexistant", retmode="xml").read() > >>> print Entrez.efetch(db="cdd", > id="nonexistant", retmode="xml").read() > >>> print Entrez.efetch(db="taxonomy", > id="nonexistant", retmode="xml").read() > > My suggested change to Bio.Entrez would also catch the > following > examples (using an invalid database) where the NCBI ignore > the retmode > and return an HTML help page: > >>> print > Entrez.efetch(db="nonexistant", > id="123456", retmode="xml").read() > >>> print > Entrez.efetch(db="nonexistant", > id="123456", retmode="text").read() > > In a less clear cut example, this would flag the following > as an error > as the NCBI seem to return ASN.1 text instead of HTML > here:: > >>> print Entrez.efetch(db="nucleotide", > retmode="html", id="123456").read() > > Overall, I think this change should catch lots of errors > which > otherwise may not be detected until later (e.g. while > trying to parse > the file). > > -------------------------------------------------------------------------------------------------- > > On another point, should we catch these responses as > errors:? > > >>> efetch(db="snp", > id="123456").read() > 'PmFetch > response\n
\n1:
> id: 123456 Error occurred: cannot get document
> summary\n
' > >>> efetch(db="snp", > id="123456", retmode="html").read() > 'PmFetch > response\n
\n1:
> id: 123456 Error occurred: cannot get document
> summary\n
' > >>> efetch(db="snp", > id="123456", retmode="xml").read() > ' version="1.0"?>\n xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1: > id: 123456 Error occurred: cannot get document > summary\n\n' > >>> efetch(db="snp", > id="123456", retmode="text").read() > '1: id: 123456 Error occurred: cannot get document > summary\n' > > and, > >>> print efetch(db="homologene", > retmode="html", id="fake").read() > > >

Error occurred: Empty id list - > nothing todo

... > > Looking for the string "Error occurred: " looks > fairly safe here, and > should cover a range of entries. Of course, you can > imagine false > positives too, e.g. a valid PUBMED plain text record for a > tutorial > article with a title like "Yikes! An Error Occurred: A > beginner's > Guide To Defensive Programming." could match. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mjldehoon at yahoo.com Sat Mar 21 04:54:08 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Mar 2009 21:54:08 -0700 (PDT) Subject: [Biopython-dev] Bio.Enzyme (was: Re: Bio.ExPASy) In-Reply-To: <76595.11423.qm@web62404.mail.re1.yahoo.com> Message-ID: <517737.76119.qm@web62403.mail.re1.yahoo.com> I've created a simplified version of the parser in Bio.Enzyme in Bio.ExPASy.Enzyme. The idea behind it is to collect all parsers related to ExPASy databases in Bio.ExPASy so that they can be found more easily by users. Bio.ExPASy.Enzyme works essentially the same as Bio.Enzyme, but I've done a few things a bit differently. The biggest change is probably that Bio.Enzyme stores information as attributes to a record, whereas Bio.ExPASy.Enzyme has a Record derived from a dictionary, and stores information in the dictionary (same as Bio.Medline). Does anybody have any objection if Bio.ExPASy.Enzyme becomes the "official" parser for ExPASy's Enzyme database? If not, I'll modify the documentation and tests accordingly, and start the deprecation process for Bio.Enzyme. --Michiel --- On Sun, 3/15/09, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: [Biopython-dev] Bio.ExPASy > To: biopython-dev at biopython.org > Date: Sunday, March 15, 2009, 6:24 AM > Hi everybody, > > As discussed previously, I have moved the Bio.Prosite code > to Bio.ExPASy, and I've added a ScanProsite module to > Bio.ExPASy. 
I guess Bio.Enzyme should also move to > Bio.ExPASy. See > > http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html > > for the documentation of Biopython as currently in CVS. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Sat Mar 21 05:05:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Mar 2009 01:05:19 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200903210505.n2L55Jb0031713@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #8 from eric.talevich at gmail.com 2009-03-21 01:05 EST ------- Marco & Peter, have either of you applied these patches to a git branch yet? My branch for Bug 2754 and related changes also converts test_PDB.py to unittest. (I silence the warnings by calling warnings.simplefilter('ignore') in the setUp method.) I'd like to try cherry-picking this commit if it's available on github. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat Mar 21 05:33:42 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 Mar 2009 22:33:42 -0700 (PDT) Subject: [Biopython-dev] biopython on github In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu> Message-ID: <587027.97686.qm@web62408.mail.re1.yahoo.com> > Which parts of this fall out of "standard" git > practice? In general, > we should strive to keep this as simple as possible. If > using Git is > complicated then we are losing a lot of our advantage over > CVS/patches. 
I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though. --Michiel. From idoerg at gmail.com Sat Mar 21 05:55:36 2009 From: idoerg at gmail.com (Iddo Friedberg) Date: Fri, 20 Mar 2009 22:55:36 -0700 Subject: [Biopython-dev] It's out! Message-ID: <49C48158.9060004@gmail.com> I'm first to announce this.... hehehe http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp163v1 -- Iddo Friedberg Ph.D. Atkinson Hall MC 0446 University of California San Diego 9500 Gilman Dr. La Jolla, CA 92093-0446 USA http://iddo-friedberg.net From dalloliogm at gmail.com Sat Mar 21 13:57:54 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 21 Mar 2009 14:57:54 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com> References: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com> <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com> <20090317213414.GK57054@sobchak.mgh.harvard.edu> <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com> <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com> <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com> <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com> <20090320125518.GA351@sobchak.mgh.harvard.edu> <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com> Message-ID: <5aa3b3570903210657v46b1b1bbj80c013b83ff635e3@mail.gmail.com> On Fri, Mar 20, 2009 at 9:44 PM, Peter wrote: > If I understand correctly, a potential contributor does this: > 1. 
Fork Biopython trunk at GitHub, which will give you your own > public repository (aka a "fork" in github's terminology), called > by default contributorname/biopython, containing initially a > single master branch, e.g. > http://github.com/peterjc/biopython/tree/master > 2. Using the git command line tool, create a branch within your > repository to work on a problem, say bug2551, and upload this > branch to your github account. e.g. > http://github.com/peterjc/biopython/tree/bug2551 (I presume) > 3. Work on your code, and commit changes to your bug2551 branch > and push these up to your github account. > 4. Once you are happy, submit this bug2551 branch for inclusion in > Biopython (in the short term via Bugzilla, but if/when we have moved > to github fully, as a pull request to the main biopython master, > or if appropriate the master of the maintainer of that module). > 5. Once the changes are in the main Biopython, you can delete > the bug2551 branch (but not the whole "fork" which may contain > other branches). Yes, I think this is the procedure. It is a good idea to create a branch with a bug's name, so more people can work at the same time on the same fix. -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Sat Mar 21 14:32:41 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Mar 2009 10:32:41 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200903211432.n2LEWfXP000985@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #9 from dalloliogm at gmail.com 2009-03-21 10:32 EST ------- (In reply to comment #8) > Marco & Peter, have either of you applied these patches to a git branch yet? My > branch for Bug 2754 and related changes also converts test_PDB.py to unittest. > (I silence the warnings by calling warnings.simplefilter('ignore') in the setUp > method.) 
I'd like to try cherry-picking this commit if it's available on > github. ok... Is your branch this one: - http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 ? This was my proposal: - http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py I have structured the unittest in a different way, so every test case represents a pdb file with some known values for PDB exposure etc..: but the result should be the same. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Sat Mar 21 14:40:05 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 21 Mar 2009 15:40:05 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com> References: <20090320125518.GA351@sobchak.mgh.harvard.edu> <587027.97686.qm@web62408.mail.re1.yahoo.com> Message-ID: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com> On Sat, Mar 21, 2009 at 6:33 AM, Michiel de Hoon wrote: > >> Which parts of this fall out of "standard" git >> practice? In general, >> we should strive to keep this as simple as possible. If >> using Git is >> complicated then we are losing a lot of our advantage over >> CVS/patches. > > I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. ok, but I assure you if you don't want to learn the advanced features it can be used as you did with cvs. The only difference, maybe, is that you work with a local copy (offline) and push the changes only when you are sure about them. 
If you keep a mirror on github to collect patches and enhancements, it has some advantages: - more than one person can work on a patch at the same time - it is a lot easier to create customized branches of biopython. So if someone needs to create a custom version of biopython for their own purposes, it will always be easy to keep it compatible with the official code. - people can play with the code and propose enhancements, without having to ask for write access. This means that more people can become familiar with biopython's code and propose fixes. Have a look at this video, which shows that the Ruby On Rails project grew more quickly after it moved to github: - http://python.genedrift.org/2009/03/15/ror-commits/ (the jump should be at minute 5.10 or so) > I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though. Let's say I want to propose a patch to biopython. One of you developers will probably need to look at it and propose some changes to adapt it to the rest of biopython. Isn't this the situation you are describing (multiple developers working on interrelated parts of the code)? Another example is the popgen module. Since it is a pretty big module, and independent from the rest, an 'experimental popgen branch' of biopython was created, based on what was the latest biopython cvs at the time. However, in the time that has passed since this branch was created, biopython's cvs has changed: so maybe now the experimental popgen branch is no longer compatible with the official code, if some module or convention has changed. So, git and github make it easier to create a new development branch and keep it compatible with the original one. > --Michiel. 
> > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From eric.talevich at gmail.com Sat Mar 21 15:23:56 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 21 Mar 2009 11:23:56 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <200903211432.n2LEWfXP000985@portal.open-bio.org> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> Message-ID: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> On Sat, Mar 21, 2009 at 10:32 AM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2759 > > > ------- Comment #9 from dalloliogm at gmail.com 2009-03-21 10:32 EST ------- > (In reply to comment #8) > > Marco & Peter, have either of you applied these patches to a git branch > yet? My > > branch for Bug 2754 and related changes also converts test_PDB.py to > unittest. > > (I silence the warnings by calling warnings.simplefilter('ignore') in the > setUp > > method.) I'd like to try cherry-picking this commit if it's available on > > github. > > ok... Is your branch this one: > - > > http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 > ? > > > This was my proposal: > - > > http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py > > > I have structured the unittest in a different way, so every test case > represents a pdb file with some known values for PDB exposure etc..: but > the > result should be the same. > > Oh, I see now that these are meant to be separate files. Yes, that's my branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the NeighborSearch test moved elsewhere. In that case, there's no merging problem here, and the only change needed in test_PDBexposure.py is to silence the warnings... right? 
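For reference, a minimal sketch of the setUp approach discussed above. This is not the actual test_PDBexposure.py code. It assumes a modern Python 3 rather than the Python 2.3 targeted in this thread, and parse_with_warnings and QuietParserTest are hypothetical stand-ins for PDBParser and the real test case. warnings.catch_warnings is used so that the 'ignore' filter is undone after each test instead of leaking into the rest of the test run:

```python
import unittest
import warnings


def parse_with_warnings(text):
    """Hypothetical stand-in for a permissive parser such as PDBParser."""
    fields = text.split()
    if "?" in fields:
        # A permissive parser warns about odd input but keeps going.
        warnings.warn("unknown field in record: %r" % text)
    return fields


class QuietParserTest(unittest.TestCase):
    def setUp(self):
        # Silence warnings for this test only; the saved filter state is
        # restored by the cleanup once the test method has finished.
        catcher = warnings.catch_warnings()
        catcher.__enter__()
        self.addCleanup(catcher.__exit__, None, None, None)
        warnings.simplefilter("ignore")
        # Re-parse in setUp so each test method gets a fresh structure.
        self.fields = parse_with_warnings("ATOM ? CA")

    def test_fields_parsed(self):
        self.assertEqual(self.fields, ["ATOM", "?", "CA"])

    def test_structure_is_fresh(self):
        # Mutating the parsed data here cannot leak into other tests.
        self.fields.append("extra")
        self.assertEqual(len(self.fields), 4)
```

Saved as a test module, this can be run with "python -m unittest". Parsing in setUp repeats the work for every test method, which is the trade-off mentioned in this thread: slower, but no decorator is needed and every test starts from a clean object.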
From dalloliogm at gmail.com Sat Mar 21 16:14:45 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 21 Mar 2009 17:14:45 +0100 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> Message-ID: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich wrote: > On Sat, Mar 21, 2009 at 10:32 AM, wrote: > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 >> >> >> ------- Comment #9 from dalloliogm at gmail.com 2009-03-21 10:32 EST ------- >> (In reply to comment #8) >> > Marco & Peter, have either of you applied these patches to a git branch >> yet? My >> > branch for Bug 2754 and related changes also converts test_PDB.py to >> unittest. >> > (I silence the warnings by calling warnings.simplefilter('ignore') in the >> setUp >> > method.) I'd like to try cherry-picking this commit if it's available on >> > github. >> >> ok... Is your branch this one: >> - >> >> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 >> ? >> >> >> This was my proposal: >> - >> >> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py >> >> >> I have structured the unittest in a different way, so every test case >> represents a pdb file with some known values for PDB exposure etc..: but >> the >> result should be the same. >> >> > > Oh, I see now that these are meant to be separate files. Yes, that's my > branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the > NeighborSearch test moved elsewhere. In that case, there's no merging > problem here, and the only change needed in test_PDBexposure.py is to > silence the warnings... right? Well, it also depends on what Peter thinks. 
Mine was only a proof of concept to see if the unittest could be refactored in that way. In principle, it should be equivalent to the original one and execute the same tests. If you want to use it, the problem is that it makes use of a decorator function (@classmethod) which is not supported by earlier versions of python. This can be resolved by moving all the instructions in setUpAll into setUp, like here: - http://github.com/dalloliogm/biopython/commit/83864b8a1269aaf52ac193d7bf9ed9ca5edc5a30 (however, this way the setUp instructions - like opening and parsing the PDB file - will be repeated for every test). -- My blog on bioinformatics (now in English): http://bioinfoblog.it From eric.talevich at gmail.com Sat Mar 21 17:13:52 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 21 Mar 2009 13:13:52 -0400 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> Message-ID: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com> On Sat, Mar 21, 2009 at 12:14 PM, Giovanni Marco Dall'Olio < dalloliogm at gmail.com> wrote: > On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich > wrote: > > On Sat, Mar 21, 2009 at 10:32 AM, >wrote: > > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 > >> > >> > >> ok... Is your branch this one: > >> - > >> > http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197 > >> ? 
> >> > >> > >> This was my proposal: > >> - > >> > http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py > >> > > > If you want to use it, the problem is that it make use of a decorator > function (@classmethod) which is not supported by earlier versions of > python. > > Decorators and @classmethod were added in Python 2.4. Since support for Python 2.3 is being dropped after the release of BioPython 1.50 (I believe), it should be safe to apply the decorator to post-1.50 branches. If this needs to be in 1.50, the older way of "mymethod = classmethod(mymethod)" would work fine in Py2.3, although I personally would just move the PDB loading steps to setUp, since the parser is pretty quick and the code for that is easy to read. I'll finish up my work on Bug 2754 and merge/rebase it before trying to integrate this code -- that should bring the parse warnings under control and make it easier for Peter to dispatch this bug. From biopython at maubp.freeserve.co.uk Sat Mar 21 21:16:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 21 Mar 2009 21:16:43 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com> References: <20090320125518.GA351@sobchak.mgh.harvard.edu> <587027.97686.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00903211416r457e303bnc0515b576bbe6c9a@mail.gmail.com> On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon wrote: > I haven't been following this topic closely, and as an > "outsider" using git seems more complicated than using > cvs or svn. And to be honest, I don't know if Biopython > actually needs the branching and forking stuff. I think > that this is more useful for bigger projects, where > multiple developers may be working on interrelated > parts of code at the same time. That hardly ever > happens in Biopython, though. Certainly git and github is much more powerful, and therefore more complicated. There is no denying that. 
However, if we move to git on github, I would expect those of us with CVS access to all be given write access to the official Biopython branch (probably using the collaborators feature). If that is done, I think you won't find things so different from now. That is, initially at least, it would be business as usual - our core official developers would be trusted to work directly on the main branch as now (with discussions before commits as appropriate), and do not have to worry about forking/branching etc (unless they want to). In terms of the actual command(s) you'd have to type in at the terminal to commit a change to the online repository, this goes from one step: cvs commit -m "Comment here" file1.py file2.py ... to two steps. First you have to commit changes locally (to git on your personal machine) and then push them to the main Biopython branch on the public server (on github). Once I'm back at work where I have git installed, I'll write this up on the wiki - assuming Brad doesn't beat me to it ;) The big change is for non-core developers, i.e. potential contributors (like Eric who is currently trying some Bio.PDB changes). For them, using git allows them to work on their changes and keep in sync with the master repository with much more ease. Peter From chris.lasher at gmail.com Sun Mar 22 02:33:11 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Sat, 21 Mar 2009 22:33:11 -0400 Subject: [Biopython-dev] Help pages in Biopython wiki In-Reply-To: References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> Message-ID: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com> On Fri, Mar 20, 2009 at 4:42 AM, Leighton Pritchard wrote: > Hi Chris, > > That page doesn't exist, yet (click on the 'page' tab to see this), and no > pages link to it (see here: > http://biopython.org/wiki/Special:WhatLinksHere/Help) > > What help were you expecting to see there? 
Hi Leighton, I'm fairly certain there are pages one can install with a MediaWiki instance that provide the standard help. They look like this: http://www.mediawiki.org/wiki/Help:Contents They contain the standard documentation about how to edit, format, create new pages, etc. Useful things for new community members and people like me who forget the nuances of each wiki software's markup language from time to time. :-) Chris From biopython at maubp.freeserve.co.uk Sun Mar 22 10:18:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 10:18:49 +0000 Subject: [Biopython-dev] Help pages in Biopython wiki In-Reply-To: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com> References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com> <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com> Message-ID: <320fb6e00903220318g7e214c8bmf1e6012e5db505fd@mail.gmail.com> On Sun, Mar 22, 2009 at 2:33 AM, Chris Lasher wrote: > Hi Leighton, > > I'm fairly certain there are pages one can install with a MediaWiki instance > that provide the standard help. They look like this: > http://www.mediawiki.org/wiki/Help:Contents > > They contain the standard documentation about how to edit, format, create > new pages, etc. Useful things for new community members and people like me > who forget the nuances of each wiki software's markup language from time to > time. :-) > > Chris I'm glad Leighton asked - otherwise I would have. Would it suffice to create a manual help page, saying this is a wiki and we are happy for people to create their own account to fix any minor errors they spot, and just link to http://www.mediawiki.org/wiki/Help:Contents for help? 
Peter From biopython at maubp.freeserve.co.uk Sun Mar 22 10:51:17 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 10:51:17 +0000 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com> References: <200903211432.n2LEWfXP000985@portal.open-bio.org> <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com> <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com> <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com> Message-ID: <320fb6e00903220351u53563f03m4c54359278c5b7f0@mail.gmail.com> On Sat, Mar 21, 2009 at 5:13 PM, Eric Talevich wrote: > Giovanni wrote: >> If you want to use it, the problem is that it makes use of a decorator >> function (@classmethod) which is not supported by earlier versions of >> python. > > Decorators and @classmethod were added in Python 2.4. Since support for > Python 2.3 is being dropped after the release of BioPython 1.50 (I believe), > it should be safe to apply the decorator to post-1.50 branches. If this > needs to be in 1.50, the older way of "mymethod = classmethod(mymethod)" > would work fine in Py2.3, although I personally would just move the PDB > loading steps to setUp, since the parser is pretty quick and the code for > that is easy to read. Extra PDB unit tests would be nice to have in Biopython 1.50, which means they must work on Python 2.3, so no decorators please. I agree with Eric that it is simpler just to use setUp for PDB file parsing. Yes, it is slower as for each test method the PDB file is reloaded - but you also make sure it is a clean object structure, which is important as some operations we will be testing may change the object. e.g. 
HSExposure: http://bugzilla.open-bio.org/show_bug.cgi?id=2759#c4 Peter From biopython at maubp.freeserve.co.uk Sun Mar 22 10:44:42 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 10:44:42 +0000 Subject: [Biopython-dev] Bio.Entrez catching more errors In-Reply-To: <334920.51680.qm@web62402.mail.re1.yahoo.com> References: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com> <334920.51680.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com> On Sat, Mar 21, 2009 at 4:47 AM, Michiel de Hoon wrote: > > I think it is good if we catch more errors in Bio.Entrez, but I think > the error catching should be done by the parser, not when > retrieving. We could do that - maybe some common functions for checking the first line to see if it looks like HTML or XML would help. It means lots of changes to lots of parsers, but would help outside the use case of Bio.Entrez - so this is perhaps worth doing anyway. What about the fairly common situation (at least, it's something I've done fairly often) where Bio.Entrez.efetch() is used to fetch records which are saved directly to file without verification - e.g. to be parsed by another program? Unless the error is caught in Bio.Entrez.efetch() it may be out of our control. > As you show, NCBI Entrez returns error messages in various > different formats: plain text, HTML, incorrect XML, broken XML. > Since there are many ways to access NCBI Entrez, there may > be other styles of error messages that we don't know about. > Then there is the added complication of accessing NCBI Entrez > to get information in formats other than XML, e.g. GenBank files. > And all this may be changed over time by NCBI. > > Since the error message is ill-defined, code trying to identify > error messages won't be robust. All very true. But the main point in my original email was on something slightly different... 
> On the other hand, the format of files expected by a given > parser is well-defined: Either the file agrees with the format > expected by the parser, or it doesn't; if it doesn't, then that's > an error. It's not that simple - we are often dealing with loosely defined file formats, and you may be able to reasonably interpret one file in several different formats (giving different/incorrect data). Some parsers are very tolerant at the moment, for example GenBank files can have a legitimate free format comment before the records, so the parser skips anything until it recognizes a GenBank locus id line. > We may not be able to extract the exact error message > returned by NCBI, but a parser for format XYZ can tell > you that the file is not in format XYZ. Some parsers may be able to do this, but not all. > Maybe the XML parser can say it doesn't look like an > XML file, but that's about it. This is an easy case because XML is so strictly defined. Spotting a non-XML file is pretty trivial. > Once NCBI Entrez starts to return errors in a uniform > format, we can modify our parsers to find out the > exact error message. Until that happens, trying to do > so on our side will not be robust. I agree that pulling out error messages (the second half of my original email in the thread) is error prone. You might argue that catching any errors is still worthwhile, as long as there are no false positives. The first half of the email (the main point) was based on a special case: HTML and XML are pretty easy to identify. If you ask for HTML and don't get it, it is an error (and vice versa). If you ask for XML and don't get it, it is an error (and vice versa). The fact that the NCBI currently often returns an HTML or XML error page when a plain text format was requested is then easily detected as an error (simply from the file type). This will still work even if the NCBI do change their error formats or wording - it should be pretty robust. 
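The file-type check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names sniff_markup and check_response are hypothetical (they are not part of Bio.Entrez), and the test looks only at the first few characters of the returned data.

```python
def sniff_markup(first_chunk):
    """Guess whether the start of a response is XML, HTML or neither.

    Hypothetical helper for illustration only; returns "xml", "html" or None.
    """
    text = first_chunk.lstrip().lower()
    if text.startswith("<!doctype html") or text.startswith("<html"):
        return "html"
    if text.startswith("<?xml") or text.startswith("<!doctype"):
        return "xml"
    return None


def check_response(first_chunk, expected):
    """Raise ValueError if the data does not match the requested type.

    expected is "xml", "html" or "text" (any plain text format such as
    GenBank or FASTA). Only the file type is checked; no attempt is made
    to extract the wording of an error message.
    """
    found = sniff_markup(first_chunk)
    if expected == "text":
        if found is not None:
            raise ValueError("Asked for plain text but got %s" % found.upper())
    elif found != expected:
        raise ValueError("Asked for %s but got %s"
                         % (expected.upper(), (found or "plain text").upper()))
```

For example, check_response("<html><body>Error", "text") would raise a ValueError, while a GenBank record starting with a LOCUS line would pass. Because nothing depends on the wording of the error page, a change in the NCBI error text would not affect it.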
Peter From bugzilla-daemon at portal.open-bio.org Sun Mar 22 11:36:38 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 22 Mar 2009 07:36:38 -0400 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200903221136.n2MBacSc000608@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-22 07:36 EST ------- I had a thought last night about this - how about we keep PERMISSIVE=1 as the default but offer a "very permissive" mode: PERMISSIVE=2 (or more), silently ignore problems, continue parsing. PERMISSIVE=1 (or True), use stderr via the warnings module, continue parsing. PERMISSIVE=0 (or False), raise exceptions, halt parsing. It would offer an alternative way to silence the warnings in the unit tests, and could be controlled at the level of individual tests - for example where we want to make sure certain errors are caught. It might also be useful in ordinary scripts. From tiagoantao at gmail.com Sun Mar 22 11:50:50 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sun, 22 Mar 2009 11:50:50 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com> References: <20090320125518.GA351@sobchak.mgh.harvard.edu> <587027.97686.qm@web62408.mail.re1.yahoo.com> Message-ID: <6d941f120903220450y4005b63bvd23dcb4981edec7b@mail.gmail.com> On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon wrote: > I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. 
I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though. I would actually take this argument and reverse it: The reason why biopython has been a small project, and above all, slow to develop and innovate is excessive centralization. Using a distributed technology allows for people to try new ideas and to get things moving (while still maintaining an official rock stable version with maybe glacial policies). Let's not kid ourselves: biopython lacks a lot of stuff that is fundamental in modern computational biology. The current status quo is essentially maintaining a frozen set of functionality (most new code is really just code cleanup and optimization). While I would be cautious with a distributed environment and would agree that checks have to be put in place to assure that the official product is rock solid, has documentation and is reasonably future proof, I nonetheless warmly welcome this new development. It is also good, for a change, to have an active discussion on the list: Now this actually seems like a proper, live community. 
Tiago From eric.talevich at gmail.com Sun Mar 22 15:25:23 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 22 Mar 2009 11:25:23 -0400 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: <200903221136.n2MBacSc000608@portal.open-bio.org> References: <200903221136.n2MBacSc000608@portal.open-bio.org> Message-ID: <3f6baf360903220825g2b871432yba5749dab4c2ba34@mail.gmail.com> On Sun, Mar 22, 2009 at 7:36 AM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2754 > > ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-22 07:36 EST ------- > I had a thought last night about this - how about we keep PERMISSIVE=1 as > the default but offer a "very permissive" mode: > > PERMISSIVE=2 (or more), silently ignore problems, continue parsing. > PERMISSIVE=1 (or True), use stderr via the warnings module, continue > parsing. > PERMISSIVE=0 (or False), raise exceptions, halt parsing. > > It would offer an alternative way to silence the warnings in the unit > tests, and could be controlled at the level of individual tests - for > example where we want to make sure certain errors are caught. > > It might also be useful in ordinary scripts. > > I like the idea. I still have to comb through the documentation for the warnings module some more, but I think it should be possible to do all of this through that API -- loading PERMISSIVE=0 turns the warnings into full exceptions, =1 makes them messages on stderr, and =2 switches them off. At some point I'd like to make a script called something like pdbtidy.py which parses a potentially not-quite-conformant PDB file in a permissive mode, lists all complaints (including things like discontinuously-numbered residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed version of the file. The model for this is HTML Tidy. Do you think this would have a place in the Biopython distribution? 
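As a rough illustration of the three PERMISSIVE levels expressed through the standard warnings module, something like the sketch below might work. The names ParseWarning and configure_permissive are hypothetical and only used for this example; they are not existing Bio.PDB names, and the real parser might well route things differently.

```python
import warnings


class ParseWarning(UserWarning):
    """Hypothetical warning category for PDB parsing problems."""


def configure_permissive(level):
    """Install a warnings filter matching the proposed PERMISSIVE levels.

    level >= 2     : silently ignore problems, continue parsing
    level 1 / True : report on stderr via the warnings module, continue
    level 0 / False: turn the warning into an exception, halt parsing
    """
    if level >= 2:
        action = "ignore"
    elif level == 1:
        action = "always"
    else:
        action = "error"
    # simplefilter puts the new rule in front of any existing filters
    warnings.simplefilter(action, ParseWarning)


def report_problem(message):
    """What the parser would call on finding a problem in the file."""
    warnings.warn(message, ParseWarning)
```

A unit test that wants a problem to be fatal could call configure_permissive(0) and then expect ParseWarning to be raised, while a noisy test could use level 2. Wrapping the call in warnings.catch_warnings() would restore the previous filters afterwards.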
From biopython at maubp.freeserve.co.uk Sun Mar 22 15:53:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 22 Mar 2009 15:53:21 +0000 Subject: [Biopython-dev] PDB tidy script, was: [Bug 275 Message-ID: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com> On Bug 2754 comment 12, I wrote: http://bugzilla.open-bio.org/show_bug.cgi?id=2754#c12 >> I had a thought last night about this - how about we keep PERMISSIVE=1 >> as the default but offer a "very permissive" mode: >> >> PERMISSIVE=2 (or more), silently ignore problems, continue parsing. >> PERMISSIVE=1 (or True), use stderr via the warnings module, continue >> parsing. >> PERMISSIVE=0 (or False), raise exceptions, halt parsing. >> >> It would offer an alternative way to silence the warnings in the unit >> tests, and could be controlled at the level of individual tests - for >> example where we want to make sure certain errors are caught. >> >> It might also be useful in ordinary scripts. Eric replied: > I like the idea. I still have to comb through the documentation for the > warnings module some more, but I think it should be possible to do all of > this through that API -- loading PERMISSIVE=0 turns the warnings into full > exceptions, =1 makes them messages on stderr, and =2 switches them off. It doesn't really matter - all the PDB construction warnings/errors go through _handle_PDB_exception(), so this would be the least invasive place to implement this. > At some point I'd like to make a script called something like pdbtidy.py > which parses a potentially not-quite-conformant PDB file in a permissive > mode, lists all complaints (including things like discontinuously-numbered > residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed > version of the file. The model for this is HTML Tidy. Do you think this > would have a place in the Biopython distribution? It sounds useful to me, it can probably go in the scripts subdirectory, along with the PDB surface exposure script. 
One drawback is that currently Bio.PDB's header parsing leaves a lot to be desired, and very little of the header is output when saving a PDB file (Thomas' focus is/was very much on the 3D data). Peter From lpritc at scri.ac.uk Mon Mar 23 09:02:53 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Mon, 23 Mar 2009 09:02:53 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com> Message-ID: On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" wrote: > Have a look at this video, where it shows that the Ruby On Rails > project has grown quicker when it has moved to github: > > - http://python.genedrift.org/2009/03/15/ror-commits/ > > (the jump should be on minute 5.10 or so) I've seen this argument a couple of times, now - mostly on blogs - and I'm not sure that it's all that clear-cut. The RoR video shows a greater number of individual names associated with commits, after the move to github. However, it's not clear whether this is because a large number of individuals have suddenly decided to contribute to the project, or whether the move to a version control system in which author attribution remains with contributed code - as opposed to the bottleneck of having to be submitted with the id of someone with write access - is responsible. I don't think there's enough evidence to say 'the move to github caused an increase in contributions'. As a counter-example, the number of people who have recorded contributions to Biopython code is 46 (from the CONTRIB file on CVS). I don't think that there are that many ids associated with committing the codebase on there. My name's only associated with GenomeDiagram in the commit comments, not as an author/committer of the code - at least, as far as the CVS application is concerned - for example. This might change with git. Of course, I might be misunderstanding git's attribution model, or how the stats for that RoR video were compiled... L. 
-- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 
From p.j.a.cock at googlemail.com Mon Mar 23 10:14:10 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 Mar 2009 10:14:10 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: References: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com> Message-ID: <320fb6e00903230314y212be042gfd2f0b86f8738f2d@mail.gmail.com> On Mon, Mar 23, 2009 at 9:02 AM, Leighton Pritchard wrote: > On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" > wrote: > >> Have a look at this video, where it shows that the Ruby On Rails >> project has grown quicker when it has moved to github: >> >> - http://python.genedrift.org/2009/03/15/ror-commits/ >> >> (the jump should be on minute 5.10 or so) > > I've seen this argument a couple of times, now - mostly on blogs - and I'm > not sure that it's all that clear-cut. > > The RoR video shows a greater number of individual names associated with > commits, after the move to github. However, it's not clear whether this is > because a large number of individuals have suddenly decided to contribute to > the project, or whether the move to a version control system in which author > attribution remains with contributed code - as opposed to the bottleneck of > having to be submitted with the id of someone with write access - is > responsible. I don't think there's enough evidence to say 'the move to > github caused an increase in contributions'. > > As a counter-example, the number of people who have recorded contributions > to Biopython code is 46 (from the CONTRIB file on CVS). I don't think that > there are that many ids associated with committing the codebase on there. > My name's only associated with GenomeDiagram in the commit comments, not as > an author/committer of the code - at least, as far as the CVS application is > concerned - for example. This might change with git. 
Of course, I might be > misunderstanding git's attribution model, or how the stats for that RoR > video were compiled... Leighton has a good point about the attribution, and the dangers in over-interpreting such a video. With git/github it will be easier to see who contributed patches (if a patch is pulled into another branch, both the person doing the merge and the person who originally checked in the patch get recorded), and that may indirectly encourage more contributions. As Leighton points out, we do try and give credit now in CVS commit comments, but these are checked in by a core developer. I imagine this happened with RoR, but compiling this information for that video would probably have been too much work. As well as switching tools, you are also changing the metric. Something else to consider is how you are measuring activity: the git and github documentation and press encourage people to commit more often - for example while working on a bug fix or a new feature, I might make three incremental commits on my local copy of the repository, before I am happy enough to push this to the online repository. This would then show as three commits, wouldn't it - but on CVS it would probably be just one. i.e. On CVS I suspect you naturally tend to get a smaller number of larger commits than with git. This difference will probably vary from person to person (I haven't counted or anything, but with CVS I think I tend to commit lots of smaller changes, while Michiel for example tends to make fewer but larger commits). i.e. If the RoR video shows a sudden jump in the number of commits, that doesn't mean more code changes. Scaling by number of lines changed would be another metric which is perhaps more robust - but has drawbacks of its own. 
Peter From eric.talevich at gmail.com Mon Mar 23 20:39:05 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 23 Mar 2009 16:39:05 -0400 Subject: [Biopython-dev] PDB tidy script, was: [Bug 275 In-Reply-To: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com> References: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com> Message-ID: <3f6baf360903231339i22438e3bia554a0b7bdda7a5d@mail.gmail.com> On Sun, Mar 22, 2009 at 11:53 AM, Peter wrote: > > One drawback is that currently Bio.PDB's header parsing leaves a lot to > be desired, and very little of the header is output when saving a PDB file > (Thomas' focus is/was very much on the 3D data). > > Peter > I haven't been on this list long enough to know -- is Thomas still supporting the PDB module? If so, would he give his blessing to some more invasive changes to the PDB module, such as unifying PDBParser and parse_pdb_header? That separation has always seemed curiously vestigial to me. Now that github gives us some flexibility with public branches, it would be nice to have a discussion on some longer-term plans for Bio.PDB. I do a fair amount of work with PDB files and PyMol at my lab, and if the Biopython core devs are open to it, I can start merging enhancements into my public branch on github. However, if there's already a plan for the module, it's obviously best for me not to publish a divergent branch. -Eric From biopython at maubp.freeserve.co.uk Mon Mar 23 21:05:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Mar 2009 21:05:21 +0000 Subject: [Biopython-dev] PDB tidy script Message-ID: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com> On Mon, Mar 23, 2009 at 8:39 PM, Eric Talevich wrote: > On Sun, Mar 22, 2009 at 11:53 AM, Peter wrote: > >> >> One drawback is that currently Bio.PDB's header parsing leaves a lot to >> be desired, and very little of the header is output when saving a PDB file >> (Thomas' focus is/was very much on the 3D data). 
>> >> Peter > > I haven't been on this list long enough to know -- is Thomas still > supporting the PDB module? If so, would he give his blessing to some more > invasive changes to the PDB module, such as unifying PDBParser and > parse_pdb_header? That separation has always seemed curiously vestigial to > me. > Now that github gives us some flexibility with public branches, it would > be nice to have a discussion on some longer-term plans for Bio.PDB. I do a > fair amount of work with PDB files and PyMol at my lab, and if the Biopython > core devs are open to it, I can start merging enhancements into my public > branch on github. However, if there's already a plan for the module, it's > obviously best for me not to publish a divergent branch. If you look back over the history, there initially was no header parsing; it was a contribution from Kristian Rother, and I would agree, it is rather disjoint from the rest of the code. One thing I personally wanted last time I was working with PDB files was to have secondary structure information (from the alpha and beta sheet lines in the header) mapped onto the residue objects automatically. And yes, Thomas is supporting the PDB module, but his time has been rather limited of late. When I asked him about some of the open enhancement requests in bugzilla recently (off list) he said we needed "a separate class to parse all the info in the header, not a slew of additions to the core parser class (which is designed to deal with the 3D data only)." I would suggest you try and get Thomas involved now for his input on the design (before you start coding), but if need be press ahead anyway for your own use, and he can always comment on your public branch. I hope the two of you can work together on this, and if/when Thomas does stand down (or delegate), you could then be in an excellent position to take over as the Bio.PDB maintainer if that's what you wanted. 
Peter From sbassi at clubdelarazon.org Tue Mar 24 06:24:38 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Tue, 24 Mar 2009 03:24:38 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files Message-ID: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> I have a .fasta file and its corresponding .qual file. I ran seqclean on the fasta file and I got a shorter .fasta file as output (that is expected). Using the .cln file from seqclean, I want to "trim" the .qual file the same way my new fasta is trimmed. I can read the cln and parse the information of "where to trim". For example, in one original sequence of 1000 bp, I may need to trim from 150 to 800. The problem is that I can't modify qual values using the new SeqIO qual parser (at least the size of the list can't be modified). I read the example in the doc, where it is cut doing something like: sub_rec = fullrec[150:800] But this works only when there is a sequence (so, when it is read as "fastq"); it doesn't work when the record is read as "qual" (because there is no sequence, and in this case I can't modify the length of the list in letter_annotations['phred_quality']; it is true that I can modify qual values in the list, but I want to modify the list size). Here is the error: Traceback (most recent call last): File "/home/sbassi/bioinfo/INTA/qualparser.py", line 18, in <module> s.letter_annotations['phred_quality'] = [0,0,0,0,10,1] File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py", line 33, in __setitem__ "strings) of length %i." % self._length) TypeError: We only allow python sequences (lists, tuples or strings) of length 5. (Note: 5 was the size of the original qual record; when I tried to set it to [0,0,0,0,10,1], I got this). So my question is: Does it make sense to allow the user to modify the size of the list in letter_annotations['phred_quality'] in qual sequences? 
I think this is a nice feature for qual SeqIO.parse. If I can modify the list size, then I can save the modified version with SeqIO.write(x,fh,"qual") and have a qual file with a new size. I am using 1.49 with new files from CVS. -- Sebastián Bassi. Diplomado en Ciencia y Tecnología. Non standard disclaimer: READ CAREFULLY. By reading this email, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From biopython at maubp.freeserve.co.uk Tue Mar 24 09:49:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 09:49:33 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> Message-ID: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> On Tue, Mar 24, 2009 at 6:24 AM, Sebastian Bassi wrote: > I have a .fasta file and its corresponding .qual file. > I run seqclean on the fasta file and I got a shorter .fasta file as > output (that is expected). Whose seqclean script are you using? If it doesn't output the trimmed qual file, can it work with FASTQ output instead? > Using the .cln file from seqclean, I want to "trim" the .qual file the > same way my new fasta is trimmed. > I can read the cln and parse the information of "where to trim". > For example, in one original sequence of 1000 bp, I may need to trim > from 150 to 800. 
> The problem is that I can't modify qual values using the new SeqIO > qual parser (at least the size of the list can't be modified). I read > the example in the doc, where it is cut doing something like: > sub_rec = fullrec[150:800] > But, this works only when there is a sequence (so, when read it as > "fastq"), but it doesn't work when the sequence is read as "qual" > (because there is no sequence ... > So my question is: Does it make sense to allow the user to modify the > size of the list in letter_annotations['phred_quality'] in qual > sequences? I think this is a nice feature for qual SeqIO.parse. This was one area of the new SeqRecord slicing I was a little unsure about - slicing a qual file's SeqRecord (or any SeqRecord with a None for the sequence). I hadn't done anything about it immediately as I couldn't think of a use case for it - so that's solved ;) One solution would be to introduce an UnknownSeq object, which would be much nicer to deal with than a None object, as it would have a length and support slicing. I've mentioned this idea before, but haven't yet put forward any actual code. This seems most elegant. Another option would be to special-case slicing a SeqRecord with a None sequence, where we'd slice its per-letter-annotation. For now, you can force this with the current code by: # Not recommended, short term hack s.letter_annotations._length = 6 s.letter_annotations['phred_quality'] = [0,0,0,0,10,1] Right now, without changing Biopython, I have another workaround for you: Use the paired reader in Bio.SeqIO.QualityIO on the untrimmed FASTA and QUAL files, which will give you SeqRecords with both the sequence and the quality - and trim these by slicing the SeqRecord. 
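To make the trimming task itself concrete, here is a small standard-library-only sketch that trims the scores in QUAL-format text to a clear range. It deliberately does not use Biopython, so none of the names below (parse_qual, trim_scores, format_qual) are Bio.SeqIO functions. It also assumes Python-style coordinates (zero based, end excluded) as in fullrec[150:800]; the coordinates in a seqclean .cln report may follow a different convention and would need converting first.

```python
def parse_qual(lines):
    """Yield (identifier, list of int scores) from QUAL-format lines."""
    name, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            parts = line[1:].split()
            name, scores = (parts[0] if parts else ""), []
        else:
            scores.extend(int(value) for value in line.split())
    if name is not None:
        yield name, scores


def trim_scores(scores, start, end):
    """Return the scores in the clear range [start, end)."""
    return scores[start:end]


def format_qual(name, scores, width=20):
    """Return one QUAL record as text, with at most width scores per line."""
    rows = [" ".join(str(score) for score in scores[i:i + width])
            for i in range(0, len(scores), width)]
    return ">%s\n%s\n" % (name, "\n".join(rows))
```

With the clear ranges held in a dictionary keyed by read name, each record from parse_qual(open("reads.qual")) could be passed through trim_scores and written back out with format_qual (the file name here is just a placeholder).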
Peter From sbassi at clubdelarazon.org Tue Mar 24 14:59:51 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Tue, 24 Mar 2009 11:59:51 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> Message-ID: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> On Tue, Mar 24, 2009 at 6:49 AM, Peter wrote: > Whose seqclean script are you using? If it doesn't output the trimmed > qual file, can it work with FASTQ output instead? I am using the seqclean found here: http://compbio.dfci.harvard.edu/tgi/software/ It doesn't output a trimmed qual file because seqclean accepts only fasta as input. Oh, wait! Looking at my seqclean directory I found a cln2qual script. So I looked at the README to see what it is, and I found: "If after seqclean one needs to trim the corresponding quality values too, according to the new coordinates or trash codes found by seqclean, the utility script "cln2qual" is included (see the usage message). It expects a fasta-like file containing space delimited quality values for each nucleotide of the original sequences. It should be run after the seqclean, as it parses the trimming ("clear range") coordinates and trash codes from the cleaning report and applies them to the quality records." So this utility does what I was about to do with Biopython. But anyway, regarding this: > This was one area of the new SeqRecord slicing I was a little unsure > about - slicing a qual file's SeqRecord (or any SeqRecord with a None > for the sequence). I hadn't done anything about it immediately as I > couldn't think of a use case for it - so that's solved ;) > One solution would be to introduce an UnknownSeq object, which .... 
I agree with the need for an UnknownSeq object for modifying the size of the qual file. Best, SB. From biopython at maubp.freeserve.co.uk Tue Mar 24 15:13:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 15:13:40 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> Message-ID: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> On Tue, Mar 24, 2009 at 2:59 PM, Sebastian Bassi wrote: > But anyway, regarding this: > >> This was one area of the new SeqRecord slicing I was a little unsure >> about - slicing a qual file's SeqRecord (or any SeqRecord with a None >> for the sequence). I hadn't done anything about it immediately as I >> couldn't think of a use case for it - so that's solved ;) >> One solution would be to introduce an UnknownSeq object, which >> .... > > I agree with the need of an UnknownSeq object for modify the size of > the qual file. Suppose you read in a qual file (or a GenBank file with no sequence, just a CONTIG line), and instead of None, the SeqRecord object(s) had a new UnknownSeq object saying they were made up of a given number of "N" characters using a DNA alphabet. What would you expect to get if you used Bio.SeqIO to write out the file in FASTA format? To my mind there are two sensible options - write out the file using the "NNN....N" sequence, or raise an error. 
Peter From biopython at maubp.freeserve.co.uk Tue Mar 24 15:23:20 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 15:23:20 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> Message-ID: <320fb6e00903240823o53267d8bn36908f001708f974@mail.gmail.com> On Tue, Mar 24, 2009 at 9:49 AM, Peter wrote: > > This was one area of the new SeqRecord slicing I was a little unsure > about - slicing a qual file's SeqRecord (or any SeqRecord with a None > for the sequence). I hadn't done anything about it immediately as I > couldn't think of a use case for it - so that's solved ;) > > One solution would be to introduce an UnknownSeq object, which > would be much nicer to deal with than a None object, as it would have > a length and support slicing. I've mentioned this idea before, but > haven't yet put forward any actual code. This seems most elegant. > > Another option would be to special case handle slicing a SeqRecord > with a None sequence, where we'd slice its per-letter-annotation. That should now be working with the change I've just checked into CVS, but the combination of slicing per-letter-annotation while the sequence is None is a real pain. I'm almost tempted to back out the qual parser for the next release (FASTQ support is fine), but let's see if we can reach a consensus on a new UnknownSeq class instead (see my earlier email on this - what would you expect to happen if you read in a QUAL file and tried to save it as a FASTA file?). 
Peter From sbassi at clubdelarazon.org Tue Mar 24 15:33:56 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Tue, 24 Mar 2009 12:33:56 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> Message-ID: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> On Tue, Mar 24, 2009 at 12:13 PM, Peter wrote: .... > characters using a DNA alphabet. What would you expect to get if you > used Bio.SeqIO to write out the file in FASTA format? To my mind there > are two sensible options - write out the file using the "NNN....N" > sequence, or raise an error. "N" is OK (with the same length of the qual file), that is what ABI does when the QV is low. This is not the same case but I always think of "N" as "unknown". Raise an error is not bad because I don't see the need to go from an non-sequence qual to a fasta (it doesn't make sense). But that I don't see the need, doesn't means someone else may have a reason. Best, -- Sebasti?n Bassi. Diplomado en Ciencia y Tecnolog?a. Non standard disclaimer: READ CAREFULLY. By reading this email, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From bugzilla-daemon at portal.open-bio.org Tue Mar 24 18:25:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 24 Mar 2009 14:25:17 -0400 Subject: [Biopython-dev] [Bug 2799] New: UnknownSeq object (e.g. for QUAL files) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2799 Summary: UnknownSeq object (e.g. for QUAL files) Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Sometimes we want to represent an unknown sequence with a known length, e.g. "N"*length for nucleotides. This enhancement is about adding an UnknownSeq object to Biopython which would have the following init arguments: * length * alphabet * character (single letter string, defaulting to "X" for protein and "N" for nucleotides, "?" otherwise) Currently the Bio.SeqIO "qual" parser produces SeqRecord objects where the seq is None, yet there is a known length. This can also occur in GenBank files where there is a CONTIG line but no sequence. This makes supporting slicing (Bug 2507) complicated. Adding a new UnknownSeq class would solve this elegantly. In general, the UnknownSeq object should act as a Seq object whose sequence is the character*length. Slicing or adding UnknownSeq objects should give a new UnknownSeq object. Complement, reverse complement, transcribe and back transcribe can also return new UnknownSeq objects of the same length (alphabet permitting). Translation can return an UnknownSeq object using "X" and a protein alphabet (with the length roughly one third of the nucleotide length - whatever is consistent with the Seq translate method).
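The length, slicing and addition behaviour described so far could be sketched roughly as below. This is only an illustration, not the code on any Biopython branch: the class name UnknownSeqSketch is made up, the alphabet argument is left out entirely, and the case of adding an object with a different character or type is simply left unhandled.

```python
class UnknownSeqSketch:
    """Hypothetical stand-in for the proposed UnknownSeq (not Biopython code)."""

    def __init__(self, length, character="?"):
        if length < 0:
            raise ValueError("length must not be negative")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        # Acts like the string character*length, built only on demand.
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # range() does the slice arithmetic without building the string.
            new_length = len(range(*index.indices(self._length)))
            return UnknownSeqSketch(new_length, self._character)
        # Integer index: range() raises IndexError when out of bounds.
        range(self._length)[index]
        return self._character

    def __add__(self, other):
        # Only the unknown + unknown case with the same character is covered;
        # anything else is left open in this sketch.
        if (isinstance(other, UnknownSeqSketch)
                and other._character == self._character):
            return UnknownSeqSketch(self._length + other._length,
                                    self._character)
        return NotImplemented


unknown = UnknownSeqSketch(10, "N")
piece = unknown[2:5]                         # still an UnknownSeqSketch
joined = unknown + UnknownSeqSketch(2, "N")  # lengths add up
```

Storing only the length and the character means slicing and joining never have to build the full string, which is the main attraction over a plain "N"*length.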
Adding an UnknownSeq object to a Seq would have to give a new Seq object (or an error?). One use-case example here would be joining together contigs with unknown regions of a given length (strings of N's). This bug is a placeholder for patches or pointers to possible implementations (e.g. I intend to try some ideas on a branch on github). I expect most of the discussion to be on the (dev) mailing list, rather than bugzilla. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Tue Mar 24 18:42:56 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 24 Mar 2009 18:42:56 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> Hi, On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman wrote: > There is a lot of good material in this thread for new potential > developers. Tiago, it would make sense to condense what you've > written and include it with the Contributing guide: Just a followup on this: I think it makes no sense to put much of the new content before there is an official step of moving to github. What I am doing is just to put up, for test purposes, a framework to see how these suggestions might work. I've created a fork http://github.com/tiagoantao/biopython-popgen-test/ with several branches. The proposed idea is: 1. The master branch should be a clearing house and stability point for things to be suggested for submission to the official branch. All code here should have unit tests, all unit tests should pass and documentation should exist.
It is also a place to correct bugs that are discovered in the official trunk (if these are simple to correct and don't require the creation of a temporary branch to sort them out) 2. There is a stats branch to work on Bio.PopGen.Stats. If you want to work on statistics you can follow/fork from the statistics branch. Any code that people might have should be discussed too if they want it to make it into the official release. 3. Less interesting to others, I will personally create a genepop branch to make an enhancement to the existing parser and on the ability to call the genepop binary. So: People work on their very personal branches (like my genepop one). Development branches that might have shared interests (like the stats one) should be forked/shared commit and people interested should discuss among themselves. Whenever some content is deemed ready it is then put on the popgen master branch (along with tests and documentation). When the master branch is in a stable state, then the changes are proposed to the official one. In my view, this protects the people working on the official thing from the potential chaos of new developments, while creating a framework which allows people to test innovations... Tiago From biopython at maubp.freeserve.co.uk Tue Mar 24 18:54:28 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Mar 2009 18:54:28 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> Message-ID: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> 2009/3/24 Tiago Antão : > Hi, > > On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman wrote: >> There is a lot of good material in this thread for new potential >> developers.
Tiago, it would make sense to condense what you've >> written and include it with the Contributing guide: > > Just a followup on this: I think it makes no sense to put much of the > new content before there is an official step of moving to github. True - but we do need enough pointers for people to help try things out. > What I am doing, is just to put, for test purposes a framework to see > how these suggestions my work.... > In my view, this protects the people working on the official thing > from the potential chaos of new developments, while creating a > framework which allow for people to test innovations... That sounds great, and a good model for other (self contained) modules under active development. I'm thinking along similar lines for Bio.SeqIO and AlignIO (and by implication, the SeqRecord and the Alignment classes). I would assume (although you didn't say this) you would also pull changes to the official trunk into your branches periodically - at very least after each official Biopython release. Peter From bartek at rezolwenta.eu.org Tue Mar 24 23:58:30 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Mar 2009 00:58:30 +0100 Subject: [Biopython-dev] history on github - where are the tags? In-Reply-To: <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> Message-ID: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> Hi all, Sorry for being quiet all that time, but the conference (+jet lag both ways) proved to be more engaging than I thought. For the tags, they were not pushed to github before, because I didn't know I need to specifically do it qith git push --tags. Now they are pushed to the repository and you can fetch them to local copies by git pull -t in any local directory which resulted from cloning the official branch. 
They probably won't get automatically transfered to derived branches, I guess you need to pull them from the original (official) branch. cheers Bartek On Wed, Mar 25, 2009 at 12:49 AM, Bartek Wilczynski wrote: > Hi all, > > Sorry for being quiet all that time, but the conference (+jet lag both > ways) proved to be more engaging than I thought. > > For the tags, they were not pushed to github before, because I didn't > know I need to specifically do it qith git push --tags. > > Now they are pushed to the repository and you can fetch them to local > copies by git pull -t in any local directory which resulted from > cloning the official branch. > > They probably won't get automatically transfered to derived branches, > I guess you need to pull > them from the original (official) branch. > > cheers > Bartek > > On Tue, Mar 17, 2009 at 10:06 AM, Peter wrote: >> Hi Bartek et al, >> >> I've just been looking over the github mirror of CVS, and wanted to >> see it presented the history of individual files. ?For example, this >> page looks at the Bio/SeqRecord.py history using ViewCVS: >> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython >> >> For comparison, in GitHub, >> http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py >> >> As you can see, all the comments and changes are there - which is >> great. ?But I can't see the CVS tag information, which I assume would >> be converting into git tags. ?Is this information present in the git >> repository, but not shown by github, or was it lost during the >> migration? ?This might seem like a little thing, but I have found it >> incredibly important for tracing bugs reported in older releases, for >> example in narrowing down when something changed. 
>> >> Peter >> > > > > -- > Bartek Wilczynski > ================== > Postdoctoral fellow > EMBL, Furlong group > Meyerhoffstrasse 1, > 69012 Heidelberg, > Germany > tel: +49 6221 387 8433 > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Wed Mar 25 10:01:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 10:01:45 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> Message-ID: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> On Tue, Mar 24, 2009 at 3:33 PM, Sebastian Bassi wrote: > On Tue, Mar 24, 2009 at 12:13 PM, Peter wrote: > .... >> characters using a DNA alphabet. What would you expect to get if you >> used Bio.SeqIO to write out the file in FASTA format? ?To my mind there >> are two sensible options - write out the file using the "NNN....N" >> sequence, or raise an error. > > "N" is OK (with the same length of the qual file), that is what ABI > does when the QV is low. This is not the same case but I always think > of "N" as "unknown". > Raise an error is not bad because I don't see the need to go from an > non-sequence qual to a fasta (it doesn't make sense). But that I don't > see the need, doesn't means someone else may have a reason. 
> Best, I've filed an enhancement bug for adding an UnknownSeq object, perhaps as part of the Bio.Seq module, Bug 2799 http://bugzilla.open-bio.org/show_bug.cgi?id=2799 I've done an initial patch (which I plan to upload on Bugzilla) which is available now on github on a new branch: http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq Note this doesn't do anything special (yet) when writing output files, so they will by default record a string of whatever unknown sequence character was used. It would make sense for GenBank/EMBL in SeqIO to also take advantage of the UnknownSeq object, because here the sequence is essentially optional (consider files with just a CONTIG line), but should always have a length. Sebastian - could you have a quick play with this github code (using the new UnknownSeq class), and the current CVS code (using None), and make sure both support the slicing operations you were trying earlier? Thanks. Peter From biopython at maubp.freeserve.co.uk Wed Mar 25 10:28:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 10:28:46 +0000 Subject: [Biopython-dev] history on github - where are the tags? In-Reply-To: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> Message-ID: <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> On Tue, Mar 24, 2009 at 11:58 PM, Bartek Wilczynski wrote: > > Hi all, > > Sorry for being quiet all that time, but the conference (+jet lag both > ways) proved to be more engaging than I thought. That's fine - sleep is important ;) > For the tags, they were not pushed to github before, because I didn't > know I need to specifically do it with git push --tags. I assume you've updated your cron job so this will happen automatically in future (e.g.
when we do Biopython 1.50 beta). > Now they are pushed to the repository and you can fetch them to local > copies by git pull -t in any local directory which resulted from > cloning the official branch. Yes, I've checked and I can get the tags with: git pull -t ... or, git pull --tags ... They also show up in github (near the top, drop down menu next to branches) and in gitx (and I assume other GUI clients). They have commit comments like "This commit was manufactured by cvs2svn to create tag 'biopython-146'", which is fine. However, all the tags seem to have associated with them the deletion of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd. If you can work out how this happened, would it be trivial to back these tags out and redo it? > They probably won't get automatically transfered to derived branches, > I guess you need to pull them from the original (official) branch. That makes sense. Peter From mjldehoon at yahoo.com Wed Mar 25 11:47:59 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 25 Mar 2009 04:47:59 -0700 (PDT) Subject: [Biopython-dev] Bio.Entrez catching more errors In-Reply-To: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com> Message-ID: <559251.50851.qm@web62401.mail.re1.yahoo.com> > What about the fairly common situation (at, its something > I've done fairly often) where Bio.Entrez.efetch() is used > to fetch records which are saved directly to file without > verification - e.g. to be parsed by another program? > Unless the error is caught in Bio.Entrez.efetch() > it may be out of our control. That is easy: just run the output returned by NCBI through the appropriate parser. If the parser is happy, proceed to save the NCBI output in a file. > The first half of the email (the main point) was based > on a special case: HTML and XML are pretty easy to > identify. If you ask for HTML and don't get it, it is > an error (and vice versa). If you ask for XML and don't > get it, it is an error (and vice versa). 
The fact that > the NCBI currently often return an HTML or XML error > page when a plain text format was requested is then > easily detected as an error (simply from the file type). > This will still work even if the NCBI do change their > error formats or wording - it should be pretty robust. Have a look at serialset.xml in the Bio.Entrez test cases ... this is the output obtained from NCBI using efetch from the journals database with retmode='xml'. The file looks like XML, but it doesn't start with " References: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com> <559251.50851.qm@web62401.mail.re1.yahoo.com> Message-ID: <320fb6e00903250515vd885b34s629dd9253d4f9186@mail.gmail.com> On Wed, Mar 25, 2009 at 11:47 AM, Michiel de Hoon wrote: > >> What about the fairly common situation (at, its something >> I've done fairly often) where Bio.Entrez.efetch() is used >> to fetch records which are saved directly to file without >> verification - e.g. to be parsed by another program? >> Unless the error is caught in Bio.Entrez.efetch() >> it may be out of our control. > > That is easy: just run the output returned by NCBI through > the appropriate parser. If the parser is happy, proceed to > save the NCBI output in a file. Possible, but you'd need to cache the handle's data in order to be able to save it after parsing. The UndoHandle doesn't do this. You could save the data to a file, and then check the parser can read it back - however, this would be complicated if you are downloading data in batches to go into a single file. >> The first half of the email (the main point) was based >> on a special case: HTML and XML are pretty easy to >> identify. ?If you ask for HTML and don't get it, it is >> an error (and vice versa). ?If you ask for XML and don't >> get it, it is an error (and vice versa). 
The fact that >> the NCBI currently often return an HTML or XML error >> page when a plain text format was requested is then >> easily detected as an error (simply from the file type). >> This will still work even if the NCBI do change their >> error formats or wording - it should be pretty robust. > > Have a look at serialset.xml in the Bio.Entrez test cases ... this > is the output obtained from NCBI using efetch from the journals > database with retmode='xml'. The file looks like XML, but it > doesn't start with " correctly, so while it's not pretty to me this would not count as > an error.

I do concede my sample code for detecting XML or HTML could be improved, and this provides a good test case for a difficult XML file. Maybe when we expect XML (or HTML), all we should check is the file starts with "<"? e.g.

    elif "retmode" in params and params["retmode"].lower()=="html" \
    and not data.lower().startswith("<") :
        raise TypeError("Requested HTML, but didn't get it: %s..." % data)
    elif "retmode" in params and params["retmode"].lower()=="xml" \
    and not data.lower().startswith("<") :
        raise TypeError("Requested XML, but didn't get it: %s..." % data)
    elif "retmode" in params and params["retmode"] \
    and params["retmode"].lower()!="xml" \
    and data.lower().startswith("

References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> Message-ID: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> On Wed, Mar 25, 2009 at 11:28 AM, Peter wrote: > I assume you've updated your cron job so this will happen > automatically in future (e.g. when we do Biopython 1.50 beta). Yes, naturally. > > However, all the tags seem to have associated with them the deletion > of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
> If you can work out how this happened, would it be trivial to back > these tags out and redo it? > That's really odd. I don't know exactly where it comes from, but I've done some detective work and here are my findings: For the AUTHORS file, it was indeed deleted in a commit by Jeff Chang (2001): http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65 Which "renames" the AUTHORS file into CONTRIB file. The AUTHORS file is in the biopython tags prior to 1.00a1 and then it should not be there anymore (it's in CVS's attic). I don't know how it came back... Similarly, the Bio/UniGene/UniGene.py file was removed by Jeff in a commit: http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2 And similarly, UniGene.py is no longer in CVS repo (but it's still in the attic). What these files have in common is that there are some commits to them after they've been moved to Attic (sic!) http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py http://github.com/biopython/biopython/commits/master/AUTHORS I don't know exactly how this could happen, but this inconsistency in CVS might be causing cvs2git to actually include these guys. I'll increase the verbosity of the log messages in my cron script, so maybe I'll see some indication of a problem. If nobody has a reason for these files to be included in the current trunk, I'll go ahead and remove them from git. cheers Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Wed Mar 25 12:20:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 12:20:05 +0000 Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> Message-ID: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> >> However, all the tags seem to have associated with them the deletion >> of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd. >> If you can work out how this happened, would it be trivial to back >> these tags out and redo it? >> > That's really odd. I don't know exactly where it comes from, but I've > done some detective work and here are my findings: > > For the AUTHORS ?file, it was indeed deleted in a commit by Jeff Chang (2001): > http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65 > Which "renames" the AUTHORS file into CONTRIB file. > > The AUTHORS file is in the biopython tags prior to 1.00a1 and then it > should not be there anymore (it's in CVS'a attic) >?I don't know where how it came back... > > Similarly, the Bio/Unigene/UniGene.py file was removed by Jeff in a commit: > http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2 > > And similarly, UniGene.py is no longer in CVS repo (but it's still in > the attic). > > What these files have in common, is that there are some commits to > them after they've been moved to Attic (sic!) > > http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py > http://github.com/biopython/biopython/commits/master/AUTHORS > > I don't know exactly how this could happen, but this inconsistency in > CVS might be causing cvs2git to actually include these guys. It does sound like a hidden hickup in our CVS repository... very strange. 
Peter From bartek at rezolwenta.eu.org Wed Mar 25 12:43:00 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Mar 2009 13:43:00 +0100 Subject: [Biopython-dev] history on github - where are the tags? In-Reply-To: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> Message-ID: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com> On Wed, Mar 25, 2009 at 1:20 PM, Peter wrote: >> I don't know exactly how this could happen, but this inconsistency in >> CVS might be causing cvs2git to actually include these guys. > > It does sound like a hidden hickup in our CVS repository... very strange. I would rather call it a glitch in the transition. I was actually quite surprised that the transition went so smoothly. Now we can see that actually some things did not transfer too well... I did a thorough check comparing checkouts from the current CVS and git trunks, and there are also some other differences: As you can see below, apart from these two files present only in git, a number of directories are missing in git. I've checked: they are all empty directories left over because you cannot delete a directory from CVS (some of them, like Bio.Tools, actually have a number of directories in them, but they are all empty). I think that it's actually a desired behavior (removing empty directories) but if anyone is missing any of these dirs, please let me know.
The diff:

Only in git_branch/: AUTHORS
Only in biopython/Bio: Ais
Only in biopython/Bio: CDD
Only in biopython/Bio: cmmCIF
Only in biopython/Bio: config
Only in biopython/Bio: dbdefs
Only in biopython/Bio: ECell
Only in biopython/Bio: expressions
Only in biopython/Bio: formatdefs
Only in biopython/Bio: Gobase
Only in biopython/Bio: iodefs
Only in biopython/Bio: Kabat
Only in biopython/Bio: LocusLink
Only in biopython/Bio: MultiProc
Only in biopython/Bio/PDB: mmCIF_lex
Only in biopython/Bio: Rebase
Only in biopython/Bio/SCOP: tests
Only in biopython/Bio: sources
Only in biopython/Bio: Tools
Only in git_branch/Bio/UniGene: UniGene.py
Only in biopython/Doc/cookbook: biopython_test
Only in biopython/Doc/cookbook: genbank_to_fasta
Only in biopython/Doc/cookbook: LogisticRegression
Only in biopython: Experimental
Only in git_branch/: .git
Only in biopython/Martel: examples
Only in biopython/Tests: CDD
Only in biopython/Tests: ECell
Only in biopython/Tests: Gobase
Only in biopython/Tests: Kabat
Only in biopython/Tests: LocusLink
Only in biopython/Tests: Ndb
Only in biopython/Tests: UnitTests
Only in biopython/Tests: WIT

cheers Bartek From biopython at maubp.freeserve.co.uk Wed Mar 25 12:47:02 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 12:47:02 +0000 Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com> References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com> <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com> <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com> <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com> <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com> <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com> <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com> Message-ID: <320fb6e00903250547s7d88a1b3h8c52dd852047edb6@mail.gmail.com> On Wed, Mar 25, 2009 at 12:43 PM, Bartek Wilczynski wrote: > I did a ?thorough check to compare checkouts from current CVS and git > trunks to see that there are also some other differences: > As you can see below, there apart from these two files present only in > git, a number of directories are not missing in git. I've checked: > they are all empty directories leftover because you cannot delete a > directory from CVS (some of them, like Bio.Tools have actually a > number of directories in them, but they are all empty). > > I think that it's actually a desired behavior (removing empty > directories) but if anyone is missing any of these dirs, please let me > know. I don't care about the missing empty directories - if/once we move to git, we would have deleted them anyway. So if that has been done automatically, that's fine in my opinion. 
Peter From tiagoantao at gmail.com Wed Mar 25 15:39:42 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 25 Mar 2009 15:39:42 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> Message-ID: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> On Tue, Mar 24, 2009 at 6:54 PM, Peter wrote: >> In my view, this protects the people working on the official thing >> from the potential chaos of new developments, while creating a >> framework which allow for people to test innovations... > > That sounds great, and a good model for other (self contained) modules under Just a minor point. any development branches should be seen as highly unstable. I say this just because I am restarting to work on statistics and I am seeing massive refactoring going on. So if people track development branches, they should be prepared for chaos ;) . Which is exactly the opposite they should expect from the official branch ;) From biopython at maubp.freeserve.co.uk Wed Mar 25 15:45:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 15:45:00 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> Message-ID: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> 2009/3/25 Tiago Ant?o : > Just a minor point. 
any development branches should be seen as highly > unstable. I say this just because I am restarting to work on > statistics and I am seeing massive refactoring going on. So if people > track development branches, they should be prepared for chaos ;) . > Which is exactly the opposite they should expect from the official > branch ;) We should probably all write something on the wiki page for our personal forks, describing what you're using it for, what are the main branches likely to be of interest etc. Peter From bartek at rezolwenta.eu.org Wed Mar 25 16:33:13 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Mar 2009 17:33:13 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> Message-ID: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com> 2009/3/25 Peter : > > We should probably all write something on the wiki page for our > personal forks, describing what you're using it for, what are the main > branches likely to be of interest etc. Hi, I'll be happy to write some draft version of guidelines for developers and contributors to the wiki. It just seems that currently there are some problems with the biopython wiki. Does anyone know what the problem is? Is it some kind of internal OBF issue or is it because of increased interest in biopython after the application note was published? Do we have access to any access statistics for the website?
cheers Bartek From biopython at maubp.freeserve.co.uk Wed Mar 25 16:41:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 16:41:00 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com> Message-ID: <320fb6e00903250941o6e99e06egb672b62f2d661e15@mail.gmail.com> On Wed, Mar 25, 2009 at 4:33 PM, Bartek Wilczynski wrote: > > 2009/3/25 Peter : >> >> We should probably all write something on the wiki page for our >> personal forks, describing what you're using it for, what at the main >> branches likely to be of interest etc. > > Hi, > > I'll be happy to write some draft version of guidelines for developers > and contibutors to the wiki. Certainly add a section to the git migration page. > It just seems that currently there are some problems with biopython > wiki. Does anyone know what is the problem? > Is it some kind of internal OBF issue or is it because of increased > interest in biopython after the application note was > published? Do we have access to any access statistics to the website? Its seems to be all the OBF pages (e.g. bioperl.org too), and its been more than an hour so I'll drop their support team an email. 
Peter From sbassi at clubdelarazon.org Wed Mar 25 16:59:28 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Wed, 25 Mar 2009 13:59:28 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> Message-ID: <9e2f512b0903250959h26081e4ak3246252d02be2ee0@mail.gmail.com> On Wed, Mar 25, 2009 at 7:01 AM, Peter wrote: .... > Sebastian - could you have a quick play with this github code (using the new > UnknownSeq class), and the current CVS code (using None), and make sure > both support the slicing operations you were trying earlier? Thanks. OK, I'll try both today and report back to the list. From eric.talevich at gmail.com Wed Mar 25 21:44:30 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 25 Mar 2009 17:44:30 -0400 Subject: [Biopython-dev] PDB tidy script In-Reply-To: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com> References: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com> Message-ID: <3f6baf360903251444l3064963bp788750ed7a67e4d4@mail.gmail.com> On Mon, Mar 23, 2009 at 5:05 PM, Peter wrote: > > If you look back over the history, there initially was no header parsing, > it was a contribution from Kristian Rother, and I would agree, it is rather > disjoint from the rest of the code. One thing I personally wanted last > time I was working with PDB files was to have secondary structure > information (for them alpha and beta sheet lines in the header) > mapped onto the residue objects automatically. 
> > And yes, Thomas is supporting the PDB module, but his time has > been rather limited of late. When I asked him about some of the > open enhancement requests in bugzilla recently (off list) he > said we needed "a separate class to parse all the info in the header, > not a slew of additions to the core parser class (which is designed > to deal with the 3D data only)." > > I can understand both those wishes. Looking at the features currently available in the module, the best approach might be to leave the 3D parser and PDB.Entity-derived classes alone and add another wrapper class containing the header, sequence (maybe), secondary and tertiary structure as separate attributes. When working in the REPL, I've wished for a simpler function to load PDB files by path and figure out the name automatically; this would be an easy way to do it without violating Thomas's parser -- just use parse_pdb_header() in the wrapper, and use the name from there as the first argument to PDB.get_structure(). For example (quick & dirty):

import os
from Bio.PDB import PDBParser
from Bio.PDB.parse_pdb_header import parse_pdb_header

class PDBLoader:
    def __init__(self, path):
        # Header fields (name, author, resolution, ...) become attributes
        self.__dict__ = parse_pdb_header(path)
        if not self.name:
            # Fall back to the file name, e.g. 'a_structure.pdb' -> 'a_structure'
            self.name = os.path.basename(path).split('.')[0]
        parse_3d = PDBParser()
        self.structure = parse_3d.get_structure(self.name, path)
        # self.secondary = ?
        # link 1/2/3ary data in various ways ...

>>> pdb = PDBLoader('a_structure.pdb')
>>> dir(pdb)
['__doc__', '__init__', '__module__', 'author', 'compound', 'deposition_date', 'head', 'journal_reference', 'name', 'release_date', 'resolution', 'source', 'structure', 'structure_method', 'structure_reference']

In that case, it would be reasonable to let get_structure and parse_pdb_header take an open file-like object as an alternative to the PDB file's path to avoid opening and closing the same file repeatedly. There's also some cleanup to do in parse_pdb_header.py alongside this. Does this sound reasonable? 
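The "path or open file-like object" idea in the last paragraph could look roughly like the sketch below. It uses only the standard library and plain line filtering in place of the real parsers; open_pdb, structure_name and read_header_and_atoms are hypothetical names made up for illustration, not Bio.PDB functions, and it assumes any handle passed in is seekable.

```python
import io
import os
from contextlib import contextmanager


@contextmanager
def open_pdb(source):
    """Yield a readable handle for a path or an already-open file-like object.

    Hypothetical helper, not part of Bio.PDB.
    """
    if hasattr(source, "read"):
        # The caller owns the handle: do not close it, just rewind it
        # afterwards so a second parser can read the same data again.
        try:
            yield source
        finally:
            source.seek(0)
    else:
        with open(source) as handle:
            yield handle


def structure_name(path):
    """Guess a structure name from a file path, e.g. 'data/1abc.pdb' -> '1abc'."""
    return os.path.basename(path).split(".")[0]


def read_header_and_atoms(source):
    """Two passes over the same source, standing in for header and 3D parsing."""
    with open_pdb(source) as handle:
        header = [line for line in handle if line.startswith(("HEADER", "TITLE"))]
    with open_pdb(source) as handle:
        atoms = [line for line in handle if line.startswith(("ATOM", "HETATM"))]
    return header, atoms


pdb_text = "HEADER    TEST\nATOM      1  N   ALA A   1\nHETATM    2  O   HOH A   2\n"
header, atoms = read_header_and_atoms(io.StringIO(pdb_text))
print(len(header), len(atoms), structure_name("data/1abc.pdb"))
```

With a path, each pass opens and closes the file; with a handle, the file is opened once by the caller and only rewound between passes, which is the saving suggested above.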
-Eric From chapmanb at 50mail.com Wed Mar 25 21:55:48 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 25 Mar 2009 17:55:48 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> Message-ID: <20090325215548.GB21577@sobchak.mgh.harvard.edu> Hey all; Good discussion on this; I touch on a few points from different threads below. Michiel: > I haven't been following this topic closely, and as an "outsider" > using git seems more complicated than using cvs or svn. And to be > honest, I don't know if Biopython actually needs the branching and > forking stuff. I think that this is more useful for bigger projects, > where multiple developers may be working on interrelated parts of code > at the same time. That hardly ever happens in Biopython, though. Tiago: > I would actually take this argument and reverse it: [...] > Using a distributed technology allows for people to try new ideas and > to get things moving (while still maintaining an official rock stable > version with maybe glacial policies). I fall in between these two viewpoints. Git has more complications and, unless we manage those, we risk introducing additional barriers to contribution. Imagine looking at biopython on git hub and seeing 10 different branches for different users, many of which may be old and out of date. This could lead to the impression that we are not organized toward a single goal. If you are still interested, how do you know which ones could use your help and what they are for? The solution to this is documentation on the wiki. 
We rely too much on the mailing list and expect people to keep up. Peter read my mind on this: Peter: > We should probably all write something on the wiki page for our > personal forks, describing what you're using it for, what at the main > branches likely to be of interest etc. I started a page over the weekend doing this: http://biopython.org/wiki/Active_projects It's a skeleton so add or subtract away. My idea for this is that it is for longer projects that could use outside help. It's not reasonable to spend time writing up things you'll be finishing in a week or so; for that bugzilla does fine keeping interested parties up to date. Another idea on this page is a specific wish list of libraries for future work. This is a starting point for anyone who comes into Biopython fresh and would like to take something on. Also, it encourages people who have developed external libraries to deal with problems we are interested in to consider folding them into Biopython. Me: > > There is a lot of good material in this thread for new potential > > developers. Tiago, it would make sense to condense what you've > > written and include it with the Contributing guide: Tiago: > Just a followup on this: I think it makes no sense to put much of the > new content before there is an official step of moving to github. We are serious about moving to Git and need to have the documentation in place so others can learn it. You wrote up a lot of good stuff, and it will be lost on the mailing list. Brad From bugzilla-daemon at portal.open-bio.org Wed Mar 25 22:43:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 25 Mar 2009 18:43:57 -0400 Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. 
for QUAL files) In-Reply-To: Message-ID: <200903252243.n2PMhvoT007523@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2799 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-25 18:43 EST ------- I've made my first attempt at this available as a personal branch on github, http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at clubdelarazon.org Wed Mar 25 23:15:05 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Wed, 25 Mar 2009 20:15:05 -0300 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> Message-ID: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> On Wed, Mar 25, 2009 at 7:01 AM, Peter wrote: > Sebastian - could you have a quick play with this github code (using the new > UnknownSeq class), and the current CVS code (using None), and make sure > both support the slicing operations you were trying earlier? Thanks. First I tried the CVS code (with None in seq), it worked. Then I tried the git code and it also worked. One thing I noticed is that I got "?" instead of "N" the "sequence" of the UnknownSeq. 
From a practical point of view, both versions are the same, but the concept of UnknownSeq looks more solid than None, because if I didn't know about Biopython internals, I would never try to slice a None seq. With "None": len(s) returns: Traceback (most recent call last): File "/home/sbassi/bioinfo/INTA/qualparser.py", line 21, in <module> print len(s) File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py", line 481, in __len__ return len(self.seq) TypeError: object of type 'NoneType' has no len() So I would never try to do: new_s = s[10:30] But with the UnknownSeq object, len(s) returns an actual length, so it is more intuitive that it can be sliced. I liked the github interface, may I setup my own repository? Best, -- Sebastián Bassi. Diplomado en Ciencia y Tecnología. Non standard disclaimer: READ CAREFULLY. By reading this email, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
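The difference described above can be shown with a small stand-in class. This is only a sketch of the idea, not the UnknownSeq from the github branch: PlaceholderSeq and its "?" default are assumptions chosen for illustration.

```python
class PlaceholderSeq:
    """Hypothetical stand-in: a sequence of known length but unknown letters."""

    def __init__(self, length, character="?"):
        if length < 0:
            raise ValueError("length must be non-negative")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice of an unknown sequence is a shorter unknown sequence.
            new_length = len(range(*index.indices(self._length)))
            return PlaceholderSeq(new_length, self._character)
        # A single position: check the range, then return the placeholder letter.
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character


s = PlaceholderSeq(50)
new_s = s[10:30]
print(len(s), len(new_s), str(new_s[:5]))
```

Because the object knows its own length, len() and slicing behave as they would for a real sequence, whereas None fails at the first len() call as in the traceback above.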
From biopython at maubp.freeserve.co.uk Wed Mar 25 23:30:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Mar 2009 23:30:14 +0000 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> Message-ID: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com> On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi: >> Sebastian - could you have a quick play with this github code (using the new >> UnknownSeq class), and the current CVS code (using None), and make sure >> both support the slicing operations you were trying earlier? Thanks. > > First I tried the CVS code (with None in seq), it worked. OK, good. That will do in the very short term - the UnknownSeq needs some more testing and general approval before I'd check that in. > Then I tried the git code and it also worked. One thing I noticed is > that I got "?" instead of "N" the "sequence" of the UnknownSeq. I felt we shouldn't use an "N" unless we are confident the sequence is nucleotides. In practice, this is probably a safe assumption for FASTQ and QUAL files - unless anyone can think of a counter example? Do you think it is safe to assume FASTQ and QUAL files are just for nucleotides? I mean, you could translate a CDS from transcriptome sequencing, and for the sake of argument give each amino acid a quality score from the three nucleotide quality scores, and then save this as a protein FASTQ file. 
But I've never heard of anyone actually doing this ;) > From a practical point of view, both versions are the same, but the > concept of UnknownSeq looks more solid than None, because if I don't know > about biopython internals, I would never try to slice a None > seq. With "None": > len(s) returns: > > Traceback (most recent call last): > ... > TypeError: object of type 'NoneType' has no len() > > So I would never try to do: > new_s = s[10:30] > > But with the UnknownSeq object, len(s) returns an actual length, so it > is more intuitive that it can be sliced. I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord __getitem__ code nicer, and it means you can do len(SeqRecord) too, which was problematic if the sequence was None. > > I liked the github interface, may I setup my own repository? > Yes - this is one of the nice things about git, it makes it easy for anyone to make their own local branch of Biopython, but keep it under version control and pull in changes from the master branch (or another git user) quite easily. It should also make it easy to offer changes back to the main project (assuming we do switch to hosting it on git, for now it is still being done via CVS). However, bear in mind this is still only a test migration, and it is still possible we'll have to redo the CVS to git migration. 
There is a long (and ongoing) thread on this mailing list about all this already, with an evolving wiki page: http://biopython.org/wiki/GitMigration Peter From bartek at rezolwenta.eu.org Thu Mar 26 01:02:59 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 26 Mar 2009 02:02:59 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> Message-ID: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> On Wed, Mar 25, 2009 at 10:55 PM, Brad Chapman wrote: > Hey all; > Good discussion on this; I touch on a few points from different > threads below. > Indeed, I'm very happy that we got the ball rolling and more people now take part in the discussion. > I fall in between these two viewpoints. Git has more complications and, > unless we manage those, we risk introducing additional barriers to > contribution. Imagine looking at biopython on git hub and seeing 10 > different branches for different users, many of which may be old and > out of date. This could lead to the impression that we are not > organized toward a single goal. If you are still interested, how > do you know which ones could use your help and what they are for? > > The solution to this is documentation on the wiki. We rely too much on > the mailing list and expect people to keep up. 
Peter read my mind on > this: > > Peter: >> We should probably all write something on the wiki page for our >> personal forks, describing what you're using it for, what are the main >> branches likely to be of interest, etc. 
> > I started a page over the weekend doing this: > > http://biopython.org/wiki/Active_projects > > It's a skeleton so add or subtract away. My idea for this is that it > is for longer projects that could use outside help. It's not reasonable > to spend time writing up things you'll be finishing in a week or so; for > that bugzilla does fine keeping interested parties up to date. > > Another idea on this page is a specific wish list of libraries for > future work. This is a starting point for anyone who comes into > Biopython fresh and would like to take something on. Also, it encourages > people who have developed external libraries to deal with problems we > are interested in to consider folding them into Biopython. Great ideas. I fully agree that we need clear documentation if we want more people to contribute. > > Me: >> > There is a lot of good material in this thread for new potential >> > developers. Tiago, it would make sense to condense what you've >> > written and include it with the Contributing guide: > > Tiago: >> Just a followup on this: I think it makes no sense to put much of the >> new content before there is an official step of moving to github. > > We are serious about moving to Git and need to have the documentation in > place so others can learn it. You wrote up a lot of good stuff, and it > will be lost on the mailing list. Continuing on that topic. I think there are three (more or less separate) issues here: 1) Describing git usage technically, to make sure all developers have a smooth transition to git from CVS 2) Describing typical ways to use git in biopython. This is very important to clarify how we are going to use cool features of git/github in biopython. I'm not advocating here to write it very precisely and I'm fully aware that it's going to change over time as we learn to use things better, but writing things up will help us understand how we want to use git/github. 
3) General contributing guide with coding style and testing framework etc. I think that point 3 is quite well separated from the other two points, which are more git related. I think it is also nicely handled by the current wiki page: http://biopython.org/wiki/Contributing. It might be mildly adapted to include some info on git branches, but these will be minor things. Points 1 and 2 are not so easily separable, but I don't think it's a major problem. The current version of http://biopython.org/wiki/GitMigration touches upon them, but it is meant as temporary info, so it does not describe how things should be done after we really make the switch. I think we need to separate these issues (temporary arrangements vs. final desired procedures), so I made a new wiki page: http://biopython.org/wiki/GitUsage which is meant as an early draft of such guidelines. This page is meant to serve as a technical tutorial describing typical tasks in biopython development. Please feel free to modify/expand this page and/or send comments to the mailing list. I've tried to keep it close to our current development model, but there is a lot of room for discussion and I'm very open to new ideas. cheers Bartek From lpritc at scri.ac.uk Thu Mar 26 11:21:26 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 26 Mar 2009 11:21:26 +0000 Subject: [Biopython-dev] Biopython on Twitter Message-ID: Hi all, There's a fair old bit of chatter on the latest bandwagon: Twitter, about Biopython (http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython). 
Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it might be useful to have a Biopython Twitter account as a way of getting news out automatically (there's a python-twitter API: http://code.google.com/p/python-twitter/), and as a way of facilitating conversation or community around Biopython - suitable representatives of the official edifice/holders of the password no doubt to be discussed ;) Anyhoo, to avoid it being squatted in the interim, I've set up an account in Biopython's name, with Peter's email account (thanks, Peter) - he also knows the password. If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of Gopher and OS/2 Warp in short order, it can just die on the vine - but given the number of tweets mentioning Biopython, it would be a shame for that to happen too soon ;) The Biopython Twitter home page is at http://twitter.com/Biopython L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. 
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________ From tiagoantao at gmail.com Thu Mar 26 12:13:20 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 26 Mar 2009 12:13:20 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903260513v734b5dd8kd8d148bebec9674b@mail.gmail.com> Hi, On Wed, Mar 25, 2009 at 9:55 PM, Brad Chapman wrote: > The solution to this is documentation on the wiki. We rely too much on > the mailing list and expect people to keep up. Peter read my mind on > this: I fully agree on this. There is lots of implicit policy that is either not documented at all or only to be read here on the mailing list. All should be on the wiki. Clear, transparent, explicit, for everybody to see (at least that is my personal opinion). > We are serious about moving to Git and need to have the documentation in > place so others can learn it. You wrote up a lot of good stuff, and it > will be lost on the mailing list. I am planning on changing http://biopython.org/wiki/PopGen_dev and "GITify" it completely. 
I will draft a document with a policy for updates (just as a starting point, please feel free to disagree), the currently existing branches and so on. I will include a set of tips on how to pull stuff from GIT, regarding this part I note: a. maybe this can be moved, in the future, to the general biopython documentation b. I am far from being a git specialist. Corrections will surely be needed and encouraged. I will write back here when the changes are done. Tiago From jblanca at btc.upv.es Thu Mar 26 12:24:59 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 26 Mar 2009 13:24:59 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com> Message-ID: <200903261324.59655.jblanca@btc.upv.es> First of all, sorry for sending the last mail to the BioPython general list. On Thursday 26 March 2009 13:05:25 Peter wrote: > Can you give me an example of where you want to pull out a single > character from a SeqRecord, and its quality? I would consider things > like this quite elegant: > > for letter, quality in zip(record.seq, > record.letter_annotations["phred_quality"]): > #do stuff I'm implementing a Contig class similar to the Alignment class but with the added capability of supporting sequences that do not start and end at the same position and with the capability of masking the sequences. I'm implementing the __getitem__ method. When I request a column I get for all sequences an int slice and I return the result of adding them all. I could solve the problem as you suggest. The problem is that this Contig class can work also with Seqs and strs (to simplify its use when we don't need a full SeqRecord). If SeqRecord behaves more like a Seq or a str I wouldn't need to check for the special SeqRecord case in the Contig.__getitem__ method. Best regards, -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From chapmanb at 50mail.com Thu Mar 26 12:57:07 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 26 Mar 2009 08:57:07 -0400 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> Message-ID: <20090326125707.GE21577@sobchak.mgh.harvard.edu> Hi all; Bartek: > Continuing on that topic. I think there are three (more or less > separate) issues here: > 1) Describing git usage technically, to make sure all developers have > a smooth transition to git from CVS > 2) Describing typical ways to use git in biopython. [...] > 3) General contributing guide with coding style and testing framework etc. > > I think that point 3 is quite well separated from the other two > points, which are more git related. I think it is also nicely handled > by the current wiki page: http://biopython.org/wiki/Contributing. [...] > Points 1 and 2 are not so easily separable, but I don't think it's a > major problem. Current version of the > http://biopython.org/wiki/GitMigration > touches upon them, but it is meant as a temporary info, so it does > not describe how things should be done after we really make the > switch. I think we need to spearate these issues (temporary > arrangements vs. 
final desired procedures), so I made a new wiki page: > http://biopython.org/wiki/GitUsage > which is meant as an early draft of such guidelines. This page is > meant to serve as a technical tutorial describing typical tasks in > biopython development. Great writeup, and I agree with you on everything up until the last point. Why do we need two pages with overlapping information? This means we have to do more work to keep them in sync and creates confusion. GitMigration is/was our documentation page. If it is the name that makes it seem temporary, we should kill GitMigration and re-route all wiki links to GitUsage. Then we can continue forward with getting the documentation up to par on GitUsage. Having the disclaimer that the page and migration is in process is enough of a warning. When we move to git permanently, we can just remove the warnings, update the final links and we will be done. Brad From tiagoantao at gmail.com Thu Mar 26 13:09:31 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 26 Mar 2009 13:09:31 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> <20090326125707.GE21577@sobchak.mgh.harvard.edu> Message-ID: <6d941f120903260609q247ad2b0o4c810fa7afda7449@mail.gmail.com> I've added some text regarding git on http://biopython.org/wiki/PopGen_dev (see "Code and Contributing" and "Existing Development branches"). Feel free to criticise. 
I've included a link to the wonderful GitUsage page. Giovanni: if you feel that I've deleted/changed something I should not have, please say. On Thu, Mar 26, 2009 at 12:57 PM, Brad Chapman wrote: > Hi all; > > Bartek: >> Continuing on that topic. I think there are three (more or less >> separate) issues here: >> 1) Describing git usage technically, to make sure all developers have >> a smooth transition to git from CVS >> 2) Describing typical ways to use git in biopython. > [...] >> 3) General contributing guide with coding style and testing framework etc. >> >> I think that point 3 is quite well separated from the other two >> points, which are more git related. I think it is also nicely handled >> by the current wiki page: http://biopython.org/wiki/Contributing. > [...] >> Points 1 and 2 are not so easily separable, but I don't think it's a >> major problem. The current version of >> http://biopython.org/wiki/GitMigration >> touches upon them, but it is meant as temporary info, so it does >> not describe how things should be done after we really make the >> switch. I think we need to separate these issues (temporary >> arrangements vs. final desired procedures), so I made a new wiki page: >> http://biopython.org/wiki/GitUsage >> which is meant as an early draft of such guidelines. This page is >> meant to serve as a technical tutorial describing typical tasks in >> biopython development. > > Great writeup, and I agree with you on everything up until the last > point. Why do we need two pages with overlapping information? This > means we have to do more work to keep them in sync and creates confusion. > GitMigration is/was our documentation page. If it is the name that > makes it seem temporary, we should kill GitMigration and re-route all > wiki links to GitUsage. Then we can continue forward with getting > the documentation up to par on GitUsage. > > Having the disclaimer that the page and migration is in process is > enough of a warning. 
When we move to git permanently, we can just > remove the warnings, update the final links and we will be done. > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- "A man who dares to waste one hour of time has not discovered the value of life" - Charles Darwin From bartek at rezolwenta.eu.org Thu Mar 26 14:49:54 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 26 Mar 2009 15:49:54 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu> References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com> <20090317124930.GE57054@sobchak.mgh.harvard.edu> <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com> <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com> <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com> <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com> <20090325215548.GB21577@sobchak.mgh.harvard.edu> <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com> <20090326125707.GE21577@sobchak.mgh.harvard.edu> Message-ID: <8b34ec180903260749q2b59594fo1d34cd1f721ff3b7@mail.gmail.com> Hi, On Thu, Mar 26, 2009 at 1:57 PM, Brad Chapman wrote: > Great writeup, and I agree with you on everything up until the last > point. Why do we need two pages with overlapping information? This > means we have to do more work to keep them in sync and creates confusion. > GitMigration is/was our documentation page. If it is the name that > makes it seem temporary, we should kill GitMigration and re-route all > wiki links to GitUsage. Then we can continue forward with getting > the documentation up to par on GitUsage. > > Having the disclaimer that the page and migration is in process is > enough of a warning. When we move to git permanently, we can just > remove the warnings, update the final links and we will be done. 
> I agree that two pages with mostly the same stuff is too much. My original idea was to first extract the "non-temporary" info from the GitMigration page and expand it into the GitUsage page. It needs a lot of work but at least the extraction part is done. Now I would suggest not to kill the GitMigration, but to remove most things from it and just leave the stuff relevant for the (hopefully not too long) transitional period. After a second of thought I decided to go ahead and change the GitMigration so that it does not overlap with GitUsage. See for yourself here: http://biopython.org/wiki/GitMigration We can revert the changes if people don't like it. cheers Bartek From biopython at maubp.freeserve.co.uk Thu Mar 26 15:07:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Mar 2009 15:07:33 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <200903261324.59655.jblanca@btc.upv.es> References: <200903261248.02279.jblanca@btc.upv.es> <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com> <200903261324.59655.jblanca@btc.upv.es> Message-ID: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca wrote: > On Thursday 26 March 2009 13:05:25 Peter wrote: >> Can you give me an example of where you want to pull out a single >> character from a SeqRecord, and its quality? I would consider things >> like this quite elegant: >> >> for letter, quality in zip(record.seq, >> record.letter_annotations["phred_quality"]): >> #do stuff > > I'm implementing a Contig class similar to the Alignment class but with the > added capability of supporting sequences that do not start and end at the > same position and with the capability of masking the sequences. > I'm implementing the __getitem__ method. > When I request a column I get for all sequences an int slice and I return the > result of adding them all. I could solve the problem as you suggest.
The > problem is that this Contig class can work also with Seqs and strs (to > simplify its use when we don't need a full SeqRecord). If SeqRecord behaves > more like a Seq or a str I wouldn't need to check for the special SeqRecord > case in the Contig.__getitem__ method. > Best regards, If you pull out a column from a Seq or string based alignment, there is no annotation to worry about, and you can return the column as a Seq or string. As things stand, if it was a SeqRecord based alignment, having my_string[i], my_seq[i] and my_seqrecord[i] all return a single letter string is actually rather nice for generic code - as long as you are happy returning a Seq or a string for the column. However, if I understand you, when pulling a column from a SeqRecord based alignment in addition to the column's sequence you'd like the get the per-letter-annotations as well. This assumes that all the SeqRecord objects in the alignment have the same per-letter-annotation present - some might have quality and others might not! But how would you want to store this new column object? Using a string or a Seq doesn't support any annotation - you *could* use a SeqRecord with no id, name, description, features, annotation - just a sequence and any common per-letter-annotation. Is this what you had in mind? 
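A rough sketch of the column idea discussed above: pull one column from a set of records and keep only the per-letter annotation that every record shares. This does not use Biopython at all; the function name `column_with_common_annotations` and the `(sequence, letter_annotations)` pair representation are hypothetical, chosen only for illustration, and how such a column should really be stored is still the open question.

```python
def column_with_common_annotations(records, index):
    """Return (letters, per_letter) for one alignment column.

    `records` is a list of (sequence, letter_annotations) pairs, a
    hypothetical stand-in for SeqRecord objects. Only annotation keys
    present in every record are kept, since some records might have
    quality scores and others might not.
    """
    # The column's sequence: one letter taken from each record.
    letters = "".join(seq[index] for seq, _ in records)

    # Intersect the annotation keys across all records.
    common = set(records[0][1]) if records else set()
    for _, annotations in records[1:]:
        common &= set(annotations)

    # For each shared key, collect the value at this position per record.
    per_letter = {
        key: [annotations[key][index] for _, annotations in records]
        for key in sorted(common)
    }
    return letters, per_letter


example = [
    ("ACGT", {"phred_quality": [40, 30, 20, 10]}),
    ("AGGT", {"phred_quality": [35, 25, 15, 5], "mask": [0, 0, 1, 1]}),
    ("ATGT", {"phred_quality": [38, 28, 18, 8]}),
]
letters, per_letter = column_with_common_annotations(example, 1)
```

Here the "mask" entry is dropped because only one record carries it, which is the "some might have quality and others might not" case.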
Peter From jblanca at btc.upv.es Thu Mar 26 15:14:13 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 26 Mar 2009 16:14:13 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261324.59655.jblanca@btc.upv.es> <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> Message-ID: <200903261614.13454.jblanca@btc.upv.es> On Thursday 26 March 2009 16:07:33 Peter wrote: > On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca wrote: > > On Thursday 26 March 2009 13:05:25 Peter wrote: > However, if I understand you, when pulling a column from a SeqRecord > based alignment in addition to the column's sequence you'd like the get the > per-letter-annotations as well. This assumes that all the SeqRecord > objects in the alignment have the same per-letter-annotation present - some > might have quality and others might not! But how would you want to store > this new column object? Using a string or a Seq doesn't support any > annotation - you *could* use a SeqRecord with no id, name, description, > features, annotation - just a sequence and any common > per-letter-annotation. Is this what you had in mind? Yes, that's exactly what I have in mind. Do you see any problem with that approach? -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Thu Mar 26 15:32:23 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Mar 2009 15:32:23 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <200903261614.13454.jblanca@btc.upv.es> References: <200903261248.02279.jblanca@btc.upv.es> <200903261324.59655.jblanca@btc.upv.es> <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com> <200903261614.13454.jblanca@btc.upv.es> Message-ID: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> On Thu, Mar 26, 2009 at 3:14 PM, Jose Blanca wrote: > On Thursday 26 March 2009 16:07:33 Peter wrote: >> However, if I understand you, when pulling a column from a SeqRecord >> based alignment in addition to the column's sequence you'd like the get the >> per-letter-annotations as well. ?This assumes that all the SeqRecord >> objects in the alignment have the same per-letter-annotation present - some >> might have quality and others might not! ?But how would you want to store >> this new column object? ?Using a string or a Seq doesn't support any >> annotation - you *could* use a SeqRecord with no id, name, description, >> features, annotation - just a sequence and any common >> per-letter-annotation. ?Is this what you had in mind? > > Yes, that's exactly what I have in mind. Do you see any problem with that > approach? Well yes. For your code to work on SeqRecord objects (based on the verbal description earlier), it needs at least the following changes to the SeqRecord: The SeqRecord __getitem__ would have to return a SeqRecord when given a single integer index, holding a single letter sequence. What about the name/id/description and annotations (e.g. 
organism) - do they really apply to a single letter from the sequence? Technically writing the code to offer this isn't such a problem, but I am unconvinced this is the best behaviour for normal usage. Also closely related to this, what would you expect __iter__ to iterate over? Currently it acts like iteration over the record's sequence. You'd also want the SeqRecord to support __add__ (and __radd__) so that two SeqRecord objects can be added together. I have thought about this before, and it is a *much* more complicated issue due to the meta data. In general the only safe and unambiguous choice is to exclude it from the combined record: * sequence - just add (using normal rules for adding Seq objects) * name/id/description - if the two agree, use that? Otherwise default to a blank value? * annotations - for each keyed value, you could combine the entries? Or just throwing them all away? * letter_annotations - if an entry is present in both you can combine it. Otherwise throw them away? * features - these could be combined, adjusting the locations for one record's features as appropriate I'm not ruling out adding SeqRecord addition, but I don't want to rush it while we are trying to get Biopython 1.50 done. Peter From biopython at maubp.freeserve.co.uk Thu Mar 26 15:49:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Mar 2009 15:49:49 +0000 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: References: Message-ID: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> On Thu, Mar 26, 2009 at 11:21 AM, Leighton Pritchard wrote: > Hi all, > > There's a fair old bit of chatter on the latest bandwagon: Twitter, about > Biopython > (http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython). 
> Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it > might be useful to have a Biopython Twitter account as a way of getting news > out automatically (there's a python-twitter API: > http://code.google.com/p/python-twitter/), and as a way of facilitating > conversation or community around Biopython - suitable representatives of the > official edifice/holders of the password no doubt to be discussed ;) > > Anyhoo, to avoid it being squatted in the interim, I've set up an account in > Biopython's name, with Peter's email account (thanks, Peter) - he also knows > the password. > > If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of > Gopher and OS/2 Warp in short order, it can just die on the vine - but given > the number of tweets mentioning Biopython, it would be a shame for that to > happen too soon ;) > > The Biopython Twitter home page is at http://twitter.com/Biopython Quite a few people have started following this already - which is fun. I see the OBF news page entries are automatically pushed to their twitter account, http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed to http://twitter.com/bioperl - I'll get in touch to see how they did it so we can have the Biopython news feed automatically echoed to twitter as well. This serves as a good point to remind/inform you that there are RSS, Atom etc feeds for the Biopython news - links on http://biopython.org/wiki/News e.g. http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2 http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom We could probably also echo the CVS (or git) RSS feed into twitter, but I suspect that would drown out any more interesting tweets.
The RSS feed is listed on http://biopython.org/wiki/CVS and shown on the wiki too at: http://biopython.org/wiki/Tracking_CVS_commits (not sure how often this gets updated). The feed itself is here: http://biopython.open-bio.org/CVS2RSS/biopython.rss Peter From lpritc at scri.ac.uk Thu Mar 26 16:31:07 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 26 Mar 2009 16:31:07 +0000 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> Message-ID: Hi all, It's great to see that people have picked up on the Biopython Twitter account already - I hope that it proves useful in the longer term. Regarding the social etiquette of Twitter, and the ease with which 'following' can be taken to imply 'approval' I wonder if it would be a good policy to restrict the Twitter accounts that Biopython follows only to those representing organisations or groups. Following some individuals and not others might be seen to privilege a self-selecting group, cabal or 'elite', even the accidental suggestion of which I think would be best avoided. On 26/03/2009 15:49, "Peter" wrote: > > Quite a few people have started following this already - which is fun. I see > the OBF news page entries are automatically pushed to their twitter account, > http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed > to http://twitter.com/bioperl - I'll get in touch to see how they did > it so we can [...] > We could probably also echo the CVS (or git) RSS feed into twitter, but I > suspect that would drown out any more interesting tweets. Signal to noise is apparently not an issue that bothers very many Tweeters, but I see no harm in starting a trend ;) L. 
-- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________ From jblanca at btc.upv.es Fri Mar 27 08:22:27 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Mar 2009 09:22:27 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> Message-ID: <200903270922.27152.jblanca@btc.upv.es> On Thursday 26 March 2009 16:32:23 Peter wrote: > The SeqRecord __getitem__ would have to return a SeqRecord when given > a single integer index, holding a single letter sequence. What about > the name/id/description and annotations (e.g. organism) - do they > really apply to a single letter from the sequence? Technically > writing the code to offer this isn't such a problem, but I am > unconvinced this is the best behaviour for normal usage. You're right, I was not thinking on the rest of the properties because I don't need them. They're a problem when slicing and adding SeqRecords. But they're also a problem in standard slicing. Should the annotations be kept when the SeqRecord is sliced? Are they still relevant? None of the behaviours will be ok for all the cases. > Also closely related to this, what would you expect __iter__ to > iterate over? Currently it acts like iteration over the record's > sequence. The SeqRecord can already hold a sequence of length one, so we have the same problem. In fact I could do seq_rec[n:n+1] and I would obtain the SeqRecord that I want. > You'd also want the SeqRecord to support __add__ (and __radd__) so > that two SeqRecord objects can be added together. I have thought > about this before, and it is a *much* more complicated issue due to > the meta data. 
In general the only safe and unambiguous choice is to > exclude it from the combined record: > * sequence - just add (using normal rules for adding Seq objects) > * name/id/description - if the two agree, use that? Otherwise default > to a blank value? > * annotations - for each keyed value, you could combine the entries? > Or just throwing them all away? > * letter_annotations - if an entry is present in both you can combine > it. Otherwise throw them away? > * features - these could be combined, adjusting the locations for one > record's features as appropriate As I said before I think that the same problem is presented when you do a slice. If I have the sequence of a gene named X with some annotations and I slice a part, is still be named geneX? Should the annotations be kept? > I'm not ruling out adding SeqRecord addition, but I don't want to rush > it while we are trying to get Biopython 1.50 done. That's quite sensible. I think that is a good thing to discuss all this issues, I keep learning a lot from you. Best regards, -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Fri Mar 27 10:29:10 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 10:29:10 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <200903270922.27152.jblanca@btc.upv.es> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> Message-ID: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> On Fri, Mar 27, 2009 at 8:22 AM, Jose Blanca wrote: > On Thursday 26 March 2009 16:32:23 Peter wrote: > >> You'd also want the SeqRecord to support __add__ (and __radd__) so >> that two SeqRecord objects can be added together. ?I have thought >> about this before, and it is a *much* more complicated issue due to >> the meta data. ?In general the only safe and unambiguous choice is to >> exclude it from the combined record: >> * sequence - just add (using normal rules for adding Seq objects) >> * name/id/description - if the two agree, use that? ?Otherwise default >> to a blank value? >> * annotations - for each keyed value, you could combine the entries? >> Or just throwing them all away? >> * letter_annotations - if an entry is present in both you can combine >> it. ?Otherwise throw them away? >> * features - these could be combined, adjusting the locations for one >> record's features as appropriate > > As I said before I think that the same problem is presented when you do a > slice. If I have the sequence of a gene named X with some annotations and I > slice a part, is still be named geneX? Should the annotations be kept? 
The problems about the annotation when slicing a SeqRecord are similar, but I think things are worse when adding two SeqRecords together. For slicing, there are a few sub-cases: - per-letter-annotation can be sliced too - easy. - features - we retain only features fully inside the new sub-sequence (the border line features which cross the slice boundary are a small problem - excluding them is the simplest solution to code and explain). - id/name - debatable. Currently kept. - description - debatable. Consider a description which says "whole genome", that doesn't really apply to a partial sequence. On the other hand, it may. Currently kept for the sub-record. - annotations - again debatable. Without context information, we can't guess. The only sensible options are keep it all (as in CVS) or none of it. I think it is worth keeping the id/name in general (consider typical use cases like cropping a domain from a gene, or cropping columns off an alignment). I would be OK with dropping the contents of the annotations dictionary and description in order to avoid ambiguity, but this would prevent certain tasks. Peter From sbassi at clubdelarazon.org Fri Mar 27 13:31:01 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Fri, 27 Mar 2009 10:31:01 -0300 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> Message-ID: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> On Fri, Mar 27, 2009 at 7:29 AM, Peter wrote: .... > - id/name - debatable. Currently kept. > - description - debatable. Consider a description which says "whole genome", > that doesn't really apply to a partial sequence. On the other hand, it may.
> Currently kept for the sub-record. I think it is up to the user to keep the id/name/description field updated when slicing a sequence. ..... > I would be OK with dropping the contents of the annotations dictionary and > description in order to avoid ambiguity, but this would prevent certain tasks. Another option is to make this behavior optional (I mean, select to keep or to drop the annotations, but by default I would drop them). From biopython at maubp.freeserve.co.uk Fri Mar 27 13:57:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 13:57:30 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> Message-ID: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> On Fri, Mar 27, 2009 at 1:31 PM, Sebastian Bassi wrote: > I think it is up to the user to keep the id/name/description > field updated when slicing a sequence. If you make a new SeqRecord by first slicing a Seq object (which is how you have to do it with Biopython 1.49 or older), then dealing with ALL the annotation is explicitly in the hands of the user. Or are you saying when slicing a SeqRecord you wouldn't expect the id/name/description to be preserved for the sub-record? > ..... >> I would be OK with dropping the contents of the annotations >> dictionary and description in order to avoid ambiguity, but >> this would prevent certain tasks. > > Another option is to make this behavior optional (I mean, select to > keep or to drop the annotations, but by default I would drop them). How would you make it optional? As an extra non-standard argument to __getitem__?
e.g. something like my_record[10:50, annotation=False]? That seems nasty. I am sympathetic to dropping the annotations dictionary when creating a "child" SeqRecord when slicing its parent. There is also the database cross reference list (which I forgot in my last email). Again, I wouldn't object to dropping this for a sliced sub-record. If we did drop the annotations and dbxrefs when slicing, the user can manually choose to explicitly copy them from the parent object if they do want them. Peter From jblanca at btc.upv.es Fri Mar 27 14:02:57 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Mar 2009 15:02:57 +0100 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> Message-ID: <200903271502.57872.jblanca@btc.upv.es> On Friday 27 March 2009 14:57:30 Peter wrote: > How would you make it optional? As an extra non-standard argument > to __getitem__? e.g. something like my_record[10:50, annotation=False]? > That seems nasty. That's very nasty, not pythonic, and adds complexity to the api. > I am sympathetic to dropping the annotations dictionary when creating > a "child" SeqRecord when slicing its parent. There is also the database > cross reference list (which I forgot in my last email). Again, I wouldn't > object to dropping this for a sliced sub-record. > > If we did drop the annotations and dbxrefs when slicing, the user can > manually choose to explicitly copy them from the parent object if they > do want them. I also think that dropping all that stuff when slicing or adding is the best behaviour. -- Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From sbassi at clubdelarazon.org Fri Mar 27 14:17:55 2009 From: sbassi at clubdelarazon.org (Sebastian Bassi) Date: Fri, 27 Mar 2009 11:17:55 -0300 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> Message-ID: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> On Fri, Mar 27, 2009 at 10:57 AM, Peter wrote: > How would you make it optional? As an extra non-standard argument > to __getitem__? e.g.something like my_record[10:50, annotation=False]? > That seems nasty. Yes it is nasty this way, I never meant to do it in __getitem__. Anyway I can't think a nice and intuitive way to do it. > If we did drop the annotations and dbxrefs when slicing, the user can > manually choose to explicitly copy them from the parent object if the > do want them. Yes, that is OK. 
From biopython at maubp.freeserve.co.uk Fri Mar 27 14:24:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 14:24:13 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> Message-ID: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi wrote: > On Fri, Mar 27, 2009 at 10:57 AM, Peter wrote: >> How would you make it optional? As an extra non-standard argument >> to __getitem__? e.g. something like my_record[10:50, annotation=False]? >> That seems nasty. > > Yes it is nasty this way, I never meant to do it in __getitem__. > Anyway I can't think of a nice and intuitive way to do it. Me neither right now. >> If we did drop the annotations and dbxrefs when slicing, the user can >> manually choose to explicitly copy them from the parent object if they >> do want them. > > Yes, that is OK. Jose agrees, so that makes a mini consensus (at least amongst everyone who has tried the CVS code and posted to this thread). I've made that change in CVS, see Bio/SeqRecord.py revision 1.31. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython As I said before, I want to preserve the id and name - preserving these would be key for cross referencing the sub-record back to its parent. Do either of you think we should also discard the description?
Peter From eric.talevich at gmail.com Fri Mar 27 15:16:19 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 27 Mar 2009 11:16:19 -0400 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> Message-ID: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com> On Fri, Mar 27, 2009 at 10:24 AM, Peter wrote: > On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi > wrote: > > On Fri, Mar 27, 2009 at 10:57 AM, Peter > wrote: > >> How would you make it optional? As an extra non-standard argument > >> to __getitem__? e.g.something like my_record[10:50, annotation=False]? > >> That seems nasty. > > > > Yes it is nasty this way, I never meant to do it in __getitem__. > > Anyway I can't think a nice and intuitive way to do it. > > Me neither right now. > > >> If we did drop the annotations and dbxrefs when slicing, the user can > >> manually choose to explicitly copy them from the parent object if the > >> do want them. > > > > Yes, that is OK. > > One way to allow non-default options for adding and slicing is to provide a couple of functions at the class or module level (classmethod, staticmethod, plain ol' function) that have the necessary keyword arguments. These functions would do the same thing by default as the corresponding syntax, and the syntax-friendly magic methods would just pass their arguments straight to these functions. 
This makes the syntax pretty for the common cases, and makes the nonstandard stuff visually obvious. Examples: my_record.slice(10, 50) == my_record[10:50] my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated annotations my_record.add(other_record) == my_record + other_record my_record.add(other_record, annotation=True) == my_record + other_record, keeping annotations my_record.slice(10, 50, annotation=True).add( my_record.slice(100, 200, annotation=True), annotation=True) == my_record[10:50] + my_record[100:200], keeping all annotations (a pain otherwise) From biopython at maubp.freeserve.co.uk Fri Mar 27 15:51:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Mar 2009 15:51:53 +0000 Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing In-Reply-To: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com> References: <200903261248.02279.jblanca@btc.upv.es> <200903261614.13454.jblanca@btc.upv.es> <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com> <200903270922.27152.jblanca@btc.upv.es> <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com> <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com> <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com> <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com> <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com> <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com> Message-ID: <320fb6e00903270851i47db9121p6d272b5f7095a5d3@mail.gmail.com> On Fri, Mar 27, 2009 at 3:16 PM, Eric Talevich wrote: > One way to allow non-default options for adding and slicing is to provide a > couple of functions at the class or module level (classmethod, staticmethod, > plain ol' function) that have the necessary keyword arguments. These > functions would do the same thing by default as the corresponding syntax, > and the syntax-friendly magic methods would just pass their arguments > straight to these functions. 
This makes the syntax pretty for the common > cases, and makes the nonstandard stuff visually obvious. > > Examples: > > my_record.slice(10, 50) == my_record[10:50] > my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated > annotations > ...

I think I understand your idea, but I'm not very keen on adding slice and add methods as alternatives to __getitem__ and __add__.

As things stand (with CVS after the change an hour ago), if you want the annotations dictionary copied with a slice you must do this explicitly:

>>> from Bio import SeqIO
>>> my_record = SeqIO.read(open("NC_005816.gb"),"genbank")
>>> my_record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.', dbxrefs=['Project:10638'])
>>> len(my_record)
9609
>>> len(my_record.features)
29
>>> len(my_record.annotations)
11
>>> len(my_record.dbxrefs)
1

Doing a slice will not copy/preserve the annotations dict or dbxrefs list:

>>> sub_record = my_record[1000:2000]
>>> sub_record
SeqRecord(seq=Seq('GAAAAAAGAGTATGACGTGCATCTTGATGAAAATCTGGTGAACTTCGACAAACA...GGA', IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.', dbxrefs=[])
>>> len(sub_record)
1000
>>> len(sub_record.features)
2
>>> assert not sub_record.annotations and not sub_record.dbxrefs

You can then choose to blindly reuse the annotations and dbxrefs if you want to:

>>> sub_record.annotations = my_record.annotations #shares the dict
>>> sub_record.dbxrefs = my_record.dbxrefs #shares the list

or as a simple copy:

>>> sub_record.annotations = my_record.annotations.copy()
>>> sub_record.dbxrefs = my_record.dbxrefs[:]

The good thing about this is it makes you think about the annotations, and which (if any) are appropriate to transfer to the sub-record.
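For anyone without the CVS code or the GenBank file to hand, the slicing policy described above can be sketched with a small stand-in class. This is only an illustration under stated assumptions: `MiniRecord` is a hypothetical name, it is not Bio.SeqRecord, it ignores features entirely, and the example values are made up.

```python
class MiniRecord:
    """Hypothetical stand-in for a sequence record (NOT Bio.SeqRecord)."""

    def __init__(self, seq, id="", name="", description="",
                 annotations=None, dbxrefs=None, letter_annotations=None):
        self.seq = seq
        self.id = id
        self.name = name
        self.description = description
        self.annotations = dict(annotations or {})
        self.dbxrefs = list(dbxrefs or [])
        self.letter_annotations = dict(letter_annotations or {})

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, int):
            # A single position acts like a str or Seq: just the letter.
            return self.seq[index]
        # A slice keeps id/name/description and slices the per-letter
        # annotation, but deliberately drops annotations and dbxrefs.
        sliced = {key: values[index]
                  for key, values in self.letter_annotations.items()}
        return MiniRecord(self.seq[index], id=self.id, name=self.name,
                          description=self.description,
                          letter_annotations=sliced)


parent = MiniRecord(
    "ACGTACGTAC", id="X1", name="geneX", description="made-up example",
    annotations={"source": "made-up"}, dbxrefs=["Project:1"],
    letter_annotations={"phred_quality": [10, 20, 30, 40, 50,
                                          60, 70, 80, 90, 99]},
)

child = parent[2:6]      # annotations and dbxrefs start out empty

copied = parent[2:6]     # the user opts in to the parent's annotations
copied.annotations = parent.annotations.copy()
```

The explicit copy on the last line is the point of the design: nothing is transferred unless the caller decides it still applies to the sub-record.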
As per my earlier email, maybe we should do the same with the description? Peter From chapmanb at 50mail.com Sun Mar 29 01:06:52 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Sat, 28 Mar 2009 21:06:52 -0400 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> Message-ID: <20090329010652.GA914@kunkel> Hi all; It is great we are exploring getting news out about Biopython in additional ways. One thing this can really help with is recognizing contributions to Biopython. Another is pointing out interesting discussion threads on the mailing lists and getting others involved. Do you think it would be worthwhile to "advertise" on the main list for someone interested in coordinating news and communication? They could do things like: - Send updates through twitter on day to day activities, like: Bartek and Tiago cleaned up documentation on Git submissions (link to wiki page) Peter, Jose and Sebastian are discussing slicing on SeqRecords (link to mailing list discussion) - Send out monthly news reports on new items in Biopython, in the style of Peter's update recently: http://news.open-bio.org/news/2009/03/biopython-next-gen-sequencing/ (but it should also give credit to the fine people who coded it) Perhaps there are members who are interested in Biopython and follow what is going on but aren't coders. This would be a way to get involved, and also take some of the burden off Peter. What do y'all think? Brad > > It's great to see that people have picked up on the Biopython Twitter > account already - I hope that it proves useful in the longer term. > > Regarding the social etiquette of Twitter, and the ease with which > 'following' can be taken to imply 'approval' I wonder if it would be a good > policy to restrict the Twitter accounts that Biopython follows only to those > representing organisations or groups. 
Following some individuals and not
> others might be seen to privilege a self-selecting group, cabal or 'elite',
> even the accidental suggestion of which I think would be best avoided.
>
> On 26/03/2009 15:49, "Peter" wrote:
> >
> > Quite a few people have started following this already - which is fun. I see
> > the OBF news page entries are automatically pushed to their twitter account,
> > http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
> > to http://twitter.com/bioperl - I'll get in touch to see how they did
> > it so we can
> > [...]
>
> > We could probably also echo the CVS (or git) RSS feed into twitter, but I
> > suspect that would drown out any more interesting tweets.
>
> Signal to noise is apparently not an issue that bothers very many Tweeters,
> but I see no harm in starting a trend ;)
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From biopython at maubp.freeserve.co.uk Sun Mar 29 22:58:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Mar 2009 23:58:47 +0100
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <20090329010652.GA914@kunkel>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> <20090329010652.GA914@kunkel>
Message-ID: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>

On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman wrote:
> Hi all;
> It is great we are exploring getting news out about Biopython in
> additional ways. One thing this can really help with is recognizing
> contributions to Biopython. Another is pointing out interesting
> discussion threads on the mailing lists and getting others involved.

Do you think the recent release notes and NEWS file entries have been
a bit too impersonal? We can certainly be a bit more explicit if people
think that is a good thing. For example, should we mention Bartek by name
in the paragraph on the new Bio.Motif module?

This is linked to from the wiki's news page BTW:
http://biopython.open-bio.org/SRC/biopython/NEWS
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython

> Do you think it would be worthwhile to "advertise" on the main list
> for someone interested in coordinating news and communication?
> ...
Perhaps there are members who are interested in Biopython and > follow what is going on but aren't coders. This would be a way to > get involved, ... Are you up for the job yourself Brad? From your own blog we know you can and do write regularly anyway ;) Would you like an account on the OBF news server? Email me off list and we can sort that out. In terms of micro-blogging via twitter, you sound like you have a better feel for this than me - I don't even have a personal twitter account. Monthly news posts (perhaps cc'd to the announcement email list) would be a nice idea - especially if we can encourage more lurkers to speak up. For a while BioPerl had something like this going (digest emails or something), but it needs a pretty dedicated person or team. In the meantime as you've noticed I've started making more use of our news facility myself... Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 10:26:09 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 11:26:09 +0100 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 Message-ID: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> Hi all, NumPy 1.3 is about to be released, so we should try and make sure the forthcoming Biopython 1.50 release works with it. Of particular interest, this will be the first version of NumPy to support Python 2.6 on Windows, so we will hopefully be able to include a Python 2.6 Windows installer for Biopython 1.50 :) There is a release candidate out for NumPy 1.3, but so far no Windows installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3 beta release instead. The good news is everything seems to compile with MinGW, but unfortunately test_Cluster.py is failing on the second line of Bio/Cluster/__init__.py, "from cluster import *". This could be a hiccup with NumPy itself - I am using their beta after all, or perhaps they have changed something. 
To try and narrow down the problem, has anyone else tried NumPy 1.3 (beta
or release candidate) with the latest Biopython from CVS (on any platform)?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk Mon Mar 30 10:29:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:29:02 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
Message-ID: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:26 AM, Peter wrote:
> Hi all,
>
> NumPy 1.3 is about to be released, so we should try and make sure the
> forthcoming Biopython 1.50 release works with it. Of particular interest,
> this will be the first version of NumPy to support Python 2.6 on Windows,
> so we will hopefully be able to include a Python 2.6 Windows installer
> for Biopython 1.50 :)
>
> There is a release candidate out for NumPy 1.3, but so far no Windows
> installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3
> beta release instead.

David Cournapeau has just updated sourceforge - so I will try again with
the actual release candidate instead of just the beta...

Peter

From biopython at maubp.freeserve.co.uk Mon Mar 30 10:38:58 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:38:58 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
Message-ID: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:29 AM, Peter wrote:
> David Cournapeau has just updated sourceforge - so I will try again with
> the actual release candidate instead of just the beta...
Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows XP,
Python 2.6 using the python.org installer, with Biopython compiled with
cygwin mingw32 as normal, same error - test_Cluster.py is failing on the
second line of Bio/Cluster/__init__.py, "from cluster import *".

So the question stands - has anyone else tried Biopython (from CVS) with
NumPy 1.3 (beta or release candidate) on any platform? I should be able to
check it tonight on a Linux machine myself without too much trouble... but
a few more data points wouldn't hurt ;)

Peter

From biopython at maubp.freeserve.co.uk Mon Mar 30 11:15:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 12:15:06 +0100
Subject: [Biopython-dev] test_Nexus.py and NamedTemporaryFile mode
Message-ID: <320fb6e00903300415i350610c0i4c2aeed1834011da@mail.gmail.com>

I've been running the test suite again on Windows, and was reminded of
this open issue with NamedTemporaryFile on Windows...

On Fri, Feb 13, 2009 at 5:02 PM, Peter wrote:
> On Tue, Feb 10, 2009 at 11:25 AM, Michiel de Hoon wrote:
>>
>>> The test_Nexus tearDown used to make sure the temp output
>>> files were removed. This is important on Windows which
>>> does not do this automatically. I see you now allocate
>>> "random" filenames using tempfile.NamedTemporaryFile(...)
>>> so presumably we would need to record these so that the
>>> tearDown method knows what temp files to remove.
>>
>> From reading the Python documentation, the file created by
>> tempfile.NamedTemporaryFile is removed automatically
>> when the file handle is closed, even on Windows.
>
> That's good to know. On a related point, I've just found
> test_Nexus.py is failing on Windows XP with Python 2.6 (but is fine
> with Python 2.3, 2.4 and 2.5):
>
> C:\repository\biopython\Tests>c:\python26\python test_Nexus.py
> Test Nexus module ... ERROR
> Test Tree module. ...
ok
>
> ======================================================================
> ERROR: Test Nexus module
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Nexus.py", line 114, in test_NexusTest1
>     f1=tempfile.NamedTemporaryFile(mode='r+w+b')
>   File "c:\python26\lib\tempfile.py", line 445, in NamedTemporaryFile
>     file = _os.fdopen(fd, mode, bufsize)
> OSError: [Errno 22] Invalid argument
>
> ----------------------------------------------------------------------
> Ran 2 tests in 0.016s
>
> FAILED (errors=1)

You can recreate this at the python 2.6 prompt with the one line:

f1=tempfile.NamedTemporaryFile(mode='r+w+b')

I couldn't solve this from looking at the Python documentation, but after
some Google searching the answer seems to be just to use the default
mode (w+b):

f1=tempfile.NamedTemporaryFile()

This works on Windows with Python 2.3 to 2.6, and also works on Mac OS X
and Linux too (only one version of Python tested here). Fix checked into
CVS.

Peter

From cy at cymon.org Mon Mar 30 11:42:00 2009
From: cy at cymon.org (Cymon Cox)
Date: Mon, 30 Mar 2009 12:42:00 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
Message-ID: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>

Hi Folks,

I've been trying to formalize a bunch of randomly scattered bits of code to
support the use of the alignment programme Muscle
(http://www.drive5.com/muscle/). I prefer to use this software in preference
to Clustalw - subjectively, it seems to give the most accurate alignments.
(Whether Biopython would want to support a second alignment
programme/external dependency is another matter...)

Anyway, while doing so, I realised just how awkward the current interface to
Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.
Currently, if we have a bunch of SeqRecords, say after downloading from GenBank or being pulled from a BioSQL db, we have to write them to disk and call clustalw on the file: >>> from Bio import Clustalw >>> from Bio.Clustalw import MultipleAlignCL >>> cline = MultipleAlignCL("f002", command="clustalw") >>> align = Clustalw.do_alignment(cline) It seems to me more appropriate to be able to call clustalw directly on a bunch of SeqRecords: eg (suggested implementation) >>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) >>> from Bio.Align import MultipleAlignment >>> align = MultipleAlignment(records, executable="clustalw") Secondly, the biopython interface does not support calling Clustalw to perform profile alignments, (suggested implementation) # The scaffold alignment: >>> align = AlignIO.read(open("blah.nex", "r"), "nexus") # The sequences we want to add to it: >>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) >>> from Bio.Align import ProfileAlignment >>> align = ProfileAlignment(align, records, executable="clustalw") Calls to MultipleAlignment and ProfileAlignment would take a **options parameter to collect any additional command line options. Thirdly, should an alignment object have a Alignment.refine_alignment(executable="clustalw") method? Any thoughts? Cheers, C. -- ____________________________________________________________________ Cymon J. 
Cox Centro de Ciencias do Mar Faculdade de Ciencias do Mar e Ambiente (FCMA) Universidade do Algarve Campus de Gambelas 8005-139 Faro Portugal Phone: +0351 289800909 ext 7909 Fax: +0351 289800051 Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com HomePage : http://biology.duke.edu/bryology/cymon.html -8.63/-6.77 From chapmanb at 50mail.com Mon Mar 30 13:00:27 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 30 Mar 2009 09:00:27 -0400 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com> <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> Message-ID: <20090330130027.GB36526@sobchak.mgh.harvard.edu> Hi Peter; Things work on FreeBSD 7.1 with python2.5 and the numpy release candidate: > python2.5 Python 2.5.4 (r254:67916, Feb 18 2009, 08:20:57) [GCC 4.2.1 20070719 [FreeBSD]] on freebsd7 >>> import numpy >>> numpy.__version__ '1.3.0rc1' > python2.5 test_Cluster.py test_clusterdistance (__main__.TestCluster) ... ok test_distancematrix_kmedoids (__main__.TestCluster) ... ok test_kcluster (__main__.TestCluster) ... ok test_matrix_parse (__main__.TestCluster) ... ok test_median_mean (__main__.TestCluster) ... ok test_somcluster (__main__.TestCluster) ... ok test_treecluster (__main__.TestCluster) ... ok ---------------------------------------------------------------------- Ran 7 tests in 0.009s OK The whole test suite passes as well. Maybe this is a windows issue? Brad > On Mon, Mar 30, 2009 at 11:29 AM, Peter wrote: > > David Cournapeau has just updated sourceforge - so I will try again with > > the actual release candidate instead of just the beta... 
>
> Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows
> XP, Python 2.6 using the python.org installer, with Biopython compiled
> with cygwin mingw32 as normal, same error - test_Cluster.py is failing
> on the second line of Bio/Cluster/__init__.py, "from cluster import
> *".
>
> So the question stands - has anyone else tried Biopython (from CVS)
> with NumPy 1.3 (beta or release candidate) on any platform? I should
> be able to check it tonight on a Linux machine myself without too much
> trouble... but a few more data points wouldn't hurt ;)
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From biopython at maubp.freeserve.co.uk Mon Mar 30 13:23:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 14:23:31 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <20090330130027.GB36526@sobchak.mgh.harvard.edu>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com> <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> <20090330130027.GB36526@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com>

On Mon, Mar 30, 2009 at 2:00 PM, Brad Chapman wrote:
> Hi Peter;
> Things work on FreeBSD 7.1 with python2.5 and the numpy release
> candidate:
> ...
> The whole test suite passes as well. Maybe this is a windows issue?
> Brad

Thanks Brad - nice to know we have Biopython being tested on a fourth
major OS (FreeBSD, in addition to Linux, Mac OS X and Windows XP).

I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and
test_Cluster and the rest of the Biopython tests passed. This looks like
a Windows and/or Python 2.6 problem - I should be able to try a Linux
machine with Python 2.6 tonight...
Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 14:37:18 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 15:37:18 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> Message-ID: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox wrote: > > Hi Folks, > > I've been trying to formalize a bunch of randomly scattered bits of code to > support the use of the alignment programme Muscle > (http://www.drive5.com/muscle/). I prefer to use this software in preference > to Clustalw - subjectively, it seems to give the most accurate alignments. > (Whether Biopython would want to support a second alignment programme > /external dependency is another matter...) A wrapper for MUSCLE wouldn't hurt - although there is scope for some rearrangement of our command line tool wrappers rather than adding more and more top level modules. Maybe under Bio.Align, and move the Clustalw wrapper there too. > Anyway, while doing so, I realised just how awkward the current interface to > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well. What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm: (1) use SeqIO to prepare the FASTA input file. (2) run the command line tool (e.g. MUSCLE). (3) use AlignIO (or SeqIO) to read the alignment output file. Actually I think that Bio.Clustalw interface is now a bit out of place, as it hides some of this from you. (Note that Bio.Clustalw predates Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly tool neutral). 
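
The three steps listed above can be sketched roughly as follows. This is a stdlib-only illustration, not Biopython code: `write_fasta` and `read_fasta` are hypothetical stand-ins for what SeqIO and AlignIO would do, and the aligner command is passed in as a plain argument list because every tool has its own options (MUSCLE 3.x, for instance, took -in and -out; check the documentation of whichever tool is used). So that it runs with no external program installed, the demonstration substitutes a trivial copy command for the aligner.

```python
import os
import subprocess
import sys
import tempfile


def write_fasta(records, path):
    """Step 1: write (name, sequence) pairs as FASTA (stand-in for SeqIO)."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(">%s\n%s\n" % (name, seq))


def run_tool(command):
    """Step 2: run the external tool; raises if it exits with an error."""
    subprocess.check_call(command)


def read_fasta(path):
    """Step 3: read (name, sequence) pairs back (stand-in for AlignIO)."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [(name, seq) for name, seq in records]


# Both files are named explicitly and are left on disk afterwards.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "unaligned.fasta")
out_path = os.path.join(workdir, "aligned.fasta")

write_fasta([("alpha", "ACGTACGT"), ("beta", "ACGTCGT")], in_path)

# Placeholder "aligner": copies the input file to the output unchanged.
# A real call might be ["muscle", "-in", in_path, "-out", out_path].
placeholder_aligner = [
    sys.executable, "-c",
    "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])",
    in_path, out_path,
]
run_tool(placeholder_aligner)
alignment = read_fasta(out_path)
```

Keeping the file names in the caller's hands is the point of this layout: the input and output can be inspected, kept, or reused without rerunning the tool.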
> Currently, if we have a bunch of SeqRecords, say after downloading from
> GenBank or being pulled from a BioSQL db, we have to write them to disk
> and call clustalw on the file:
>
> >>> from Bio import Clustalw
> >>> from Bio.Clustalw import MultipleAlignCL
> >>> cline = MultipleAlignCL("f002", command="clustalw")
> >>> align = Clustalw.do_alignment(cline)

Well yes. Typically for any alignment tool you'd have to write the
unaligned records in FASTA format. Some tools may let you handle
this via standard input, so you may be able to use a pipe instead
of a file - but the issues are similar.

> It seems to me more appropriate to be able to call clustalw directly on a
> bunch of SeqRecords:
>
> eg (suggested implementation)
> >>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
> >>> from Bio.Align import MultipleAlignment
> >>> align = MultipleAlignment(records, executable="clustalw")

i.e. Have a Biopython wrapper use a temp file to record the
given records in a format appropriate for the command line
tool selected, and capturing the output? In the case of
ClustalW or MUSCLE this means making a temp FASTA input
file. For ClustalW we'd then have to open the output file, read
it, and then delete it. For other tools we may be able to just
capture its output on stdout and not have to clean up a temp
output file.

All the possible command line tools have their own arguments,
range of file formats, behaviour with respect to default filenames
etc. Trying to capture all this in a single wrapper seems rather
ambitious. For example, how would you handle gap penalties?
Keep in mind that different tools may use the same name for
a gap extension penalty but interpret the values differently.

Also, while I can see this might be nice for short alignments
(which are quick to run), it's rather implicit or magic. I personally
prefer to have to deal with the files explicitly myself - but then I
have been dealing with large alignments which I want to keep
on disk.
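
For comparison, the temp-file wrapper described above might look something like this. It is only a sketch under stated assumptions: `align_records`, `build_command` and `copy_command` are hypothetical names, FASTA is assumed for both input and output, and the tool is assumed to write a named output file rather than stdout. A trivial copy command again stands in for a real aligner.

```python
import os
import subprocess
import sys
import tempfile


def align_records(records, build_command):
    """Write records to a temp FASTA file, run a tool, parse, clean up.

    records: list of (name, sequence) pairs.
    build_command: callable (in_path, out_path) -> argument list. All
    tool-specific options (gap penalties and so on) would have to be
    handled there, which is the hard part raised in the discussion.
    Returns the parsed records plus the temp directory path; the path is
    returned only so the demo can confirm the directory was removed.
    """
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "in.fasta")
        out_path = os.path.join(workdir, "out.fasta")
        with open(in_path, "w") as handle:
            for name, seq in records:
                handle.write(">%s\n%s\n" % (name, seq))
        subprocess.check_call(build_command(in_path, out_path))
        aligned = []
        with open(out_path) as handle:
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    aligned.append([line[1:], ""])
                elif line and aligned:
                    aligned[-1][1] += line
    # The temporary directory and both files are gone at this point.
    return [tuple(item) for item in aligned], workdir


def copy_command(in_path, out_path):
    """Placeholder for a real aligner: copies the input to the output."""
    return [
        sys.executable, "-c",
        "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])",
        in_path, out_path,
    ]


result, used_dir = align_records([("alpha", "ACGT"), ("beta", "AGT")],
                                 copy_command)
```

This is convenient for short runs, but it illustrates the trade-off too: nothing is left on disk once the call returns, so a large alignment would have to be written out again by the caller.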
> Secondly, the biopython interface does not support calling > Clustalw to perform profile alignments, > > (suggested implementation) > # The scaffold alignment: >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus") > # The sequences we want to add to it: >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) >>>> from Bio.Align import ProfileAlignment >>>> align = ProfileAlignment(align, records, executable="clustalw") > > Calls to MultipleAlignment and ProfileAlignment would take a > **options parameter to collect any additional command line options. > > Thirdly, should an alignment object have a > Alignment.refine_alignment(executable="clustalw") > method? > > Any thoughts? I may have misunderstood you, but the ideas you've sketched out seem very very broad/ambitious - and actually take us further away from the SeqIO/AlignIO interface by hiding all the filenames and handles from the user. I think these should be kept explicit. Peter From eric.talevich at gmail.com Mon Mar 30 18:34:09 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 30 Mar 2009 14:34:09 -0400 Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser project Message-ID: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com> Hi folks, I noticed earlier this month that several Biopython developers had signed up as potential mentors in OBF's Summer of Code application. Although OBF apparently wasn't selected as a mentoring organization this year, some other bioinformatics-related groups were -- in particular, the National Evolutionary Synthesis Center's page mentions involvement with the Bio* projects: http://socghop.appspot.com/org/show/google/gsoc2009/nescent The project I'd like to work on is a phyloXML parser for Biopython. NESCent's idea list includes a similar entry for BioRuby (links below). 
I asked the mentor, Christian Zmasek, if it would be acceptable to do the
project with Biopython instead of BioRuby, and he said it would, but he'd
prefer to have a Biopython specialist on board as another mentor.

Would any of you be interested in being a mentor for this project? I imagine
it would have some things in common with the existing Nexus parser, as a
starting point.

http://www.phyloxml.org/
https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby

Thanks,
Eric

From chapmanb at 50mail.com Mon Mar 30 21:00:07 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:00:07 -0400
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
Message-ID: <20090330210007.GC72956@sobchak.mgh.harvard.edu>

Cymon;
I wrote a bunch of the Clustalw stuff a long while ago, and it
sounds like Peter has a good handle on integrating it with AlignIO
so I will leave that to him.

On the choosing aligners side of things, have you tried MAFFT?

http://align.bmr.kyushu-u.ac.jp/mafft/software/

It's updated regularly and seems to have good buzz in the community.
I haven't had to do lots of multiple alignments recently, but it's
worked well for the few I've done.

Having support for multiple aligners is good stuff; I second Peter's
suggestion of having these live under Bio.Align.

Brad

From chapmanb at 50mail.com Mon Mar 30 21:14:48 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:14:48 -0400
Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser project
In-Reply-To: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>
References: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>
Message-ID: <20090330211448.GF72956@sobchak.mgh.harvard.edu>

Hi Eric;
I would be happy to help with mentoring. I have been helping another
student with his application and could definitely give you feedback
on yours. Based on good ones coming through the list, it should be
detailed with a week by week description of what you plan to be
working on and specific deliverables. They also have a short
description of the motivation and your qualifications.

This is my first time doing this, so I don't know much about the
selection process. If more than one Biopython project was selected,
I couldn't realistically mentor both; I am not even sure if that is
a possibility. Either way, Google recommends having two mentors per
student so it would be good to have someone else step up as well.

Let me know if you have any specific questions while you are getting
things together this week,
Brad

> Hi folks,
>
> I noticed earlier this month that several Biopython developers had signed up
> as potential mentors in OBF's Summer of Code application.
Although OBF > apparently wasn't selected as a mentoring organization this year, some other > bioinformatics-related groups were -- in particular, the National > Evolutionary Synthesis Center's page mentions involvement with the Bio* > projects: > > http://socghop.appspot.com/org/show/google/gsoc2009/nescent > > The project I'd like to work on is a phyloXML parser for Biopython. > NESCent's idea list includes a similar entry for BioRuby (links below). I > asked the mentor, Christian Zmasek, if it would be acceptable to do the > project with Biopython instead of BioRuby, and he said it would, but he'd > prefer to have a Biopython specialist on board as another mentor. > > Would any of you be interested in being a mentor for this project? I imagine > it would have some things in common with the existing Nexus parser, as a > starting point. > > http://www.phyloxml.org/ > https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby > > Thanks, > Eric > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From chapmanb at 50mail.com Mon Mar 30 21:33:17 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 30 Mar 2009 17:33:17 -0400 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com> References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> <20090329010652.GA914@kunkel> <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com> Message-ID: <20090330213317.GG72956@sobchak.mgh.harvard.edu> Hi Peter; Thanks for the feedback. I was definitely not being critical of your postings, or fishing for extra jobs for myself. On the contrary, I was inspired by the news items and brainstorming some ways to get additional people involved. 
People who express an interest in Biopython and don't get involved often list the following reasons: - Not feeling like they are technically able to contribute. Perhaps they are just learning Python, or don't feel comfortable with the Biopython library itself. - Traditional academics doesn't offer recognition for contributing to open source projects. While we can't change academics, we can try and come up with ways to improve the visibility of contributors and make sure they are recognized in the larger bioinformatics community. My thought was that a "news coordinator" would give one or more interested people a chance to help the community, learn more about Biopython by being involved, and also increase name recognition for everyone coding, bug fixing and discussing. In terms of how it is done, those were only my random suggestions. Certainly if someone took it up they could be as creative as they want about how to go about it. Brad > On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman wrote: > > Hi all; > > It is great we are exploring getting news out about Biopython in > > additional ways. One thing this can really help with is recognizing > > contributions to Biopython. Another is pointing out interesting > > discussion threads on the mailing lists and getting others involved. > > Do you think the recent release notes and NEWS file entries have been > a bit too impersonal? We can certainly be a bit more explicit if people > think that is a good thing. For example, should we mention Bartek by name > in the paragraph on the new Bio.Motif module? > > This is linked to from the wiki's news page BTW: > http://biopython.open-bio.org/SRC/biopython/NEWS > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython > > > Do you think it would be worthwhile to "advertise" on the main list > > for someone interested in coordinating news and communication? > > ...
Perhaps there are members who are interested in Biopython and > > follow what is going on but aren't coders. This would be a way to > > get involved, ... > > Are you up for the job yourself Brad? From your own blog we know > you can and do write regularly anyway ;) Would you like an account > on the OBF news server? Email me off list and we can sort that out. > > In terms of micro-blogging via twitter, you sound like you have a better > feel for this than me - I don't even have a personal twitter account. > > Monthly news posts (perhaps cc'd to the announcement email list) > would be a nice idea - especially if we can encourage more lurkers > to speak up. For a while BioPerl had something like this going > (digest emails or something), but it needs a pretty dedicated person > or team. In the meantime as you've noticed I've started making > more use of our news facility myself... > > Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 21:58:52 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 22:58:52 +0100 Subject: [Biopython-dev] Biopython on Twitter In-Reply-To: <20090330213317.GG72956@sobchak.mgh.harvard.edu> References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com> <20090329010652.GA914@kunkel> <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com> <20090330213317.GG72956@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00903301458s7216ec97gc4ac71a03d0fd350@mail.gmail.com> On Mon, Mar 30, 2009 at 10:33 PM, Brad Chapman wrote: > Hi Peter; > Thanks for the feedback. I was definitely not being critical of your > postings, ... I hadn't had that impression, but that's still nice to hear ;) > ... or fishing for extra jobs for myself. Darn - I thought you'd be an excellent choice. > On the contrary, I was inspired by the news items and > brainstorming some ways to get additional people involved. 
Well unless anyone already lurking on the dev mailing list steps forward (*hint hint*), do you (Brad) want to try asking on the main discussion list to see if there are any takers? > People who express an interest in Biopython and don't get > involved often list the following reasons: > > - Not feeling like they are technically able to contribute. Perhaps > they are just learning Python, or don't feel comfortable with the > Biopython library itself. I find once they get over any shyness, even just having beginners asking questions can be valuable in itself. It shows us potential blind spots, or areas of the documentation which need clarification (or writing) - plus of course it can bring about discussions etc. > - Traditional academics doesn't offer recognition for contributing to > open source projects. While we can't change academics, we can try > and come up with ways to improve the visibility of contributors and > make sure they are recognized in the larger bioinformatics > community. > > My thought was that a "news coordinator" would give one or more > interested people a chance to help the community, learn more about > Biopython by being involved, and also increase name recognition for > everyone coding, bug fixing and discussing. Some of us are very aware of this issue (academic recognition for contributions to projects like Biopython), and different employers will take different attitudes here. In some cases making our contributors more visible won't be a good idea... In my case work on Biopython was a definite plus point in landing my current job, but there are of course still limits to how much work time I can reasonably spend on this (and limits to how much time I spend out of work - like right now on this email). > In terms of how it is done, those were only my random suggestions. > Certainly if someone took it up they could be as creative as they > want about how to go about it.
> > Brad It's certainly worth a go :) Peter From biopython at maubp.freeserve.co.uk Mon Mar 30 22:35:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 30 Mar 2009 23:35:05 +0100 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com> References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com> <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com> <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com> <20090330130027.GB36526@sobchak.mgh.harvard.edu> <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com> Message-ID: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com> On Mon, Mar 30, 2009 at 2:23 PM, Peter wrote: > I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and > test_Cluster and the rest of the Biopython tests passed. This looks like > a Windows and/or Python 2.6 problem - I should be able to try a Linux > machine with Python 2.6 tonight... I've just tried it on Ubuntu Jaunty (Alpha 6), with Python 2.6.1+ (already installed), the wise and clustalw packages installed, NumPy 1.3.0rc1 installed from source, and Biopython CVS installed from source. Again, test_Cluster.py and the rest of our tests pass (ignoring those with additional external dependencies like BioSQL, fdist, simcoal2). So, whatever is going wrong with test_Cluster.py seems to be specific to Windows (XP) and Python 2.6 - and possibly just my Windows development machine.
Peter From mjldehoon at yahoo.com Tue Mar 31 00:08:34 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 30 Mar 2009 17:08:34 -0700 (PDT) Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com> Message-ID: <730606.962.qm@web62408.mail.re1.yahoo.com> > So, whatever is going wrong on test_Cluster.py seems to be > specific to > Windows (XP) and Python 2.6 - and possibly just my Windows > development > machine. > I believe that the problem is that msvcr90.dll is missing. This is the C runtime from Microsoft. Earlier Pythons used msvcr71.dll, if I'm not mistaken. --Michiel From biopython at maubp.freeserve.co.uk Tue Mar 31 09:12:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 10:12:21 +0100 Subject: [Biopython-dev] Testing Biopython with NumPy 1.3 In-Reply-To: <730606.962.qm@web62408.mail.re1.yahoo.com> References: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com> <730606.962.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00903310212o29bba163ma9d68a901eabc2c9@mail.gmail.com> On Tue, Mar 31, 2009 at 1:08 AM, Michiel de Hoon wrote: > >> So, whatever is going wrong on test_Cluster.py seems to be >> specific to Windows (XP) and Python 2.6 - and possibly just >> my Windows development machine. >> > I believe that the problem is that msvcr90.dll is missing. This > is the C runtime from Microsoft. Earlier Pythons used > msvcr71.dll, if I'm not mistaken. You may be right - there is some stuff on the numpy mailing list about this and manifest files etc when using mingw32. It may be simplest to try the appropriate MS compiler instead... 
Peter From biopython at maubp.freeserve.co.uk Tue Mar 31 10:28:35 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 11:28:35 +0100 Subject: [Biopython-dev] Python's new DVCS chosen Message-ID: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> Hi all, This might be of interest (although I'm sure some of you already know). Earlier this month on the python-dev mailing list, Guido van Rossum wrote: > Dear Python developers, > > The decision is made! I've selected a DVCS to use for Python. > We're switching to Mercurial (Hg). > > The implementation and schedule is still up in the air -- I am > hoping that we can switch before the summer. > ... http://mail.python.org/pipermail/python-dev/2009-March/087931.html See also PEP-374, http://www.python.org/dev/peps/pep-0374/ Interestingly, Mercurial (Hg) didn't get much of a mention in our discussions here. Peter From bartek at rezolwenta.eu.org Tue Mar 31 11:05:07 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 31 Mar 2009 13:05:07 +0200 Subject: [Biopython-dev] Python's new DVCS chosen In-Reply-To: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> Message-ID: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> Hi, On Tue, Mar 31, 2009 at 12:28 PM, Peter wrote: > Hi all, > > This might be of interest (although I'm sure some of you already > know). Earlier this month on the python-dev mailing list, Guido > van Rossum wrote: >> We're switching to Mercurial (Hg). > Interestingly, Mercurial (Hg) didn't get much of a mention in our > discussions here. Their evaluation of different options (in PEP 374) was mentioned on the list by Bruce, so everyone was able to form their opinions. As Guido explains in another paragraph: >It's hard to explain my reasons for choosing -- like most language >decisions (especially the difficult ones) it's mostly a matter of gut >feelings.
One thing I know is that it's better to decide now than to >spend another year discussing the pros and cons. All that could be >said has been said, pretty much, and my mind is made up. He seems to find all the candidates good enough. It's a matter then of a consensus between developers. Git happened to have many antagonists on python-dev list, but it happened to have more protagonists on biopython-dev. I think we have made a consensus decision to try out git/github and I think it's extremely counter-productive to re-open the discussion on our choice now. I'm not a git fanboy, but because there are _no_ universal criteria to choose between git vs. bzr vs. Hg we should not spend more time on this issue. cheers Bartek From cy at cymon.org Tue Mar 31 11:25:27 2009 From: cy at cymon.org (Cymon Cox) Date: Tue, 31 Mar 2009 12:25:27 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> Message-ID: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> Hi Peter, 2009/3/30 Peter > On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox wrote: > > > > Hi Folks, > > > > I've been trying to formalize a bunch of randomly scattered bits of code > to > > support the use of the alignment programme Muscle > > (http://www.drive5.com/muscle/). I prefer to use this software in > preference > > to Clustalw - subjectively, it seems to give the most accurate > alignments. > > (Whether Biopython would want to support a second alignment programme > > /external dependency is another matter...) > > A wrapper for MUSCLE wouldn't hurt - although there is scope for some > rearrangement of our command line tool wrappers rather than adding more > and more top level modules. Maybe under Bio.Align, and move the Clustalw > wrapper there too. 
Agreed - it would seem more appropriate to have the alignment interfaces in Bio.Align. > > Anyway, while doing so, I realised just how awkward the current interface > to > > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well. > > What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm: > (1) use SeqIO to prepare the FASTA input file. > (2) run the command line tool (e.g. MUSCLE). > (3) use AlignIO (or SeqIO) to read the alignment output file. Well, yes - we can always not use the biopython interface. > Actually I think that Bio.Clustalw interface is now a bit out of place, > as it hides some of this from you. (Note that Bio.Clustalw predates > Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly > tool neutral). > > > Currently, if we have a bunch of SeqRecords, say after downloading from > > GenBank or being pulled from a BioSQL db, we have to write them to disk > > and call clustalw on the file: > > > >>>> from Bio import Clustalw > >>>> from Bio.Clustalw import MultipleAlignCL > >>>> cline = MultipleAlignCL("f002", command="clustalw") > >>>> align = Clustalw.do_alignment(cline) > > Well yes. Typically for any alignment tool you'd have to write the > unaligned records in FASTA format. Some tools may let handle > this via standard input, so you may be able to use a pipe instead > of a file - but the issues are similar. > > > It seems to me more appropriate to be able to call clustalw directly on a > > bunch of SeqRecords: > > > > eg (suggested implementation) > >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) > >>>> from Bio.Align import MultipleAlignment > >>>> align = MultipleAlignment(records, executable="clustalw") > > i.e. Have a Biopython wrapper use a temp file to record the > given records to in a format appropriate for the command line > tool selected, and capturing the output? In the case of > ClustalW or MUSCLE this means making a temp FASTA input > file. 
For ClustalW we'd then have to open the output file, read > it, and then delete it. Yes, that's what I'm suggesting. Here's my reasoning: it seems to me the input and output formats of the data required by a particular alignment tool are incidental and should be hidden from the user. At present the Clustalw interface forces you to write a fasta formatted file of your records to disk, and then has Clustalw write an aligned matrix to disk in a format specified by the user. If the latter is Clustal format, then the record is parsed and an alignment object is returned, else None is returned. In either case, an output file(s) remains on disk. So, say we have a bunch of sequences in pir format and we'd like them aligned and saved in stockholm format: from Bio import SeqIO from Bio import AlignIO from Bio import Clustalw from Bio.Clustalw import MultipleAlignCL records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir") AlignIO.write([records], open("temp.fasta", "w"), "fasta") cline = MultipleAlignCL("temp.fasta", command="clustalw") align = Clustalw.do_alignment(cline) AlignIO.write([align], open("temp.sth", "w"), "stockholm") we end up with 4 output files on disk: temp.aln, temp.dnd, temp.fasta, temp.sth - 3 of which are incidental. (BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir" in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the subprocess to return, which it never does: pid, sts = os.waitpid(self.pid, 0)) As I say, I'd like to see this: >>> from Bio.Align import MultipleAlignment >>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")) >>> align = MultipleAlignment(records, executable="clustalw") >>> AlignIO.write([align], open("temp.sth", "w"), "stockholm") ie resulting in one file temp.sth, which we've explicitly written to disk. > For other tools we may be able to just > capture its output on stdout and not have to clean up a temp > output file. 
> > All the possible command line tools have their own arguments, > range of file formats, behaviour with respect to default filenames > etc. Trying to capture all this in a single wrapper seems rather > ambitious. For example, how would you handle gap penalties? > Keep in mind that different tools may use the same name for > a gap extension penalty but interpret the values differently. Sorry, I wasn't very clear about what I intended: MultipleAlignment(records, executable="clustalw", ) returns Clustalw.do_alignment(records, ) and MultipleAlignment(records, executable="muscle", ) returns Muscle.do_alignments(records, ) I'm not suggesting unifying all programme options into a single interface, just wrap the individual alignment tool modules in a common call, MultipleAlignment(), align_records(), or whatever... As for the keyword options, at present the Clustalw interface supports the manual setting of some attributes to the MultipleAlignCL instance, but there is no type or value checking. I think as many options as possible should be supported through keyword arguments - tedious, but doable. > Also, while I can see this might be nice for short alignments > (which are quick to run), it's rather implicit or magic. Not sure what you mean here? Why would the size of alignment matter? And as for it being magic, it seems to me it does, and only does, what it says on the label - aligns the data. > I personally > prefer to have to deal with the files explicitly myself - but then I > have been dealing with large alignments which I want to keep > on disk. I tend to build many (small - <100 taxa) single gene alignments - in one use-case, 280 of them...
> Secondly, the biopython interface does not support calling > > Clustalw to perform profile alignments, > > > > (suggested implementation) > > # The scaffold alignment: > >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus") > > # The sequences we want to add to it: > >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta")) > >>>> from Bio.Align import ProfileAlignment > >>>> align = ProfileAlignment(align, records, executable="clustalw") > > > > Calls to MultipleAlignment and ProfileAlignment would take a > > **options parameter to collect any additional command line options. > I'm very keen to see profile alignments supported - be it either in Clustalw or Muscle, or both. > > > Thirdly, should an alignment object have a > > Alignment.refine_alignment(executable="clustalw") > > method? > > > > Any thoughts? > > I may have misunderstood you, but the ideas you've sketched out > seem very very broad/ambitious - and actually take us further away > from the SeqIO/AlignIO interface by hiding all the filenames and > handles from the user. I think these should be kept explicit. OK, well having had my say, I'm quite happy to write the Muscle module in the style of the current Clustalw interface, or whatever style is most appropriate for exposing the filename handles. But I'm not sure what that would be - perhaps you could elaborate on this a bit... Cheers, C. -- ____________________________________________________________________ Cymon J. 
Cox Centro de Ciencias do Mar Faculdade de Ciencias do Mar e Ambiente (FCMA) Universidade do Algarve Campus de Gambelas 8005-139 Faro Portugal Phone: +0351 289800909 ext 7909 Fax: +0351 289800051 Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com HomePage : http://biology.duke.edu/bryology/cymon.html -8.63/-6.77 From biopython at maubp.freeserve.co.uk Tue Mar 31 11:27:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 12:27:07 +0100 Subject: [Biopython-dev] Python's new DVCS chosen In-Reply-To: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> Message-ID: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com> On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski wrote: > I think we have made a consensus decision to try out git/github and I > think it's extremely counter-productive to re-open the discussion on > our choice now. I'm not a git fanboy, but because there are _no_ > universal criteria to choose between git vs. bzr vs. Hg we should not > spend more time on this issue. I hadn't intended to reopen the debate - it was just a post for interests sake. As you can probably tell from looking at the biopython network graph on github (which I got to work on Linux but only with Adobe's flash plugin - gnash etc didn't seem to cope), I've been getting to grips with git (and github). Peter From biopython at maubp.freeserve.co.uk Tue Mar 31 12:56:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 13:56:21 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... 
In-Reply-To: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> Message-ID: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox wrote: >> What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm: >> (1) use SeqIO to prepare the FASTA input file. >> (2) run the command line tool (e.g. MUSCLE). >> (3) use AlignIO (or SeqIO) to read the alignment output file. > > Well, yes - we can always not use the biopython interface. Ideally step (2) in the above would be handled via a Biopython command line wrapper, offering keyword arguments etc. >> i.e. Have a Biopython wrapper use a temp file to record the >> given records to in a format appropriate for the command line >> tool selected, and capturing the output? In the case of >> ClustalW or MUSCLE this means making a temp FASTA input >> file. For ClustalW we'd then have to open the output file, read >> it, and then delete it. > > Yes, that's what I'm suggesting. > > Here's my reasoning: it seems to me the input and output formats of the data > required by a particular alignment tool are incidental and should be hidden > from the user. OK - I see this as doing some implicit behind the scenes magic. Arguably this kind of thing is still nice to have if it makes things simpler for the user. I may over use this mantra, but "Explicit is better than implicit", from the Zen of Python. http://www.python.org/dev/peps/pep-0020/ > At present the Clustalw interface forces you to write a fasta > formatted file of your records to disk, and then has Clustalw > write an aligned matrix to disk in a format specified by the user.
The Clustalw tool only takes FASTA formatted input, so if you have a bunch of sequences in memory you are forced to convert them into FASTA format to use them as input. The question is where this conversion takes place - explicitly by the user, or implicitly by a wrapper. > If the latter is Clustal format, then the record is parsed and an alignment > object is returned, else None is returned. In either case, an output file(s) > remains on disk. It should be a fairly simple enhancement to look at the arguments to see if another output format we can parse was selected (e.g. PHYLIP?) and also parse that. Do you think that would be a sensible addition to Bio.Clustalw.do_alignment? It's never been an issue for me as if you are using the Bio.Clustalw.do_alignment interface you probably don't care about the output file format. > So, say we have a bunch of sequences in pir format and we'd like them > aligned and saved in stockholm format: > > from Bio import SeqIO > from Bio import AlignIO > from Bio import Clustalw > from Bio.Clustalw import MultipleAlignCL > records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir") > AlignIO.write([records], open("temp.fasta", "w"), "fasta") The above line is wrong - it should be: SeqIO.write(records, open("temp.fasta", "w"), "fasta") At this point your PIR sequences are not yet aligned, so they'll (probably) have different lengths, so shouldn't be treated as an alignment. If it doesn't raise an error maybe it should... Also you don't explicitly close the handle this way. > cline = MultipleAlignCL("temp.fasta", command="clustalw") > align = Clustalw.do_alignment(cline) > AlignIO.write([align], open("temp.sth", "w"), "stockholm") > we end up with 4 output files on disk: temp.aln, temp.dnd, temp.fasta, > temp.sth - 3 of which are incidental. Yes - but as ClustalW doesn't read in PIR files, and doesn't output Stockholm files on its own, this has to happen.
It's just a question of who does it (the user, or the wrapper code). > (BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir" > in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the > subprocess to return, which it never does: pid, sts = os.waitpid(self.pid, > 0)) I would guess this is because you never properly closed the temp.fasta file, so it may not have been flushed to disk when the Clustalw tool was called. > As I say, I'd like to see this: >>>> from Bio.Align import MultipleAlignment >>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")) >>>> align = MultipleAlignment(records, executable="clustalw") >>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm") > > ie resulting in one file temp.sth, which we've explicitly written to disk. So you'd like the wrapper to take care of creating and deleting the temp input FASTA file, and also deleting the temp output ClustalW file after parsing it. This can probably be done quite cleanly using python's NamedTemporaryFile object. >> For other tools we may be able to just capture its output on >> stdout and not have to clean up a temp output file. >> >> All the possible command line tools have their own arguments, >> range of file formats, behaviour with respect to default filenames >> etc. Trying to capture all this in a single wrapper seems rather >> ambitious. For example, how would you handle gap penalties? >> Keep in mind that different tools may use the same name for >> a gap extension penalty but interpret the values differently.
> > Sorry, I wasn't very clear about what I intended: > > MultipleAlignment(records, executable="clustalw", ) > returns Clustalw.do_alignment(records, ) > and > MultipleAlignment(records, executable="muscle", ) > returns Muscle.do_alignments(records, ) > > I'm not suggesting unifying all programme options into a single interface, > just wrap the individual alignment tool modules in a common call, > MultipleAlignment(), align_records(), or whatever... I see. > As for the keyword options, at present the Clustalw interface supports the > manual setting of some attributes to the MultipleAlignCL instance, but there > is no type or value checking. I think as many options as possible should be > supported through keyword arguments - tedious, but doable. > >> Also, while I can see this might be nice for short alignments >> (which are quick to run), it's rather implicit or magic. > > Not sure what you mean here? Why would the size of alignment matter? Size of alignment influences the compute time, and therefore is an issue for anyone doing things at the Python prompt. Moreover, if the alignments are big and slow, you generally want to make sure the output file is kept on disk, as you'll probably want to read it more than once. > And as for it being magic, it seems to me it does, and only does, what > it says on the label - aligns the data. The magic is the behind the scenes creation/deletion of the input/output files, and the conversion between file formats. >> I personally prefer to have to deal with the files explicitly myself >> - but then I have been dealing with large alignments which I want >> to keep on disk. > > I tend to build many (small - <100 taxa) single gene alignments - in one > use-case, 280 of them... In your case I would assume the alignment takes minutes to run.
You tend to care more about preserving the output files if they take hours to create ;) >> > Secondly, the biopython interface does not support calling >> > Clustalw to perform profile alignments, That is something we should probably add. > OK, well having had my say, I'm quite happy to write the Muscle module in > the style of the current Clustalw interface, or whatever style is most > appropriate for exposing the filename handles. But I'm not sure what that > would be - perhaps you could elaborate on this a bit... I've elaborated, perhaps too much? ;) Basically you seem to be thinking about a high level abstraction for multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO modules), while I am more focused on the low level abstraction for wrapping any command line tool. This isn't to say we can't have both, but to me it makes sense to start with the low level stuff first. We (unfortunately) have several styles of command line tool wrappers in Biopython already - this is a wart that has been on my mental to do list for some time. I think we should focus on dealing with command line strings, and keep this separate from how the tools are invoked (e.g. subprocess or os.system), preparation of input files, and how any output is parsed. As long as this core is in place, more advanced wrappers are possible on top of this basic infrastructure (Tiago may have some comments here from his Bio.PopGen work). Essentially all our command line wrappers start by building a command line string. In some cases this command line string is exposed to the user (e.g. Bio.EMBOSS), and they can choose how they want to invoke it. For example, they can explicitly opt to use the Python subprocess module and pipes if they want to - or use a standard invocation from Bio.Application (we may want to add a couple of variations to this module). Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool for you.
In the case of Bio.Blast.NCBIStandalone, if you don't want the handles because you've told Blast to save its output to a file, our wrapper still returns the standard output and standard error handles - they are forced on you (see Bug 2654). Also, there is no easy way to see what the actual command line string was, which can make debugging hard, and also prevents certain things (e.g. submitting the command line as a task to a cluster of workstations). At least Bio.Clustalw offers a command line string object (MultipleAlignCL), it's just the do_alignment helper function I'm not so keen on. The Bio.Clustalw.do_alignment wrapper is rather unusual in that it automatically parses the output - while most of our wrappers don't. Decoupling the parsing is more modular - it makes it easy for the user to use any parser for the output from a command line tool (either using stdout, or by reading an output file). I like this, and it fits with the handle based approach in most of our parsers. So, I would suggest we think about adding new wrappers under Bio.Align (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or perhaps all together in Bio.Align.Applications or something) based on the Bio.Application module as used in Bio.EMBOSS. We could then deprecate Bio.Clustalw, which should also help tidy up the top level name space. Initially at least, I wouldn't include any clever wrapper code at all. Once we have the basic command line objects done, these could be used to later add another layer on top implementing Cymon's ideas for multiple alignment wrappers taking care of intermediate files and inter-converting file formats on the fly, although I remain to be convinced about the value of this. If you can pull it off (cross platform, on several versions of Python) then a user friendly high level interface would be impressive.
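To make the separation argued for above concrete, here is a minimal sketch in modern Python of a command line object that only builds the command, with the invocation kept as a separate step. Everything in it is an assumption for illustration: the CommandLine class and the run() helper are hypothetical names, not the Bio.Application API, and the MUSCLE example is only turned into a string, never executed. The runnable demo uses the Python interpreter itself as a stand-in tool.

```python
import shlex
import subprocess
import sys


class CommandLine:
    """Hypothetical command line object: it builds the command, nothing more."""

    def __init__(self, program, options=None, args=()):
        self.program = program
        self.options = dict(options or {})  # e.g. {"in": "x.fasta"} gives "-in x.fasta"
        self.args = list(args)              # trailing positional arguments

    def as_list(self):
        """Return the command as an argument list, ready for subprocess."""
        argv = [self.program]
        for name, value in self.options.items():
            argv.append("-" + name)
            argv.append(str(value))
        argv.extend(self.args)
        return argv

    def __str__(self):
        """Return the command as a shell-quoted string, e.g. for logging
        or for submitting to a cluster queue."""
        return " ".join(shlex.quote(part) for part in self.as_list())


def run(cline):
    """One possible way to invoke a command line object: run it and
    return whatever it wrote to standard output. Parsing is left to the caller."""
    result = subprocess.run(cline.as_list(), capture_output=True,
                            text=True, check=True)
    return result.stdout


# Building the command does not require the tool to be installed.
muscle = CommandLine("muscle", {"in": "unaligned.fasta", "out": "aligned.fasta"})
print(muscle)

# Stand-in tool for the demo: the running Python interpreter.
demo = CommandLine(sys.executable, {"c": "print('>seq1'); print('ACGT')"})
print(run(demo))
```

Because the string is available before anything runs, the caller can inspect it when debugging, hand it to a job scheduler, or choose a different invocation (pipes, os.system) without touching the object. A higher level wrapper that writes temporary input files and parses the output could then be layered on top of run() rather than built into it.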
Peter From bartek at rezolwenta.eu.org Tue Mar 31 13:14:39 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 31 Mar 2009 15:14:39 +0200 Subject: [Biopython-dev] Python's new DVCS chosen In-Reply-To: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com> References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com> <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com> <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com> Message-ID: <8b34ec180903310614k1fe4a08bkac19c2cc96b36fad@mail.gmail.com> On Tue, Mar 31, 2009 at 1:27 PM, Peter wrote: > On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski > wrote: >> I think we have made a consensus decision to try out git/github and I >> think it's extremely counter-productive to re-open the discussion on >> our choice now. I'm not a git fanboy, but because there are _no_ >> universal criteria to choose between git vs. bzr vs. Hg we should not >> spend more time on this issue. > > I hadn't intended to reopen the debate - it was just a post for interest's sake. > That's a relief. Maybe I'm becoming overly sensitive on the subject. > As you can probably tell from looking at the biopython network graph > on github (which I got to work on Linux but only with Adobe's flash > plugin - gnash etc didn't seem to cope), I've been getting to grips > with git (and github). > I haven't checked for a while, but it seems that we've got quite a number of people making changes on different branches. That's cool. I'd like to encourage people to share their impressions of git+github with others on the list. If there are any issues, it's better to discuss them early. cheers Bartek From biopython at maubp.freeserve.co.uk Tue Mar 31 14:10:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 15:10:00 +0100 Subject: [Biopython-dev] Easy Git - git for mere mortals?
Message-ID: <320fb6e00903310710x693527f2k25b49d958543939d@mail.gmail.com> Hi all, Have any of you tried out easygit (eg)? If it is as good as it sounds on their website, it might be a sensible option for those migrating from CVS/SVN to git for the first time. http://www.gnome.org/~newren/eg/ Reading the easygit documentation, it sounds like git gives the user plenty of ways to shoot themselves in the foot (especially if used to CVS/SVN), and a lot of what easygit does is catch some of these potential mistakes. They also stress you can mix and match git and easy git, so it can act as a stepping stone to using git directly. This presentation seems a fairly gentle introduction (with plenty of for interest stuff in the second half that can be ignored), http://www.gnome.org/~newren/eg/presentations/git-introduction.pdf There are quite a few other wrappers for git too - all referred to as "porcelain", which apparently follows from Linus's division of end user commands in git into external "porcelain" and internal "plumbing". The "porcelain" are the bits of a bathroom the end user sees (like the sink), while they normally only interact with the "ugly plumbing" when something goes wrong (like dropping an earring down the sink). This kind of quirky language doesn't really make the documentation any clearer in my opinion; still, I'm sure things are improving gradually (or at least, I hope they are). For the moment I've come to the conclusion the git man pages are not really suitable for beginners. Peter P.S. For the moment, let's keep the wiki page focused on using git itself directly - too many choices will confuse things. From cy at cymon.org Tue Mar 31 14:49:20 2009 From: cy at cymon.org (Cymon Cox) Date: Tue, 31 Mar 2009 15:49:20 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> Message-ID: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com> Hi Peter, 2009/3/31 Peter > On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox wrote: > > > At present the Clustalw interface forces you to write a fasta > > formatted file of your records to disk, and then has Clustalw > > write an aligned matrix to disk in a format specified by the user. > > The Clustalw tool only takes FASTA formatted input, so if you have > a bunch of sequences in memory you are forced to convert them > into FASTA format to use them as input. The question is where > does this conversion take place - explicitly by the user, or implicitly > by a wrapper. Agreed - that's the question... > > If the latter is Clustal format, then the record is parsed and an > alignment > > object is returned, else None is returned. In either case, an output > file(s) > > remains on disk. > > It should be a fairly simple enhancement to look at the arguments > to see if another output format we can parse was selected, e.g. > PHYLIP?) and also parse that. Do you think that would be a > sensible addition to Bio.Clustalw.do_alignment? No - I don't think there should be any output file (of any format) at all; an alignment object should always be returned and the user should explicitly write to the format they want using AlignIO. (But I think this becomes clearer below...) > It's never been > an issue for me as if you are using the Bio.Clustalw.do_alignment > interface you probably don't care about the output file format. Quite. (Unless you are trying to write to a format not supported by biopython e.g. GCG, GDE, of course.)
> > So, say we have a bunch of sequences in pir format and we'd like them > > aligned and saved in stockholm format: > > > > from Bio import SeqIO > > from Bio import AlignIO > > from Bio import Clustalw > > from Bio.Clustalw import MultipleAlignCL > > records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir") > > AlignIO.write([records], open("temp.fasta", "w"), "fasta") > > The above line is wrong Doh! Grrr... Yeah, perhaps it should have raised an error - I'll follow this up elsewhere - but even with the corrected line and explicitly opening and closing the file handles, I still can get clustalw to align this file... (later...) > we end up with 4 output files on disk: temp.aln, temp.dnd, temp.fasta, > > temp.sth - 3 of which are incidental. > > Yes - but as the ClustalW doesn't read in PIR files, and doesn't output > Stockholm files on its own, so this has to happen. It's just a question > of who does it (the user, or the wrapper code). Yep... > > As I say, I'd like to see this: > >>>> from Bio.Align import MultipleAlignment > >>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), > "pir")) > >>>> align = MultipleAlignment(records, executable="clustalw") > >>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm") > > > > ie resulting in one file temp.sth, which we've explicitly written to > disk. > > So you'd like the wrapper to take care of creating and deleting the > temp input FASTA file, and also deleting the temp output ClustalW > file after parsing it. This can probably be done quite cleanly using > python's NamedTemporaryFile object. > Yep. > >> Also, while I can see this might be nice for short alignments > >> (which are quick to run), its rather implicit or magic. > > > > Not sure what you mean here? Why would the size of alignment matter? > > Size of alignment influences the compute time, and therefore is an issue > for > anyone doing things at the python prompt. 
Moreover, if the alignments are > big and slow, you generally want to make sure the output file is kept on > disk, > as you'll probably want to read it more than once. Agreed, but should the call to align the data (i.e. to clustalw) be writing the output to disk or should the user be making an explicit call using AlignIO? > > And as for it being magic, it seems to me it does, and only does, what > > it says on the label - aligns the data. > > The magic is the behind the scenes creation/deletion of the input/output > files, and the conversion between file formats. Fair enough - then magic it be... :) > > OK, well having had my say, I'm quite happy to write the Muscle module in > > the style of the current Clustalw interface, or whatever style is most > > appropriate for exposing the filename handles. But I'm not sure what that > > would be - perhaps you could elaborate on this a bit... > > I've elaborated, perhaps too much? ;) > > Basically you seem to be thinking about a high level abstraction for > multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO > module), while I am more focused on the low level abstraction for > wrapping any command line tool. This isn't to say we can't have both, > but to me it makes sense to start with the low level stuff first. > > We (unfortunately) have several styles of command line tool wrappers > in Biopython already - this is a wart that has been on my mental to do > list for some time. I think we should focus on dealing with command > line strings, and keep this separate from how the tools are invoked > (e.g. subprocess or os.system), preparation of input files, and how > any output is parsed. As long as this core is in place, more advanced > wrappers are possible on top of this basic infrastructure (Tiago may > have some comments here from his Bio.PopGen work). > > Essentially all our command line wrappers start by building a command > line string.
In some cases this command line string is exposed to the > user (e.g. Bio.EMBOSS), and they can choose how they want to invoke > it. For example, they can explicitly opt to use the Python subprocess > module and pipes if they want to - or use a standard invocation from > Bio.Applications (we may want to add a couple of variations to this > module). > > Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool > for you. In the case of Bio.Blast.NCBIStandalone, if you don't want > the handles because you've told Blast to save its output to a file, > our wrapper still returns the standard output and standard error > handles - it is forced on you (see Bug 2654). Also, there is no easy > way to see what the actual command line string was, which can make > debugging hard, and also prevents certain things (e.g. submitting the > command line as a task to a cluster of workstations). At least > Bio.Clustalw offers a command line string object (MultipleAlignCL); > it's just the do_alignment helper function I'm not so keen on. > > The Bio.Clustalw.do_alignment wrapper is rather unusual in that it > automatically parses the output - while most of our wrappers don't. > Decoupling the parsing is more modular - it makes it easy for the user > to use any parser for the output from a command line tool (either > using stdout, or by reading an output file). I like this, and it fits > with the handle based approach in most of our parsers. Thanks for your thoughts on this, it helps clarify some things... > So, I would suggest we think about adding new wrappers under Bio.Align > (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or > perhaps all together in Bio.Align.Applications or something) based on > the Bio.Application module as used in Bio.EMBOSS. We could then > deprecate Bio.Clustalw, which should also help tidy up the top level > name space. Initially at least, I wouldn't include any clever wrapper > code at all.
OK, I'll aim for this with the Muscle code... Cheers, C. -- ____________________________________________________________________ Cymon J. Cox Centro de Ciencias do Mar Faculdade de Ciencias do Mar e Ambiente (FCMA) Universidade do Algarve Campus de Gambelas 8005-139 Faro Portugal Phone: +0351 289800909 ext 7909 Fax: +0351 289800051 Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com HomePage : http://biology.duke.edu/bryology/cymon.html -8.63/-6.77 From biopython at maubp.freeserve.co.uk Tue Mar 31 15:24:32 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 16:24:32 +0100 Subject: [Biopython-dev] Multiple alignment - Clustalw etc... In-Reply-To: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com> References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com> <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com> Message-ID: <320fb6e00903310824v6fb0e1d2gff32b3effccd00b1@mail.gmail.com> On Tue, Mar 31, 2009 at 3:49 PM, Cymon Cox wrote: >>> >>> If the latter is Clustal format, then the record is parsed and an >>> alignment object is returned, else None is returned. In either >>> case, an output file(s) remains on disk. >> >> It should be a fairly simple enhancement to look at the arguments >> to see if another output format we can parse was selected, e.g. >> PHYLIP?) and also parse that. Do you think that would be a >> sensible addition to Bio.Clustalw.do_alignment? > > No - I don't think there should be any output file (of any format) at all, an > alignment object should always be returned and the user explicitly write to > format they want using AlignIO. (But I think this becomes clearer below...) Well, there must be an output file, since ClustalW won't write its output alignment to stdout.
Of course, you would have a wrapper which deletes the output file after it has been parsed into an Alignment object. However, we shouldn't change the existing Bio.Clustalw.do_alignment function to do this (or to delete the .dnd guide tree), since people may be using the call for these "side effects". >> It's never been >> an issue for me as if you are using the Bio.Clustalw.do_alignment >> interface you probably don't care about the output file format. > > Quite. (Unless you are trying to write to a format not supported by > biopython e.g. GCG, GDE, of course.) What I was saying was Bio.Clustalw.do_alignment knows the requested output format, and if it is ClustalW it automatically parses the output file and returns the alignment. Since this code was written, Bio.AlignIO was added and could potentially be used to parse PHYLIP (etc) output from the Clustalw tool. And one day maybe GCG etc too. That is, right now Bio.Clustalw.do_alignment will return an alignment if it is in ClustalW format, or None if it isn't. I'm suggesting Bio.Clustalw.do_alignment could return an alignment when Bio.AlignIO can parse the requested file format, or None if it can't. This would only be a small enhancement, and may not be worth bothering with if we are thinking about deprecating Bio.Clustalw with a replacement under Bio.Align. >> Size of alignment influences the compute time, and therefore is an issue >> for anyone doing things at the python prompt. Moreover, if the alignments >> are big and slow, you generally want to make sure the output file is kept >> on disk, as you'll probably want to read it more than once. > > Agreed, but should the call to align the data (ie to clustalw) be writing > the output to disk or should the user be making an explicit call using > AlignIO? The command line tool ClustalW will itself write the output to disk. I don't recall off hand, but other tools like Muscle may give the option of writing to a file or to stdout.
In either case, the tool writes to a handle, and the user may want to *read* this handle using Bio.AlignIO. If I want the tool's output to go straight to a file, I'd get the tool to do it. The only reason I can see to be *writing* the alignment with Bio.AlignIO would be for file conversion (or after manipulating the alignment), and that would be done by the user's python code. If you are talking about the data preparation (i.e. the input file rather than the output file), then I think it is up to the user's code to prepare a suitable input FASTA file (e.g. from SeqRecord objects with Bio.SeqIO) before calling the command line tool. >>> And as for it being magic, it seems to me it does, and only does, what >>> it says on the label - aligns the data. >> >> The magic is the behind the scenes creation/deletion of the input/output >> files, and the conversion between file formats. > > Fair enough - then magic it be... :) :) >> > OK, well having had my say, I'm quite happy to write the Muscle module in >> > the style of the current Clustalw interface, or whatever style is most >> > appropriate for exposing the filename handles. But I'm not sure what that >> > would be - perhaps you could elaborate on this a bit... >> >> I've elaborated, ... > > Thanks for your thoughts on this, it helps clarify some things... Oh good. If you don't agree with any of that, do say so by the way. >> So, I would suggest we think about adding new wrappers under Bio.Align >> (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or >> perhaps all together in Bio.Align.Applications or something) based on >> the Bio.Application module as used in Bio.EMBOSS. We could then >> deprecate Bio.Clustalw, which should also help tidy up the top level >> name space. Initially at least, I wouldn't include any clever wrapper >> code at all. > > OK, I'll aim for this with the Muscle code... That sounds good.
Now can I tempt you into trying out github at the same time, so we can see your proposed code evolve in public? Could I add at this point that I don't think the wrapper should set any default arguments - leave that up to the command line tool itself. Otherwise you can get the situation where the Biopython defaults get out of sync with the tool's own default values (an issue with our online qblast wrapper, since the NCBI change their default settings over time). As an aside, I have used Muscle with Biopython thanks to its option for strict Clustal output, which can be parsed by Bio.AlignIO fine. For this I just generated my own command line on the fly, but I was only using a couple of the command line arguments. Peter From bugzilla-daemon at portal.open-bio.org Tue Mar 31 17:05:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 31 Mar 2009 13:05:50 -0400 Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. for QUAL files) In-Reply-To: Message-ID: <200903311705.n2VH5oKe025136@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2799 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-31 13:05 EST ------- Checked into CVS from http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq Checking in Bio/Seq.py; /home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py new revision: 1.67; previous revision: 1.66 done Checking in Bio/SeqRecord.py; /home/repository/biopython/biopython/Bio/SeqRecord.py,v <-- SeqRecord.py new revision: 1.32; previous revision: 1.31 done Checking in Bio/GenBank/__init__.py; /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v <-- __init__.py new revision: 1.106; previous revision: 1.105 done Checking in Bio/SeqIO/InsdcIO.py;
/home/repository/biopython/biopython/Bio/SeqIO/InsdcIO.py,v <-- InsdcIO.py new revision: 1.9; previous revision: 1.8 done Checking in Bio/SeqIO/QualityIO.py; /home/repository/biopython/biopython/Bio/SeqIO/QualityIO.py,v <-- QualityIO.py new revision: 1.8; previous revision: 1.7 done Checking in Tests/test_SeqIO.py; /home/repository/biopython/biopython/Tests/test_SeqIO.py,v <-- test_SeqIO.py new revision: 1.50; previous revision: 1.49 done Checking in Tests/output/test_GenBank; /home/repository/biopython/biopython/Tests/output/test_GenBank,v <-- test_GenBank new revision: 1.41; previous revision: 1.40 done Checking in Tests/output/test_SeqIO; /home/repository/biopython/biopython/Tests/output/test_SeqIO,v <-- test_SeqIO new revision: 1.36; previous revision: 1.35 done -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Mar 31 17:12:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 18:12:37 +0100 Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing qual files In-Reply-To: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com> References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com> <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com> <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com> <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com> <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com> <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com> <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com> <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com> Message-ID: <320fb6e00903311012y393761dev975a39464ab82043@mail.gmail.com> On Thu, Mar 26, 2009 at 12:30 AM, Peter wrote: > On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi: >>> Sebastian - could you have a 
quick play with this github code (using the new >>> UnknownSeq class), and the current CVS code (using None), and make sure >>> both support the slicing operations you were trying earlier? Thanks. >> >> ... >> >> From a practical point of view, both versions are the same, but the >> concept of UnknownSeq looks more solid than None, because if I don't know >> about biopython internals, I would never try to slice a None >> seq. With "None" ... >> But with the UnknownSeq object, len(s) returns an actual length, so it >> is more intuitive that it can be sliced. > > I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord > __getitem__ code nicer, and it means you can do len(SeqRecord) too, > which was problematic if the sequence was None. I've checked this into CVS after this discussion (and a little off thread). I wasn't comfortable with using None for a sequence, and doing this while also wanting to support len(...) and slicing on such SeqRecord objects was basically horrible. >> Then I tried the git code and it also worked. One thing I noticed is >> that I got "?" instead of "N" as the "sequence" of the UnknownSeq. > > I felt we shouldn't use an "N" unless we are confident the sequence > is nucleotides. In practice, this is probably a safe assumption for > FASTQ and QUAL files - unless anyone can think of a counter example? > Do you think it is safe to assume FASTQ and QUAL files are just for > nucleotides? > > I mean, you could translate a CDS from transcriptome sequencing, > and for the sake of argument give each amino acid a quality score > from the three nucleotide quality scores, and then save this as a protein > FASTQ file. But I've never heard of anyone actually doing this ;) So, should we assume QUAL files (and perhaps FASTQ files) are nucleotides when reading them in, and enforce this when writing them out? This would mean the QUAL files' UnknownSeq objects would use the letter "N" instead of "?".
Or is it more generic to leave it as it is, and not make or force any assumptions about the nature of the sequence? Peter From biopython at maubp.freeserve.co.uk Tue Mar 31 21:38:48 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 31 Mar 2009 22:38:48 +0100 Subject: [Biopython-dev] Plan for Biopython 1.50 (beta) Message-ID: <320fb6e00903311438g6fb0813bt18a035d485a6bb99@mail.gmail.com> Hi all, OK guys, after a brief chat off the mailing list, I'm hoping to do the Biopython 1.50 beta release roughly this weekend, somewhere between Friday 4 and Monday 6 April. Until then please consider CVS "frozen" for anything other than documentation changes or unit test additions, or at a push really tiny changes. Once I'm ready to actually do the release, I'll send out an email requesting no further CVS commits. Those of you that have committed changes, please check the NEWS file and DEPRECATED file are up to date - thanks. After the release of Biopython 1.50 beta, we'll reopen CVS for small changes and documentation. While the beta is being tested by our user base, I'd like us to push to finish any missing documentation - in particular for new modules Bio.Motif (Bartek) and Bio.Graphics.GenomeDiagram (me and/or Leighton), plus the new SeqRecord slicing and UnknownSeq class (me). Depending on the feedback from the beta, I'd hope we can do the final release of Biopython 1.50 well before the end of April, and then reopen CVS for new code. That would also be a good point to evaluate moving from CVS to git. In the meantime, while CVS is (semi) frozen you can all try using github for keeping your pending submissions under version control ;) Peter