From mjldehoon at yahoo.com  Sun Mar  1 07:17:28 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 1 Mar 2009 04:17:28 -0800 (PST)
Subject: [Biopython-dev] ScanProsite
Message-ID: <704108.77040.qm@web62402.mail.re1.yahoo.com>

ScanProsite is a web tool to scan protein sequences against the PROSITE database (see http://www.expasy.org/tools/scanprosite/). Biopython contains code in Bio.Prosite to interact with ScanProsite. However, this code needs to be updated, as it does not work with the current ScanProsite web pages: Neither accessing ScanProsite nor extracting the hits from the HTML page works.

This problem is relatively easy to solve, since ExPASy nowadays allows programmatic access to ScanProsite (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This returns the Prosite hits in XML format, which can be parsed easily in Python.

The only issue now is how this should be presented to the user. The current (broken) way to access Prosite looks like this:

>>> from Bio import ExPASy
>>> handle = ExPASy.scanprosite1(seq=mysequence)
to get a handle to the raw HTML output, and

>>> from Bio import Prosite
>>> hits = Prosite.scan_sequence_expasy(seq=mysequence)
which returns the hits as a Python list.

One possibility is to have a ScanProsite module under Bio.Prosite or Bio.ExPASy for interaction with ScanProsite. Something like this:
>>> from Bio.ExPASy import ScanProsite
>>> handle = ScanProsite.search(seq=mysequence)
>>> hits = ScanProsite.read(handle)

Another option is to have a scan function in the Bio.Prosite module that accesses the ScanProsite web tool and parses the results:
>>> from Bio import Prosite
>>> hits = Prosite.scan(seq=mysequence)
This is more straightforward, but on the other hand people may want to save the XML search results in an XML file, and for that purpose we'd need a function that does the parsing only.

Any opinions?

--Michiel


From bugzilla-daemon at portal.open-bio.org  Sun Mar  1 12:00:36 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Mar 2009 12:00:36 -0500
Subject: [Biopython-dev] [Bug 2495] parse element symbols for ATOM/HETATM
	records (Bio.PDB.PDBParser)
In-Reply-To: <bug-2495-42@http.bugzilla.open-bio.org/>
Message-ID: <200903011700.n21H0alo006588@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2495


------- Comment #1 from barry_finzel at yahoo.com  2009-03-01 12:00 EST -------
IO.save should also write these element types on an output PDB file


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sun Mar  1 12:06:54 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Mar 2009 12:06:54 -0500
Subject: [Biopython-dev] [Bug 2292] Bio.PDBIO writes TER records without any
	required fields
In-Reply-To: <bug-2292-42@http.bugzilla.open-bio.org/>
Message-ID: <200903011706.n21H6sJp007165@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2292


barry_finzel at yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |barry_finzel at yahoo.com


------- Comment #2 from barry_finzel at yahoo.com  2009-03-01 12:06 EST -------
IO.save is also writing TER cards at the end of chains, rather than at the end
of polypeptide chains.
TER cards should never follow HETATM  atom records.  



From bugzilla-daemon at portal.open-bio.org  Sun Mar  1 12:22:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Mar 2009 12:22:28 -0500
Subject: [Biopython-dev] [Bug 2774] New: Bio.PDBIO.save doesn't write the
	required END record
Message-ID: <bug-2774-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2774

           Summary: Bio.PDBIO.save doesn't write the required END record
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: barry_finzel at yahoo.com


According to the PDB format specification
(http://www.wwpdb.org/documentation/format32/sect1.html),
all PDB files must be terminated with a record containing just "END\n".

Easy to fix in PDBIO.save()



From biopython at maubp.freeserve.co.uk  Mon Mar  2 05:26:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Mar 2009 10:26:38 +0000
Subject: [Biopython-dev] ScanProsite
In-Reply-To: <704108.77040.qm@web62402.mail.re1.yahoo.com>
References: <704108.77040.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com>

On Sun, Mar 1, 2009 at 12:17 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> ScanProsite is a web tool to scan protein sequences against the PROSITE
> database (see http://www.expasy.org/tools/scanprosite/). Biopython contains
> code in Bio.Prosite to interact with ScanProsite. However, this code needs to
> be updated, as it does not work with the current ScanProsite web pages:
> Neither accessing ScanProsite nor extracting the hits from the HTML page works.
>
> This problem is relatively easy to solve, since ExPASy nowadays allows
> programmatic access to ScanProsite
> (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This
> returns the Prosite hits in XML format, which can be parsed easily in Python.
>
> The only issue now is how this should be presented to the user. ...
> ...
> This is more straightforward, but on the other hand people may want to save the
> XML search results in an XML file, and for that purpose we'd need a function that
> does the parsing only.
>
> Any opinions?

I would definitely have two functions, one returning a handle to the
XML, and one for parsing XML from a handle.  This would be more
consistent with Bio.Entrez and other parsers, and more flexible.  For
example, the user can opt to save the XML to disk, and they can also
use our parser on files or the remote site - plus of course they can
use any other XML parser they may prefer.
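
A minimal sketch of the parsing half of that split, to show why a
handle-based parser is flexible. The function name read() follows the
proposal above, but the XML layout (a <matchset> of <match> elements) and
the sample values are assumptions made up for illustration, not the
documented ScanProsite output:

```python
import io
import xml.etree.ElementTree as ET


def read(handle):
    """Parse hits from any file-like handle (saved file or remote response)."""
    hits = []
    # Assumed layout: each <match> element holds one hit, one child per field.
    for match in ET.parse(handle).getroot().iter("match"):
        hits.append({child.tag: child.text for child in match})
    return hits


# Made-up sample document standing in for a saved search result.
sample = io.StringIO(
    "<matchset>"
    "<match><start>10</start><stop>25</stop>"
    "<signature_ac>PS00001</signature_ac></match>"
    "</matchset>"
)
hits = read(sample)
print(hits)
```

Because read() only needs a handle, the same code would work on XML saved
to disk earlier and on a handle returned by a separate search function.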

I like your suggestion to have a REST XML based module under
Bio.ExPASy, which means we can deprecate the HTML based Bio.Prosite
module and in the process make the top level list of modules in
Biopython a bit shorter.  In the long term I think that will help
people find functionality.

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Mar  2 10:22:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Mar 2009 10:22:53 -0500
Subject: [Biopython-dev] [Bug 2776] New: Bio.pairwise2 returns non-optimal
	alignment in at least some cases
Message-ID: <bug-2776-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2776

           Summary: Bio.pairwise2 returns non-optimal alignment in at least
                    some cases
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


At least in some cases, Bio.pairwise2 returns an alignment that is not the one
with the highest score for the input parameters. This occurs in localXX and
globalXX.

So far I have only encountered the problem with large mismatch penalties
(which I use because I need mismatch-free alignments).

Simple example (the bug also occurred for longer sequences):
>>> from Bio import pairwise2
>>> sequence1 = 'GKG'
>>> sequence2 = 'GWG'
>>> A = pairwise2.align.globalms(sequence1, sequence2, 5, -100, -5, -5)[0]
>>> A[0]
'GKG--'
>>> A[1]
'--GWG'
>>> A[2]
-15.0

whereas
'GK-G'
'G-WG'

would get a score of 0
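
The two scores can be checked by hand with a small column-by-column scorer.
This is only a sketch, not pairwise2's own code; the helper name
alignment_score is hypothetical, and it assumes the first character of a gap
costs the open penalty and each further character the extend penalty (with
both set to -5 here, that choice does not affect the result):

```python
def alignment_score(a, b, match, mismatch, gap_open, gap_extend):
    """Score two gapped strings of equal length, column by column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in_a = gap_in_b = False  # was the previous column a gap in a / in b?
    for x, y in zip(a, b):
        if x == "-":
            score += gap_extend if gap_in_a else gap_open
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            score += gap_extend if gap_in_b else gap_open
            gap_in_a, gap_in_b = False, True
        else:
            score += match if x == y else mismatch
            gap_in_a = gap_in_b = False
    return score


reported = alignment_score("GKG--", "--GWG", 5, -100, -5, -5)
expected = alignment_score("GK-G", "G-WG", 5, -100, -5, -5)
print(reported, expected)  # -15 0
```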


System: Kubuntu 8.10 64Bit, Python 2.6.1, Biopython 1.49 (my pairwise2.py is
identical to the current CVS version of it)



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 07:41:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 07:41:33 -0500
Subject: [Biopython-dev] [Bug 2777] New: [Solution is one line change!]
	Entity sorting altered by detach_child() calls
Message-ID: <bug-2777-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

           Summary: [Solution is one line change!] Entity sorting altered by
                    detach_child() calls
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P1
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


detach_child(self, id) in Bio.PDB.Entity changes the order of self.child_list.

This bug is caused by line 71, where self.child_list is set to
self.child_dict.values() which are values of an unordered(!) dict:
self.child_list=self.child_dict.values()

Solution: Replace line 71 by:
self.child_list.remove(child)
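
A minimal sketch of why the one-line change keeps the order. ToyEntity is a
hypothetical stand-in, not the real Bio.PDB.Entity class, and the residue
ids are just sample tuples. Rebuilding the list from the values of a
Python 2 dict gives an arbitrary order, whereas list.remove() only drops the
one element and leaves the rest where they were:

```python
class ToyEntity:
    """Hypothetical stand-in for Bio.PDB.Entity, reduced to the two containers."""

    def __init__(self):
        self.child_list = []  # children in insertion order
        self.child_dict = {}  # id -> child, for lookup

    def add(self, child_id):
        self.child_list.append(child_id)
        self.child_dict[child_id] = child_id

    def detach_child(self, child_id):
        child = self.child_dict.pop(child_id)
        # Remove only this child; the order of the others is untouched.
        self.child_list.remove(child)


chain = ToyEntity()
for rid in [("H_PCA", 1, " "), (" ", 2, " "), (" ", 3, " "), (" ", 4, " ")]:
    chain.add(rid)
chain.detach_child((" ", 4, " "))
print(chain.child_list)
```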



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 07:48:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 07:48:19 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041248.n24CmJSZ008104@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-04 07:48 EST -------
Have you got a short example to demonstrate the original problem?

It would be useful to evaluate your change, and could be made into a unit test
too.



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 08:58:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 08:58:41 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041358.n24Dwfjk015027@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #2 from klaus.kopec at tuebingen.mpg.de  2009-03-04 08:58 EST -------
Created an attachment (id=1253)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1253&action=view)
example PDB file that can be used to see the bug

## Python Code to see the bug:
import os
from Bio.PDB.PDBParser import PDBParser
p=PDBParser(PERMISSIVE=1)
filename=os.path.expanduser("entity_detach_order_bug_example.pdb")
s=p.get_structure('Entity.py bug example: detach changes order', filename)

print 'order before detach:'
for r in s[0]['A'].child_list:
    print r.id

## this is independent of the chosen entry in the list
detach_me = s[0]['A'].child_list[-1]
s[0]['A'].detach_child(detach_me.id)

print 'order after detach:'
for r in s[0]['A'].child_list:
    print r.id



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 09:18:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:18:28 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041418.n24EISvd016743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #3 from klaus.kopec at tuebingen.mpg.de  2009-03-04 09:18 EST -------
the output of the code in my Comment #2 is:
order before detach:
('H_PCA', 1, ' ')
(' ', 2, ' ')
(' ', 3, ' ')
(' ', 4, ' ')
order after detach:
(' ', 2, ' ')
(' ', 3, ' ')
('H_PCA', 1, ' ')

I forgot to mention that the line "self.child_list.sort(self._sort)" needs to
be commented out as well for the fix to work (as HETATMs are otherwise sorted
to the end).

Hmm... it just occurred to me that this probably breaks the parser for some
other PDB files where residues are unsorted.

These changes do not break any existing unit tests for the PDB module, so maybe
it's still a step in the right direction.



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 09:37:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:37:34 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041437.n24EbYhj018545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-04 09:37 EST -------
Created an attachment (id=1254)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1254&action=view)
Patch for Bio/PDB/Entity.py based on Klaus Kopec's suggestion

I've attached a patch which makes the suggested change.  I'm hoping to get
Thomas (the original author) to comment but otherwise I see no reason not to
commit this fix soon.

The old code did this:

    def detach_child(self, id):
        "Remove a child."
        child=self.child_dict[id] 
        child.detach_parent()
        del self.child_dict[id]
        self.child_list=self.child_dict.values()
        self.child_list.sort(self._sort)

It used a sort which should have preserved the order - but that only works if
the child_list is always kept sorted.  Looking at the add method, this isn't
true:

    def add(self, entity):
        "Add a child to the Entity."
        entity_id=entity.get_id()
        if self.has_id(entity_id):
            raise PDBConstructionException( \
                "%s defined twice" % str(entity_id))
        entity.set_parent(self)
        self.child_list.append(entity)
        #self.child_list.sort(self._sort)
        self.child_dict[entity_id]=entity

Interestingly the sort was commented out in the original version first
committed to Biopython's CVS, so this change predates the integration into
Biopython.

I haven't checked to see if there are any other ways the child_list could
become unsorted - that doesn't really matter.



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 11:17:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 11:17:31 -0500
Subject: [Biopython-dev] [Bug 2774] Bio.PDBIO.save doesn't write the
	required END record
In-Reply-To: <bug-2774-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041617.n24GHVd1029752@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2774


thamelry at binf.ku.dk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from thamelry at binf.ku.dk  2009-03-04 11:17 EST -------

save method now has option 'write_end':

io.save(fp, write_end=1)

If write_end is 1, END is written. This is not done by default because 'save'
is sometimes called multiple times on the same output, for example when
concatenating files, so always writing END is not a good approach.
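
The concatenation case can be sketched with a toy writer. The save()
function below is a hypothetical stand-in that only mimics the write_end
flag; it is not PDBIO.save, and the record strings are placeholders:

```python
import io


def save(records, fp, write_end=False):
    """Toy writer: emit each record, and END only when asked to."""
    for record in records:
        fp.write(record + "\n")
    if write_end:
        fp.write("END\n")


out = io.StringIO()
save(["REMARK first part"], out)                   # more output follows
save(["REMARK second part"], out, write_end=True)  # last call closes the file
text = out.getvalue()
print(text)
```

If END were written on every call, the combined output would contain an END
record in the middle of the file.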



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 14:10:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:10:37 -0500
Subject: [Biopython-dev] [Bug 2778] New: Efficiency improvement in function
	Bio.SeqUtils.GC()
Message-ID: <bug-2778-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

           Summary: Efficiency improvement in function Bio.SeqUtils.GC()
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wscott at chem.ubc.ca


Bio.SeqUtils.GC recalculates the gc variable in a loop using a dictionary
whereas it could simply be calculated after the loop.
The following code is suggested to replace the function:

def ScoGC(seq):
   """ calculates G+C content """
   gc=sum(map(seq.count,['G','C','g','c','S','s']))
   return gc*100.0/len(seq)
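
A slightly more defensive sketch of the same idea, for comparison. The name
gc_percent is hypothetical, and returning 0.0 for an empty sequence is an
assumption (the version above would raise ZeroDivisionError in that case):

```python
def gc_percent(seq):
    """Return the G+C percentage, counting the ambiguity code S as G or C."""
    if len(seq) == 0:
        return 0.0  # assumed convention for empty input
    gc = sum(seq.count(letter) for letter in "GCSgcs")
    return gc * 100.0 / len(seq)


print(gc_percent("ATGC"))  # 50.0
```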



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 14:12:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:12:27 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041912.n24JCR2U014353@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


wscott at chem.ubc.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wscott at chem.ubc.ca


------- Comment #1 from wscott at chem.ubc.ca  2009-03-04 14:12 EST -------
of course, rename ScoGC to GC...



From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 17:03:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 17:03:59 -0500
Subject: [Biopython-dev] [Bug 2779] New: Seq.count() docstring should note
	unexpected behaviour
Message-ID: <bug-2779-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779

           Summary: Seq.count() docstring should note unexpected behaviour
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: baoilleach at gmail.com


The Seq.count() method has the following docs:
"Count method, like that of a python string."

This is a cop-out as it does not tell the user anything. In particular, it does
not lead the user to expect that Seq("GGG").count("GG")==1. This might make
sense for Python strings, but it's incorrect for sequences.



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 04:19:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:19:40 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050919.n259Je8d016299@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


ruzzo at cs.washington.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ruzzo at cs.washington.edu


------- Comment #8 from ruzzo at cs.washington.edu  2009-03-05 04:19 EST -------
I'm new to biopython, so I may be doing something else wrong, but in attempting
to efetch a pubmed record tonight I see similar errors which seem to be fixed
by downloading & installing several (new) DTD's:

nlmmedline_090101.dtd
nlmmedlinecitation_090101.dtd
nlmsharedcatcit_090101.dtd
nlmcommon_090101.dtd
and possibly 
pubmed_090101.dtd



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 04:23:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:23:31 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050923.n259NV4S016627@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-03-05 04:23 EST -------
I think that's a good point about expected behaviour for count() in a
biological sequence.  Presumably, we all expect that Seq('GGG').count('GG')
should find all overlapping matches, and return the value 2, in order to make
intuitive 'biological' sense.  There are, after all, two 'GG's in that
sequence.  This doesn't correspond to string count()ing behaviour, or to
standard re module behaviour.

The obvious way round it, that I've used before, is to compile the search
string as a regular expression, and iterate regular expression matches from one
symbol after the start of the preceding match (if any):

>>> import re
>>> startpos = 0
>>> seq = 'GGGG'
>>> motif = 'GG'
>>> motif_re = re.compile(motif)
>>> matches = []
>>> while True:
...     m = motif_re.search(seq, startpos)
...     if m is None:
...             break
...     startpos = m.start() + 1
...     matches.append(m)
... 
>>> matches
[<_sre.SRE_Match object at 0x68f38>, <_sre.SRE_Match object at 0x96ac60>,
<_sre.SRE_Match object at 0x96a950>]
>>> [(m.start(), m.group()) for m in matches]
[(0, 'GG'), (1, 'GG'), (2, 'GG')]

This could probably be done more efficiently.  Is something like this already
implemented in Bio.Motif?
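
For a plain-string motif (no regular expression syntax), the same
overlapping count can be sketched without the re module by restarting
str.find one position after each hit. The function name count_overlapping is
hypothetical and this is not the attached patch:

```python
def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    if not motif:
        raise ValueError("motif must be non-empty")
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        # Resume one symbol after the start of this match, not after its end.
        pos = seq.find(motif, pos + 1)
    return count


print("GGG".count("GG"))                # 1 (non-overlapping, str behaviour)
print(count_overlapping("GGG", "GG"))   # 2
print(count_overlapping("GGGG", "GG"))  # 3
```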



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 04:24:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:24:43 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050924.n259OhYw016750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #2 from lpritc at scri.sari.ac.uk  2009-03-05 04:24 EST -------
D'oh!  There isn't a Bio.Motif.  My bad.



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 04:43:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:43:09 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050943.n259h9XG018545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #3 from baoilleach at gmail.com  2009-03-05 04:43 EST -------
Thanks for the workaround but could you replace the current count by that code?

Can you imagine any existing code that would break because of correction of
buggy behaviour?



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 05:16:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:16:52 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051016.n25AGqSW021680@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #4 from lpritc at scri.sari.ac.uk  2009-03-05 05:16 EST -------
Created an attachment (id=1255)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1255&action=view)
Patch to Seq.py that modifies count behaviour for Seq and MutableSeq objects to
return correct counts for substrings of length > 1

(In reply to comment #3)
> Thanks for the workaround but could you replace the current count by that code?

I don't have access to CVS ;)

It would be nice to get consensus that the behaviour that this code would
produce is the desired behaviour for everyone, that we've got an acceptable way
of implementing it, and that it doesn't break anything downstream.  There's
bound to be, at best, a lag time.

I've attached a proposed patch based on the above code, though it's not
necessarily the best way to solve this problem.

> Can you imagine any existing code that would break because of correction of
> buggy behaviour?

That should come out in the testing.

And it turns out that there is a Bio.Motif, but it's in CVS. D'oh! again...



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 05:22:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:22:40 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051022.n25AMeIt022121@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 05:22 EST -------
Prior to Biopython 1.45, the count method only worked with single letter search
strings.  I changed this just over a year ago for Biopython 1.45 as Bug 2386,
but unfortunately at the time none of us considered this
overlapping/non-overlapping behaviour.  With hindsight we should have had this
debate then.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py.diff?r1=1.19&r2=1.20&cvsroot=biopython

We should either:

(a) stick with the python string compatible behaviour (which has been a general
principle for the Seq class), but document this issue more clearly as a
non-overlapping search does run counter to biological usage.

or,

(b) change the behaviour as Leighton suggests to do an overlapping search.
This could break any code relying on the old python string-like behaviour.

I agree we need to have a discussion of this over on the main mailing list, as
making the change could break people's code.



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 05:42:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:42:27 -0500
Subject: [Biopython-dev] [Bug 2780] New: PDB file HETATMs cannot be
	alternative location of a residue that is an ATOM
Message-ID: <bug-2780-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2780

           Summary: PDB file HETATMs cannot be alternative location of a
                    residue that is an ATOM
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


In PDB files where HETATMs and ATOMs are altlocs of each other (e.g. 1RR2,
residue 184), they are treated as two separate residues.

An obvious solution is to add an "else" case to the "if" in StructureBuilder.py
line 115 (method init_residue(...)) that introduces some kind of mixed (HETATM
as well as ATOM) DisorderedResidue.

The main problem with that: the hetero field of the residue ids will differ
between the residues, so the whole access-by-id mechanism will most likely not
work as straightforwardly with these MixedDisorderedResidues as it does so
far.

Sadly, I could not come up with a good solution for this. Maybe some
__getattr__ magic that alters the way Chains access their residues might work
by allowing access to residues by only using the second and third component of
the id 3-tuple?!
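
A rough sketch of that lookup idea on plain tuples, without touching the
Chain class. The helper name find_by_position and the hetero flag "H_XXX"
are hypothetical placeholders, not taken from 1RR2:

```python
def find_by_position(residue_ids, resseq, icode=" "):
    """Return every full id whose (resseq, icode) part matches."""
    # rid is (hetero flag, sequence number, insertion code); ignore the flag.
    return [rid for rid in residue_ids if rid[1:] == (resseq, icode)]


ids = [(" ", 183, " "), (" ", 184, " "), ("H_XXX", 184, " "), (" ", 185, " ")]
both = find_by_position(ids, 184)
print(both)
```

Both the ATOM and the HETATM version of residue 184 are found, which is the
pair a mixed disordered residue would have to hold.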



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 05:44:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:44:12 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051044.n25AiCH9023924@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 05:44 EST -------
(In reply to comment #8)
> I'm new to biopython, so I may be doing something else wrong, but in
> attempting to efetch a pubmed record tonight I see similar errors which
> seem to be fixed by downloading & installing several (new) DTD's:
> 
> nlmmedline_090101.dtd
> nlmmedlinecitation_090101.dtd
> nlmsharedcatcit_090101.dtd
> nlmcommon_090101.dtd
> and possibly 
> pubmed_090101.dtd
> 

Those have been added to CVS, and will be installed with Biopython 1.50 -
perhaps we should hurry up our release plans.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Entrez/DTDs/?cvsroot=biopython



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 05:46:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:46:09 -0500
Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative
	location of a residue that is an ATOM
In-Reply-To: <bug-2780-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051046.n25Ak9DH024105@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2780


------- Comment #1 from klaus.kopec at tuebingen.mpg.de  2009-03-05 05:46 EST -------
Created an attachment (id=1256)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1256&action=view)
PDB file slice with 2 residues, that can be used to see the bug.

Slice of PDB file 1RR2 (the example mentioned in my bug submission) showing two
altloc residues, one a HETATM and the other an ATOM. Biopython treats them as
two separate residues.



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 05:56:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:56:39 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051056.n25AudjU024927@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 05:56 EST -------
I've checked that in, but with the existing code to catch a zero length
sequence and return 0 instead of raising a ZeroDivisionError.

def GC(seq):
    """Calculates G+C content, ..."""
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    if gc == 0: return 0
    return gc*100.0/len(seq)


The old code had been modified several times - it originally calculated the GC%
as the CG count divided by the ATCG count, thus it had to count all the bases. 
You are right, this is much cleaner.

Thanks.



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 06:18:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 06:18:33 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051118.n25BIXdp026743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #6 from baoilleach at gmail.com  2009-03-05 06:18 EST -------
Sorry - could you clarify which mailing list you mean by the "main mailing
list", the dev list or the discuss list?



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 07:27:49 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 07:27:49 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051227.n25CRnmA001571@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 07:27 EST -------
(In reply to comment #6)
> Sorry - could you clarify which mailing list you mean by the "main mailing
> list", the dev list or the discuss list?

I was thinking the main discussion list, and we should focus on the desired
behaviour rather than how we might implement it.  See:

http://lists.open-bio.org/pipermail/biopython/2009-March/004960.html

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 07:31:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 07:31:50 -0500
Subject: [Biopython-dev] [Bug 2781] New: Bio.PDB Structure instances cannot
	be deepcopied
Message-ID: <bug-2781-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2781

           Summary: Bio.PDB Structure instances cannot be deepcopied
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


For some reason, copy.deepcopy() of a Structure instance results in:

Exception RuntimeError: 'maximum recursion depth exceeded while calling a
Python object' in <type 'exceptions.AttributeError'> ignored

for most PDB files I tried.

Maybe implementing some __deepcopy__ methods might help, but I am unsure, as I
did not perform profound research concerning this bug.

My system: Kubuntu 8.10 64-Bit, Python 2.6.1



From biopython at maubp.freeserve.co.uk  Thu Mar  5 07:40:16 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Mar 2009 12:40:16 +0000
Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utilities
	requirements updated
In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov>
References: <AcmN/nulEie6nWfHT9+4rg4/ff6DGwADuzjwAozlJvA=>
	<7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov>
Message-ID: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>

This email was sent out a few weeks ago, but it took a while before
the NCBI webpage was actually updated (maybe a caching issue) so I
didn't rush to relax our rules immediately.

Under the new rules we must make no more than three requests every
second.  We could track the times of the last two requests in order to
enforce this as worded, but I think it would be simpler just to switch
from using a minimum 3 second pause between Bio.Entrez requests to
just a minimum 0.33334 second pause.  This is a much simpler code
change and will comply with the new relaxed rules.
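
A minimal sketch of the "minimum pause" approach described above. This is not
the actual Bio.Entrez code; the class name and its arguments are made up for
illustration. Spacing requests at least 0.33334 seconds apart means no
one-second window can contain more than three of them:

```python
import time


class MinimumDelay(object):
    """Enforce a minimum pause between calls (hypothetical helper)."""

    def __init__(self, delay=0.33334, clock=time.time, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the minimum delay."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.delay - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = MinimumDelay()
limiter.wait()  # first call returns immediately
limiter.wait()  # second call pauses for roughly a third of a second
```

Calling wait() before each request is all that is needed; tracking the last
two request times, as the rule is literally worded, would allow short bursts
but needs more bookkeeping.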

Unless anyone has a counter suggestion, I will update Bio.Entrez and
the tutorial shortly.

Peter
---------- Forwarded message ----------
From:  <utilities-announce at ncbi.nlm.nih.gov>
Date: Thu, Feb 26, 2009 at 6:55 PM
Subject: [Utilities-announce] NCBI E-Utilities requirements updated
To: utilities-announce at ncbi.nlm.nih.gov


NCBI E-Utilities users,

E-Utilities system use requirements have been modified from no more
than 1 request every 3 seconds to no more than 3 requests every
second.

The online documentation has been updated to reflect this change:

http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html


Thank you.

NCBI/NLM/NIH

_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce


From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 07:58:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 07:58:40 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051258.n25Cwe9p004288@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #8 from barwil at gmail.com  2009-03-05 07:58 EST -------
(In reply to comment #4)

> This could probably be done more efficiently.  Is something like this already
> implemented in Bio.Motif
> 

In Bio.Motif you can do:

import Bio.Motif
from Bio.Seq import Seq

m = Bio.Motif.Motif()
m.add_instance(Seq("GG", m.alphabet))
for i in m.search_instances(Seq("GGGG", m.alphabet)):
    print i

This should give you overlapping hits.

Bio.Motif is in CVS, but the same implementation is in Bio.AlignAce.Motif
(now obsolete).



From biopython at maubp.freeserve.co.uk  Thu Mar  5 07:58:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Mar 2009 12:58:40 +0000
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
	<63700.34226.qm@web62405.mail.re1.yahoo.com>
	<320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
	<320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
	<320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
Message-ID: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>

On Thu, Feb 19, 2009 at 10:25 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Since this thread last year, there have been no objections.  Following
> a recent question on the main mailing list about how to determine the
> version of Biopython this seems worth doing before the next release.
> Again, any objections or comments on the implementation details?
> Otherwise I'll make this change shortly.
>

Changes made in CVS, and updated the release instructions:
http://biopython.org/wiki/Building_a_release

In between releases, should we leave the __version__ as is, or
explicitly update it to be something like "1.49+" just after releasing
1.49?  This only affects people installing Biopython from CVS, so they
should be technically inclined...

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 09:47:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 09:47:30 -0500
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051447.n25ElU37014276@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 09:47 EST -------
We seem to have reached agreement on the mailing list, so checking this patch
in, and marking this issue as fixed.

Note we may want to review the choice of name for the new
per-letter-annotations attribute (as long as this happens before the Biopython
1.50 release), currently this is letter_annotations as per a brief discussion
on the mailing list.



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 09:47:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 09:47:43 -0500
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic
	alignment, e.g. align[1:2, 5:-5]
In-Reply-To: <bug-2551-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051447.n25ElhAb014302@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2551


Bug 2551 depends on bug 2507, which changed state.

Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 09:47:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 09:47:44 -0500
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051447.n25EliM6014314@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


Bug 2767 depends on bug 2507, which changed state.

Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:31:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 10:31:17 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051531.n25FVHOq018242@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


------- Comment #3 from bsouthey at gmail.com  2009-03-05 10:31 EST -------
(In reply to comment #2)
> I've checked that in, but with the existing code to catch a zero length
> sequence and return 0 instead of raising a ZeroDivisionError.
> 
> def GC(seq):
>     """Calculates G+C content, ..."""
>     gc=sum(map(seq.count,['G','C','g','c','S','s']))
>     if gc == 0: return 0
>     return gc*100.0/len(seq)
> 

I think that it is clearer to check that the sequence length is not zero rather
than assuming that if the sum is zero then the sequence length is also zero. 

def GC(seq):
    """Calculates G+C content, ..."""
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    if len(seq) > 0:
        return gc*100.0/len(seq)
    else:
        return 0



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:51:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 10:51:20 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051551.n25FpKGf020282@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


------- Comment #4 from lpritc at scri.sari.ac.uk  2009-03-05 10:51 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > I've checked that in, but with the existing code to catch a zero length
> > sequence and return 0 instead of raising a ZeroDivisionError.
> > 
> > def GC(seq):
> >     """Calculates G+C content, ..."""
> >     gc=sum(map(seq.count,['G','C','g','c','S','s']))
> >     if gc == 0: return 0
> >     return gc*100.0/len(seq)
> > 
> 
> I think that it is clearer to check that the sequence length is not zero rather
> than assuming that if the sum is zero then the sequence length is also zero. 
> 
> def GC(seq):
>     """Calculates G+C content, ..."""
>     gc=sum(map(seq.count,['G','C','g','c','S','s']))
>     if len(seq) > 0:
>         return gc*100.0/len(seq)
>     else:
>         return 0

It would probably be clearest, quickest and most efficient to comment that
particular line of the code to point out that it does elegant double-duty as a
check for zero sequence length ;)



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:56:38 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 10:56:38 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051556.n25Fuc13020807@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 10:56 EST -------
(In reply to comment #3)
> I think that it is clearer to check that the sequence length is
> not zero rather than assuming that if the sum is zero then the
> sequence length is also zero. 

I agree, but had chosen to keep the old code.

> def GC(seq):
>     """Calculates G+C content, ..."""
>     gc=sum(map(seq.count,['G','C','g','c','S','s']))
>     if len(seq) > 0:
>         return gc*100.0/len(seq)
>     else:
>         return 0
> 

Your length test isn't very elegant; I think this is nicer and more Pythonic:

    if seq :
        gc = sum(map(seq.count,['G','C','g','c','S','s']))
        return gc*100.0/len(seq)
    else :
        return 0

However, given most of the time the sequence will not be empty, this should be
faster:

    try :
        gc = sum(map(seq.count,['G','C','g','c','S','s']))
        return gc*100.0/len(seq)
    except ZeroDivisionError :
        return 0

CVS updated.
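
For reference, the try/except variant written out as a complete function with
a few spot checks. This is a sketch in Python 3 syntax under a different name
(gc_percent), not necessarily the exact code in CVS:

```python
def gc_percent(seq):
    """Return the G+C percentage of seq (0 for an empty sequence)."""
    try:
        # S is the IUPAC ambiguity code for "G or C", so it is counted too.
        gc = sum(map(seq.count, ['G', 'C', 'g', 'c', 'S', 's']))
        return gc * 100.0 / len(seq)
    except ZeroDivisionError:
        # Only reached for an empty sequence, where len(seq) == 0.
        return 0


gc_percent("ACGT")  # 50.0
gc_percent("ATAT")  # 0.0 (no G or C, but the sequence is not empty)
gc_percent("")      # 0
```

The exception is only raised in the rare empty case, so the common path pays
for neither a length test nor a truth test.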



From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 11:04:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 11:04:07 -0500
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic
	alignment, e.g. align[1:2, 5:-5]
In-Reply-To: <bug-2551-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051604.n25G471v021470@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2551


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 11:04 EST -------
Created an attachment (id=1257)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1257&action=view)
Patch for Bio/Align/Generic.py to support array like access

This requires the patch to the SeqRecord __getitem__ method just committed to
CVS for Bug 2507.  This includes an extended doctest which tries to illustrate
the typical usage I expect.



From bsouthey at gmail.com  Thu Mar  5 11:59:08 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 5 Mar 2009 10:59:08 -0600
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
	<63700.34226.qm@web62405.mail.re1.yahoo.com>
	<320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
	<320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
	<320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
	<320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
Message-ID: <bbcd77d00903050859o52da154en6df06b51fb9ef1d@mail.gmail.com>

On Thu, Mar 5, 2009 at 6:58 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Feb 19, 2009 at 10:25 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>
>> Since this thread last year, there have been no objections.  Following
>> a recent question on the main mailing list about how to determine the
>> version of Biopython this seems worth doing before the next release.
>> Again, any objections or comments on the implementation details?
>> Otherwise I'll make this change shortly.
>>
>
> Changes made in CVS, and updated the release instructions:
> http://biopython.org/wiki/Building_a_release
>
> In between releases, should we leave the __version__ as is, or
> explicitly update it to be something like "1.49+" just after releasing
> 1.49?  This only affects people installing Biopython from CVS, so they
> should be technically inclined...
>
> Peter
>


I agree that it would be helpful to distinguish between an official
release and a build from the CVS. Furthermore, it would then be
important to know when the build from CVS was done at least relative
to the official releases.

So I think you are leaning towards a numbering scheme like:
1.49 is an official release
1.49+ (or similar) is CVS after the 1.49 official release but before
the next official release 1.50.
1.50 will be an official release
1.50+ (or similar) is the CVS after the 1.50 official release but
before the next official release whatever number it will be.
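
To make the scheme concrete, here is a small sketch of how a script could tell
the two cases apart. It assumes the trailing "+" convention discussed above is
adopted; the function name is hypothetical:

```python
def describe_version(version):
    """Classify a version string under the proposed "+" convention."""
    if version.endswith("+"):
        # e.g. "1.49+" means CVS code newer than the 1.49 release
        return "CVS build made after release %s" % version[:-1]
    return "official release %s" % version


describe_version("1.49")   # 'official release 1.49'
describe_version("1.49+")  # 'CVS build made after release 1.49'
```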

If so the release instructions should also include an instruction to
change the CVS numbering in the version in __init__.py files after
release has been made.

Also, after looking at the release instructions shouldn't BioSQL and
Doc also have version-related information?
Ideally the Biopython BioSQL code should have some connection to the
main version of BioSQL - I don't use it so it is not an issue for me
(yet).

Bruce


From biopython at maubp.freeserve.co.uk  Thu Mar  5 12:50:04 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Mar 2009 17:50:04 +0000
Subject: [Biopython-dev] determining the version
In-Reply-To: <bbcd77d00903050859o52da154en6df06b51fb9ef1d@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
	<63700.34226.qm@web62405.mail.re1.yahoo.com>
	<320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
	<320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
	<320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
	<320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
	<bbcd77d00903050859o52da154en6df06b51fb9ef1d@mail.gmail.com>
Message-ID: <320fb6e00903050950k4d0cce9i1fe1442e15cf9cf7@mail.gmail.com>

On Thu, Mar 5, 2009 at 4:59 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>
> I agree that it would be helpful to distinguish between an official
> release and a build from the CVS. Furthermore, it would then be
> important to know when the build from CVS was done at least relative
> to the official releases.
>
> So I think you are leaning towards a numbering scheme like:
> 1.49 is an official release
> 1.49+ (or similar) is CVS after the 1.49 official release but before
> the next official release 1.50.
> 1.50 will be an official release
> 1.50+ (or similar) is the CVS after the 1.50 official release but
> before the next official release whatever number it will be.

That is one of the two suggestions I was putting forward.  The other
was just leaving the version number as that of the most recent release
- people should know if they are running CVS as this has to be done
deliberately.

One tiny downside is the "+" gets turned into an underscore for
filenames (e.g. egg files, and I assume a windows installer), but we
won't be releasing those so that doesn't matter.

> If so the release instructions should also include an instruction to
> change the CVS numbering in the version in __init__.py files after
> release has been made.

Yes - assuming people are happy with this suggested scheme.

Note that if we switch to SVN, something automated with the SVN
revision number might be possible.

> Also, after looking at the release instructions shouldn't BioSQL and
> Doc also have version-related information?
> Ideally the Biopython BioSQL code should have some connection to the
> main version of BioSQL - I don't use it so it is not an issue for me
> (yet).

Because Bio/* and BioSQL/* are always shipped and packaged together,
to my mind they together make up Biopython and share the same version
number.  As to why BioSQL is top level rather than being Bio.BioSQL,
it was long ago and I have no idea.

For the documentation, recent releases of the tutorial have included
the target version of Biopython together with the date.  Again, this
should be in the release instructions.

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 12:54:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 12:54:01 -0500
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051754.n25Hs1cW030546@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1251 is|0                           |1
           obsolete|                            |


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 12:54 EST -------
Created an attachment (id=1258)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1258&action=view)
Read/write support for FASTQ and QUAL files, using the letter_annotations dict  

Small update to earlier version, with minor comment changes.

Also includes explicit rounding of scores to the nearest integer when writing
out PHRED scores in Solexa format (and vice versa).  This conversion still
needs verifying against real world examples.

I've been testing with real world PHRED based files only so far.
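
For anyone wanting to check the conversion by hand, a minimal sketch follows.
It assumes the usual definitions of the two scales (PHRED from the error
probability p, Solexa from the odds p/(1-p)) and is not necessarily the code
in the attached patch; the function names are made up:

```python
from math import log10


def solexa_from_phred(phred):
    """Convert a PHRED quality (> 0) to the nearest integer Solexa quality."""
    # PHRED 0 means p = 1, for which the Solexa score is undefined.
    return int(round(10 * log10(10 ** (phred / 10.0) - 1)))


def phred_from_solexa(solexa):
    """Convert a Solexa quality to the nearest integer PHRED quality."""
    return int(round(10 * log10(10 ** (solexa / 10.0) + 1)))


solexa_from_phred(40)  # 40, the scales agree for high qualities
solexa_from_phred(1)   # -6, they diverge for low qualities
phred_from_solexa(0)   # 3
```

Because of the rounding, a round trip through both functions is not
guaranteed to return the starting value for low scores.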



From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 11:08:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 11:08:50 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061608.n26G8oL9003353@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 11:08 EST -------
I have updated the docstrings in CVS to stress that, like the Python string
method, a non-overlapping count is used, and am marking this bug as fixed.

Judging from the mailing list discussion, an overlapping count would be a
welcome enhancement, perhaps as a separate method, e.g. overlapping_count.
Leighton's patch or Sebastian's code in Bio/SeqUtils/MeltingTemp.py could be
used for the implementation.  We can do this on a new enhancement bug, once a
consensus is reached on the mailing list.
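
To illustrate the difference between the two kinds of count, here is a
minimal sketch of an overlapping count as a free function. It is neither
Leighton's patch nor the MeltingTemp code, just one straightforward way to do
it with str.find:

```python
def overlapping_count(seq, sub):
    """Count occurrences of sub in seq, allowing overlaps."""
    seq, sub = str(seq), str(sub)
    if not sub:
        raise ValueError("empty search string")
    count = 0
    start = seq.find(sub)
    while start != -1:
        count += 1
        # Resume one position later, not len(sub) later, to allow overlaps.
        start = seq.find(sub, start + 1)
    return count


"GGGG".count("GG")               # 2, non-overlapping (Python string method)
overlapping_count("GGGG", "GG")  # 3, overlapping
```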



From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 12:34:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:34:58 -0500
Subject: [Biopython-dev] [Bug 2783] New: Using alternative start codons in
	Bio.Seq translate method/function
Message-ID: <bug-2783-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783

           Summary: Using alternative start codons in Bio.Seq translate
                    method/function
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This bug covers an issue originally raised on Bug 2381.  This bug is
specifically about how to translate a CDS using a non-standard start codon (a
codon which doesn't normally encode methionine).

In computing, we often blindly translate without worrying about start codons.
For example, you might translate a whole genome (in all six frames) as part
of looking for open reading frames.  Translating a partial CDS where the start
is missing is another example.  The current Bio.Seq translation functionality
supports these usages.

In real biology however, translation from RNA to amino acids always starts at an
initiation/start codon (typically AUG) which becomes the methionine at the
start of the protein.  In eukaryotes, usually the only start codon is AUG, and
it normally encodes methionine, so this doesn't seem special.  However, in many
organisms there are lots of genes with alternative start/initiation codons
which do NOT normally encode methionine.  When they are used as a
start/initiation codon, they DO get translated as methionine!

For example, there are 418 annotated genes in E. coli K12 with non-standard
start codons - which you might want to translate into proteins (which *should*
start with a methionine).

For example, using the following NCBI FASTA file of CDS sequences,
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655

Here is the CDS for gene yaaX:

>ref|NC_000913.2|:5234-5530
GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA
GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT
AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT
TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT
AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA

This starts with GTG, which is a valid bacterial start codon.  I'd like to be
able to translate this and get the actual biologically relevant protein as given
in the GenBank file NC_000913.gbk (with or without the stop symbol at the end),
which starts with "M" not "V":

     CDS             5234..5530
                     /gene="yaaX"
                     /locus_tag="b0005"
                     /codon_start=1
                     /transl_table=11
                     /product="predicted protein"
                     /protein_id="NP_414546.1"
                     /db_xref="ASAP:ABE-0000015"
                     /db_xref="UniProtKB/Swiss-Prot:P75616"
                     /db_xref="GI:16127999"
                     /db_xref="ECOCYC:G6081"
                     /db_xref="EcoGene:EG14384"
                     /db_xref="GeneID:944747"
                     /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY
                     YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR"

Without any non-standard start codon support, my translations start with a V
(rather than the desired M):

>>> from Bio.Seq import Seq
>>> yaaX = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA"
...            "GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT"
...            "AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT"
...            "TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT"
...            "AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
>>> print yaaX.translate(table=11)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print yaaX.translate(table=11, to_stop=True)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

These start with "V", while in this situation I want an "M" because I know this
is a full CDS and the first codon is a start codon.

I therefore want to add an optional argument to the Seq object's translate
method (and the Bio.Seq.translate function) so that I can obtain the desired
results (both with and without the terminator stop symbol).  I want an option
to tell Biopython that this sequence commences with a start/initiation codon:

>>> print yaaX.translate(table=11, with_start_codon=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print yaaX.translate(table=11, to_stop=True, with_start_codon=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

I have in the above example called this new argument "with_start_codon", but I
am open to naming suggestions.  If False (default), then nothing changes.  If
the new argument is True, this indicates that the first codon should be a valid
start/initiation codon (in the declared translation table), and that it should
be translated as a methionine.
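
To make the intended behaviour concrete, here is a rough pure-Python sketch.
It is not the patch and does not use Bio.Seq.  The function name translate_cds
and the argument name require_start are placeholders for illustration, and the
start codon set is the one NCBI lists for translation table 11:

```python
from itertools import product

# Standard amino acid assignments in NCBI order (bases T, C, A, G).
# Table 11 uses the same assignments; only its start codon set is larger.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDDDEEEEGGGG")
CODON_TABLE = dict(("".join(codon), aa)
                   for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS))
START_CODONS_TABLE_11 = set(["TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"])


def translate_cds(dna, require_start=False, to_stop=False):
    """Translate unambiguous DNA; hypothetical helper, not the Bio.Seq API.

    If require_start is True the first codon must be a valid table 11
    start codon, and it is translated as methionine.
    """
    dna = dna.upper()
    protein = []
    # Ignore any trailing partial codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if i == 0 and require_start:
            if codon not in START_CODONS_TABLE_11:
                raise ValueError("First codon %r is not a start codon" % codon)
            aa = "M"
        else:
            aa = CODON_TABLE[codon]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# First six codons of yaaX from the example above.
print(translate_cds("GTGAAAAAGATGCAATCT"))
print(translate_cds("GTGAAAAAGATGCAATCT", require_start=True))
```

The first call keeps the leading V, the second replaces it with M.  A sequence
whose first codon is not in the start codon set raises ValueError instead of
being silently translated.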

I will upload a patch implementing this in a moment...

This proposal is NOT about an option to have the translate function/method
search the sequence for the first valid start codon (either in frame or not).

This proposal is NOT about an option to check the sequence is a valid CDS (i.e.
starts with a start codon, ends with an in frame stop codon, and has no
internal premature stop codons), and then translating it.  While this makes
sense (and BioPerl does this), this would prevent certain uses.  e.g. a partial
CDS sequence where the 3' end is missing.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 12:36:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:36:24 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061736.n26HaOWH012440@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 12:36 EST -------
Created an attachment (id=1259)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view)
Patch for Bio/Seq.py to support non-standard start codons in translation

Patch implementing my proposed change, based on my earlier patch attachment 1040
on Bug 2381.



From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 12:38:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:38:39 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061738.n26Hcd04012626@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #55 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 12:38 EST -------
I'm closing this bug as basic translate and transcribe methods were included
with Biopython 1.49.

I have filed Bug 2783 for "Using alternative start codons in Bio.Seq translate
method/function".



From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 12:43:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:43:25 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061743.n26HhPRX013186@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 12:43 EST -------
(In reply to comment #1)
> Created an attachment (id=1259)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view) [details]
> Patch for Bio/Seq.py to support non-standard start codons in translation
> 
> Patch implementing my proposed change, based my earlier patch
> attachment 1040 [details] on Bug 2381.

Actually, it was based on the patch in attachment 1032 (not 1040) on Bug 2381.

Other names proposed for this new argument included:

init - rejected as potentially confusing

force_methionine - possible, but implies any codon would be allowed even
something that isn't a valid start codon

alt_start - perhaps confusing?



From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 14:54:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 14:54:17 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061954.n26JsHK4026141@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #3 from eric.talevich at gmail.com  2009-03-06 14:54 EST -------
(In reply to comment #2)

How about require_start? Or require_met, if you don't mind how strange it looks
as English. The name with_start_codon seems like it would take a codon or
alternate table as the argument.

I also see two choices being made by using this parameter:
(1) Check that the sequence starts with a valid start codon, and if not, raise
an exception;
(2) Use a set of alternate genetic codes for looking up the initial methionine.

From the other bug's discussion it seems like there are a number of boolean
options that could reasonably be used with the translate() method, but adding
them all as keyword args would clutter up the API. What about using a bitmask
in Bio.Seq that can be used with translate()? The re module takes a bitmask as
the last parameter for most functions, for example, and it looks pretty clean
compared to a series of boolean keyword args.
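
To show what I mean, here is a small sketch of the two styles.  The re call is
real; the flag constants TO_STOP and REQUIRE_START and the helper
describe_options are made up for illustration and do not exist in Bio.Seq:

```python
import re

# Standard library precedent: options combined with bitwise OR.
pattern = re.compile(r"^atg", re.IGNORECASE | re.MULTILINE)

# Hypothetical flag constants for a translate() bitmask.
TO_STOP = 1
REQUIRE_START = 2


def describe_options(flags=0):
    """Return the names of the options set in the hypothetical bitmask."""
    names = []
    if flags & TO_STOP:
        names.append("to_stop")
    if flags & REQUIRE_START:
        names.append("require_start")
    return names


# Bitmask style: one argument carries every boolean option.
print(describe_options(TO_STOP | REQUIRE_START))
# The keyword style would instead read something like
# translate(table=11, to_stop=True, require_start=True)
```

The trade-off is that the bitmask keeps the signature short, while keyword
arguments are self-describing at the call site.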



From mjldehoon at yahoo.com  Sun Mar  8 08:03:31 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 8 Mar 2009 05:03:31 -0700 (PDT)
Subject: [Biopython-dev] ScanProsite
In-Reply-To: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com>
Message-ID: <956971.84123.qm@web62404.mail.re1.yahoo.com>


--- On Mon, 3/2/09, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I like your suggestion to have a REST XML based module
> under Bio.ExPASy, which means we can deprecate the HTML based
> Bio.Prosite module and in the process make the top level list of
> modules in Biopython a bit shorter.  In the long term I think that
> will help people find functionality.
> 
Then, how about the following code organization:

Bio/ExPASy/__init__.py contains 
get_prodoc_entry  Interface to the get-prodoc-entry CGI script.
get_prosite_entry Interface to the get-prosite-entry CGI script.
get_prosite_raw   Interface to the get-prosite-raw CGI script.
get_sprot_raw     Interface to the get-sprot-raw CGI script.
sprot_search_ful  Interface to the sprot-search-ful CGI script.
sprot_search_de   Interface to the sprot-search-de CGI script.
(currently in Bio/ExPASy.py)

Bio/ExPASy/Prosite.py contains read(), parse(), Record for Prosite files
(currently in Bio/Prosite/__init__.py), as well as a Pattern class to handle Prosite patterns (currently in Bio/Prosite/Pattern.py, but this seems to be unused).

Bio/ExPASy/Prodoc.py contains read(), parse(), Record for Prosite documentation files
(currently in Bio/Prosite/Prodoc.py)

Bio/ExPASy/ScanProsite.py contains scan(), read(), Record to interact with ScanProsite
(currently a broken version to access ScanProsite and parse its results exists in Bio/ExPASy.py and Bio/Prosite/__init__.py).

I have a simplified version of the Prosite and Prodoc parsers. If we use the scheme above, I'll put the new version in Bio/ExPASy/Prosite.py and Bio/ExPASy/Prodoc.py, and deprecate Bio.Prosite.

--Michiel.


From biopython at maubp.freeserve.co.uk  Tue Mar 10 16:29:54 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Mar 2009 20:29:54 +0000
Subject: [Biopython-dev] [Utilities-announce] NCBI E-Utilities
	requirements updated
In-Reply-To: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov>
	<320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>
Message-ID: <320fb6e00903101329i69e40fc0i6a2b13332df55e7a@mail.gmail.com>

On Thu, Mar 5, 2009 at 12:40 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> This email was sent out a few weeks ago, but it took a while before
> the NCBI webpage was actually updated (maybe a caching issue) so I
> didn't rush to relax our rules immediately.
>
> Under the new rules we must make no more than three requests every
> second.  We could track the times of the last two requests in order to
> enforce this as worded, but I think it would be simpler just to switch
> from using a minimum 3 second pause between Bio.Entrez requests to
> just a minimum 0.33334 second pause.  This is a much simpler code
> change and will comply with the new relaxed rules.
>
> Unless anyone has a counter suggestion, I will update Bio.Entrez and
> the tutorial shortly.

Change made in CVS, including the tutorial.
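
For anyone curious, the minimum-pause approach described in the quoted message
might look roughly like this.  It is only a sketch and not the code in
Bio.Entrez; the class name MinimumDelay and its arguments are made up for
illustration:

```python
import time


class MinimumDelay(object):
    """Enforce a minimum pause between successive calls to wait()."""

    def __init__(self, delay=0.33334, clock=time.time, sleep=time.sleep):
        self.delay = delay      # seconds between requests
        self.clock = clock      # injectable so the logic can be tested
        self.sleep = sleep
        self.previous = None    # time of the last request, if any

    def wait(self):
        """Sleep just long enough to honour the minimum delay."""
        now = self.clock()
        if self.previous is not None:
            remaining = self.previous + self.delay - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.previous = now


limiter = MinimumDelay()
for query in ["first", "second", "third"]:
    limiter.wait()
    # ... a request would be made here ...
```

With a single fixed pause there is no need to remember the last two request
times, which is why this is simpler than enforcing "three per second" as worded.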

Peter

From bugzilla-daemon at portal.open-bio.org  Tue Mar 10 16:36:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Mar 2009 16:36:28 -0400
Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO
In-Reply-To: <bug-2762-42@http.bugzilla.open-bio.org/>
Message-ID: <200903102036.n2AKaSje008217@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2762


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-10 16:36 EST -------
For anyone following this bug, Brad has some related code posted on his blog -
see this mailing list discussion:
http://lists.open-bio.org/pipermail/biopython/2009-March/004983.html



From bugzilla-daemon at portal.open-bio.org  Tue Mar 10 16:49:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Mar 2009 16:49:30 -0400
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903102049.n2AKnUoD009300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-10 16:49 EST -------
On comment #3, Eric wrote:
> 
> How about require_start? Or require_met, if you don't mind how strange
> it looks as English. The name with_start_codon seems like it would take
> a codon or alternate table as the argument.

I think "require_start" is OK.  Or "require_start_codon".

> I also see two choices being made by using this parameter:
> (1) Check that the sequence starts with a valid start codon, and
> if not, raise an exception;

That is what my patch does.  Plus of course translating the valid start
codon as a methionine.

> (2) Use a set of alternate genetic codes for looking up the initial
> methionine.

I'm unsure what you mean here.  If you mean actually having the
translate method search for the first valid start codon, I am
really not keen on this at all. This is complicated, and verges
on gene/ORF finding, which I specifically wanted to avoid:

Peter wrote in comment #0:
>> This proposal is NOT about an option to have the translate
>> function/method search the sequence for the first valid
>> start codon (either in frame or not).

On comment #3, Eric wrote:
> From the other bug's discussion it seems like there are a number of boolean
> options that could reasonably be used with the translate() method, but adding
> them all as keyword args would clutter up the API. What about using a bitmask
> in Bio.Seq that can be used with translate()? The re module takes a bitmask as
> the last parameter for most functions, for example, and it looks pretty clean
> compared to a series of boolean keyword args.

I agree that there is a risk of confusion with too many arguments.  But I don't
think a bitmask would help - I think it makes it worse!  I'm not saying it's a
good thing, but we have lots of functions/methods in Biopython already with
lots of arguments (e.g. the standalone BLAST wrappers, or the Bio.Entrez
functions).  On the other hand, I can't immediately think of a single Python
function which uses a bitmask.



From biopython at maubp.freeserve.co.uk  Tue Mar 10 19:40:29 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Mar 2009 23:40:29 +0000
Subject: [Biopython-dev] Bio.Entrez catching more errors
Message-ID: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com>

Hi All,

It occurred to me that the Bio.Entrez._open function can look at the
retmode argument (if present) and spot if there is a mismatch between
the requested format (e.g. XML, HTML, text or asn.1) and the actual
data the NCBI returned.  Something along the following lines could be
added to the end of the _open function in Bio/Entrez/__init__.py to
achieve this:

    elif "retmode" in params and params["retmode"].lower()=="html" \
    and not data.lower().startswith("<html") \
    and not data.lower().startswith("<!doctype html") :
        raise TypeError("Requested HTML, but didn't get it: %s..." % data)
    elif "retmode" in params and params["retmode"].lower()=="xml" \
    and not data.lower().startswith("<?xml") :
        raise TypeError("Requested XML, but didn't get it: %s..." % data)
    elif "retmode" in params and params["retmode"] \
    and params["retmode"].lower()!="xml" \
    and data.lower().startswith("<?xml") :
        raise TypeError("Didn't request XML, but got it: %s..." % data)
    elif "retmode" in params and params["retmode"] \
    and params["retmode"].lower()!="html" \
    and (data.lower().startswith("<html") or \
         data.lower().startswith("<!doctype html")):
        #Expected for some error pages (e.g. the Bad Gateway caught above)
        raise TypeError("Didn't request HTML, but got it: %s..." % data)

I'm sure my XML/HTML detection could be made more robust here - I hope
the principle is clear.  My motivation is that I have noticed the NCBI
can return HTML error pages, and while we do catch some of these
explicitly (e.g. Bad Gateway, or Service Unavailable), I think any
HTML page when the user asked for XML, text or asn.1 should be
treated as an error.  Similarly, not getting XML when you ask for it etc.

Note that by raising the exception including the message text it
should be much easier to diagnose these failures.  As a tiny
refinement to the above code, we should only add the "..." if there is
more text to follow - this isn't always the case.
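
As a standalone illustration of the same checks with that refinement, here is
a sketch.  The names check_retmode and preview and the 60 character limit are
my own choices for the example and are not part of Bio.Entrez:

```python
def preview(data, limit=60):
    """Return the start of data, adding "..." only if text was cut off."""
    if len(data) > limit:
        return data[:limit] + "..."
    return data


def check_retmode(data, retmode):
    """Raise TypeError if data does not look like the requested retmode."""
    if not retmode:
        return
    mode = retmode.lower()
    start = data.lower()
    is_html = start.startswith("<html") or start.startswith("<!doctype html")
    is_xml = start.startswith("<?xml")
    if mode == "html" and not is_html:
        raise TypeError("Requested HTML, but didn't get it: %s" % preview(data))
    if mode == "xml" and not is_xml:
        raise TypeError("Requested XML, but didn't get it: %s" % preview(data))
    if mode != "xml" and is_xml:
        raise TypeError("Didn't request XML, but got it: %s" % preview(data))
    if mode != "html" and is_html:
        # Expected for some error pages, e.g. Bad Gateway
        raise TypeError("Didn't request HTML, but got it: %s" % preview(data))


try:
    check_retmode("<html><body>Bad Gateway</body></html>", "text")
except TypeError as err:
    print(err)
```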

e.g. The following give an HTML error page (while some databases like
"protein" are better behaved in this respect):
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()

Similarly, these give an XML like fragment (which is not a valid XML
file in itself - arguably an NCBI bug; some databases like "protein"
are better behaved in this respect):
>>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()

My suggested change to Bio.Entrez would also catch the following
examples (using an invalid database) where the NCBI ignore the retmode
and return an HTML help page:
>>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
>>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()

In a less clear cut example, this would flag the following as an error
as the NCBI seem to return ASN.1 text instead of HTML here:
>>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()

Overall, I think this change should catch lots of errors which
otherwise may not be detected until later (e.g. while trying to parse
the file).

--------------------------------------------------------------------------------------------------

On another point, should we catch these responses as errors?

>>> efetch(db="snp", id="123456").read()
'<html><head><title>PmFetch response</title></head><body>\n<pre>\n1:
id: 123456 Error occurred: cannot get document
summary\n</pre></body></html>'
>>> efetch(db="snp", id="123456", retmode="html").read()
'<html><head><title>PmFetch response</title></head><body>\n<pre>\n1:
id: 123456 Error occurred: cannot get document
summary\n</pre></body></html>'
>>> efetch(db="snp", id="123456", retmode="xml").read()
'<?xml version="1.0"?>\n<ExchangeSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1:
id: 123456 Error occurred: cannot get document
summary\n\n</ExchangeSet>'
>>> efetch(db="snp", id="123456", retmode="text").read()
'1: id: 123456 Error occurred: cannot get document summary\n'

and,
>>> print efetch(db="homologene", retmode="html", id="fake").read()
<html>
<body>
<br/><h2>Error occurred: Empty id list - nothing todo</h2>...

Looking for the string "Error occurred: " looks fairly safe here, and
should cover a range of entries.  Of course, you can imagine false
positives too, e.g. a valid PUBMED plain text record for a tutorial
article with a title like "Yikes! An Error Occurred: A beginner's
Guide To Defensive Programming." could match.
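
A minimal sketch of such a check (the helper name looks_like_error is made up
for illustration).  Note that a case sensitive search as written here would not
trip on that imaginary title, since it capitalises "Occurred", but a case
insensitive search would:

```python
ERROR_MARKER = "Error occurred: "


def looks_like_error(data):
    """Return True if the NCBI error marker appears anywhere in data."""
    return ERROR_MARKER in data


snp_text = "1: id: 123456 Error occurred: cannot get document summary\n"
title = ("Yikes! An Error Occurred: A beginner's Guide "
         "To Defensive Programming.")

print(looks_like_error(snp_text))
print(looks_like_error(title))
# A case insensitive variant would flag the title as well:
print(ERROR_MARKER.lower() in title.lower())
```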

Peter

From lorena.carlo at gmail.com  Wed Mar 11 11:58:24 2009
From: lorena.carlo at gmail.com (=?ISO-8859-1?Q?Lorena_Carl=F3?=)
Date: Wed, 11 Mar 2009 09:58:24 -0600
Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs
Message-ID: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>

Hi all,

I would like to know if there is an implemented function in Biopython that
allows getting the PDB ID from a UniProtKB ID.

Thanks,
Lorena

From biopython at maubp.freeserve.co.uk  Wed Mar 11 12:12:36 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 11 Mar 2009 16:12:36 +0000
Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs
In-Reply-To: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>
References: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>
Message-ID: <320fb6e00903110912g717ccb52q4242a6ff169b5d1f@mail.gmail.com>

On Wed, Mar 11, 2009 at 3:58 PM, Lorena Carló <lorena.carlo at gmail.com> wrote:
> Hi all,
>
> I would like to know if there is an implemented function in Biopython that
> allows getting the PDB ID from a UniProtKB ID.
>
> Thanks,
> Lorena

There isn't a simple one-to-one mapping from a UniProtKB/Swiss-Prot ID
to a PDB ID, see
http://www.uniprot.org/faq/2

Are you working from UniProtKB/Swiss-Prot files?  How about something like this:

# This assumes you have downloaded the following file
# to your working directory:
# http://www.uniprot.org/uniprot/P00734.txt
from Bio import SeqIO
record = SeqIO.read(open("P00734.txt"),"swiss")
for xref in record.dbxrefs :
    if xref.startswith("PDB:") :
        print xref.split(":",1)[1]
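
If the file is not being loaded through SeqIO, the same cross-references can
be picked out of the DR lines of the Swiss-Prot text directly.  A rough sketch
follows; the function name pdb_ids and the sample lines are made up for
illustration (they are not taken from P00734):

```python
def pdb_ids(lines):
    """Return the PDB identifiers named on the DR lines of a record."""
    ids = []
    for line in lines:
        if not line.startswith("DR   "):
            continue
        # DR lines look like "DR   DATABASE; IDENTIFIER; more fields."
        fields = [field.strip() for field in line[5:].split(";")]
        if len(fields) >= 2 and fields[0] == "PDB":
            ids.append(fields[1])
    return ids


# Invented sample lines, only to exercise the function.
sample = [
    "ID   EXAMPLE_HUMAN           Reviewed;         100 AA.",
    "DR   EMBL; X00000; CAA00000.1; -; mRNA.",
    "DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.",
    "DR   PDB; 2XYZ; NMR; -; A=1-50.",
]
print(pdb_ids(sample))
```

One UniProtKB entry can list several PDB entries, so the result is a list.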

Peter

P.S. This is more a question for the main discussion list, rather than
the Biopython development list.


From bugzilla-daemon at portal.open-bio.org  Wed Mar 11 19:39:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Mar 2009 19:39:02 -0400
Subject: [Biopython-dev] [Bug 2788] New: Bio.Nexus.Trees newick parser crash
Message-ID: <bug-2788-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2788

           Summary: Bio.Nexus.Trees newick parser crash
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: matzke at berkeley.edu


The newick files I have been working with seem to open fine in several
different programs/packages (Dendroscope, R's APE package, phylocom, python
alfacinha module), but not the newick parser in Bio.Nexus.Trees.

Rather than upload a file I've got the full newick string hard-coded below:

============
from Bio.Nexus.Trees import Tree

tree_str = '(((((((((((((((((Sambucus:43.136024,Viburnum:43.136040)Adoxaceae:53.892513,(Acanthopanax:34.719704,Aralia:34.719727,Dendropanax:34.719727,Evodiopanax:34.719727,Kalopanax:34.719727,Schefflera:34.719727)Araliaceae:62.308830):7.045975,Ilex:104.074516):3.056864,((((((Catalpa:22.623766,Paulownia:22.623785)Bignoniaceae:22.623766,(Clerodendrum:19.864199,Premna:19.864218)Verbenaceae:25.383331):22.378326,(Chionanthus:29.443968,Forestiera:29.443979,Fraxinus:29.443979,Ligustrum:29.443979,Osmanthus:29.443979,Syringa:29.443979)Oleaceae:38.181892):19.113832,(Adina:38.252457,Cephalanthus:38.252472,Emmenopterys:38.252472,Pinckneya:38.252472,Randia:38.252472)Rubiaceae:48.487236):2.360018,Ehretia:89.099709):13.495450,Eucommia:102.595161):4.536214):0.905059,((((Clethra:78.134140,((Cliftonia:38.402752,Cyrilla:38.402775)Cyrillaceae:38.402752,(Arbutus:38.402752,Elliottia:38.402775,Enkianthus:38.402775,Kalmia:38.402775,Lyonia:38.402775,Oxydendrum:38.402775,Rhododendron:38.402775,Vaccinium:38.402775)Ericaceae:38.402752):1.328631):12.980787,(((Halesia:30.391993,Pterostyrax:30.392012,Styrax:30.392012)Styracaceae:51.775261,Symplocos:82.167252):0.000000,(Camellia:41.083626,Franklinia:41.083649,Gordonia:41.083649,Stewartia:41.083649,Ternstroemia:41.083649)Theaceae:41.083626):8.947675):0.000149,Diospyros:91.115099):2.023849,((Ardisia:18.344650,Myrsine:18.344666)Myrsinaceae:74.794174,Bumelia:93.138824):0.000101):14.897509):1.462594,((Alangium:48.167362,Aucuba:48.167370,Cornus:48.167370,Macrocarpium:48.167370,Torricellia:48.167370)Cornaceae:53.025345,(Hydrangea:97.032310,(Davidia:48.516151,Nyssa:48.516167)Nyssaceae:48.516151):4.160399):8.306321):7.064716,Schoepfia:116.563736):0.000000,((((Altingia:50.813206,Liquidambar:50.813213)Altingiaceae:50.813206,(Disanthus:50.813206,Distylium:50.813213,Fortuneria:50.813213,Hamamelis:50.813213,Loropetalum:50.813213,Sinowilsonia:50.813213)Hamamelidaceae:50.813206):0.000131,(Cercidiphyllum:87.828712,Daphniphyllum:87.828712):13.797829):13.247040,(((((((Choerospondias:21.440735,Cotinus:21.440742,Pistacia:21.440742,Rhus:21.440742,Toxicodendron:21.440742)Anacardiaceae:37.304596,(Acer:29.372665,Aesculus:29.372681,Dipteronia:29.372681,Koelreuteria:29.372681,Sapindus:29.372681)Sapindaceae:29.372665):0.000114,((Cedrela:49.350353,(Ailanthus:24.675177,Leitneria:24.675188,Picrasma:24.675188)Simaroubaceae:24.675177):4.016092,(Evodia:26.683222,Phellodendron:26.683233,Ptelea:26.683233,Zanthoxylum:26.683233)Rutaceae:26.683222):5.379002):29.842871,(Firmiana:32.917126,Tilia:32.917149)Malvaceae:55.671188):12.661992,(Lagerstroemia:84.110847,Szyzygium:84.110847):17.139463):2.612011,((((((Alnus:16.609535,Betula:16.609543,Carpinus:16.609543,Corylus:16.609543,Ostrya:16.609543)Betulaceae:37.306709,((Carya:25.504854,Cyclocarya:25.504866,Juglans:25.504866,Engelhardtia:25.504866,Platycarya:25.504866,Pterocarya:25.504866)Juglandaceae:25.504854,Myrica:51.009708):2.906531):9.893459,(Castanea:31.904850,Castanopsis:31.904873,Cyclobalanopsis:31.904873,Fagus:31.904873,Lithocarpus:31.904873,Quercus:31.904873)Fagaceae:31.904850):21.681023,(((((Celtis:20.739927,Pteroceltis:20.739939)Cannabaceae:20.739927,((Broussounetia:12.614990,Cudrania:12.615005,Maclura:12.615005,Morus:12.615005)Moraceae:12.614990,Oreocnide:25.229980):16.249876):10.909924,(Aphananthe:26.194889,Hemiptelea:26.194897,Planera:26.194897,Ulmus:26.194897,Zelkova:26.194897)Ulmaceae:26.194889):11.649286,(Hovenia:32.019470,Rhamnus:32.019493,Ziziphus:32.019493)Rhamnaceae:32.019596):8.938065,(((Amelanchier:36.488564,(Crataegus:36.488586,Mespilus:36.488586):0.000000):0.000000,Chaenomeles:36.488586,Eriobotrya:36.488586,Malus:36.488586,Photinia:36.488586,Pyrus:36.488586,Sorbus:36.488586):0.000000,Prunus:36.488586)Rosaceae:36.488564):12.513593):4.616908,(Albizia:31.901920,Cercis:31.901943,Cladrastis:31.901943,Dalbergia:31.901943,Erythrina:31.901943,Gleditsia:31.901943,Gymnocladus:31.901943,Laburnum:31.901943,Maackia:31.901943,Ormosia:31.901943,Robinia:31.901943,Sophora:31.901943)Fabaceae:58.205711):4.139401,((Euonymus:90.433327,Sloanea:90.433327):0.000101,((Mallotus:28.689901,Sapium:28.689920)Euphorbiaceae:50.330055,(Idesia:29.019764,Poliothyrsis:29.019779,Populus:29.019779,Salix:29.019779,Xylosma:29.019779)Salicaceae:50.000195):11.413469):3.813607):9.615288):0.000000,(Staphylea:21.372393,Tapiscia:21.372404,Turpinia:21.372404)Staphyleaceae:82.489929):11.011259):1.690163):7.829397,Buxus:124.393143):0.000000,Tetracentron:124.393143):2.763555,Meliosma:127.156693):1.664427,Platanus:128.821121):2.029122,Euptelea:130.850250):11.447736,((Asimina:95.972672,(Liriodendron:47.125092,Magnolia:47.125114,Manglieita:47.125114,Michelia:47.125114)Magnoliaceae:48.847580):46.325292,(Actinodaphne:49.903526,Cinnamomum:49.903542,Lindera:49.903542,Litsea:49.903542,Machilus:49.903542,Neolitsea:49.903542,Nothaphoebe:49.903542,Persea:49.903542,Phoebe:49.903542,Sassafras:49.903542,Umbellularia:49.903542)Lauraceae:92.394188):0.000257):1.840266,(Yucca:110.138222,((Sabal:100.000000,(Serenoa:95.000000,Trachycarpus:95.000000)ST:5.000000)Arecaceae:10.000000,(Arundinaria:20.476601,Phyllostachys:20.476624,Semiarundinaria:20.476624)Poaceae:89.661629):0.000000):34):30.861772,Illicium:175.000000)aus2ast:175.000000,(((((Cephalotaxus:125.000000,(Taxus:100.000000,Torreya:100.000000)TT1:25.000000)Taxaceae:90.000000,((((((((Calocedrus:85.000000,Platycladus:85.000000)CP:5.000000,(Cupressus:85.000000,Juniperus:85.000000)CJ:5.000000)CJCP:5.000000,Chamaecyparis:95.000000)CCJCP:5.000000,(Thuja:7.870000,Thujopsis:7.870000)TT2:92.13)CJCPTT:30.000000,((Cryptomeria:120.000000,Taxodium:120.000000)CT:5.000000,Glyptostrobus:125.000000)CTG:5.000000)CupCallTax:5.830000,((Metasequoia:125.000000,Sequoia:125.000000)MS:5.000000,Sequoiadendron:130.000000)Sequoioid:5.830000)STCC:49.060001,Taiwania:184.889999)Taw+others:15.110000,Cunninghamia:200.000000)nonSci:15.000000)Tax+nonSci:10.000000,Sciadopitys:225.000000):25.000000,(((Abies:106.000000,Keteleeria:106.000000)AK:54.000000,(Pseudolarix:156.000000,Tsuga:156.000000)NTP:4.000000)NTPAK:24.000000,((Larix:87.000000,Pseudotsuga:87.000000)LP:81.000000,(Picea:155.000000,Pinus:155.000000)PPC:13.000000)Pinoideae:16.000000)Pinaceae:66.000000)Coniferales:25.000000,Ginkgo:275.000000)gymnosperm:75.000000)seedplant:50.000000;'


tree_obj = Tree(tree_str)

print tree_obj
============


This brings up the following error for "tree_obj = Tree(tree_str)": 
========
ValueError: invalid literal for float(): seedplant
========

It looks like it is looking for a floating point number where "seedplant" is.



From bugzilla-daemon at portal.open-bio.org  Thu Mar 12 06:17:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:17:01 -0400
Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser crash
In-Reply-To: <bug-2788-42@http.bugzilla.open-bio.org/>
Message-ID: <200903121017.n2CAH13S012060@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2788


------- Comment #1 from cymon.cox at gmail.com  2009-03-12 06:17 EST -------
(In reply to comment #0)
> The newick files I have been working with seem to open fine in several
> different programs/packages (Dendroscope, R's APE package, phylocom, python
> alfacinha module), but not the newick parser in Bio.Nexus.Trees.

[a big tree]

> tree_obj = Tree(tree_str)
> 
> print tree_obj
> ============
> 
> 
> This brings up the following error for "tree_obj = Tree(tree_str)": 
> ========
> ValueError: invalid literal for float(): seedplant
> ========
> 
> It looks like it is looking for a floating point number where "seedplant" is.

Your tree is decorated with node labels, which the parser cannot handle.

This came up recently (within the last year?) but I can't find the bug/message.

Should probably catch this and return an informative error - or implement node
labels...
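
In the meantime, one possible workaround is to drop the naked labels from the
string before handing it to Tree.  This is only a sketch: the helper name
strip_internal_labels is made up, the labels are lost, and it would also
remove support values written directly after a closing parenthesis:

```python
import re

# Anything between a closing parenthesis and the next structural
# character (colon, comma, semicolon, bracket) is taken to be a label.
_INTERNAL_LABEL = re.compile(r"\)[^():,;\[\]]+")


def strip_internal_labels(newick):
    """Return the newick string with naked internal node labels removed."""
    return _INTERNAL_LABEL.sub(")", newick)


print(strip_internal_labels("((a:1,b:2)X:3,c:4)root:5;"))
```

Taxon names at the tips are untouched, because they never follow a closing
parenthesis directly.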

C.



From bugzilla-daemon at portal.open-bio.org  Thu Mar 12 06:38:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:38:59 -0400
Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser does not
	support internal node labels
In-Reply-To: <bug-2788-42@http.bugzilla.open-bio.org/>
Message-ID: <200903121038.n2CAcxMR014167@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2788


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
         OS/Version|Mac OS                      |All
           Platform|Macintosh                   |All
            Summary|Bio.Nexus.Trees newick      |Bio.Nexus.Trees newick
                   |parser crash                |parser does not support
                   |                            |internal node labels


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-12 06:38 EST -------
I thought it looked familiar, but I must have only searched the currently open
bugs.  This looks *very* similar to Bug 2543 which dealt with internal node
names, which was fixed for Biopython 1.49 (and 1.49 beta).

Frank wrote:
> Nexus.Trees has been extended to deal with internal node names, or "special
> comments" in the format [& blablalba]. Such comments can appear
> directly after the taxon label, after the closing parentheses, or between
> branchlength / support values attached to a node or a taxon label, ...

i.e. On Bug 2543, Frank didn't go as far as the enhancement to cope with
"naked" node labels, just those in the square brackets.

Consider this smaller example Cymon gave on Bug 2543:

>>> from Bio.Nexus.Trees import Tree
>>> tree_str2 = "(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);"
>>> tree_obj = Tree(tree_str2)
Traceback (most recent call last):
...
ValueError: invalid literal for float(): A


I've retitled this and marked it as an enhancement.
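
Until the parser copes with "naked" labels, one possible workaround is to
strip them from the Newick string before handing it to Tree. The following
is only a sketch and not part of Bio.Nexus: the helper name
strip_internal_labels is made up for illustration, it assumes internal
labels never start with a digit, sign or dot (so numeric support values are
left alone), and it throws the label information away rather than
preserving it.

```python
import re

# A ")" followed by a label that does not start with a digit, sign or dot.
# Numeric support values such as ")0.95:0.1" are deliberately left alone,
# as are "[& ...]" comments, which the parser already understands.
_INTERNAL_LABEL = re.compile(r"\)\s*[^\d\s:;,()\[\]+.\-][^:;,()\[\]]*")


def strip_internal_labels(newick):
    """Return the Newick string with naked internal node labels removed.

    Hypothetical helper: labels are discarded, not stored anywhere.
    """
    return _INTERNAL_LABEL.sub(")", newick)
```

With the labels removed, the string from the example above should no longer
contain anything for float() to trip over, e.g.
Tree(strip_internal_labels(tree_str2)).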


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Mar 12 06:41:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:41:30 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
	ancestors
In-Reply-To: <bug-2543-42@http.bugzilla.open-bio.org/>
Message-ID: <200903121041.n2CAfUwH014362@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-12 06:41 EST -------
On comment #5 Frank wrote:
> In my opinion, naming nodes is a feature, and I would not regard the lack of
> this feature as a bug.  But I'll have a look at the code and see how easy
> this can be changed. It would actually be nice if P4 and Bio.Nexus, both
> being python programs, could read each other's trees.

This enhancement is now covered by Bug 2788.  It appears that now several other
programs support this Newick tree variant, making it a bit more important.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From chris.lasher at gmail.com  Thu Mar 12 17:07:21 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu, 12 Mar 2009 17:07:21 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com>
	<3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com>
	<320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
Message-ID: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>

On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Another option to consider would be to switch to running git on
> biopython.org, but use the git-cvsserver tool to provide an emulated
> CVS server on top of the git repository.  This sounds possible in
> theory, and would be nice for any "old fashioned" biopython developers
> because it should be fairly transparent - they can continue to treat
> it as CVS and just work on the main trunk.  This would require someone
> competent to do the conversion and alter the server setup - we'd have
> to talk to the OBF team about this.  However, if anyone has first hand
> experience on git-cvsserver perhaps they could comment on whether this
> sounds like a good plan or not.

I must be missing something, Peter. Why would BioPython continue to
operate with CVS? I suppose I just really hope to see BioPython
running with something other than CVS, and I'd really like to see it
go either under Bazaar or Git.

Chris


From bartek at rezolwenta.eu.org  Thu Mar 12 19:20:23 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 13 Mar 2009 00:20:23 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com>
	<320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
Message-ID: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>

On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
>
> I must be missing something, Peter. Why would BioPython continue to
> operate with CVS? I suppose I just really hope to see BioPython
> running with something other than CVS, and I'd really like to see it
> go either under Bazaar or Git.
>
Hi Chris,

The idea is to do the switch in two steps:
- first we still have the main branch in CVS while we have git and/or
bzr branches synchronized with it for people to branch and contribute
- If this works nicely, we will switch to one of these systems
completely (while possibly keeping the other branch in sync, but this
is not yet decided)

The first step is to some extent operational (I'm currently busy with
other stuff, but I'll get around to it, hopefully this weekend), but the
second step requires a decision on our side (git or bzr?) and action on
the side of OBF (there is no git or Bazaar installed on the OBF servers).

cheers
-- 
Bartek Wilczynski

From biopython at maubp.freeserve.co.uk  Fri Mar 13 08:21:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 13 Mar 2009 12:21:14 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
Message-ID: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>

On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
>> On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> Another option to consider would be to switch to running git on
>>> biopython.org, but use the git-cvsserver tool to provide an emulated
>>> CVS server on top of the git repository.  This sounds possible in
>>> theory, and would be nice for any "old fashioned" biopython developers
>>> because it should be fairly transparent - they can continue to treat
>>> it as CVS and just work on the main trunk.  This would require someone
>>> competent to do the conversion and alter the server setup - we'd have
>>> to talk to the OBF team about this.  However, if anyone has first hand
>>> experience on git-cvsserver perhaps they could comment on whether this
>>> sounds like a good plan or not.
>>
>> I must be missing something, Peter. Why would BioPython continue to
>> operate with CVS? I suppose I just really hope to see BioPython
>> running with something other than CVS, and I'd really like to see it
>> go either under Bazaar or Git.

I'm warming to the idea of git, and had noticed git includes the
optional git-cvsserver tool which emulates a CVS server while using
git underneath.  I was wondering if anyone had first hand experience
of this.  If we did move from CVS to git (still hosted on
biopython.org), this would seem to offer a nice migration path for
our "old school" CVS developers - they can carry on as usual.  Of
course, if none of us care about having to learn a new interface, then
a simple switch would be less hassle to set up.  For the server side of
things, we'll need to talk to the OBF team about any such move - as
far as I know they've only managed CVS to SVN migrations in the past.

Peter

> Hi Chris,
>
> The idea is to do the switch in two steps:
> - first we still have the main branch in CVS while we have git and/or
> bzr branches synchronized with it for people to branch and contribute
> - If this works nicely, we will switch to one of these systems
> completely (while possibly keeping the other branch in sync, but this
> is not yet decided)

That does seem like a good plan.  Of course, there is the related
issue of where we host the official repository: externally (e.g. on
GitHub or Launchpad) or in house (on biopython.org).  I favour keeping
the official repository on biopython.org but this will require OBF
technical support (do we have the expertise within Biopython? Bartek?
Chris?).

> The first step is to some extent operational (I'm currently busy with
> other stuff, but I'll get around to it, hopefully this weekend), but the
> second step requires a decision on our side (git or bzr?) and action on
> the side of OBF (there is no git or Bazaar installed on the OBF servers).

There is also the previously semi-agreed solution of switching from
CVS to SVN on biopython.org, but this would be only a gradual
improvement.  I gather there are mature tools for using git+svn
together, so it should be better than using git+cvs together.  Other
than meaning all the OBF hosted projects are on SVN (I think we are
the last still on CVS), this is beginning to seem a bit pointless.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Mar 13 11:48:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Mar 2009 11:48:39 -0400
Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative
	location of a residue that is an ATOM
In-Reply-To: <bug-2780-42@http.bugzilla.open-bio.org/>
Message-ID: <200903131548.n2DFmdZ6015899@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2780


------- Comment #2 from klaus.kopec at tuebingen.mpg.de  2009-03-13 11:48 EST -------
PDB IDs of some more occurrences (simply search the file for "HETATM" and look
for a HETATM record that is followed by an ATOM with the same residue number and
a different altloc).

1din
1k4q
1k55 - multiple occurrences
1k56
1rqh
1rr2
1xpk
1xpl - multiple occurrences
1xpm - multiple occurrences
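
The search described above can be sketched in a few lines of plain Python.
This is only an illustration and does not use Bio.PDB: the function name is
made up, "followed by" is taken to mean the very next coordinate record,
and the column positions are the fixed ones from the PDB format (altLoc in
column 17, chain in column 22, residue number plus insertion code in
columns 23-27).

```python
def find_hetatm_atom_altloc_pairs(lines):
    """Find HETATM records directly followed by an ATOM record that has
    the same chain and residue number but a different altloc.

    Returns a list of (chain, residue_number, hetatm_altloc, atom_altloc).
    Hypothetical helper for illustration only.
    """
    hits = []
    previous = None  # (record, chain, resseq + icode, altloc) of last coordinate line
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM") or len(line) < 27:
            continue
        current = (record, line[21], line[22:27], line[16])
        if (previous is not None
                and previous[0] == "HETATM" and current[0] == "ATOM"
                and previous[1:3] == current[1:3]
                and previous[3] != current[3]):
            hits.append((current[1], current[2].strip(),
                         previous[3], current[3]))
        previous = current
    return hits
```

Calling it as find_hetatm_atom_altloc_pairs(open("1k55.pdb")) (the file
name is just an example) should list the affected residues in a file.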


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jblanca at btc.upv.es  Fri Mar 13 11:59:01 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 13 Mar 2009 16:59:01 +0100
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <200902271157.49948.jblanca@btc.upv.es>
References: <200902261612.54306.jblanca@btc.upv.es>
	<320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com>
	<200902271157.49948.jblanca@btc.upv.es>
Message-ID: <200903131659.01590.jblanca@btc.upv.es>

Hi:
I've finished a first version of a program that reads a list of Applied
Biosystems fsa files and draws a virtual gel. It does not read the sequence
because my users are interested in fragment analysis, but the basic
infrastructure is in place to do it.
It does what my users need. It's quite slow though, but I'm not investing time
in optimizing it.
If anybody wants to take a look, the code is at:
http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/
I distribute it under the GPL licence.
If you think that any part of the code could be of any use for the Biopython
project I would be very pleased to give it to the community.
Best regards,

Jose Blanca

On Friday 27 February 2009 11:57:49 Jose Blanca wrote:
> On Friday 27 February 2009 11:45:59 Peter wrote:
> > On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> > That's much clearer - is the Genographer software showing the actual
> > image (zoomed as required, with the colours adjusted as required), or
> > an artificial recreation?
>
> It is an artificial recreation, the same as I'm trying to accomplish. I just
> want more resolution and an automated process (genographer is a GUI
> application)
>
> > Are you trying to create this figure for illustrative purposes only?
> > I mean would a slightly cartoon like recreation be fine, or are you
> > trying to make it as realistic as possible?
>
> I want to analyze it.
>
> > I see you are having to reverse engineer their file format.  I guess
> > other people have tried this in the past so there may be more clues
> > out on the internet.  Have you tried emailing the company to see if
> > they would publish the file format specifications (unlikely I fear,
> > but worth asking).
>
> Fortunately the ABIF was reverse engineered by people more clever than me.
> And a couple of years ago Applied published a specification.
> http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/ABIF_File_Format.pdf
> You can't believe everything in that specification, but it is a good
> start. Reading an abif file is not a problem, drawing the gel with as
> little coding as possible is another thing.
> Regards,
>
> Jose Blanca
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From biopython at maubp.freeserve.co.uk  Fri Mar 13 12:12:12 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 13 Mar 2009 16:12:12 +0000
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <200903131659.01590.jblanca@btc.upv.es>
References: <200902261612.54306.jblanca@btc.upv.es>
	<320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com>
	<200902271157.49948.jblanca@btc.upv.es>
	<200903131659.01590.jblanca@btc.upv.es>
Message-ID: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>

On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Hi:
> I've finished a first version of a program that reads a list of Applied
> Biosystems fsa files and draws a virtual gel. It does not read the sequence
> because my users are interested in fragment analysis, but the basic
> infrastructure is in place to do it.
> It does what my users need. It's quite slow though, but I'm not investing time
> in optimizing it.

Do you have any example images online for people to look at?

Peter

From jblanca at btc.upv.es  Fri Mar 13 12:16:46 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 13 Mar 2009 17:16:46 +0100
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>
References: <200902261612.54306.jblanca@btc.upv.es>
	<200903131659.01590.jblanca@btc.upv.es>
	<320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>
Message-ID: <200903131716.46413.jblanca@btc.upv.es>

Here you have one:
http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/out.png
Jose Blanca

On Friday 13 March 2009 17:12:12 Peter wrote:
> On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> > Hi:
> > I've finished a first version of a program that reads a list of Applied
> > Biosystems fsa files and draws a virtual gel. It does not read the
> > sequence because my users are interested in fragment analysis, but the
> > basic infrastructure is in place to do it.
> > It does what my users need. It's quite slow though, but I'm not investing
> > time in optimizing it.
>
> Do you have any example images online for people to look at?
>
> Peter


-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From chris.lasher at gmail.com  Sun Mar 15 01:43:34 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sun, 15 Mar 2009 01:43:34 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
Message-ID: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>

On Fri, Mar 13, 2009 at 8:21 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
>>> On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>>> Another option to consider would be to switch to running git on
>>>> biopython.org, but use the git-cvsserver tool to provide an emulated
>>>> CVS server on top of the git repository.  This sounds possible in
>>>> theory, and would be nice for any "old fashioned" biopython developers
>>>> because it should be fairly transparent - they can continue to treat
>>>> it as CVS and just work on the main trunk.  This would require someone
>>>> competent to do the conversion and alter the server setup - we'd have
>>>> to talk to the OBF team about this.  However, if anyone has first hand
>>>> experience on git-cvsserver perhaps they could comment on whether this
>>>> sounds like a good plan or not.
>>>
>>> I must be missing something, Peter. Why would BioPython continue to
>>> operate with CVS? I suppose I just really hope to see BioPython
>>> running with something other than CVS, and I'd really like to see it
>>> go either under Bazaar or Git.
>
> I'm warming to the idea of git, and had noticed git includes the
> optional git-cvsserver tool which emulates a CVS server while using
> git underneath.  I was wondering if anyone had first hand experience
> of this.  If we did move from CVS to git (still hosted on
> biopython.org), this would seem to offer a nice migration path for
> our "old school" CVS developers - they can carry on as usual.  Of
> course, if none of us care about having to learn a new interface, then
> a simple switch would be less hassle to set up.  For the server side of
> things, we'll need to talk to the OBF team about any such move - as
> far as I know they've only managed CVS to SVN migrations in the past.
>
> Peter
>
>> Hi Chris,
>>
>> The idea is to do the switch in two steps:
>> - first we still have the main branch in CVS while we have git and/or
>> bzr branches synchronized with it for people to branch and contribute
>> - If this works nicely, we will switch to one of these systems
>> completely (while possibly keeping the other branch in sync, but this
>> is not yet decided)
>
> That does seem like a good plan.  Of course, there is the related
> issue of where we host the official repository: externally (e.g. on
> GitHub or Launchpad) or in house (on biopython.org).  I favour keeping
> the official repository on biopython.org but this will require OBF
> technical support (do we have the expertise within Biopython? Bartek?
> Chris?).
>
>> The first step is to some extent operational (I'm currently busy with
>> other stuff, but I'll get around to it, hopefully this weekend), but the
>> second step requires a decision on our side (git or bzr?) and action on
>> the side of OBF (there is no git or Bazaar installed on the OBF servers).
>
> There is also the previously semi-agreed solution of switching from
> CVS to SVN on biopython.org, but this would be only a gradual
> improvement.  I gather there are mature tools for using git+svn
> together, so it should be better than using git+cvs together.  Other
> than meaning all the OBF hosted projects are on SVN (I think we are
> the last still on CVS), this is beginning to seem a bit pointless.
>
> Peter
>

Peter et al.,

I started off writing an email about why I think hosting at GitHub or
Launchpad is a better idea, but it got a bit verbose, so I just wrote
up a blog post instead. (Besides, links and images are more fun, and
make the intarwebs go 'round.) Please see
http://igotgenes.blogspot.com/2009/03/why-biopython-needs-to-move-to-github.html
or
http://tinyurl.com/a9o7ae

Chris


From mjldehoon at yahoo.com  Sun Mar 15 06:24:11 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 15 Mar 2009 03:24:11 -0700 (PDT)
Subject: [Biopython-dev] Bio.ExPASy
Message-ID: <76595.11423.qm@web62404.mail.re1.yahoo.com>


Hi everybody,

As discussed previously, I have moved the Bio.Prosite code to Bio.ExPASy, and I've added a ScanProsite module to Bio.ExPASy. I guess Bio.Enzyme should also move to Bio.ExPASy. See

http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html

for the documentation of Biopython as currently in CVS.

--Michiel.


From mjldehoon at yahoo.com  Sun Mar 15 08:53:28 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 15 Mar 2009 05:53:28 -0700 (PDT)
Subject: [Biopython-dev] Fw: Re:  Bio.Entrez catching more errors
Message-ID: <722257.11611.qm@web62401.mail.re1.yahoo.com>


--- On Sun, 3/15/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Whereas I think it's a good idea if Bio.Entrez catches
> more errors, I think the parser is a more suitable place to
> check for errors. See Bio.ExPASy.ScanProsite for an example
> of catching errors with an XML parser; this avoids using a
> File.UndoHandle.
> 
> --Michiel
> 
> --- On Tue, 3/10/09, Peter <biopython at maubp.freeserve.co.uk> wrote:
> 
> > From: Peter <biopython at maubp.freeserve.co.uk>
> > Subject: [Biopython-dev] Bio.Entrez catching more errors
> > To: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> > Date: Tuesday, March 10, 2009, 7:40 PM
> > Hi All,
> > 
> > It occurred to me that the Bio.Entrez._open function can look at the
> > retmode argument (if present) and spot if there is a mismatch between
> > the requested format (e.g. XML, HTML, text or asn.1) and the actual
> > data the NCBI returned.  Something along the following lines could be
> > added to the end of the _open function in Bio/Entrez/__init__.py to
> > achieve this:
> > 
> >     elif "retmode" in params and params["retmode"].lower()=="html" \
> >     and not data.lower().startswith("<html") \
> >     and not data.lower().startswith("<!doctype html") :
> >         raise TypeError("Requested HTML, but didn't get it: %s..." % data)
> >     elif "retmode" in params and params["retmode"].lower()=="xml" \
> >     and not data.lower().startswith("<?xml") :
> >         raise TypeError("Requested XML, but didn't get it: %s..." % data)
> >     elif "retmode" in params and params["retmode"] \
> >     and params["retmode"].lower()!="xml" \
> >     and data.lower().startswith("<?xml") :
> >         raise TypeError("Didn't request XML, but got it: %s..." % data)
> >     elif "retmode" in params and params["retmode"] \
> >     and params["retmode"].lower()!="html" \
> >     and (data.lower().startswith("<html") or \
> >          data.lower().startswith("<!doctype html")):
> >         #Expected for some error pages (e.g. the Bad Gateway caught above)
> >         raise TypeError("Didn't request HTML, but got it: %s..." % data)
> > 
> > I'm sure my XML/HTML detection could be made more robust here - I hope
> > the principle is clear.  My motivation is that I have noticed the NCBI
> > can return HTML error pages, and while we do catch some of these
> > explicitly (e.g. Bad Gateway, or Service Unavailable), I think any
> > HTML page when the user asked for XML, text or asn.1 should be
> > treated as an error.  Similarly, not getting XML when you ask for it etc.
> > 
> > Note that by raising the exception including the message text it
> > should be much easier to diagnose these failures.  As a tiny
> > refinement to the above code, we should only add the "..." if there is
> > more text to follow - this isn't always the case.
> > 
> > e.g. The following give an HTML error page (while some databases like
> > "protein" are better behaved in this respect):
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()
> > 
> > Similarly, these give an XML like fragment (which is not a valid XML
> > file in itself - arguably an NCBI bug; some databases like "protein"
> > are better behaved in this respect):
> > >>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()
> > 
> > My suggested change to Bio.Entrez would also catch the following
> > examples (using an invalid database) where the NCBI ignore the retmode
> > and return an HTML help page:
> > >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
> > >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()
> > 
> > In a less clear cut example, this would flag the following as an error
> > as the NCBI seem to return ASN.1 text instead of HTML here:
> > >>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()
> > 
> > Overall, I think this change should catch lots of errors which
> > otherwise may not be detected until later (e.g. while trying to parse
> > the file).
> > 
> > --------------------------------------------------------------------------------------------------
> > 
> > On another point, should we catch these responses as errors?
> > 
> > >>> efetch(db="snp", id="123456").read()
> > '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> > >>> efetch(db="snp", id="123456", retmode="html").read()
> > '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> > >>> efetch(db="snp", id="123456", retmode="xml").read()
> > '<?xml version="1.0"?>\n<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1: id: 123456 Error occurred: cannot get document summary\n\n</ExchangeSet>'
> > >>> efetch(db="snp", id="123456", retmode="text").read()
> > '1: id: 123456 Error occurred: cannot get document summary\n'
> > 
> > and,
> > >>> print efetch(db="homologene", retmode="html", id="fake").read()
> > <html>
> > <body>
> > <br/><h2>Error occurred: Empty id list - nothing todo</h2>...
> > 
> > Looking for the string "Error occurred: " looks fairly safe here, and
> > should cover a range of entries.  Of course, you can imagine false
> > positives too, e.g. a valid PUBMED plain text record for a tutorial
> > article with a title like "Yikes! An Error Occurred: A beginner's
> > Guide To Defensive Programming." could match.
> > 
> > Peter
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
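
For reference, the retmode check quoted above can be written as a small
stand-alone function. This is only a sketch of the idea, not the code in
Bio/Entrez/__init__.py: the names check_retmode, _sniff_format and
_preview are made up, the 200 character preview limit is an arbitrary
assumption, and ignoring leading whitespace before sniffing is an addition
that the quoted code does not have. It also applies the suggested
refinement of adding "..." only when the text was actually cut short.

```python
def _preview(data, limit=200):
    """Return the start of data, adding "..." only if text was cut off."""
    if len(data) > limit:
        return data[:limit] + "..."
    return data


def _sniff_format(data):
    """Guess "xml" or "html" from the start of the data, else None."""
    head = data.lstrip().lower()
    if head.startswith("<?xml"):
        return "xml"
    if head.startswith("<html") or head.startswith("<!doctype html"):
        return "html"
    return None


def check_retmode(params, data):
    """Raise TypeError if data does not match the requested retmode.

    Hypothetical helper: params is the dict of query parameters and data
    is the text returned by the server. Does nothing without a retmode.
    """
    requested = (params.get("retmode") or "").lower()
    if not requested:
        return
    found = _sniff_format(data)
    if requested in ("xml", "html") and found != requested:
        raise TypeError("Requested %s, but didn't get it: %s"
                        % (requested.upper(), _preview(data)))
    if requested not in ("xml", "html") and found is not None:
        raise TypeError("Didn't request %s, but got it: %s"
                        % (found.upper(), _preview(data)))
```

For example, check_retmode({"retmode": "text"}, page) raises a TypeError
when page is an HTML error page, and returns quietly for plain text.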


From chapmanb at 50mail.com  Sun Mar 15 14:54:43 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 15 Mar 2009 14:54:43 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
Message-ID: <20090315185443.GA30296@kunkel>

Hi all;
It is good to see the discussion around revision control systems;
Chris and Paulo's posts make some nice points. Source code
management is an important issue that influences perception of
Biopython and barriers to contributing.

My two cents on what we should do is:

- Pick a distributed source code management system. My preference
  is Git, only because it currently has more steam behind it.
  Git/Bazaar will likely end up being like the VHS/Beta debate.

- Test drive use of Git on an official GitHub repository. This would
  involve a few things:

  = Bartek and Giovanni: Can you coordinate on a single GitHub
    Biopython instance and remove the others to eliminate confusion?
  = Write up documentation for contributors. This is where we could use
    some volunteers from those interested to update the web pages.
    The two main places that need updating are:

    http://biopython.org/wiki/Contributing
    http://biopython.org/wiki/CVS
    
    I think we should ensure people are clear on what is being done
    and where you can contribute.

- Ensure GitHub can be synced with current CVS. Bartek, it sounds
  like you have a handle on this.

- Evaluate the success of Git. This is easy to measure in terms of
  new contributors, increased happiness, and what not. At the same
  time we can monitor how GitHub evolves over time.

- If successful, talk to the OpenBio team about hosting Git locally.

Peter, Michiel, et al -- how do you feel?

I think being cautious with the transition, as Peter recommends, is
important. I am old enough to remember Sourceforge being new and
everyone saying how it was stupid not to move there; then over time
Sourceforge got slow with all the users and people moved
away from it. This is just to say -- no one knows how GitHub (or
Launchpad) will evolve. OpenBio is a stable, small, nice community
and to the extent we can use their resources I believe we should.

Overall, the specifics of the above proposal aren't as important as
just doing something unambiguous and then evaluating how it works.
Right now things are a bit confusing, which I think could put off
new developers, who are always welcome.

Looking forward to talking about code instead of revision control,
Brad

> On Fri, Mar 13, 2009 at 8:21 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski
> > <bartek at rezolwenta.eu.org> wrote:
> >> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
> >>> On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> >>>> Another option to consider would be to switch to running git on
> >>>> biopython.org, but use the git-cvsserver tool to provide an emulated
> >>>> CVS server on top of the git repository.  This sounds possible in
> >>>> theory, and would be nice for any "old fashioned" biopython developers
> >>>> because it should be fairly transparent - they can continue to treat
> >>>> it as CVS and just work on the main trunk.  This would require someone
> >>>> competent to do the conversion and alter the server setup - we'd have
> >>>> to talk to the OBF team about this.  However, if anyone has first hand
> >>>> experience on git-cvsserver perhaps they could comment on whether this
> >>>> sounds like a good plan or not.
> >>>
> >>> I must be missing something, Peter. Why would BioPython continue to
> >>> operate with CVS? I suppose I just really hope to see BioPython
> >>> running with something other than CVS, and I'd really like to see it
> >>> go either under Bazaar or Git.
> >
> > I'm warming to the idea of git, and had noticed git includes the
> > optional git-cvsserver tool which emulates a CVS server while using
> > git underneath.  I was wondering if anyone had first hand experience
> > of this.  If we did move from CVS to git (still hosted on
> > biopython.org), this would seem to offer a nice migration path for
> > our "old school" CVS developers - they can carry on as usual.  Of
> > course, if none of us care about having to learn a new interface, then
> > a simple switch would be less hassle to set up.  For the server side of
> > things, we'll need to talk to the OBF team about any such move - as
> > far as I know they've only managed CVS to SVN migrations in the past.
> >
> > Peter
> >
> >> Hi Chris,
> >>
> >> The idea is to do the switch in two steps:
> >> - first we still have the main branch in CVS while we have git and/or
> >> bzr branches synchronized with it for people to branch and contribute
> >> - If this works nicely, we will switch to one of these systems
> >> completely (while possibly keeping the other branch in sync, but this
> >> is not yet decided)
> >
> > That does seem like a good plan.  Of course, there is the related
> > issue of where we host the official repository (externally, e.g. on
> > github or launchpad) or in house (on biopython.org).  I favour keeping
> > the official repository on biopython.org but this will require OBF
> > technical support (do we have the expertise within Biopython? Bartek?
> > Chris?).
> >
> >> The first step is to some extent operational (I'm currently busy with
> >> other stuff, but I'll get around to it hopefully this weekend), but the
> >> second step requires a decision on our side (git or bzr?) and action on
> >> the side of OBF (there is no git or bazaar installed on obf servers).
> >
> > There is also the previously semi-agreed solution of switching from
> > CVS to SVN on biopython.org, but this would be only a gradual
> > improvement.  I gather there are mature tools for using git+svn
> > together, so it should be better than using git+cvs together.  Other
> > than meaning all the OBF hosted projects are on SVN (I think we are
> > the last still on CVS), this is beginning to seem a bit pointless.
> >
> > Peter
> >
> 
> Peter et al.,
> 
> I started off writing an email about why I think hosting at GitHub or
> Launchpad is a better idea, but it got a bit verbose, so I just wrote
> up a blog post instead. (Besides, links and images are more fun, and
> make the intarwebs go 'round.) Please see
> http://igotgenes.blogspot.com/2009/03/why-biopython-needs-to-move-to-github.html
> or
> http://tinyurl.com/a9o7ae
> 
> Chris

From bartek at rezolwenta.eu.org  Sun Mar 15 16:12:46 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sun, 15 Mar 2009 21:12:46 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090315185443.GA30296@kunkel>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
Message-ID: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>

Hi all,

On Sun, Mar 15, 2009 at 7:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> - Pick a distributed source code management system. My preference
>   is Git, only because it currently has more steam behind it.
>   Git/Bazaar will likely end up being like the VHS/Beta debate.
>
> - Test drive use of Git on an official GitHub repository. This would
>   involve a few things:
>
>   = Bartek and Giovanni: Can you coordinate on a single GitHub
>     Biopython instance and remove the others to eliminate confusion?
>   = Write up documentation for contributors. This is where we could use
>     some volunteers from those interested to update the web pages.
>     The two main places that need updating are:
>
>     http://biopython.org/wiki/Contributing
>     http://biopython.org/wiki/CVS
>
>     I think we should ensure people are clear on what is being done
>     and where you can contribute.
>
> - Ensure GitHub can be synced with current CVS. Bartek, it sounds
>   like you have a handle on this.
>
> - Evaluate the success of Git. This is easy to measure in terms of
>   new contributors, increased happiness, and what not. At the same
>   time we can monitor how GitHub evolves over time.
>

I think there are some important points brought up by Brad (and others).

- From the technical point of view, I don't see any serious problems:
  - I can set up a new branch in github (the current one includes some
    testing changes done by Giovanni)
  - it will be synchronized daily with changes from CVS
  - I'll set up a script to also save a backup of the official branch
    at the OBF server (to ensure that we do not depend on github)
  - I can write some (short) documentation on how to contribute.

I don't know whether anyone besides me is still interested in
test-driving launchpad/bzr as an alternative. If there are no other
people, I'll close the current testing branches from launchpad.

>
> Peter, Michiel, et al -- how do you feel?

I would also be very happy to hear from other developers, especially if
there are any people who would be unhappy if we finally moved away
from CVS.

I'll post when I have a running setup of the cvs2git conversion.

cheers
Bartek


From bartek at rezolwenta.eu.org  Sun Mar 15 19:14:07 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 00:14:07 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
Message-ID: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>

Hi all,

I've now set up a script on my machine to update the biopython git
branch on github once every hour.
(thanks to Giovanni for creating and setting up the account)
It's created using the git fast-import script because of its speed.
You can find it here:

http://github.com/biopython/biopython/

It's a different branch than the one created earlier by Giovanni. The
old one is now called biopython_old
and will soon disappear from github (there were some temporary changes in it)

The script also leaves a copy of the repository on dev.open-bio.org,
just in case :)

I've written a short guide on the wiki :
http://biopython.org/wiki/GitMigration

Please correct or give me comments if you don't like something or if
you feel something is missing.

I'm going to a conference, so I might be slow in responding to emails
next week...

cheers
Bartek

From dalloliogm at gmail.com  Mon Mar 16 05:49:29 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 16 Mar 2009 10:49:29 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
	<8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
Message-ID: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>

On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski <
bartek at rezolwenta.eu.org> wrote:

> Hi all,


>
> I've written a short guide on the wiki :
> http://biopython.org/wiki/GitMigration


I also have a draft for some documentation... I can contribute it later this
morning (now I don't have time).
p.s. the biopython website seems to be offline at the moment...


--

My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk  Mon Mar 16 07:05:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 11:05:38 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
	<8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
	<5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>
Message-ID: <320fb6e00903160405p5337f8b1m16d3c3d891950fd6@mail.gmail.com>

On Mon, Mar 16, 2009 at 9:49 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski <
> bartek at rezolwenta.eu.org> wrote:
>> Hi all,
>>
>> I've written a short guide on the wiki :
>> http://biopython.org/wiki/GitMigration
>
> I also have a draft for some documentation... I can contribute it later this
> morning (now I don't have time).

In the meantime, I have updated the following pages accordingly:

http://biopython.org/wiki/CVS
http://biopython.org/wiki/SVN
http://biopython.org/wiki/Subversion_migration
http://biopython.org/wiki/Git #place holder, will be important if we
do fully move to git
http://biopython.org/wiki/GitMigration #Fixing biopython to Biopython etc

Peter

> p.s. the biopython website seems to be offline at the moment...

All the OBF pages were out for a bit this morning (e.g. OBF helpdesk
#332), but they are back now.

From biopython at maubp.freeserve.co.uk  Mon Mar 16 07:30:12 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 11:30:12 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090315185443.GA30296@kunkel>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
Message-ID: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>

On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> It is good to see the discussion around revision control systems;
> Chris and Paulo's posts make some nice points. Source code
> management is an important issue that influences perception of
> Biopython and barriers to contributing.
>
> My two cents on what we should do is:
>
> - Pick a distributed source code management system. My preference
>   is Git, only because it currently has more steam behind it.
>   Git/Bazaar will likely end up being like the VHS/Beta debate.

I would agree git has more mind share, but I have no technical reason
to choose one over the other.

In terms of read only access, having a mirrored trunk branch on both
git (e.g. github) and bazaar (e.g. launchpad) is possible for
evaluation purposes.

> - Test drive use of Git on an official GitHub repository. This would
>   involve a few things ...

Giovanni has shared the github "Biopython" user information so we
(i.e. Biopython) can use that for any official presence on github -
which is great.  Bartek and Giovanni seem to have this working OK.

I think having the latest CVS trunk in Launchpad automatically is
stalled because they (launchpad) can't cope with a simple
username/password for accessing a remote CVS server.  Is that right
Bartek?

> - Evaluate the success of Git. This is easy to measure in terms of
>   new contributors, increased happiness, and what not. At the same
>   time we can monitor how GitHub evolves over time.

It may not be that easy to measure in practice...

> - If successful, talk to the OpenBio team about hosting Git locally.

I have contacted the OBF to ask who we should talk to about this idea
(given it will probably involve server access to install new software
and perhaps changing firewall/port settings).

> Peter, Michiel, et al -- how do you feel?

I'm happy in principle with a switch to git, ideally hosted on
biopython.org (see below).

> I think being cautious with the transition, as Peter recommends, is
> important. I am old enough to remember Sourceforge being new and
> everyone saying how it was stupid not to move there; then over time
> Sourceforge got slow with all the users and people moved
> away from it. This is just to say -- no one knows how GitHub (or
> Launchpad) will evolve. OpenBio is a stable, small, nice community
> and to the extent we can use their resources I believe we should.

I did have that same example in mind - having to depend on a third
party like GitHub, LaunchPad or Sourceforge is great until things go
wrong.  The Open Bio Foundation is much smaller, and while they don't
have 100% uptime either, they are normally very responsive to issues
because they only support a small number of projects.  Of course,
ideally we might have both - an OBF hosted (git) repository on
biopython.org, synced to github for people to enjoy its collaborative
additions.

> Overall, the specifics of the above proposal aren't as important as
> just doing something unambiguous and then evaluating how it works.
> Right now things are a bit confusing, which I think could put off
> new developers, who are always welcome.
>
> Looking forward to talking about code instead of revision control,

That would be nice :)

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 16 08:16:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 12:16:06 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
Message-ID: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>

Hi All,

I think we should probably do another release soon - for one thing the
NCBI updated their DTD files, and it would be great if Biopython
shipped with them included (see discussion on Bug 2678).

We still need to work on the documentation for
Bio.Graphics.GenomeDiagram (Bug 2671) and Bio.Motif (Bug 2694), but in
the meantime I think it would be sensible to do a Biopython 1.50 beta
release in the next couple of weeks.

I'd like to include the following changes as part of the beta, but it
would be sensible to have someone else try these out first.  Any
volunteers?

Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g.
align[1:2,5:-5]
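
For anyone wanting a feel for what Bug 2551 asks for before trying it, here
is a toy sketch of two-dimensional __getitem__, assuming an alignment can be
modelled as a list of equal-length strings. ToyAlignment is a hypothetical
stand-in written for this illustration; it is not Biopython's alignment
class and says nothing about how the real change is implemented.

```python
class ToyAlignment:
    """Hypothetical alignment: a list of equal-length strings (rows)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        # align[rows, columns] arrives as a tuple; align[rows] does not.
        if isinstance(index, tuple):
            row_index, col_index = index
        else:
            row_index, col_index = index, slice(None)
        if isinstance(row_index, int):
            # A single row: return a string (or one letter for an int column).
            return self.rows[row_index][col_index]
        selected = [row[col_index] for row in self.rows[row_index]]
        if isinstance(col_index, int):
            # Several rows, one column: return the column as a string.
            return "".join(selected)
        # Several rows, several columns: return a smaller alignment.
        return ToyAlignment(selected)


align = ToyAlignment(["ACGTACGTACGT", "ACGTTCGTACGA", "ACGAACGTACGT"])
sub = align[1:2, 5:-5]  # second row only, trimmed by five letters at each end
print(sub.rows)
print(align[:, 0])      # first column of every row
```

Here the first index selects rows and the second selects columns, so the
example from the bug title picks one row and trims five letters from each
end.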

Any other nominations for Biopython 1.50?

I'd also like to resolve Bug 2597 (Enforce alphabet letters in Seq
objects), but that might deserve an alpha release given the higher
chance of breaking existing scripts...

Peter

From biopython at maubp.freeserve.co.uk  Mon Mar 16 09:18:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 13:18:19 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
Message-ID: <320fb6e00903160618g2b5b6acs6695fab5ef432bc7@mail.gmail.com>

Hi all,

I'm thinking a news post on
http://news.open-bio.org/news/category/obf-projects/biopython/ about
version control would be a good idea at this point.  How about this -
keywords like git, subversion and the other project names would be
links:

<start draft>
Title: Biopython and version control systems

Originally, all the OBF hosted projects used CVS for their source code
repositories.  At the start of 2008, BioPerl and BioJava moved over to
Subversion (SVN), followed by BioSQL.  Biopython was originally going
to do the same, but this didn't actually happen.  Having all the Bio*
projects using the same version control system would have simplified
server administration for the OBF, but wouldn't have actually made
that much difference to Biopython development.  Discussion has since
shifted towards next generation distributed version control systems
like git or bazaar.

Quote from Linus Torvalds,
<italics>
The slogan of Subversion for a while was "CVS done right", or
something like that, and if you start with that kind of slogan,
there's nowhere you can go. There is no way to do CVS right
</italics>

In addition to creating the Linux kernel, Linus Torvalds more recently
wrote git, a prominent example of a distributed version control
system.  Rather than switching from CVS to SVN, the BioRuby project
chose instead to use git, hosted on github.  Biopython is considering
doing something similar - using a distributed version control system
like git should make it easier for potential Biopython contributors to
manage their own local copies of Biopython under version control.

Initially for evaluation purposes only, Giovanni and Bartek have set up
a Biopython branch on GitHub, which will automatically be updated from
the OBF hosted Biopython CVS repository [Link to wiki page].  If this
is favorably received, then moving Biopython from CVS to git seems
likely at some point this year.

Peter on behalf of the Biopython developers
<end draft>

I hope this has everyone's approval... if not please reply here so we
can revise this before it gets posted.  Note that I've avoided getting
into specifics here, such as hosting arrangements, as the details will
go out of date.

Peter


From bartek at rezolwenta.eu.org  Mon Mar 16 10:24:42 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 15:24:42 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
Message-ID: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>

On Mon, Mar 16, 2009 at 12:30 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> - Pick a distributed source code management system. My preference
>>   is Git, only because it currently has more steam behind it.
>>   Git/Bazaar will likely end up being like the VHS/Beta debate.
>
> I would agree git has more mind share, but I have no technical reason
> to choose one over the other.
>
> In terms of read only access, having a mirrored trunk branch on both
> git (e.g. github) and bazaar (e.g. launchpad) is possible for
> evaluation purposes.

It is possible, but I don't know if we should do this. To some extent
having too much choice might be problematic...

We've done some tests on both bzr and git and it seems that both can
do the job for us. I assume that the purpose of "test-driving" instead
of directly switching to git is to give us the possibility to go back
in case things go really badly. But I don't think that's a likely
event. Bigger projects are using git (or bzr) and doing fine, so we
shouldn't have problems either.

On the other hand, I don't expect that having the possibility to
test-drive two options is going to make the decision any easier. I
don't expect too many people to try both options, and even if that
happens I don't think there will be a clear acclamation that one is
better than the other. Honestly, we can't expect that all developers
will learn two tools just to help us choose... Even though I was myself
one of the proponents of switching to bzr, I think that we should focus
on one option, and git seems to be the one with the bigger mind share
among biopythonistas. So I would vote for dropping the discussion on bzr
and focusing on making sure that no one is left behind with their
problems during the (possibly not too long) transition to git.


>
>> - Test drive use of Git on an official GitHub repository. This would
>>   involve a few things ...
>
> Giovanni has shared the github "Biopython" user information so we
> (i.e. Biopython) can use that for any official presence on github -
> which is great.  Bartek and Giovanni seem to have this working OK.
>
> I think having the latest CVS trunk in Launchpad automatically is
> stalled because they (launchpad) can't cope with a simple
> username/password for accessing a remote CVS server.  Is that right
> Bartek?
>

Yes, we now have the biopython branch on github synchronized with CVS
on an hourly basis. There is no problem with synchronizing a branch on
launchpad in the same script, but I didn't do it for the reasons
explained above.

>> - Evaluate the success of Git. This is easy to measure in terms of
>>   new contributors, increased happiness, and what not. At the same
>>   time we can monitor how GitHub evolves over time.
>
> It may not be that easy to measure in practice...
>
Well, if everyone is able to use git, I'd say it's a success. We don't
need a perfect solution. We want to move to _a_ distributed version
control system.

> I did have that same example in mind - having to depend on a third
> party like GitHub, LaunchPad or Sourceforge is great until things go
> wrong.  The Open Bio Foundation is much smaller, and while they don't
> have 100% uptime either, they are normally very responsive to issues
> because they only support a small number of projects.  Of course,
> ideally we might have both - an OBF hosted (git) repository on
> biopython.org, synced to github for people to enjoy its collaborative
> additions.
>

There is one difference between moving to sourceforge and moving to git.
With git, it is much less of a problem to switch hosting. The fundamental
idea is that every branch (including all local developer branches) can be
a "master" branch. So switching to a different hosting location is a
matter of an e-mail on the developer mailing list telling people to
update the location of the "master" in their branches. So I think that
we need to worry less about git hosting than we would need to worry
about cvs (or svn for that matter).
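
To illustrate the point about switching hosting: given a git recent
enough to have "git remote set-url", repointing an existing clone is a
single command. Below is a minimal Python sketch wrapping it. The helper
names, the remote name "origin" and the example URL are assumptions made
for illustration only.

```python
import subprocess


def set_url_command(new_url, remote="origin"):
    """Build the git command that repoints `remote` at `new_url`."""
    return ["git", "remote", "set-url", remote, new_url]


def switch_hosting(repo_dir, new_url, remote="origin"):
    """Run the command inside an existing clone; raises if git fails."""
    subprocess.run(set_url_command(new_url, remote), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical URL; print the command rather than touching a real clone.
    print(" ".join(set_url_command("git://example.org/biopython.git")))
```

Each developer would run this once in their own clone after the
announcement; local branches and history are untouched, only the place
that fetch and push talk to changes.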

cheers
  Bartek


From biopython at maubp.freeserve.co.uk  Mon Mar 16 11:00:16 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 15:00:16 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
Message-ID: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>

On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
>
> On Mon, Mar 16, 2009 at 12:30 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> - Pick a distributed source code management system. My preference
>>>   is Git, only because it currently has more steam behind it.
>>>   Git/Bazaar will likely end up being like the VHS/Beta debate.
>>
>> I would agree git has more mind share, but I have no technical reason
>> to choose one over the other.
>>
>> In terms of read only access, having a mirrored trunk branch on both
>> git (e.g. github) and bazaar (e.g. launchpad) is possible for
>> evaluation purposes.
>
> It is possible, but I don't know if we should do this. To some extent
> having too much choice might be problematic...

True.

> We've done some tests on both bzr and git and it seems that both
> can do the job for us. I assume, that the purpose of "test-driving"
> instead of directly switching to git is to give us a possibility to go
> back in case things go really bad. But I don't think it's a likely
> event. Bigger projects are using git (or bzr) and doing fine, so
> we shouldn't have problems either.

Well yes, having a fall back plan during this migration is essential.

I do think there is a separate need for "test driving" for those of us with
Biopython CVS access who don't have personal experience with git
(or github).  Making the switch before then would be a very bad idea.

I personally need to make time to play with git and github, doing a
couple of *real* branches and merges.  I hope to do so this week; some
of the changes I'd like to do for Biopython 1.50 would make good
candidates... but this is time that might otherwise be spent on bug
fixes, documentation etc.  And there is of course my real job too... ;)

Related to this, what OS and version of git are you (Bartek and Giovanni) using?

> On the other hand I don't expect that having the possibility to
> test-drive two options is going to make the decision any easier.
> I don't expect too many people to try both options and even if it
> happens I don't think there will be a clear acclamation that one
> is better than the other.

I agree.

> Honestly, we can't expect that all developers will learn two tools
> just to help us choose... Even though I was myself one of the
> proponents of switching to bzr.
> I think that we should focus on one option and git seems to be the one
> with bigger mind share among biopythonistas.
> So I would vote for dropping the discussion on bzr and focusing on
> making sure that no one is left behind with their
> problems during the (possibly not too long) transition to git.

I'm happy with dropping discussion on bzr, in favour of git.

(As an aside I always liked the term biopythoneers, but biopythonistas
is fun too.)

>> Giovanni has shared the github "Biopython" user information so we
>> (i.e. Biopython) can use that for any official presence on github -
>> which is great.  Bartek and Giovanni seem to have this working OK.
>>
>> I think having the latest CVS trunk in Launchpad automatically is
>> stalled because they (launchpad) can't cope with a simple
>> username/password for accessing a remote CVS server.  Is that right
>> Bartek?
>
> Yes, we have now the biopython branch on github synchronized with CVS
> on an hourly basis.
> There is no problem with synchronizing a branch on launchpad in the
> same script, but I didn't do it for reasons explained above.

OK.  Do you want to make sure your Launchpad branch is clearly labeled
as not current?

> Well, if everyone is able to use git I'd say it's a success. We
> don't need a perfect solution. We want to move to _a_ distributed
> version control system.

Well, I suspect there are some silent contributors who don't care
either way - it's not perfect, but CVS works well enough.  Better
the devil you know ... ;)

> ...
> There is one difference between moving to sourceforge and moving to git.
> With git, it is much less of a problem to switch hosting... So I think that we
> need to worry less about git hosting than we would need to worry about
> cvs (or svn for that matter).

That is another good reason to pick git.

Peter


From bartek at rezolwenta.eu.org  Mon Mar 16 12:55:40 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 17:55:40 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
Message-ID: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>

On Mon, Mar 16, 2009 at 4:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> I do think there is a separate need for "test driving" for those of us with
> Biopython CVS access who don't have personal experience with git
> (or github).  Making the switch before then would be a very bad idea.
>
> I personally need to make time to play with git and github, doing a
> couple of *real* branches and merges.  I hope to do so this week; some
> of the changes I'd like to do for Biopython 1.50 would make good
> candidates... but this is time that might otherwise be spent on bug
> fixes, documentation etc.  And there is of course my real job too... ;)
>
> Related to this, what OS and version of git are you (Bartek and Giovanni) using?
>
I'm currently using the binary installations on Mac (Intel) and Ubuntu
(8.10). I haven't experienced any problems, which is to be expected on
Unix-like systems. It would be interesting to hear about people's
experiences on Windows.

>
> OK.  Do you want to make sure your Launchpad branch is clearly labeled
> as not current?
>

I've removed the bzr branches from launchpad, so there should be no
more confusion.

cheers
Bartek


From nuin at genedrift.org  Mon Mar 16 12:58:26 2009
From: nuin at genedrift.org (Paulo Nuin)
Date: Mon, 16 Mar 2009 12:58:26 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>	<20090315185443.GA30296@kunkel>	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>
Message-ID: <49BE8532.9040701@genedrift.org>

No problem on Vista.

 Git (version 1.5.6.1-preview20080701)

Paulo




From biopython at maubp.freeserve.co.uk  Mon Mar 16 13:07:18 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 17:07:18 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <49BE8532.9040701@genedrift.org>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>
	<49BE8532.9040701@genedrift.org>
Message-ID: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com>

On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>
> No problem on Vista.
>
> Git (version 1.5.6.1-preview20080701)
>
> Paulo

Hi Paulo,

Could you be a bit more precise about the version you are using and
where you got it from? i.e. Are you using cygwin or the Windows native
port, http://code.google.com/p/msysgit/ ?

And did you mean in general you have no problems with git on Windows
Vista, or have you also tried fetching Biopython from github,
building, testing (and installing it)?  For example, are there any
newline issues from the unit tests?  This is one area where CVS and git
may differ slightly...

Thanks,

Peter

From dalloliogm at gmail.com  Mon Mar 16 15:57:38 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 16 Mar 2009 20:57:38 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
Message-ID: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>

On Mon, Mar 16, 2009 at 4:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>
> Related to this, what OS and version of git are you (Bartek and Giovanni)
> using?


I am using git 1.5.4.3 on an Ubuntu 8.04 distribution.
At home, I am using a git binary distribution on an Ubuntu 8.10.

At the moment I am having some strange problems, related to the fact that I
previously had a branch named 'biopython' in my account, so github does not
seem to understand that the old branch has been renamed.
For example, I don't have the 'Fork' button... but it must be a temporary
problem; I have already contacted github's tech support.




-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From bartek at rezolwenta.eu.org  Mon Mar 16 17:04:57 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 22:04:57 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>
Message-ID: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com>

Hi,
On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
> At the moment I am having some strange problems, related to the fact that I
> previously had a branch named 'biopython' in my account, so github does not
> seem to understand that the old branch has been renamed.
> For example, I don't have the 'Fork' button... but it must be a temporary
> problem; I have already contacted github's tech support.
>

This is connected with the change I made in the repository. Namely, I
renamed the branch created by Giovanni to biopython-old and created a
new one (the "official" one) called biopython again.

The "rename" feature was flagged as experimental, and there were
warnings that it can affect the branches forked from the branch
previously created by Giovanni. I don't expect we will need to use it
again.

These two branches were incompatible, since they were made with
different scripts (different revision numbers).
So if you need to retain some changes you made to the old branch,
please export them from your local copy as changesets and apply them
to a new fork made from the new repository.

I'm sorry for the inconvenience.

cheers
Bartek

From chapmanb at 50mail.com  Mon Mar 16 18:42:40 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 16 Mar 2009 18:42:40 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
Message-ID: <20090316224240.GA57054@sobchak.mgh.harvard.edu>

Hey everyone;
Wow, y'all are quick. Bartek, Giovanni and Peter -- thanks for all
the hard work and organization. Consolidating a couple of threads
below...

> >> I've written a short guide on the wiki :
> >> http://biopython.org/wiki/GitMigration
> >
> > I also have a draft for some documentation... I can contribute it later this
> > morning (now I don't have time).
> 
> In the meantime, I have updated the following pages accordingly:

The documentation looks awesome. My only suggestion would be to
change the navigation link that currently points to CVS to point to a
generic page like SourceCode. Then that landing page could link
to the current CVS and explain we are working to transition to 
Git, with links to those pages. Currently, the Git docs are a
bit buried from the front page.

Peter, I don't appear to have wiki permissions to edit the navigation
bar; do you?

Peter:
> I'm thinking a news post on
> http://news.open-bio.org/news/category/obf-projects/biopython/ about
> version control would be a good idea at this point.  How about this -

This is great, and I would move the last paragraph describing
the Git repository to the beginning; start with what we are doing and
then describe the rationale. This should help for those with ADD, and
also give more prominent credit to Bartek, Giovanni and you for the
work that went into this.

> > - Evaluate the success of Git. This is easy to measure in terms of
> >   new contributors, increased happiness, and what not. At the same
> >   time we can monitor how GitHub evolves over time.
> 
> It may not be that easy to measure in practice...

How about these two metrics:

- How do current developers like it? Beyond the initial learning
  curve, does it work at least as well as CVS for day to day stuff?

- Does it lower the entry barriers to contributing to Biopython? The
  main reason to do this is to ease the initial work for coders who
  feel CVS/Patches/Bugzilla is too much. If we find new contributors
  through this, it's a win.

Modest expectations are good. If either of these fail miserably, then 
we can re-evaluate.

Brad

From chapmanb at 50mail.com  Mon Mar 16 18:55:58 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 16 Mar 2009 18:55:58 -0400
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
Message-ID: <20090316225558.GC57054@sobchak.mgh.harvard.edu>

Peter;

> I think we should probably do another release soon 

Good call. +1 from me.

> I'd like to include the following changes as part of the beta, but it
> would be sensible to have someone else try these out first.  Any
> volunteers?
> 
> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files

The code for this looked good when I reviewed it earlier. I will
test it out with some solexa reads from here this week; any reason
not to check the patch and files into CVS? Then I can fire up my
coal-powered revision control system, feed two punch cards into the
mouth of the machine, hope the vacuum tubes don't burn out again,
and check it out locally.

Brad

From tiagoantao at gmail.com  Mon Mar 16 20:11:50 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 00:11:50 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>

I've been reading this thread and mainly staying silent but there is
one question that is not clear in my mind but I believe it is
important:

How is the "official" biopython trunk controlled? Currently what is on
CVS is the gospel and Peter and Michiel essentially have control of
what is there and what is labelled as a "biopython distribution". How
will this work now?
The second question, related to the first, is how different
branches (belonging to different people) will be managed. I can see people
starting to work on the same code in different directions and then
having problems merging everything together.

Maybe these questions stem from my ignorance of distributed version
control. But, if not, I think they should be resolved before
advancing.

My suggestion: write (or at least informally agree on) the policy before
advancing. While distributed version control seems a good idea (no
opposition), it also seems a good way to create new problems.

BTW, I would be tempted to suggest that a labelled release would be a
good starting point for a distributed revision control bootstrap.



-- 
 "A man who dares to waste one hour of time has not discovered the
value of life" - Charles Darwin


From dschruth at u.washington.edu  Mon Mar 16 19:15:39 2009
From: dschruth at u.washington.edu (David Schruth)
Date: Mon, 16 Mar 2009 16:15:39 -0700
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <20090316225558.GC57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
Message-ID: <49BEDD9B.6030905@u.washington.edu>

I've got some 454 and SOLiD data you could test it on too.

Has anybody else looked into how these other two Next Gen formats might 
complicate things?


From bugzilla-daemon at portal.open-bio.org  Mon Mar 16 20:40:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Mar 2009 20:40:01 -0400
Subject: [Biopython-dev] [Bug 2790] New: Genepop parser creates a full
	representation of the file on memory
Message-ID: <bug-2790-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2790

           Summary: Genepop parser creates a full representation of the file
                    on memory
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: PopGen
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: tiagoantao at gmail.com


The genepop parser creates a full representation of the file in memory.

This is fine for most users (e.g. with 100-200 individuals and 100 markers),
but more and more people now have thousands of individuals and/or
thousands of loci. In some cases the whole file doesn't fit in memory.

An alternative (iterator-based) interface has to be created which only
keeps a subset of the file in memory.
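
As a rough illustration of what an iterator-based interface could look like,
here is a minimal sketch. It assumes the usual Genepop layout (a title line,
the locus names, then "Pop" separators each followed by one individual per
line). The function names read_genepop_header and iter_genepop_individuals
are hypothetical and are not part of Bio.PopGen; this is only meant to show
the idea of keeping a single individual in memory at a time.

```python
def read_genepop_header(handle):
    """Consume the title line and locus names, stopping at the first 'Pop'."""
    title = next(handle, "").strip()
    loci = []
    for line in handle:
        line = line.strip()
        if line.lower() == "pop":
            break
        # Locus names may be one per line or comma-separated on one line.
        loci.extend(name.strip() for name in line.split(",") if name.strip())
    return title, loci


def iter_genepop_individuals(handle):
    """Yield (population_index, name, genotypes) for one individual at a time.

    Must be called after read_genepop_header() on the same handle, so that
    the first 'Pop' line has already been consumed.
    """
    pop_index = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.lower() == "pop":
            pop_index += 1
            continue
        # The last comma separates the individual's name from its genotypes.
        name, _, data = line.rpartition(",")
        yield pop_index, name.strip(), data.split()
```

A caller would read the header once and then loop over
iter_genepop_individuals(handle), so memory use depends on the longest line
rather than on the size of the file.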


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From idoerg at gmail.com  Mon Mar 16 20:49:39 2009
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 16 Mar 2009 17:49:39 -0700
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <49BEDD9B.6030905@u.washington.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
	<49BEDD9B.6030905@u.washington.edu>
Message-ID: <1237250979.20135.5.camel@lafa>

I have.

For one thing, GenBank has some new files that break the current parser.

LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007


This is a typical header for an environmental sequence (notice the ENV).
Note that this does not necessarily have to be a next-gen sequence. It
can also be Sanger. The point is, it's not genome associated, but
obtained using metagenomic methods.

To our business: the "rc" breaks the parser.
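
To make the problem concrete, here is a minimal sketch of a more tolerant
way to read that line: split it on whitespace rather than expecting "bp" or
"aa" after the length. The function parse_locus_line is hypothetical and is
not the existing Bio.GenBank scanner code; it just shows that the fields are
still recoverable when the unit is "rc".

```python
def parse_locus_line(line):
    """Split a GenBank LOCUS line on whitespace and return its fields."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)
    return {
        "name": fields[1],
        "size": int(fields[2]),
        # Usually "bp" or "aa"; "rc" in the environmental record above.
        "unit": fields[3],
        # Whatever follows, e.g. molecule type, topology, division, date.
        "rest": fields[4:],
    }


locus = parse_locus_line(
    "LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007"
)
```

For this record the size field (55108) matches the number of accessions in
the WGS range ABDH01000001-ABDH01055108, so a real fix would probably want
to keep the unit around instead of assuming the number is a sequence length.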


The file itself is attached. Note that in the end it does not have a
sequence, but rather a WGS field that points to sequence files.

I'll actually be happy to take this one.

./I


-- 
Iddo Friedberg, Ph.D.
CALIT2 Atkinson Hall MC #0446
University of California San Diego
9500 Gilman Drive
La Jolla, CA 92093-0446 USA
+1 (858) 534-0570
http://iddo-friedberg.org
-------------- next part --------------
LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007
DEFINITION  Termite gut metagenome, whole genome shotgun sequencing project.
ACCESSION   ABDH00000000
VERSION     ABDH00000000.1  GI:161074815
PROJECT     GenomeProject:19107
DBLINK      Project:19107
KEYWORDS    WGS.
SOURCE      termite gut metagenome
  ORGANISM  termite gut metagenome
            unclassified sequences; metagenomes; organismal metagenomes.
REFERENCE   1  (bases 1 to 55108)
  AUTHORS   Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
            Richardson,T.H., Stege,J.T., Cayouette,M., McHardy,A.C.,
            Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M.,
            Martin,H.G., Kunin,V., Dalevi,D., Madejska,J., Kirton,E., Platt,D.,
            Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N.C.,
            Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C.,
            Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.D., Chang,C.,
            Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and
            Leadbetter,J.R.
  TITLE     Metagenomic and functional analysis of hindgut microbiota of a
            wood-feeding higher termite
  JOURNAL   Nature 450 (7169), 560-565 (2007)
   PUBMED   18033299
REFERENCE   2  (bases 1 to 55108)
  AUTHORS   Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
            Richardson,T.H., Stege,J.T., Cayouette,M., Djordjevic,G.,
            Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Garcia Martin,H.,
            Kunin,V., McHardy,A.C., Dalevi,D., Madejska,J., Kirton,E.,
            Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N.,
            Kyrpides,N., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M.,
            Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.,
            Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P.
            and Leadbetter,J.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (27-JUN-2007) Microbial Ecology Program, US DOE Joint
            Genome Institute, 2800 Mitchell Drive B100, Walnut Creek, CA
            94598-1698, USA
COMMENT     The termite gut metagenome whole genome shotgun (WGS) project has
            the project accession ABDH00000000.  This version of the project
            (01) has the accession number ABDH01000000, and consists of
            sequences ABDH01000001-ABDH01055108.
            URL -- http://www.jgi.doe.gov
            JGI Project ID:4001605
            Contact: Philip Hugenholtz (PHugenholtz at lbl.gov)
            sampling site latitude: N10.11.260; sampling site longitude:
            W083.51.345; sampling site altitude: 310 m AMSL; sample type: lumen
            content; host species: Nasutitermes sp.; anatomic site: gut,
            proctodeal segment 3, lumen; association type: symbiosis; sample
            treatment and preservation: termites were collected, transported to
            laboratory alive within 36 hours, P3 gut lumen fluid was extracted
            and stored frozen in buffered saline solution until DNA extraction.
            The JGI and collaborators endorse the principles for the
            distribution and use of large scale sequencing data adopted by the
            larger genome sequencing community and urge users of this data to
            follow them. It is our intention to publish the work of this
            project in a timely fashion and we welcome collaborative
            interaction on the project and analysis.
            (http://www.genome.gov/page.cfm?pageID=10506376).
FEATURES             Location/Qualifiers
     source          1..55108
                     /organism="termite gut metagenome"
                     /mol_type="genomic DNA"
                     /isolation_source="Nasutitermes sp. proctodeal segment 3
                     gut lumen"
                     /db_xref="taxon:433724"
                     /environmental_sample
                     /country="Costa Rica"
                     /lat_lon="10.1877 N 83.8558 W"
                     /note="metagenomic"
WGS         ABDH01000001-ABDH01055108
//

From chris.lasher at gmail.com  Mon Mar 16 23:45:33 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Mon, 16 Mar 2009 23:45:33 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
Message-ID: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>

2009/3/16 Tiago Antão <tiagoantao at gmail.com>

> I've been reading this thread and mainly staying silent but there is
> one question that is not clear in my mind but I believe it is
> important:
>
> How is the "official" biopython trunk controlled? Currently what is on
> CVS is the gospel and Peter and Michiel essentially have control of
> what is there and what is labelled as a "biopython distribution". How
> will this work now?


In a distributed workflow, there is technically no official repository. The
"official repository" is socially enforced. Technically, there is no
official repository of the Linux kernel anymore. However, there is an
"official" version, which is Linus Torvalds's repository. It is socially
enforced. I think Michiel and Peter still head the Biopython project--at
least they have the most clout, I would say. Therefore, we will probably
look to one of their branches as the "official" branch of Biopython. When
one of them wants to step down in duty, we will socially pass the torch on
to the next taker.

See "6.3 Using gatekeepers" at
http://doc.bazaar-vcs.org/latest/en/user-guide/index.html#team-collaboration-distributed-style
See also
http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/


> The second question, related to the first is how will different
> branches (of different persons) be managed? I am seeing people
> starting working on the same code in different directions and then
> having problems merging everything together.


People are supposed to work in different directions; this is the point of
distributed workflows. Merging tends not to be so difficult, and compared to
centralized models like CVS and SVN, it's a cinch. We will help provide
documentation for proper merging habits (e.g., merge early, merge often, and
no rebasing after pushing, etc.). There are also screencasts popping up (in
particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
we will link to for educational purposes. And of course, other developers
will be around to help out in tricky merges.

Chris


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 00:11:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 00:11:34 -0400
Subject: [Biopython-dev] [Bug 2791] New: GenBank Scanner does not parse
	environmental (ENV) files
Message-ID: <bug-2791-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2791

           Summary: GenBank Scanner does not parse environmental (ENV) files
           Product: Biopython
           Version: 1.49
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: idoerg at gmail.com
                CC: idoerg at gmail.com


GenBank Scanner does not parse environmental (ENV) files. Breaks on the 'rc'
characters in the LOCUS lines.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 00:14:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 00:14:50 -0400
Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse
	environmental (ENV) files
In-Reply-To: <bug-2791-42@http.bugzilla.open-bio.org/>
Message-ID: <200903170414.n2H4Eoit008338@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2791


idoerg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 00:32:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 00:32:30 -0400
Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse
	environmental (ENV) files
In-Reply-To: <bug-2791-42@http.bugzilla.open-bio.org/>
Message-ID: <200903170432.n2H4WUQn009490@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2791


idoerg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|biopython-dev at biopython.org |idoerg at gmail.com
             Status|ASSIGNED                    |NEW


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Tue Mar 17 04:46:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 08:46:03 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
Message-ID: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>

On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
>
>> I've been reading this thread and mainly staying silent but there is
>> one question that is not clear in my mind but I believe it is
>> important:
>>
>> How is the "official" biopython trunk controlled? Currently what is on
>> CVS is the gospel and Peter and Michiel essentially have control of
>> what is there and what is labelled as a "biopython distribution". How
>> will this work now?
>
> In a distributed workflow, there is no technical official repository. The
> "official repository" is socially enforced. Technically, there is no
> official repository of the Linux kernel anymore. However, there is an
> "official" version, which is Linus Torvalds's repository. It is socially
> enforced. I think Michiel and Peter still head the Biopython project--at
> least they have the most clout, I would say. Therefore, we will probably
> look to one of their branches as the "official" branch of Biopython. When
> one of them wants to step down in duty, we will socially pass the torch on
> to the next taker.

I think it is essential we have a clearly labeled official trunk
(perhaps with branches for releases), which will be used for all the
official releases (tar balls, zip files and windows installers).  Our
main webpage should make this very clear.

We could potentially continue to have a shared official branch (e.g.
belonging to the generic github biopython user), and give all the
existing CVS contributors write access - and continue to manage this
as before.  So for example, if Frank wanted to check in some minor
changes to Bio.Nexus he could just do it.  Future contributors
patches/branches might get taken up by a developer on a personal
branch for testing, before being merged into the official branch.

i.e. We can initially continue as before - right now I don't have a
feel for how much work the role of an official branch maintainer would
be, and it is difficult to guess without more hands on experience
using the new tools.

>> The second question, related to the first is how will different
>> branches (of different persons) be managed? I am seeing people
>> starting working on the same code in different directions and then
>> having problems merging everything together.
>
> People are supposed to work in different directions; this is the point of
> distributed workflows. Merging tends not to be so difficult, and compared to
> centralized models like CVS and SVN, it's a cinch. We will help provide
> documentation for proper merging habits (e.g., merge early, merge often, and
> no rebasing after pushing, etc.). There are also screencasts popping up (in
> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
> we will link to for educational purposes. And of course, other developers
> will be around to help out in tricky merges.

Well, yes, in theory we have the same problem now with CVS - and while
the tools may make merging easier, some communication is essential
when working on the key modules which impact large parts of the code
base.

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 17 04:58:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 08:58:00 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903170158o757a4fc4naae80f83850d6093@mail.gmail.com>

>
> The documentation looks awesome. My only suggestion would be to
> change the navigation link that currently points to CVS to point to a
> generic page like SourceCode. Then that landing page could link
> to the current CVS and explain we are working to transition to
> Git, with links to those pages. Currently, the Git docs are a
> bit buried from the front page.
>
> Peter, I don't appear to have wiki permissions to edit the navigation
> bar; do you?

I'm not sure how to do it (although I probably have the relevant
permissions).  I can probably give you admin rights - you use the
"Chapmanb" username on the wiki, right?

> Peter:
>> I'm thinking a news post on
>> http://news.open-bio.org/news/category/obf-projects/biopython/ about
> version control would be a good idea at this point.  How about this -
>
> This is great, and I would move the last paragraph describing
> the Git repository to the beginning; start with what we are doing and
> then describe the rationale. This should help for those with ADD, and
> also give more prominent credit to Bartek, Giovanni and you for the
> work that went into this.

OK.  New version, with the markup for the links included:

Initially for evaluation purposes only, Giovanni and Bartek have set up
a mirror of <a href="http://github.com/biopython/biopython/tree/master">Biopython
on GitHub</a>, which is automatically updated from the OBF hosted <a
href="http://www.biopython.org/wiki/CVS">Biopython CVS repository</a>.
 See our <a href="http://biopython.org/wiki/GitMigration">git
migration wiki page</a> for details.  If this is favorably received,
then moving Biopython from CVS to git seems likely at some point this
year.

Originally, all the OBF hosted projects used <a
href="http://www.nongnu.org/cvs/">CVS</a> for their source code
repositories.  At the start of 2008, <a
href="http://www.bioperl.org">BioPerl</a> and <a
href="http://www.biojava.org">BioJava</a> moved over to <a
href="http://subversion.tigris.org/">Subversion (SVN)</a>, followed by
<a href="http://www.biosql.org">BioSQL</a>.  <a
href="http://www.biopython.org">Biopython</a> was originally going to
do the same, but this didn't actually happen.  Having all the Bio*
projects using the same version control system would have simplified
server administration for the OBF, but using SVN wouldn't really have
made that much difference to Biopython development.  Discussion on the
<a href="http://biopython.org/pipermail/biopython-dev/">Biopython
development mailing list</a> has since shifted towards next-generation
distributed version control systems like <a
href="http://git-scm.com/">git</a> or <a
href="http://bazaar-vcs.org/">Bazaar</a>.

Quote from Linus Torvalds,
<blockquote>The slogan of Subversion for a while was "CVS done right",
or something like that, and if you start with that kind of slogan,
there's nowhere you can go. There is no way to do CVS
right.</blockquote>

In addition to creating the Linux kernel, Linus Torvalds more recently
wrote <a href="http://git-scm.com/">git</a>, a prominent example of a
distributed version control system.  Rather than switching from CVS to
SVN, the <a href="http://www.bioruby.org">BioRuby</a> project chose
instead to use git, hosted on <a href="http://github.com">github</a>
(see the <a href="http://github.com/bioruby/bioruby/tree/master">BioRuby
repository</a>).  Biopython is considering doing something similar -
using a <em>distributed</em> version control system like git should
make it easier for potential Biopython contributors to manage their
own local copies of Biopython under version control.

Peter, on behalf of the Biopython developers


From biopython at maubp.freeserve.co.uk  Tue Mar 17 05:06:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 09:06:31 +0000
Subject: [Biopython-dev] history on github - where are the tags?
Message-ID: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>

Hi Bartek et al,

I've just been looking over the github mirror of CVS, and wanted to
see how it presented the history of individual files.  For example, this
page looks at the Bio/SeqRecord.py history using ViewCVS:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython

For comparison, in GitHub,
http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py

As you can see, all the comments and changes are there - which is
great.  But I can't see the CVS tag information, which I assume would
be converted into git tags.  Is this information present in the git
repository, but not shown by github, or was it lost during the
migration?  This might seem like a little thing, but I have found it
incredibly important for tracing bugs reported in older releases, for
example in narrowing down when something changed.

Peter

From biopython at maubp.freeserve.co.uk  Tue Mar 17 05:41:22 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 09:41:22 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>
	<8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com>
Message-ID: <320fb6e00903170241i5b4a122ax1f33ff18450771df@mail.gmail.com>

On Mon, Mar 16, 2009 at 9:04 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hi,
> On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>>
>> At the moment I am having some strange problems, relative to the fact that I
>> had a branch previously named as 'biopython' in my account, so it seems
>> don't understand well the fact that the old branch has been renamed.
>> For example, I don't have the 'Fork' button.... but it must be a temporary
>> problem, I already contacted the github's tech support.
>
> This is connected with the change I made in the repository. Namely I
> renamed the branch created by Giovanni to biopython-old and created
> a new one (the "official" one) called biopython again.
>
> The "rename" feature was flagged as experimental, and I don't think we
> would expect to use it anymore, and there were warnings that it can affect
> the branches forked from the branch previously created by Giovanni.

We may need to do another rename, if we have to repeat the CVS to git
migration.  For example, see my other email about the CVS tags
(missing?).  Another potential question is can you re-map the CVS
usernames as part of the migration?  e.g. Can you somehow replace CVS
users "bartek", "peterc", ... with github users "barwil", "peterjc",
...?  Not essential, but it would be nice.

I would suggest as a precaution we rename it sooner rather than later
(while only a few people will be inconvenienced), going from biopython
to biopython-cvs-mirror (or similar).  If this does end up being the
actual trunk branch, we can just fork it under a new branch name like
"biopython" or "biopython-official" etc.
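
The username re-mapping asked about above could be sketched roughly as
follows.  This is only an illustration, assuming the conversion tool
accepts an author map with one "cvsuser=Full Name <email>" entry per
line (the style used by git cvsimport's -A option); whether the mirror
script supports this is not known here.  The e-mail addresses are
placeholders, and parse_author_map / remap_author are hypothetical
helper names, not part of any existing tool.

```python
import re

# Hypothetical author map: CVS username -> git identity.
# The example.org addresses are placeholders, not real contact details.
AUTHOR_MAP_TEXT = """\
bartek=Bartek Wilczynski <barwil@example.org>
peterc=Peter <peterjc@example.org>
"""

# One entry per line: cvsuser=Full Name <email>
_LINE = re.compile(
    r"^(?P<user>[^=\s]+)\s*=\s*(?P<name>.*?)\s*<(?P<email>[^>]+)>\s*$"
)


def parse_author_map(text):
    """Parse 'cvsuser=Name <email>' lines into {cvsuser: (name, email)}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = _LINE.match(line)
        if match is None:
            raise ValueError("unrecognised author line: %r" % line)
        mapping[match.group("user")] = (match.group("name"),
                                        match.group("email"))
    return mapping


def remap_author(cvs_user, mapping):
    """Return 'Name <email>' for a known CVS user, else the name unchanged."""
    if cvs_user in mapping:
        return "%s <%s>" % mapping[cvs_user]
    return cvs_user


if __name__ == "__main__":
    authors = parse_author_map(AUTHOR_MAP_TEXT)
    for user in ("bartek", "peterc", "someone_else"):
        print(user, "->", remap_author(user, authors))
```

Unknown usernames are passed through unchanged, so commits from
contributors missing from the map would keep their CVS names rather
than being dropped.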

Peter

From lpritc at scri.ac.uk  Tue Mar 17 05:59:32 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 09:59:32 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
Message-ID: <C5E52504.1F20A%lpritc@scri.ac.uk>

Hi all,

This has been an occasionally frustrating thread to read...

On 17/03/2009 08:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
>> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
>> 

>>> How is the "official" biopython trunk controlled? Currently what is on
>>> CVS is the gospel and Peter and Michiel essentially have control of
>>> what is there and what is labelled as a "biopython distribution". How
>>> will this work now?
 
>> In a distributed workflow, there is no technical official repository. The
>> "official repository" is socially enforced.

That was true before.  Unless I misread the Biopython licencing, there was
no real barrier to putting a branched copy of the code on your own
server/site, with your own modifications.  What git does is provide tools to
make merging of that sort of code easier (along with a number of other
nice features, such as authentication of contributions).  The presence of
git does not ensure that your changes, or anyone else's, will be merged with
any other repository, and nor does it ensure the quality of contributed
code.  Git, while nice, and ideal for a number of tasks, is no magic bullet.

To an extent, the 'official' repository is, pragmatically, the one that is
most stable and well-tested.  If my hypothetical branched version had become
more stable and widely-used than the 'official' trunk, and become the most
frequently downloaded and implemented, and received new contributions in its
own right, it might then be considered de facto 'the distribution'; nasty
online spats with the original authors notwithstanding.  The 'social
enforcement' of politeness (i.e. *I* don't take credit for *your* work)
prevents this to an extent, as it ought to under any versioning system.

There's a competing tendency to consider that the coders who spent the most
time creating the code understand it the best, and are in the best position
to maintain it directly.  This is true to a large degree, and entirely
applicable to Biopython's contributed modules.  git can potentially
facilitate that sort of contribution to the 'official' trunk in a way that
CVS can't, due to its permissions bottleneck.  However, the mechanics of
incorporating that contributed code are more or less the same: the people
with control of the 'official' trunk review the code and decide whether to
include it.  This is true whether the code is submitted as a patch to
Bugzilla, emailed to a developer, put up on public CVS on your site, or in a
forked git repository.  The same is true of your own git repository - you
don't have to include someone else's forked code if you don't want to.

What possibly needs to change is not the version control system, but the way
in which people think about their contribution.  Contributions can be made
productively under any versioning system, and the key questions remain the
same in all cases: Does the new code work (are there tests)? Does the new
code break any old code?  Is there documentation?  Is the API consistent?

"What version control system are we using?" is a minor detail, unless it is
inherently broken, hinders any of the above, or causes some other
deal-breaking issue (for Linus Torvalds, this included speed issues for
merges).

>> I think Michiel and Peter still head the Biopython project--at
>> least they have the most clout, I would say. Therefore, we will probably
>> look to one of their branches as the "official" branch of Biopython. When
>> one of them wants to step down in duty, we will socially pass the torch on
>> to the next taker.

It has always been thus.  Now, instead of passing on the user authentication
to the CVS server at OBF, the user authentication to the biopython github
account will be passed on, instead:

> I think it is essential we have a clearly labeled official trunk
> (perhaps with branches for releases), which will be used for all the
> official releases (tar balls, zip files and windows installers).  Our
> main webpage should make this very clear.
> 
> We could potentially continue to have a shared official branch (e.g.
> belonging to the generic github biopython user), and give all the
> existing CVS contributors write access - and continue to manage this
> as before.  So for example, if Frank wanted to check in some minor
> changes to Bio.Nexus he could just do it.  Future contributors
> patches/branches might get taken up by a developer on a personal
> branch for testing, before being merged into the official branch.
> 
> i.e. We can initially continue as before - right now I don't have a
> feel for how much work the role of an official branch maintainer would
> be, and it is difficult to guess without more hands on experience
> using the new tools.
 
Plus ca change (avec git)...

>>> The second question, related to the first is how will different
>>> branches (of different persons) be managed? I am seeing people
>>> starting working on the same code in different directions and then
>>> having problems merging everything together.
>> 
>> People are supposed to work in different directions; this is the point of
>> distributed workflows.

I may have a different understanding of 'different directions' than you
mean, but I don't think that it's good for a community project if people
work in different directions.  I also don't think that that is the point of
distributed workflows; on the contrary, I think that they are intended to
make it easier to work independently towards a common goal.  Even if that is
by working on loosely- or non-interacting parts of the whole.

>> Merging tends not to be so difficult, and compared to
>> centralized models like CVS and SVN, it's a cinch. We will help provide
>> documentation for proper merging habits (e.g., merge early, merge often, and
>> no rebasing after pushing, etc.). There are also screencasts popping up (in
>> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
>> we will link to for educational purposes.
>> And of course, other developers will be around to help out in tricky merges.

This characterises one of the frustrating aspects of this thread (not
getting at you personally, Chris) - the occasional implicit assumption that
'things will be inherently *better* if we use git'.  Developers are around
to help now, even using CVS (which also has clear, long-standing stable
documentation - and even an O'Reilly book).  Several people don't seem to
think that that - and the way that code is reviewed and incorporated into
the main distribution - is good enough, and I don't think that this will
change just because the version control system has changed.  Nor will
changing revision control system generate significant free time to write,
test and document code.  But we may have the recession to do that last one
for us.

> Well, yes, in theory we have the same problem now with CVS - and while
> the tools may make merging easier, some communication is essential
> when working on the key modules which impact large parts of the code
> base.

I would put it more strongly than that: communication is essential in all
aspects of the project.  A number of related blog posts make statements
along the lines of "I don't use Biopython, or post to the mailing lists, but
I think that they're doing *this* wrong", or "I submitted code, but it
didn't get taken up immediately".  Now, venting and ranting on a blog is
fine, but it's not really *communicating*, any more than it was when I
thought that the BioSQL GenBank upload code was broken, fixed it (for my
purposes) and told no-one.  Git won't change the communication issue (in
either direction) any more than it changes the code review process.

FWIW, I think that git looks like a good way to go, and that it could help
encourage people to make local modifications of Biopython for their own
benefit and in their own interests and expert area, in a way that is visible
to the core distribution (unlike the patch submission process that is now
implemented).  In that way it could facilitate more rapid expansion of the
core distribution.  However, the bottlenecks of ensuring code quality,
testing and documentation will only ease if that is taken up by the
individuals/groups making those contributions, in addition to the core
developers.

And yes, I know I'm late with the new GenomeDiagram docs... ;)

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that
addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on
this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________


From bartek at rezolwenta.eu.org  Tue Mar 17 06:06:33 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 17 Mar 2009 11:06:33 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com>
Message-ID: <8b34ec180903170306ocf4b9e7s6d34cacdfb7e423b@mail.gmail.com>

Hi,

I'll look into this. I'm now heading for a plane, so I can't do it now.

cheers
 Bartek

On Tue, Mar 17, 2009 at 11:02 AM, Bartek Wilczynski <barwil at gmail.com> wrote:
> Hi,
>
> I'll look into this. I'm now heading for a plane, so I can't do it now.
>
> cheers
>  Bartek
>
> On Tue, Mar 17, 2009 at 10:06 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi Bartek et al,
>>
>> I've just been looking over the github mirror of CVS, and wanted to
>> see how it presented the history of individual files.  For example, this
>> page looks at the Bio/SeqRecord.py history using ViewCVS:
>> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython
>>
>> For comparison, in GitHub,
>> http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py
>>
>> As you can see, all the comments and changes are there - which is
>> great.  But I can't see the CVS tag information, which I assume would
>> be converted into git tags.  Is this information present in the git
>> repository, but not shown by github, or was it lost during the
>> migration?  This might seem like a little thing, but I have found it
>> incredibly important for tracing bugs reported in older releases, for
>> example in narrowing down when something changed.
>>
>> Peter
>>
>
>
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
>


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From biopython at maubp.freeserve.co.uk  Tue Mar 17 06:17:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:17:25 +0000
Subject: [Biopython-dev] gitignore file for github
Message-ID: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>

Hi all,

I think we should add a .gitignore file to the github mirror copy
repository, which should ignore:

* the build subdirectory and all its contents
* all *.pyc files (recursively, e.g. for the unit tests)
* all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)

Is there anything else this should include?  There are a few output
files created by the unit tests that we might want to include...

Otherwise all these files show up as "unstaged" to use git's
terminology, and there is a risk of someone accidentally committing
them.

Peter

From biopython at maubp.freeserve.co.uk  Tue Mar 17 06:57:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:57:37 +0000
Subject: [Biopython-dev] gitignore file for github
In-Reply-To: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
References: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
Message-ID: <320fb6e00903170357s14a20each59f50f5e155298b0@mail.gmail.com>

On Tue, Mar 17, 2009 at 10:17 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> I think we should add a .gitignore file to the github mirror copy
> repository, which should ignore:
>
> * the build subdirectory and all its contents
> * all *.pyc files (recursively, e.g. for the unit tests)
> * all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)
>
> Is there anything else this should include?  There are a few output
> files created by the unit tests that we might want to include...

This seems to work pretty well:

#Ignore the build directory (and its sub-directories):
build
#Ignore backup files from some Unix editors,
*~
#Ignore all compiled python files (e.g. from running the unit tests):
*.pyc
#The graphics unit tests produce output files for human inspection
#(at the time of writing, only PDF files are created but I expect
#this to change).
Tests/Graphics/*.pdf
Tests/Graphics/*.eps
Tests/Graphics/*.svg
Tests/Graphics/*.png

I've uploaded this as part of one of my test branches on github,
http://github.com/peterjc/biopython-seqio-quality/tree/master
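
As a rough check of what the patterns above cover, here is a minimal
Python sketch.  It only approximates git's ignore rules with fnmatch
(slash-free patterns are tried against every path component, patterns
containing a slash are matched component by component from the
repository root), so treat it as an illustration rather than a
replacement for git itself.  The is_ignored helper and the example
paths are made up for this sketch.

```python
import fnmatch

# The patterns from the proposed .gitignore (comment lines left out).
PATTERNS = [
    "build",
    "*~",
    "*.pyc",
    "Tests/Graphics/*.pdf",
    "Tests/Graphics/*.eps",
    "Tests/Graphics/*.svg",
    "Tests/Graphics/*.png",
]


def is_ignored(path, patterns=PATTERNS):
    """Approximate gitignore matching for a repository-relative path."""
    parts = path.split("/")
    for pattern in patterns:
        if "/" in pattern:
            # Pattern with a slash: anchored at the repository root,
            # and '*' is not allowed to match across a '/'.
            pat_parts = pattern.split("/")
            if len(pat_parts) == len(parts) and all(
                fnmatch.fnmatchcase(part, pat)
                for part, pat in zip(parts, pat_parts)
            ):
                return True
        else:
            # Slash-free pattern: matches a file or directory name at
            # any depth; a matching directory hides everything below it.
            if any(fnmatch.fnmatchcase(part, pattern) for part in parts):
                return True
    return False


if __name__ == "__main__":
    for example in (
        "build/lib/Bio/Seq.py",
        "Tests/test_Seq.pyc",
        "Tests/Graphics/out.pdf",
        "Doc/Tutorial.pdf",
        "Bio/Seq.py",
    ):
        print(example, "ignored" if is_ignored(example) else "tracked")
```

Note that the list as posted does not yet cover the LaTeX temporary
files under Doc mentioned earlier in this thread.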

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 06:59:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 06:59:22 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903171059.n2HAxMms006144@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-17 06:59 EST -------
I've made these changes available on a test github branch,
http://github.com/peterjc/biopython-seqio-quality/tree/master

This doesn't include all the example files for the unit tests yet.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com  Tue Mar 17 07:18:52 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 11:18:52 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
	<320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
Message-ID: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>

On Tue, Mar 17, 2009 at 8:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I think it is essential we have a clearly labeled official trunk
> (perhaps with branches for releases), which will be used for all the
> official releases (tar balls, zip files and windows installers).  Our
> main webpage should make this very clear.

I agree.

I would like to take this opportunity to make my opinion clear (I
normally tend to list hypotheses and refrain from giving my own opinions).

1. I don't think there is a pressing need to go from CVS to whatever.
While CVS is not perfect, I don't think it has been a big hurdle. But
if people want to go in that direction, I have no strong feelings
against it either.
2. The hurdle was that _policy_ was too conservative: some time ago it
was not acceptable even to consider a development branch. That
stifles things (although it ensures stability, which is good).
Fortunately things are more negotiable now. The point is: the main
issues are policy, not technology.
3. Like it or not, different mechanisms (i.e. centralized versus
distributed VCSs) facilitate different policies. Distributed version
control facilitates branching to a massive degree.
4. I think a middle ground is a good idea: while there is an official
distribution (e.g. the one that is labelled biopython 1.50 and that
will end up on most users' computers) which is aggressively controlled,
there should be space for people to try out new things.
5. People who try out new things should be aware (to avoid
disappointment) that their new code might not be accepted on the
official trunk, for many reasons: not enough documentation, no test
cases, design not acceptable, poorly-commented code, whatever. It
would be very sad if people started working on something and spent
lots of time on their branch, only to see their code refused on
the "official" trunk.

So, in my view things work like this:
A. The "official" version on biopython.org is controlled by a "head
honcho", currently Peter with input from biopython-dev. This is the
version that most users will ever see in practice.
B. The official version has a lot of quality enforcement on top.
C. People should be free to branch away and try new things.
D. People that branch away should be aware that their stuff might not
be accepted on the official distribution. If they want it accepted
they should come to biopython-dev and have a cup of tea with the
community.
E. Maybe some contact points should be defined for modules?
F. People who want their code included in the "official" distribution
should seriously think about branching from the "official" branch and
not from any other.

I would really like to see an "official" git branch, which should be
created, in my opinion, from a stable release and by either Peter or
Michiel (or any other long-term CVS-write user). In my case I would
branch to maintain some of the PopGen code.


Tiago


From lpritc at scri.ac.uk  Tue Mar 17 08:19:28 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 12:19:28 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID: <C5E545D0.1F230%lpritc@scri.ac.uk>

On 17/03/2009 11:18, "Tiago Antão" <tiagoantao at gmail.com> wrote:

> On Tue, Mar 17, 2009 at 8:46 AM, Peter <biopython at maubp.freeserve.co.uk>
> wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers).  Our
>> main webpage should make this very clear.
> 
> I agree.
> 
> I would like to take this opportunity to make my opinion clear (I
> normally tend to list hypotheses and refrain from giving my own opinions).

[...]

+1 for Tiago's opinion.

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that
addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on
this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________


From biopython at maubp.freeserve.co.uk  Tue Mar 17 08:44:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 12:44:05 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
	<320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>

2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
> On Tue, Mar 17, 2009 at 8:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers).  Our
>> main webpage should make this very clear.
>
> I agree.
>
> I would like to take this opportunity to make my opinion clear (I
> normally tend to list hypotheses and refrain from giving my own opinions).
>
> 1. I don't think there is a pressing need to go from CVS to whatever.
> While CVS is not perfect, I don't think it has been a big hurdle. But
> if people want to go in that direction, I have no strong feelings
> against it either.

On a purely pragmatic level, yes, CVS has been enough.  This is one
real reason why there hasn't been a great deal of pressure on us to
move - it wasn't "broken" for how Biopython worked, although it does
make branching non-trivial.  Moving from CVS to a distributed version
control system (DVCS) won't make much difference for those of us with
CVS access - the big benefit as I see it is for potential contributors
who can easily make a branch to try out their ideas, and keep it in
sync with the master branch.  This could transform how new modules or
bug fixes get contributed, hopefully for the better.

> 2. The hurdle was that _policy_ was too conservative: some time ago it
> was not acceptable even to consider a development branch. That
> stifles things (although it ensures stability, which is good).
> Fortunately things are more negotiable now. The point is: the main
> issues are policy, not technology.

Historically Biopython has worked from a single stable branch (Brad -
can you comment about the history of this effective policy?).  I
recall saying something in the last year or so about not wanting to do
any branching in CVS while the SVN migration seemed imminent, but this
was primarily to avoid any complication in the migration itself,
rather than any deep objection to branches themselves.

> 3. Like it or not, different mechanisms (i.e. centralized versus
> distributed VCSs) facilitate different policies. Distributed version
> control facilitates branching to a massive degree.

True.

> 4. I think a middle ground is a good idea: while there is an official
> distribution (e.g. the one that is labelled biopython 1.50 and that
> will end up on most users' computers) which is aggressively controlled,
> there should be space for people to try out new things.

I'm not quite sure what you mean by aggressively controlled.  Moving to
a DVCS really should make public experimental branches much easier.

> 5. People who try out new things should be aware (to avoid
> disappointment) that their new code might not be accepted on the
> official trunk, for many reasons: not enough documentation, no test
> cases, design not acceptable, poorly-commented code, whatever. It
> would be very sad if people started working on something and spent
> lots of time on their branch, only to see their code refused on
> the "official" trunk.

That is a risk - especially if anyone were to go off and work in
complete isolation without even posting anything to this mailing list.

> So, in my view things work like this:
> A. The "official" version on biopython.org is controlled by a "head
> honcho", currently Peter with input from biopython-dev. This is the
> version that most users will ever see in practice.

That could work - although having anyone as a single bottleneck is a
risk, assuming you get someone to agree to the role in the first place
;)  I am generally happy with the current arrangement where module
owners have a degree of autonomy over their modules.  I wouldn't want
to have to approve every single minor change you (Tiago) make to
Bio.PopGen - but I suppose occasional review and merging of code from
Tiago's branch on request wouldn't be too onerous.

> B. The official version has a lot of quality enforcement on top.

What does that mean?  e.g. a strict policy about unit tests before
anything goes into the main branch?

> C. People should be free to branch away and try new things.

Given the Biopython license (as Leighton pointed out) this is already
the case with CVS.  It's just that using a DVCS should make this easier,
especially for keeping branches in sync with the official branch, and
hopefully for any merges back.

> D. People that branch away should be aware that their stuff might not
> be accepted on the official distribution. If they want it accepted
> they should come to biopython-dev and have a cup of tea with the
> community.

I agree.  I like tea.

> E. Maybe some contact points should be defined for modules?

Do you mean something more explicit about documenting who currently
maintains each module?

> F. People who want their code included in the "official" distribution
> should seriously think about branching from the "official" branch and
> not from any other.

I agree.

> I would really like to see an "official" git branch which should be
> created, in my opinion from a stable release and either by Peter or
> Michiel (or any other long term CVS-write user).

I think we'll have that - and in the short term the CVS mirror on
github can be used.

> In my case I would branch to maintain some of the PopGen code.

Great.

Peter


From chapmanb at 50mail.com  Tue Mar 17 08:49:30 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 17 Mar 2009 08:49:30 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <C5E52504.1F20A%lpritc@scri.ac.uk>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
Message-ID: <20090317124930.GE57054@sobchak.mgh.harvard.edu>

Hi everyone;
Nice to see the discussion around trying out git. Leighton and
Tiago, you both brought up some definite concerns in moving
to a distributed version control system.

Git aims to help solve the problem of a them-versus-us community.
When you read posts critical of Biopython, you will find a lot of
complaints about "they didn't do this." This is confusing, as anyone
using, coding with, interested in, or contributing to Biopython is a
member of the community. CVS can help create this division, since it
appears as a walled off repository only the core developers can
access.

Git frees up the source code and lowers this barrier to contributing. Now
instead of saying "why didn't the developers integrate the code I
sent to the mailing list and write tests and documentation for it,"
we can all turn the question back on ourselves and ask why we didn't
create a branch with our new contribution and do it, soliciting help
from others in Biopython.

With solving the problems come potential concerns. This coincidental
blog post from yesterday intelligently covers a lot of the issues:

http://www.pointy-stick.com/blog/2009/03/16/dark-side-distributed-version-control/

The one we should be most concerned about is fragmentation. The
community of Python coders in bioinformatics is too small to be
split up; surely we are better served by resolving any differences
and producing one high quality reusable code base.

Tiago's assessment of how things should work practically looks
exactly right. Hard working core developers, like Peter and
Michiel, will be maintaining the trunk which we roll releases off
of. Contributors can either submit patches as now, or create short
branches which get merged back in. The advantage of branches is that
others can test and develop the branched code, and that the software
should help deal with some of the pain of merging.

There is a lot of good material in this thread for new potential
developers. Tiago, it would make sense to condense what you've
written and include it with the Contributing guide:

http://biopython.org/wiki/Contributing

We should also create a page on the wiki, linked from the developer
documentation:

http://biopython.org/wiki/Documentation#Documentation_for_Developers

that describes active development branches and their goals
(called, say, ActiveBranches). Tiago, I thought you did a page for PopGen
earlier like this but I can't find it right now. We should keep
communication at a high level to avoid confusing fragmentation.

This is a difficult change in terms of how things work; we are
asking the right questions to create a good environment for improvement.

Brad

> Hi all,
> 
> This has been an occasionally frustrating thread to read...
> 
> On 17/03/2009 08:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
> 
> > On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
> >> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
> >> 
> 
> >>> How is the "official" biopython trunk controlled? Currently what is on
> >>> CVS is the gospel and Peter and Michiel essentially have control of
> >>> what is there and what is labelled as a "biopython distribution". How
> >>> will this work now?
>  
> >> In a distributed workflow, there is no technical official repository. The
> >> "official repository" is socially enforced.
> 
> That was true before.  Unless I misread the Biopython licencing, there was
> no real barrier to putting a branched copy of the code on your own
> server/site, with your own modifications.  What git does is provide tools to
> make merging of that sort of code easier (along with a number of other
> nice features, such as authentication of contributions).  The presence of
> git does not ensure that your changes, or anyone else's, will be merged with
> any other repository, and nor does it ensure the quality of contributed
> code.  Git, while nice, and ideal for a number of tasks, is no magic bullet.
> 
> To an extent, the 'official' repository is, pragmatically, the one that is
> most stable and well-tested.  If my hypothetical branched version had become
> more stable and widely-used than the 'official' trunk, and become the most
> frequently downloaded and implemented, and received new contributions in its
> own right, it might then be considered de facto 'the distribution'; nasty
> online spats with the original authors notwithstanding.  The 'social
> enforcement' of politeness (i.e. *I* don't take credit for *your* work)
> prevents this to an extent, as it ought to under any versioning system.
> 
> There's a competing tendency to consider that the coders who spent the most
> time creating the code understand it the best, and are in the best position
> to maintain it directly.  This is true to a large degree, and entirely
> applicable to Biopython's contributed modules.  git can potentially
> facilitate that sort of contribution to the 'official' trunk in a way that
> CVS can't, due to its permissions bottleneck.  However, the mechanics of
> incorporating that contributed code are more or less the same: the people
> with control of the 'official' trunk review the code and decide whether to
> include it.  This is true whether the code is submitted as a patch to
> Bugzilla, emailed to a developer, put up on public CVS on your site, or in a
> forked git repository.  The same is true of your own git repository - you
> don't have to include someone else's forked code if you don't want to.
> 
> What possibly needs to change is not the version control system, but the way
> in which people think about their contribution.  Contributions can be made
> productively under any versioning system, and the key questions remain the
> same in all cases: Does the new code work (are there tests)? Does the new
> code break any old code?  Is there documentation?  Is the API consistent?
> 
> "What version control system are we using?" is a minor detail, unless it is
> inherently broken, hinders any of the above, or causes some other
> deal-breaking issue (for Linus Torvalds, this included speed issues for
> merges).
> 
> >> I think Michiel and Peter still head the Biopython project--at
> >> least they have the most clout, I would say. Therefore, we will probably
> >> look to one of their branches as the "official" branch of Biopython. When
> >> one of them wants to step down in duty, we will socially pass the torch on
> >> to the next taker.
> 
> It has always been thus.  Now, instead of passing on the user authentication
> to the CVS server at OBF, the user authentication to the biopython github
> account will be passed on, instead:
> 
> > I think it is essential we have a clearly labeled official trunk
> > (perhaps with branches for releases), which will be used for all the
> > official releases (tar balls, zip files and windows installers).  Our
> > main webpage should make this very clear.
> > 
> > We could potentially continue to have a shared official branch (e.g.
> > belonging to the generic github biopython user), and give all the
> > existing CVS contributors write access - and continue to manage this
> > as before.  So for example, if Frank wanted to check in some minor
> > changes to Bio.Nexus he could just do it.  Future contributors
> > patches/branches might get taken up by a developer on a personal
> > branch for testing, before being merged into the official branch.
> > 
> > i.e. We can initially continue as before - right now I don't have a
> > feel for how much work the role of an official branch maintainer would
> > be, and it is difficult to guess without more hands on experience
> > using the new tools.
>  
> Plus ca change (avec git)...
> 
> >>> The second question, related to the first is how will different
> >>> branches (of different persons) be managed? I am seeing people
> >>> starting working on the same code in different directions and then
> >>> having problems merging everything together.
> >> 
> >> People are supposed to work in different directions; this is the point of
> >> distributed workflows.
> 
> I may have a different understanding of 'different directions' than you
> mean, but I don't think that it's good for a community project if people
> work in different directions.  I also don't think that that is the point of
> distributed workflows; on the contrary, I think that they are intended to
> make it easier to work independently towards a common goal.  Even if that is
> by working on loosely- or non-interacting parts of the whole.
> 
> >> Merging tends not to be so difficult, and compared to
> >> centralized models like CVS and SVN, it's a cinch. We will help provide
> >> documentation for proper merging habits (e.g., merge early, merge often, and
> >> no rebasing after pushing, etc.). There are also screencasts popping up (in
> >> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
> >> we will link to for educational purposes.
> >> And of course, other developers will be around to help out in tricky merges.
> 
> This characterises one of the frustrating aspects of this thread (not
> getting at you personally, Chris) - the occasional implicit assumption that
> 'things will be inherently *better* if we use git'.  Developers are around
> to help now, even using CVS (which also has clear, long-standing stable
> documentation - and even an O'Reilly book).  Several people don't seem to
> think that that - and the way that code is reviewed and incorporated into
> the main distribution - is good enough, and I don't think that this will
> change just because the version control system has changed.  Nor will
> changing revision control system generate significant free time to write,
> test and document code.  But we may have the recession to do that last one
> for us.
> 
> > Well, yes, in theory we have the same problem now with CVS - and while
> > the tools may make merging easier, some communication is essential
> > when working on the key modules which impact large parts of the code
> > base.
> 
> I would put it more strongly than that: communication is essential in all
> aspects of the project.  A number of related blog posts make statements
> along the lines of "I don't use Biopython, or post to the mailing lists, but
> I think that they're doing *this* wrong", or "I submitted code, but it
> didn't get taken up immediately".  Now, venting and ranting on a blog is
> fine, but it's not really *communicating*, any more than it was when I
> thought that the BioSQL GenBank upload code was broken, fixed it (for my
> purposes) and told no-one.  Git won't change the communication issue (in
> either direction) any more than it changes the code review process.
> 
> FWIW, I think that git looks like a good way to go, and that it could help
> encourage people to make local modifications of Biopython for their own
> benefit and in their own interests and expert area, in a way that is visible
> to the core distribution (unlike the patch submission process that is now
> implemented).  In that way it could facilitate more rapid expansion of the
> core distribution.  However, the bottlenecks of ensuring code quality,
> testing and documentation will only ease if that is taken up by the
> individuals/groups making those contributions, in addition to the core
> developers.
> 
> And yes, I know I'm late with the new GenomeDiagram docs... ;)
> 
> L.
> 

From tiagoantao at gmail.com  Tue Mar 17 09:10:18 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 13:10:18 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
	<320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
	<320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
Message-ID: <6d941f120903170610g161342f0ief365d68f25707c1@mail.gmail.com>

Hi,

> I'm not quite sure what you mean by aggressively controlled.  Moving to
> a DVCS really should make public experimental branches much easier.


I mean that the official release is very controlled (a good thing!).
Development branches should be more free.

> That is a risk - especially if anyone were to go off and work in
> complete isolation without even posting anything to this mailing list.

I think our obligation is to inform people of the issue. If then
people go away and don't communicate, then it becomes their problem. I
think just a couple of sentences on the Contributing page on the wiki
would be more than enough.


> That could work - although having anyone as a single bottleneck is a
> risk, assuming you get someone to agree to the role in the first place
> ;)  I am generally happy with the current arrangement where module
> owners have a degree of autonomy over their modules.  I wouldn't want
> to have to approve every single minor change you (Tiago) make to
> Bio.PopGen - but I suppose occasional review and merging of code from
> Tiago's branch on request wouldn't be too onerous.

I agree. I am just trying to make this an "explicit" policy, so that
everybody knows the rules of the game. If people don't agree, then that
should be discussed and changed. But the point is, these kinds of
management issues should be written down somewhere in a transparent
way.


>> B. The official version has a lot of quality enforcement on top.
>
> What does that mean?  e.g. a strict policy about unit tests before
> anything goes into the main branch?

I was reading  http://biopython.org/wiki/Contributing and the main
stuff is already there (the "submitting code" place).
But the point is: the official version should be stable and reliable
(as it is now, IMHO)


>> E. Maybe some contact points should be defined for modules?
>
> Do you mean something more explicit about documenting who currently
> maintains each module?

That is my point. Does that make sense?


From chapmanb at 50mail.com  Tue Mar 17 09:04:53 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 17 Mar 2009 09:04:53 -0400
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <49BEDD9B.6030905@u.washington.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
	<49BEDD9B.6030905@u.washington.edu>
Message-ID: <20090317130453.GF57054@sobchak.mgh.harvard.edu>

Hi David;

> I've got some 454 and Solid data you could test it on too.
> 
> Has anybody else looked into how these other two Next Gen formats might 
> complicate things?

Sweet. We definitely want to support output from them as well; it is 
great to have someone on board who is working with data from other
machines.

Peter did a pretty thorough investigation of the different formats
and wrote it up in the docs to the proposed QualityIO module:

http://github.com/peterjc/biopython-seqio-quality/blob/6fdf27393cb7318b229ff8587721e83544da968d/Bio/SeqIO/QualityIO.py

Does this make sense with your experience?
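
For anyone new to why these formats need separate handling, here is a small sketch of one difference between them. It assumes the usual conventions: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, while early Solexa FASTQ stores Solexa scores with an offset of 64. The function names are made up for illustration and are not the QualityIO API.

```python
import math

SANGER_OFFSET = 33  # Sanger FASTQ: ASCII code minus 33 gives a PHRED score
SOLEXA_OFFSET = 64  # early Solexa FASTQ: ASCII code minus 64 gives a Solexa score


def phred_from_solexa(solexa_score):
    """Convert one Solexa score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (solexa_score / 10.0) + 1)


def decode_sanger(quality_string):
    """Return PHRED scores for a Sanger FASTQ quality string."""
    return [ord(char) - SANGER_OFFSET for char in quality_string]


def decode_solexa(quality_string):
    """Return PHRED-equivalent scores for a Solexa FASTQ quality string."""
    return [phred_from_solexa(ord(char) - SOLEXA_OFFSET) for char in quality_string]


print(decode_sanger("!+5I"))
print([round(score, 2) for score in decode_solexa("@Jh")])
```

The two scales nearly agree for high-quality bases and diverge for low-quality ones, so reading a file with the wrong offset or scale quietly gives misleading qualities rather than an obvious error.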

If you feel comfortable with git, Peter set up a new branch with his
code for this:

http://github.com/peterjc/biopython-seqio-quality/tree/master

and we'd be more than happy to have you testing it. Alternatively,
if you want to submit some smaller data files we can use in testing, you
could attach them to the current enhancement request:

http://bugzilla.open-bio.org/show_bug.cgi?id=2767

Thanks for the help,
Brad

> 
> Brad Chapman wrote:
> > Peter;
> >
> >   
> >> I think we should probably do another release soon 
> >>     
> >
> > Good call. +1 from me.
> >
> >   
> >> I'd like to include the following changes as part of the beta, but it
> >> would be sensible to have someone else try these out first.  Any
> >> volunteers?
> >>
> >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
> >>     
> >
> > The code for this looked good when I reviewed it earlier. I will
> > test it out with some solexa reads from here this week; any reason
> > not to check the patch and files into CVS? Then I can fire up my
> > coal-powered revision control system, feed two punch cards into the
> > mouth of the machine, hope the vacuum tubes don't burn out again,
> > and check it out locally.
> >
> > Brad


From tiagoantao at gmail.com  Tue Mar 17 09:19:38 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 13:19:38 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>

On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> There is a lot of good material in this thread for new potential
> developers. Tiago, it would make sense to condense what you've
> written and include it with the Contributing guide:
>
> http://biopython.org/wiki/Contributing


I can go ahead and try to put a summary of our discussions on that
page, if nobody opposes. The change can be rewritten afterwards or
deleted anyway. The only issue is that I can only do that on the
weekend and not before (travelling abroad from Wednesday to Friday).
What I think is needed is actually a final decision on how things will
progress. Will there be an official git branch? Will the official
repository still be CVS? Where will it be hosted? These are important
questions, but I think there is enough discussion to arrive at a
decision.

> (called, say, ActiveBranches). Tiago, I thought you did a page for PopGen
> earlier like this but I can't find it right now. We should keep
> communication at a high level to avoid confusing fragmentation.

Coincidentally I was editing that page today. I took the liberty of
creating a link from the documentation page to it. So it should be
reachable now.

Tiago

From p.j.a.cock at googlemail.com  Tue Mar 17 10:44:08 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 17 Mar 2009 14:44:08 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
Message-ID: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>

2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
> I can go ahead and try to put a summary of our discussions on that
> page, if nobody opposes. The change can be rewritten afterwards or
> deleted anyway. The only issue is that I can only do that on the
> weekend and not before (travelling abroad from Wednesday to Friday).

Sure - by the weekend I hope we'll have come to a consensus.

> What I think is needed is actually a final decision on how things will
> progress. Will there be an official git branch? Will the official
> repository still be CVS? Where will it be hosted? These are important
> questions, but I think there is enough discussion to arrive at a
> decision.

I think it is still too early for a final decision, but here is my
suggested plan:

In the short term (at least until Biopython 1.50 beta is out, perhaps
until Biopython 1.50 proper is out), CVS will remain the official
repository. Bartek will continue automatically updating the mirrored
copy on github, which will otherwise be treated as READ ONLY. If need
be, he may have to reimport the whole history (the tag issue troubles
me - see the other thread), so there may be some bumps along this
road. Contributions/bug fixes can continue via bugzilla with a patch,
and contributors can also try providing a URL to their own git branch
if they prefer. During this period I hope most (ideally all) our
active developers with CVS access will create an account on github,
and try out forking from the CVS mirror, creating their own branches,
checking in some changes, and doing some simple merges - for example
pulling code from other Biopython developers' public branches. This
should give us the confidence to trust git and github enough to use
it for real.

i.e. for roughly the next month, we will continue as before with CVS
for the real work, but will also try out github.

Once Biopython 1.50 final is out (hopefully by the end of April 2009,
probably sooner), we need to decide if we will actually make the move
to git on github.  At this point, I would expect this to happen by
declaring CVS read only, a static archive (and emergency fall back).
Bartek would turn off his automatic syncing.  We would then continue
working on the github branch with the full CVS history, with a core of
Biopython developers having write access to the "official" branch,
doing new work under their own personal branches for eventual merging
into the main trunk.

I'd still like to have a copy of the "official" git repository running
on biopython.org, but this may not be that easy without some technical
expertise in house to do this.  From initial discussion with the OBF
team about the idea of running git on their servers, my impression is
that if we can do it ourselves, we may.  Jason Stajich actually suggested
we use github independently.

Peter

P.S. Could you all update your entry on the wiki participants page
(and if you have one, your wiki user page) to include a link to your
github account:
http://biopython.org/wiki/Participants


From biopython at maubp.freeserve.co.uk  Tue Mar 17 10:46:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 14:46:53 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <49BEDD9B.6030905@u.washington.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
	<49BEDD9B.6030905@u.washington.edu>
Message-ID: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com>

2009/3/16 David Schruth <dschruth at u.washington.edu>:
> I've got some 454 and Solid data you could test it on too.
>
> Has anybody else looked into how these other two Next Gen formats might
> complicate things?

Roche 454 sequencers produce their own binary SFF files (Standard
Flowgram Format), but they provide tools which turn these into
standard Sanger style files using PHRED qualities.  In theory, we
might be able to parse the SFF files directly, see for example
http://blog.malde.org/index.php/2008/11/14/454-sequencing-and-parsing-the-sff-binary-format/
and the links given.  In practice, most sequencing centers using Roche
454 will be happy to provide FASTQ or FASTA+QUAL files, and the code
on Bug 2767 (or the associated experimental branch on github) should
work fine on these.
http://bugzilla.open-bio.org/show_bug.cgi?id=2767
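
To make "Sanger style files using PHRED qualities" concrete, here is a
minimal sketch of reading such a file by hand. It is not the Bio.SeqIO
code from Bug 2767; the helper name parse_sanger_fastq is made up for
illustration, and it assumes the Sanger convention of four lines per
record with each PHRED score stored as an ASCII character at offset 33.

```python
# Illustrative sketch only: NOT the Bio.SeqIO code attached to Bug 2767.
# Assumes Sanger-style FASTQ: four lines per record, PHRED scores stored
# as ASCII characters with an offset of 33.

SANGER_OFFSET = 33


def parse_sanger_fastq(lines):
    """Yield (title, sequence, phred_scores) from an iterable of lines.

    Hypothetical helper; it does not handle sequences or qualities
    wrapped over several lines.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        title, seq, plus, qual = lines[i:i + 4]
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record number %d" % (i // 4 + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one PHRED score.
        scores = [ord(char) - SANGER_OFFSET for char in qual]
        yield title[1:], seq, scores


# A made-up record: the characters !, +, 5 and I encode 0, 10, 20 and 40.
example = ["@read1\n", "ACGT\n", "+\n", "!+5I\n"]
records = list(parse_sanger_fastq(example))
```

Solexa files of this era use a different offset and score scale, so a
real parser needs to know which variant it has been given.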

You are free to try out the proposed code yourself now, but if you
have some particular 454 files you'd like me to check, please email me
(off the mailing list).  If you can share some real data which we
could include in Biopython for a unit test that would also be great
(but unless you tell me this explicitly, I'll only make sure we can
parse your files).

Regarding SOLiD files, they work in colour space and I am under the
impression that it doesn't make sense to convert them to sequence
space until after doing the assembly or genome mapping (in colour
space).  See for example
http://solidsoftwaretools.com/gf/project/mapreads/ i.e. it may not be
appropriate to parse SOLiD reads into Biopython SeqRecord objects, and
thus they wouldn't belong in Bio.SeqIO.  That isn't to say we wouldn't want
a parser elsewhere in Biopython, perhaps under Bio.Sequencing would be
best.

Peter

From biopython at maubp.freeserve.co.uk  Tue Mar 17 10:57:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 14:57:49 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903170757s183f6f59x40549f7e3a853f06@mail.gmail.com>

On Mon, Mar 16, 2009 at 10:42 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter wrote:
>> I'm thinking a news post on
>> http://news.open-bio.org/news/category/obf-projects/biopython/ about
> version control would be a good idea at this point.  How about this -
>
> This is great, and I would move the last paragraph describing
> the Git repository to the beginning; start with what we are doing and
> then describe the rationale. This should help for those with ADD, and
> also give more prominent credit to Bartek, Giovanni and you for the
> work that went into this.

Good idea about the reordering - done, and published:
http://news.open-bio.org/news/2009/03/biopython-and-version-control-systems/
It will also show up on http://biopython.org/wiki/News via the RSS feed.

Peter


From rodrigo_faccioli at uol.com.br  Tue Mar 17 11:30:48 2009
From: rodrigo_faccioli at uol.com.br (Rodrigo faccioli)
Date: Tue, 17 Mar 2009 12:30:48 -0300
Subject: [Biopython-dev] PDB Parser error
Message-ID: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>

I built a relational database in PostgreSQL. This database stores some
information from PDB files: their sequences, atoms and sbonds. Now I'm
building a parser for this database, which I want to load the data into
a Biopython PDB parser structure. The idea is to keep all of my source
code based on the Biopython PDB parser, because I will need to perform
some operations on this information.

So I studied the Bio.PDB directory and read, in the PDBParser.py file, the
_parse_coordinates method, which calls several methods that initialise
the structure. I call them in my code. However, it shows the message below.
Traceback (most recent call last):
  File "src/testefcfrpPDB.py", line 32, in <module>
    main()
  File "src/testefcfrpPDB.py", line 30, in main
    structure = FcfrpPDB.getPDBFile(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDB.py", line 67, in getPDBFile
    return fcfrpPDBParser.loadStructureFromDatabase(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDBParser.py", line 48, in loadStructureFromDatabase
    self._structure_builder.init_atom(D_Atoms[i].get_id(), D_Atoms[i].get_coord(), D_Atoms[i].get_bfactor(),D_Atoms[i].get_occupancy() ,D_Atoms[i].get_altloc(), D_Atoms[i].get_fullname(), D_Atoms[i].get_serial_number())
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/StructureBuilder.py", line 182, in init_atom
    if residue.has_id(name):
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", line 96, in has_id
    return self.child_dict.has_key(id)
TypeError: list objects are unhashable

This is my first post to the Biopython developers' list, and I don't know
what the process is for sending code.

Thanks for any help.

-- 
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218

From lpritc at scri.ac.uk  Tue Mar 17 11:42:55 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 15:42:55 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com>
Message-ID: <C5E5757F.1F268%lpritc@scri.ac.uk>

Hi,

On 17/03/2009 14:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> 2009/3/16 David Schruth <dschruth at u.washington.edu>:
>> I've got some 454 and Solid data you could test it on too.
>> 
>> Has anybody else looked into how these other two Next Gen formats might
>> complicate things?

> Regarding SOLiD files, they work in colour space and I am under the
> impression that it doesn't make sense to convert them to sequence
> space until after doing the assembly or genome mapping (in colour
> space).  See for example
> http://solidsoftwaretools.com/gf/project/mapreads/ i.e. it may not be
> appropriate to parse SOLiD reads into Biopython SeqRecord objects, and
> thus they wouldn't belong in Bio.SeqIO.  That isn't to say we wouldn't want
> a parser elsewhere in Biopython, perhaps under Bio.Sequencing would be
> best.

That's my understanding and practical experience, too.  For lurkers' benefit
SOLiD data looks like this:

>4_48_57_F3
T33111210002200023033000000211000101
>4_48_89_F3
T22002312223133113013303322223322223
>4_48_95_F3
T22300102100203322101021130203000201

where each of the four values (0,1,2,3) corresponds to four of the 16
possible dimers (AA, AC, AG, AT, CA, ...), i.e. each colour value is
degenerate for four possible dimers.  This system is described at
http://www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_057559.pdf

The use of an appropriate colour->dimer mapping makes it possible, in
principle, to go from colour space to nucleotide sequence, so long as a
single base of the sequence is known.  In reality a single colour space read
error silently makes the rest of the SOLiD read mapping incorrect.
Practical use of SOLiD data involves mapping the sequence reads to a
reference sequence (either by converting the reference to colour space, or
dynamic programming) prior to conversion to 'base space'.
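
As a rough sketch of that colour->dimer mapping and of why one bad
colour call ruins the rest of a read, here is a small decoder. It
assumes the usual two-base encoding (0 = AA/CC/GG/TT, 1 = AC/CA/GT/TG,
2 = AG/GA/CT/TC, 3 = AT/TA/CG/GC) and that the leading letter is the
known primer base rather than part of the read. The function name is
made up for illustration and is not part of Biopython.

```python
# Sketch of colour-space decoding under the standard two-base encoding.
# Assumption: the leading letter is the known primer base, and every
# following character is a colour call in 0-3 (missing calls, often
# written as '.', are not handled here).

BASES = "ACGT"


def decode_colour_space(read):
    """Convert e.g. 'T3311' into bases, excluding the leading primer base."""
    previous = read[0]
    decoded = []
    for colour in read[1:]:
        # With A, C, G, T numbered 0-3, each colour is the XOR of the
        # indices of the two bases in the dimer, so XOR recovers the next base.
        previous = BASES[BASES.index(previous) ^ int(colour)]
        decoded.append(previous)
    return "".join(decoded)


good = decode_colour_space("T3311")
# Same read with only the first colour miscalled (3 read as 2):
bad = decode_colour_space("T2311")
```

Every base after the miscalled colour changes, not just the one at
that position, which is why the mapping is normally done in colour
space first.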

The mapping process is probably better handled by dedicated applications,
and I think the role for Biopython in this is to parse their output.  GFF
is, awkwardly enough, a popular output format for this kind of analysis.

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



From biopython at maubp.freeserve.co.uk  Tue Mar 17 12:01:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 16:01:25 +0000
Subject: [Biopython-dev] PDB Parser error
In-Reply-To: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>
References: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>
Message-ID: <320fb6e00903170901v6533910bl57ddd534dc05cf51@mail.gmail.com>

On Tue, Mar 17, 2009 at 3:30 PM, Rodrigo faccioli
<rodrigo_faccioli at uol.com.br> wrote:
> I built a relational database in PostgreSQL. This database stores some
> information from PDB files: their sequences, atoms and sbonds. Now I'm
> building a parser for this database, which I want to load the data into
> a Biopython PDB parser structure. The idea is to keep all of my source
> code based on the Biopython PDB parser, because I will need to perform
> some operations on this information.
>
> So I studied the Bio.PDB directory and read, in the PDBParser.py file, the
> _parse_coordinates method, which calls several methods that initialise
> the structure. I call them in my code. However, it shows the message below.
> Traceback (most recent call last):
>   File "src/testefcfrpPDB.py", line 32, in <module>
> ...
>   File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", line 96, in has_id
>     return self.child_dict.has_key(id)
> TypeError: list objects are unhashable
>
> This is my first post to the Biopython developers' list, and I don't know
> what the process is for sending code.

It's hard to say without seeing your full code (and even then, without
the database it would be difficult to reproduce it).  As you have a
TypeError raised from a dictionary lookup, I suspect one of the
arguments has the wrong datatype - maybe a list that should be a string.
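
The traceback ends in a dictionary lookup (child_dict.has_key(id)), and
dictionary keys must be hashable, which a list is not. A quick way to
find the culprit might be to test the values before passing them on.
The helper below is hypothetical and not part of Bio.PDB; the argument
names mirror the init_atom call in the traceback and the values are
invented for illustration.

```python
# Sketch: find which arguments could not be used as dictionary keys.
# Hypothetical helper, not part of Bio.PDB. The example values below
# are invented; in real use they would come from the database rows.

def unhashable_arguments(**kwargs):
    """Return the names of keyword arguments that cannot be hashed."""
    bad = []
    for name, value in sorted(kwargs.items()):
        try:
            hash(value)
        except TypeError:
            bad.append(name)
    return bad


suspect = unhashable_arguments(
    name=["CA"],          # a list: fails as a dictionary key
    fullname=" CA ",      # a string: fine
    altloc=" ",
    serial_number=1,
)
print(suspect)
```

If the atom name comes back from the database wrapped in a list (for
example a one-element row), unpacking it to a plain string before the
call would be the thing to try.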

If you want to share the full file testefcfrpPDB.py you could post it
on http://pastebin.com/ or something (do you have your own website?).

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 17 13:59:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 17:59:43 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
Message-ID: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>

I wrote:
> In the short term (at least until Biopython 1.50 beta is out, perhaps
> until Biopython 1.50 proper is out), CVS will remain the official
> repository. ...  During this period I hope most (ideally all) our active
> developers with CVS access will create an account on github, and
> try out forking from the CVS mirror, creating their own branches,
> checking in some changes, and doing some simple merges - for
> example pulling code from other Biopython developers' public
> branches.  This should give us the confidence to trust git and
> github enough to use it for real.

Brad and I have been trying this out in practice, and it seems to work OK.

I started a fork to test the patches for Bug 2767, adding quality
parsers to Bio.SeqIO,
http://github.com/peterjc/biopython-seqio-quality/tree/master
I made a few incremental checkins, pushed to github one by one.

Brad then took a fork of this in order to make some minor changes and
fix a typo in the documentation :
http://github.com/chapmanb/biopython-seqio-quality/tree/master

At this point the "network" diagrams showed up the two branches as
diverging.  Brad then sent me a "pull" request, suggesting I might
want to pull his work into my branch.

Using the git command line tool, I was able to pull and merge Brad's
changes (as I had made no changes in the meantime this could be done
automatically), and then push the merged version back up to github on
my branch.  At this point my branch and Brad's agreed once again, and
the "network" diagram no longer shows both.  Note that my branch now
includes a commit from Brad.

At this point, Brad may choose to delete his branch, or perhaps make
further changes.

Now all this worked, but I was wondering if the github web interface
could have simplified any of this, if I'd only known where to click.
For example, does github offer any way to view a diff between two
branches?  Or, as I suspect, do they simply expect you to use the git
tools directly for this?

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 14:06:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 14:06:00 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903171806.n2HI60op012464@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-17 14:06 EST -------
(In reply to comment #10)
> I've made these changes available on a test github branch,
> http://github.com/peterjc/biopython-seqio-quality/tree/master
> 
> This doesn't include all the example files for the unit tests yet.
> 

I've now checked this into CVS.  The extra example files will follow later...
leaving this bug open until that is done.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From dalloliogm at gmail.com  Tue Mar 17 14:35:04 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 17 Mar 2009 19:35:04 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
Message-ID: <5aa3b3570903171135nb49de80h6c6ee0930c147d29@mail.gmail.com>

On Tue, Mar 17, 2009 at 6:59 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Brad and I have been trying this out in practice, and it seems to work OK.
>
> I started a fork to test the patches for Bug 2767, adding quality
> parsers to Bio.SeqIO,
> http://github.com/peterjc/biopython-seqio-quality/tree/master
> I made a few incremental checkins, pushed to github one by one.
>
> Brad then took a fork of this in order to make some minor changes and
> fix a typo in the documentation :
> http://github.com/chapmanb/biopython-seqio-quality/tree/master

Yes, basically this is the way it should work.
I usually do something similar, though I tend to follow the procedure
explained here:
- http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html
(section 'Using git for collaboration')

I fetch the other user's master branch into a local branch (refspec
master:otheruser-incoming), then compare the two branches with gitk or
with git log master..otheruser-incoming.
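
For anyone unsure what the master..otheruser-incoming range selects: it
is the commits reachable from the second name but not from the first.
Below is a toy model of that rule, not git itself; the commit names and
the tiny history are invented for illustration.

```python
# Toy model, not git itself: each commit maps to its list of parents.
# Commit names and history are invented for illustration.

def ancestors(history, tip):
    """Return the set of commits reachable from tip, including tip."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(history[commit])
    return seen


def commit_range(history, ours, theirs):
    """Commits that 'git log ours..theirs' would list (order ignored)."""
    return ancestors(history, theirs) - ancestors(history, ours)


history = {
    "a": [],
    "b": ["a"],        # tip of master
    "c": ["b"],        # first commit on otheruser-incoming
    "d": ["c"],        # tip of otheruser-incoming
}
incoming = commit_range(history, ours="b", theirs="d")
nothing_new = commit_range(history, ours="d", theirs="b")
```

Here the range holds the two commits the other user added, while the
reverse range is empty because master has nothing the other branch
lacks.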


>
>
> At this point the "network" diagrams showed up the two branches as
> diverging.  Brad then sent me a "pull" request, suggesting I might
> want to pull his work into my branch.
>
> Using the git command line tool, I was able to pull and merge Brad's
> changes (as I had made no changes in the meantime this could be done
> automatically),

If you go to the 'Fork Queue' on github, it should show other people's
commits. However, I don't trust doing this with a web interface.
Moreover, it sometimes seems not to work properly (it is not clear how
it decides whether a commit will 'apply cleanly' or not).

On the same page there is a 'pull merge request' button which (I have
never tried it) should send a merge request to the selected recipients.

> and then push the merged version back up to github on
> my branch. ?At this point my branch and brad's agreed once again, and
> the "network" diagram no longer shows both. ?Note that my branch now
> includes a commit from Brad.

Yes, this is right. The graph only shows the commits which differ, so
it shows your two branches as a single one.

If you feel comfortable with the git mechanisms, maybe later you could
create a second branch in the 'biopython/biopython' repository, and
call it 'accepted-github-changes', or something like that, which will
collect all the changes that can be submitted to CVS.


> At this point, Brad may choose to delete his branch, or perhaps make
> further changes.

I wonder if a good strategy here is to create branches only to test
specific changes, and then delete them.
If Brad keeps his branch, he will later have to remember to update it,
which is perhaps less trouble than deleting a branch and recreating it
when necessary.

> Now all this worked, but I was wondering if the github web interface
> could have simplified any of this, if I'd only known where to click.
> For example, does github offer any way to view a diff between two
> branches?  Or, as I suspect, do they simply expect you to use the git
> tools directly for this?

To my knowledge, there are no such tools :-(.
You must rely on the commit messages to identify the differences
between branches.
Maybe they will implement such a feature at some point.

>
> Peter
>


--

My blog on bioinformatics (now in English): http://bioinfoblog.it


From dalloliogm at gmail.com  Tue Mar 17 14:36:24 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 17 Mar 2009 19:36:24 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
Message-ID: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>

2009/3/17 Peter Cock <p.j.a.cock at googlemail.com>

> 2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
>
> I'd still like to have a copy of the "official" git repository running
> on biopython.org, but this may not be that easy without some technical
> expertise in house to do this.  From initial discussion with the OBF
> team about the idea of running git on their servers, my impression is
> that if we can do it ourselves, we may.  Jason Stajich actually suggested
> we use github independently.


Well, it is not strictly necessary to have git installed on their
computers to create a mirror.
You can just create the clone on your computer, copy the raw files there,
and then you will be able to push new changes over ssh.
Since git is a distributed source control system, it doesn't require
configuring a server component as CVS does :-)

To my knowledge, the pygr project (also a bioinformatics suite in Python)
has an official repository hosted on Gitorious, and a mirror on github to
collect patches from there.


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From tiagoantao at gmail.com  Tue Mar 17 15:09:13 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 19:09:13 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
Message-ID: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>

OK, in order to exercise and try github development I have forked a
branch to work on the PopGen code. The idea of the branch is to serve
as a platform for merging with the "official" branch. So, the idea is:

1. Official branch - The stable thingy
2. PopGen stabilizer branch - The place to merge contributions from
PopGen development branches. The idea is that people can go crazy on
their own branches and this intermediate one serves as a point to
stabilize (unit test, documentation, QA, ...) before the commit to the
official one
3. Crazy branches - Develop your crazy idea. I have 3 ideas myself:
One for Jason's structure code, one for my LDNe code and another for
statistics. Many more are welcome...

The development procedure would be like this:
A. People would have all the fun on their development branches
B. When they felt confident they would submit their code to the
stabilizer branch, where we would check that all the important things
were there: unit test, code comments, QA, documentation
C. When things were in good shape, we would propose changes to the
official branch

And, by the way, bug fixes to existing production code would also be
done on the stabilizer branch.

Does this make any sense?

In my view, with things like git, a policy like this encourages
innovation while preserving the stability and robustness of the official
branch.

Tiago

On Tue, Mar 17, 2009 at 6:36 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
>
> 2009/3/17 Peter Cock <p.j.a.cock at googlemail.com>
>>
>> 2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
>>
>> I'd still like to have a copy of the "official" git repository running
>> on biopython.org, but this may not be that easy without some technical
>> expertise in house to do this.  From initial discussion with the OBF
>> team about the idea of running git on their servers, my impression is
>> that if we can do it ourselves, we may.  Jason Stajich actually suggested
>> we use github independently.
>
> Well, it is not strictly necessary to have git installed on their
> computers to create a mirror.
> You can just create the clone on your computer, copy the raw files there,
> and then you will be able to push new changes over ssh.
> Since git is a distributed source control system, it doesn't require
> configuring a server component as CVS does :-)
>
> To my knowledge, the pygr project (also a bioinformatics suite in Python)
> has an official repository hosted on Gitorious, and a mirror on github to
> collect patches from there.
>
>
>
>
> --
>
> My blog on bioinformatics (now in English): http://bioinfoblog.it
>


-- 
 "A man who dares to waste one hour of time has not discovered the
value of life" - Charles Darwin


From mailinglist.honeypot at gmail.com  Tue Mar 17 15:21:57 2009
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Tue, 17 Mar 2009 15:21:57 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
Message-ID: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>

Hi,

I really just lurk around here, but a slight correction/point:

> A. People would have all the fun on their development branches
> B. When they felt confident they would submit their code to the
> stabilizer branch, where we would check that all the important things
> were there: unit test, code comments, QA, documentation
> C. When things were in good shape, we would propose changes to the
> official branch

I'm very much a git noob, and from having been following this thread a  
bit, it seems that many of us are, so for the noobs:

I think somewhere around B, the person wanting to commit new code  
would have to rebase[1] their branch against the official "stabilizer  
branch" (that they had originally forked from). This would put the  
onus of fixing any breaks and keeping track of recent developments on  
the branch you propose to merge into (since you originally branched),  
on the person who is writing the new code.

This makes it easier for the "official keepers of the one true branch"  
to accept new patches, since they know the patch will work on the  
latest version.

Anyway, I think I just wanted to point out that rebase was there since  
I don't think there's anything really equivalent in the CVS/SVN world.
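
For the noobs, here is a minimal sketch of what that rebase step might
look like when driven from a Python script. The remote name "upstream"
and the branch names "stabilizer" and "my-feature" are placeholders made
up for illustration, not anything agreed on this list; only the git
subcommands themselves (fetch, checkout, rebase) are real.

```python
import subprocess


def rebase_commands(remote="upstream", base="stabilizer", topic="my-feature"):
    """Return the git invocations that rebase `topic` onto `remote/base`.

    All three default names are hypothetical placeholders.
    """
    return [
        # Download the latest commits from the branch we forked from.
        ["git", "fetch", remote],
        # Switch to the contributor's own development branch.
        ["git", "checkout", topic],
        # Replay the contributor's commits on top of the updated base.
        ["git", "rebase", "%s/%s" % (remote, base)],
    ]


def run_all(commands, cwd="."):
    """Run each command in order inside `cwd`, stopping at the first failure."""
    for argv in commands:
        subprocess.check_call(argv, cwd=cwd)
```

Calling run_all(rebase_commands()) inside a clone would execute the three
steps; if the rebase hits a conflict, git stops and the contributor
resolves it, which is exactly where the onus lands in this scheme.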

-steve

[1] rebase : http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html

From tiagoantao at gmail.com  Tue Mar 17 15:27:10 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 19:27:10 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
	<7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>
Message-ID: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com>

2009/3/17 Steve Lianoglou <mailinglist.honeypot at gmail.com>:
> I think somewhere around B, the person wanting to commit new code would have
> to rebase[1] their branch against the official "stabilizer branch" (that


So, if I understand well, anyone wanting to submit a change to the
official version would be responsible for rebasing, right?

PS - being a git noob and a longtime cvs/svn user and manager, I much
appreciated Randal Schwartz's Google talk at:
http://www.youtube.com/watch?v=8dhZ9BXQgc4 Especially around 30
minutes in, it is really informative.

From mailinglist.honeypot at gmail.com  Tue Mar 17 15:34:11 2009
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Tue, 17 Mar 2009 15:34:11 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
	<7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>
	<6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com>
Message-ID: <711E86ED-F220-4E97-84BC-9E94753E111A@gmail.com>

On Mar 17, 2009, at 3:27 PM, Tiago Antão wrote:
> 2009/3/17 Steve Lianoglou <mailinglist.honeypot at gmail.com>:
>> I think somewhere around B, the person wanting to commit new code  
>> would have
>> to rebase[1] their branch against the official "stabilizer  
>> branch" (that
>
> So, if I understand well, anyone wanting to submit a change to the
> official version would be responsible for rebasing, right?

And if I understand it well, then I think you're right.

I think that's a reasonable policy. That puts the responsibility to  
ensure that any new code I write works with whatever has been approved  
already on me, and not you.

While this may require a bit of extra responsibility from the committer,  
I'd be surprised if it would be enough to deter any would-be  
committers from taking a shot at contributing code (maybe it would? I  
guess it's debatable).

> PS - being a git noob and a longtime cvs/svn user and manager I much
> appreciated Randal Schwartz google talk at:
> http://www.youtube.com/watch?v=8dhZ9BXQgc4 Especially around 30
> minutes it is really informative.

Sweet.

To be honest, the only video I ever saw of git was Linus' SVN-bash  
google talk, which somehow put me off from considering git longer than  
I should have, so this is a good link to have :-)

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

http://cbio.mskcc.org/~lianos


From biopython at maubp.freeserve.co.uk  Tue Mar 17 16:21:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 20:21:45 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
Message-ID: <320fb6e00903171321y4b94f220h7d2d1172ee085e15@mail.gmail.com>

2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
> OK, in order to exercise and try github development I have forked a
> branch to work on the PopGen code. The idea of the branch is to serve
> as a platform for merging with the "official" branch. So, the idea is:
>
> 1. Official branch - The stable thingy
> 2. PopGen stabilizer branch - The place to merge contributions from
> PopGen development branches. The idea is that people can go crazy on
> their own branches and this intermediate one serves as a point to
> stabilize (unit test, documentation, QA, ...) before the commit to the
> official one
> 3. Crazy branches - Develop your crazy idea. I have 3 ideas myself:
> One for Jason's structure code, one for my LDNe code and another for
> statistics. Many more welcomed....
>
> The development procedure would be like this:
> A. People would have all the fun on their development branches
> B. When they felt confident they would submit their code to the
> stabilizer branch, where we would check that all the important things
> were there: unit test, code comments, QA, documentation
> C. When things were in good shape, we would propose changes to the
> official branch
>
> And, by the way, bug fixes of existing production would also be done
> on the stabilizer branch.
>
> Does this make any sense?

Totally.  But keep in mind the current "official" git branch (the one
being updated from CVS) may get nuked if we have to redo the import to
fix the missing version tags - so I would suggest you name your
branches with "test" or "provisional" or something temporary in the
text for now.

> In my view, with things like git, a policy like this encourages both
> innovation while preserving stability and robustness of the official
> branch.

Yes - and this seems like the right approach for Bio.PopGen, with you
acting as the gatekeeper.

Peter


From chapmanb at 50mail.com  Tue Mar 17 17:34:14 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 17 Mar 2009 17:34:14 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
Message-ID: <20090317213414.GK57054@sobchak.mgh.harvard.edu>

Hi Peter;

> Using the git command line tool, I was able to pull and merge Brad's
> changes (as I had made no changes in the meantime this could be done
> automatically), and then push the merged version back up to github on
> my branch.  At this point my branch and brad's agreed once again, and
> the "network" diagram no longer shows both.  Note that my branch now
> includes a commit from Brad.

Sweet. Glad that worked. I deleted my branch (edit->delete
repository).

While doing so, I noticed that there is also a 'Repository
Collaborators' section within the 'edit' page. So, another working
model is to have multiple users simultaneously editing one forked
revision. If you are already communicating on the work through the
mailing list or wiki, this is more like CVS/SVN than the branching
model.

> Now all this worked, but I was wondering if the github web interface
> could have simplified any of this, if I'd only known where to click.
> For example, does github offer any way to view a diff between two
> branches?  Or, as I suspect, do they simply expect you to use the git
> tools directly for this?

What was the command you used for this? git diff is still befuddling
to me.

Brad

From bugzilla-daemon at portal.open-bio.org  Wed Mar 18 10:18:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Mar 2009 10:18:39 -0400
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903181418.n2IEIdIm003158@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-18 10:18 EST -------
Fix checked into CVS as Bio/PDB/Entity.py revision 1.26, marking as fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Wed Mar 18 11:07:42 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Mar 2009 15:07:42 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
Message-ID: <320fb6e00903180807u4a0f7a5aqaa91f20b40891ca4@mail.gmail.com>

On Mon, Mar 16, 2009 at 12:16 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files

That's in CVS now, Brad and I have used it a bit, but further testing
before the beta wouldn't hurt.

> Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g.
> align[1:2,5:-5]

Anyone want to try this out?
http://bugzilla.open-bio.org/show_bug.cgi?id=2551
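
For anyone wondering what align[1:2,5:-5] means in practice, here is a
toy sketch of two-dimensional indexing (rows first, then columns). The
ToyAlignment class and its return types are assumptions made up for
illustration; this is not the patch on Bug 2551, which may well behave
differently.

```python
class ToyAlignment(object):
    """Hypothetical stand-in for an alignment: a list of equal-length strings."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, tuple):
            # Two indices: the first selects rows, the second selects columns.
            row_index, col_index = index
            selected = self.rows[row_index]
            if isinstance(row_index, int):
                # A single row: plain string slicing gives a letter or substring.
                return selected[col_index]
            if isinstance(col_index, int):
                # A single column across several rows, returned as a string.
                return "".join(row[col_index] for row in selected)
            # Row slice and column slice: a smaller alignment.
            return ToyAlignment(row[col_index] for row in selected)
        if isinstance(index, slice):
            # A row slice on its own: a sub-alignment with all columns.
            return ToyAlignment(self.rows[index])
        # A single integer: that row as a string.
        return self.rows[index]


align = ToyAlignment(["ACGTACGTACGTAC",
                      "ACGAACGAACGAAC",
                      "TTGTACGTACGTTT"])
sub = align[1:2, 5:-5]  # second row only, dropping 5 columns at each end
```

Python passes the comma-separated indices to __getitem__ as a single
tuple, which is why the method checks for a tuple first.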

> Any other nominations for Biopython 1.50?

Other candidates with patches that have since come to mind:

Bug 2733 - Running unit tests where Biopython wasn't built from source
http://bugzilla.open-bio.org/show_bug.cgi?id=2733
This patch seemed OK from both my and Bruce's testing.

Bug 2738 - Speed up GenBank parsing, in particular location parsing
http://bugzilla.open-bio.org/show_bug.cgi?id=2738
I would want to run some tests with EMBL files before committing this.

Bug 2745 - Bio.GenBank.LocationParserError with a GenBank CON file
http://bugzilla.open-bio.org/show_bug.cgi?id=2745
I'd like to change CONTIG line parsing to just use a string (or a list
of strings).

Peter

From nuin at genedrift.org  Wed Mar 18 15:50:28 2009
From: nuin at genedrift.org (Paulo Nuin)
Date: Wed, 18 Mar 2009 15:50:28 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>	
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>	
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>	
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>	
	<20090315185443.GA30296@kunkel>	
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>	
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>	
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>	
	<8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>	
	<49BE8532.9040701@genedrift.org>
	<320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com>
Message-ID: <49C15084.8040208@genedrift.org>

Peter wrote:
> On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>   
>> No problem on Vista.
>>
>> Git (version 1.5.6.1-preview20080701)
>>
>> Paulo
>>     
>
> Hi Paulo,
>
> Could you be a bit more precise about the version you are using and
> where you got it from? i.e. Are you using cygwin or the Windows native
> port, http://code.google.com/p/msysgit/
>   
I'm using msysgit version 1.5.6.


> And did you mean in general you have no problems with git on Windows
> Vista, or have you also tried fetching Biopython from github,
> building, testing (and installing it)?  For example, are there any new
> line issues from the unit tests?  This is one area where CVS and git
> may differ slightly...
>   
I'm using Github to store a couple of projects and this version is 
working great. The Eclipse add-on is also fine. I cloned BioPython but 
haven't tried installing or building it.

Paulo

From bugzilla-daemon at portal.open-bio.org  Thu Mar 19 09:42:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Mar 2009 09:42:23 -0400
Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not
	support the output file argument
In-Reply-To: <bug-2654-42@http.bugzilla.open-bio.org/>
Message-ID: <200903191342.n2JDgN3p016978@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2654


yvan.strahm at bccs.uib.no changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yvan.strahm at bccs.uib.no



From bugzilla-daemon at portal.open-bio.org  Thu Mar 19 13:08:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Mar 2009 13:08:16 -0400
Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not
	support the output file argument
In-Reply-To: <bug-2654-42@http.bugzilla.open-bio.org/>
Message-ID: <200903191708.n2JH8GqS032350@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2654


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-19 13:08 EST -------
Fixed in Bio/Blast/NCBIStandalone.py CVS revision 1.86
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython

Note that the three tools themselves all use -o (lower case) for the output
file, but refer to it slightly differently:

$ ./rpsblast --help | grep " -o "
  -o  Output File for Alignment [File Out]  Optional
$ ./blastpgp --help | grep " -o "
  -o  Output File for Alignment [File Out]  Optional
$ ./blastall --help | grep " -o "
  -o  BLAST report Output File [File Out]  Optional

Our function for rpsblast already supported this argument under the name
"align_outfile" which I have therefore also used for blastpgp (this is a good
name as blastpgp outputs more than one type of file).

For blastall "align_outfile" doesn't seem entirely appropriate, and although it
is inconsistent I have gone for "outfile" instead.

Example usage:

#imports and setting up input parameters omitted
out_handle, err_handle = NCBIStandalone.blastall(blastall_exe, "blastp",
                                                 blastdb_nr, query_file,
                                                 expectation=0.000001,
                                                 nprocessors=1, filter="F",
                                                 outfile=output_file,
                                                 alignments=5, descriptions=5)
assert "" == err_handle.read()
assert "" == out_handle.read() #Important so we wait for BLAST to finish!
err_handle.close()
out_handle.close()
assert os.path.isfile(output_file)

count = 0
for blast_record in NCBIXML.parse(open(output_file)) :
    count += 1
print "Found %i BLAST results" % count



From biopython at maubp.freeserve.co.uk  Thu Mar 19 15:00:51 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Mar 2009 19:00:51 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090317213414.GK57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>

On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
>
>> Using the git command line tool, I was able to pull and merge Brad's
>> changes (as I had made no changes in the meantime this could be done
>> automatically), and then push the merged version back up to github on
>> my branch.  At this point my branch and brad's agreed once again, and
>> the "network" diagram no longer shows both.  Note that my branch now
>> includes a commit from Brad.
>
> Sweet. Glad that worked. I deleted my branch (edit->delete
> repository).

How long did it take to process?  I deleted mine (after attempting to
merge against the CVS mirror).  The delete was still in progress over
12 hours later!

> While doing so, I noticed that there is also a 'Repository
> Collaborators' section within the 'edit' page. So, another working
> model is to have multiple users simultaneously editing one forked
> revision. If you are already communicating on the work through the
> mailing list or wiki, this is more like CVS/SVN than the branching
> model.

Yes, this should be a fairly simple way to give all our current CVS
developers direct access to a master branch on github.

>> Now all this worked, but I was wondering if the github web interface
>> could have simplified any of this, if I'd only known where to click.
>> For example, does github offer any way to view a diff between two
>> branches?  Or, as I suspect, do they simply expect you to use the git
>> tools directly for this?
>
> What was the command you used for this? git diff is still befuddling
> to me.

I didn't actually figure that out (how to do a diff between two
branches on github).  And this afternoon github seems to be down, so I
haven't played with it any more.

Peter


From chris.lasher at gmail.com  Fri Mar 20 00:52:49 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Fri, 20 Mar 2009 00:52:49 -0400
Subject: [Biopython-dev] Help pages in Biopython wiki
Message-ID: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>

Would it be possible to get the help documentation installed for the
Biopython wiki?

http://biopython.org/wiki/Help

Chris

From lpritc at scri.ac.uk  Fri Mar 20 04:42:44 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Fri, 20 Mar 2009 08:42:44 +0000
Subject: [Biopython-dev] Help pages in Biopython wiki
In-Reply-To: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>
Message-ID: <C5E90784.1F50A%lpritc@scri.ac.uk>

Hi Chris,

That page doesn't exist, yet (click on the 'page' tab to see this), and no
pages link to it (see here:
http://biopython.org/wiki/Special:WhatLinksHere/Help)

What help were you expecting to see there?

L.

On 20/03/2009 04:52, "Chris Lasher" <chris.lasher at gmail.com> wrote:

> Would it be possible to get the help documentation installed for the
> Biopython wiki?
> 
> http://biopython.org/wiki/Help
> 
> Chris
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> 

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



From biopython at maubp.freeserve.co.uk  Fri Mar 20 06:41:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Mar 2009 10:41:49 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
Message-ID: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>

On Thu, Mar 19, 2009 at 7:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> Sweet. Glad that worked. I deleted my branch (edit->delete
>> repository).
>
> How long did it take to process?  I deleted mine (after attempting to
> merge against the CVS mirror).  The delete was still in progress over
> 12 hours later!

And the branch delete is still on-going :(

> ...  And this afternoon github seems to be down, so I haven't played with it any more.

It's back online again, but right now for me github is a bit of a damp
squid [*]. As my initial branch/fork of biopython still exists but is
being deleted, it seems in the meantime I can't create a new branch of
biopython.  Odd, and rather frustrating.  Hopefully it will sort itself
out shortly, and I can have another play with merging branches...

Peter

[*] For the benefit of non-native English speakers, or anyone whose sense
of humour works differently to mine, this was a pun, based on the English
phrase "damp squib" for a disappointing event, and the fact that github's
error page has some kind of cartoon squid/octopus-cat creature on it.


From dalloliogm at gmail.com  Fri Mar 20 07:15:21 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 20 Mar 2009 12:15:21 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
Message-ID: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>

On Fri, Mar 20, 2009 at 11:41 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Mar 19, 2009 at 7:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> Sweet. Glad that worked. I deleted my branch (edit->delete
>>> repository).
>>
>> How long did it take to process?  I deleted mine (after attempting to
>> merge against the CVS mirror).  The delete was still in progress over
>> 12 hours later!
>
> And the branch delete is still on-going :(
>
>> ...  And this afternoon github seems to be down, so I haven't played with it any more.
>
> It's back online again, but right now for me github is a bit of a damp
> squid [*]. As my initial branch/fork of biopython still exists but is
> being deleted, it seems in the meantime I can't create a new branch of
> biopython.

mmm are you referring to this:
- http://github.com/peterjc/biopython-seqio-quality/network
?

I can see it, and also fetch/pull changes from it..

I see that you have renamed your fork as seqio-quality. Ok, but I
think it is better to keep the fork's name as 'biopython', and then
create many branches inside it.

For example:

<create a fork on github>
git clone <yourforkurl>
cd biopython

# make some commits to your master branch:
touch testfile.txt
git add testfile.txt
git commit -a -m 'test file added'
# push the changes to your github repository ('origin' refers to github;
# see $(CWD)/biopython/.git/config)
git push origin master


# create a branch called 'experimental-seqio-quality', and switch to it:
# without arguments, git branch shows the list of branches and the current one:
git branch
# create the experimental-seqio-quality branch:
git branch experimental-seqio-quality
# switch to it:
git checkout experimental-seqio-quality
# check that experimental-seqio-quality is the current working branch:
git branch

# now you are working in the branch called 'experimental-seqio-quality'.
# All the changes you commit here will not be saved in the 'master' branch
# or the others, as long as you don't merge them:
touch seqio-parser
git add seqio-parser
git commit -a -m 'added seqioparser'
git push origin experimental-seqio-quality
# after pushing, git will create a new branch in github. Look for example
# at my fork here:
# - http://github.com/biopython/biopython/network

############

Here is how you can merge and compare your branch with someone else's
or with the biopython one:

# add a reference to biopython official branch
git remote add biopython git://github.com/biopython/biopython.git

# obtain the set of changes from the biopython branch, and merge them
git fetch biopython
git log master biopython/master
git diff master biopython/master
# with 'master' checked out, merge the fetched biopython changes into it:
git merge biopython/master

git remote add peter git://github.com/peterjc/biopython-seqio-quality.git
git fetch peter # there should be a way to do this without having to fetch
git diff master peter/master

For references, look at this guide:
http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo


> Odd, and rather frustrating.  Hopefully it will sort itself out shortly,
> and I can have another play with merging branches...
>
> Peter
>
> [*] For the benefit of non-native English speakers, or anyone whose sense
> of humour works differently to mine, this was a pun, based on the English
> phrase "damp squib" for a disappointing event, and the fact that github's
> error page has some kind of cartoon squid/octopus-cat creature on it.
>


--

My blog on bioinformatics (now in English): http://bioinfoblog.it


From cymon.cox at googlemail.com  Fri Mar 20 07:16:27 2009
From: cymon.cox at googlemail.com (Cymon Cox)
Date: Fri, 20 Mar 2009 11:16:27 +0000
Subject: [Biopython-dev] Test - ignore
Message-ID: <7265d4f0903200416o7c8135ddrfae4aad723bd17b7@mail.gmail.com>


From biopython at maubp.freeserve.co.uk  Fri Mar 20 07:32:15 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Mar 2009 11:32:15 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
Message-ID: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>

>> As my initial branch/fork of biopython still exists but is being
>> deleted, it seems in the meantime I can't create a new branch
>> of biopython.
>
> mmm are you referring to this:
> - http://github.com/peterjc/biopython-seqio-quality/network
> ?
>
> I can see it, and also fetch/pull changes from it..

True, the network page is still there for me. But
http://github.com/peterjc/biopython-seqio-quality/ which redirects to
http://github.com/peterjc/biopython-seqio-quality/tree/master
shows me just a "This repository is being deleted" page.

> I see that you have renamed your fork as seqio-quality. Ok, but I
> think it is better to keep the fork's name as 'biopython', and then
> create many branches inside it.

I don't think I had entirely understood github's use of fork versus branch.
I'll have to do some more reading and try again once my account has
settled down.  Thanks for the details in your email.

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 08:18:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 08:18:53 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201218.n2KCIrSX026346@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2009-03-20 08:18 EST -------
(In reply to comment #7)
> (In reply to comment #6)
> > If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
> > it from there. If not, it tries to download it. This may fail if the servers
> > are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
> > Biopython is installed), you won't run into this problem.
> 
> I was just looking at this on my Windows XP Python 2.3 machine, and when it
> tried to download missing DTD files it was just using a filename as the URL.

In hindsight, I wonder if trying to download missing DTD files is really a good
idea. Suppose a user does a large number of Entrez queries, and saves the
results as XML files. Then, he tries to parse each of those XML files. If a DTD
file is missing, then Bio.Entrez will try to download the same DTD file for
each XML file it is trying to parse. This is not only wasteful, but also
bypasses Entrez's rule of no more than three accesses per second. In addition,
this is fragile. The XML files typically contain a full url to the needed DTD.
But many of Entrez's DTD files contain references to other DTD files, and those
references can be relative. When Bio.Entrez gets such a relative path to where
the DTD file is located, it is difficult to figure out the absolute path to the
DTD. Now we are looking for it in http://www.ncbi.nlm.nih.gov/dtd/, but this
does not seem to contain all required DTDs.

It may therefore make sense not to download the DTD file, but to raise an
Exception with a helpful error message, specifying which DTD file is missing,
where it can possibly be found, and where the DTD file can be installed. It
requires some more effort from the user, but it is more robust, won't break
Entrez' rules, and is more efficient.
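
A minimal sketch of what such a lookup could look like, assuming the DTD
file name is simply the last component of the URL given in the XML file.
The names MissingDTDError and find_local_dtd are hypothetical and are not
taken from Bio.Entrez:

```python
import os


class MissingDTDError(Exception):
    """Hypothetical exception name, used for illustration only."""


def find_local_dtd(dtd_url, dtd_dir):
    """Return the path of a locally installed DTD, or raise a helpful error.

    dtd_url is the URL found in the XML file; dtd_dir is the local
    directory where DTD files are installed (for example Bio/Entrez/DTDs).
    Nothing is downloaded here.
    """
    # Assumption: the file name is the last path component of the URL.
    filename = dtd_url.rstrip("/").split("/")[-1]
    path = os.path.join(dtd_dir, filename)
    if os.path.isfile(path):
        return path
    # Say which file is missing, where to look for it, and where to put it.
    raise MissingDTDError(
        "DTD file %s is missing. Try downloading it from %s "
        "and saving it in %s." % (filename, dtd_url, dtd_dir)
    )
```

Since nothing is fetched at parse time, parsing many saved XML files
cannot trigger repeated downloads of the same DTD.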


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From chapmanb at 50mail.com  Fri Mar 20 08:55:18 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 20 Mar 2009 08:55:18 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
Message-ID: <20090320125518.GA351@sobchak.mgh.harvard.edu>

Hi all;

> >> As my initial branch/fork of biopython still exists but is being
> >> deleted, it seems in the meantime I can't create a new branch
> >> of biopython.
[...]
> True, the network page is still there for me. But
> http://github.com/peterjc/biopython-seqio-quality/ which redirects to
> http://github.com/peterjc/biopython-seqio-quality/tree/master
> shows me just a "This repository is being deleted" page.

Peter, the repository deletion was very quick for me, so it looks like it
got stuck somewhere with the GitHub downtime. Does this help for getting it
removed:

http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/

> > I see that you have renamed your fork as seqio-quality. Ok, but I
> > think it is better to keep the fork's name as 'biopython', and then
> > create many branches inside it.
> 
> I don't think I had entirely understood github's use of fork versus branch.
> I'll have to do some more reading and try again once my account has
> settled down.  Thanks for the details in your email.

Wow, now I am mad confused. I thought forks and branches were
conceptually the same. Giovanni, it seems like you are suggesting one
branch (the GitHub fork) and then a second branch (the git branch 
command). We were thinking of a standard case as:

1. Fork the Biopython trunk at GitHub. Name this something so it
makes sense what the fork/branch is for.
2. Work on the fork/branch. If you want, invite others to work on it
with you.
3. When finished, be sure you are up to date with the master
Biopython trunk.
4. Submit the fork/branch for inclusion in Biopython.
5. Once included, delete the fork/branch.

Which parts of this fall out of "standard" git practice? In general,
we should strive to keep this as simple as possible. If using Git is
complicated then we are losing a lot of our advantage over CVS/patches.

Giovanni, the example commands were very helpful; I added details to the Git
page on how to see diffs of branches:

http://biopython.org/wiki/GitMigration#Evaluating_changes

Brad

From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 09:57:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 09:57:00 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201357.n2KDv0JJ001146@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 09:57 EST -------
(In reply to comment #10)
> 
> In hindsight, I wonder if trying to download missing DTD files is really a
> good idea. Suppose a user does a large number of Entrez queries, and saves
> the results as XML files. Then, he tries to parse each of those XML files.
> If a DTD file is missing, then Bio.Entrez will try to download the same DTD
> file for each XML file it is trying to parse. This is not only wasteful, but
> also bypasses Entrez's rule of no more than three accesses per second.

Very true.  We should be able to enforce the access limit here without too much
trouble.  More generally, it would make sense for the DTD file to be saved -
ideally to the python site-packages but as we may not have write access, at
least to a cache.
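
As a rough sketch of the idea (not the code in CVS), a cache plus a
simple rate limiter could look like this. The names RateLimiter and
get_dtd, and the download callable, are assumptions made for
illustration:

```python
import os
import time


class RateLimiter(object):
    """Space calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=3, clock=time.time, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def get_dtd(filename, cache_dir, download, limiter):
    """Return the DTD contents, downloading only if not cached yet.

    download is any callable taking the file name and returning bytes;
    how it talks to the server is deliberately left out of this sketch.
    """
    path = os.path.join(cache_dir, filename)
    if not os.path.isfile(path):
        limiter.wait()  # respect the access limit before each download
        data = download(filename)
        with open(path, "wb") as handle:
            handle.write(data)
    with open(path, "rb") as handle:
        return handle.read()
```

With this arrangement each DTD is fetched at most once per cache
directory, and any downloads that do happen are spaced out.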

> In addition, this is fragile. The XML files typically contain a full url to
> the needed DTD.   But many of Entrez's DTD files contain references to other
> DTD files, and those references can be relative. When Bio.Entrez gets such a
> relative path to where the DTD file is located, it is difficult to figure out
> the absolute path to the DTD. Now we are looking for it in
> http://www.ncbi.nlm.nih.gov/dtd/, but this does not seem to contain all
> required DTDs.

When I looked into the DTD URLs, I didn't see the NCBI using any relative
links, but they may have changed things since.  Additionally the NCBI have a
(different but overlapping) set of DTD files at:
http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/

Can we get some python XML/DTD library to resolve these links for us?

> It may therefore make sense not to download the DTD file, but to raise an
> Exception with a helpful error message, specifying which DTD file is missing,
> where it can possibly be found, and where the DTD file can be installed. It
> requires some more effort from the user, but it is more robust, won't break
> Entrez' rules, and is more efficient.

Biopython 1.49 generally failed to download missing DTD files.  Right now the
current code in CVS does much better at coping with missing DTD files, but in a
very wasteful way.  In either version, it does at least issue warnings,
indicating something is not right.

As a user, I would prefer Bio.Entrez to download missing DTD files on demand
AND SAVE THEM.  As a developer I can see this is rather complicated, and you
are right Michiel - a simple error message with instructions is much more
straightforward.

Note that the error might also suggest upgrading to the latest Biopython, or
reporting the issue to us - but it would then be a very long error message!

If you want to switch to a helpful error message for missing DTD files, I'm OK
with that.  We could also ship the current code for Biopython 1.50.



From dalloliogm at gmail.com  Fri Mar 20 10:25:41 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 20 Mar 2009 15:25:41 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
	<20090320125518.GA351@sobchak.mgh.harvard.edu>
Message-ID: <5aa3b3570903200725p1437ceem6a538af640c52ced@mail.gmail.com>

On Fri, Mar 20, 2009 at 1:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
>
>> >> As my initial branch/fork of biopython still exists but is being
>> >> deleted, it seems in the meantime I can't create a new branch
>> >> of biopython.
> [...]
>> True, the network page is still there for me. But
>> http://github.com/peterjc/biopython-seqio-quality/ which redirects to
>> http://github.com/peterjc/biopython-seqio-quality/tree/master
>> shows me just a "This repository is being deleted" page.
>
> Peter, the repository deletion was very quick for me, so it looks like it
> got stuck somewhere with the GitHub downtime. Does this help for getting it
> removed:
>
> http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/
>
>> > I see that you have renamed your fork as seqio-quality. Ok, but I
>> > think it is better to keep the fork's name as 'biopython', and then
>> > create many branches inside it.
>>
>> I don't think I had entirely understood github's use of fork versus branch.
>> I'll have to do some more reading and try again once my account has
>> settled down.  Thanks for the details in your email.
>
> Wow, now I am mad confused. I thought forks and branches were
> conceptually the same.

Consider that the term "fork" is specific to github, and has nothing
to do with git. There is no 'git fork' command.
When you do a 'fork' in github, what it does is create a personal
'space' on your account on github, to host all your personalizations,
including new commits and also new branches of development.
It is a kind of 'working space', that indicates all the work you have done.

I understand it seems a bit complicated at first :-( but I think that,
without using github, it is even more difficult to understand these
things.

In your account you can have more than one experimental branch. For
example, I can create a branch called 'experimental-xyz-parser',
another called 'personal modifications', and keep the master branch as
it is (or rename it).

If you want to contribute to my 'xyz parser', you can fetch this
branch into your space, with a command like:
$: git remote add giovanni <my url on github>
$: git pull giovanni master:experimental-xyz-parser # (not sure about this last command)

This should create a branch called 'experimental-xyz-parser' on your
computer, so you can work with it, make modifications, and later push
it to github (where it will appear in the network graph).


> Giovanni, it seems like you are suggesting one
> branch (the GitHub fork) and then a second branch (the git branch
> command). We were thinking of a standard case as:
>
> 1. Fork the Biopython trunk at GitHub. Name this something so it
> makes sense what the fork/branch is for.
> 2. Work on the fork/branch. If you want, invite others to work on it
> with you.
> 3. When finished, be sure you are up to date with the master
> Biopython trunk.
> 4. Submit the fork/branch for inclusion in Biopython.
> 5. Once included, delete the fork/branch.
>
> Which parts of this fall out of "standard" git practice? In general,
> we should strive to keep this as simple as possible. If using Git is
> complicated then we are losing a lot of our advantage over CVS/patches.
>
> Giovanni, the example commands were very helpful; I added details to the Git
> page on how to see diffs of branches:
>
> http://biopython.org/wiki/GitMigration#Evaluating_changes
>
> Brad


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 10:50:49 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 10:50:49 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201450.n2KEonrB005712@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 10:50 EST -------
Code is in CVS with unit tests.  Marking as fixed.



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 10:53:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 10:53:37 -0400
Subject: [Biopython-dev] [Bug 2770] suggestion: raise a warning if
	Entrez.email is not set
In-Reply-To: <bug-2770-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201453.n2KErbfO006014@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2770


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 10:53 EST -------
Resolved as won't fix (unless the NCBI change their guidelines).



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 11:49:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 11:49:52 -0400
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201549.n2KFnqs8011031@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 11:49 EST -------
(In reply to comment #0)
> (1) All the Bio.Graphics "write to file/handle" functions to accept any of the
> supported file formats (like Bio.Graphics.GenomeDiagram), which would require
> renderPM at run time for the bitmap formats (see Bug 2710).  They should share
> some code for mapping format names to ReportLab rendering module.  This would
> be easy to do without changing the existing mix of method names.

That should be working in CVS now.

> (2) Update the docstrings for the "write to file/handle" functions to make it
> clear they can accept a filename OR a handle (a result of the underlying
> reportlab renderer's drawToFile function's behaviour - see note below).

This was done in CVS some time ago (comment 2)

> (3) Standardise on the method naming (and perhaps deprecate the old methods). 
> Using "write" seems to be a sensible choice based on the current names used in
> Bio.Graphics.

This one is more difficult.  GenomeDiagram uses a two step system - draw then
write, where draw creates the ReportLab drawing object, and write saves it to a
file.  I'm going to leave this for another day...

Marking bug as fixed.



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 13:32:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:32:50 -0400
Subject: [Biopython-dev] [Bug 2795] New: Add commit, rollback,
	close to DBServer object
Message-ID: <bug-2795-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795

           Summary: Add commit, rollback, close to DBServer object
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The DBServer object is defined in file BioSQL/BioSeqDatabase.py and it might
make sense to add the following methods to it:

    def commit(self):
        """Commits the current transaction to the database."""
        return self.adaptor.commit()

    def rollback(self):
        """Rolls back the current transaction."""
        return self.adaptor.rollback()

    def close(self):
        """Close the connection. No further activity possible."""
        return self.adaptor.close()

I think the adaptor is intended to hide internal implementation details, so we
shouldn't be forcing people to use it directly for transaction support.
Consider this example from http://www.biopython.org/wiki/BioSQL currently:

from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                     passwd = "", host = "localhost", db="bioseqdb")
db = server["orchids"]
handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
rettype="genbank")
db.load(SeqIO.parse(handle, "genbank"))
server.adaptor.commit()

The last line would become just:

server.commit()

This seems cleaner.  Patch to follow...
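
To illustrate the delegation without a database, here is a cut-down
sketch. FakeAdaptor is a stand-in that only records calls, the DBServer
class below is not the real one from BioSQL/BioSeqDatabase.py, and
load_or_rollback is a hypothetical helper showing how calling code might
use the new methods:

```python
class FakeAdaptor(object):
    """Stand-in for the real adaptor; it only records what was called."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


class DBServer(object):
    """Illustration only: forwards transaction calls to the adaptor."""

    def __init__(self, adaptor):
        self.adaptor = adaptor

    def commit(self):
        return self.adaptor.commit()

    def rollback(self):
        return self.adaptor.rollback()

    def close(self):
        return self.adaptor.close()


def load_or_rollback(server, load):
    """Run load(); commit on success, roll back and re-raise on failure."""
    try:
        load()
    except Exception:
        server.rollback()
        raise
    server.commit()
```

The calling code never touches server.adaptor, which is the point of
the proposed change.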



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 13:34:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:34:14 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback,
	close to DBServer object
In-Reply-To: <bug-2795-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201734.n2KHYEZR018864@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 13:34 EST -------
Created an attachment (id=1263)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view)
BioSQL patch

Patch to implement the change described.  Tested with MySQL only.

Cymon - what do you think of this?  And does it work on PostgreSQL?



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 13:59:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:59:14 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback,
	close to DBServer object
In-Reply-To: <bug-2795-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201759.n2KHxENC020654@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795


------- Comment #2 from cymon.cox at gmail.com  2009-03-20 13:59 EST -------
(In reply to comment #1)
> Created an attachment (id=1263)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view) [details]
> BioSQL patch
> 
> Patch to implement the change described.  Tested with MySQL only.
> 
> Cymon - what do you think of this?  And does it work on PostgreSQL?

I think it makes sense, and works on PostgreSQL with the psycopg2 driver.
C.



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 14:07:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 14:07:55 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback,
	close to DBServer object
In-Reply-To: <bug-2795-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201807.n2KI7t37021424@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 14:07 EST -------
(In reply to comment #2)
> I think it makes sense, and works on PostgreSQL with the psycopg2 driver.
> C.

Great, checked in, marking as fixed.  We should update the wiki once Biopython
1.50 is out...



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 14:52:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 14:52:44 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to
	stderr, not stdout
In-Reply-To: <bug-2754-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201852.n2KIqiBO024589@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2754


------- Comment #10 from eric.talevich at gmail.com  2009-03-20 14:52 EST -------
Here's the github branch where I'm working on this bug:

http://github.com/etal/biopython/tree/master

I've applied the two patches attached here and converted the test script from
print-and-compare to unittest. The tests pass now, but I haven't added checks
for specific parsing errors, just the general PDBConstructionError raised when
parsing the example file with PERMISSIVE=0.

The warnings are hidden during tests, as expected, but in this branch the
PDBParser warnings are noticeably more annoying during normal use. Fixing this
will require more tweaking in Bio/PDB/PDBParser.py -- I'll do that in the same
branch, since I don't think you'd want to merge one fix without the other. Same
goes for the __debug__ protection in StructureBuilder.py.
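
For readers unfamiliar with the mechanism, here is a small sketch of
issuing parser warnings through the standard warnings module (which
writes to stderr by default) and silencing them around a block of code.
ConstructionWarning, parse_line and parse_quietly are made-up names for
illustration and do not reflect the code in the branch:

```python
import warnings


class ConstructionWarning(UserWarning):
    """Hypothetical stand-in for a PDB parser warning class."""


def parse_line(line):
    """Return the record name, warning instead of printing on bad input."""
    if len(line) < 6:
        warnings.warn("Short record ignored: %r" % line, ConstructionWarning)
        return None
    return line[:6].strip()


def parse_quietly(lines):
    """Parse lines with this warning category suppressed, e.g. in tests."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConstructionWarning)
        return [parse_line(line) for line in lines]
```

Using a dedicated warning category lets callers filter parser warnings
without hiding unrelated ones.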



From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 16:08:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 16:08:37 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to
	stderr, not stdout
In-Reply-To: <bug-2754-42@http.bugzilla.open-bio.org/>
Message-ID: <200903202008.n2KK8bpj029413@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2754


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 16:08 EST -------
(In reply to comment #10)
> Here's the github branch where I'm working on this bug:
> http://github.com/etal/biopython/tree/master

I've had a quick look on github, and this looks interesting and I hope we can
get it into Biopython proper before too long.

Peter



From biopython at maubp.freeserve.co.uk  Fri Mar 20 16:44:34 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Mar 2009 20:44:34 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
	<20090320125518.GA351@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com>

On Fri, Mar 20, 2009 at 12:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter, the repository deletion was very quick for me, so it looks like it
> got stuck somewhere with the GitHub downtime.

They've fixed it - I picked a bad day to delete a "fork".

Giovanni wrote:
>> > I see that you have renamed your fork as seqio-quality. Ok, but I
>> > think it is better to keep the fork's name as 'biopython', and then
>> > create many branches inside it.

Agreed - when I did that, I hadn't appreciated github's distinction between
branches and forks.

Peter wrote:
>> I don't think I had entirely understood github's use of fork versus branch.
>> I'll have to do some more reading and try again once my account has
>> settled down.  Thanks for the details in your email.

Brad wrote:
> Wow, now I am mad confused. I thought forks and branches were
> conceptually the same. Giovanni, it seems like you are suggesting one
> branch (the GitHub fork) and then a second branch (the git branch
> command). We were thinking of a standard case as:
>
> 1. Fork the Biopython trunk at GitHub. Name this something so it
> makes sense what the fork/branch is for.
> 2. Work on the fork/branch. If you want, invite others to work on it
> with you.
> 3. When finished, be sure you are up to date with the master
> Biopython trunk.
> 4. Submit the fork/branch for inclusion in Biopython.
> 5. Once included, delete the fork/branch.

If I understand correctly, a potential contributor does this:
1. Fork Biopython trunk at GitHub, which will give you your own
public repository (aka a "fork" in github's terminology), called
by default contributorname/biopython, containing initially a
single master branch, e.g.
http://github.com/peterjc/biopython/tree/master
2. Using the git command line tool, create a branch within your
repository to work on a problem, say bug2551, and upload this
branch to your github account. e.g.
http://github.com/peterjc/biopython/tree/bug2551 (I presume)
3. Work on your code, and commit changes to your bug2551 branch
and push these up to your github account.
4. Once you are happy, submit this bug2551 branch for inclusion in
Biopython (in the short term via Bugzilla, but if/when we have moved
to github fully, as a pull request to the main biopython master,
or if appropriate the master of the maintainer of that module).
5. Once the changes are in the main Biopython, you can delete
the bug2551 branch (but not the whole "fork" which may contain
other branches).

Almost the same... I'll try this shortly (maybe Monday).

Peter

From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 00:13:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 00:13:10 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903210413.n2L4DAgf028509@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp  2009-03-21 00:13 EST -------
(In reply to comment #11)
I've changed Parser.py to show an informative error message about the missing
DTD file, where most likely it can be found, and where to install it. Since
this is probably the best we can do, I'm marking this bug as fixed.



From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 00:24:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 00:24:43 -0400
Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files
	from dbSNP (snp database)
In-Reply-To: <bug-2771-42@http.bugzilla.open-bio.org/>
Message-ID: <200903210424.n2L4OhOA029253@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2771


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2009-03-21 00:24 EST -------
(In reply to comment #0)
> >>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml')
> >>> cont = handle.read()
> >>> print cont
> '<?xml version="1.0"?>
> <ExchangeSet...>
> ...
> </ExchangeSet>
> 
With Bio.Entrez currently in CVS, Entrez.read does not raise an exception, but
simply returns an empty record. The problem is that EFetch from the SNP
database uses an XML Schema instead of a DTD to describe the contents of the
XML file, as shown in the first few lines of the XML file:

<?xml version="1.0"?>
<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"
xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum
http://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">

The last url shows the XML Schema.
All other Entrez Utilities I've seen so far use a DTD instead of an XML Schema.
Hence, Entrez.read only has a DTD parser to find out how to interpret the XML
file. In principle, Bio.Entrez can be modified to add an XML Schema parser.
While this is not trivial, it is probably not super difficult. Marco, would you
be willing to write such a parser? If you have a parser for the XML Schema, I
can show you how to integrate it with Bio.Entrez.
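
As a starting point, telling the two cases apart is straightforward with
expat. The sketch below is only an illustration (describe_grammar is a
hypothetical name, and this is not how Bio.Entrez is organised): it
reports whether a document declares a DTD or an xsi:schemaLocation on
its root element.

```python
from xml.parsers import expat

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def describe_grammar(xml_text):
    """Return ("dtd", url), ("schema", url) or (None, None)."""
    result = {"kind": None, "location": None, "seen_root": False}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        # Called for a <!DOCTYPE ...> declaration.
        result["kind"] = "dtd"
        result["location"] = system_id

    def start_element(name, attributes):
        # Only the root element is inspected.
        if result["seen_root"]:
            return
        result["seen_root"] = True
        # With namespace processing on, expat reports attribute names
        # as "<namespace URI><separator><local name>".
        location = attributes.get(XSI_NS + " schemaLocation")
        if location and result["kind"] is None:
            result["kind"] = "schema"
            # schemaLocation holds "namespace URL" pairs; take the URL
            # of the last pair.
            result["location"] = location.split()[-1]

    parser = expat.ParserCreate(namespace_separator=" ")
    parser.StartDoctypeDeclHandler = start_doctype
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return result["kind"], result["location"]
```

This only detects which kind of description is used; interpreting the
XML Schema itself is the part that still needs a parser.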



From mjldehoon at yahoo.com  Sat Mar 21 00:47:07 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Mar 2009 21:47:07 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com>
Message-ID: <334920.51680.qm@web62402.mail.re1.yahoo.com>


I think it is good if we catch more errors in Bio.Entrez, but I think the error catching should be done by the parser, not when retrieving.

As you show, NCBI Entrez returns error messages in various different formats: plain text, HTML, incorrect XML, broken XML. Since there are many ways to access NCBI Entrez, there may be other styles of error messages that we don't know about. Then there is the added complication of accessing NCBI Entrez to get information in formats other than XML, e.g. GenBank files. And all this may be changed over time by NCBI.

Since the error message is ill-defined, code trying to identify error messages won't be robust. On the other hand, the format of files expected by a given parser is well-defined: Either the file agrees with the format expected by the parser, or it doesn't; if it doesn't, then that's an error. We may not be able to extract the exact error message returned by NCBI, but a parser for format XYZ can tell you that the file is not in format XYZ. Maybe the XML parser can say it doesn't look like an XML file, but that's about it.

Once NCBI Entrez starts to return errors in a uniform format, we can modify our parsers to find out the exact error message. Until that happens, trying to do so on our side will not be robust.
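
A minimal sketch of the parser-side check argued for above, assuming the
expected format is XML: rather than trying to recognise each style of error
message, the check only asks whether the data is well-formed XML. The names
NotXMLError and check_xml are hypothetical and not part of Bio.Entrez.

```python
from xml.parsers import expat


class NotXMLError(ValueError):
    """Hypothetical exception: the data is not well-formed XML."""


def check_xml(data):
    """Raise NotXMLError if data cannot be parsed as XML."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except expat.ExpatError as err:
        # The parser cannot say what NCBI meant, only that this is not XML.
        raise NotXMLError("Input does not look like XML: %s" % err)
```

Note that a well-formed HTML error page would pass this check, so a real
parser would also have to verify that the document has the structure it
expects (for example, the expected root element).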

--Michiel


--- On Tue, 3/10/09, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: [Biopython-dev] Bio.Entrez catching more errors
> To: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> Date: Tuesday, March 10, 2009, 7:40 PM
> Hi All,
> 
> It occurred to me that the Bio.Entrez._open function can look at the
> retmode argument (if present) and spot if there is a mismatch between
> the requested format (e.g. XML, HTML, text or asn.1) and the actual
> data the NCBI returned.  Something along the following lines could be
> added to the end of the _open function in Bio/Entrez/__init__.py to
> achieve this:
> 
>     elif "retmode" in params and params["retmode"].lower()=="html" \
>     and not data.lower().startswith("<html") \
>     and not data.lower().startswith("<!doctype html") :
>         raise TypeError("Requested HTML, but didn't get it: %s..." % data)
>     elif "retmode" in params and params["retmode"].lower()=="xml" \
>     and not data.lower().startswith("<?xml") :
>         raise TypeError("Requested XML, but didn't get it: %s..." % data)
>     elif "retmode" in params and params["retmode"] \
>     and params["retmode"].lower()!="xml" \
>     and data.lower().startswith("<?xml") :
>         raise TypeError("Didn't request XML, but got it: %s..." % data)
>     elif "retmode" in params and params["retmode"] \
>     and params["retmode"].lower()!="html" \
>     and (data.lower().startswith("<html") or \
>          data.lower().startswith("<!doctype html")):
>         #Expected for some error pages (e.g. the Bad Gateway caught above)
>         raise TypeError("Didn't request HTML, but got it: %s..." % data)
> 
> I'm sure my XML/HTML detection could be made more robust here - I hope
> the principle is clear.  My motivation is that I have noticed the NCBI
> can return HTML error pages, and while we do catch some of these
> explicitly (e.g. Bad Gateway, or Service Unavailable), I think any
> HTML page when the user asked for XML, text or asn.1 should be
> treated as an error.  Similarly, not getting XML when you ask for it etc.
> 
> Note that by raising the exception including the message text it
> should be much easier to diagnose these failures.  As a tiny
> refinement to the above code, we should only add the "..." if there is
> more text to follow - this isn't always the case.
> 
> e.g. The following give an HTML error page (while some databases like
> "protein" are better behaved in this respect):
> >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
> >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()
> 
> Similarly, these give an XML like fragment (which is not a valid XML
> file in itself - arguably an NCBI bug; some databases like "protein"
> are better behaved in this respect):
> >>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
> >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
> >>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
> >>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()
> 
> My suggested change to Bio.Entrez would also catch the following
> examples (using an invalid database) where the NCBI ignore the retmode
> and return an HTML help page:
> >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
> >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()
> 
> In a less clear cut example, this would flag the following as an error
> as the NCBI seem to return ASN.1 text instead of HTML here:
> >>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()
> 
> Overall, I think this change should catch lots of errors
> which
> otherwise may not be detected until later (e.g. while
> trying to parse
> the file).
> 
> --------------------------------------------------------------------------------------------------
> 
> On another point, should we catch these responses as errors?
> 
> >>> efetch(db="snp", id="123456").read()
> '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> >>> efetch(db="snp", id="123456", retmode="html").read()
> '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> >>> efetch(db="snp", id="123456", retmode="xml").read()
> '<?xml version="1.0"?>\n<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1: id: 123456 Error occurred: cannot get document summary\n\n</ExchangeSet>'
> >>> efetch(db="snp", id="123456", retmode="text").read()
> '1: id: 123456 Error occurred: cannot get document summary\n'
> 
> and,
> >>> print efetch(db="homologene", retmode="html", id="fake").read()
> <html>
> <body>
> <br/><h2>Error occurred: Empty id list - nothing todo</h2>...
> 
> Looking for the string "Error occurred: " looks fairly safe here, and
> should cover a range of entries.  Of course, you can imagine false
> positives too, e.g. a valid PUBMED plain text record for a tutorial
> article with a title like "Yikes! An Error Occurred: A beginner's
> Guide To Defensive Programming." could match.
> 
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From mjldehoon at yahoo.com  Sat Mar 21 00:54:08 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Mar 2009 21:54:08 -0700 (PDT)
Subject: [Biopython-dev] Bio.Enzyme (was: Re:  Bio.ExPASy)
In-Reply-To: <76595.11423.qm@web62404.mail.re1.yahoo.com>
Message-ID: <517737.76119.qm@web62403.mail.re1.yahoo.com>


I've created a simplified version of the parser in Bio.Enzyme in Bio.ExPASy.Enzyme. The idea behind it is to collect all parsers related to ExPASy databases in Bio.ExPASy so that they can be found more easily by users.

Bio.ExPASy.Enzyme works essentially the same as Bio.Enzyme, but I've done a few things a bit differently. The biggest change is probably that Bio.Enzyme stores information as attributes to a record, whereas Bio.ExPASy.Enzyme has a Record derived from a dictionary, and stores information in the dictionary (same as Bio.Medline). Does anybody have any objection if Bio.ExPASy.Enzyme becomes the "official" parser for ExPASy's Enzyme database? If not, I'll modify the documentation and tests accordingly, and start the deprecation process for Bio.Enzyme.
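
To illustrate the difference between the two designs, here is a minimal
sketch of an attribute-based record next to a dictionary-based one. The
class names and the choice of keys ("ID", "DE", "AN", taken from the line
codes of the ENZYME flat file) are assumptions for illustration only; the
actual Bio.Enzyme and Bio.ExPASy.Enzyme classes may differ.

```python
class AttributeRecord(object):
    """Hypothetical record that stores each field as an attribute."""

    def __init__(self):
        self.ID = ""
        self.DE = ""
        self.AN = []


class DictRecord(dict):
    """Hypothetical record derived from dict; fields are dictionary entries."""

    def __init__(self):
        dict.__init__(self)
        self["ID"] = ""
        self["DE"] = ""
        self["AN"] = []


record = DictRecord()
record["ID"] = "1.1.1.1"
record["DE"] = "Alcohol dehydrogenase."
record["AN"].append("Aldehyde reductase.")

# With a dict-based record, generic code can loop over all fields
# without knowing their names in advance.
fields = sorted(record.keys())
```

One practical consequence of the dictionary design is that a parser can
store fields it does not know about under their own key, without needing a
matching attribute on the class.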

--Michiel

--- On Sun, 3/15/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: [Biopython-dev] Bio.ExPASy
> To: biopython-dev at biopython.org
> Date: Sunday, March 15, 2009, 6:24 AM
> Hi everybody,
> 
> As discussed previously, I have moved the Bio.Prosite code
> to Bio.ExPASy, and I've added a ScanProsite module to
> Bio.ExPASy. I guess Bio.Enzyme should also move to
> Bio.ExPASy. See
> 
> http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html
> 
> for the documentation of Biopython as currently in CVS.
> 
> --Michiel.


From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 01:05:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 01:05:19 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <bug-2759-42@http.bugzilla.open-bio.org/>
Message-ID: <200903210505.n2L55Jb0031713@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2759


------- Comment #8 from eric.talevich at gmail.com  2009-03-21 01:05 EST -------
Marco & Peter, have either of you applied these patches to a git branch yet? My
branch for Bug 2754 and related changes also converts test_PDB.py to unittest. 
(I silence the warnings by calling warnings.simplefilter('ignore') in the setUp
method.) I'd like to try cherry-picking this commit if it's available on
github.



From mjldehoon at yahoo.com  Sat Mar 21 01:33:42 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Mar 2009 22:33:42 -0700 (PDT)
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu>
Message-ID: <587027.97686.qm@web62408.mail.re1.yahoo.com>


> Which parts of this fall out of "standard" git
> practice? In general,
> we should strive to keep this as simple as possible. If
> using Git is
> complicated then we are losing a lot of our advantage over
> CVS/patches.

I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though.

--Michiel.


From idoerg at gmail.com  Sat Mar 21 01:55:36 2009
From: idoerg at gmail.com (Iddo Friedberg)
Date: Fri, 20 Mar 2009 22:55:36 -0700
Subject: [Biopython-dev] It's out!
Message-ID: <49C48158.9060004@gmail.com>

I'm first to announce this.... hehehe

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp163v1

-- 
Iddo Friedberg Ph.D.
Atkinson Hall MC 0446
University of California San Diego
9500 Gilman Dr.
La Jolla, CA 92093-0446 USA
http://iddo-friedberg.net

From dalloliogm at gmail.com  Sat Mar 21 09:57:54 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 21 Mar 2009 14:57:54 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
	<20090320125518.GA351@sobchak.mgh.harvard.edu>
	<320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com>
Message-ID: <5aa3b3570903210657v46b1b1bbj80c013b83ff635e3@mail.gmail.com>

On Fri, Mar 20, 2009 at 9:44 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> If I understand correctly, a potential contributor does this:
> 1. Fork Biopython trunk at GitHub, which will give you your own
> public repository (aka a "fork" in github's terminology), called
> by default contributorname/biopython, containing initially a
> single master branch, e.g.
> http://github.com/peterjc/biopython/tree/master
> 2. Using the git command line tool, create a branch within your
> repository to work on a problem, say bug2551, and upload this
> branch to your github account. e.g.
> http://github.com/peterjc/biopython/tree/bug2551 (I presume)
> 3. Work on your code, and commit changes to your bug2551 branch
> and push these up to your github account.
> 4. Once you are happy, submit this bug2551 branch for inclusion in
> Biopython (in the short term via Bugzilla, but if/when we have moved
> to github fully, as a pull request to the main biopython master,
> or if appropriate the master of the maintainer of that module).
> 5. Once the changes are in the main Biopython, you can delete
> the bug2551 branch (but not the whole "fork" which may contain
> other branches).


Yes, I think this is the procedure.
It is a good idea to create a branch with a bug's name, so more people
can work at the same time on the same fix.


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 10:32:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 10:32:41 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <bug-2759-42@http.bugzilla.open-bio.org/>
Message-ID: <200903211432.n2LEWfXP000985@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2759


------- Comment #9 from dalloliogm at gmail.com  2009-03-21 10:32 EST -------
(In reply to comment #8)
> Marco & Peter, have either of you applied these patches to a git branch yet? My
> branch for Bug 2754 and related changes also converts test_PDB.py to unittest. 
> (I silence the warnings by calling warnings.simplefilter('ignore') in the setUp
> method.) I'd like to try cherry-picking this commit if it's available on
> github.

ok... Is your branch this one:
-
http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
?


This was my proposal:
-
http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py


I have structured the unittest in a different way, so every test case
represents a pdb file with some known values for PDB exposure etc., but the
result should be the same.



From dalloliogm at gmail.com  Sat Mar 21 10:40:05 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 21 Mar 2009 15:40:05 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com>
References: <20090320125518.GA351@sobchak.mgh.harvard.edu>
	<587027.97686.qm@web62408.mail.re1.yahoo.com>
Message-ID: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com>

On Sat, Mar 21, 2009 at 6:33 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> Which parts of this fall out of "standard" git
>> practice? In general,
>> we should strive to keep this as simple as possible. If
>> using Git is
>> complicated then we are losing a lot of our advantage over
>> CVS/patches.
>
> I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff.


ok, but I assure you if you don't want to learn the advanced features
it can be used as you did with cvs.
The only difference, maybe, is that you work with a local copy
(offline) and push the changes only when you are sure about them.


If you keep a mirror on github to collect patches and enhancements, it
has some advantages:

- more than one person can work on a patch at the same time
- it is a lot easier to create customized branches of biopython. So if
someone needs to create a custom version of biopython for their own
purposes, it will always be easy to keep it compatible with the
official code.
- people can play with the code and propose enhancements, without
having to ask for write access. This means that more people can become
familiar with biopython's code and propose fixes.

Have a look at this video, which shows that the Ruby on Rails
project grew more quickly after it moved to github:

- http://python.genedrift.org/2009/03/15/ror-commits/

(the jump should be at around minute 5:10)


> I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though.

Let's say I want to propose a patch to biopython. One of you
developers will probably need to look at it and propose some changes
to adapt it to the rest of biopython.
Isn't this the situation you are describing (multiple developers
working on interrelated parts of the code)?

Another example is the popgen module.
Since it is a pretty big module, and independent from the rest, an
'experimental popgen branch' of biopython was created, based on
what was the latest biopython cvs at the time.
However, in the time that has passed since this branch was created,
the biopython cvs has changed: so maybe now the experimental popgen
branch is no longer compatible with the official code, if some module
or convention has been changed.

So, git and github make it easier to create a new development branch
and keep it compatible with the original one.

> --Michiel.


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From eric.talevich at gmail.com  Sat Mar 21 11:23:56 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 21 Mar 2009 11:23:56 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <200903211432.n2LEWfXP000985@portal.open-bio.org>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
Message-ID: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>

On Sat, Mar 21, 2009 at 10:32 AM, <bugzilla-daemon at portal.open-bio.org> wrote:

> http://bugzilla.open-bio.org/show_bug.cgi?id=2759
>
>
> ------- Comment #9 from dalloliogm at gmail.com  2009-03-21 10:32 EST -------
> (In reply to comment #8)
> > Marco & Peter, have either of you applied these patches to a git branch
> yet? My
> > branch for Bug 2754 and related changes also converts test_PDB.py to
> unittest.
> > (I silence the warnings by calling warnings.simplefilter('ignore') in the
> setUp
> > method.) I'd like to try cherry-picking this commit if it's available on
> > github.
>
> ok... Is your branch this one:
> -
>
> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
> ?
>
>
> This was my proposal:
> -
>
> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py
>
>
> I have structured the unittest in a different way, so every test case
> represents a pdb file with some known values for PDB exposure etc..: but
> the
> result should be the same.
>
>

Oh, I see now that these are meant to be separate files. Yes, that's my
branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the
NeighborSearch test moved elsewhere. In that case, there's no merging
problem here, and the only change needed in test_PDBexposure.py is to
silence the warnings... right?

From dalloliogm at gmail.com  Sat Mar 21 12:14:45 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 21 Mar 2009 17:14:45 +0100
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
	<3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
Message-ID: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>

On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Sat, Mar 21, 2009 at 10:32 AM, <bugzilla-daemon at portal.open-bio.org>wrote:
>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2759
>>
>>
>> ------- Comment #9 from dalloliogm at gmail.com  2009-03-21 10:32 EST -------
>> (In reply to comment #8)
>> > Marco & Peter, have either of you applied these patches to a git branch
>> yet? My
>> > branch for Bug 2754 and related changes also converts test_PDB.py to
>> unittest.
>> > (I silence the warnings by calling warnings.simplefilter('ignore') in the
>> setUp
>> > method.) I'd like to try cherry-picking this commit if it's available on
>> > github.
>>
>> ok... Is your branch this one:
>> -
>>
>> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
>> ?
>>
>>
>> This was my proposal:
>> -
>>
>> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py
>>
>>
>> I have structured the unittest in a different way, so every test case
>> represents a pdb file with some known values for PDB exposure etc..: but
>> the
>> result should be the same.
>>
>>
>
> Oh, I see now that these are meant to be separate files. Yes, that's my
> branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the
> NeighborSearch test moved elsewhere. In that case, there's no merging
> problem here, and the only change needed in test_PDBexposure.py is to
> silence the warnings... right?

well, it depends also on what Peter thinks.
Mine was only a proof of concept to see if the unittest could be
refactored in that way.
In principle, it should be equivalent to the original one and
execute the same tests.

If you want to use it, the problem is that it makes use of a decorator
function (@classmethod) which is not supported by earlier versions of
python.

This can be resolved by moving all the instructions in setUpAll into
setUp, like here:
- http://github.com/dalloliogm/biopython/commit/83864b8a1269aaf52ac193d7bf9ed9ca5edc5a30

(however, this way the setUp instructions - like opening and parsing
the PDB file - will be repeated for every test).




-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From eric.talevich at gmail.com  Sat Mar 21 13:13:52 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 21 Mar 2009 13:13:52 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
	<3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
	<5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>
Message-ID: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com>

On Sat, Mar 21, 2009 at 12:14 PM, Giovanni Marco Dall'Olio <
dalloliogm at gmail.com> wrote:

> On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > On Sat, Mar 21, 2009 at 10:32 AM, <bugzilla-daemon at portal.open-bio.org
> >wrote:
> >
> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2759
> >>
> >>
> >> ok... Is your branch this one:
> >> -
> >>
> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
> >> ?
> >>
> >>
> >> This was my proposal:
> >> -
> >>
> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py
> >>
>
>
> If you want to use it, the problem is that it make use of a decorator
> function (@classmethod) which is not supported by earlier versions of
> python.
>
>
Decorator syntax (and so the @classmethod spelling) was added in Python 2.4.
Since support for Python 2.3 is being dropped after the release of
BioPython 1.50 (I believe), it should be safe to apply the decorator to
post-1.50 branches. If this needs to be in 1.50, the older way of
"mymethod = classmethod(mymethod)" would work fine in Py2.3, although I
personally would just move the PDB loading steps to setUp, since the parser
is pretty quick and the code for that is easy to read.
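
For reference, a small sketch of the older spelling mentioned above. The
class and method names are made up for illustration; the point is only that
calling classmethod() after the definition has the same effect as writing
@classmethod above it.

```python
class Counter(object):
    """Hypothetical class used only to show the two spellings."""

    count = 0

    def increment(cls):
        # cls is the class itself, not an instance.
        cls.count += 1
        return cls.count

    # Spelling without decorator syntax; equivalent to placing
    # @classmethod directly above "def increment(cls):".
    increment = classmethod(increment)


first = Counter.increment()
second = Counter.increment()
```

Both spellings produce the same class method, so code written the older way
keeps working unchanged on newer Python versions.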

I'll finish up my work on Bug 2754 and merge/rebase it before trying to
integrate this code -- that should bring the parse warnings under control
and make it easier for Peter to dispatch this bug.

From biopython at maubp.freeserve.co.uk  Sat Mar 21 17:16:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 21 Mar 2009 21:16:43 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com>
References: <20090320125518.GA351@sobchak.mgh.harvard.edu>
	<587027.97686.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00903211416r457e303bnc0515b576bbe6c9a@mail.gmail.com>

On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I haven't been following this topic closely, and as an
> "outsider" using git seems more complicated than using
> cvs or svn. And to be honest, I don't know if Biopython
> actually needs the branching and forking stuff. I think
> that this is more useful for bigger projects, where
> multiple developers may be working on interrelated
> parts of code at the same time. That hardly ever
> happens in Biopython, though.

Certainly git and github are much more powerful, and
therefore more complicated.  There is no denying that.

However, if we move to git on github, I would expect
those of us with CVS access to all be given write
access to the official Biopython branch (probably
using the collaborators feature).  If that is done, I
think you won't find things so different from now.
i.e. Initially at least, it would be business as usual -
our core official developers would be trusted to work
directly on the main branch as now (with discussions
before commits as appropriate), and do not have to
worry about forking/branching etc (unless they want
to).

In terms of the actual command(s) you'd have to type
in at the terminal to commit a change to the online
repository, this goes from one step:

cvs commit -m "Comment here" file1.py file2.py

... to two steps.  First you have to commit changes
locally (to git on your personal machine) and then
push them to the main Biopython branch on the public
server (on github).  Once I'm back at work where I
have git installed, I'll write this up on the wiki -
assuming Brad doesn't beat me to it ;)

The big change is for non-core developers, i.e.
potential contributors (like Eric who is currently trying
some Bio.PDB changes).  For them, using git allows
them to work on their changes and keep in sync with
the master repository with much more ease.

Peter

From chris.lasher at gmail.com  Sat Mar 21 22:33:11 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sat, 21 Mar 2009 22:33:11 -0400
Subject: [Biopython-dev] Help pages in Biopython wiki
In-Reply-To: <C5E90784.1F50A%lpritc@scri.ac.uk>
References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>
	<C5E90784.1F50A%lpritc@scri.ac.uk>
Message-ID: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com>

On Fri, Mar 20, 2009 at 4:42 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:

> Hi Chris,
>
> That page doesn't exist, yet (click on the 'page' tab to see this), and no
> pages link to it (see here:
> http://biopython.org/wiki/Special:WhatLinksHere/Help)
>
> What help were you expecting to see there?


Hi Leighton,

I'm fairly certain there are pages one can install with a MediaWiki instance
that provide the standard help. They look like this:
http://www.mediawiki.org/wiki/Help:Contents

They contain the standard documentation about how to edit, format, create
new pages, etc. Useful things for new community members and people like me
who forget the nuances of each wiki software's markup language from time to
time. :-)

Chris

From biopython at maubp.freeserve.co.uk  Sun Mar 22 06:18:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 10:18:49 +0000
Subject: [Biopython-dev] Help pages in Biopython wiki
In-Reply-To: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com>
References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>
	<C5E90784.1F50A%lpritc@scri.ac.uk>
	<128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com>
Message-ID: <320fb6e00903220318g7e214c8bmf1e6012e5db505fd@mail.gmail.com>

On Sun, Mar 22, 2009 at 2:33 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
> Hi Leighton,
>
> I'm fairly certain there are pages one can install with a MediaWiki instance
> that provide the standard help. They look like this:
> http://www.mediawiki.org/wiki/Help:Contents
>
> They contain the standard documentation about how to edit, format, create
> new pages, etc. Useful things for new community members and people like me
> who forget the nuances of each wiki software's markup language from time to
> time. :-)
>
> Chris

I'm glad Leighton asked - otherwise I would have.

Would it suffice to create a manual help page, saying this is a wiki
and we are happy for people to create their own account to fix any
minor errors they spot, and just link to
http://www.mediawiki.org/wiki/Help:Contents for help?

Peter

From biopython at maubp.freeserve.co.uk  Sun Mar 22 06:51:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 10:51:17 +0000
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
	<3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
	<5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>
	<3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com>
Message-ID: <320fb6e00903220351u53563f03m4c54359278c5b7f0@mail.gmail.com>

On Sat, Mar 21, 2009 at 5:13 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> Giovanni wrote:
>> If you want to use it, the problem is that it make use of a decorator
>> function (@classmethod) which is not supported by earlier versions of
>> python.
>
> Decorators and @classmethod were added in Python 2.4. Since support for
> Python 2.3 is being dropped after the release of BioPython 1.50 (I believe),
> it should be safe to apply the decorator to post-1.50 branches. If this
> needs to be in 1.50, the older way of "mymethod = classmethod(mymethod)"
> would work fine in Py2.3, although I personally would just move the PDB
> loading steps to setUp, since the parser is pretty quick and the code for
> that is easy to read.

Extra PDB unit tests would be nice to have in Biopython 1.50, which means
they must work on Python 2.3, so no decorators please.

I agree with Eric that it is simpler just to use setUp for PDB file
parsing.  Yes, it is slower as for each test method the PDB file is
reloaded - but you also make sure it is a clean object structure, which is
important as some operations we will be testing may change the object.
e.g. HSExposure:
http://bugzilla.open-bio.org/show_bug.cgi?id=2759#c4
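
To illustrate the two points above, here is a minimal sketch. It assumes a
hypothetical load_structure() function standing in for the real PDB parsing
call, and it is written for a current Python rather than 2.3. It shows setUp
handing every test a fresh object, and the decorator-free
"mymethod = classmethod(mymethod)" spelling Eric mentioned.

```python
import unittest


def load_structure():
    """Hypothetical stand-in for parsing a PDB file; returns a fresh object."""
    return {"atoms": ["N", "CA", "C", "O"]}


class StructureTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a clean structure.
        self.structure = load_structure()

    def test_mutation(self):
        # This test changes the object it was given.
        self.structure["atoms"].append("CB")
        self.assertEqual(len(self.structure["atoms"]), 5)

    def test_is_clean(self):
        # Unaffected by test_mutation, because setUp reloaded the data.
        self.assertEqual(len(self.structure["atoms"]), 4)


class OldStyleClassMethod(object):
    def describe(cls):
        return cls.__name__

    # Decorator-free spelling, equivalent to putting @classmethod above.
    describe = classmethod(describe)


def run_structure_tests():
    """Run the test case above and report whether everything passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StructureTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The cost is one reload per test method; the benefit is that test order
cannot affect the results.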

Peter

From biopython at maubp.freeserve.co.uk  Sun Mar 22 06:44:42 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 10:44:42 +0000
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <334920.51680.qm@web62402.mail.re1.yahoo.com>
References: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com>
	<334920.51680.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com>

On Sat, Mar 21, 2009 at 4:47 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> I think it is good if we catch more errors in Bio.Entrez, but I think
> the error catching should be done by the parser, not when
> retrieving.

We could do that - maybe some common functions for checking
the first line to see if it looks like HTML or XML would help.  It means
lots of changes to lots of parsers, but would help outside the use
case of Bio.Entrez - so this is perhaps worth doing anyway.

What about the fairly common situation (at least, it's something I've done
fairly often) where Bio.Entrez.efetch() is used to fetch records which
are saved directly to file without verification - e.g. to be parsed by
another program?  Unless the error is caught in Bio.Entrez.efetch()
it may be out of our control.

> As you show, NCBI Entrez returns error messages in various
> different formats: plain text, HTML, incorrect XML, broken XML.
> Since there are many ways to access NCBI Entrez, there may
> be other styles of error messages that we don't know about.
> Then there is the added complication of accessing NCBI Entrez
> to get information in formats other than XML, e.g. GenBank files.
> And all this may be changed over time by NCBI.
>
> Since the error message is ill-defined, code trying to identify
> error messages won't be robust.

All very true.  But the main point in my original email was on
something slightly different...

> On the other hand, the format of files expected by a given
> parser is well-defined: Either the file agrees with the format
> expected by the parser, or it doesn't; if it doesn't, then that's
> an error.

It's not that simple - we are often dealing with loosely defined
file formats, and you may be able to reasonably interpret one
file in several different formats (giving different/incorrect data).

Some parsers are very tolerant at the moment, for example
GenBank files can have a legitimate free format comment
before the records, so the parser skips anything until it
recognizes a GenBank locus id line.

> We may not be able to extract the exact error message
> returned by NCBI, but a parser for format XYZ can tell
> you that the file is not in format XYZ.

Some parsers may be able to do this, but not all.

> Maybe the XML parser can say it doesn't look like an
> XML file, but that's about it.

This is an easy case because XML is so strictly defined.
Spotting a non-XML file is pretty trivial.

> Once NCBI Entrez starts to return errors in a uniform
> format, we can modify our parsers to find out the
> exact error message. Until that happens, trying to do
> so on our side will not be robust.

I agree that pulling out error messages (the second half
of my original email in the thread) is error prone.  You
might argue that catching any errors is still worthwhile,
as long as there are no false positives.

The first half of the email (the main point) was based
on a special case: HTML and XML are pretty easy to
identify.  If you ask for HTML and don't get it, it is an
error (and vice versa).  If you ask for XML and don't
get it, it is an error (and vice versa).  The fact that
the NCBI currently often return an HTML or XML error
page when a plain text format was requested is then
easily detected as an error (simply from the file type).
This will still work even if the NCBI do change their
error formats or wording - it should be pretty robust.
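
As a rough sketch of that file-type check (the function names and the exact
prefixes tested are my assumptions here, not existing Bio.Entrez code), one
could peek at the start of the returned data and compare what it looks like
with what was requested:

```python
def looks_like_markup(first_chunk):
    """Classify the start of a response as 'xml', 'html' or None.

    Hypothetical helper: the prefixes checked are heuristics only.
    """
    text = first_chunk.lstrip().lower()
    if text.startswith("<?xml"):
        return "xml"
    if text.startswith("<!doctype html") or text.startswith("<html"):
        return "html"
    return None


def check_response(first_chunk, expected):
    """Raise ValueError if the data does not look like the requested type.

    expected is 'xml', 'html' or 'text' (meaning any non-markup format,
    such as GenBank or FASTA).
    """
    found = looks_like_markup(first_chunk) or "text"
    if found != expected:
        raise ValueError(
            "Requested %s but the data looks like %s" % (expected, found)
        )
    return first_chunk
```

This only inspects the file type, never the wording of the error page, which
is why it should survive changes to the NCBI error messages.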

Peter

From bugzilla-daemon at portal.open-bio.org  Sun Mar 22 07:36:38 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Mar 2009 07:36:38 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to
	stderr, not stdout
In-Reply-To: <bug-2754-42@http.bugzilla.open-bio.org/>
Message-ID: <200903221136.n2MBacSc000608@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2754


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-22 07:36 EST -------
I had a thought last night about this - how about we keep PERMISSIVE=1 as the
default but offer a "very permissive" mode:

PERMISSIVE=2 (or more), silently ignore problems, continue parsing.
PERMISSIVE=1 (or True), use stderr via the warnings module, continue parsing.
PERMISSIVE=0 (or False), raise exceptions, halt parsing.

It would offer an alternative way to silence the warnings in the unit tests,
and could be controlled at the level of individual tests - for example where we
want to make sure certain errors are caught.

It might also be useful in ordinary scripts.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com  Sun Mar 22 07:50:50 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 22 Mar 2009 11:50:50 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com>
References: <20090320125518.GA351@sobchak.mgh.harvard.edu>
	<587027.97686.qm@web62408.mail.re1.yahoo.com>
Message-ID: <6d941f120903220450y4005b63bvd23dcb4981edec7b@mail.gmail.com>

On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though.


I would actually take this argument and reverse it:
The reason why biopython has been a small project, and above all, slow
to develop and innovate is excessive centralization. Using a
distributed technology allows for people to try new ideas and to get
things moving (while still maintaining an official rock stable version
with maybe glacial policies).
Let's not kid ourselves: biopython lacks a lot of stuff that is
fundamental in modern computational biology. The current status quo is
essentially maintaining a frozen set of functionality (most new code
is really just code cleanup and optimization).

While I would be cautious with a distributed environment and would
agree that checks have to be put in place to ensure that the official
product is rock solid, has documentation and is reasonably future
proof, I nonetheless warmly welcome this new development.

It is also good, for a change, to have an active discussion on the
list: Now this actually seems like a proper, live community.

Tiago


From eric.talevich at gmail.com  Sun Mar 22 11:25:23 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 22 Mar 2009 11:25:23 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print
	to stderr, not stdout
In-Reply-To: <200903221136.n2MBacSc000608@portal.open-bio.org>
References: <bug-2754-42@http.bugzilla.open-bio.org/>
	<200903221136.n2MBacSc000608@portal.open-bio.org>
Message-ID: <3f6baf360903220825g2b871432yba5749dab4c2ba34@mail.gmail.com>

On Sun, Mar 22, 2009 at 7:36 AM, <bugzilla-daemon at portal.open-bio.org> wrote:

> http://bugzilla.open-bio.org/show_bug.cgi?id=2754
>
>
>
> ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-22 07:36 EST -------
> I have a thought last night about this - how about we keep PERMISSIVE=1 as
> the
> default but offer a "very permissive" mode:
>
> PERMISSIVE=2 (or more), silently ignore problems, continue parsing.
> PERMISSIVE=1 (or True), use stderr via the warning module, continue
> parsing.
> PERMISSIVE=0 (or False), raise exceptions, halt parsing.
>
> It would ofter an alternative way to silence the warnings in the unit
> tests,
> and could be controlled at the level of individual tests - for example
> where we
> want to make sure certain errors are caught.
>
> It might also be useful in ordinary scripts.
>
>

I like the idea. I still have to comb through the documentation for the
warnings module some more, but I think it should be possible to do all of
this through that API -- loading PERMISSIVE=0 turns the warnings into full
exceptions, =1 makes them messages on stderr, and =2 switches them off.
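
A minimal sketch of that mapping, assuming a hypothetical
PDBConstructionWarning category and a toy parse_with_permissive() function
rather than the real Bio.PDB parser. It also relies on
warnings.catch_warnings(), which needs Python 2.6 or later:

```python
import warnings


class PDBConstructionWarning(UserWarning):
    """Hypothetical warning category used only for this sketch."""


def parse_with_permissive(problems, permissive=1):
    """Pretend to parse a file, reporting each problem found.

    permissive=0 (or False): problems become exceptions, parsing halts.
    permissive=1 (or True): problems are shown on stderr, parsing continues.
    permissive=2 or more: problems are silently ignored.
    """
    with warnings.catch_warnings():
        # The filter change is local to this block and undone on exit.
        if permissive == 0:
            warnings.simplefilter("error", PDBConstructionWarning)
        elif permissive == 1:
            warnings.simplefilter("always", PDBConstructionWarning)
        else:
            warnings.simplefilter("ignore", PDBConstructionWarning)
        for message in problems:
            warnings.warn(message, PDBConstructionWarning)
    return len(problems)
```

Because the filter is set per call, an individual unit test could ask for
level 0 to check that an error is caught while the rest run at level 2.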

At some point I'd like to make a script called something like pdbtidy.py
which parses a potentially not-quite-conformant PDB file in a permissive
mode, lists all complaints (including things like discontinuously-numbered
residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed
version of the file. The model for this is HTML Tidy. Do you think this
would have a place in the Biopython distribution?

From biopython at maubp.freeserve.co.uk  Sun Mar 22 11:53:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 15:53:21 +0000
Subject: [Biopython-dev]  PDB tidy script, was: [Bug 275
Message-ID: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com>

On Bug 2754 comment 12, I wrote:
http://bugzilla.open-bio.org/show_bug.cgi?id=2754#c12
>> I have a thought last night about this - how about we keep PERMISSIVE=1
>> as the default but offer a "very permissive" mode:
>>
>> PERMISSIVE=2 (or more), silently ignore problems, continue parsing.
>> PERMISSIVE=1 (or True), use stderr via the warning module, continue
>> parsing.
>> PERMISSIVE=0 (or False), raise exceptions, halt parsing.
>>
>> It would ofter an alternative way to silence the warnings in the unit
>> tests, and could be controlled at the level of individual tests - for
>> example where we want to make sure certain errors are caught.
>>
>> It might also be useful in ordinary scripts.

Eric replied:
> I like the idea. I still have to comb through the documentation for the
> warnings module some more, but I think it should be possible to do all of
> this through that API -- loading PERMISSIVE=0 turns the warnings into full
> exceptions, =1 makes them messages on stderr, and =2 switches them off.

It doesn't really matter - all the PDB construction warnings/errors go through
_handle_PDB_exception(), so this would be the least invasive place to
implement this.

> At some point I'd like to make a script called something like pdbtidy.py
> which parses a potentially not-quite-conformant PDB file in a permissive
> mode, lists all complaints (including things like discontinuously-numbered
> residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed
> version of the file. The model for this is HTML Tidy. Do you think this
> would have a place in the Biopython distribution?

It sounds useful to me, it can probably go in the scripts subdirectory,
along with the PDB surface exposure script.

One drawback is that currently Bio.PDB's header parsing leaves a lot to
be desired, and very little of the header is output when saving a PDB file
(Thomas' focus is/was very much on the 3D data).

Peter

From lpritc at scri.ac.uk  Mon Mar 23 05:02:53 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 23 Mar 2009 09:02:53 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com>
Message-ID: <C5ED00BD.1F6E4%lpritc@scri.ac.uk>

On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" <dalloliogm at gmail.com>
wrote:

> Have a look at this video, where it shows that the Ruby On Rails
> project has grown quicker when it has moved to github:
> 
> - http://python.genedrift.org/2009/03/15/ror-commits/
> 
> (the jump should be on minute 5.10 or so)

I've seen this argument a couple of times, now - mostly on blogs - and I'm
not sure that it's all that clear-cut.

The RoR video shows a greater number of individual names associated with
commits, after the move to github.  However, it's not clear whether this is
because a large number of individuals have suddenly decided to contribute to
the project, or whether the move to a version control system in which author
attribution remains with contributed code - as opposed to the bottleneck of
having to be submitted with the id of someone with write access - is
responsible.  I don't think there's enough evidence to say 'the move to
github caused an increase in contributions'.

As a counter-example, the number of people who have recorded contributions
to Biopython code is 46 (from the CONTRIB file on CVS).  I don't think that
there are that many ids associated with committing the codebase on there.
My name's only associated with GenomeDiagram in the commit comments, not as
an author/committer of the code - at least, as far as the CVS application is
concerned - for example.  This might change with git.  Of course, I might be
misunderstanding git's attribution model, or how the stats for that RoR
video were compiled...

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



From p.j.a.cock at googlemail.com  Mon Mar 23 06:14:10 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 23 Mar 2009 10:14:10 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <C5ED00BD.1F6E4%lpritc@scri.ac.uk>
References: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com>
	<C5ED00BD.1F6E4%lpritc@scri.ac.uk>
Message-ID: <320fb6e00903230314y212be042gfd2f0b86f8738f2d@mail.gmail.com>

On Mon, Mar 23, 2009 at 9:02 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" <dalloliogm at gmail.com>
> wrote:
>
>> Have a look at this video, where it shows that the Ruby On Rails
>> project has grown quicker when it has moved to github:
>>
>> - http://python.genedrift.org/2009/03/15/ror-commits/
>>
>> (the jump should be on minute 5.10 or so)
>
> I've seen this argument a couple of times, now - mostly on blogs - and I'm
> not sure that it's all that clear-cut.
>
> The RoR video shows a greater number of individual names associated with
> commits, after the move to github.  However, it's not clear whether this is
> because a large number of individuals have suddenly decided to contribute to
> the project, or whether the move to a version control system in which author
> attribution remains with contributed code - as opposed to the bottleneck of
> having to be submitted with the id of someone with write access - is
> responsible.  I don't think there's enough evidence to say 'the move to
> github caused an increase in contributions'.
>
> As a counter-example, the number of people who have recorded contributions
> to Biopython code is 46 (from the CONTRIB file on CVS).  I don't think that
> there are that many ids associated with committing the codebase on there.
> My name's only associated with GenomeDiagram in the commit comments, not as
> an author/committer of the code - at least, as far as the CVS application is
> concerned - for example.  This might change with git.  Of course, I might be
> misunderstanding git's attribution model, or how the stats for that RoR
> video were compiled...

Leighton has a good point about the attribution, and the dangers in
over-interpreting such a video.  Git/github will make it
easier to see who contributed patches (if a patch is pulled into
another branch, both the person doing the merge and the person who
originally checked in the patch get recorded), and that may indirectly
encourage more contributions.  As Leighton points out, we do try and
give credit now in CVS commit comments, but these are checked in by a
core developer.  I imagine this happened with RoR, but compiling this
information for that video would probably have been too much work.  As
well as switching tools, you are also changing the metric.

Something else to consider is how you are measuring activity: the git
and github documentation and press encourages people to commit more
often - for example while working on a bug fix or a new feature, I
might make three incremental commits on my local copy of the
repository, before I am happy enough to push this to the online
repository.  This would then show as three commits, wouldn't it - but
on CVS it would probably be just one.  i.e. On CVS I suspect you
naturally tend to get a smaller number of larger commits than with
git.  This difference will probably vary from person to person (I
haven't counted or anything, but with CVS I think I tend to commit
lots of smaller changes, while Michiel for example tends to make fewer
but larger commits).  i.e. If the RoR video shows a sudden jump in the
number of commits, that doesn't necessarily mean more code changes.  Scaling by
number of lines changed would be another metric which is perhaps more
robust - but has drawbacks of its own.

Peter


From eric.talevich at gmail.com  Mon Mar 23 16:39:05 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 23 Mar 2009 16:39:05 -0400
Subject: [Biopython-dev] PDB tidy script, was: [Bug 275
In-Reply-To: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com>
References: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com>
Message-ID: <3f6baf360903231339i22438e3bia554a0b7bdda7a5d@mail.gmail.com>

On Sun, Mar 22, 2009 at 11:53 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

>
> One drawback is that currently Bio.PDB's header parsing leaves a lot to
> be desired, and very little of the header is output when saving a PDB file
> (Thomas' focus is/was very much on the 3D data).
>
> Peter
>


I haven't been on this list long enough to know -- is Thomas still
supporting the PDB module? If so, would he give his blessing to some more
invasive changes to the PDB module, such as unifying PDBParser and
parse_pdb_header? That separation has always seemed curiously vestigial to
me. Now that github gives us some flexibility with public branches, it would
be nice to have a discussion on some longer-term plans for Bio.PDB. I do a
fair amount of work with PDB files and PyMol at my lab, and if the Biopython
core devs are open to it, I can start merging enhancements into my public
branch on github. However, if there's already a plan for the module, it's
obviously best for me not to publish a divergent branch.

-Eric

From biopython at maubp.freeserve.co.uk  Mon Mar 23 17:05:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 23 Mar 2009 21:05:21 +0000
Subject: [Biopython-dev] PDB tidy script
Message-ID: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com>

On Mon, Mar 23, 2009 at 8:39 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Sun, Mar 22, 2009 at 11:53 AM, Peter <biopython at maubp.freeserve.co.uk>wrote:
>
>>
>> One drawback is that currently Bio.PDB's header parsing leaves a lot to
>> be desired, and very little of the header is output when saving a PDB file
>> (Thomas' focus is/was very much on the 3D data).
>>
>> Peter
>
> I haven't been on this list long enough to know -- is Thomas still
> supporting the PDB module? If so, would he give his blessing to some more
> invasive changes to the PDB module, such as unifying PDBParser and
> parse_pdb_header? That separation has always seemed curiously vestigal to
> me.
> Now that github gives us some flexibility with public branches, it would
> be nice to have a discussion on some longer-term plans for Bio.PDB. I do a
> fair amount of work with PDB files and PyMol at my lab, and if the Biopython
> core devs are open to it, I can start merging enhancements into my public
> branch on github. However, if there's already a plan for the module, it's
> obviously best for me not to publish a divergent branch.

If you look back over the history, there initially was no header parsing,
it was a contribution from Kristian Rother, and I would agree, it is rather
disjoint from the rest of the code.  One thing I personally wanted last
time I was working with PDB files was to have secondary structure
information (from the alpha helix and beta sheet lines in the header)
mapped onto the residue objects automatically.

And yes, Thomas is supporting the PDB module, but his time has
been rather limited of late.  When I asked him about some of the
open enhancement requests in bugzilla recently (off list) he said
we needed "a separate class to parse all the info in the header,
not a slew of additions to the core parser class (which is designed
to deal with the 3D data only)."

I would suggest you try and get Thomas involved now for his input
on the design (before you start coding), but if need be press ahead
anyway for your own use, and he can always comment on your
public branch.  I hope the two of you can work together on this, and
if/when Thomas does stand down (or delegate), you could then be
in an excellent position to take over as the Bio.PDB maintainer if
that's what you wanted.

Peter

From sbassi at clubdelarazon.org  Tue Mar 24 02:24:38 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 24 Mar 2009 03:24:38 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing
	qual files
Message-ID: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>

I have a .fasta file and its corresponding .qual file.
I ran seqclean on the fasta file and got a shorter .fasta file as
output (as expected).
Using the .cln file from seqclean, I want to "trim" the .qual file the
same way my new fasta is trimmed.
I can read the cln and parse the information of "where to trim".
For example, in one original sequence of 1000 bp, I may need to trim
from 150 to 800.
The problem is that I can't modify qual values using the new SeqIO
qual parser (at least the size of the list can't be modified). I read
the example in the doc, where it is cut doing something like:
sub_rec = fullrec[150:800]
But this works only when there is a sequence (so, when it is read as
"fastq"); it doesn't work when the record is read as "qual"
(because there is no sequence, and in this case I can't modify the
length of the list in letter_annotations['phred_quality']; it is true
that I can modify qual values in the list, but I want to modify the list
size).
Here is the error:
Traceback (most recent call last):
  File "/home/sbassi/bioinfo/INTA/qualparser.py", line 18, in <module>
    s.letter_annotations['phred_quality'] = [0,0,0,0,10,1]
  File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py",
line 33, in __setitem__
    "strings) of length %i." % self._length)
TypeError: We only allow python sequences (lists, tuples or strings)
of length 5.


(Note: 5 was the size of the original qual record, when I tried to set
it to [0,0,0,0,10,1], I get this).
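
For readers following along, the behaviour behind this error can be sketched
as a dictionary that only accepts values of one fixed length. This is a
simplified illustration with a made-up class name, not the actual code in
Bio/SeqRecord.py:

```python
class RestrictedDict(dict):
    """Dict whose values must all be sequences of one fixed length.

    Illustrative only; the real Biopython class may differ in detail.
    """

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        # Reject anything without a length, or with the wrong length.
        if not hasattr(value, "__len__") or len(value) != self._length:
            raise TypeError(
                "We only allow python sequences (lists, tuples or strings) "
                "of length %i." % self._length
            )
        dict.__setitem__(self, key, value)
```

With a length of 5, assigning [0,0,0,0,10] is accepted while the six-item
list [0,0,0,0,10,1] raises the TypeError shown in the traceback.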

So my question is: Does it make sense to allow the user to modify the
size of the list in letter_annotations['phred_quality'] in qual
sequences? I think this is a nice feature for qual SeqIO.parse. If I
can modify the list size, then I can save the modified version with
SeqIO.write(x,fh,"qual") and have a qual file with a new size.

I am using 1.49 with new files from CVS.


-- 
Sebastián Bassi. Diplomado en Ciencia y Tecnología.



From biopython at maubp.freeserve.co.uk  Tue Mar 24 05:49:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 09:49:33 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
Message-ID: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>

On Tue, Mar 24, 2009 at 6:24 AM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> I have a .fasta file and its corresponding .qual file.
> I run seqclean on the fasta file and I got a shorter .fasta file as
> output (that is expected).

Whose seqclean script are you using?  If it doesn't output the trimmed
qual file, can it work with FASTQ output instead?

> Using the .cln file from seqclean, I want to "trim" the .qual file the
> same way my new fasta is trimmed.
> I can read the cln and parse the information of "where to trim".
> For example, in one original sequence of 1000 bp, I may need to trim
> from 150 to 800.
> The problem is that I can't modify qual values using the new SeqIO
> qual parser (at least the size of the list can't be modified). I read
> the example in the doc, where it is cut doing something like:
> sub_rec = fullrec[150:800]
> But, this works only when there is a sequence (so, when read it as
> "fastq"), but it doesn't work when the sequence is read as "qual"
> (because there is no sequence ...
> So my question is: Does it make sense to allow the user to modify the
> size of the list in letter_annotations['phred_quality'] in qual
> sequences? I think this is a nice feature for qual SeqIO.parse.

This was one area of the new SeqRecord slicing I was a little unsure
about - slicing a qual file's SeqRecord (or any SeqRecord with a None
for the sequence).  I hadn't done anything about it immediately as I
couldn't think of a use case for it - so that's solved ;)

One solution would be to introduce an UnknownSeq object, which
would be much nicer to deal with than a None object, as it would have
a length and support slicing.  I've mentioned this idea before, but
haven't yet put forward any actual code.  This seems most elegant.

Another option would be to special case handle slicing a SeqRecord
with a None sequence, where we'd slice its per-letter-annotation. For
now, you can force this with the current code by:

# Not recommended, short-term hack
s.letter_annotations._length = 6
s.letter_annotations['phred_quality'] = [0,0,0,0,10,1]

Right now, without changing Biopython, I have another workaround for
you: Use the paired reader in Bio.SeqIO.QualityIO on the untrimmed
FASTA and QUAL files, which will give you SeqRecords with both the
sequence and the quality - and trim these by slicing the SeqRecord.

Peter

From sbassi at clubdelarazon.org  Tue Mar 24 10:59:51 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 24 Mar 2009 11:59:51 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
Message-ID: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>

On Tue, Mar 24, 2009 at 6:49 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Whose seqclean script are you using?  If it doesn't output the trimmed
> qual file, can it work with FASTQ output instead?

I am using the seqclean found here:
http://compbio.dfci.harvard.edu/tgi/software/
It doesn't output a trimmed qual file because seqclean accepts only
fasta as input. Oh, wait! Looking at my seqclean directory I found
a cln2qual script. So I looked at the README to see what it is, and I
found:

"If after seqclean one needs to trim the corresponding quality values too,
according to the new coordinates or trash codes found by seqclean, the
utility script "cln2qual" is included (see the usage message). It expects
a fasta-like file containing space delimited quality values for each nucleotide
of the original sequences. It should be run after the seqclean, as it parses the
trimming ("clear range") coordinates and trash codes from the cleaning report
and applies them to the quality records."

So this utility does what I was about to do with Biopython.

But anyway, regarding this:

> This was one area of the new SeqRecord slicing I was a little unsure
> about - slicing a qual file's SeqRecord (or any SeqRecord with a None
> for the sequence).  I hadn't done anything about it immediately as I
> couldn't think of a use case for it - so that's solved ;)
> One solution would be to introduce an UnknownSeq object, which
....

I agree with the need for an UnknownSeq object to modify the size of
the qual file.

Best,
SB.

From biopython at maubp.freeserve.co.uk  Tue Mar 24 11:13:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 15:13:40 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
Message-ID: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>

On Tue, Mar 24, 2009 at 2:59 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> But anyway, regarding this:
>
>> This was one area of the new SeqRecord slicing I was a little unsure
>> about - slicing a qual file's SeqRecord (or any SeqRecord with a None
>> for the sequence).  I hadn't done anything about it immediately as I
>> couldn't think of a use case for it - so that's solved ;)
>> One solution would be to introduce an UnknownSeq object, which
>> ....
>
> I agree with the need of an UnknownSeq object for modify the size of
> the qual file.

Suppose you read in a qual file (or a GenBank file with no sequence, just a
CONTIG line), and instead of None, the SeqRecord object(s) had a new
UnknownSeq object saying they were made up of a given number of "N"
characters using a DNA alphabet. What would you expect to get if you
used Bio.SeqIO to write out the file in FASTA format?  To my mind there
are two sensible options - write out the file using the "NNN....N"
sequence, or raise an error.
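
The two options can be sketched side by side. This is only an illustration: `fasta_for_unknown` and its `on_unknown` argument are hypothetical names, not part of Bio.SeqIO, and the 60-column line wrapping is an assumption.

```python
def fasta_for_unknown(record_id, length, on_unknown="fill", character="N", width=60):
    """Return FASTA text for a record with a known length but no known sequence.

    on_unknown="fill"  -> write `character` repeated `length` times
    on_unknown="error" -> refuse to write a sequence that was never known
    (hypothetical helper for illustration only)
    """
    if on_unknown == "error":
        raise ValueError("Record %s has no sequence, only a length" % record_id)
    if on_unknown != "fill":
        raise ValueError("on_unknown must be 'fill' or 'error'")
    sequence = character * length
    lines = [">%s" % record_id]
    # Wrap the placeholder sequence at `width` letters per line
    for start in range(0, length, width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"
```

Calling `fasta_for_unknown("read1", 70)` gives a header line followed by 60 and then 10 "N" characters, while `on_unknown="error"` raises a ValueError instead.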

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 24 11:23:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 15:23:20 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
Message-ID: <320fb6e00903240823o53267d8bn36908f001708f974@mail.gmail.com>

On Tue, Mar 24, 2009 at 9:49 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> This was one area of the new SeqRecord slicing I was a little unsure
> about - slicing a qual file's SeqRecord (or any SeqRecord with a None
> for the sequence).  I hadn't done anything about it immediately as I
> couldn't think of a use case for it - so that's solved ;)
>
> One solution would be to introduce an UnknownSeq object, which
> would be much nicer to deal with than a None object, as it would have
> a length and support slicing.  I've mentioned this idea before, but
> haven't yet put forward any actual code.  This seems most elegant.
>
> Another option would be to special case handle slicing a SeqRecord
> with a None sequence, where we'd slice its per-letter-annotation.

That should now be working with the change I've just checked into CVS,
but the combination of slicing per-letter-annotation while the sequence
is None is a real pain.
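
To illustrate the special case, here is a minimal sketch of slicing per-letter annotation when the sequence may be None. It is not the code checked into CVS; `slice_record` is a hypothetical helper, and the "phred_quality" key in the usage note is used only as an example.

```python
def slice_record(seq, letter_annotations, start, stop):
    """Slice a sequence and its per-letter annotations together.

    `seq` may be None (as for a QUAL-only record), in which case only
    the annotations are sliced. Hypothetical helper, not SeqRecord itself.
    """
    sliced_annotations = {}
    for key, values in letter_annotations.items():
        # Every annotation has one entry per letter, so the same slice applies
        sliced_annotations[key] = values[start:stop]
    sliced_seq = None if seq is None else seq[start:stop]
    return sliced_seq, sliced_annotations
```

For example, `slice_record(None, {"phred_quality": [10, 20, 30, 40]}, 1, 3)` keeps the sequence as None and trims the quality list to its two middle values. The awkward part is that the length has to be inferred from the annotation, which is what an UnknownSeq object would avoid.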

I'm almost tempted to back out the qual parser for the next release
(FASTQ support is fine), but let's see if we can reach a consensus on
a new UnknownSeq class instead (see my earlier email on this - what
would you expect to happen if you read in a QUAL file and tried to
save it as a FASTA file?).

Peter


From sbassi at clubdelarazon.org  Tue Mar 24 11:33:56 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 24 Mar 2009 12:33:56 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
Message-ID: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>

On Tue, Mar 24, 2009 at 12:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
....
> characters using a DNA alphabet. What would you expect to get if you
> used Bio.SeqIO to write out the file in FASTA format?  To my mind there
> are two sensible options - write out the file using the "NNN....N"
> sequence, or raise an error.

"N" is OK (with the same length of the qual file), that is what ABI
does when the QV is low. This is not the same case but I always think
of "N" as "unknown".
Raising an error is not bad either, because I don't see the need to go
from a non-sequence qual to a fasta (it doesn't make sense). But just
because I don't see the need doesn't mean nobody else has a reason.
Best,

-- 
Sebastián Bassi. Diplomado en Ciencia y Tecnología.

Non standard disclaimer: READ CAREFULLY. By reading this email,
you agree, on behalf of your employer, to release me from all
obligations and waivers arising from any and all NON-NEGOTIATED
agreements, licenses, terms-of-service, shrinkwrap, clickwrap,
browsewrap, confidentiality, non-disclosure, non-compete and
acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.


From bugzilla-daemon at portal.open-bio.org  Tue Mar 24 14:25:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Mar 2009 14:25:17 -0400
Subject: [Biopython-dev] [Bug 2799] New: UnknownSeq object (e.g. for QUAL
	files)
Message-ID: <bug-2799-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2799

           Summary: UnknownSeq object (e.g. for QUAL files)
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Sometimes we want to represent an unknown sequence with a known length, e.g.
"N"*length for nucleotides.  This enhancement is about adding an UnknownSeq
object to Biopython which would have the following init arguments:

* length
* alphabet
* character (single letter string, defaulting to "X" for protein and "N" for
nucleotides, "?" otherwise)

Currently the Bio.SeqIO "qual" parser produces SeqRecord objects where the seq
is None, yet there is a known length.  This can also occur in GenBank files
where there is a CONTIG line but no sequence.  This makes supporting slicing (Bug
2507) complicated.  Adding a new UnknownSeq class would solve this elegantly.

In general, the UnknownSeq object should act as a Seq object whose sequence is
the character*length.

Slicing or adding UnknownSeq objects should give a new UnknownSeq object. 

Complement, reverse complement, transcribe and back transcribe can also return
new UnknownSeq objects of the same length (alphabet permitting).  Translation
can return an UnknownSeq object using "X" and a protein alphabet (with the
length roughly one third of the nucleotide length - whatever is consistent with
the Seq translate method).

Adding an UnknownSeq object to a Seq would have to give a new Seq object (or an
error?).  One use-case example here would be joining together contigs with
unknown regions of a given length (strings of N's).

This bug is a placeholder for patches or pointers to possible implementations
(e.g. I intend to try some ideas on a branch on github).  I expect most of the
discussion to be on the (dev) mailing list, rather than bugzilla.
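
As a rough illustration of the behaviour described above, here is a minimal pure-Python sketch. It is not the implementation on the github branch: the class name `UnknownSeqSketch` is hypothetical, the alphabet is reduced to a plain string ("nucleotide", "protein" or anything else) rather than a Biopython alphabet object, translation uses floor division for the length, and adding known letters returns a plain string as a stand-in for a Seq.

```python
class UnknownSeqSketch:
    """Unknown sequence of known length (illustrative only)."""

    # Default placeholder letters, as listed in the bug report
    DEFAULTS = {"protein": "X", "nucleotide": "N"}

    def __init__(self, length, alphabet="generic", character=None):
        if length < 0:
            raise ValueError("length must not be negative")
        if character is None:
            character = self.DEFAULTS.get(alphabet, "?")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self._length = length
        self.alphabet = alphabet
        self.character = character

    def __len__(self):
        return self._length

    def __str__(self):
        return self.character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # range() works out how many positions the slice selects
            new_length = len(range(*index.indices(self._length)))
            return UnknownSeqSketch(new_length, self.alphabet, self.character)
        range(self._length)[index]  # raises IndexError if out of bounds
        return self.character

    def __add__(self, other):
        if (isinstance(other, UnknownSeqSketch)
                and other.alphabet == self.alphabet
                and other.character == self.character):
            return UnknownSeqSketch(len(self) + len(other),
                                    self.alphabet, self.character)
        # Mixed with known letters: a plain string stands in for a Seq here
        return str(self) + str(other)

    def complement(self):
        # The complement of an unknown letter is still unknown
        return UnknownSeqSketch(self._length, self.alphabet, self.character)

    def translate(self):
        if self.alphabet != "nucleotide":
            raise ValueError("only nucleotide sequences can be translated")
        # Assumption: incomplete trailing codons are dropped
        return UnknownSeqSketch(self._length // 3, "protein", "X")
```

With this sketch, slicing a ten-letter nucleotide object with `[2:5]` gives a three-letter object, adding two such objects gives a twenty-letter one, and `translate()` gives three "X" letters.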


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com  Tue Mar 24 14:42:56 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 24 Mar 2009 18:42:56 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>

Hi,

On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> There is a lot of good material in this thread for new potential
> developers. Tiago, it would make sense to condense what you've
> written and include it with the Contributing guide:

Just a followup on this: I think it makes no sense to put much of the
new content before there is an official step of moving to github.

What I am doing is just putting up, for test purposes, a framework to see
how these suggestions might work.

I've created a fork
http://github.com/tiagoantao/biopython-popgen-test/
with several branches

The proposed idea is:
1. The master branch should be a clearing house and stability point
for things to be suggested for submission to the official branch. All
code here should have unit tests, all unit tests should pass and
documentation should exist. It is also a place to correct bugs that
are discovered in the official trunk (if these are simple to correct
and don't require the creation of a temporary branch to sort them
out).
2. There is a stats branch to work on Bio.PopGen.Stats. If you want to
work on statistics you can follow/fork from the statistics branch. Any
code that people might have should be discussed too if they want it to
make it into the official release.
3. Less interesting to others, I will personally create a genepop
branch to make an enhancement to the existing parser and on the
ability to call the genepop binary.

So:
People work on their very personal branches (like my genepop one).
Development branches that might have shared interests (like the stats
one) should be forked or given shared commit access, and the people
interested should discuss among themselves.
Whenever some content is deemed ready it is then put on the popgen
master branch (alongside with tests and documentation). When the
master branch is in a stable state, then the changes are proposed to
the official one.

In my view, this protects the people working on the official thing
from the potential chaos of new developments, while creating a
framework which allows people to test innovations...

Tiago

From biopython at maubp.freeserve.co.uk  Tue Mar 24 14:54:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 18:54:28 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
Message-ID: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>

2009/3/24 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> There is a lot of good material in this thread for new potential
>> developers. Tiago, it would make sense to condense what you've
>> written and include it with the Contributing guide:
>
> Just a followup on this: I think it makes no sense to put much of the
> new content before there is an official step of moving to github.

True - but we do need enough pointers for people to help try things out.

> What I am doing is just putting up, for test purposes, a framework to see
> how these suggestions might work....
> In my view, this protects the people working on the official thing
> from the potential chaos of new developments, while creating a
> framework which allows people to test innovations...

That sounds great, and a good model for other (self contained) modules under
active development.  I'm thinking along similar lines for Bio.SeqIO and AlignIO
(and by implication, the SeqRecord and the Alignment classes).

I would assume (although you didn't say this) you would also pull changes to
the official trunk into your branches periodically - at the very least
after each official Biopython release.

Peter


From bartek at rezolwenta.eu.org  Tue Mar 24 19:58:30 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 00:58:30 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
Message-ID: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>

Hi all,

Sorry for being quiet all that time, but the conference (+jet lag both
ways) proved to be more engaging than I thought.

For the tags, they were not pushed to github before, because I didn't
know I needed to do it specifically with git push --tags.

Now they are pushed to the repository and you can fetch them to local
copies by git pull -t in any local directory which resulted from
cloning the official branch.

They probably won't get automatically transferred to derived branches;
I guess you need to pull them from the original (official) branch.

cheers
Bartek

> On Tue, Mar 17, 2009 at 10:06 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi Bartek et al,
>>
>> I've just been looking over the github mirror of CVS, and wanted to
>> see how it presented the history of individual files.  For example, this
>> page looks at the Bio/SeqRecord.py history using ViewCVS:
>> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython
>>
>> For comparison, in GitHub,
>> http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py
>>
>> As you can see, all the comments and changes are there - which is
>> great.  But I can't see the CVS tag information, which I assume would
>> be converted into git tags.  Is this information present in the git
>> repository, but not shown by github, or was it lost during the
>> migration?  This might seem like a little thing, but I have found it
>> incredibly important for tracing bugs reported in older releases, for
>> example in narrowing down when something changed.
>>
>> Peter
>>


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From biopython at maubp.freeserve.co.uk  Wed Mar 25 06:01:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 10:01:45 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
Message-ID: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>

On Tue, Mar 24, 2009 at 3:33 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On Tue, Mar 24, 2009 at 12:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> ....
>> characters using a DNA alphabet. What would you expect to get if you
>> used Bio.SeqIO to write out the file in FASTA format?  To my mind there
>> are two sensible options - write out the file using the "NNN....N"
>> sequence, or raise an error.
>
> "N" is OK (with the same length of the qual file), that is what ABI
> does when the QV is low. This is not the same case but I always think
> of "N" as "unknown".
> Raising an error is not bad either, because I don't see the need to go
> from a non-sequence qual to a fasta (it doesn't make sense). But just
> because I don't see the need doesn't mean nobody else has a reason.
> Best,

I've filed an enhancement bug proposing a new UnknownSeq object, perhaps
as part of the Bio.Seq module: Bug 2799
http://bugzilla.open-bio.org/show_bug.cgi?id=2799

I've done an initial patch (which I plan to upload on Bugzilla) which
is available now on github on a new branch:
http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq

Note this doesn't do anything special (yet) when writing output files,
so they will by default record a string of whatever unknown sequence
character was used.

It would make sense for GenBank/EMBL in SeqIO to also take advantage of
the UnknownSeq object, because here the sequence is essentially optional
(consider files with just a CONTIG line), but should always have a length.

Sebastian - could you have a quick play with this github code (using the new
UnknownSeq class), and the current CVS code (using None), and make sure
both support the slicing operations you were trying earlier?  Thanks.

Peter


From biopython at maubp.freeserve.co.uk  Wed Mar 25 06:28:46 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 10:28:46 +0000
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
Message-ID: <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>

On Tue, Mar 24, 2009 at 11:58 PM, Bartek Wilczynski wrote:
>
> Hi all,
>
> Sorry for being quiet all that time, but the conference (+jet lag both
> ways) proved to be more engaging than I thought.

That's fine - sleep is important ;)

> For the tags, they were not pushed to github before, because I didn't
> know I needed to do it specifically with git push --tags.

I assume you've updated your cron job so this will happen
automatically in future (e.g. when we do Biopython 1.50 beta).

> Now they are pushed to the repository and you can fetch them to local
> copies by git pull -t in any local directory which resulted from
> cloning the official branch.

Yes, I've checked and I can get the tags with:
git pull -t ...
or,
git pull --tags ...

They also show up in github (near the top, drop down menu next to
branches) and in gitx (and I assume other GUI clients).

They have commit comments like "This commit was manufactured by
cvs2svn to create tag 'biopython-146'", which is fine.

However, all the tags seem to have associated with them the deletion
of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
If you can work out how this happened, would it be trivial to back
these tags out and redo it?

> They probably won't get automatically transferred to derived branches,
> I guess you need to pull them from the original (official) branch.

That makes sense.

Peter

From mjldehoon at yahoo.com  Wed Mar 25 07:47:59 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 25 Mar 2009 04:47:59 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com>
Message-ID: <559251.50851.qm@web62401.mail.re1.yahoo.com>


> What about the fairly common situation (at least, it's something
> I've done fairly often) where Bio.Entrez.efetch() is used
> to fetch records which are saved directly to file without
> verification - e.g. to be parsed by another program?
> Unless the error is caught in Bio.Entrez.efetch()
> it may be out of our control.

That is easy: just run the output returned by NCBI through the appropriate parser. If the parser is happy, proceed to save the NCBI output in a file.

> The first half of the email (the main point) was based
> on a special case: HTML and XML are pretty easy to
> identify.  If you ask for HTML and don't get it, it is
> an error (and vice versa).  If you ask for XML and don't
> get it, it is an error (and vice versa).  The fact that
> the NCBI currently often return an HTML or XML error
> page when a plain text format was requested is then
> easily detected as an error (simply from the file type).
> This will still work even if the NCBI do change their
> error formats or wording - it should be pretty robust.

Have a look at serialset.xml in the Bio.Entrez test cases ... this is the output obtained from NCBI using efetch from the journals database with retmode='xml'. The file looks like XML, but it doesn't start with "<?xml". However, Bio.Entrez.read parses it correctly, so while it's not pretty, to me this would not count as an error.

--Michiel.


From biopython at maubp.freeserve.co.uk  Wed Mar 25 08:15:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 12:15:21 +0000
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <559251.50851.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com>
	<559251.50851.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00903250515vd885b34s629dd9253d4f9186@mail.gmail.com>

On Wed, Mar 25, 2009 at 11:47 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> What about the fairly common situation (at least, it's something
>> I've done fairly often) where Bio.Entrez.efetch() is used
>> to fetch records which are saved directly to file without
>> verification - e.g. to be parsed by another program?
>> Unless the error is caught in Bio.Entrez.efetch()
>> it may be out of our control.
>
> That is easy: just run the output returned by NCBI through
> the appropriate parser. If the parser is happy, proceed to
> save the NCBI output in a file.

Possible, but you'd need to cache the handle's data in order
to be able to save it after parsing.  The UndoHandle doesn't
do this.

You could save the data to a file, and then check the parser
can read it back - however, this would be complicated if you
are downloading data in batches to go into a single file.
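
The caching approach could look roughly like the sketch below. The function name `save_if_parsable` is hypothetical, the handle is assumed to return text, and `xml.etree.ElementTree` is only a stand-in for whichever parser is appropriate for the requested format.

```python
import io
import xml.etree.ElementTree as ET


def save_if_parsable(handle, filename, parse=ET.parse):
    """Read all of `handle`, check that it parses, then write it to `filename`.

    `parse` is any callable that takes a file-like object and raises an
    exception on bad input; ElementTree is only used as an example here.
    """
    data = handle.read()          # cache the whole response in memory
    parse(io.StringIO(data))      # raises (e.g. ET.ParseError) on bad data
    with open(filename, "w") as output:
        output.write(data)
    return len(data)
```

Nothing is written unless the parser accepts the data. For batched downloads the same idea would apply per batch, opening the output file in append mode instead, at the cost of holding each batch in memory while it is checked.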

>> The first half of the email (the main point) was based
>> on a special case: HTML and XML are pretty easy to
>> identify.  If you ask for HTML and don't get it, it is
>> an error (and vice versa).  If you ask for XML and don't
>> get it, it is an error (and vice versa).  The fact that
>> the NCBI currently often return an HTML or XML error
>> page when a plain text format was requested is then
>> easily detected as an error (simply from the file type).
>> This will still work even if the NCBI do change their
>> error formats or wording - it should be pretty robust.
>
> Have a look at serialset.xml in the Bio.Entrez test cases ... this
> is the output obtained from NCBI using efetch from the journals
> database with retmode='xml'. The file looks like XML, but it
> doesn't start with "<?xml". However, Bio.Entrez.read parses it
> correctly, so while it's not pretty to me this would not count as
> an error.

I do concede my sample code for detecting XML or HTML could
be improved, and this provides a good test case for a difficult
XML file.  Maybe when we expect XML (or HTML), all we should
check is the file starts with "<"?  e.g.

   elif "retmode" in params and params["retmode"].lower()=="html" \
   and not data.lower().startswith("<") :
       raise TypeError("Requested HTML, but didn't get it: %s..." % data)
   elif "retmode" in params and params["retmode"].lower()=="xml" \
   and not data.lower().startswith("<") :
       raise TypeError("Requested XML, but didn't get it: %s..." % data)
   elif "retmode" in params and params["retmode"] \
   and params["retmode"].lower()!="xml" \
   and data.lower().startswith("<?xml") :
       raise TypeError("Didn't request XML, but got it: %s..." % data)
   elif "retmode" in params and params["retmode"] \
   and params["retmode"].lower()!="html" \
   and (data.lower().startswith("<html") or \
        data.lower().startswith("<!doctype html")):
       #Expected for some error pages (e.g. the Bad Gateway caught above)
       raise TypeError("Didn't request HTML, but got it: %s..." % data)

The above code isn't expected to catch all possible errors - just the
most common ones.  One thing this version won't catch is a mix up
between XML and HTML (e.g. requested XML, given HTML error page)
but the two do overlap somewhat anyway.
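
For experimenting outside Bio.Entrez, the same chain of checks can be wrapped in a standalone function. This is a sketch only: `check_retmode` is a hypothetical name, and `params` is assumed to be a plain dictionary of the arguments passed to efetch.

```python
def check_retmode(params, data):
    """Raise TypeError if `data` clearly does not match the requested retmode.

    Follows the same order of checks as the fragment above; it is not an
    existing Bio.Entrez function.
    """
    retmode = (params.get("retmode") or "").lower()
    start = data.lower()
    preview = data[:40]  # keep error messages short
    if retmode == "html" and not start.startswith("<"):
        raise TypeError("Requested HTML, but didn't get it: %s..." % preview)
    if retmode == "xml" and not start.startswith("<"):
        raise TypeError("Requested XML, but didn't get it: %s..." % preview)
    if retmode and retmode != "xml" and start.startswith("<?xml"):
        raise TypeError("Didn't request XML, but got it: %s..." % preview)
    if retmode and retmode != "html" and (
            start.startswith("<html") or start.startswith("<!doctype html")):
        # Expected for some error pages
        raise TypeError("Didn't request HTML, but got it: %s..." % preview)
```

An XML document without an "<?xml" declaration passes when retmode is "xml", since only the leading "<" is required, while an HTML error page returned for a plain text request raises a TypeError.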

Peter


From bartek at rezolwenta.eu.org  Wed Mar 25 08:16:08 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 13:16:08 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
Message-ID: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>

On Wed, Mar 25, 2009 at 11:28 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> I assume you've updated your cron job so this will happen
> automatically in future (e.g. when we do Biopython 1.50 beta).

Yes, naturally.

>
> However, all the tags seem to have associated with them the deletion
> of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
> If you can work out how this happened, would it be trivial to back
> these tags out and redo it?
>
That's really odd. I don't know exactly where it comes from, but I've
done some detective work and here are my findings:

For the AUTHORS  file, it was indeed deleted in a commit by Jeff Chang (2001):
http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65
Which "renames" the AUTHORS file into CONTRIB file.

The AUTHORS file is in the biopython tags prior to 1.00a1 and after that it
should not be there anymore (it's in CVS's Attic).
I don't know how it came back...

Similarly, the Bio/UniGene/UniGene.py file was removed by Jeff in a commit:
http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2

And similarly, UniGene.py is no longer in CVS repo (but it's still in
the attic).

What these files have in common, is that there are some commits to
them after they've been moved to Attic (sic!)

http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py
http://github.com/biopython/biopython/commits/master/AUTHORS

I don't know exactly how this could happen, but this inconsistency in
CVS might be causing cvs2git to actually include these guys.

I'll increase the verbosity of the log messages in my cron script, so
maybe I'll see some indication of a problem.

If nobody has a reason for these files to be included in the current
trunk, I'll go ahead and remove them from git.

cheers
Bartek

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From biopython at maubp.freeserve.co.uk  Wed Mar 25 08:20:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 12:20:05 +0000
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
	<8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
Message-ID: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>

>> However, all the tags seem to have associated with them the deletion
>> of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
>> If you can work out how this happened, would it be trivial to back
>> these tags out and redo it?
>>
> That's really odd. I don't know exactly where it comes from, but I've
> done some detective work and here are my findings:
>
> For the AUTHORS file, it was indeed deleted in a commit by Jeff Chang (2001):
> http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65
> Which "renames" the AUTHORS file into CONTRIB file.
>
> The AUTHORS file is in the biopython tags prior to 1.00a1 and after that it
> should not be there anymore (it's in CVS's Attic).
> I don't know how it came back...
>
> Similarly, the Bio/UniGene/UniGene.py file was removed by Jeff in a commit:
> http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2
>
> And similarly, UniGene.py is no longer in CVS repo (but it's still in
> the attic).
>
> What these files have in common, is that there are some commits to
> them after they've been moved to Attic (sic!)
>
> http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py
> http://github.com/biopython/biopython/commits/master/AUTHORS
>
> I don't know exactly how this could happen, but this inconsistency in
> CVS might be causing cvs2git to actually include these guys.

It does sound like a hidden hiccup in our CVS repository... very strange.

Peter


From bartek at rezolwenta.eu.org  Wed Mar 25 08:43:00 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 13:43:00 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
	<8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
	<320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>
Message-ID: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com>

On Wed, Mar 25, 2009 at 1:20 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

>> I don't know exactly how this could happen, but this inconsistency in
>> CVS might be causing cvs2git to actually include these guys.
>
> It does sound like a hidden hiccup in our CVS repository... very strange.


I would rather call it a glitch in the transition. I was actually quite
surprised that the transition went so smoothly.
Now we can see that some things actually did not transfer too well...

I did a thorough check comparing checkouts from the current CVS and git
trunks, and there are also some other differences.
As you can see below, apart from these two files present only in
git, a number of directories are missing in git. I've checked:
they are all empty directories left over because you cannot delete a
directory from CVS (some of them, like Bio.Tools, actually contain a
number of directories, but they are all empty).

I think that it's actually a desired behavior (removing empty
directories) but if anyone is missing any of these dirs, please let me
know.

The diff:

Only in git_branch/: AUTHORS
Only in biopython/Bio: Ais
Only in biopython/Bio: CDD
Only in biopython/Bio: cmmCIF
Only in biopython/Bio: config
Only in biopython/Bio: dbdefs
Only in biopython/Bio: ECell
Only in biopython/Bio: expressions
Only in biopython/Bio: formatdefs
Only in biopython/Bio: Gobase
Only in biopython/Bio: iodefs
Only in biopython/Bio: Kabat
Only in biopython/Bio: LocusLink
Only in biopython/Bio: MultiProc
Only in biopython/Bio/PDB: mmCIF_lex
Only in biopython/Bio: Rebase
Only in biopython/Bio/SCOP: tests
Only in biopython/Bio: sources
Only in biopython/Bio: Tools
Only in git_branch/Bio/UniGene: UniGene.py
Only in biopython/Doc/cookbook: biopython_test
Only in biopython/Doc/cookbook: genbank_to_fasta
Only in biopython/Doc/cookbook: LogisticRegression
Only in biopython: Experimental
Only in git_branch/: .git
Only in biopython/Martel: examples
Only in biopython/Tests: CDD
Only in biopython/Tests: ECell
Only in biopython/Tests: Gobase
Only in biopython/Tests: Kabat
Only in biopython/Tests: LocusLink
Only in biopython/Tests: Ndb
Only in biopython/Tests: UnitTests
Only in biopython/Tests: WIT

cheers
Bartek

From biopython at maubp.freeserve.co.uk  Wed Mar 25 08:47:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 12:47:02 +0000
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
	<8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
	<320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>
	<8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com>
Message-ID: <320fb6e00903250547s7d88a1b3h8c52dd852047edb6@mail.gmail.com>

On Wed, Mar 25, 2009 at 12:43 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> I did a thorough check to compare checkouts from current CVS and git
> trunks to see that there are also some other differences:
> As you can see below, apart from these two files present only in
> git, a number of directories are missing in git. I've checked:
> they are all empty directories leftover because you cannot delete a
> directory from CVS (some of them, like Bio.Tools have actually a
> number of directories in them, but they are all empty).
>
> I think that it's actually a desired behavior (removing empty
> directories) but if anyone is missing any of these dirs, please let me
> know.

I don't care about the missing empty directories - if/once we move
to git, we would have deleted them anyway.  So if that has been done
automatically, that's fine in my opinion.

Peter


From tiagoantao at gmail.com  Wed Mar 25 11:39:42 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 25 Mar 2009 15:39:42 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
Message-ID: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>

On Tue, Mar 24, 2009 at 6:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> In my view, this protects the people working on the official thing
>> from the potential chaos of new developments, while creating a
>> framework which allow for people to test innovations...
>
> That sounds great, and a good model for other (self contained) modules under


Just a minor point: any development branches should be seen as highly
unstable. I say this just because I am starting to work on
statistics again and I am seeing massive refactoring going on. So if people
track development branches, they should be prepared for chaos ;) .
Which is exactly the opposite of what they should expect from the official
branch ;)

From biopython at maubp.freeserve.co.uk  Wed Mar 25 11:45:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 15:45:00 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
Message-ID: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>

2009/3/25 Tiago Antão <tiagoantao at gmail.com>:
> Just a minor point. any development branches should be seen as highly
> unstable. I say this just because I am restarting to work on
> statistics and I am seeing massive refactoring going on. So if people
> track development branches, they should be prepared for chaos ;) .
> Which is exactly the opposite they should expect from the official
> branch ;)

We should probably all write something on the wiki page for our
personal forks, describing what you're using it for, what are the main
branches likely to be of interest etc.

Peter


From bartek at rezolwenta.eu.org  Wed Mar 25 12:33:13 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 17:33:13 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
Message-ID: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com>

2009/3/25 Peter <biopython at maubp.freeserve.co.uk>:
>
> We should probably all write something on the wiki page for our
> personal forks, describing what you're using it for, what are the main
> branches likely to be of interest etc.

Hi,

I'll be happy to write some draft version of guidelines for developers
and contributors to the wiki.

It just seems that currently there are some problems with the biopython
wiki. Does anyone know what the problem is?
Is it some kind of internal OBF issue or is it because of increased
interest in biopython after the application note was
published? Do we have access to any access statistics to the website?

cheers
Bartek

From biopython at maubp.freeserve.co.uk  Wed Mar 25 12:41:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 16:41:00 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com>
Message-ID: <320fb6e00903250941o6e99e06egb672b62f2d661e15@mail.gmail.com>

On Wed, Mar 25, 2009 at 4:33 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
>
> 2009/3/25 Peter <biopython at maubp.freeserve.co.uk>:
>>
>> We should probably all write something on the wiki page for our
>> personal forks, describing what you're using it for, what are the main
>> branches likely to be of interest etc.
>
> Hi,
>
> I'll be happy to write some draft version of guidelines for developers
> and contributors to the wiki.

Certainly add a section to the git migration page.

> It just seems that currently there are some problems with biopython
> wiki. Does anyone know what is the problem?
> Is it some kind of internal OBF issue or is it because of increased
> interest in biopython after the application note was
> published? Do we have access to any access statistics to the website?

It seems to be all the OBF pages (e.g. bioperl.org too), and it's been
more than an hour so I'll drop their support team an email.

Peter

From sbassi at clubdelarazon.org  Wed Mar 25 12:59:28 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Wed, 25 Mar 2009 13:59:28 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
Message-ID: <9e2f512b0903250959h26081e4ak3246252d02be2ee0@mail.gmail.com>

On Wed, Mar 25, 2009 at 7:01 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
....
> Sebastian - could you have a quick play with this github code (using the new
> UnknownSeq class), and the current CVS code (using None), and make sure
> both support the slicing operations you were trying earlier?  Thanks.

OK, I'll try both today and report back to the list.

From eric.talevich at gmail.com  Wed Mar 25 17:44:30 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 25 Mar 2009 17:44:30 -0400
Subject: [Biopython-dev] PDB tidy script
In-Reply-To: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com>
References: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com>
Message-ID: <3f6baf360903251444l3064963bp788750ed7a67e4d4@mail.gmail.com>

On Mon, Mar 23, 2009 at 5:05 PM, Peter <biopython at maubp.freeserve.co.uk>wrote:

>
> If you look back over the history, there initially was no header parsing,
> it was a contribution from Kristian Rother, and I would agree, it is rather
> disjoint from the rest of the code.  One thing I personally wanted last
> time I was working with PDB files was to have secondary structure
> information (from the alpha helix and beta sheet lines in the header)
> mapped onto the residue objects automatically.
>
> And yes, Thomas is supporting the PDB module, but his time has
> been rather limited of late.  When I asked him about some of the
> open enhancement requests in bugzilla recently (off list) he said
> we needed "a separate class to parse all the info in the header,
> not a slew of additions to the core parser class (which is designed
> to deal with the 3D data only)."
>
>
I can understand both those wishes. Looking at the features currently
available in the module, the best approach might be to leave the 3D parser
and PDB.Entity-derived classes alone and add another wrapper class
containing the header, sequence (maybe), secondary and tertiary structure as
separate attributes.

When working in the REPL, I've wished for a simpler function to load PDB
files by path and figure out the name automatically; this would be an easy
way to do it without violating Thomas's parser -- just use
parse_pdb_header() in the wrapper, and use the name from there as the first
argument to PDB.get_structure(). For example (quick & dirty):

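import os

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.parse_pdb_header import parse_pdb_header

class PDBLoader:
    def __init__(self, path):
        # Expose every parsed header field (name, author, ...) as an attribute.
        self.__dict__ = parse_pdb_header(path)
        # Fall back to the file name when the header gives no name.
        if not self.name:
            self.name = os.path.basename(path).split('.')[0]
        parse_3d = PDBParser()
        self.structure = parse_3d.get_structure(self.name, path)
        # self.secondary = ?
        # link 1/2/3ary data in various ways ...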

>>> pdb = PDBLoader('a_structure.pdb')
>>> dir(pdb)
['__doc__', '__init__', '__module__', 'author', 'compound',
'deposition_date', 'head', 'journal_reference', 'name', 'release_date',
'resolution', 'source', 'structure', 'structure_method',
'structure_reference']


In that case, it would be reasonable to let get_structure and
parse_pdb_header take an open file-like object as an alternative to the PDB
file's path to avoid opening and closing the same file repeatedly. There's
also some cleanup to do in parse_pdb_header.py alongside this.
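
To make the "path or open file-like object" idea concrete, here is a
minimal sketch in plain Python. The helper names open_or_passthrough
and read_header_and_atoms are hypothetical and not part of Bio.PDB;
the record prefixes are only there to give the two passes something
to do:

```python
import contextlib


@contextlib.contextmanager
def open_or_passthrough(path_or_handle, mode="r"):
    """Yield a readable handle for either a path or an open file object."""
    if hasattr(path_or_handle, "read"):
        # Already file-like: hand it back and leave closing to the caller.
        yield path_or_handle
    else:
        with open(path_or_handle, mode) as handle:
            yield handle


def read_header_and_atoms(path_or_handle):
    """Make two passes over one handle instead of opening the file twice."""
    with open_or_passthrough(path_or_handle) as handle:
        header = [line for line in handle
                  if line.startswith(("HEADER", "TITLE"))]
        # Rewind so the second pass sees the file from the start.
        handle.seek(0)
        atoms = [line for line in handle if line.startswith("ATOM")]
    return header, atoms
```

A wrapper like PDBLoader could then open the file once and hand the
same handle to both the header parser and the 3D parser, rewinding in
between.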

Does this sound reasonable?

-Eric

From chapmanb at 50mail.com  Wed Mar 25 17:55:48 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 25 Mar 2009 17:55:48 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
Message-ID: <20090325215548.GB21577@sobchak.mgh.harvard.edu>

Hey all;
Good discussion on this; I touch on a few points from different
threads below.

Michiel:
> I haven't been following this topic closely, and as an "outsider"
> using git seems more complicated than using cvs or svn. And to be
> honest, I don't know if Biopython actually needs the branching and
> forking stuff. I think that this is more useful for bigger projects,
> where multiple developers may be working on interrelated parts of code
> at the same time. That hardly ever happens in Biopython, though.

Tiago:
> I would actually take this argument and reverse it:
[...]
> Using a distributed technology allows for people to try new ideas and 
> to get things moving (while still maintaining an official rock stable 
> version with maybe glacial policies).

I fall in between these two viewpoints. Git has more complications and,
unless we manage those, we risk introducing additional barriers to
contribution. Imagine looking at biopython on git hub and seeing 10
different branches for different users, many of which may be old and
out of date. This could lead to the impression that we are not
organized toward a single goal. If you are still interested, how
do you know which ones could use your help and what they are for?

The solution to this is documentation on the wiki. We rely too much on
the mailing list and expect people to keep up. Peter read my mind on
this:

Peter:
> We should probably all write something on the wiki page for our
> personal forks, describing what you're using it for, what are the main
> branches likely to be of interest etc.

I started a page over the weekend doing this:

http://biopython.org/wiki/Active_projects

It's a skeleton so add or subtract away. My idea for this is that it
is for longer projects that could use outside help. It's not reasonable
to spend time writing up things you'll be finishing in a week or so; for
that bugzilla does fine keeping interested parties up to date.

Another idea on this page is a specific wish list of libraries for
future work. This is a starting point for anyone who comes into
Biopython fresh and would like to take something on. Also, it encourages
people who have developed external libraries to deal with problems we
are interested in to consider folding them into Biopython.

Me:
> > There is a lot of good material in this thread for new potential
> > developers. Tiago, it would make sense to condense what you've
> > written and include it with the Contributing guide:

Tiago:
> Just a followup on this: I think it makes no sense to put much of the
> new content before there is an official step of moving to github.

We are serious about moving to Git and need to have the documentation in
place so others can learn it. You wrote up a lot of good stuff, and it
will be lost on the mailing list.

Brad

From bugzilla-daemon at portal.open-bio.org  Wed Mar 25 18:43:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Mar 2009 18:43:57 -0400
Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. for QUAL files)
In-Reply-To: <bug-2799-42@http.bugzilla.open-bio.org/>
Message-ID: <200903252243.n2PMhvoT007523@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2799


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-25 18:43 EST -------
I've made my first attempt at this available as a personal branch on github,
http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq



From sbassi at clubdelarazon.org  Wed Mar 25 19:15:05 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Wed, 25 Mar 2009 20:15:05 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
Message-ID: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>

On Wed, Mar 25, 2009 at 7:01 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Sebastian - could you have a quick play with this github code (using the new
> UnknownSeq class), and the current CVS code (using None), and make sure
> both support the slicing operations you were trying earlier?  Thanks.

First I tried the CVS code (with None in seq), and it worked.
Then I tried the git code and it also worked. One thing I noticed is
that I got "?" instead of "N" as the "sequence" of the UnknownSeq.
From a practical point of view, both versions are the same, but the
concept of UnknownSeq looks more solid than None, because if I didn't
know about biopython internals, I would never try to slice a None
seq. With "None",
len(s) returns:

Traceback (most recent call last):
  File "/home/sbassi/bioinfo/INTA/qualparser.py", line 21, in <module>
    print len(s)
  File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py",
line 481, in __len__
    return len(self.seq)
TypeError: object of type 'NoneType' has no len()

So I would never try to do:
new_s = s[10:30]

But with the UnknownSeq object, len(s) returns an actual length, so it
is more intuitive that it can be sliced.
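
To illustrate why a placeholder object feels more natural than None,
here is a minimal sketch of a sequence of known length but unknown
content. It is not the UnknownSeq code from the github branch; the
class name PlaceholderSeq and its details are assumptions made for
the example:

```python
class PlaceholderSeq:
    """A sequence whose length is known but whose letters are not."""

    def __init__(self, length, character="?"):
        if length < 0:
            raise ValueError("length must be non-negative")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Work out how many positions the slice covers, then
            # return a shorter placeholder of that length.
            new_length = len(range(*index.indices(self._length)))
            return PlaceholderSeq(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character


s = PlaceholderSeq(50)
new_s = s[10:30]   # slicing works, unlike with None
print(len(s), len(new_s))
```

Because len() and slicing both behave like they do for a string, code
that trims a record and its quality scores together needs no special
case for "sequence not available".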

I liked the github interface, may I setup my own repository?

Best,

-- 
Sebastián Bassi. Diplomado en Ciencia y Tecnología.



From biopython at maubp.freeserve.co.uk  Wed Mar 25 19:30:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 23:30:14 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
	<9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>
Message-ID: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com>

On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi:
>> Sebastian - could you have a quick play with this github code (using the new
>> UnknownSeq class), and the current CVS code (using None), and make sure
>> both support the slicing operations you were trying earlier?  Thanks.
>
> First I tried the CVS code (with None in seq), it worked.

OK, good.  That will do in the very short term - the UnknownSeq needs
some more testing and general approval before I'd check that in.

> Then I tried the git code and it also worked. One thing I noticed is
> that I got "?" instead of "N" as the "sequence" of the UnknownSeq.

I felt we shouldn't use an "N" unless we are confident the sequence
is nucleotides.  In practice, this is probably a safe assumption for
FASTQ and QUAL files - unless anyone can think of a counter example?
Do you think it is safe to assume FASTQ and QUAL files are just for
nucleotides?

I mean, you could translate a CDS from transcriptome sequencing,
and for the sake of argument give each amino acid a quality score
from the three nucleotide quality scores, and then save this a protein
FASTQ file.  But I've never heard of anyone actually doing this ;)
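
Purely for the sake of argument, the per-amino-acid quality idea could
look like the sketch below. Taking the minimum of the three nucleotide
scores is an assumption for the example, not an established
convention, and the function name codon_qualities is made up:

```python
def codon_qualities(nucleotide_qualities, combine=min):
    """Collapse per-nucleotide scores into one score per complete codon."""
    # Ignore any trailing partial codon.
    usable = len(nucleotide_qualities) - len(nucleotide_qualities) % 3
    return [combine(nucleotide_qualities[i:i + 3])
            for i in range(0, usable, 3)]


# Seven nucleotide scores give two complete codons; the last score is dropped.
print(codon_qualities([40, 30, 20, 10, 10, 35, 7]))
```

Passing a different combine function (for example max) changes how
optimistic the protein-level score is.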

> From a practical point of view, both versions are the same, but the
> concept of UnknownSeq looks more solid than None, because if I don't know
> about biopython internals, I would never try to slice a None
> seq. With "None":
> len(s) returns:
>
> Traceback (most recent call last):
> ...
> TypeError: object of type 'NoneType' has no len()
>
> So I would never try to do:
> new_s = s[10:30]
>
> But with the UnknownSeq object, len(s) returns an actual length, so it
> is more intuitive that it can be sliced.

I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord
__getitem__ code nicer, and it means you can do len(SeqRecord) too,
which was problematic if the sequence was None.

>
> I liked the github interface, may I setup my own repository?
>

Yes - this is one of the nice things about git, it makes it easy for anyone
to make their own local branch of Biopython, but keep it under version
control and pull in changes from the master branch (or another git user)
quite easily.  It should also make it easy to offer changes back to the
main project (assuming we do switch to hosting it on git, for now it is
still being done via CVS).  However, bear in mind this is still only a test
migration, and it is still possible we'll have to redo the CVS to git
migration.  There is a long (and ongoing) thread on this mailing list
about all this already, with an evolving wiki page:
http://biopython.org/wiki/GitMigration

Peter


From bartek at rezolwenta.eu.org  Wed Mar 25 21:02:59 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Thu, 26 Mar 2009 02:02:59 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
Message-ID: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>

On Wed, Mar 25, 2009 at 10:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hey all;
> Good discussion on this; I touch on a few points from different
> threads below.
>
Indeed, I'm very happy that we got the ball rolling and more people
now take part in the discussion.

> I fall in between these two viewpoints. Git has more complications and,
> unless we manage those, we risk introducing additional barriers to
> contribution. Imagine looking at biopython on git hub and seeing 10
> different branches for different users, many of which may be old and
> out of date. This could lead to the impression that we are not
> organized toward a single goal. If you are still interested, how
> do you know which ones could use your help and what they are for?
>
> The solution to this is documentation on the wiki. We rely too much on
> the mailing list and expect people to keep up. Peter read my mind on
> this:
>
> Peter:
>> We should probably all write something on the wiki page for our
>> personal forks, describing what you're using it for, what are the main
>> branches likely to be of interest etc.
>
> I started a page over the weekend doing this:
>
> http://biopython.org/wiki/Active_projects
>
> It's a skeleton so add or subtract away. My idea for this is that it
> is for longer projects that could use outside help. It's not reasonable
> to spend time writing up things you'll be finishing in a week or so; for
> that bugzilla does fine keeping interested parties up to date.
>
> Another idea on this page is a specific wish list of libraries for
> future work. This is a starting point for anyone who comes into
> Biopython fresh and would like to take something on. Also, it encourages
> people who have developed external libraries to deal with problems we
> are interested in to consider folding them into Biopython.

Great ideas. I fully agree that we need clear documentation if we want
more people to contribute.

>
> Me:
>> > There is a lot of good material in this thread for new potential
>> > developers. Tiago, it would make sense to condense what you've
>> > written and include it with the Contributing guide:
>
> Tiago:
>> Just a followup on this: I think it makes no sense to put much of the
>> new content before there is an official step of moving to github.
>
> We are serious about moving to Git and need to have the documentation in
> place so others can learn it. You wrote up a lot of good stuff, and it
> will be lost on the mailing list.

Continuing on that topic. I think there are three (more or less
separate) issues here:
1) Describing git usage technically, to make sure all developers have
a smooth transition to git from CVS
2) Describing typical ways to use git in biopython. This is very
important to clarify how we are going to use cool features of
git/github in biopython. I'm not advocating here to write it very
precisely and I'm fully aware that it's going to change over time as
we learn to use things better, but writing things up will help us
understand how we want to use git/github.
3) General contributing guide with coding style and testing framework etc.

I think that point 3 is quite well separated from the other two
points, which are more git related. I think it is also nicely handled
by the current wiki page:
http://biopython.org/wiki/Contributing. It might be mildly adapted to
include some info on git branches, but these will be minor things.

Points 1 and 2 are not so easily separable, but I don't think it's a
major problem. The current version of
http://biopython.org/wiki/GitMigration
touches upon them, but it is meant as temporary info, so it does
not describe how things should be done after we really make the
switch. I think we need to separate these issues (temporary
arrangements vs. final desired procedures), so I made a new wiki page:
 http://biopython.org/wiki/GitUsage
which is meant as an early draft of such guidelines. This page is
meant to serve as a technical tutorial describing typical tasks in
biopython development.

Please feel free to modify/expand this page and/or send comments to
the mailing list.
I've tried to keep it close to our current development model, but
there is a lot of room for discussion and I'm very open to new ideas.

cheers
  Bartek

From lpritc at scri.ac.uk  Thu Mar 26 07:21:26 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 26 Mar 2009 11:21:26 +0000
Subject: [Biopython-dev] Biopython on Twitter
Message-ID: <C5F115B6.1FA1D%lpritc@scri.ac.uk>

Hi all,

There's a fair old bit of chatter on the latest bandwagon: Twitter, about
Biopython 
(http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython).
Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it
might be useful to have a Biopython Twitter account as a way of getting news
out automatically (there's a python-twitter API:
http://code.google.com/p/python-twitter/), and as a way of facilitating
conversation or community around Biopython - suitable representatives of the
official edifice/holders of the password no doubt to be discussed ;)

Anyhoo, to avoid it being squatted in the interim, I've set up an account in
Biopython's name, with Peter's email account (thanks, Peter) - he also knows
the password.  

If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of
Gopher and OS/2 Warp in short order, it can just die on the vine - but given
the number of tweets mentioning Biopython, it would be a shame for that to
happen too soon ;)

The Biopython Twitter home page is at http://twitter.com/Biopython

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



From tiagoantao at gmail.com  Thu Mar 26 08:13:20 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 26 Mar 2009 12:13:20 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903260513v734b5dd8kd8d148bebec9674b@mail.gmail.com>

Hi,

On Wed, Mar 25, 2009 at 9:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> The solution to this is documentation on the wiki. We rely too much on
> the mailing list and expect people to keep up. Peter read my mind on
> this:

I fully agree on this. There is lots of implicit policy that is either
not documented at all or only to be read here on the mailing list. All
should be on the wiki. Clear, transparent, explicit, for everybody to
see (at least that is my personal opinion).


> We are serious about moving to Git and need to have the documentation in
> place so others can learn it. You wrote up a lot of good stuff, and it
> will be lost on the mailing list.

I am planning on changing http://biopython.org/wiki/PopGen_dev and
"GITify" it completely. I will draft a document with a policy for
updates (just as a starting point, please feel free to disagree), the
currently existing branches and so on.

I will include a set of tips on how to pull stuff from GIT, regarding
this part I note:
a. maybe this can be moved, in the future, to the general biopython documentation
b. I am far from being a git specialist. Corrections will surely be
needed and encouraged.

I will write back here when the changes are done.

Tiago

From jblanca at btc.upv.es  Thu Mar 26 08:24:59 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Thu, 26 Mar 2009 13:24:59 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com>
Message-ID: <200903261324.59655.jblanca@btc.upv.es>

First of all, sorry for sending the last mail to the BioPython general list.

On Thursday 26 March 2009 13:05:25 Peter wrote:
> Can you give me an example of where you want to pull out a single
> character from a SeqRecord, and its quality?  I would consider things
> like this quite elegant:
>
> for letter, quality in zip(record.seq,
>         record.letter_annotations["phred_quality"]):
>     # do stuff
I'm implementing a Contig class similar to the Alignment class, but with the
added capability of supporting sequences that do not start and end at the
same position, and with the capability of masking the sequences.
I'm implementing the __getitem__ method.
When a column is requested, I index every sequence with an int and return the
result of adding all the items together. I could solve the problem as you
suggest. The problem is that this Contig class can also work with Seqs and
strs (to simplify its use when we don't need a full SeqRecord). If SeqRecord
behaved more like a Seq or a str, I wouldn't need to check for the special
SeqRecord case in the Contig.__getitem__ method.
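
To make the column access concrete, here is a minimal sketch in plain
Python. The Contig class below, its (offset, sequence) layout and the sample
data are assumptions made for illustration only; it is not the actual
implementation under discussion. It relies only on seq[i] returning a single
letter, which holds for str and Seq objects.

```python
class Contig:
    """Hypothetical minimal contig: sequences placed at different offsets."""

    def __init__(self, placed):
        # placed: iterable of (start_offset, sequence) pairs
        self.placed = list(placed)

    def __getitem__(self, column):
        """Return the letters found at an integer column, joined as a str."""
        letters = []
        for start, seq in self.placed:
            local = column - start  # position inside this sequence
            if 0 <= local < len(seq):  # skip sequences not covering the column
                letters.append(str(seq[local]))
        return "".join(letters)


contig = Contig([(0, "ACGT"), (2, "GTTA")])
column = contig[2]  # both sequences cover column 2
```

Because every item is converted with str(), no special case is needed as long
as integer indexing yields one letter for each supported sequence type.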
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From chapmanb at 50mail.com  Thu Mar 26 08:57:07 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 26 Mar 2009 08:57:07 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
	<8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
Message-ID: <20090326125707.GE21577@sobchak.mgh.harvard.edu>

Hi all;

Bartek:
> Continuing on that topic. I think there are three (more or less
> separate) issues here:
> 1) Describing git usage technically, to make sure all developers have
> a smooth transition to git from CVS
> 2) Describing typical ways to use git in biopython. 
[...]
> 3) General contributing guide with coding style and testing framework etc.
> 
> I think that point 3 is quite well separated from the other two
> points, which are more git related. I think it is also nicely handled
> by the current wiki page: http://biopython.org/wiki/Contributing. 
[...]
> Points 1 and 2 are not so easily separable, but I don't think it's a
> major problem. Current version of the
> http://biopython.org/wiki/GitMigration
>  touches upon them, but it is meant as temporary info, so it does
> not describe how things should be done after we really make the
> switch. I think we need to separate these issues (temporary
> arrangements vs. final desired procedures), so I made a new wiki page:
>  http://biopython.org/wiki/GitUsage
> which is meant as an early draft of such guidelines. This page is
> meant to serve as a technical tutorial describing typical tasks in
> biopython development.

Great writeup, and I agree with you on everything up until the last
point. Why do we need two pages with overlapping information? This
means we have to do more work to keep them in sync and creates confusion.
GitMigration is/was our documentation page. If it is the name that
makes it seem temporary, we should kill GitMigration and re-route all
wiki links to GitUsage. Then we can continue forward with getting
the documentation up to par on GitUsage.

Having the disclaimer that the page and migration is in process is
enough of a warning. When we move to git permanently, we can just
remove the warnings, update the final links and we will be done.

Brad

From tiagoantao at gmail.com  Thu Mar 26 09:09:31 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 26 Mar 2009 13:09:31 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
	<8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
	<20090326125707.GE21577@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903260609q247ad2b0o4c810fa7afda7449@mail.gmail.com>

I've added some text regarding git on
http://biopython.org/wiki/PopGen_dev
(see "Code and Contributing" and "Existing Development branches").
Feel free to criticise. I've included a link to the wonderful GitUsage page.
Giovanni: if you feel that I've deleted/changed something I should not
have, please say.


-- 
 "A man who dares to waste one hour of time has not discovered the
value of life" - Charles Darwin


From bartek at rezolwenta.eu.org  Thu Mar 26 10:49:54 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Thu, 26 Mar 2009 15:49:54 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
	<8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
	<20090326125707.GE21577@sobchak.mgh.harvard.edu>
Message-ID: <8b34ec180903260749q2b59594fo1d34cd1f721ff3b7@mail.gmail.com>

Hi,

On Thu, Mar 26, 2009 at 1:57 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Great writeup, and I agree with you on everything up until the last
> point. Why do we need two pages with overlapping information? This
> means we have to do more work to keep them in sync and creates confusion.
> GitMigration is/was our documentation page. If it is the name that
> makes it seem temporary, we should kill GitMigration and re-route all
> wiki links to GitUsage. Then we can continue forward with getting
> the documentation up to par on GitUsage.
>
> Having the disclaimer that the page and migration is in process is
> enough of a warning. When we move to git permanently, we can just
> remove the warnings, update the final links and we will be done.
>

I agree that two pages with mostly the same stuff is too much. My
original idea was to first extract the "non-temporary" info from
the GitMigration page and expand it into the GitUsage page. It needs a
lot of work but at least the extraction part is done. Now I would
suggest not to kill the GitMigration, but to remove most things from
it and just leave the stuff relevant for the (hopefully not too long)
transitional period.

After a second of thought I decided to go ahead and change the
GitMigration so that it does not overlap with GitUsage. See for
yourself here:
http://biopython.org/wiki/GitMigration

We can revert the changes if people don't like it.

cheers
  Bartek

From biopython at maubp.freeserve.co.uk  Thu Mar 26 11:07:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Mar 2009 15:07:33 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <200903261324.59655.jblanca@btc.upv.es>
References: <200903261248.02279.jblanca@btc.upv.es>
	<320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com>
	<200903261324.59655.jblanca@btc.upv.es>
Message-ID: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>

On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 13:05:25 Peter wrote:
>> Can you give me an example of where you want to pull out a single
>> character from a SeqRecord, and its quality?  I would consider things
>> like this quite elegant:
>>
>> for letter, quality in zip(record.seq,
>>         record.letter_annotations["phred_quality"]):
>>     # do stuff
>
> I'm implementing a Contig class similar to the Alignment class but with the
> added capability of supporting sequences that do not start and end at the
> same position and with the capability of masking the sequences.
> I'm implementing the __getitem__ method.
> When I request a column I get for all sequences a int slice and I return the
> result of adding them all. I could solve the problem as you suggest. The
> problem is that this Contig class can work also with Seqs and strs (to
> simplify its use when we don't need a full SeqRecord). If SeqRecord behaves
> more like a Seq or a str I wouldn't need to check for the special SeqRecord
> case in the Contig.__getitem__ method.
> Best regards,

If you pull out a column from a Seq or string based alignment, there is no
annotation to worry about, and you can return the column as a Seq or string.
As things stand, if it was a SeqRecord based alignment, having my_string[i],
my_seq[i] and my_seqrecord[i] all return a single letter string is actually
rather nice for generic code - as long as you are happy returning a Seq or a
string for the column.

However, if I understand you, when pulling a column from a SeqRecord
based alignment in addition to the column's sequence you'd like to get the
per-letter-annotations as well.  This assumes that all the SeqRecord objects
in the alignment have the same per-letter-annotation present - some might
have quality and others might not!  But how would you want to store this
new column object?  Using a string or a Seq doesn't support any annotation
 - you *could* use a SeqRecord with no id, name, description, features,
annotation - just a sequence and any common per-letter-annotation.  Is this
what you had in mind?
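
A rough sketch of that idea, in plain Python. MiniRecord is a hypothetical
stand-in for a SeqRecord (only a sequence and per-letter annotations), and
the column() helper and sample data are assumptions for illustration, not
Biopython code. Only annotation keys present in every record are kept.

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    """Hypothetical stand-in: a sequence plus per-letter annotations."""
    seq: str
    letter_annotations: dict = field(default_factory=dict)


def column(records, index):
    """Build a column record from position `index` of each input record."""
    # Only per-letter annotations shared by all records can be carried over.
    common = set(records[0].letter_annotations)
    for rec in records[1:]:
        common &= set(rec.letter_annotations)
    return MiniRecord(
        seq="".join(rec.seq[index] for rec in records),
        letter_annotations={
            key: [rec.letter_annotations[key][index] for rec in records]
            for key in sorted(common)
        },
    )


reads = [
    MiniRecord("ACGT", {"phred_quality": [30, 31, 32, 33]}),
    MiniRecord("ACTT", {"phred_quality": [20, 21, 22, 23], "mask": [0, 0, 1, 1]}),
]
col = column(reads, 2)  # "mask" is dropped: only one read has it
```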

Peter


From jblanca at btc.upv.es  Thu Mar 26 11:14:13 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Thu, 26 Mar 2009 16:14:13 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261324.59655.jblanca@btc.upv.es>
	<320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>
Message-ID: <200903261614.13454.jblanca@btc.upv.es>

On Thursday 26 March 2009 16:07:33 Peter wrote:
> On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> > On Thursday 26 March 2009 13:05:25 Peter wrote:

> However, if I understand you, when pulling a column from a SeqRecord
> based alignment in addition to the column's sequence you'd like the get the
> per-letter-annotations as well.  This assumes that all the SeqRecord
> objects in the alignment have the same per-letter-annotation present - some
> might have quality and others might not!  But how would you want to store
> this new column object?  Using a string or a Seq doesn't support any
> annotation - you *could* use a SeqRecord with no id, name, description,
> features, annotation - just a sequence and any common
> per-letter-annotation.  Is this what you had in mind?
Yes, that's exactly what I have in mind. Do you see any problem with that 
approach?

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From biopython at maubp.freeserve.co.uk  Thu Mar 26 11:32:23 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Mar 2009 15:32:23 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <200903261614.13454.jblanca@btc.upv.es>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261324.59655.jblanca@btc.upv.es>
	<320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>
	<200903261614.13454.jblanca@btc.upv.es>
Message-ID: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>

On Thu, Mar 26, 2009 at 3:14 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 16:07:33 Peter wrote:
>> However, if I understand you, when pulling a column from a SeqRecord
>> based alignment in addition to the column's sequence you'd like to get the
>> per-letter-annotations as well.  This assumes that all the SeqRecord
>> objects in the alignment have the same per-letter-annotation present - some
>> might have quality and others might not!  But how would you want to store
>> this new column object?  Using a string or a Seq doesn't support any
>> annotation - you *could* use a SeqRecord with no id, name, description,
>> features, annotation - just a sequence and any common
>> per-letter-annotation.  Is this what you had in mind?
>
> Yes, that's exactly what I have in mind. Do you see any problem with that
> approach?

Well yes.  For your code to work on SeqRecord objects (based on the
verbal description earlier), it needs at least the following changes
to the SeqRecord:

The SeqRecord __getitem__ would have to return a SeqRecord when given
a single integer index, holding a single letter sequence.  What about
the name/id/description and annotations (e.g. organism) - do they
really apply to a single letter from the sequence?  Technically
writing the code to offer this isn't such a problem, but I am
unconvinced this is the best behaviour for normal usage.

Also closely related to this, what would you expect __iter__ to
iterate over?  Currently it acts like iteration over the record's
sequence.

You'd also want the SeqRecord to support __add__ (and __radd__) so
that two SeqRecord objects can be added together.  I have thought
about this before, and it is a *much* more complicated issue due to
the meta data.  In general the only safe and unambiguous choice is to
exclude it from the combined record:
* sequence - just add (using normal rules for adding Seq objects)
* name/id/description - if the two agree, use that?  Otherwise default
to a blank value?
* annotations - for each keyed value, you could combine the entries?
Or just throw them all away?
* letter_annotations - if an entry is present in both you can combine
it.  Otherwise throw them away?
* features - these could be combined, adjusting the locations for one
record's features as appropriate
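
One possible reading of the rules above, sketched in plain Python. MiniRecord
is a hypothetical stand-in, not the SeqRecord class, and representing features
as (start, end, label) tuples is an assumption for illustration. The id is
kept only if both records agree, per-letter annotations are kept only when
present in both, and the right-hand record's features are shifted.

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    """Hypothetical stand-in for a record with a few kinds of metadata."""
    seq: str
    id: str = ""
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)  # (start, end, label) tuples

    def __add__(self, other):
        shift = len(self.seq)  # offset for the right-hand record's features
        return MiniRecord(
            seq=self.seq + other.seq,
            # Keep the id only when the two records agree; otherwise blank.
            id=self.id if self.id == other.id else "",
            # Combine per-letter annotations present in both records.
            letter_annotations={
                key: self.letter_annotations[key] + other.letter_annotations[key]
                for key in self.letter_annotations
                if key in other.letter_annotations
            },
            features=self.features
            + [(s + shift, e + shift, label) for s, e, label in other.features],
        )


left = MiniRecord("ACG", "geneX", {"phred_quality": [30, 31, 32]}, [(0, 3, "exon1")])
right = MiniRecord("TT", "geneY", {"phred_quality": [20, 21], "mask": [0, 1]},
                   [(0, 2, "exon2")])
joined = left + right
```

The free-form annotations dictionary is left out of the sketch entirely,
which corresponds to the "throw them all away" option.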

I'm not ruling out adding SeqRecord addition, but I don't want to rush
it while we are trying to get Biopython 1.50 done.

Peter


From biopython at maubp.freeserve.co.uk  Thu Mar 26 11:49:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Mar 2009 15:49:49 +0000
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <C5F115B6.1FA1D%lpritc@scri.ac.uk>
References: <C5F115B6.1FA1D%lpritc@scri.ac.uk>
Message-ID: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>

On Thu, Mar 26, 2009 at 11:21 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> Hi all,
>
> There's a fair old bit of chatter on the latest bandwagon: Twitter, about
> Biopython
> (http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython).
> Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it
> might be useful to have a Biopython Twitter account as a way of getting news
> out automatically (there's a python-twitter API:
> http://code.google.com/p/python-twitter/), and as a way of facilitating
> conversation or community around Biopython - suitable representatives of the
> official edifice/holders of the password no doubt to be discussed ;)
>
> Anyhoo, to avoid it being squatted in the interim, I've set up an account in
> Biopython's name, with Peter's email account (thanks, Peter) - he also knows
> the password.
>
> If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of
> Gopher and OS/2 Warp in short order, it can just die on the vine - but given
> the number of tweets mentioning Biopython, it would be a shame for that to
> happen too soon ;)
>
> The Biopython Twitter home page is at http://twitter.com/Biopython

Quite a few people have started following this already - which is fun.  I see
the OBF news page entries are automatically pushed to their twitter account,
http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
to http://twitter.com/bioperl - I'll get in touch to see how they did it so
we can have the Biopython news feed automatically echoed to twitter as well.

This serves as a good point to remind/inform you that there are RSS, Atom
etc feeds for the Biopython news - links on http://biopython.org/wiki/News

e.g.
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2
http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom

We could probably also echo the CVS (or git) RSS feed into twitter, but I
suspect that would drown out any more interesting tweets.  The RSS feed
is listed on http://biopython.org/wiki/CVS and shown on the wiki too at:
http://biopython.org/wiki/Tracking_CVS_commits (not sure how often this
gets updated).  The feed itself is here:
http://biopython.open-bio.org/CVS2RSS/biopython.rss

Peter

From lpritc at scri.ac.uk  Thu Mar 26 12:31:07 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 26 Mar 2009 16:31:07 +0000
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
Message-ID: <C5F15E4B.1FA8B%lpritc@scri.ac.uk>

Hi all,

It's great to see that people have picked up on the Biopython Twitter
account already - I hope that it proves useful in the longer term.

Regarding the social etiquette of Twitter, and the ease with which
'following' can be taken to imply 'approval', I wonder if it would be a good
policy to restrict the Twitter accounts that Biopython follows only to those
representing organisations or groups.  Following some individuals and not
others might be seen to privilege a self-selecting group, cabal or 'elite',
even the accidental suggestion of which I think would be best avoided.

On 26/03/2009 15:49, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
> 
> Quite a few people have started following this already - which is fun.  I see
> the OBF news page entries are automatically pushed to their twitter account,
> http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
> to http://twitter.com/bioperl - I'll get in touch to see how they did
> it so we can

[...]

> We could probably also echo the CVS (or git) RSS feed into twitter, but I
> suspect that would drown out any more interesting tweets.

Signal to noise is apparently not an issue that bothers very many Tweeters,
but I see no harm in starting a trend ;)

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



From jblanca at btc.upv.es  Fri Mar 27 04:22:27 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 27 Mar 2009 09:22:27 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
Message-ID: <200903270922.27152.jblanca@btc.upv.es>

On Thursday 26 March 2009 16:32:23 Peter wrote:

> The SeqRecord __getitem__ would have to return a SeqRecord when given
> a single integer index, holding a single letter sequence.  What about
> the name/id/description and annotations (e.g. organism) - do they
> really apply to a single letter from the sequence?  Technically
> writing the code to offer this isn't such a problem, but I am
> unconvinced this is the best behaviour for normal usage.
You're right, I was not thinking about the rest of the properties because I don't 
need them. They're a problem when slicing and adding SeqRecords. But they're 
also a problem in standard slicing. Should the annotations be kept when the 
SeqRecord is sliced? Are they still relevant? None of the behaviours will be 
ok for all the cases.

> Also closely related to this, what would you expect __iter__ to
> iterate over?  Currently it acts like iteration over the record's
> sequence.
The SeqRecord can already hold a sequence of length one, so we have the same 
problem. In fact I could do seq_rec[n:n+1] and I would obtain the SeqRecord 
that I want. 

> You'd also want the SeqRecord to support __add__ (and __radd__) so
> that two SeqRecord objects can be added together.  I have thought
> about this before, and it is a *much* more complicated issue due to
> the meta data.  In general the only safe and unambiguous choice is to
> exclude it from the combined record:
> * sequence - just add (using normal rules for adding Seq objects)
> * name/id/description - if the two agree, use that?  Otherwise default
> to a blank value?
> * annotations - for each keyed value, you could combine the entries?
> Or just throwing them all away?
> * letter_annotations - if an entry is present in both you can combine
> it.  Otherwise throw them away?
> * features - these could be combined, adjusting the locations for one
> record's features as appropriate
As I said before, I think that the same problem arises when you take a
slice. If I have the sequence of a gene named X with some annotations and I
slice a part, is it still named geneX? Should the annotations be kept?

> I'm not ruling out adding SeqRecord addition, but I don't want to rush
> it while we are trying to get Biopython 1.50 done.
That's quite sensible. I think it is a good thing to discuss all these
issues, I keep learning a lot from you.
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From biopython at maubp.freeserve.co.uk  Fri Mar 27 06:29:10 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 10:29:10 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <200903270922.27152.jblanca@btc.upv.es>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
Message-ID: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>

On Fri, Mar 27, 2009 at 8:22 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 16:32:23 Peter wrote:
>
>> You'd also want the SeqRecord to support __add__ (and __radd__) so
>> that two SeqRecord objects can be added together.  I have thought
>> about this before, and it is a *much* more complicated issue due to
>> the meta data.  In general the only safe and unambiguous choice is to
>> exclude it from the combined record:
>> * sequence - just add (using normal rules for adding Seq objects)
>> * name/id/description - if the two agree, use that?  Otherwise default
>> to a blank value?
>> * annotations - for each keyed value, you could combine the entries?
>> Or just throw them all away?
>> * letter_annotations - if an entry is present in both you can combine
>> it.  Otherwise throw them away?
>> * features - these could be combined, adjusting the locations for one
>> record's features as appropriate
>
> As I said before I think that the same problem is presented when you do a
> slice. If I have the sequence of a gene named X with some annotations and I
> slice a part, is still be named geneX? Should the annotations be kept?

The problems about the annotation when slicing a SeqRecord are similar, but
I think things are worse when adding two SeqRecords together.

For slicing, there are a few sub-cases:
- per-letter-annotation can be sliced too - easy.
- features - we retain only features fully inside the new sub-sequence (the
  border line features which cross the slice boundary are a small problem -
  excluding them is the simplest solution to code and explain).
- id/name - debatable.  Currently kept.
- description - debatable.  Consider a description which says "whole genome",
  that doesn't really apply to a partial sequence.  On the other hand, it may.
  Currently kept for the sub-record.
- annotations - again debatable.    Without context information, we can't guess.
  The only sensible options are keep it all (as in CVS) or none of it.
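
The slicing behaviour listed above can be sketched in plain Python.
MiniRecord and its sliced() method are hypothetical stand-ins for
illustration, not the SeqRecord implementation; features are assumed to be
(start, end, label) tuples with an exclusive end, and shifting the kept
features to the new coordinates is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    """Hypothetical stand-in for a record with metadata."""
    seq: str
    id: str = ""
    description: str = ""
    annotations: dict = field(default_factory=dict)
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)  # (start, end, label) tuples

    def sliced(self, start, end):
        """Return the sub-record for seq[start:end] (0 <= start <= end)."""
        return MiniRecord(
            seq=self.seq[start:end],
            id=self.id,                          # kept
            description=self.description,        # kept, although debatable
            annotations=dict(self.annotations),  # kept as a whole
            # Per-letter annotation is simply sliced along with the sequence.
            letter_annotations={
                key: values[start:end]
                for key, values in self.letter_annotations.items()
            },
            # Keep only features fully inside the slice, in new coordinates.
            features=[
                (s - start, e - start, label)
                for s, e, label in self.features
                if start <= s and e <= end
            ],
        )


rec = MiniRecord(
    "ACGTACGTAC", "geneX", "whole genome", {"organism": "example"},
    {"phred_quality": list(range(10))},
    [(2, 5, "domain"), (4, 9, "crossing")],
)
sub = rec.sliced(1, 6)  # the "crossing" feature spans the boundary
```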

I think it is worth keeping the id/name in general (consider typical use cases
like cropping a domain from a gene, or cropping columns off an alignment).
I would be OK with dropping the contents of the annotations dictionary and
description in order to avoid ambiguity, but this would prevent certain tasks.

Peter


From sbassi at clubdelarazon.org  Fri Mar 27 09:31:01 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Fri, 27 Mar 2009 10:31:01 -0300
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
Message-ID: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>

On Fri, Mar 27, 2009 at 7:29 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
....
> - id/name - debatable.  Currently kept.
> - description - debatable.  Consider a description which says "whole genome",
>  that doesn't really apply to a partial sequence.  On the other hand, it may.
>  Currently kept for the sub-record.

I think it is up to the user to keep the id/name/description fields
updated when slicing a sequence.

.....
> I would be OK with dropping the contents of the annotations dictionary and
> description is order to avoid ambiguity, but this would prevent certain tasks.

Another option is to make this behavior optional (I mean, select to
keep or to drop the annotations, but by default I would drop them).

From biopython at maubp.freeserve.co.uk  Fri Mar 27 09:57:30 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 13:57:30 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
Message-ID: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>

On Fri, Mar 27, 2009 at 1:31 PM, Sebastian Bassi wrote:
> I think is up to the user to keep updated the id/name/descripption
> field when slicing a sequence.

If you make a new SeqRecord by first slicing a Seq object (which is
how you have to do it with Biopython 1.49 or older), then dealing
with ALL the annotation is explicitly in the hands of the user.

Or are you saying when slicing a SeqRecord you wouldn't expect
the id/name/description to be preserved for the sub-record?

> .....
>> I would be OK with dropping the contents of the annotations
>> dictionary and description in order to avoid ambiguity, but
>> this would prevent certain tasks.
>
> Another option is to make this behavior optional (I mean, select to
> keep or to drop the annotations, but by default I would drop them).

How would you make it optional?  As an extra non-standard argument
to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
That seems nasty.

I am sympathetic to dropping the annotations dictionary when creating
a "child" SeqRecord when slicing its parent.  There is also the database
cross reference list (which I forgot in my last email).  Again, I wouldn't
object to dropping this for a sliced sub-record.

If we did drop the annotations and dbxrefs when slicing, the user can
manually choose to explicitly copy them from the parent object if they
do want them.

Peter

From jblanca at btc.upv.es  Fri Mar 27 10:02:57 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 27 Mar 2009 15:02:57 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
Message-ID: <200903271502.57872.jblanca@btc.upv.es>

On Friday 27 March 2009 14:57:30 Peter wrote:

> How would you make it optional?  As an extra non-standard argument
> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
> That seems nasty.
That's very nasty, not pythonic, and adds complexity to the api.

> I am sympathetic to dropping the annotations dictionary when creating
> a "child" SeqRecord when slicing its parent.  There is also the database
> cross reference list (which I forgot in my last email).  Again, I wouldn't
> object to dropping this for a sliced sub-record.
>
> If we did drop the annotations and dbxrefs when slicing, the user can
> manually choose to explicitly copy them from the parent object if they
> do want them.
I also think that dropping all that stuff when slicing or adding is the best 
behaviour.

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From sbassi at clubdelarazon.org  Fri Mar 27 10:17:55 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Fri, 27 Mar 2009 11:17:55 -0300
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
Message-ID: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>

On Fri, Mar 27, 2009 at 10:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> How would you make it optional?  As an extra non-standard argument
> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
> That seems nasty.

Yes it is nasty this way, I never meant to do it in __getitem__.
Anyway I can't think of a nice and intuitive way to do it.

> If we did drop the annotations and dbxrefs when slicing, the user can
> manually choose to explicitly copy them from the parent object if they
> do want them.

Yes, that is OK.

From biopython at maubp.freeserve.co.uk  Fri Mar 27 10:24:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 14:24:13 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
	<9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
Message-ID: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>

On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On Fri, Mar 27, 2009 at 10:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> How would you make it optional?  As an extra non-standard argument
>> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
>> That seems nasty.
>
> Yes it is nasty this way, I never meant to do it in __getitem__.
> Anyway I can't think of a nice and intuitive way to do it.

Me neither right now.

>> If we did drop the annotations and dbxrefs when slicing, the user can
>> manually choose to explicitly copy them from the parent object if they
>> do want them.
>
> Yes, that is OK.

Jose agrees, so that makes a mini consensus (at least amongst everyone who
has tried the CVS code and posted to this thread).  I've made that change
in CVS, see Bio/SeqRecord.py revision 1.31.
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython

As I said before, I want to preserve the id and name - preserving these
would be key for cross referencing the sub-record back to its parent.

Do either of you think we should also discard the description?

Peter


From eric.talevich at gmail.com  Fri Mar 27 11:16:19 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 27 Mar 2009 11:16:19 -0400
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
	<9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
	<320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>
Message-ID: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com>

On Fri, Mar 27, 2009 at 10:24 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi
> <sbassi at clubdelarazon.org> wrote:
> > On Fri, Mar 27, 2009 at 10:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> >> How would you make it optional?  As an extra non-standard argument
> >> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
> >> That seems nasty.
> >
> > Yes it is nasty this way, I never meant to do it in __getitem__.
> > Anyway I can't think of a nice and intuitive way to do it.
>
> Me neither right now.
>
> >> If we did drop the annotations and dbxrefs when slicing, the user can
> >> manually choose to explicitly copy them from the parent object if they
> >> do want them.
> >
> > Yes, that is OK.
>
>
One way to allow non-default options for adding and slicing is to provide a
couple of functions at the class or module level (classmethod, staticmethod,
plain ol' function) that have the necessary keyword arguments. These
functions would do the same thing by default as the corresponding syntax,
and the syntax-friendly magic methods would just pass their arguments
straight to these functions. This makes the syntax pretty for the common
cases, and makes the nonstandard stuff visually obvious.

Examples:

my_record.slice(10, 50) == my_record[10:50]
my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated
annotations

my_record.add(other_record) == my_record + other_record
my_record.add(other_record, annotation=True) == my_record + other_record,
keeping annotations
my_record.slice(10, 50, annotation=True).add(
    my_record.slice(100, 200, annotation=True),
    annotation=True) == my_record[10:50] + my_record[100:200], keeping all
annotations (a pain otherwise)
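
A minimal sketch of the delegation idea above, using a hypothetical
MiniRecord class rather than Biopython's SeqRecord.  The slice() and add()
method names and the annotation keyword follow the examples above;
the class itself and everything else in it are assumptions made purely
for illustration.

```python
class MiniRecord(object):
    """Hypothetical stand-in for SeqRecord, for illustration only."""

    def __init__(self, seq, id="<unknown id>", annotations=None):
        self.seq = seq
        self.id = id
        self.annotations = dict(annotations or {})  # always a fresh dict

    def slice(self, start, stop, annotation=False):
        """Return a sub-record; annotations are copied only on request."""
        kept = self.annotations if annotation else {}
        return MiniRecord(self.seq[start:stop], self.id, kept)

    def add(self, other, annotation=False):
        """Join two records; this record's annotations are kept on request."""
        kept = self.annotations if annotation else {}
        return MiniRecord(self.seq + other.seq, self.id, kept)

    def __getitem__(self, index):
        # Inside a method body the bare name "slice" is still the builtin
        # type, because class attributes are not visible as bare names here.
        if isinstance(index, slice):
            if index.step not in (None, 1):
                raise ValueError("stepped slices are not handled in this sketch")
            return self.slice(index.start, index.stop)
        return self.seq[index]

    def __add__(self, other):
        # The operator syntax simply forwards to add() with its defaults.
        return self.add(other)


record = MiniRecord("ACGTACGTAC", id="demo", annotations={"source": "example"})
plain = record[2:6]                               # annotations dropped
annotated = record.slice(2, 6, annotation=True)   # annotations kept
joined = record.slice(0, 4, annotation=True).add(
    record.slice(6, 10, annotation=True), annotation=True)
```

The common case keeps the ordinary syntax, and the explicit method call
makes any non-default behaviour visible at the call site.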

From biopython at maubp.freeserve.co.uk  Fri Mar 27 11:51:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 15:51:53 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
	<9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
	<320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>
	<3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com>
Message-ID: <320fb6e00903270851i47db9121p6d272b5f7095a5d3@mail.gmail.com>

On Fri, Mar 27, 2009 at 3:16 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> One way to allow non-default options for adding and slicing is to provide a
> couple of functions at the class or module level (classmethod, staticmethod,
> plain ol' function) that have the necessary keyword arguments. These
> functions would do the same thing by default as the corresponding syntax,
> and the syntax-friendly magic methods would just pass their arguments
> straight to these functions. This makes the syntax pretty for the common
> cases, and makes the nonstandard stuff visually obvious.
>
> Examples:
>
> my_record.slice(10, 50) == my_record[10:50]
> my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated
> annotations
> ...

I think I understand your idea, but I'm not very keen on adding slice
and add methods as alternatives to __getitem__ and __add__.

As things stand (with CVS after the change an hour ago), if you want
the annotations dictionary copied with a slice you must do this
explicitly:

>>> from Bio import SeqIO
>>> my_record = SeqIO.read(open("NC_005816.gb"),"genbank")
>>> my_record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1,
complete sequence.', dbxrefs=['Project:10638'])
>>> len(my_record)
9609
>>> len(my_record.features)
29
>>> len(my_record.annotations)
11
>>> len(my_record.dbxrefs)
1

Doing a slice will not copy/preserve the annotations dict or dbxrefs list:

>>> sub_record = my_record[1000:2000]
>>> sub_record
SeqRecord(seq=Seq('GAAAAAAGAGTATGACGTGCATCTTGATGAAAATCTGGTGAACTTCGACAAACA...GGA',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1,
complete sequence.', dbxrefs=[])
>>> len(sub_record)
1000
>>> len(sub_record.features)
2
>>> assert not sub_record.annotations and not sub_record.dbxrefs

You can then choose to blindly reuse the annotations and dbxrefs if you want to:

>>> sub_record.annotations = my_record.annotations #shares the dict
>>> sub_record.dbxrefs = my_record.dbxrefs #shares the list

or as a simple copy:

>>> sub_record.annotations = my_record.annotations.copy()
>>> sub_record.dbxrefs = my_record.dbxrefs[:]
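
The difference between sharing and copying can be sketched with plain
Python containers, no Biopython needed.  The annotation keys and values
below are made up for illustration; only the dbxrefs entry is taken from
the session above.  Note that dict.copy() is a shallow copy, so any lists
stored as values are still shared with the parent; copy.deepcopy() avoids
that.

```python
import copy

# Made-up annotation values; only the dbxrefs entry comes from the session above.
parent_annotations = {"topology": "circular", "keywords": ["plasmid"]}
parent_dbxrefs = ["Project:10638"]

shared = parent_annotations               # same dict object as the parent
shallow = parent_annotations.copy()       # new dict, values still shared
deep = copy.deepcopy(parent_annotations)  # fully independent copy
refs_copy = parent_dbxrefs[:]             # new list

shared["topology"] = "linear"        # changes the parent too
shallow["keywords"].append("pPCP1")  # nested list is shared, parent changes
deep["keywords"].append("private")   # parent is unaffected
refs_copy.append("extra")            # parent list is unaffected
```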

The good thing about this is it makes you think about the annotations,
and which (if any) are appropriate to transfer to the sub-record.  As
per my earlier email, maybe we should do the same with the
description?

Peter

From chapmanb at 50mail.com  Sat Mar 28 21:06:52 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sat, 28 Mar 2009 21:06:52 -0400
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <C5F15E4B.1FA8B%lpritc@scri.ac.uk>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk>
Message-ID: <20090329010652.GA914@kunkel>

Hi all;
It is great we are exploring getting news out about Biopython in
additional ways. One thing this can really help with is recognizing
contributions to Biopython. Another is pointing out interesting
discussion threads on the mailing lists and getting others involved.

Do you think it would be worthwhile to "advertise" on the main list
for someone interested in coordinating news and communication? They
could do things like:

- Send updates through twitter on day to day activities, like:

  Bartek and Tiago cleaned up documentation on Git submissions 
    (link to wiki page)
  Peter, Jose and Sebastian are discussing slicing on SeqRecords
    (link to mailing list discussion)

- Send out monthly news reports on new items in Biopython, in the
  style of Peter's update recently:
  http://news.open-bio.org/news/2009/03/biopython-next-gen-sequencing/
  (but it should also give credit to the fine people who coded it)

Perhaps there are members who are interested in Biopython and follow
what is going on but aren't coders. This would be a way to get
involved, and also take some of the burden off Peter. What do 
y'all think?

Brad
 

> 
> It's great to see that people have picked up on the Biopython Twitter
> account already - I hope that it proves useful in the longer term.
> 
> Regarding the social etiquette of Twitter, and the ease with which
> 'following' can be taken to imply 'approval' I wonder if it would be a good
> policy to restrict the Twitter accounts that Biopython follows only to those
> representing organisations or groups.  Following some individuals and not
> others might be seen to privilege a self-selecting group, cabal or 'elite',
> even the accidental suggestion of which I think would be best avoided.
> 
> On 26/03/2009 15:49, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
> > 
> > Quite a few people have started following this already - which is fun.  I see
> > the OBF news page entries are automatically pushed to their twitter account,
> > http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
> > to http://twitter.com/bioperl - I'll get in touch to see how they did
> > it so we can
> 
> [...]
> 
> > We could probably also echo the CVS (or git) RSS feed into twitter, but I
> > suspect that would drown out any more interesting tweets.
> 
> Signal to noise is apparently not an issue that bothers very many Tweeters,
> but I see no harm in starting a trend ;)
> 
> L.
> 
> -- 
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
> 
> 
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From biopython at maubp.freeserve.co.uk  Sun Mar 29 18:58:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Mar 2009 23:58:47 +0100
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <20090329010652.GA914@kunkel>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk> <20090329010652.GA914@kunkel>
Message-ID: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>

On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> It is great we are exploring getting news out about Biopython in
> additional ways. One thing this can really help with is recognizing
> contributions to Biopython. Another is pointing out interesting
> discussion threads on the mailing lists and getting others involved.

Do you think the recent release notes and NEWS file entries have been
a bit too impersonal?  We can certainly be a bit more explicit if people
think that is a good thing. For example, should we mention Bartek by name
in the paragraph on the new Bio.Motif module?

This is linked to from the wiki's news page BTW:
http://biopython.open-bio.org/SRC/biopython/NEWS
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython

> Do you think it would be worthwhile to "advertise" on the main list
> for someone interested in coordinating news and communication?
> ... Perhaps there are members who are interested in Biopython and
> follow what is going on but aren't coders. This would be a way to
> get involved, ...

Are you up for the job yourself Brad?  From your own blog we know
you can and do write regularly anyway ;)   Would you like an account
on the OBF news server? Email me off list and we can sort that out.

In terms of micro-blogging via twitter, you sound like you have a better
feel for this than me - I don't even have a personal twitter account.

Monthly news posts (perhaps cc'd to the announcement email list)
would be a nice idea - especially if we can encourage more lurkers
to speak up.  For a while BioPerl had something like this going
(digest emails or something), but it needs a pretty dedicated person
or team.  In the meantime as you've noticed I've started making
more use of our news facility myself...

Peter

From biopython at maubp.freeserve.co.uk  Mon Mar 30 06:26:09 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:26:09 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
Message-ID: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>

Hi all,

NumPy 1.3 is about to be released, so we should try and make sure the
forthcoming Biopython 1.50 release works with it.  Of particular interest,
this will be the first version of NumPy to support Python 2.6 on Windows,
so we will hopefully be able to include a Python 2.6 Windows installer
for Biopython 1.50 :)

There is a release candidate out for NumPy 1.3, but so far no Windows
installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3
beta release instead.  The good news is everything seems to compile with
MinGW, but unfortunately test_Cluster.py is failing on the second line of
Bio/Cluster/__init__.py, "from cluster import *".  This could be a hiccup
with NumPy itself - I am using their beta after all, or perhaps they have
changed something.

To try and narrow down the problem, has anyone else tried NumPy 1.3 (beta or
release candidate) with the latest Biopython from CVS (on any platform)?

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Mon Mar 30 06:29:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:29:02 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
Message-ID: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:26 AM, Peter wrote:
> Hi all,
>
> NumPy 1.3 is about to be released, so we should try and make sure the
> forthcoming Biopython 1.50 release works with it.  Of particular interest,
> this will be the first version of NumPy to support Python 2.6 on Windows,
> so we will hopefully be able to include a Python 2.6 Windows installer
> for Biopython 1.50 :)
>
> There is a release candidate out for NumPy 1.3, but so far no Windows
> installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3
> beta release instead.

David Cournapeau has just updated sourceforge - so I will try again with
the actual release candidate instead of just the beta...

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 06:38:58 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:38:58 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
Message-ID: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:29 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> David Cournapeau has just updated sourceforge - so I will try again with
> the actual release candidate instead of just the beta...

Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows
XP, Python 2.6 using the python.org installer, with Biopython compiled
with cygwin mingw32 as normal, same error - test_Cluster.py is failing
on the second line of Bio/Cluster/__init__.py, "from cluster import
*".

So the question stands - has anyone else tried Biopython (from CVS)
with NumPy 1.3 (beta or release candidate) on any platform?  I should
be able to check it tonight on a Linux machine myself without too much
trouble... but a few more data points wouldn't hurt ;)

Peter

From biopython at maubp.freeserve.co.uk  Mon Mar 30 07:15:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 12:15:06 +0100
Subject: [Biopython-dev] test_Nexus.py and NamedTemporaryFile mode
Message-ID: <320fb6e00903300415i350610c0i4c2aeed1834011da@mail.gmail.com>

I've been running the test suite again on Windows, and was reminded of
this open issue with NamedTemporaryFile on Windows...

On Fri, Feb 13, 2009 at 5:02 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Feb 10, 2009 at 11:25 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>>> The test_Nexus tearDown used to make sure the temp output
>>> files were removed.  This is important on Windows which
>>> does not do this automatically.  I see you now allocate
>>> "random" filenames using tempfile.NamedTemporaryFile(...)
>>> so presumably we would need to record these so that the
>>> tearDown method knows what temp files to remove.
>>
>> From reading the Python documentation, the file created by
>> tempfile.NamedTemporaryFile is removed automatically
>> when the file handle is closed, even on Windows.
>
> That's good to know.  On a related point, I've just found
> test_Nexus.py is failing on Windows XP with Python 2.6 (but is fine
> with Python 2.3, 2.4 and 2.5):
>
> C:\repository\biopython\Tests>c:\python26\python test_Nexus.py
> Test Nexus module ... ERROR
> Test Tree module. ... ok
>
> ======================================================================
> ERROR: Test Nexus module
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Nexus.py", line 114, in test_NexusTest1
>     f1=tempfile.NamedTemporaryFile(mode='r+w+b')
>   File "c:\python26\lib\tempfile.py", line 445, in NamedTemporaryFile
>     file = _os.fdopen(fd, mode, bufsize)
> OSError: [Errno 22] Invalid argument
>
> ----------------------------------------------------------------------
> Ran 2 tests in 0.016s
>
> FAILED (errors=1)

You can recreate this at the python 2.6 prompt with the one line:
f1=tempfile.NamedTemporaryFile(mode='r+w+b')

I couldn't solve this from looking at the Python documentation, but
after some Google searching the answer seems to be just to use the
default mode (w+b):
f1=tempfile.NamedTemporaryFile()

This works on Windows with Python 2.3 to 2.6, and also works on Mac OS
X and Linux (only one version of Python tested here).  Fix checked
into CVS.
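
As a small illustration of the fix, the default mode can be exercised like
this.  The content written is arbitrary and chosen only for the example;
it is a sketch, not the actual test_Nexus.py code.

```python
import os
import tempfile

# The default mode is "w+b": binary, readable and writable.
handle = tempfile.NamedTemporaryFile()
path = handle.name

handle.write(b"#NEXUS\n")   # binary mode, so write bytes
handle.flush()
handle.seek(0)              # rewind before reading back
content = handle.read()

handle.close()              # the file is removed when the handle is closed
exists_after_close = os.path.exists(path)
```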

Peter


From cy at cymon.org  Mon Mar 30 07:42:00 2009
From: cy at cymon.org (Cymon Cox)
Date: Mon, 30 Mar 2009 12:42:00 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
Message-ID: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>

Hi Folks,

I've been trying to formalize a bunch of randomly scattered bits of code to
support the use of the alignment programme Muscle
(http://www.drive5.com/muscle/). I prefer to use this software in preference
to Clustalw - subjectively, it seems to give the most accurate alignments.
(Whether Biopython would want to support a second alignment programme/external
dependency is another matter...)

Anyway, while doing so, I realised just how awkward the current interface to
Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.

Currently, if we have a bunch of SeqRecords, say after downloading from
GenBank or being pulled from a BioSQL db, we have to write them to disk and
call clustalw on the file:

>>> from Bio import Clustalw
>>> from Bio.Clustalw import MultipleAlignCL
>>> cline = MultipleAlignCL("f002", command="clustalw")
>>> align = Clustalw.do_alignment(cline)

It seems to me more appropriate to be able to call clustalw directly on a
bunch of SeqRecords:

e.g. (suggested implementation)
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import MultipleAlignment
>>> align = MultipleAlignment(records, executable="clustalw")

Secondly, the Biopython interface does not support calling Clustalw to
perform profile alignments:

(suggested implementation)
# The scaffold alignment:
>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
# The sequences we want to add to it:
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import ProfileAlignment
>>> align = ProfileAlignment(align, records, executable="clustalw")

Calls to MultipleAlignment and ProfileAlignment would take a **options
parameter to collect any additional command line options.
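
One way the **options idea might look, as a rough sketch only:
build_command is a hypothetical helper, not an existing Biopython
function, and the "-name=value" spelling of the switches is an assumption
for illustration (check the tool's own help for the real option syntax).

```python
def build_command(executable, infile, **options):
    """Hypothetical helper: turn keyword options into a command list."""
    # Assumed switch style: -name=value, or a bare -name for True flags.
    command = [executable, "-infile=%s" % infile]
    for name in sorted(options):       # sorted for a reproducible order
        value = options[name]
        if value is True:
            command.append("-%s" % name)
        else:
            command.append("-%s=%s" % (name, value))
    return command


command = build_command("clustalw", "f002", outfile="f002.aln", align=True)
```

Returning a list rather than a single string means the result could be
handed to subprocess without any shell quoting.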

Thirdly, should an alignment object have an
Alignment.refine_alignment(executable="clustalw")
method?

Any thoughts?

Cheers, C.
-- 
____________________________________________________________________

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77

From chapmanb at 50mail.com  Mon Mar 30 09:00:27 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 09:00:27 -0400
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
	<320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
Message-ID: <20090330130027.GB36526@sobchak.mgh.harvard.edu>

Hi Peter;
Things work on FreeBSD 7.1 with python2.5 and the numpy release
candidate:

> python2.5
Python 2.5.4 (r254:67916, Feb 18 2009, 08:20:57) [GCC 4.2.1 20070719  [FreeBSD]] on freebsd7
>>> import numpy
>>> numpy.__version__
'1.3.0rc1'

> python2.5 test_Cluster.py
test_clusterdistance (__main__.TestCluster) ... ok
test_distancematrix_kmedoids (__main__.TestCluster) ... ok
test_kcluster (__main__.TestCluster) ... ok
test_matrix_parse (__main__.TestCluster) ... ok
test_median_mean (__main__.TestCluster) ... ok
test_somcluster (__main__.TestCluster) ... ok
test_treecluster (__main__.TestCluster) ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.009s

OK

The whole test suite passes as well. Maybe this is a windows issue?
Brad


> On Mon, Mar 30, 2009 at 11:29 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > David Cournapeau has just updated sourceforge - so I will try again with
> > the actual release candidate instead of just the beta...
> 
> Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows
> XP, Python 2.6 using the python.org installer, with Biopython compiled
> with cygwin mingw32 as normal, same error - test_Cluster.py is failing
> on the second line of Bio/Cluster/__init__.py, "from cluster import
> *".
> 
> So the question stands - has anyone else tried Biopython (from CVS)
> with NumPy 1.3 (beta or release candidate) on any platform?  I should
> be able to check it tonight on a Linux machine myself without too much
> trouble... but a few more data points wouldn't hurt ;)
> 
> Peter

From biopython at maubp.freeserve.co.uk  Mon Mar 30 09:23:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 14:23:31 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <20090330130027.GB36526@sobchak.mgh.harvard.edu>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
	<320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
	<20090330130027.GB36526@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com>

On Mon, Mar 30, 2009 at 2:00 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
> Things work on FreeBSD 7.1 with python2.5 and the numpy release
> candidate:
> ...
> The whole test suite passes as well. Maybe this is a windows issue?
> Brad

Thanks Brad - nice to know we have Biopython being tested on a
fourth major OS (FreeBSD, in addition to Linux, Mac OS X and
Windows XP).

I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and
test_Cluster and the rest of the Biopython tests passed.  This looks like
a Windows and/or Python 2.6 problem - I should be able to try a Linux
machine with Python 2.6 tonight...

Peter

From biopython at maubp.freeserve.co.uk  Mon Mar 30 10:37:18 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 15:37:18 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
Message-ID: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>

On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox <cy at cymon.org> wrote:
>
> Hi Folks,
>
> I've been trying to formalize a bunch of randomly scattered bits of code to
> support the use of the alignment programme Muscle
> (http://www.drive5.com/muscle/). I prefer to use this software in preference
> to Clustalw - subjectively, it seems to give the most accurate alignments.
> (Whether Biopython would want to support a second alignment programme
> /external dependency is another matter...)

A wrapper for MUSCLE wouldn't hurt - although there is scope for some
rearrangement of our command line tool wrappers rather than adding more
and more top level modules.  Maybe under Bio.Align, and move the Clustalw
wrapper there too.

> Anyway, while doing so, I realised just how awkward the current interface to
> Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.

What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm:
(1) use SeqIO to prepare the FASTA input file.
(2) run the command line tool (e.g. MUSCLE).
(3) use AlignIO (or SeqIO) to read the alignment output file.
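
A rough sketch of those three steps, with every file kept explicit. To
keep it runnable without Biopython or an aligner installed, the plain
FASTA helpers below stand in for SeqIO.write and AlignIO.read, and
FAKE_TOOL is a hypothetical stand-in that just copies its input file
to its output file; a real tool would have its own arguments.

```python
import os
import subprocess
import sys
import tempfile


def write_fasta(records, path):
    """Step 1: write (name, sequence) pairs as FASTA (stand-in for SeqIO.write)."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(">%s\n%s\n" % (name, seq))


def run_tool(argv):
    """Step 2: run the external tool; raises CalledProcessError if it fails."""
    subprocess.check_call(argv)


def read_fasta(path):
    """Step 3: read (name, sequence) pairs back (stand-in for AlignIO.read)."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif line and records:
                records[-1][1] += line
    return [tuple(record) for record in records]


# Hypothetical stand-in for an aligner: copies the input file to the output file.
FAKE_TOOL = [sys.executable, "-c",
             "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])"]

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "unaligned.fasta")
out_path = os.path.join(workdir, "aligned.fasta")

write_fasta([("alpha", "ACGT"), ("beta", "ACGA")], in_path)
run_tool(FAKE_TOOL + [in_path, out_path])
alignment = read_fasta(out_path)
```

Both the input and the output file stay on disk under names the caller
chose, which is the point of doing it this way.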

Actually I think that Bio.Clustalw interface is now a bit out of place,
as it hides some of this from you.  (Note that Bio.Clustalw predates
Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly
tool neutral).

> Currently, if we have a bunch of SeqRecords, say after downloading from
> GenBank or being pulled from a BioSQL db, we have to write them to disk
> and call clustalw on the file:
>
>>>> from Bio import Clustalw
>>>> from Bio.Clustalw import MultipleAlignCL
>>>> cline = MultipleAlignCL("f002", command="clustalw")
>>>> align = Clustalw.do_alignment(cline)

Well yes. Typically for any alignment tool you'd have to write the
unaligned records in FASTA format.  Some tools may let you handle
this via standard input, so you may be able to use a pipe instead
of a file - but the issues are similar.

> It seems to me more appropriate to be able to call clustalw directly on a
> bunch of SeqRecords:
>
> eg (suggested implementation)
>>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>>> from Bio.Align import MultipleAlignment
>>>> align = MultipleAlignment(records, executable="clustalw")

i.e. Have a Biopython wrapper write the given records to a temp
file in a format appropriate for the selected command line tool,
and capture the output?  In the case of
ClustalW or MUSCLE this means making a temp FASTA input
file.  For ClustalW we'd then have to open the output file, read
it, and then delete it.  For other tools we may be able to just
capture its output on stdout and not have to clean up a temp
output file.
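
A minimal sketch of such a wrapper for the simpler case, where the
tool prints its result on stdout. The function name and FAKE_TOOL are
hypothetical; the stand-in tool simply echoes the input file.

```python
import os
import subprocess
import sys
import tempfile


def align_via_temp_file(fasta_text, argv_prefix):
    """Write the input to a temp file, run the tool on it, return its stdout.

    argv_prefix is the tool's command line without the input filename.
    The temp input file is always deleted afterwards.
    """
    fd, in_path = tempfile.mkstemp(suffix=".fasta")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(fasta_text)
        # Capturing stdout means there is no output file to clean up.
        output = subprocess.check_output(argv_prefix + [in_path])
        return output.decode()
    finally:
        os.remove(in_path)


# Hypothetical stand-in for an aligner that prints its result to stdout.
FAKE_TOOL = [sys.executable, "-c",
             "import sys; sys.stdout.write(open(sys.argv[1]).read())"]

result = align_via_temp_file(">alpha\nACGT\n>beta\nACGA\n", FAKE_TOOL)
```

For a tool that insists on writing an output file, the wrapper would
additionally have to read that file and delete it in the same
try/finally.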

All the possible command line tools have their own arguments,
range of file formats, behaviour with respect to default filenames
etc.  Trying to capture all this in a single wrapper seems rather
ambitious.  For example, how would you handle gap penalties?
Keep in mind that different tools may use the same name for
a gap extension penalty but interpret the values differently.
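
To make the gap penalty concern concrete, here is a small sketch with
two made-up tools. The tool names, flag spellings and sign conventions
are all invented for illustration; the point is only that the same
user-level value may need a different translation per tool.

```python
# Hypothetical option tables: tool names, flags and value conventions
# below are invented for illustration only.
TOOL_OPTIONS = {
    # tool_a takes the penalty as a positive number in a KEY=VALUE flag.
    "tool_a": {"gap_extend": lambda v: ["-GAPEXT=%s" % v]},
    # tool_b expects a negative score rather than a positive penalty.
    "tool_b": {"gap_extend": lambda v: ["-gapextend", str(-abs(v))]},
}


def build_arguments(tool, **options):
    """Translate keyword options into one tool's own command line arguments."""
    table = TOOL_OPTIONS[tool]
    argv = []
    for name, value in sorted(options.items()):
        if name not in table:
            raise ValueError("%s does not support option %r" % (tool, name))
        argv.extend(table[name](value))
    return argv


args_a = build_arguments("tool_a", gap_extend=0.5)
args_b = build_arguments("tool_b", gap_extend=0.5)
```

Even in this toy case the two argument lists differ in both spelling
and sign, so a single shared "gap_extend" keyword hides a real
difference in meaning.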

Also, while I can see this might be nice for short alignments
(which are quick to run), it's rather implicit or magic.  I personally
prefer to have to deal with the files explicitly myself - but then I
have been dealing with large alignments which I want to keep
on disk.

> Secondly, the biopython interface does not support calling
> Clustalw to perform profile alignments,
>
> (suggested implementation)
> # The scaffold alignment:
>>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
> # The sequences we want to add to it:
>>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>>> from Bio.Align import ProfileAlignment
>>>> align = ProfileAlignment(align, records, executable="clustalw")
>
> Calls to MultipleAlignment and ProfileAlignment would take a
> **options parameter to collect any additional command line options.
>
> Thirdly, should an alignment object have a
> Alignment.refine_alignment(executable="clustalw")
> method?
>
> Any thoughts?

I may have misunderstood you, but the ideas you've sketched out
seem very very broad/ambitious - and actually take us further away
from the SeqIO/AlignIO interface by hiding all the filenames and
handles from the user.  I think these should be kept explicit.

Peter

From eric.talevich at gmail.com  Mon Mar 30 14:34:09 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 30 Mar 2009 14:34:09 -0400
Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser project
Message-ID: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>

Hi folks,

I noticed earlier this month that several Biopython developers had signed up
as potential mentors in OBF's Summer of Code application. Although OBF
apparently wasn't selected as a mentoring organization this year, some other
bioinformatics-related groups were -- in particular, the National
Evolutionary Synthesis Center's page mentions involvement with the Bio*
projects:

http://socghop.appspot.com/org/show/google/gsoc2009/nescent

The project I'd like to work on is a phyloXML parser for Biopython.
NESCent's idea list includes a similar entry for BioRuby (links below). I
asked the mentor, Christian Zmasek, if it would be acceptable to do the
project with Biopython instead of BioRuby, and he said it would, but he'd
prefer to have a Biopython specialist on board as another mentor.

Would any of you be interested in being a mentor for this project? I imagine
it would have some things in common with the existing Nexus parser, as a
starting point.

http://www.phyloxml.org/
https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby

Thanks,
Eric

From chapmanb at 50mail.com  Mon Mar 30 17:00:07 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:00:07 -0400
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
Message-ID: <20090330210007.GC72956@sobchak.mgh.harvard.edu>

Cymon;
I wrote a bunch of the Clustalw stuff a long while ago, and it
sounds like Peter has a good handle on integrating it with AlignIO
so I will leave that to him.

On the choosing aligners side of things, have you tried MAFFT?

http://align.bmr.kyushu-u.ac.jp/mafft/software/

It's updated regularly and seems to have good buzz in the community.
I haven't had to do lots of multiple alignments recently, but it's
worked well for the few I've done.

Having support for multiple aligners is good stuff; I second Peter's
suggestion of having these live under Bio.Align.

Brad

> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From chapmanb at 50mail.com  Mon Mar 30 17:14:48 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:14:48 -0400
Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser	project
In-Reply-To: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>
References: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>
Message-ID: <20090330211448.GF72956@sobchak.mgh.harvard.edu>

Hi Eric;
I would be happy to help with mentoring. I have been helping another
student with his application and could definitely give you feedback
on yours. Judging by the good ones coming through the list, it should
be detailed, with a week-by-week description of what you plan to work
on and specific deliverables. The good ones also include a short
description of the motivation and your qualifications.

This is my first time doing this, so I don't know much about the
selection process. If more than one Biopython project was selected,
I couldn't realistically mentor both; I am not even sure if that
is a possibility. Either way, Google recommends having two mentors
per student so it would be good to have someone else step up as
well.

Let me know if you have any specific questions while you are getting
things together this week,
Brad


From chapmanb at 50mail.com  Mon Mar 30 17:33:17 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:33:17 -0400
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk> <20090329010652.GA914@kunkel>
	<320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>
Message-ID: <20090330213317.GG72956@sobchak.mgh.harvard.edu>

Hi Peter;
Thanks for the feedback. I was definitely not being critical of your
postings, or fishing for extra jobs for myself. On the contrary, I
was inspired by the news items and brainstorming some ways to get
additional people involved.

People who express an interest in Biopython and don't get involved
often list the following reasons:

- Not feeling like they are technically able to contribute. Perhaps
  they are just learning Python, or don't feel comfortable with the
  Biopython library itself.

- Traditional academics doesn't offer recognition for contributing to
  open source projects. While we can't change academics, we can try
  and come up with ways to improve the visibility of contributors and
  make sure they are recognized in the larger bioinformatics
  community.

My thought was that a "news coordinator" would give one or more
interested people a chance to help the community, learn more about
Biopython by being involved, and also increase name recognition for
everyone coding, bug fixing and discussing.

In terms of how it is done, those were only my random suggestions.
Certainly if someone took it up they could be as creative as they
want about how to go about it.

Brad


> On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> > Hi all;
> > It is great we are exploring getting news out about Biopython in
> > additional ways. One thing this can really help with is recognizing
> > contributions to Biopython. Another is pointing out interesting
> > discussion threads on the mailing lists and getting others involved.
> 
> Do you think the recent release notes and NEWS file entries have been
> a bit too impersonal?  We can certainly be a bit more explicit if people
> think that is a good thing. For example, should we mention Bartek by name
> in the paragraph on the new Bio.Motif module?
> 
> This is linked to from the wiki's news page BTW:
> http://biopython.open-bio.org/SRC/biopython/NEWS
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython
> 
> > Do you think it would be worthwhile to "advertise" on the main list
> > for someone interested in coordinating news and communication?
> > ... Perhaps there are members who are interested in Biopython and
> > follow what is going on but aren't coders. This would be a way to
> > get involved, ...
> 
> Are you up for the job yourself Brad?  From your own blog we know
> you can and do write regularly anyway ;)   Would you like an account
> on the OBF news server? Email me off list and we can sort that out.
> 
> In terms of micro-blogging via twitter, you sound like you have a better
> feel for this than me - I don't even have a personal twitter account.
> 
> Monthly news posts (perhaps cc'd to the announcement email list)
> would be a nice idea - especially if we can encourage more lurkers
> to speak up.  For a while BioPerl had something like this going
> (digest emails or something), but it needs a pretty dedicated person
> or team.  In the meantime as you've noticed I've started making
> more use of our news facility myself...
> 
> Peter

From biopython at maubp.freeserve.co.uk  Mon Mar 30 17:58:52 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 22:58:52 +0100
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <20090330213317.GG72956@sobchak.mgh.harvard.edu>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk> <20090329010652.GA914@kunkel>
	<320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>
	<20090330213317.GG72956@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903301458s7216ec97gc4ac71a03d0fd350@mail.gmail.com>

On Mon, Mar 30, 2009 at 10:33 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
> Thanks for the feedback. I was definitely not being critical of your
> postings, ...

I hadn't had that impression, but that's still nice to hear ;)

> ... or fishing for extra jobs for myself.

Darn - I thought you'd be an excellent choice.

> On the contrary, I was inspired by the news items and
> brainstorming some ways to get additional people involved.

Well unless anyone already lurking on the dev mailing list steps
forward (*hint hint*), do you (Brad) want to try asking on the main
discussion list to see if there are any takers?

> People who express an interest in Biopython and don't get
> involved often list the following reasons:
>
> - Not feeling like they are technically able to contribute. Perhaps
>   they are just learning Python, or don't feel comfortable with the
>   Biopython library itself.

I find once they get over any shyness, even just having beginners
asking questions can be valuable in itself.  It shows us potential
blind spots, or areas of the documentation which need clarification
(or writing) - plus of course it can bring about discussions etc.

> - Traditional academics doesn't offer recognition for contributing to
>   open source projects. While we can't change academics, we can try
>   and come up with ways to improve the visibility of contributors and
>   make sure they are recognized in the larger bioinformatics
>   community.
>
> My thought was that a "news coordinator" would give one or more
> interested people a chance to help the community, learn more about
> Biopython by being involved, and also increase name recognition for
> everyone coding, bug fixing and discussing.

Some of us are very aware of this issue (academic recognition for
contributions to projects like Biopython), and different employers
will take different attitudes here.  In some cases making our
contributors more visible won't always be a good idea...  In my
case work on Biopython was a definite plus point in landing my
current job, but there are of course still limits to how much work
time I can reasonably spend on this (and limits to how much time
I spend out of work - like right now on this email).

> In terms of how it is done, those were only my random suggestions.
> Certainly if someone took it up they could be as creative as they
> want about how to go about it.
>
> Brad

It's certainly worth a go :)

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 18:35:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 23:35:05 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
	<320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
	<20090330130027.GB36526@sobchak.mgh.harvard.edu>
	<320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com>
Message-ID: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com>

On Mon, Mar 30, 2009 at 2:23 PM, Peter wrote:
> I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and
> test_Cluster and the rest of the Biopython tests passed.  This looks like
> a Windows and/or Python 2.6 problem - I should be able to try a Linux
> machine with Python 2.6 tonight...

I've just tried it on Ubuntu Jaunty (Alpha 6), with Python 2.6.1+
(already installed), the wise and clustalw packages installed, Numpy
1.3.0rc1 installed from source, and Biopython CVS installed from
source.  Again, test_Cluster.py and the rest of our tests pass
(ignoring those with additional external dependencies like BioSQL,
fdist, simcoal2).

So, whatever is going wrong on test_Cluster.py seems to be specific to
Windows (XP) and Python 2.6 - and possibly just my Windows development
machine.

Peter


From mjldehoon at yahoo.com  Mon Mar 30 20:08:34 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 30 Mar 2009 17:08:34 -0700 (PDT)
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com>
Message-ID: <730606.962.qm@web62408.mail.re1.yahoo.com>


> So, whatever is going wrong on test_Cluster.py seems to be
> specific to
> Windows (XP) and Python 2.6 - and possibly just my Windows
> development
> machine.
> 
I believe that the problem is that msvcr90.dll is missing. This is the C runtime from Microsoft. Earlier Pythons used msvcr71.dll, if I'm not mistaken.

--Michiel


From biopython at maubp.freeserve.co.uk  Tue Mar 31 05:12:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 10:12:21 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <730606.962.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com>
	<730606.962.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00903310212o29bba163ma9d68a901eabc2c9@mail.gmail.com>

On Tue, Mar 31, 2009 at 1:08 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> So, whatever is going wrong on test_Cluster.py seems to be
>> specific to Windows (XP) and Python 2.6 - and possibly just
>> my Windows development machine.
>>
> I believe that the problem is that msvcr90.dll is missing. This
> is the C runtime from Microsoft. Earlier Pythons used
> msvcr71.dll, if I'm not mistaken.

You may be right - there is some stuff on the numpy mailing list
about this and manifest files etc when using mingw32.  It may
be simplest to try the appropriate MS compiler instead...

Peter

From biopython at maubp.freeserve.co.uk  Tue Mar 31 06:28:35 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 11:28:35 +0100
Subject: [Biopython-dev] Python's new DVCS chosen
Message-ID: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>

Hi all,

This might be of interest (although I'm sure some of you already
know).  Earlier this month on the python-dev mailing list, Guido
van Rossum wrote:

> Dear Python developers,
>
> The decision is made! I've selected a DVCS to use for Python.
> We're switching to Mercurial (Hg).
>
> The implementation and schedule is still up in the air -- I am
> hoping that we can switch before the summer.
> ...

http://mail.python.org/pipermail/python-dev/2009-March/087931.html
See also PEP-374, http://www.python.org/dev/peps/pep-0374/

Interestingly, Mercurial (Hg) didn't get much of a mention in our
discussions here.

Peter

From bartek at rezolwenta.eu.org  Tue Mar 31 07:05:07 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 31 Mar 2009 13:05:07 +0200
Subject: [Biopython-dev] Python's new DVCS chosen
In-Reply-To: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
Message-ID: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>

Hi,
On Tue, Mar 31, 2009 at 12:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> This might be of interest (although I'm sure some of you already
> know).  Earlier this month on the python-dev mailing list, Guido
> van Rossum wrote:
>> We're switching to Mercurial (Hg).
> Interestingly, Mercurial (Hg) didn't get much of a mention in our
> discussions here.

Their evaluation of the different options (in PEP 374) was mentioned on
the list by Bruce, so everyone had the chance to form an opinion.

As Guido explains in another paragraph:
>It's hard to explain my reasons for choosing -- like most language
>decisions (especially the difficult ones) it's mostly a matter of gut
>feelings. One thing I know is that it's better to decide now than to
>spend another year discussing the pros and cons. All that could be
>said has been said, pretty much, and my mind is made up.

He seems to find all the candidates good enough. It is then a matter
of consensus among the developers. Git happened to have many opponents
on the python-dev list, but more supporters on biopython-dev.

I think we have made a consensus decision to try out git/github and I
think it's extremely counter-productive to re-open the discussion on
our choice now. I'm not a git fanboy, but because there are _no_
universal criteria to choose between git vs. bzr vs. Hg we should not
spend more time on this issue.

cheers
  Bartek


From cy at cymon.org  Tue Mar 31 07:25:27 2009
From: cy at cymon.org (Cymon Cox)
Date: Tue, 31 Mar 2009 12:25:27 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> 
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
Message-ID: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>

Hi Peter,

2009/3/30 Peter <biopython at maubp.freeserve.co.uk>

> On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox <cy at cymon.org> wrote:
> >
> > Hi Folks,
> >
> > I've been trying to formalize a bunch of randomly scattered bits of code to
> > support the use of the alignment programme Muscle
> > (http://www.drive5.com/muscle/). I prefer to use this software in preference
> > to Clustalw - subjectively, it seems to give the most accurate alignments.
> > (Whether Biopython would want to support a second alignment programme
> > /external dependency is another matter...)
>
> A wrapper for MUSCLE wouldn't hurt - although there is scope for some
> rearrangement of our command line tool wrappers rather than adding more
> and more top level modules.  Maybe under Bio.Align, and move the Clustalw
> wrapper there too.


Agreed - it would seem more appropriate to have the alignment interfaces in
Bio.Align.


> > Anyway, while doing so, I realised just how awkward the current interface to
> > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.
>
> What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm:
> (1) use SeqIO to prepare the FASTA input file.
> (2) run the command line tool (e.g. MUSCLE).
> (3) use AlignIO (or SeqIO) to read the alignment output file.


Well, yes - we can always not use the biopython interface.


> Actually I think that Bio.Clustalw interface is now a bit out of place,
> as it hides some of this from you. (Note that Bio.Clustalw predates
> Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly
> tool neutral).
>
> > Currently, if we have a bunch of SeqRecords, say after downloading from
> > GenBank or being pulled from a BioSQL db, we have to write them to disk
> > and call clustalw on the file:
> >
> >>>> from Bio import Clustalw
> >>>> from Bio.Clustalw import MultipleAlignCL
> >>>> cline = MultipleAlignCL("f002", command="clustalw")
> >>>> align = Clustalw.do_alignment(cline)
>
> Well yes. Typically for any alignment tool you'd have to write the
> unaligned records in FASTA format.  Some tools may let you handle
> this via standard input, so you may be able to use a pipe instead
> of a file - but the issues are similar.
>
> > It seems to me more appropriate to be able to call clustalw directly on a
> > bunch of SeqRecords:
> >
> > eg (suggested implementation)
> >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
> >>>> from Bio.Align import MultipleAlignment
> >>>> align = MultipleAlignment(records, executable="clustalw")
>
> i.e. Have a Biopython wrapper use a temp file to record the
> given records to in a format appropriate for the command line
> tool selected, and capturing the output?  In the case of
> ClustalW or MUSCLE this means making a temp FASTA input
> file.  For ClustalW we'd then have to open the output file, read
> it, and then delete it.


Yes, that's what I'm suggesting.

Here's my reasoning: it seems to me the input and output formats of the data
required by a particular alignment tool are incidental and should be hidden
from the user. At present the Clustalw interface forces you to write a fasta
formatted file of your records to disk, and then has Clustalw write an
aligned matrix to disk in a format specified by the user. If the latter is
Clustal format, then the record is parsed and an alignment object is
returned, else None is returned. In either case, an output file(s) remains
on disk.

So, say we have a bunch of sequences in pir format and we'd like them
aligned and saved in stockholm format:

from Bio import SeqIO
from Bio import AlignIO
from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL
records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")
# The unaligned records are written with SeqIO (AlignIO expects alignments):
SeqIO.write(records, open("temp.fasta", "w"), "fasta")
cline = MultipleAlignCL("temp.fasta", command="clustalw")
align = Clustalw.do_alignment(cline)
AlignIO.write([align], open("temp.sth", "w"), "stockholm")

we end up with 4 output files on disk: temp.aln,  temp.dnd,  temp.fasta,
temp.sth - 3 of which are incidental.

(BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir"
in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the
subprocess to return, which it never does: pid, sts = os.waitpid(self.pid,
0))
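
One common cause of this kind of hang (an assumption here, not a
confirmed diagnosis of the case above) is a child process blocking on
a full stdout or stderr pipe while the parent waits for it to exit
without reading. The sketch below shows the usual way to avoid that
with the standard subprocess module; CHILD is a stand-in process, not
ClustalW.

```python
import subprocess
import sys

# Stand-in child process that writes far more than a pipe buffer holds.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

process = subprocess.Popen(CHILD, stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
# communicate() reads both pipes to the end while waiting for the child,
# so the child is never left blocked on a full pipe.
stdout_data, stderr_data = process.communicate()
return_code = process.returncode
```

Calling wait() or os.waitpid() before reading the pipes is the pattern
that can deadlock once the output exceeds the pipe buffer.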

As I say, I'd like to see this:
>>> from Bio.Align import MultipleAlignment
>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir"))
>>> align = MultipleAlignment(records, executable="clustalw")
>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm")

ie resulting in one file temp.sth, which we've explicitly written to disk.


>  For other tools we may be able to just
> capture its output on stdout and not have to clean up a temp
> output file.
>
> All the possible command line tools have their own arguments,
> range of file formats, behaviour with respect to default filenames
> etc.  Trying to capture all this in a single wrapper seems rather
> ambitious.  For example, how would you handle gap penalties?
> Keep in mind that different tools may use the same name for
> a gap extension penalty but interpret the values differently.


Sorry, I wasn't very clear about what I intended:

MultipleAlignment(records, executable="clustalw", <keyword args>)
returns Clustalw.do_alignment(records, <keyword args>)
and
MultipleAlignment(records, executable="muscle", <keyword args>)
returns Muscle.do_alignment(records, <keyword args>)

I'm not suggesting unifying all programme options into a single interface,
just wrap the individual alignment tool modules in a common call,
MultipleAlignment(), align_records(), or whatever...
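
To make that concrete, here is a minimal sketch of the dispatch I have in
mind. All the names (align_records, _ALIGNERS and the two stub functions)
are hypothetical stand-ins, not existing Biopython code; real per-tool
functions would do the actual work where the stubs just echo their input.

```python
def _clustalw_align(records, **options):
    # Hypothetical stand-in for a ClustalW-specific wrapper.
    return ("clustalw", list(records), options)


def _muscle_align(records, **options):
    # Hypothetical stand-in for a MUSCLE-specific wrapper.
    return ("muscle", list(records), options)


# Map each supported executable name to its tool-specific function.
_ALIGNERS = {"clustalw": _clustalw_align, "muscle": _muscle_align}


def align_records(records, executable="clustalw", **options):
    """Pass the records and keyword options through to one tool wrapper."""
    try:
        aligner = _ALIGNERS[executable]
    except KeyError:
        raise ValueError("Unsupported alignment tool: %r" % executable)
    # Options are handed on untouched; each tool interprets its own.
    return aligner(records, **options)
```

The common call only chooses which module to use; it makes no attempt to
translate options between tools.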

As for the keyword options, at present the Clustalw interface supports the
manual setting of some attributes to the MultipleAlignCL instance, but there
is no type or value checking. I think as many options as possible should be
supported through keyword arguments - tedious, but doable.

> Also, while I can see this might be nice for short alignments
> (which are quick to run), it's rather implicit or magic.


Not sure what you mean here? Why would the size of alignment matter? And as
for it being magic, it seems to me it does, and only does, what it says on
the label - aligns the data.


>  I personally
> prefer to have to deal with the files explicitly myself - but then I
> have been dealing with large alignments which I want to keep
> on disk.


I tend to build many (small - <100 taxa) single gene alignments - in one
use-case, 280 of them...

> Secondly, the biopython interface does not support calling
> > Clustalw to perform profile alignments,
> >
> > (suggested implementation)
> > # The scaffold alignment:
> >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
> > # The sequences we want to add to it:
> >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
> >>>> from Bio.Align import ProfileAlignment
> >>>> align = ProfileAlignment(align, records, executable="clustalw")
> >
> > Calls to MultipleAlignment and ProfileAlignment would take a
> > **options parameter to collect any additional command line options.
>

I'm very keen to see profile alignments supported - be it either in Clustalw
or Muscle, or both.

>
> > Thirdly, should an alignment object have a
> > Alignment.refine_alignment(executable="clustalw")
> > method?
> >
> > Any thoughts?
>
> I may have misunderstood you, but the ideas you've sketched out
> seem very very broad/ambitious - and actually take us further away
> from the SeqIO/AlignIO interface by hiding all the filenames and
> handles from the user.  I think these should be kept explicit.


OK, well having had my say, I'm quite happy to write the Muscle module in
the style of the current Clustalw interface, or whatever style is most
appropriate for exposing the filename handles. But I'm not sure what that
would be - perhaps you could elaborate on this a bit...

Cheers, C.
-- 
____________________________________________________________________

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77

From biopython at maubp.freeserve.co.uk  Tue Mar 31 07:27:07 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 12:27:07 +0100
Subject: [Biopython-dev] Python's new DVCS chosen
In-Reply-To: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>
References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
	<8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>
Message-ID: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com>

On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> I think we have made a consensus decision to try out git/github and I
> think it's extremely counter-productive to re-open the discussion on
> our choice now. I'm not a git fanboy, but because there are _no_
> universal criteria to choose between git vs. bzr vs. Hg we should not
> spend more time on this issue.

I hadn't intended to reopen the debate - it was just a post for interest's sake.

As you can probably tell from looking at the biopython network graph
on github (which I got to work on Linux but only with Adobe's flash
plugin - gnash etc didn't seem to cope), I've been getting to grips
with git (and github).

Peter

From biopython at maubp.freeserve.co.uk  Tue Mar 31 08:56:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 13:56:21 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
	<7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>
Message-ID: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>

On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox <cy at cymon.org> wrote:
>> What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm:
>> (1) use SeqIO to prepare the FASTA input file.
>> (2) run the command line tool (e.g. MUSCLE).
>> (3) use AlignIO (or SeqIO) to read the alignment output file.
>
> Well, yes - we can always not use the biopython interface.

Ideally step (2) in the above would be handled via a Biopython
command line wrapper, offering keyword arguments etc.

>> i.e. Have a Biopython wrapper use a temp file to record the
>> given records to in a format appropriate for the command line
>> tool selected, and capturing the output?  In the case of
>> ClustalW or MUSCLE this means making a temp FASTA input
>> file.  For ClustalW we'd then have to open the output file, read
>> it, and then delete it.
>
> Yes, that's what I'm suggesting.
>
> Here's my reasoning: it seems to me the input and output formats of the data
> required by a particular alignment tool are incidental and should be hidden
> from the user.

OK - I see this as doing some implicit behind the scenes magic.
Arguably this kind of thing is still nice to have if it makes things
simpler for the user.

I may over use this mantra, but "Explicit is better than implicit",
from the Zen of Python.  http://www.python.org/dev/peps/pep-0020/

> At present the Clustalw interface forces you to write a fasta
> formatted file of your records to disk, and then has Clustalw
> write an aligned matrix to disk in a format specified by the user.

The Clustalw tool only takes FASTA formatted input, so if you have
a bunch of sequences in memory you are forced to convert them
into FASTA format to use them as input.  The question is where
does this conversion take place - explicitly by the user, or implicitly
by a wrapper.

> If the latter is Clustal format, then the record is parsed and an alignment
> object is returned, else None is returned. In either case, an output file(s)
> remains on disk.

It should be a fairly simple enhancement to look at the arguments
to see if another output format we can parse was selected (e.g.
PHYLIP?) and also parse that.  Do you think that would be a
sensible addition to Bio.Clustalw.do_alignment?  It's never been
an issue for me as if you are using the Bio.Clustalw.do_alignment
interface you probably don't care about the output file format.

> So, say we have a bunch of sequences in pir format and we'd like them
> aligned and saved in stockholm format:
>
> from Bio import SeqIO
> from Bio import AlignIO
> from Bio import Clustalw
> from Bio.Clustalw import MultipleAlignCL
> records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")
> AlignIO.write([records], open("temp.fasta", "w"), "fasta")

The above line is wrong - it should be:
SeqIO.write(records, open("temp.fasta", "w"), "fasta")
At this point your PIR sequences are not yet aligned, so they'll (probably)
have different lengths, so shouldn't be treated as an alignment.  If it
doesn't raise an error maybe it should...

Also you don't explicitly close the handle this way.

> cline = MultipleAlignCL("temp.fasta", command="clustalw")
> align = Clustalw.do_alignment(cline)
> AlignIO.write([align], open("temp.sth", "w"), "stockholm")

> we end up with 4 output files on disk: temp.aln,  temp.dnd,  temp.fasta,
> temp.sth - 3 of which are incidental.

Yes - but as ClustalW doesn't read in PIR files, and doesn't output
Stockholm files on its own, this has to happen.  It's just a question
of who does it (the user, or the wrapper code).

> (BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir"
> in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the
> subprocess to return, which it never does: pid, sts = os.waitpid(self.pid,
> 0))

I would guess this is because you never properly closed the
temp.fasta file, so it may not have been flushed to disk when the
Clustalw tool was called.
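
For what it's worth, here is a minimal sketch of writing the input so
that it is definitely closed before the tool runs. It uses plain (name,
sequence) tuples and a made-up save_fasta helper rather than Bio.SeqIO,
purely to show the handle management:

```python
def save_fasta(records, filename):
    """Write (name, sequence) pairs as FASTA and close the file."""
    handle = open(filename, "w")
    try:
        for name, seq in records:
            handle.write(">%s\n%s\n" % (name, seq))
    finally:
        # Closing flushes any buffered data to disk, so an external
        # tool started after this point sees the complete file.
        handle.close()
```

The same pattern applies when Bio.SeqIO does the writing: keep a
reference to the handle and close it before calling the tool.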

> As I say, I'd like to see this:
>>>> from Bio.Align import MultipleAlignment
>>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir"))
>>>> align = MultipleAlignment(records, executable="clustalw")
>>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm")
>
> ie resulting in one file temp.sth, which we've explicitly written to disk.

So you'd like the wrapper to take care of creating and deleting the
temp input FASTA file, and also deleting the temp output ClustalW
file after parsing it.  This can probably be done quite cleanly using
python's NamedTemporaryFile object.
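
A rough sketch of that temp file handling, under some assumptions: the
function names are invented for this email, the input is already a FASTA
string, and the "tool" is a Python one-liner that copies its input to its
output so the example runs without ClustalW installed.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_files(fasta_text, build_command):
    """Run a tool on a temp input file and return its output file's text.

    build_command(infile, outfile) returns the argument list to execute.
    """
    tmp_in = tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False)
    tmp_out = tempfile.NamedTemporaryFile("w", suffix=".aln", delete=False)
    try:
        tmp_in.write(fasta_text)
        # Close both handles so the input is flushed and the tool can
        # open the files itself (needed on Windows).
        tmp_in.close()
        tmp_out.close()
        subprocess.check_call(build_command(tmp_in.name, tmp_out.name))
        with open(tmp_out.name) as handle:
            return handle.read()
    finally:
        # Clean up both temp files whether or not the tool succeeded.
        tmp_in.close()
        tmp_out.close()
        for name in (tmp_in.name, tmp_out.name):
            if os.path.exists(name):
                os.remove(name)


def copy_tool(infile, outfile):
    # Stand-in "aligner": a Python one-liner that copies input to output.
    code = "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])"
    return [sys.executable, "-c", code, infile, outfile]


print(run_with_temp_files(">a\nACGT\n>b\nACGA\n", copy_tool))
```

Note this does not deal with extra files a tool may create on its own,
such as ClustalW's .dnd guide tree.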

>>  For other tools we may be able to just capture its output on
>> stdout and not have to clean up a temp output file.
>>
>> All the possible command line tools have their own arguments,
>> range of file formats, behaviour with respect to default filenames
>> etc.  Trying to capture all this in a single wrapper seems rather
>> ambitious.  For example, how would you handle gap penalties?
>> Keep in mind that different tools may use the same name for
>> a gap extension penalty but interpret the values differently.
>
> Sorry, I wasn't very clear about what I intended:
>
> MultipleAlignment(records, executable="clustalw", <keyword args>)
> returns Clustalw.do_alignment(records, <keyword args>)
> and
> MultipleAlignment(records, executable="muscle", <keyword args>)
> returns Muscle.do_alignments(records, <keyword args>)
>
> I'm not suggesting unifying all programme options into a single interface,
> just wrap the individual alignment tool modules in a common call,
> MultipleAlignment(), align_records(), or whatever...

I see.

> As for the keyword options, at present the Clustalw interface supports the
> manual setting of some attributes to the MultipleAlignCL instance, but there
> is no type or value checking. I think as many options as possible should be
> supported through keyword arguments - tedious, but doable.
>
>> Also, while I can see this might be nice for short alignments
>> (which are quick to run), it's rather implicit or magic.
>
> Not sure what you mean here? Why would the size of alignment matter?

Size of alignment influences the compute time, and therefore is an issue for
anyone doing things at the python prompt.  Moreover, if the alignments are
big and slow, you generally want to make sure the output file is kept on disk,
as you'll probably want to read it more than once.

> And as for it being magic, it seems to me it does, and only does, what
> it says on the label - aligns the data.

The magic is the behind the scenes creation/deletion of the input/output
files, and the conversion between file formats.

>> I personally prefer to have to deal with the files explicitly myself
>> - but then I have been dealing with large alignments which I want
>> to keep on disk.
>
> I tend to build many (small - <100 taxa) single gene alignments - in one
> use-case, 280 of them...

In your case I would assume the alignment takes minutes to run.  You tend
to care more about preserving the output files if they take hours to create ;)

>> > Secondly, the biopython interface does not support calling
>> > Clustalw to perform profile alignments,

That is something we should probably add.

> OK, well having had my say, I'm quite happy to write the Muscle module in
> the style of the current Clustalw interface, or whatever style is most
> appropriate for exposing the filename handles. But I'm not sure what that
> would be - perhaps you could elaborate on this a bit...

I've elaborated, perhaps too much? ;)

Basically you seem to be thinking about a high level abstraction for
multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO
module), while I am more focused on the low level abstraction for
wrapping any command line tool.  This isn't to say we can't have both,
but to me it makes sense to start with the low level stuff first.

We (unfortunately) have several styles of command line tool wrappers
in Biopython already - this is a wart that has been on my mental to do
list for some time.  I think we should focus on dealing with command
line strings, and keep this separate from how the tools are invoked
(e.g. subprocess or os.system), preparation of input files, and how
any output is parsed.  As long as this core is in place, more advanced
wrappers are possible on top of this basic infrastructure (Tiago may
have some comments here from his Bio.PopGen work).

Essentially all our command line wrappers start by building a command
line string.  In some cases this command line string is exposed to the
user (e.g. Bio.EMBOSS), and they can choose how they want to invoke
it.  For example, they can explicitly opt to use the Python subprocess
module and pipes if they want to - or use a standard invocation from
Bio.Applications (we may want to add a couple of variations to this
module).
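
As a rough illustration of keeping the command line string separate from
how it is run, here is a minimal sketch. SimpleCommandLine is a made-up
class for this email, not the Bio.Application API; the -in, -out and
-clwstrict switches are taken from MUSCLE 3.x.

```python
class SimpleCommandLine(object):
    """Hypothetical command line builder; not the Bio.Application API."""

    def __init__(self, program):
        self.program = program
        self.arguments = []

    def add(self, flag, value=None):
        """Record a switch (value=None) or a flag with a value."""
        self.arguments.append((flag, value))

    def __str__(self):
        # Render the program name followed by each flag and its value.
        parts = [self.program]
        for flag, value in self.arguments:
            parts.append(flag)
            if value is not None:
                parts.append(str(value))
        return " ".join(parts)


cline = SimpleCommandLine("muscle")
cline.add("-in", "temp.fasta")
cline.add("-out", "temp.aln")
cline.add("-clwstrict")
print(cline)
```

The object only knows how to describe the command. Whether str(cline) is
then passed to subprocess, os.system or a cluster queue is up to the
caller. The sketch does no quoting, so filenames with spaces would need
extra care.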

Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool
for you. In the case of Bio.Blast.NCBIStandalone, if you don't want
the handles because you've told Blast to save its output to a file,
our wrapper still returns the standard output and standard error
handles - it is forced on you (see Bug 2654).   Also, there is no easy
way to see what the actual command line string was, which can make
debugging hard, and also prevents certain things (e.g. submitting the
command line as a task to a cluster of workstations).  At least
Bio.Clustalw offers a command line string object (MultipleAlignCL),
it's just the do_alignment helper function I'm not so keen on.

The Bio.Clustalw.do_alignment wrapper is rather unusual in that it
automatically parses the output - while most of our wrappers don't.
Decoupling the parsing is more modular - it makes it easy for the user
to use any parser for the output from a command line tool (either
using stdout, or by reading an output file).  I like this, and it fits
with the handle based approach in most of our parsers.
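
To sketch what I mean by decoupling, the example below keeps running the
tool and parsing its output as two independent steps. Both function names
are invented for illustration, the "tool" is a Python one-liner standing
in for a real aligner, and parse_fasta is just one parser the user might
choose; it returns text rather than a handle to keep things short.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command and return its stdout as text; no parsing here."""
    child = subprocess.Popen(args, stdout=subprocess.PIPE,
                             universal_newlines=True)
    output, _ = child.communicate()
    if child.returncode != 0:
        raise RuntimeError("Command failed with return code %i"
                           % child.returncode)
    return output


def parse_fasta(text):
    """One possible parser: return a list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line.strip()
    return [tuple(record) for record in records]


# Stand-in tool: a Python one-liner printing a tiny FASTA "alignment".
fake_tool = [sys.executable, "-c", "print('>a\\nAC-T\\n>b\\nACGT')"]
print(parse_fasta(run_tool(fake_tool)))
```

Because run_tool knows nothing about formats, swapping in a different
parser needs no change to the code that invokes the tool.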

So, I would suggest we think about adding new wrappers under Bio.Align
(e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
perhaps all together in Bio.Align.Applications or something) based on
the Bio.Application module as used in Bio.EMBOSS.  We could then
deprecate Bio.Clustalw, which should also help tidy up the top level
name space.  Initially at least, I wouldn't include any clever wrapper
code at all.

Once we have the basic command line objects done, these could be used
to later add another layer on top implementing Cymon's ideas for
multiple alignment wrappers taking care of intermediate files and
inter-converting file formats on the fly, although I remain to be
convinced about the value of this.  If you can pull it off (cross
platform, on several versions of python) then a user friendly high
level interface would be impressive.

Peter


From bartek at rezolwenta.eu.org  Tue Mar 31 09:14:39 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 31 Mar 2009 15:14:39 +0200
Subject: [Biopython-dev] Python's new DVCS chosen
In-Reply-To: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com>
References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
	<8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>
	<320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com>
Message-ID: <8b34ec180903310614k1fe4a08bkac19c2cc96b36fad@mail.gmail.com>

On Tue, Mar 31, 2009 at 1:27 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>> I think we have made a consensus decision to try out git/github and I
>> think it's extremely counter-productive to re-open the discussion on
>> our choice now. I'm not a git fanboy, but because there are _no_
>> universal criteria to choose between git vs. bzr vs. Hg we should not
>> spend more time on this issue.
>
> I hadn't intended to reopen the debate - it was just a post for interests sake.
>
That's a relief. Maybe I'm becoming overly sensitive on the subject.

> As you can probably tell from looking at the biopython network graph
> on github (which I got to work on Linux but only with Adobe's flash
> plugin - gnash etc didn't seem to cope), I've been getting to grips
> with git (and github).
>
I haven't checked for a while, but it seems that we've got quite a number
 of people making changes on different branches. That's cool.

I'd like to encourage people to share their impressions of git+github
with others on  the list.
If there are any issues, it's better to discuss them early.

cheers
  Bartek

From biopython at maubp.freeserve.co.uk  Tue Mar 31 10:10:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 15:10:00 +0100
Subject: [Biopython-dev] Easy Git - git for mere mortals?
Message-ID: <320fb6e00903310710x693527f2k25b49d958543939d@mail.gmail.com>

Hi all,

Have any of you tried out easygit (eg)?  If it is as good as it sounds
on their website, it might be a sensible option for those migrating
from CVS/SVN to git for the first time.
http://www.gnome.org/~newren/eg/

Reading the easygit documentation, it sounds like git gives the user
plenty of ways to shoot themselves in the foot (especially if used to
CVS/SVN), and a lot of what easygit does is catch some of these
potential mistakes.  They also stress you can mix and match git and
easy git, so it can act as a stepping stone to using git directly.

This presentation seems a fairly gentle introduction (with plenty of
for interest stuff in the second half that can be ignored),
http://www.gnome.org/~newren/eg/presentations/git-introduction.pdf

There are quite a few other wrappers for git too - all referred to as
"porcelain", which apparently follows from Linus's division of end
user commands in git into external "porcelain" and internal
"plumbing".  The "porcelain" are the bits of a bathroom the end user
sees (like the sink), while they normally only interact with the "ugly
plumbing" when something goes wrong (like dropping an ear ring down
the sink).  This kind of quirky language doesn't really make the
documentation any clearer in my opinion, still I'm sure things are
improving gradually (or at least, I hope they are).  For the moment
I've come to the conclusion the git man pages are not really suitable
for beginners.

Peter

P.S. For the moment, let's keep the wiki page focused on using git
itself directly - too many choices will confuse things.

From cy at cymon.org  Tue Mar 31 10:49:20 2009
From: cy at cymon.org (Cymon Cox)
Date: Tue, 31 Mar 2009 15:49:20 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> 
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> 
	<7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> 
	<320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>
Message-ID: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>

Hi Peter,

2009/3/31 Peter <biopython at maubp.freeserve.co.uk>

> On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox <cy at cymon.org> wrote:
>

> > At present the Clustalw interface forces you to write a fasta
> > formatted file of your records to disk, and then has Clustalw
> > write an aligned matrix to disk in a format specified by the user.
>
> The Clustalw tool only takes FASTA formatted input, so if you have
> a bunch of sequences in memory you are forced to convert them
> into FASTA format to use them as input.  The question is where
> does this conversion take place - explicitly by the user, or implicitly
> by a wrapper.


Agreed - that's the question...


> > If the latter is Clustal format, then the record is parsed and an
> > alignment object is returned, else None is returned. In either case,
> > an output file(s) remains on disk.
>
> It should be a fairly simple enhancement to look at the arguments
> to see if another output format we can parse was selected (e.g.
> PHYLIP?) and also parse that.  Do you think that would be a
> sensible addition to Bio.Clustalw.do_alignment?


No - I don't think there should be any output file (of any format) at all, an
alignment object should always be returned and the user explicitly write to
the format they want using AlignIO. (But I think this becomes clearer below...)


>  It's never been
> an issue for me as if you are using the Bio.Clustalw.do_alignment
> interface you probably don't care about the output file format.


Quite. (Unless you are trying to write to a format not supported by
biopython e.g. GCG, GDE, of course.)


> > So, say we have a bunch of sequences in pir format and we'd like them
> > aligned and saved in stockholm format:
> >
> > from Bio import SeqIO
> > from Bio import AlignIO
> > from Bio import Clustalw
> > from Bio.Clustalw import MultipleAlignCL
> > records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")
> > AlignIO.write([records], open("temp.fasta", "w"), "fasta")
>
> The above line is wrong


Doh! Grrr...

Yeah, perhaps it should have raised an error - I'll follow this up elsewhere
- but even with the corrected line and explicitly opening and closing the
file handles, I still can't get clustalw to align this file... (later...)

> > we end up with 4 output files on disk: temp.aln,  temp.dnd,  temp.fasta,
> > temp.sth - 3 of which are incidental.
>
> Yes - but as ClustalW doesn't read in PIR files, and doesn't output
> Stockholm files on its own, this has to happen.  It's just a question
> of who does it (the user, or the wrapper code).


Yep...


> > As I say, I'd like to see this:
> >>>> from Bio.Align import MultipleAlignment
> >>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir"))
> >>>> align = MultipleAlignment(records, executable="clustalw")
> >>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm")
> >
> > ie resulting in one file temp.sth, which we've explicitly written to
> > disk.
>
> So you'd like the wrapper to take care of creating and deleting the
> temp input FASTA file, and also deleting the temp output ClustalW
> file after parsing it.  This can probably be done quite cleanly using
> python's NamedTemporaryFile object.
>

Yep.


> >> Also, while I can see this might be nice for short alignments
> >> (which are quick to run), it's rather implicit or magic.
> >
> > Not sure what you mean here? Why would the size of alignment matter?
>
> Size of alignment influences the compute time, and therefore is an issue
> for anyone doing things at the python prompt.  Moreover, if the alignments
> are big and slow, you generally want to make sure the output file is kept
> on disk, as you'll probably want to read it more than once.


Agreed, but should the call to align the data (ie to clustalw) be writing
the output to disk or should the user be making an explicit call using
AlignIO?


> > And as for it being magic, it seems to me it does, and only does, what
> > it says on the label - aligns the data.
>
> The magic is the behind the scenes creation/deletion of the input/output
> files, and the conversion between file formats.


Fair enough - then magic it be... :)


> > OK, well having had my say, I'm quite happy to write the Muscle module in
> > the style of the current Clustalw interface, or whatever style is most
> > appropriate for exposing the filename handles. But I'm not sure what that
> > would be - perhaps you could elaborate on this a bit...
>
> I've elaborated, perhaps too much? ;)
>
> Basically you seem to be thinking about a high level abstraction for
> multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO
> module), while I am more focused on the low level abstraction for
> wrapping any command line tool.  This isn't to say we can't have both,
> but to me it makes sense to start with the low level stuff first.
>
> We (unfortunately) have several styles of command line tool wrappers
> in Biopython already - this is a wart that has been on my mental to do
> list for some time.  I think we should focus on dealing with command
> line strings, and keep this separate from how the tools are invoked
> (e.g. subprocess or os.system), preparation of input files, and how
> any output is parsed.  As long as this core is in place, more advanced
> wrappers are possible on top of this basic infrastructure (Tiago may
> have some comments here from his Bio.PopGen work).
>
> Essentially all our command line wrappers start by building a command
> line string.  In some cases this command line string is exposed to the
> user (e.g. Bio.EMBOSS), and they can choose how they want to invoke
> it.  For example, they can explicitly opt to use the Python subprocess
> module and pipes if they want to - or use a standard invocation from
> Bio.Applications (we may want to add a couple of variations to this
> module).
>
> Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool
> for you. In the case of Bio.Blast.NCBIStandalone, if you don't want
> the handles because you've told Blast to save its output to a file,
> our wrapper still returns the standard output and standard error
> handles - it is forced on you (see Bug 2654).   Also, there is no easy
> way to see what the actual command line string was, which can make
> debugging hard, and also prevents certain things (e.g. submitting the
> command line as a task to a cluster of workstations).  At least
> Bio.Clustalw offers a command line string object (MultipleAlignCL),
> it's just the do_alignment helper function I'm not so keen on.
>
> The Bio.Clustalw.do_alignment wrapper is rather unusual in that it
> automatically parses the output - while most of our wrappers don't.
> Decoupling the parsing is more modular - it makes it easy for the user
> to use any parser for the output from a command line tool (either
> using stdout, or by reading an output file).  I like this, and it fits
> with the handle based approach in most of our parsers.


Thanks for your thoughts on this, it helps clarify some things...


> So, I would suggest we think about adding new wrappers under Bio.Align
> (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
> perhaps all together in Bio.Align.Applications or something) based on
> the Bio.Application module as used in Bio.EMBOSS.  We could then
> deprecate Bio.Clustalw, which should also help tidy up the top level
> name space.  Initially at least, I wouldn't include any clever wrapper
> code at all.


OK, I'll aim for this with the Muscle code...

Cheers, C.
-- 
____________________________________________________________________

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77

From biopython at maubp.freeserve.co.uk  Tue Mar 31 11:24:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 16:24:32 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
	<7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>
	<320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>
	<7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>
Message-ID: <320fb6e00903310824v6fb0e1d2gff32b3effccd00b1@mail.gmail.com>

On Tue, Mar 31, 2009 at 3:49 PM, Cymon Cox <cy at cymon.org> wrote:
>>>
>>> If the latter is Clustal format, then the record is parsed and an
>>> alignment object is returned, else None is returned. In either
>>> case, an output file(s) remains on disk.
>>
>> It should be a fairly simple enhancement to look at the arguments
>> to see if another output format we can parse was selected (e.g.
>> PHYLIP?) and also parse that.  Do you think that would be a
>> sensible addition to Bio.Clustalw.do_alignment?
>
> No - I dont think there should be any output file (of any format) at all, an
> alignment object should always be returned and the user explicitly write to
> format they want using AlignIO. (But I think this becomes clearer below...)

Well there must be an output file, since ClustalW won't write its output
alignment to stdout.  Of course, you would have a wrapper which
deletes the output file after it has been parsed into an Alignment object.
However, we shouldn't change the existing Bio.Clustalw.do_alignment
function to do this (or to delete the .dnd guide tree), since people may
be using the call for these "side effects".

>>  It's never been
>> an issue for me as if you are using the Bio.Clustalw.do_alignment
>> interface you probably don't care about the output file format.
>
> Quite. (Unless you are trying to write to a format not supported by
> biopython e.g. GCG, GDE, of course.)

What I was saying was Bio.Clustalw.do_alignment knows the requested
output format, and if it is ClustalW it automatically parses the output file
and returns the alignment.  Since this code was written, Bio.AlignIO was
added and could potentially be used to parse PHYLIP (etc) output from
the Clustalw tool.  And one day maybe GCG etc too.

i.e. Right now Bio.Clustalw.do_alignment will return an alignment if it is in
ClustalW format, or None if it isn't.  I'm suggesting Bio.Clustalw.do_alignment
could return an alignment when Bio.AlignIO can parse the requested file
format, or None if it can't.
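
A minimal sketch of that behaviour, assuming a lookup table from the
requested ClustalW output type to a Bio.AlignIO format name. The table
contents and the function name are my own illustration, not existing
code, and the parser is passed in as an argument so the sketch does not
depend on Biopython being installed.

```python
# Assumed mapping from ClustalW output option values to Bio.AlignIO
# format names; extend it as more parsers become available.
PARSEABLE_FORMATS = {"CLUSTAL": "clustal", "PHYLIP": "phylip"}


def read_alignment_or_none(filename, output_format, reader):
    """Return reader(handle, format) if the format is parseable, else None.

    reader is any callable taking (handle, format_name), for example
    Bio.AlignIO.read.
    """
    format_name = PARSEABLE_FORMATS.get(output_format.upper())
    if format_name is None:
        # No parser for this format; the file is simply left on disk.
        return None
    handle = open(filename)
    try:
        return reader(handle, format_name)
    finally:
        handle.close()
```

Supporting another output format would then only mean adding one entry
to the table.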

This would only be a small enhancement, and may not be worth bothering
with if we are thinking about deprecating Bio.Clustalw with a replacement
under Bio.Align.

>> Size of alignment influences the compute time, and therefore is an issue
>> for anyone doing things at the python prompt.  Moreover, if the alignments
>> are big and slow, you generally want to make sure the output file is kept
>> on disk, as you'll probably want to read it more than once.
>
> Agreed, but should the call to align the data (ie to clustalw) be writing
> the output to disk or should the user be making an explicit call using
> AlignIO?

The command line tool ClustalW will itself write the output to disk.  I don't
recall off hand, but other tools like Muscle may give the option of writing
to a file or to stdout.  In either case, the tool writes to a handle, and the
user may want to *read* this handle using Bio.AlignIO.

If I want the tool's output to go straight to a file, I'd get the tool to do it.
The only reason I can see to be *writing* the alignment with Bio.AlignIO
would be for file conversion (or after manipulating the alignment), and that
would be done by the user's python code.

If you are talking about the data preparation (i.e. the input file rather than
the output file), then I think it is up to the user's code to prepare a suitable
input FASTA file (e.g. from SeqRecord objects with Bio.SeqIO) before
calling the command line tool.

>>> And as for it being magic, it seems to me it does, and only does, what
>>> it says on the label - aligns the data.
>>
>> The magic is the behind the scenes creation/deletion of the input/output
>> files, and the conversion between file formats.
>
> Fair enough - then magic it be... :)

:)

>> > OK, well having had my say, I'm quite happy to write the Muscle module in
>> > the style of the current Clustalw interface, or whatever style is most
>> > appropriate for exposing the filename handles. But I'm not sure what that
>> > would be - perhaps you could elaborate on this a bit...
>>
>> I've elaborated, ...
>
> Thanks for your thoughts on this, it helps clarify some things...

Oh good.  If you don't agree with any of that, do say so by the way.

>> So, I would suggest we think about adding new wrappers under Bio.Align
>> (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
>> perhaps all together in Bio.Align.Applications or something) based on
>> the Bio.Application module as used in Bio.EMBOSS.  We could then
>> deprecate Bio.Clustalw, which should also help tidy up the top level
>> name space.  Initially at least, I wouldn't include any clever wrapper
>> code at all.
>
> OK, I'll aim for this with the Muscle code...

That sounds good.  Now can I tempt you into trying out github at the same
time, so we can see your proposed code evolve in public?

Could I add at this point that I don't think the wrapper should set any default
arguments - leave that up to the command line tool itself.  Otherwise you can
get the situation where the Biopython defaults get out of sync with the tool's
own default values (an issue with our online qblast wrapper, since the NCBI
change their default settings over time).

As an aside, I have used Muscle with Biopython thanks to its option for
strict Clustal output, which can be parsed by Bio.AlignIO fine.  For this I
just generated my own command line on the fly, but I was only using a
couple of the command line arguments.
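
As a rough illustration of the "no defaults in the wrapper" idea, here is a
minimal sketch. It is not the Bio.Application code: build_command and its
options dict are made-up names, and the -in, -out, -clwstrict and -maxiters
switches are assumed to be spelled as in MUSCLE 3.x. Options left as None are
simply omitted, so the tool applies whatever default it currently has.

```python
def build_command(program, options):
    """Return an argument list holding only the options the caller set.

    Hypothetical helper for illustration; not part of Biopython.
    """
    args = [program]
    for name in sorted(options):
        value = options[name]
        if value is None or value is False:
            # Unset option: say nothing and let the tool use its own default.
            continue
        args.append("-" + name)
        if value is not True:
            # True marks a bare switch; anything else is passed as a value.
            args.append(str(value))
    return args


cmd = build_command("muscle", {
    "in": "input.fasta",
    "out": "aligned.aln",
    "clwstrict": True,   # ask for strict Clustal output
    "maxiters": None,    # not set, so it never reaches the command line
})
print(cmd)
```

The resulting list could be handed to subprocess, and the output file then
read with Bio.AlignIO using the "clustal" format.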

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Mar 31 13:05:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 31 Mar 2009 13:05:50 -0400
Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. for QUAL files)
In-Reply-To: <bug-2799-42@http.bugzilla.open-bio.org/>
Message-ID: <200903311705.n2VH5oKe025136@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2799


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-31 13:05 EST -------
Checked into CVS from
http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq

Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v  <--  Seq.py
new revision: 1.67; previous revision: 1.66
done
Checking in Bio/SeqRecord.py;
/home/repository/biopython/biopython/Bio/SeqRecord.py,v  <--  SeqRecord.py
new revision: 1.32; previous revision: 1.31
done
Checking in Bio/GenBank/__init__.py;
/home/repository/biopython/biopython/Bio/GenBank/__init__.py,v  <-- 
__init__.py
new revision: 1.106; previous revision: 1.105
done
Checking in Bio/SeqIO/InsdcIO.py;
/home/repository/biopython/biopython/Bio/SeqIO/InsdcIO.py,v  <--  InsdcIO.py
new revision: 1.9; previous revision: 1.8
done
Checking in Bio/SeqIO/QualityIO.py;
/home/repository/biopython/biopython/Bio/SeqIO/QualityIO.py,v  <-- 
QualityIO.py
new revision: 1.8; previous revision: 1.7
done
Checking in Tests/test_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_SeqIO.py,v  <--  test_SeqIO.py
new revision: 1.50; previous revision: 1.49
done
Checking in Tests/output/test_GenBank;
/home/repository/biopython/biopython/Tests/output/test_GenBank,v  <-- 
test_GenBank
new revision: 1.41; previous revision: 1.40
done
Checking in Tests/output/test_SeqIO;
/home/repository/biopython/biopython/Tests/output/test_SeqIO,v  <--  test_SeqIO
new revision: 1.36; previous revision: 1.35
done


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Tue Mar 31 13:12:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 18:12:37 +0100
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
	<9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>
	<320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com>
Message-ID: <320fb6e00903311012y393761dev975a39464ab82043@mail.gmail.com>

On Thu, Mar 26, 2009 at 12:30 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi:
>>> Sebastian - could you have a quick play with this github code (using the new
>>> UnknownSeq class), and the current CVS code (using None), and make sure
>>> both support the slicing operations you were trying earlier?  Thanks.
>>
>> ...
>>
>> From a practical point of view, both versions are the same, but the
>> concept of UnknownSeq looks more solid than None, because if I don't know
>> about biopython internals, I would never try to slice a None
>> seq. With "None" ...
>> But with the UnknownSeq object, len(s) returns an actual length, so it
>> is more intuitive that it can be sliced.
>
> I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord
> __getitem__ code nicer, and it means you can do len(SeqRecord) too,
> which was problematic if the sequence was None.

I've checked this into CVS after this discussion (and a little off thread).
I wasn't comfortable with using None for a sequence, and doing this
while also wanting to support len(...) and slicing on such SeqRecord
objects was basically horrible.
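
To make the idea concrete, here is a minimal sketch of a "known length,
unknown content" object. It is an illustration only and not the UnknownSeq
code that went into CVS; the class name PlaceholderSeq and its arguments are
invented for this example. The point is that len() and slicing have obvious
answers, which is not the case for None.

```python
class PlaceholderSeq(object):
    """Sequence of known length but unknown letters (illustration only)."""

    def __init__(self, length, character="?"):
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Count the positions the slice covers without building a string.
            new_length = len(range(*index.indices(self._length)))
            return PlaceholderSeq(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character


s = PlaceholderSeq(10)
print(len(s), str(s[2:5]))
```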

>> Then I tried the git code and it also worked. One thing I noticed is
>> that I got "?" instead of "N" in the "sequence" of the UnknownSeq.
>
> I felt we shouldn't use an "N" unless we are confident the sequence
> is nucleotides.  In practice, this is probably a safe assumption for
> FASTQ and QUAL files - unless anyone can think of a counter example?
> Do you think it is safe to assume FASTQ and QUAL files are just for
> nucleotides?
>
> I mean, you could translate a CDS from transcriptome sequencing,
> and for the sake of argument give each amino acid a quality score
> from the three nucleotide quality scores, and then save this as a protein
> FASTQ file.  But I've never heard of anyone actually doing this ;)

So, should we assume QUAL files (and perhaps FASTQ files) are
nucleotides when reading them in, and enforce this when writing
them out?  This would mean the QUAL files' UnknownSeq objects
would use the letter "N" instead of "?".

Or is it more generic to leave it as it is, and not make or force any
assumptions about the nature of the sequence?

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 31 17:38:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 22:38:48 +0100
Subject: [Biopython-dev] Plan for Biopython 1.50 (beta)
Message-ID: <320fb6e00903311438g6fb0813bt18a035d485a6bb99@mail.gmail.com>

Hi all,

OK guys, after a brief chat off the mailing list, I'm hoping to do the
Biopython 1.50 beta release roughly this weekend, somewhere between
Friday 4 and Monday 6 April.  Until then please consider CVS "frozen"
for anything other than documentation changes or unit test additions,
or at a push really tiny changes.  Once I'm ready to actually do the
release, I'll send out an email requesting no further CVS commits.

Those of you that have committed changes, please check the NEWS file
and DEPRECATED file are up to date - thanks.

After the release of Biopython 1.50 beta, we'll reopen CVS again for
small changes and documentation.  While the beta is being tested by
our user base, I'd like us to push to finish any missing documentation
- in particular for new modules Bio.Motif (Bartek) and
Bio.Graphics.GenomeDiagram (me and/or Leighton), plus the new
SeqRecord slicing and UnknownSeq class (me).

Depending on the feedback from the beta, I'd hope we can do the final
release of Biopython 1.50 well before the end of April, and then
reopen CVS for new code.

That would also be a good point to evaluate moving from CVS to git.
In the meantime, while CVS is (semi) frozen you can all try using
github for keeping your pending submissions under version control ;)

Peter

From mjldehoon at yahoo.com  Sun Mar  1 12:17:28 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 1 Mar 2009 04:17:28 -0800 (PST)
Subject: [Biopython-dev] ScanProsite
Message-ID: <704108.77040.qm@web62402.mail.re1.yahoo.com>

ScanProsite is a web tool to scan protein sequences against the PROSITE database (see http://www.expasy.org/tools/scanprosite/). Biopython contains code in Bio.Prosite to interact with ScanProsite. However, this code needs to be updated, as it does not work with the current ScanProsite web pages: Neither accessing ScanProsite nor extracting the hits from the HTML page works.

This problem is relatively easy to solve, since ExPASy nowadays allows programmatic access to ScanProsite (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This returns the Prosite hits in XML format, which can be parsed easily in Python.

The only issue now is how this should be presented to the user. The current (broken) way to access Prosite looks like this:

>>> from Bio import ExPASy
>>> handle = ExPASy.scanprosite1(seq=mysequence)
to get a handle to the raw HTML output, and

>>> from Bio import Prosite
>>> hits = Prosite.scan_sequence_expasy(seq=mysequence)
which returns the hits as a Python list.

One possibility is to have a ScanProsite module under Bio.Prosite or Bio.ExPASy for interaction with ScanProsite. Something like this:
>>> from Bio.ExPASy import ScanProsite
>>> handle = ScanProsite.search(seq=mysequence)
>>> hits = ScanProsite.read(handle)

Another option is to have a scan function in the Bio.Prosite module that accesses the ScanProsite web tool and parses the results:
>>> from Bio import Prosite
>>> hits = Prosite.scan(seq=mysequence)
This is more straightforward, but on the other hand people may want to save the XML search results in an XML file, and for that purpose we'd need a function that does the parsing only.

Any opinions?

--Michiel


From bugzilla-daemon at portal.open-bio.org  Sun Mar  1 17:00:36 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Mar 2009 12:00:36 -0500
Subject: [Biopython-dev] [Bug 2495] parse element symbols for ATOM/HETATM
	records (Bio.PDB.PDBParser)
In-Reply-To: <bug-2495-42@http.bugzilla.open-bio.org/>
Message-ID: <200903011700.n21H0alo006588@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2495


------- Comment #1 from barry_finzel at yahoo.com  2009-03-01 12:00 EST -------
IO.save should also write these element types on an output PDB file


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Mar  1 17:06:54 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Mar 2009 12:06:54 -0500
Subject: [Biopython-dev] [Bug 2292] Bio.PDBIO writes TER records without any
	required fields
In-Reply-To: <bug-2292-42@http.bugzilla.open-bio.org/>
Message-ID: <200903011706.n21H6sJp007165@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2292


barry_finzel at yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |barry_finzel at yahoo.com


------- Comment #2 from barry_finzel at yahoo.com  2009-03-01 12:06 EST -------
IO.save is also writing TER cards at the end of chains, rather than at the end
of polypeptide chains.
TER cards should never follow HETATM  atom records.  


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Mar  1 17:22:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Mar 2009 12:22:28 -0500
Subject: [Biopython-dev] [Bug 2774] New: Bio.PDBIO.save doesn't write the
	required END record
Message-ID: <bug-2774-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2774

           Summary: Bio.PDBIO.save doesn't write the required END record
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: barry_finzel at yahoo.com


According to the PDB format specification
(http://www.wwpdb.org/documentation/format32/sect1.html)
All PDB files must be terminated with a record containing just "END\n".

Easy to fix in PDBIO.save()


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Mon Mar  2 10:26:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Mar 2009 10:26:38 +0000
Subject: [Biopython-dev] ScanProsite
In-Reply-To: <704108.77040.qm@web62402.mail.re1.yahoo.com>
References: <704108.77040.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com>

On Sun, Mar 1, 2009 at 12:17 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> ScanProsite is a web tool to scan protein sequences against the PROSITE
> database (see http://www.expasy.org/tools/scanprosite/). Biopython contains
> code in Bio.Prosite to interact with ScanProsite. However, this code needs to
> be updated, as it does not work with the current ScanProsite web pages:
> Neither accessing ScanProsite nor extracting the hits from the HTML page works.
>
> This problem is relatively easy to solve, since ExPASy nowadays allows
> programmatic access to ScanProsite
> (see http://www.expasy.org/tools/scanprosite/ScanPrositeREST.html). This
> returns the Prosite hits in XML format, which can be parsed easily in Python.
>
> The only issue now is how this should be presented to the user. ...
> ...
> This is more straightforward, but on the other hand people may want to save the
> XML search results in an XML file, and for that purpose we'd need a function that
> does the parsing only.
>
> Any opinions?

I would definitely have two functions, one returning a handle to the
XML, and one for parsing XML from a handle.  This would be more
consistent with Bio.Entrez and other parsers, and more flexible.  For
example, the user can opt to save the XML to disk, and they can also
use our parser on files or the remote site - plus of course they can
use any other XML parser they may prefer.
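
A sketch of the parsing half of that split, under loud assumptions: the XML
below is a made-up stand-in (the element names matchset, match, signature_ac,
start and stop are guesses, and the real ScanProsite layout may differ), and
read() is only a possible shape for the function. The search half is left out
because it needs network access; the point is that read() works on any
handle, whether it comes from the remote site or from a saved file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical reply, standing in for whatever the REST service returns.
EXAMPLE_XML = """<matchset>
  <match><signature_ac>PS00001</signature_ac><start>10</start><stop>13</stop></match>
  <match><signature_ac>PS00005</signature_ac><start>42</start><stop>44</stop></match>
</matchset>"""


def read(handle):
    """Parse hits from a file-like handle into a list of dictionaries."""
    hits = []
    for match in ET.parse(handle).getroot().iter("match"):
        hits.append({
            "signature_ac": match.findtext("signature_ac"),
            "start": int(match.findtext("start")),
            "stop": int(match.findtext("stop")),
        })
    return hits


# A StringIO object plays the role of the handle a search function would return.
handle = io.StringIO(EXAMPLE_XML)
hits = read(handle)
print(hits)
```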

I like your suggestion to have a REST XML based module under
Bio.ExPASy, which means we can deprecate the HTML based Bio.Prosite
module and in the process make the top level list of modules in
Biopython a bit shorter.  In the long term I think that will help
people find functionality.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Mar  2 15:22:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Mar 2009 10:22:53 -0500
Subject: [Biopython-dev] [Bug 2776] New: Bio.pairwise2 returns non-optimal
	alignment in at least some cases
Message-ID: <bug-2776-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2776

           Summary: Bio.pairwise2 returns non-optimal alignment in at least
                    some cases
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


At least in some cases, Bio.pairwise2 returns an alignment that is not the one
with the highest score for the input parameters. This occurs in localXX and
globalXX.

Yet, I only encountered the problem with large mismatch values (which I use as
I need mismatch free alignments).

simple example (the bug also occurred for longer sequences):
>>> from Bio import pairwise2
>>> sequence1 = 'GKG'
>>> sequence2 = 'GWG'
>>> A = pairwise2.align.globalms(sequence1, sequence2, 5, -100, -5, -5)[0]
>>> A[0]
'GKG--'
>>> A[1]
'--GWG'
>>> A[2]
-15.0

whereas
'GK-G'
'G-WG'

would get a score of 0
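
The two scores can be checked by hand with a small scoring function. This is
a sketch, not pairwise2 code: score_alignment is a made-up helper, and it
assumes the convention that the first position of a gap costs the open
penalty and each further position costs the extend penalty, which is
consistent with the -15.0 reported above.

```python
def score_alignment(a, b, match, mismatch, gap_open, gap_extend):
    """Score two equal-length gapped strings column by column."""
    score = 0
    in_gap_a = in_gap_b = False  # are we inside a run of '-' in a or in b?
    for x, y in zip(a, b):
        if x == "-":
            score += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += match if x == y else mismatch
            in_gap_a = in_gap_b = False
    return score


reported = score_alignment("GKG--", "--GWG", 5, -100, -5, -5)
better = score_alignment("GK-G", "G-WG", 5, -100, -5, -5)
print(reported, better)
```

With these parameters the reported alignment has one match and two gaps of
length two (5 - 10 - 10), while the alternative has two matches and two gaps
of length one (5 - 5 - 5 + 5).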


System: Kubuntu 8.10 64Bit, Python 2.6.1, Biopython 1.49 (my pairwise2.py is
identical to the current CVS version of it)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 12:41:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 07:41:33 -0500
Subject: [Biopython-dev] [Bug 2777] New: [Solution is one line change!]
	Entity sorting altered by detach_child() calls
Message-ID: <bug-2777-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777

           Summary: [Solution is one line change!] Entity sorting altered by
                    detach_child() calls
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: trivial
          Priority: P1
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


detach_child(self, id) in Bio.PDB.Entity changes the order of self.child_list.

This bug is caused by line 71, where self.child_list is set to
self.child_dict.values() which are values of an unordered(!) dict:
self.child_list=self.child_dict.values()

Solution: Replace line 71 by:
self.child_list.remove(child)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 12:48:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 07:48:19 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041248.n24CmJSZ008104@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-04 07:48 EST -------
Have you got a short example to demonstrate the original problem?

It would be useful to evaluate your change, and could be made into a unit test
too.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 13:58:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 08:58:41 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041358.n24Dwfjk015027@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #2 from klaus.kopec at tuebingen.mpg.de  2009-03-04 08:58 EST -------
Created an attachment (id=1253)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1253&action=view)
example PDB file that can be used to see the bug

## Python Code to see the bug:
import os
from Bio.PDB.PDBParser import PDBParser
p=PDBParser(PERMISSIVE=1)
filename=os.path.expanduser("entity_detach_order_bug_example.pdb")
s=p.get_structure('Entity.py bug example: detach changes order', filename)

print 'order before detach:'
for r in s[0]['A'].child_list:
    print r.id

detach_me = s[0]['A'].child_list[-1] ## this is independent of the chosen entry in the list
s[0]['A'].detach_child(detach_me.id)

print 'order after detach:'
for r in s[0]['A'].child_list:
    print r.id


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 14:18:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:18:28 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041418.n24EISvd016743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #3 from klaus.kopec at tuebingen.mpg.de  2009-03-04 09:18 EST -------
the output of the code in my Comment #2 is:
order before detach:
('H_PCA', 1, ' ')
(' ', 2, ' ')
(' ', 3, ' ')
(' ', 4, ' ')
order after detach:
(' ', 2, ' ')
(' ', 3, ' ')
('H_PCA', 1, ' ')

I forgot to mention, that the line "self.child_list.sort(self._sort)" needs to
be commented out as well for the fix to work (as hetatms are otherwise sorted
to the end).

hmmm... it just came to me, that this probably breaks the Parser for some other
PDB files, where residues are unsorted.

These changes do not break any existing unit tests for the PDB module, so maybe
it's still a step in the right direction.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 14:37:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 09:37:34 -0500
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041437.n24EbYhj018545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-04 09:37 EST -------
Created an attachment (id=1254)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1254&action=view)
Patch for Bio/PDB/Entity.py based on Klaus Kopec's suggestion

I've attached a patch which makes the suggested change.  I'm hoping to get
Thomas (the original author) to comment but otherwise I see no reason not to
commit this fix soon.

The old code did this:

    def detach_child(self, id):
        "Remove a child."
        child=self.child_dict[id] 
        child.detach_parent()
        del self.child_dict[id]
        self.child_list=self.child_dict.values()
        self.child_list.sort(self._sort)

It used a sort which should have preserved the order - but that only works if
the child_list is always kept sorted.  Looking at the add method, this isn't
true:

    def add(self, entity):
        "Add a child to the Entity."
        entity_id=entity.get_id()
        if self.has_id(entity_id):
            raise PDBConstructionException( \
                "%s defined twice" % str(entity_id))
        entity.set_parent(self)
        self.child_list.append(entity)
        #self.child_list.sort(self._sort)
        self.child_dict[entity_id]=entity

Interestingly the sort was commented out in the original version first
committed to Biopython's CVS, so this change predates the integration into
Biopython.

I haven't checked to see if there are any other ways the child_list could
become unsorted - that doesn't really matter.
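
For reference, the suggested behaviour can be shown with a stripped-down
stand-in for the class. MiniEntity below is invented for this illustration
and is not Bio.PDB.Entity; it only keeps the list and dict bookkeeping, and
removes the child from the list directly instead of rebuilding the list from
the dict values.

```python
class MiniEntity(object):
    """Toy container with the same list/dict bookkeeping (illustration only)."""

    def __init__(self):
        self.child_list = []
        self.child_dict = {}

    def add(self, child_id, child):
        if child_id in self.child_dict:
            raise ValueError("%s defined twice" % str(child_id))
        self.child_list.append(child)
        self.child_dict[child_id] = child

    def detach_child(self, child_id):
        # Remove the child from both containers; the remaining list order
        # is left exactly as it was.
        child = self.child_dict.pop(child_id)
        self.child_list.remove(child)


entity = MiniEntity()
for child_id in [("H_PCA", 1, " "), (" ", 2, " "), (" ", 3, " "), (" ", 4, " ")]:
    entity.add(child_id, child_id)  # the id doubles as the child here
entity.detach_child((" ", 4, " "))
print(entity.child_list)
```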


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 16:17:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 11:17:31 -0500
Subject: [Biopython-dev] [Bug 2774] Bio.PDBIO.save doesn't write the
	required END record
In-Reply-To: <bug-2774-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041617.n24GHVd1029752@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2774


thamelry at binf.ku.dk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from thamelry at binf.ku.dk  2009-03-04 11:17 EST -------

save method now has option 'write_end':

io.save(fp, write_end=1)

if 1, END is written. The reason this is not done by default is that one
sometimes calls 'save' multiple times, for example when concatenating files. So
always writing END is not a good approach.
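
The concatenation case looks roughly like this. The save function below is a
hypothetical stand-in written for this example, not the PDBIO method, and the
record strings are shortened placeholders; only the last call asks for the
END record.

```python
import io


def save(handle, records, write_end=False):
    """Write records to handle, adding END only when asked (illustration)."""
    for record in records:
        handle.write(record + "\n")
    if write_end:
        handle.write("END\n")


out = io.StringIO()
save(out, ["ATOM      1  N   ALA A   1"])                  # first part, no END
save(out, ["ATOM      2  CA  ALA A   1"], write_end=True)  # last part
print(out.getvalue())
```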


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 19:10:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:10:37 -0500
Subject: [Biopython-dev] [Bug 2778] New: Efficiency improvement in function
	Bio.SeqUtils.GC()
Message-ID: <bug-2778-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778

           Summary: Efficiency improvement in function Bio.SeqUtils.GC()
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: wscott at chem.ubc.ca


Bio.SeqUtils.GC recalculates the gc variable in a loop using a dictionary
whereas it could simply be calculated after the loop.
The following code is suggested to replace the function:

def ScoGC(seq):
   """ calculates G+C content """
   gc=sum(map(seq.count,['G','C','g','c','S','s']))
   return gc*100.0/len(seq)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 19:12:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 14:12:27 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903041912.n24JCR2U014353@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


wscott at chem.ubc.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wscott at chem.ubc.ca


------- Comment #1 from wscott at chem.ubc.ca  2009-03-04 14:12 EST -------
of course, rename ScoGC to GC...


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Mar  4 22:03:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 4 Mar 2009 17:03:59 -0500
Subject: [Biopython-dev] [Bug 2779] New: Seq.count() docstring should note
	unexpected behaviour
Message-ID: <bug-2779-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779

           Summary: Seq.count() docstring should note unexpected behaviour
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: baoilleach at gmail.com


The Seq.count() method has the following docs:
"Count method, like that of a python string."

This is a cop-out as it does not tell the user anything. In particular, it does
not lead the user to expect that Seq("GGG").count("GG")==1. This might make
sense for Python strings, but it's incorrect for sequences.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 09:19:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:19:40 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050919.n259Je8d016299@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


ruzzo at cs.washington.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ruzzo at cs.washington.edu


------- Comment #8 from ruzzo at cs.washington.edu  2009-03-05 04:19 EST -------
I'm new to biopython, so I may be doing something else wrong, but in attempting
to efetch a pubmed record tonight I see similar errors which seem to be fixed
by downloading & installing several (new) DTD's:

nlmmedline_090101.dtd
nlmmedlinecitation_090101.dtd
nlmsharedcatcit_090101.dtd
nlmcommon_090101.dtd
and possibly 
pubmed_090101.dtd


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 09:23:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:23:31 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050923.n259NV4S016627@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #1 from lpritc at scri.sari.ac.uk  2009-03-05 04:23 EST -------
I think that's a good point about expected behaviour for count() in a
biological sequence.  Presumably, we all expect that Seq('GGG').count('GG')
should find all overlapping matches, and return the value 2, in order to make
intuitive 'biological' sense.  There are, after all, two 'GG's in that
sequence.  This doesn't correspond to string count()ing behaviour, or to
standard re module behaviour.

The obvious way round it, that I've used before, is to compile the search
string as a regular expression, and iterate regular expression matches from one
symbol after the start of the preceding match (if any):

>>> import re
>>> startpos = 0
>>> seq = 'GGGG'
>>> motif = 'GG'
>>> motif_re = re.compile(motif)
>>> matches = []
>>> while True:
...     m = motif_re.search(seq, startpos)
...     if m is None:
...             break
...     startpos = m.start() + 1
...     matches.append(m)
... 
>>> matches
[<_sre.SRE_Match object at 0x68f38>, <_sre.SRE_Match object at 0x96ac60>,
<_sre.SRE_Match object at 0x96a950>]
>>> [(m.start(), m.group()) for m in matches]
[(0, 'GG'), (1, 'GG'), (2, 'GG')]

This could probably be done more efficiently.  Is something like this already
implemented in Bio.Motif?
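
As a rough illustration of a more compact variant, the same restart-one-symbol-
later idea can be sketched with str.find instead of the re module.  The name
count_overlapping is hypothetical (it is not an existing Biopython function),
and the sketch only handles plain strings:

```python
def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps.

    Hypothetical helper for illustration; works on plain strings.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        # Restart one symbol after the start of the previous match.
        pos = seq.find(motif, pos + 1)
    return count


print(count_overlapping("GGGG", "GG"))  # 3 overlapping matches
print("GGGG".count("GG"))               # 2 non-overlapping matches
```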




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 09:24:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:24:43 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050924.n259OhYw016750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #2 from lpritc at scri.sari.ac.uk  2009-03-05 04:24 EST -------
D'oh!  There isn't a Bio.Motif.  My bad.




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 09:43:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 04:43:09 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903050943.n259h9XG018545@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #3 from baoilleach at gmail.com  2009-03-05 04:43 EST -------
Thanks for the workaround but could you replace the current count by that code?

Can you imagine any existing code that would break because of correction of
buggy behaviour?




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:16:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:16:52 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051016.n25AGqSW021680@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #4 from lpritc at scri.sari.ac.uk  2009-03-05 05:16 EST -------
Created an attachment (id=1255)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1255&action=view)
Patch to Seq.py that modifies count behaviour for Seq and MutableSeq objects to
return correct counts for substrings of length > 1

(In reply to comment #3)
> Thanks for the workaround but could you replace the current count by that code?

I don't have access to CVS ;)

It would be nice to get consensus that the behaviour that this code would
produce is the desired behaviour for everyone, that we've got an acceptable way
of implementing it, and that it doesn't break anything downstream.  There's
bound to be, at best, a lag time.

I've attached a proposed patch based on the above code, though it's not
necessarily the best way to solve this problem.

> Can you imagine any existing code that would break because of correction of
> buggy behaviour?

That should come out in the testing.

And it turns out that there is a Bio.Motif, but it's in CVS. D'oh! again...




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:22:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:22:40 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051022.n25AMeIt022121@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 05:22 EST -------
Prior to Biopython 1.45, the count method only worked with single letter search
strings.  I changed this just over a year ago for Biopython 1.45 as Bug 2386,
but unfortunately at the time none of us considered this
overlapping/non-overlapping behaviour.  With hindsight we should have had this
debate then.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py.diff?r1=1.19&r2=1.20&cvsroot=biopython

We should either:

(a) stick with the python string compatible behaviour (which has been a general
principle for the Seq class), but document this issue more clearly as a
non-overlapping search does run counter to biological usage.

or,

(b) change the behaviour as Leighton suggests to do an overlapping search. 
This could break any code relying on the old python string-like behaviour.

I agree we need to have a discussion of this over on the main mailing list, as
making the change could break people's code.
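
To make the difference between options (a) and (b) concrete, here is a small
sketch on a plain Python string.  It uses a zero-width lookahead from the re
module for the overlapping case; that is just one possible technique and not
necessarily what the attached patch does:

```python
import re

seq = "GGG"
motif = "GG"

# (a) Python string behaviour: non-overlapping matches only.
non_overlapping = seq.count(motif)

# (b) Overlapping search: a zero-width lookahead matches at every
# position where the motif starts, without consuming any characters.
overlapping = len(re.findall("(?=%s)" % re.escape(motif), seq))

print("non-overlapping: %d, overlapping: %d" % (non_overlapping, overlapping))
```

Any code that relies on count() agreeing with str.count() would see the first
number change to the second under option (b).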




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:42:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:42:27 -0500
Subject: [Biopython-dev] [Bug 2780] New: PDB file HETATMs cannot be
	alternative location of a residue that is an ATOM
Message-ID: <bug-2780-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2780

           Summary: PDB file HETATMs cannot be alternative location of a
                    residue that is an ATOM
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


In PDB files where HETATMs and ATOMs are altlocs of each other (e.g. 1RR2,
residue 184), they are treated as two separate residues.

An obvious solution is to add an "else" case to the "if" in StructureBuilder.py
line 115 (method init_residue(...)) that introduces some kind of mixed (HETATM
as well as ATOM) DisorderedResidue.

The main problem with that: the hetero field of the residue ids will differ
between the residues, therefore the whole access-over-ids mechanism will most
likely not work with these MixedDisorderedResidues as straightforwardly as it
does so far.

Sadly, I could not come up with a good solution for this. Maybe some
__getattr__ magic that alters the way Chains access their residues might work
by allowing access to residues by only using the second and third component of
the id 3-tuple?!
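
A minimal sketch of that lookup idea, using plain tuples in the
(hetero flag, sequence number, insertion code) layout rather than real Bio.PDB
objects.  The function name find_by_position and the example ids (including the
"H_XXX" hetero flag) are made up for illustration:

```python
def find_by_position(residue_ids, resseq, icode=" "):
    """Return all ids matching the sequence number and insertion code.

    The hetero flag (first tuple element) is deliberately ignored.
    """
    return [rid for rid in residue_ids if rid[1:] == (resseq, icode)]


# Made-up ids in the (hetero flag, sequence number, insertion code) layout.
ids = [(" ", 183, " "), (" ", 184, " "), ("H_XXX", 184, " "), (" ", 185, " ")]
print(find_by_position(ids, 184))
```

With such a lookup, the ATOM and HETATM variants of position 184 are found
together, which is what a mixed DisorderedResidue would need.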




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:44:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:44:12 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051044.n25AiCH9023924@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 05:44 EST -------
(In reply to comment #8)
> I'm new to biopython, so I may be doing something else wrong, but in
> attempting to efetch a pubmed record tonight I see similar errors which
> seem to be fixed by downloading & installing several (new) DTD's:
> 
> nlmmedline_090101.dtd
> nlmmedlinecitation_090101.dtd
> nlmsharedcatcit_090101.dtd
> nlmcommon_090101.dtd
> and possibly 
> pubmed_090101.dtd
> 

Those have been added to CVS, and will be installed with Biopython 1.50 -
perhaps we should hurry up our release plans.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Entrez/DTDs/?cvsroot=biopython




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:46:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:46:09 -0500
Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative
	location of a residue that is an ATOM
In-Reply-To: <bug-2780-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051046.n25Ak9DH024105@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2780


------- Comment #1 from klaus.kopec at tuebingen.mpg.de  2009-03-05 05:46 EST -------
Created an attachment (id=1256)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1256&action=view)
PDB file slice with 2 residues, that can be used to see the bug.

slice of PDB file 1RR2 (example mentioned in my bug submission) showing two
altloc residues where one is a HETATM and the other an ATOM. They are treated
as two residues in Biopython.




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 10:56:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 05:56:39 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051056.n25AudjU024927@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 05:56 EST -------
I've checked that in, but with the existing code to catch a zero length
sequence and return 0 instead of raising a ZeroDivisionError.

def GC(seq):
    """Calculates G+C content, ..."""
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    if gc == 0: return 0
    return gc*100.0/len(seq)


The old code had been modified several times - it originally calculated the GC%
as the CG count divided by the ATCG count, thus it had to count all the bases. 
You are right, this is much cleaner.

Thanks.




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 11:18:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 06:18:33 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051118.n25BIXdp026743@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #6 from baoilleach at gmail.com  2009-03-05 06:18 EST -------
Sorry - could you clarify which mailing list you mean by the "main mailing
list", the dev list or the discuss list?




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 12:27:49 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 07:27:49 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051227.n25CRnmA001571@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 07:27 EST -------
(In reply to comment #6)
> Sorry - could you clarify which mailing list you mean by the "main mailing
> list", the dev list or the discuss list?

I was thinking the main discussion list, and we should focus on the desired
behaviour rather than how we might implement it.  See:

http://lists.open-bio.org/pipermail/biopython/2009-March/004960.html

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 12:31:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 07:31:50 -0500
Subject: [Biopython-dev] [Bug 2781] New: Bio.PDB Structure instances cannot
	be deepcopied
Message-ID: <bug-2781-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2781

           Summary: Bio.PDB Structure instances cannot be deepcopied
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: klaus.kopec at tuebingen.mpg.de


For some reason, copy.deepcopy() of a Structure instance results in:

Exception RuntimeError: 'maximum recursion depth exceeded while calling a
Python object' in <type 'exceptions.AttributeError'> ignored

for most PDB files I tried.

Maybe implementing some __deepcopy__ methods might help, but I am unsure, as I
have not investigated this bug in depth.
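
For illustration only, this is roughly what a __deepcopy__ method could look
like on a toy parent/child hierarchy.  The Node class is hypothetical and is
not Bio.PDB code; the sketch does not reproduce or diagnose the error above, it
only shows the general pattern of copying children and re-linking parents:

```python
import copy


class Node(object):
    """Toy entity with a parent link and a list of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)

    def __deepcopy__(self, memo):
        # Build the clone by hand so no attribute lookup magic is involved.
        clone = Node(self.name)
        memo[id(self)] = clone
        for child in self.children:
            # add() re-links each copied child to the clone, not the original.
            clone.add(copy.deepcopy(child, memo))
        # Note: copying a child on its own leaves clone.parent as None.
        return clone


root = Node("structure")
root.add(Node("model"))
duplicate = copy.deepcopy(root)
print(duplicate.children[0].parent is duplicate)
```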

My system: Kubuntu 8.10 64-Bit, Python 2.6.1




From biopython at maubp.freeserve.co.uk  Thu Mar  5 12:40:16 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Mar 2009 12:40:16 +0000
Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utilities
	requirements updated
In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov>
References: <AcmN/nulEie6nWfHT9+4rg4/ff6DGwADuzjwAozlJvA=>
	<7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov>
Message-ID: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>

This email was sent out a few weeks ago, but it took a while before
the NCBI webpage was actually updated (maybe a caching issue) so I
didn't rush to relax our rules immediately.

Under the new rules we must make no more than three requests every
second.  We could track the times of the last two requests in order to
enforce this as worded, but I think it would be simpler just to switch
from using a minimum 3 second pause between Bio.Entrez requests to
just a minimum 0.33334 second pause.  This is a much simpler code
change and will comply with the new relaxed rules.
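
A minimal sketch of the simpler approach, assuming a single process making
requests one after another.  RequestThrottle is a hypothetical helper name and
the actual Bio.Entrez code may be organised differently; the clock and sleep
arguments exist only so the logic can be exercised without real waiting:

```python
import time


class RequestThrottle(object):
    """Enforce a minimum delay between consecutive requests.

    Hypothetical helper; Bio.Entrez's internals may differ.
    """

    def __init__(self, min_delay=0.33334, clock=time.time, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect min_delay, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = RequestThrottle()
throttle.wait()  # call this immediately before each request
```

Spacing requests at least 0.33334 seconds apart means no more than three can
start within any one second, without having to track the last two request
times.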

Unless anyone has a counter suggestion, I will update Bio.Entrez and
the tutorial shortly.

Peter
---------- Forwarded message ----------
From:  <utilities-announce at ncbi.nlm.nih.gov>
Date: Thu, Feb 26, 2009 at 6:55 PM
Subject: [Utilities-announce] NCBI E-Utilities requirements updated
To: utilities-announce at ncbi.nlm.nih.gov


NCBI E-Utilities users,

E-Utilities system use requirements have been modified from no more
than 1 request every 3 seconds to no more than 3 requests every
second.

The online documentation has been updated to reflect this change:

http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html


Thank you.

NCBI/NLM/NIH

_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce


From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 12:58:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 07:58:40 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051258.n25Cwe9p004288@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


------- Comment #8 from barwil at gmail.com  2009-03-05 07:58 EST -------
(In reply to comment #4)

> This could probably be done more efficiently.  Is something like this already
> implemented in Bio.Motif
> 

In Bio.Motif you can do:

import Bio.Motif
from Bio.Seq import Seq

m=Bio.Motif.Motif()
m.add_instance(Seq("GG",m.alphabet))
for i in m.search_instances(Seq("GGGG",m.alphabet)):
  print i

This should give you overlapping hits.

Bio.Motif is in CVS, but the same implementation is in Bio.AlignAce.Motif
(now obsoleted).




From biopython at maubp.freeserve.co.uk  Thu Mar  5 12:58:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Mar 2009 12:58:40 +0000
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
	<63700.34226.qm@web62405.mail.re1.yahoo.com>
	<320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
	<320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
	<320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
Message-ID: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>

On Thu, Feb 19, 2009 at 10:25 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Since this thread last year, there have been no objections.  Following
> a recent question on the main mailing list about how to determine the
> version of Biopython this seems worth doing before the next release.
> Again, any objections or comments on the implementation details?
> Otherwise I'll make this change shortly.
>

Changes made in CVS, and updated the release instructions:
http://biopython.org/wiki/Building_a_release

In between releases, should we leave the __version__ as is, or
explicitly update it to be something like "1.49+" just after releasing
1.49?  This only affects people installing Biopython from CVS, so they
should be technically inclined...

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 14:47:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 09:47:30 -0500
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
	element access and slicing
In-Reply-To: <bug-2507-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051447.n25ElU37014276@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2507


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 09:47 EST -------
We seem to have reached agreement on the mailing list, so checking this patch
in, and marking this issue as fixed.

Note we may want to review the choice of name for the new
per-letter-annotations attribute (as long as this happens before the Biopython
1.50 release), currently this is letter_annotations as per a brief discussion
on the mailing list.




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 14:47:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 09:47:43 -0500
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic
	alignment, e.g. align[1:2, 5:-5]
In-Reply-To: <bug-2551-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051447.n25ElhAb014302@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2551


Bug 2551 depends on bug 2507, which changed state.

Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 14:47:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 09:47:44 -0500
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051447.n25EliM6014314@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


Bug 2767 depends on bug 2507, which changed state.

Bug 2507 Summary: Adding __getitem__ to SeqRecord for element access and slicing
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 15:31:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 10:31:17 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051531.n25FVHOq018242@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


------- Comment #3 from bsouthey at gmail.com  2009-03-05 10:31 EST -------
(In reply to comment #2)
> I've checked that in, but with the existing code to catch a zero length
> sequence and return 0 instead of raising a ZeroDivisionError.
> 
> def GC(seq):
>     """Calculates G+C content, ..."""
>     gc=sum(map(seq.count,['G','C','g','c','S','s']))
>     if gc == 0: return 0
>     return gc*100.0/len(seq)
> 

I think that it is clearer to check that the sequence length is not zero rather
than assuming that if the sum is zero then the sequence length is also zero. 

def GC(seq):
    """Calculates G+C content, ..."""
    gc=sum(map(seq.count,['G','C','g','c','S','s']))
    if len(seq) > 0:
        return gc*100.0/len(seq)
    else:
        return 0




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 15:51:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 10:51:20 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051551.n25FpKGf020282@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


------- Comment #4 from lpritc at scri.sari.ac.uk  2009-03-05 10:51 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > I've checked that in, but with the existing code to catch a zero length
> > sequence and return 0 instead of raising a ZeroDivisionError.
> > 
> > def GC(seq):
> >     """Calculates G+C content, ..."""
> >     gc=sum(map(seq.count,['G','C','g','c','S','s']))
> >     if gc == 0: return 0
> >     return gc*100.0/len(seq)
> > 
> 
> I think that it is clearer to check that the sequence length is not zero rather
> than assuming that if the sum is zero then the sequence length is also zero. 
> 
> def GC(seq):
>     """Calculates G+C content, ..."""
>     gc=sum(map(seq.count,['G','C','g','c','S','s']))
>     if len(seq) > 0:
>         return gc*100.0/len(seq)
>     else:
>         return 0

It would probably be clearest, quickest and most efficient to comment that
particular line of the code to point out that it does elegant double-duty as a
check for zero sequence length ;)




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 15:56:38 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 10:56:38 -0500
Subject: [Biopython-dev] [Bug 2778] Efficiency improvement in function
	Bio.SeqUtils.GC()
In-Reply-To: <bug-2778-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051556.n25Fuc13020807@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2778


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 10:56 EST -------
(In reply to comment #3)
> I think that it is clearer to check that the sequence length is
> not zero rather than assuming that if the sum is zero then the
> sequence length is also zero. 

I agree, but had chosen to keep the old code.

> def GC(seq):
>     """Calculates G+C content, ..."""
>     gc=sum(map(seq.count,['G','C','g','c','S','s']))
>     if len(seq) > 0:
>         return gc*100.0/len(seq)
>     else:
>         return 0
> 

Your length test isn't very elegant, this is much nicer/more pythonic I think:

    if seq :
        gc = sum(map(seq.count,['G','C','g','c','S','s']))
        return gc*100.0/len(seq)
    else :
        return 0

However, given most of the time the sequence will not be empty, this should be
faster:

    try :
        gc = sum(map(seq.count,['G','C','g','c','S','s']))
        return gc*100.0/len(seq)
    except ZeroDivisionError :
        return 0

CVS updated.
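
For reference, a self-contained sketch of the try/except variant that can be
run on plain strings.  The function is named gc_percent here (a made-up name)
to avoid suggesting this is the exact code in Bio.SeqUtils:

```python
def gc_percent(seq):
    """Return G+C content as a percentage; 0 for an empty sequence."""
    try:
        # Count G, C and the ambiguity code S (G or C), in both cases.
        gc = sum(map(seq.count, ['G', 'C', 'g', 'c', 'S', 's']))
        return gc * 100.0 / len(seq)
    except ZeroDivisionError:
        # Only an empty sequence gets here, since len(seq) is zero.
        return 0


print(gc_percent("ATGC"))  # 50.0
print(gc_percent("ATAT"))  # 0.0
print(gc_percent(""))      # 0
```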




From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 16:04:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 11:04:07 -0500
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic
	alignment, e.g. align[1:2, 5:-5]
In-Reply-To: <bug-2551-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051604.n25G471v021470@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2551


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 11:04 EST -------
Created an attachment (id=1257)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1257&action=view)
Patch for Bio/Align/Generic.py to support array like access

This requires the patch to the SeqRecord __getitem__ method just committed to
CVS for Bug 2507.  This includes an extended doctest which tries to illustrate
the typical usage I expect.
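
As a rough idea of what array-like access means here, the following toy class
handles align[rows, columns] on a list of plain strings.  TinyAlignment is
hypothetical, and the return types are assumptions for illustration; they are
not taken from the attached patch:

```python
class TinyAlignment(object):
    """Toy alignment: equal-length strings with 2D indexing."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, tuple):
            if len(index) != 2:
                raise TypeError("expected align[rows, columns]")
            row_index, col_index = index
            if isinstance(row_index, int):
                # One row: a single letter or a sub-string of that row.
                return self.rows[row_index][col_index]
            selected = self.rows[row_index]
            if isinstance(col_index, int):
                # One column across several rows, returned as a string.
                return "".join(row[col_index] for row in selected)
            # Row slice and column slice: a smaller alignment.
            return TinyAlignment(row[col_index] for row in selected)
        if isinstance(index, int):
            return self.rows[index]
        return TinyAlignment(self.rows[index])


align = TinyAlignment(["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"])
print(align[1:2, 2:-2].rows)
print(align[:, 3])
```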




From bsouthey at gmail.com  Thu Mar  5 16:59:08 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 5 Mar 2009 10:59:08 -0600
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
	<63700.34226.qm@web62405.mail.re1.yahoo.com>
	<320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
	<320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
	<320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
	<320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
Message-ID: <bbcd77d00903050859o52da154en6df06b51fb9ef1d@mail.gmail.com>

On Thu, Mar 5, 2009 at 6:58 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Feb 19, 2009 at 10:25 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>
>> Since this thread last year, there have been no objections.  Following
>> a recent question on the main mailing list about how to determine the
>> version of Biopython this seems worth doing before the next release.
>> Again, any objections or comments on the implementation details?
>> Otherwise I'll make this change shortly.
>>
>
> Changes made in CVS, and updated the release instructions:
> http://biopython.org/wiki/Building_a_release
>
> In between releases, should we leave the __version__ as is, or
> explicitly update it to be something like "1.49+" just after releasing
> 1.49?  This only affects people installing Biopython from CVS, so they
> should be technically inclined...
>
> Peter
>


I agree that it would be helpful to distinguish between an official
release and a build from the CVS. Furthermore, it would then be
important to know when the build from CVS was done at least relative
to the official releases.

So I think you are tending towards a numbering scheme like:
1.49 is an official release
1.49+ (or similar) is CVS after the 1.49 official release but before
the next official release 1.50.
1.50 will be an official release
1.50+ (or similar) is the CVS after the 1.50 official release but
before the next official release whatever number it will be.

If so the release instructions should also include an instruction to
change the CVS numbering in the version in __init__.py files after
release has been made.
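
To show how a script might use such a scheme, here is a small sketch.  It
assumes the "+" suffix convention described above is adopted, which has not
been decided; the helper names is_cvs_build and last_release are made up:

```python
def is_cvs_build(version):
    """Return True for a version string with the proposed "+" suffix."""
    return version.endswith("+")


def last_release(version):
    """Strip the "+" suffix to recover the most recent official release."""
    return version.rstrip("+")


for version in ["1.49", "1.49+", "1.50"]:
    kind = "CVS after" if is_cvs_build(version) else "official release"
    print("%s: %s %s" % (version, kind, last_release(version)))
```

In real use the string would come from the package's __version__ attribute
rather than a hard-coded list.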

Also, after looking at the release instructions shouldn't BioSQL and
Doc also have version-related information?
Ideally the Biopython BioSQL code should have some connection to the
main version of BioSQL - I don't use it so it is not an issue for me
(yet).

Bruce


From biopython at maubp.freeserve.co.uk  Thu Mar  5 17:50:04 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 5 Mar 2009 17:50:04 +0000
Subject: [Biopython-dev] determining the version
In-Reply-To: <bbcd77d00903050859o52da154en6df06b51fb9ef1d@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
	<63700.34226.qm@web62405.mail.re1.yahoo.com>
	<320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
	<320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com>
	<320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com>
	<320fb6e00903050458r5ef0e202l5e1a61031fb80c2@mail.gmail.com>
	<bbcd77d00903050859o52da154en6df06b51fb9ef1d@mail.gmail.com>
Message-ID: <320fb6e00903050950k4d0cce9i1fe1442e15cf9cf7@mail.gmail.com>

On Thu, Mar 5, 2009 at 4:59 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>
> I agree that it would be helpful to distinguish between an official
> release and a build from the CVS. Furthermore, it would then be
> important to know when the build from CVS was done at least relative
> to the official releases.
>
> So I think you are tending towards a numbering scheme like:
> 1.49 is an official release
> 1.49+ (or similar) is CVS after the 1.49 official release but before
> the next official release 1.50.
> 1.50 will be an official release
> 1.50+ (or similar) is the CVS after the 1.50 official release but
> before the next official release whatever number it will be.

That is one of the two suggestions I was putting forward.  The other
was just leaving the version number as that of the most recent release
- people should know if they are running CVS as this has to be done
deliberately.

One tiny downside is that the "+" gets turned into an underscore in
filenames (e.g. egg files, and I assume a Windows installer), but we
won't be releasing those so that doesn't matter.

> If so the release instructions should also include an instruction to
> change the CVS numbering in the version in __init__.py files after
> release has been made.

Yes - assuming people are happy with this suggested scheme.

Note that if we switch to SVN, something automated with the SVN
revision number might be possible.

> Also, after looking at the release instructions shouldn't BioSQL and
> Doc also have version-related information?
> Ideally the Biopython BioSQL code should have some connection to the
> main version of BioSQL - I don't use it so it is not an issue for me
> (yet).

Because Bio/* and BioSQL/* are always shipped and packaged together,
to my mind they together make up Biopython and share the same version
number.  As to why BioSQL is top level rather than being Bio.BioSQL,
it was long ago and I have no idea.

For the documentation, recent releases of the tutorial have included
the target version of Biopython together with the date.  Again, this
should be in the release instructions.

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Mar  5 17:54:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Mar 2009 12:54:01 -0500
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903051754.n25Hs1cW030546@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1251 is|0                           |1
           obsolete|                            |


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-05 12:54 EST -------
Created an attachment (id=1258)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1258&action=view)
Read/write support for FASTQ and QUAL files, using the letter_annotations dict  

Small update to earlier version, with minor comment changes.

Also includes explicit rounding of scores to the nearest integer when writing
out PHRED scores in Solexa format (and vice versa).  This conversion still
needs verifying against real world examples.

I've been testing with real world PHRED based files only so far.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 16:08:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 11:08:50 -0500
Subject: [Biopython-dev] [Bug 2779] Seq.count() docstring should note
	unexpected behaviour
In-Reply-To: <bug-2779-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061608.n26G8oL9003353@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2779


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 11:08 EST -------
I have updated the docstrings in CVS to stress that, like the Python string
method, a non-overlapping count is used, marking this bug as fixed.

From the mailing list discussion, having an overlapping count available would
be a welcome enhancement, perhaps as a separate method, e.g. overlapping_count.
Leighton's patch or Sebastian's code in Bio/SeqUtils/MeltingTemp.py could be
used for the implementation.  We can do this on a new enhancement bug, once a
consensus is reached on the mailing list.
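
To illustrate the difference between the two kinds of count, here is a
minimal sketch using plain Python strings. The function below is a
hypothetical illustration only; it is not Leighton's patch, nor the code
in Bio/SeqUtils/MeltingTemp.py:

```python
def overlapping_count(sequence, sub):
    """Count occurrences of sub in sequence, allowing overlaps.

    Hypothetical helper for illustration; not an agreed Biopython API.
    """
    if not sub:
        raise ValueError("empty search string")
    count = 0
    start = sequence.find(sub)
    while start != -1:
        count += 1
        # Advance by one position only, so overlapping matches are found.
        start = sequence.find(sub, start + 1)
    return count


print("AAAA".count("AA"))               # non-overlapping count: 2
print(overlapping_count("AAAA", "AA"))  # overlapping count: 3
```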


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 17:34:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:34:58 -0500
Subject: [Biopython-dev] [Bug 2783] New: Using alternative start codons in
	Bio.Seq translate method/function
Message-ID: <bug-2783-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783

           Summary: Using alternative start codons in Bio.Seq translate
                    method/function
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This bug covers an issue originally raised on Bug 2381.  This bug is
specifically about how to translate a CDS using a non-standard start codon (a
codon which doesn't normally encode methionine).

In computing, we often blindly translate without worrying about start codons.
For example, you might translate a whole genome (in all six frames) as part
of looking for open reading frames.  Translating a partial CDS where the start
is missing is another example.  The current Bio.Seq translation functionality
supports these usages.

In real biology however, translation from RNA to amino acids always starts at
an initiation/start codon (typically AUG) which becomes the methionine at the
start of the protein.  In eukaryotes, usually the only start codon is AUG, and
it normally encodes methionine, so this doesn't seem special.  However, in many
organisms there are lots of genes with alternative start/initiation codons
which do NOT normally encode methionine.  However, when they are used as a
start/initiation codon they DO get translated as methionine!

For example, there are 418 annotated genes in E. coli K12 with non-standard
start codons - which you might want to translate into proteins (which *should*
start with a methionine).

For example, using the following NCBI FASTA file of CDS sequences,
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655

Here is the CDS for gene yaaX:

>ref|NC_000913.2|:5234-5530
GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA
GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT
AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT
TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT
AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA

This starts with GTG, which is a valid bacterial start codon.  I'd like to be
able to translate this and get the actual biologically relevant protein as
given in the GenBank file NC_000913.gbk (with or without the stop symbol at
the end), which starts with "M" not "V":

     CDS             5234..5530
                     /gene="yaaX"
                     /locus_tag="b0005"
                     /codon_start=1
                     /transl_table=11
                     /product="predicted protein"
                     /protein_id="NP_414546.1"
                     /db_xref="ASAP:ABE-0000015"
                     /db_xref="UniProtKB/Swiss-Prot:P75616"
                     /db_xref="GI:16127999"
                     /db_xref="ECOCYC:G6081"
                     /db_xref="EcoGene:EG14384"
                     /db_xref="GeneID:944747"
                     /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY
                     YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR"

Without any non-standard start codon support, my translations start with a V
(rather than the desired M):

>>> from Bio.Seq import Seq
>>> yaaX = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA"
...            "GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT"
...            "AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT"
...            "TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT"
...            "AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
>>> print yaaX.translate(table=11)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print yaaX.translate(table=11, to_stop=True)
VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

These start with "V", while in this situation I want an "M" because I know this
is a full CDS and the first codon is a start codon.

I therefore want to add an optional argument to the Seq object's translate
method (and the Bio.Seq.translate function) so that I can obtain the desired
results (both with and without the terminator stop symbol).  I want an option
to tell Biopython that this sequence commences with a start/initiation codon:

>>> print yaaX.translate(table=11, with_start_codon=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR*
>>> print yaaX.translate(table=11, to_stop=True, with_start_codon=True)
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR

I have in the above example called this new argument "with_start_codon", but I
am open to naming suggestions.  If False (default), then nothing changes.  If
the new argument is True, this indicates that the first codon should be a valid
start/initiation codon (in the declared translation table), and that it should
be translated as a methionine.
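
As a rough sketch of the intended behaviour (not the patch itself), the
following standalone code handles plain strings rather than Seq objects.
The function and constant names are hypothetical.  It assumes NCBI table
11, which uses the standard codon-to-amino-acid mapping together with
the start codons listed below, and it only handles unambiguous DNA:

```python
from itertools import product

# Amino acids for the 64 codons in the usual NCBI order (bases T, C, A, G).
# NCBI table 11 uses the same codon-to-amino-acid mapping as the standard code.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                       AMINO_ACIDS))
# Start codons given for NCBI table 11.
START_CODONS_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(seq, with_start_codon=False, to_stop=False):
    """Translate a DNA string (hypothetical stand-in for Seq.translate)."""
    seq = seq.upper()
    # Ignore any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = [CODON_TABLE[codon] for codon in codons]
    if with_start_codon:
        if not codons or codons[0] not in START_CODONS_11:
            raise ValueError("First codon %r is not a valid start codon"
                             % seq[:3])
        # A valid start codon is always translated as methionine.
        protein[0] = "M"
    result = "".join(protein)
    if to_stop:
        result = result.split("*", 1)[0]
    return result


cds = "GTGAAAAAGATGCAATCTTAA"  # first six codons of yaaX plus a stop codon
print(translate(cds))                                        # VKKMQS*
print(translate(cds, with_start_codon=True))                 # MKKMQS*
print(translate(cds, with_start_codon=True, to_stop=True))   # MKKMQS
```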

I will upload a patch implementing this in a moment...

This proposal is NOT about an option to have the translate function/method
search the sequence for the first valid start codon (either in frame or not).

This proposal is NOT about an option to check the sequence is a valid CDS (i.e.
starts with a start codon, ends with an in frame stop codon, and has no
internal premature stop codons), and then translating it.  While this makes
sense (and BioPerl does this), this would prevent certain uses.  e.g. a partial
CDS sequence where the 3' end is missing.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 17:36:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:36:24 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061736.n26HaOWH012440@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 12:36 EST -------
Created an attachment (id=1259)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view)
Patch for Bio/Seq.py to support non-standard start codons in translation

Patch implementing my proposed change, based on my earlier patch attachment 1040
on Bug 2381.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 17:38:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:38:39 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the
	Seq object (in Bio.Seq)
In-Reply-To: <bug-2381-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061738.n26Hcd04012626@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #55 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 12:38 EST -------
I'm closing this bug as basic translate and transcribe methods were included
with Biopython 1.49.

I have filed Bug 2783 for "Using alternative start codons in Bio.Seq translate
method/function".


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 17:43:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 12:43:25 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061743.n26HhPRX013186@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-06 12:43 EST -------
(In reply to comment #1)
> Created an attachment (id=1259)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1259&action=view)
> Patch for Bio/Seq.py to support non-standard start codons in translation
> 
> Patch implementing my proposed change, based on my earlier patch
> attachment 1040 on Bug 2381.

Actually, it was based on the patch in attachment 1032 (not 1040) on Bug 2381.

Other names proposed for this new argument included:

init - rejected as potentially confusing

force_methionine - possible, but implies any codon would be allowed even
something that isn't a valid start codon

alt_start - perhaps confusing?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Fri Mar  6 19:54:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 6 Mar 2009 14:54:17 -0500
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903061954.n26JsHK4026141@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #3 from eric.talevich at gmail.com  2009-03-06 14:54 EST -------
(In reply to comment #2)

How about require_start? Or require_met, if you don't mind how strange it looks
as English. The name with_start_codon seems like it would take a codon or
alternate table as the argument.

I also see two choices being made by using this parameter:
(1) Check that the sequence starts with a valid start codon, and if not, raise
an exception;
(2) Use a set of alternate genetic codes for looking up the initial methionine.

From the other bug's discussion it seems like there are a number of boolean
options that could reasonably be used with the translate() method, but adding
them all as keyword args would clutter up the API. What about using a bitmask
in Bio.Seq that can be used with translate()? The re module takes a bitmask as
the last parameter for most functions, for example, and it looks pretty clean
compared to a series of boolean keyword args.
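
For comparison, a bitmask-style interface might look something like the
sketch below.  The flag names and the describe function are hypothetical
and only show how options would be combined; nothing like this exists in
Bio.Seq:

```python
import re
from enum import IntFlag


class TranslateFlag(IntFlag):
    """Hypothetical option flags; these names are not part of Bio.Seq."""
    TO_STOP = 1
    REQUIRE_START = 2


def describe(flags=0):
    """Report which hypothetical options a bitmask switches on."""
    flags = TranslateFlag(flags)
    return {
        "to_stop": bool(flags & TranslateFlag.TO_STOP),
        "require_start": bool(flags & TranslateFlag.REQUIRE_START),
    }


# The re module combines its options in the same way:
pattern = re.compile("atg", re.IGNORECASE | re.MULTILINE)
print(describe(TranslateFlag.TO_STOP | TranslateFlag.REQUIRE_START))
```

With keyword arguments the equivalent call would spell out each option
by name, e.g. to_stop=True, so the trade-off is mostly brevity against
explicitness.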


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mjldehoon at yahoo.com  Sun Mar  8 12:03:31 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 8 Mar 2009 05:03:31 -0700 (PDT)
Subject: [Biopython-dev] ScanProsite
In-Reply-To: <320fb6e00903020226n3e5929ean957f38315c28d863@mail.gmail.com>
Message-ID: <956971.84123.qm@web62404.mail.re1.yahoo.com>


--- On Mon, 3/2/09, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I like your suggestion to have a REST XML based module
> under Bio.ExPASy, which means we can deprecate the HTML based
> Bio.Prosite module and in the process make the top level list of
> modules in Biopython a bit shorter.  In the long term I think that
> will help people find functionality.
> 
Then, how about the following code organization:

Bio/ExPASy/__init__.py contains 
get_prodoc_entry  Interface to the get-prodoc-entry CGI script.
get_prosite_entry Interface to the get-prosite-entry CGI script.
get_prosite_raw   Interface to the get-prosite-raw CGI script.
get_sprot_raw     Interface to the get-sprot-raw CGI script.
sprot_search_ful  Interface to the sprot-search-ful CGI script.
sprot_search_de   Interface to the sprot-search-de CGI script.
(currently in Bio/ExPASy.py)

Bio/ExPASy/Prosite.py contains read(), parse(), Record for Prosite files
(currently in Bio/Prosite/__init__.py), as well as a Pattern class to handle Prosite patterns (currently in Bio/Prosite/Pattern.py, but this seems to be unused).

Bio/ExPASy/Prodoc.py contains read(), parse(), Record for Prosite documentation files
(currently in Bio/Prosite/Prodoc.py)

Bio/ExPASy/ScanProsite.py contains scan(), read(), Record to interact with ScanProsite
(currently a broken version to access ScanProsite and parse its results exists in Bio/ExPASy.py and Bio/Prosite/__init__.py).

I have a simplified version of the Prosite and Prodoc parsers. If we use the scheme above, I'll put the new version in Bio/ExPASy/Prosite.py and Bio/ExPASy/Prodoc.py, and deprecate Bio.Prosite.

--Michiel.


From biopython at maubp.freeserve.co.uk  Tue Mar 10 20:29:54 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Mar 2009 20:29:54 +0000
Subject: [Biopython-dev] [Utilities-announce] NCBI E-Utilities
	requirements updated
In-Reply-To: <320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC051985F6@NIHCESMLBX15.nih.gov>
	<320fb6e00903050440h138893b3yb2484a557621fc41@mail.gmail.com>
Message-ID: <320fb6e00903101329i69e40fc0i6a2b13332df55e7a@mail.gmail.com>

On Thu, Mar 5, 2009 at 12:40 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> This email was sent out a few weeks ago, but it took a while before
> the NCBI webpage was actually updated (maybe a caching issue) so I
> didn't rush to relax our rules immediately.
>
> Under the new rules we must make no more than three requests every
> second.  We could track the times of the last two requests in order to
> enforce this as worded, but I think it would be simpler just to switch
> from using a minimum 3 second pause between Bio.Entrez requests to
> just a minimum 0.33334 second pause.  This is a much simpler code
> change and will comply with the new relaxed rules.
>
> Unless anyone has a counter suggestion, I will update Bio.Entrez and
> the tutorial shortly.

Change made in CVS, including the tutorial.
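
For reference, the minimum-pause approach described above can be
sketched as follows.  This is a standalone illustration with a
hypothetical class name, not the actual Bio.Entrez code:

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper for illustration; not the Bio.Entrez code.
    """

    def __init__(self, delay=0.33334):
        self.delay = delay
        self._last = None  # monotonic time of the previous request

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


throttle = Throttle()
for _ in range(3):
    throttle.wait()  # call this immediately before each request
```

With a pause of at least 0.33334 seconds between requests, no
one-second window can contain more than three requests.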

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Mar 10 20:36:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Mar 2009 16:36:28 -0400
Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO
In-Reply-To: <bug-2762-42@http.bugzilla.open-bio.org/>
Message-ID: <200903102036.n2AKaSje008217@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2762


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-10 16:36 EST -------
For anyone following this bug, Brad has some related code posted on his blog -
see this mailing list discussion:
http://lists.open-bio.org/pipermail/biopython/2009-March/004983.html


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Mar 10 20:49:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 10 Mar 2009 16:49:30 -0400
Subject: [Biopython-dev] [Bug 2783] Using alternative start codons in
	Bio.Seq translate method/function
In-Reply-To: <bug-2783-42@http.bugzilla.open-bio.org/>
Message-ID: <200903102049.n2AKnUoD009300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2783


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-10 16:49 EST -------
On comment #3, Eric wrote:
> 
> How about require_start? Or require_met, if you don't mind how strange
> it looks as English. The name with_start_codon seems like it would take
> a codon or alternate table as the argument.

I think "require_start" is OK.  Or "require_start_codon".

> I also see two choices being made by using this parameter:
> (1) Check that the sequence starts with a valid start codon, and
> if not, raise an exception;

That is what my patch does.  Plus of course translating the valid start
codon as a methionine.

> (2) Use a set of alternate genetic codes for looking up the initial
> methionine.

I'm unsure what you mean here.  If you mean actually having the
translate method search for the first valid start codon, I am
really not keen on this at all. This is complicated, and verges
on gene/ORF finding, which I specifically wanted to avoid:

Peter wrote in comment #0:
>> This proposal is NOT about an option to have the translate
>> function/method search the sequence for the first valid
>> start codon (either in frame or not).

On comment #3, Eric wrote:
> From the other bug's discussion it seems like there are a number of boolean
> options that could reasonably be used with the translate() method, but adding
> them all as keyword args would clutter up the API. What about using a bitmask
> in Bio.Seq that can be used with translate()? The re module takes a bitmask as
> the last parameter for most functions, for example, and it looks pretty clean
> compared to a series of boolean keyword args.

I agree that there is a risk of confusion with too many arguments.  But I don't
think a bitmask would help - I think it makes it worse!  I'm not saying it's a
good thing, but we have lots of functions/methods in Biopython already with
lots of arguments (e.g. the standalone BLAST wrappers, or the Bio.Entrez
functions).  On the other hand, I can't immediately think of a single Python
function which uses a bitmask.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Tue Mar 10 23:40:29 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Mar 2009 23:40:29 +0000
Subject: [Biopython-dev] Bio.Entrez catching more errors
Message-ID: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com>

Hi All,

It occurred to me that the Bio.Entrez._open function can look at the
retmode argument (if present) and spot if there is a mismatch between
the requested format (e.g. XML, HTML, text or asn.1) and the actual
data the NCBI returned.  Something along the following lines could be
added to the end of the _open function in Bio/Entrez/__init__.py to
achieve this:

    elif "retmode" in params and params["retmode"].lower()=="html" \
    and not data.lower().startswith("<html") \
    and not data.lower().startswith("<!doctype html") :
        raise TypeError("Requested HTML, but didn't get it: %s..." % data)
    elif "retmode" in params and params["retmode"].lower()=="xml" \
    and not data.lower().startswith("<?xml") :
        raise TypeError("Requested XML, but didn't get it: %s..." % data)
    elif "retmode" in params and params["retmode"] \
    and params["retmode"].lower()!="xml" \
    and data.lower().startswith("<?xml") :
        raise TypeError("Didn't request XML, but got it: %s..." % data)
    elif "retmode" in params and params["retmode"] \
    and params["retmode"].lower()!="html" \
    and (data.lower().startswith("<html") or \
         data.lower().startswith("<!doctype html")):
        #Expected for some error pages (e.g. the Bad Gateway caught above)
        raise TypeError("Didn't request HTML, but got it: %s..." % data)

I'm sure my XML/HTML detection could be made more robust here - I hope
the principle is clear.  My motivation is that I have noticed the NCBI
can return HTML error pages, and while we do catch some of these
explicitly (e.g. Bad Gateway, or Service Unavailable), I think any
HTML page when the user asked for XML, text or asn.1 should be
treated as an error.  Similarly, not getting XML when you ask for it etc.

Note that by raising the exception including the message text it
should be much easier to diagnose these failures.  As a tiny
refinement to the above code, we should only add the "..." if there is
more text to follow - this isn't always the case.
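
A standalone sketch of the same checks, including the "..." refinement,
might look like this.  The function name, the preview length of 50
characters and the stripping of leading whitespace are all assumptions
for illustration, and differ from the fragment above:

```python
def check_retmode(data, retmode=None, preview=50):
    """Raise TypeError if data does not look like the requested retmode.

    Hypothetical standalone version of the checks above; the preview
    length of 50 characters is an arbitrary choice.
    """
    if not retmode:
        return
    retmode = retmode.lower()
    start = data.lstrip().lower()
    is_xml = start.startswith("<?xml")
    is_html = start.startswith(("<html", "<!doctype html"))
    # Only append "..." when the quoted text really is truncated.
    snippet = data[:preview] + ("..." if len(data) > preview else "")
    if retmode == "html" and not is_html:
        raise TypeError("Requested HTML, but didn't get it: %s" % snippet)
    if retmode == "xml" and not is_xml:
        raise TypeError("Requested XML, but didn't get it: %s" % snippet)
    if retmode != "xml" and is_xml:
        raise TypeError("Didn't request XML, but got it: %s" % snippet)
    if retmode != "html" and is_html:
        raise TypeError("Didn't request HTML, but got it: %s" % snippet)


try:
    check_retmode("<html><body>Bad Gateway</body></html>", retmode="xml")
except TypeError as err:
    print(err)
```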

e.g. The following give an HTML error page (while some databases like
"protein" are better behaved in this respect):
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()

Similarly, these give an XML like fragment (which is not a valid XML
file in itself - arguably an NCBI bug; some databases like "protein"
are better behaved in this respect):
>>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
>>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()

My suggested change to Bio.Entrez would also catch the following
examples (using an invalid database) where the NCBI ignore the retmode
and return an HTML help page:
>>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
>>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()

In a less clear cut example, this would flag the following as an error
as the NCBI seem to return ASN.1 text instead of HTML here:
>>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()

Overall, I think this change should catch lots of errors which
otherwise may not be detected until later (e.g. while trying to parse
the file).

--------------------------------------------------------------------------------------------------

On another point, should we catch these responses as errors?

>>> efetch(db="snp", id="123456").read()
'<html><head><title>PmFetch response</title></head><body>\n<pre>\n1:
id: 123456 Error occurred: cannot get document
summary\n</pre></body></html>'
>>> efetch(db="snp", id="123456", retmode="html").read()
'<html><head><title>PmFetch response</title></head><body>\n<pre>\n1:
id: 123456 Error occurred: cannot get document
summary\n</pre></body></html>'
>>> efetch(db="snp", id="123456", retmode="xml").read()
'<?xml version="1.0"?>\n<ExchangeSet
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1:
id: 123456 Error occurred: cannot get document
summary\n\n</ExchangeSet>'
>>> efetch(db="snp", id="123456", retmode="text").read()
'1: id: 123456 Error occurred: cannot get document summary\n'

and,
>>> print efetch(db="homologene", retmode="html", id="fake").read()
<html>
<body>
<br/><h2>Error occurred: Empty id list - nothing todo</h2>...

Looking for the string "Error occurred: " looks fairly safe here, and
should cover a range of entries.  Of course, you can imagine false
positives too, e.g. a valid PUBMED plain text record for a tutorial
article with a title like "Yikes! An Error Occurred: A beginner's
Guide To Defensive Programming." could match.
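
A minimal sketch of that check, with a hypothetical function name, shows
that an exact (case sensitive) substring test would in fact let that
particular made-up title through, while a case insensitive test would
not:

```python
def has_error_marker(data):
    """Return True if an NCBI response contains "Error occurred: ".

    Hypothetical helper; a plain case sensitive substring test.
    """
    return "Error occurred: " in data


text_reply = "1: id: 123456 Error occurred: cannot get document summary\n"
html_reply = ("<html>\n<body>\n"
              "<br/><h2>Error occurred: Empty id list - nothing todo</h2>")
title = ("Yikes! An Error Occurred: A beginner's Guide To Defensive "
         "Programming.")

print(has_error_marker(text_reply))     # True
print(has_error_marker(html_reply))     # True
print(has_error_marker(title))          # False, the "O" is upper case
print(has_error_marker(title.lower()))  # False, the "E" is now lower case
print("error occurred: " in title.lower())  # True, case insensitive match
```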

Peter


From lorena.carlo at gmail.com  Wed Mar 11 15:58:24 2009
From: lorena.carlo at gmail.com (Lorena Carló)
Date: Wed, 11 Mar 2009 09:58:24 -0600
Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs
Message-ID: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>

Hi all,

I would like to know if there is an implemented function in Biopython that
allows getting the PDB ID from a UniProtKB ID?

Thanks,
Lorena


From biopython at maubp.freeserve.co.uk  Wed Mar 11 16:12:36 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 11 Mar 2009 16:12:36 +0000
Subject: [Biopython-dev] function to map uniprot IDs with PDB IDs
In-Reply-To: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>
References: <22d7b0c30903110858x4125dc25v24fafec3d561209e@mail.gmail.com>
Message-ID: <320fb6e00903110912g717ccb52q4242a6ff169b5d1f@mail.gmail.com>

On Wed, Mar 11, 2009 at 3:58 PM, Lorena Carló <lorena.carlo at gmail.com> wrote:
> Hi all,
>
> I would like to know if there is an implemented function in Biopython that
> allows getting the PDB ID from a UniProtKB ID?
>
> Thanks,
> Lorena

There isn't a simple one-to-one mapping from a UniProtKB/Swiss-Prot ID
to a PDB ID, see
http://www.uniprot.org/faq/2

Are you working from UniProtKB/Swiss-Prot files?  How about something like this:

# This assumes you have downloaded the following file
# to your working directory:
# http://www.uniprot.org/uniprot/P00734.txt
from Bio import SeqIO
record = SeqIO.read(open("P00734.txt"),"swiss")
for xref in record.dbxrefs :
    if xref.startswith("PDB:") :
        print xref.split(":",1)[1]

Peter

P.S. This is more a question for the main discussion list, rather than
Biopython development


From bugzilla-daemon at portal.open-bio.org  Wed Mar 11 23:39:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 11 Mar 2009 19:39:02 -0400
Subject: [Biopython-dev] [Bug 2788] New: Bio.Nexus.Trees newick parser crash
Message-ID: <bug-2788-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2788

           Summary: Bio.Nexus.Trees newick parser crash
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: matzke at berkeley.edu


The newick files I have been working with seem to open fine in several
different programs/packages (Dendroscope, R's APE package, phylocom, python
alfacinha module), but not the newick parser in Bio.Nexus.Trees.

Rather than upload a file I've got the full newick string hard-coded below:

============
from Bio.Nexus.Trees import Tree

tree_str =
'(((((((((((((((((Sambucus:43.136024,Viburnum:43.136040)Adoxaceae:53.892513,(Acanthopanax:34.719704,Aralia:34.719727,Dendropanax:34.719727,Evodiopanax:34.719727,Kalopanax:34.719727,Schefflera:34.719727)Araliaceae:62.308830):7.045975,Ilex:104.074516):3.056864,((((((Catalpa:22.623766,Paulownia:22.623785)Bignoniaceae:22.623766,(Clerodendrum:19.864199,Premna:19.864218)Verbenaceae:25.383331):22.378326,(Chionanthus:29.443968,Forestiera:29.443979,Fraxinus:29.443979,Ligustrum:29.443979,Osmanthus:29.443979,Syringa:29.443979)Oleaceae:38.181892):19.113832,(Adina:38.252457,Cephalanthus:38.252472,Emmenopterys:38.252472,Pinckneya:38.252472,Randia:38.252472)Rubiaceae:48.487236):2.360018,Ehretia:89.099709):13.495450,Eucommia:102.595161):4.536214):0.905059,((((Clethra:78.134140,((Cliftonia:38.402752,Cyrilla:38.402775)Cyrillaceae:38.402752,(Arbutus:38.402752,Elliottia:38.402775,Enkianthus:38.402775,Kalmia:38.402775,Lyonia:38.402775,Oxydendrum:38.402775,Rhododendron:38.402775,Vaccinium:38.402775)Ericaceae:38.402752):1.328631):12.980787,(((Halesia:30.391993,Pterostyrax:30.392012,Styrax:30.392012)Styracaceae:51.775261,Symplocos:82.167252):0.000000,(Camellia:41.083626,Franklinia:41.083649,Gordonia:41.083649,Stewartia:41.083649,Ternstroemia:41.083649)Theaceae:41.083626):8.947675):0.000149,Diospyros:91.115099):2.023849,((Ardisia:18.344650,Myrsine:18.344666)Myrsinaceae:74.794174,Bumelia:93.138824):0.000101):14.897509):1.462594,((Alangium:48.167362,Aucuba:48.167370,Cornus:48.167370,Macrocarpium:48.167370,Torricellia:48.167370)Cornaceae:53.025345,(Hydrangea:97.032310,(Davidia:48.516151,Nyssa:48.516167)Nyssaceae:48.516151):4.160399):8.306321):7.064716,Schoepfia:116.563736):0.000000,((((Altingia:50.813206,Liquidambar:50.813213)Altingiaceae:50.813206,(Disanthus:50.813206,Distylium:50.813213,Fortuneria:50.813213,Hamamelis:50.813213,Loropetalum:50.813213,Sinowilsonia:50.813213)Hamamelidaceae:50.813206):0.000131,(Cercidiphyllum:87.828712,Daphniphyllum:87.828712):13.797829):13.247040,(((((((Choerosp
ondias:21.440735,Cotinus:21.440742,Pistacia:21.
440742,Rhus:21.440742,Toxicodendron:21.440742)Anacardiaceae:37.304596,(Acer:29.372665,Aesculus:29.372681,Dipteronia:29.372681,Koelreuteria:29.372681,Sapindus:29.372681)Sapindaceae:29.372665):0.000114,((Cedrela:49.350353,(Ailanthus:24.675177,Leitneria:24.675188,Picrasma:24.675188)Simaroubaceae:24.675177):4.016092,(Evodia:26.683222,Phellodendron:26.683233,Ptelea:26.683233,Zanthoxylum:26.683233)Rutaceae:26.683222):5.379002):29.842871,(Firmiana:32.917126,Tilia:32.917149)Malvaceae:55.671188):12.661992,(Lagerstroemia:84.110847,Szyzygium:84.110847):17.139463):2.612011,((((((Alnus:16.609535,Betula:16.609543,Carpinus:16.609543,Corylus:16.609543,Ostrya:16.609543)Betulaceae:37.306709,((Carya:25.504854,Cyclocarya:25.504866,Juglans:25.504866,Engelhardtia:25.504866,Platycarya:25.504866,Pterocarya:25.504866)Juglandaceae:25.504854,Myrica:51.009708):2.906531):9.893459,(Castanea:31.904850,Castanopsis:31.904873,Cyclobalanopsis:31.904873,Fagus:31.904873,Lithocarpus:31.904873,Quercus:31.904873)Fagaceae:31.904850):21.681023,(((((Celtis:20.739927,Pteroceltis:20.739939)Cannabaceae:20.739927,((Broussounetia:12.614990,Cudrania:12.615005,Maclura:12.615005,Morus:12.615005)Moraceae:12.614990,Oreocnide:25.229980):16.249876):10.909924,(Aphananthe:26.194889,Hemiptelea:26.194897,Planera:26.194897,Ulmus:26.194897,Zelkova:26.194897)Ulmaceae:26.194889):11.649286,(Hovenia:32.019470,Rhamnus:32.019493,Ziziphus:32.019493)Rhamnaceae:32.019596):8.938065,(((Amelanchier:36.488564,(Crataegus:36.488586,Mespilus:36.488586):0.000000):0.000000,Chaenomeles:36.488586,Eriobotrya:36.488586,Malus:36.488586,Photinia:36.488586,Pyrus:36.488586,Sorbus:36.488586):0.000000,Prunus:36.488586)Rosaceae:36.488564):12.513593):4.616908,(Albizia:31.901920,Cercis:31.901943,Cladrastis:31.901943,Dalbergia:31.901943,Erythrina:31.901943,Gleditsia:31.901943,Gymnocladus:31.901943,Laburnum:31.901943,Maackia:31.901943,Ormosia:31.901943,Robinia:31.901943,Sophora:31.901943)Fabaceae:58.205711):4.139401,((Euonymus:90.433327,Sloanea:90.433327):0.
000101,((Mallotus:28.689901,Sapium:28.689920)Eu
phorbiaceae:50.330055,(Idesia:29.019764,Poliothyrsis:29.019779,Populus:29.019779,Salix:29.019779,Xylosma:29.019779)Salicaceae:50.000195):11.413469):3.813607):9.615288):0.000000,(Staphylea:21.372393,Tapiscia:21.372404,Turpinia:21.372404)Staphyleaceae:82.489929):11.011259):1.690163):7.829397,Buxus:124.393143):0.000000,Tetracentron:124.393143):2.763555,Meliosma:127.156693):1.664427,Platanus:128.821121):2.029122,Euptelea:130.850250):11.447736,((Asimina:95.972672,(Liriodendron:47.125092,Magnolia:47.125114,Manglieita:47.125114,Michelia:47.125114)Magnoliaceae:48.847580):46.325292,(Actinodaphne:49.903526,Cinnamomum:49.903542,Lindera:49.903542,Litsea:49.903542,Machilus:49.903542,Neolitsea:49.903542,Nothaphoebe:49.903542,Persea:49.903542,Phoebe:49.903542,Sassafras:49.903542,Umbellularia:49.903542)Lauraceae:92.394188):0.000257):1.840266,(Yucca:110.138222,((Sabal:100.000000,(Serenoa:95.000000,Trachycarpus:95.000000)ST:5.000000)Arecaceae:10.000000,(Arundinaria:20.476601,Phyllostachys:20.476624,Semiarundinaria:20.476624)Poaceae:89.661629):0.000000):34):30.861772,Illicium:175.000000)aus2ast:175.000000,(((((Cephalotaxus:125.000000,(Taxus:100.000000,Torreya:100.000000)TT1:25.000000)Taxaceae:90.000000,((((((((Calocedrus:85.000000,Platycladus:85.000000)CP:5.000000,(Cupressus:85.000000,Juniperus:85.000000)CJ:5.000000)CJCP:5.000000,Chamaecyparis:95.000000)CCJCP:5.000000,(Thuja:7.870000,Thujopsis:7.870000)TT2:92.13)CJCPTT:30.000000,((Cryptomeria:120.000000,Taxodium:120.000000)CT:5.000000,Glyptostrobus:125.000000)CTG:5.000000)CupCallTax:5.830000,((Metasequoia:125.000000,Sequoia:125.000000)MS:5.000000,Sequoiadendron:130.000000)Sequoioid:5.830000)STCC:49.060001,Taiwania:184.889999)Taw+others:15.110000,Cunninghamia:200.000000)nonSci:15.000000)Tax+nonSci:10.000000,Sciadopitys:225.000000):25.000000,(((Abies:106.000000,Keteleeria:106.000000)AK:54.000000,(Pseudolarix:156.000000,Tsuga:156.000000)NTP:4.000000)NTPAK:24.000000,((Larix:87.000000,Pseudotsuga:87.000000)LP:81.000000,(Picea:155.000000,Pi
nus:155.000000)PPC:13.000000)Pinoideae:16.00000
0)Pinaceae:66.000000)Coniferales:25.000000,Ginkgo:275.000000)gymnosperm:75.000000)seedplant:50.000000;'


tree_obj = Tree(tree_str)

print tree_obj
============


This brings up the following error for "tree_obj = Tree(tree_str)":
========
ValueError: invalid literal for float(): seedplant
========

It looks like it is looking for a floating point number where "seedplant" is.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Mar 12 10:17:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:17:01 -0400
Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser crash
In-Reply-To: <bug-2788-42@http.bugzilla.open-bio.org/>
Message-ID: <200903121017.n2CAH13S012060@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2788


------- Comment #1 from cymon.cox at gmail.com  2009-03-12 06:17 EST -------
(In reply to comment #0)
> The newick files I have been working with seem to open fine in several
> different programs/packages (Dendroscope, R's APE package, phylocom, python
> alfacinha module), but not the newick parser in Bio.Nexus.Trees.

[a big tree]

> tree_obj = Tree(tree_str)
> 
> print tree_obj
> ============
> 
> 
> This brings up the following error for "tree_obj = Tree(tree_str)":
> ========
> ValueError: invalid literal for float(): seedplant
> ========
> 
> It looks like it is looking for a floating point number where "seedplant" is.

Your tree is decorated with node labels, which the parser cannot handle.

This came up recently (within the last year?) but I can't find the bug/message.

Should probably catch this and return an informative error - or implement node
labels...
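
As a rough sketch of the second option, the text that follows a closing
parenthesis could be split into a label, a support value and a branch
length before anything is passed to float().  The helper
split_node_suffix below is hypothetical and is not how Bio.Nexus.Trees is
written:

```python
def split_node_suffix(suffix):
    """Split the text after ')' into (label, support, branch_length).

    Missing parts are returned as None.
    """
    head, colon, length = suffix.partition(":")
    branch_length = float(length) if colon else None
    head = head.strip()
    label = None
    support = None
    if head:
        try:
            # A numeric token is treated as a support value ...
            support = float(head)
        except ValueError:
            # ... anything else is taken to be a node label
            label = head
    return label, support, branch_length


print(split_node_suffix("A:0.0291131"))
print(split_node_suffix("seedplant"))
```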

C.




From bugzilla-daemon at portal.open-bio.org  Thu Mar 12 10:38:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:38:59 -0400
Subject: [Biopython-dev] [Bug 2788] Bio.Nexus.Trees newick parser does not
	support internal node labels
In-Reply-To: <bug-2788-42@http.bugzilla.open-bio.org/>
Message-ID: <200903121038.n2CAcxMR014167@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2788


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
         OS/Version|Mac OS                      |All
           Platform|Macintosh                   |All
            Summary|Bio.Nexus.Trees newick      |Bio.Nexus.Trees newick
                   |parser crash                |parser does not support
                   |                            |internal node labels


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-12 06:38 EST -------
I thought it looked familiar, but I must have only searched the currently open
bugs.  This looks *very* similar to Bug 2543 which dealt with internal node
names, which was fixed for Biopython 1.49 (and 1.49 beta).

Frank wrote:
> Nexus.Trees has been extended to deal with internal node names, or "special
> comments" in the format [& blablalba]. Such comments can appear
> directly after the taxon label, after the closing parentheses, or between
> branchlength / support values attached to a node or a taxon label, ...

i.e. On Bug 2543, Frank didn't go as far as the enhancement to cope with
"naked" node labels, just those in the square brackets.

Consider this smaller example Cymon gave on Bug 2543:

>>> from Bio.Nexus.Trees import Tree
>>> tree_str2 = "(((t9:0.385832, (t8:0.445135,t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);"
>>> tree_obj = Tree(tree_str2)
Traceback (most recent call last):
...
ValueError: invalid literal for float(): A
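
In the meantime, one possible workaround is to strip the naked labels
before handing the string to Tree.  A minimal sketch using only the
standard library; strip_node_labels is a hypothetical helper, not part of
Bio.Nexus, and it assumes the labels are unquoted and contain no
whitespace:

```python
import re

# A token that directly follows ")" and runs up to the next delimiter
_AFTER_PAREN = re.compile(r"\)([^:,;()\[\]\s]+)")


def strip_node_labels(newick):
    """Remove non-numeric labels that directly follow a closing parenthesis."""
    def _drop_label(match):
        token = match.group(1)
        try:
            float(token)
        except ValueError:
            # Not a number, so treat it as a node label and drop it
            return ")"
        # Numeric tokens (e.g. support values) are left untouched
        return match.group(0)

    return _AFTER_PAREN.sub(_drop_label, newick)


print(strip_node_labels("((a:1,b:2)C:3,d:4)Root;"))
```

The labels are lost, of course, so this only helps when the topology and
the branch lengths are what matter.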


I've retitled this and marked it as an enhancement.




From bugzilla-daemon at portal.open-bio.org  Thu Mar 12 10:41:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 12 Mar 2009 06:41:30 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
	ancestors
In-Reply-To: <bug-2543-42@http.bugzilla.open-bio.org/>
Message-ID: <200903121041.n2CAfUwH014362@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-12 06:41 EST -------
On comment #5 Frank wrote:
> In my opinion, naming nodes is a feature, and I would not regard the lack of
> this feature as a bug.  But I'll have a look at the code and see how easy
> this can be changed. It would actually be nice if P4 and Bio.Nexus, both
> being python programs, could read each other's trees.

This enhancement is now covered by Bug 2788.  It appears that now several other
programs support this Newick tree variant, making it a bit more important.




From chris.lasher at gmail.com  Thu Mar 12 21:07:21 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu, 12 Mar 2009 17:07:21 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com>
	<3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com>
	<320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
Message-ID: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>

On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Another option to consider would be to switch to running git on
> biopython.org, but use the git-cvsserver tool to provide an emulated
> CVS server on top of the git repository.  This sounds possible in
> theory, and would be nice for any "old fashioned" biopython developers
> because it should be fairly transparent - they can continue to treat
> it as CVS and just work on the main trunk.  This would require someone
> competent to do the conversion and alter the server setup - we'd have
> to talk to the OBF team about this.  However, if anyone has first hand
> experience on git-cvsserver perhaps they could comment on whether this
> sounds like a good plan or not.

I must be missing something, Peter. Why would BioPython continue to
operate with CVS? I suppose I just really hope to see BioPython
running with something other than CVS, and I'd really like to see it
go either under Bazaar or Git.

Chris


From bartek at rezolwenta.eu.org  Thu Mar 12 23:20:23 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 13 Mar 2009 00:20:23 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com>
	<320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
Message-ID: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>

On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
>
> I must be missing something, Peter. Why would BioPython continue to
> operate with CVS? I suppose I just really hope to see BioPython
> running with something other than CVS, and I'd really like to see it
> go either under Bazaar or Git.
>
Hi Chris,

The idea is to do the switch in two steps:
- first we still have the main branch in CVS while we have git and/or
bzr branches synchronized with it for people to branch and contribute
- If this works nicely, we will switch to one of these systems
completely (while possibly keeping the other branch in sync, but this
is not yet decided)

The first step is to some extent operational (I'm currently busy with
other stuff, but I'll hopefully get around to it this weekend), but the
second step requires a decision on our side (git or bzr?) and action on
the side of OBF (there is no git or Bazaar installed on the OBF servers).

cheers
-- 
Bartek Wilczynski


From biopython at maubp.freeserve.co.uk  Fri Mar 13 12:21:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 13 Mar 2009 12:21:14 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
Message-ID: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>

On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
>> On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> Another option to consider would be to switch to running git on
>>> biopython.org, but use the git-cvsserver tool to provide an emulated
>>> CVS server on top of the git repository.  This sounds possible in
>>> theory, and would be nice for any "old fashioned" biopython developers
>>> because it should be fairly transparent - they can continue to treat
>>> it as CVS and just work on the main trunk.  This would require someone
>>> competent to do the conversion and alter the server setup - we'd have
>>> to talk to the OBF team about this.  However, if anyone has first hand
>>> experience on git-cvsserver perhaps they could comment on whether this
>>> sounds like a good plan or not.
>>
>> I must be missing something, Peter. Why would BioPython continue to
>> operate with CVS? I suppose I just really hope to see BioPython
>> running with something other than CVS, and I'd really like to see it
>> go either under Bazaar or Git.

I'm warming to the idea of git, and had noticed git includes the
optional git-cvsserver tool which emulates a CVS server while using
git underneath.  I was wondering if anyone had first hand experience
of this.  If we did move from CVS to git (still hosted on
biopython.org), this would seem to offer a nice migration path for
our "old school" CVS developers - they can carry on as usual.  Of
course, if none of us care about having to learn a new interface, then
a simple switch would be less hassle to set up.  For the server side of
things, we'll need to talk to the OBF team about any such move - as
far as I know they've only managed CVS to SVN migrations in the past.

Peter

> Hi Chris,
>
> The idea is to do the switch in two steps:
> - first we still have the main branch in CVS while we have git and/or
> bzr branches synchronized with it for people to branch and contribute
> - If this works nicely, we will switch to one of these systems
> completely (while possibly keeping the other branch in sync, but this
> is not yet decided)

That does seem like a good plan.  Of course, there is the related
issue of where we host the official repository: externally (e.g. on
GitHub or Launchpad) or in house (on biopython.org).  I favour keeping
the official repository on biopython.org but this will require OBF
technical support (do we have the expertise within Biopython? Bartek?
Chris?).

> The first step is to some extent operational (I'm currently busy with
> other stuff, but I'll hopefully get around to it this weekend), but the
> second step requires a decision on our side (git or bzr?) and action on
> the side of OBF (there is no git or Bazaar installed on the OBF servers).

There is also the previously semi-agreed solution of switching from
CVS to SVN on biopython.org, but this would be only a gradual
improvement.  I gather there are mature tools for using git+svn
together, so it should be better than using git+cvs together.  Other
than meaning all the OBF hosted projects are on SVN (I think we are
the last still on CVS), this is beginning to seem a bit pointless.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Mar 13 15:48:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Mar 2009 11:48:39 -0400
Subject: [Biopython-dev] [Bug 2780] PDB file HETATMs cannot be alternative
	location of a residue that is an ATOM
In-Reply-To: <bug-2780-42@http.bugzilla.open-bio.org/>
Message-ID: <200903131548.n2DFmdZ6015899@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2780


------- Comment #2 from klaus.kopec at tuebingen.mpg.de  2009-03-13 11:48 EST -------
PDB IDs of some more occurrences (simply search the file for "HETATM" and look
for a HETATM record that is followed by an ATOM with the same residue number and
a different altloc).

1din
1k4q
1k55 - multiple occurrences
1k56
1rqh
1rr2
1xpk
1xpl - multiple occurrences
1xpm - multiple occurrences
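
The search described above could be automated along these lines.  This is
only a sketch: it reads the fixed PDB columns directly (record name in
columns 1-6, altloc in column 17, chain in column 22, residue number in
columns 23-26), takes "followed by" to mean the next ATOM/HETATM line, and
the function name find_hetatm_atom_altlocs is made up here:

```python
def find_hetatm_atom_altlocs(lines):
    """Return (chain, residue number) pairs where a HETATM line is followed
    by an ATOM line with the same residue number but a different altloc."""
    hits = []
    previous = None  # (record, altloc, chain, resseq) of the last coordinate line
    for line in lines:
        record = line[0:6].strip()
        # Only coordinate records are compared; other lines are skipped
        if record not in ("ATOM", "HETATM") or len(line) < 26:
            continue
        current = (record, line[16], line[21], line[22:26].strip())
        if (previous is not None
                and previous[0] == "HETATM" and current[0] == "ATOM"
                and previous[2:] == current[2:]    # same chain and residue number
                and previous[1] != current[1]):    # different altloc
            hit = (current[2], current[3])
            if hit not in hits:
                hits.append(hit)
        previous = current
    return hits
```

For a downloaded file this could be called as
find_hetatm_atom_altlocs(open("1k55.pdb")), where the file name is just an
example.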




From jblanca at btc.upv.es  Fri Mar 13 15:59:01 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 13 Mar 2009 16:59:01 +0100
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <200902271157.49948.jblanca@btc.upv.es>
References: <200902261612.54306.jblanca@btc.upv.es>
	<320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com>
	<200902271157.49948.jblanca@btc.upv.es>
Message-ID: <200903131659.01590.jblanca@btc.upv.es>

Hi:
I've finished a first version of a program that reads a list of Applied
Biosystems fsa files and draws a virtual gel. It does not read the sequence
because my users are interested in fragment analysis, but the basic
infrastructure is in place to do it.
It does what my users need. It's quite slow, but I'm not investing time
in optimizing it.
If anybody wants to take a look, the code is at:
http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/
I distribute it under the GPL licence.
If you think that any part of the code could be of any use to the Biopython
project, I would be very pleased to give it to the community.
Best regards,

Jose Blanca

On Friday 27 February 2009 11:57:49 Jose Blanca wrote:
> On Friday 27 February 2009 11:45:59 Peter wrote:
> > On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> > That's much clearer - is the Genographer software showing the actual
> > image (zoomed as required, with the colours adjusted as required), or
> > an artificial recreation?
>
> It is an artificial recreation, the same as I'm trying to accomplish. I just
> want more resolution and an automated process (Genographer is a GUI
> application).
>
> > Are you trying to create this figure for illustrative purposes only?
> > I mean would a slightly cartoon like recreation be fine, or are you
> > trying to make it as realistic as possible?
>
> I want to analyze it.
>
> > I see you are having to reverse engineer their file format.  I guess
> > other people have tried this in the past so there may be more clues
> > out on the internet.  Have you tried emailing the company to see if
> > they would publish the file format specifications (unlikely I fear,
> > but worth asking).
>
> Fortunately the ABIF was reverse engineered by people more clever than me.
> And a couple of years ago Applied published a specification:
> http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/ABIF_File_Format.pdf
> You can't believe everything in that specification, but it is a good
> start. Reading an abif file is not a problem, drawing the gel with as
> little coding as possible is another thing.
> Regards,
>
> Jose Blanca
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From biopython at maubp.freeserve.co.uk  Fri Mar 13 16:12:12 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 13 Mar 2009 16:12:12 +0000
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <200903131659.01590.jblanca@btc.upv.es>
References: <200902261612.54306.jblanca@btc.upv.es>
	<320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com>
	<200902271157.49948.jblanca@btc.upv.es>
	<200903131659.01590.jblanca@btc.upv.es>
Message-ID: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>

On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Hi:
> I've finished a first version of a program that reads a list of Applied
> Biosystems fsa files and draws a virtual gel. It does not read the sequence
> because my users are interested in fragment analysis, but the basic
> infrastructure is in place to do it.
> It does what my users need. It's quite slow, but I'm not investing time
> in optimizing it.

Do you have any example images online for people to look at?

Peter


From jblanca at btc.upv.es  Fri Mar 13 16:16:46 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 13 Mar 2009 17:16:46 +0100
Subject: [Biopython-dev] library to create gel image
In-Reply-To: <320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>
References: <200902261612.54306.jblanca@btc.upv.es>
	<200903131659.01590.jblanca@btc.upv.es>
	<320fb6e00903130912k455c49d6y6baff970ad064bd@mail.gmail.com>
Message-ID: <200903131716.46413.jblanca@btc.upv.es>

Here you have one:
http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/out.png
Jose Blanca

On Friday 13 March 2009 17:12:12 Peter wrote:
> On Fri, Mar 13, 2009 at 3:59 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> > Hi:
> > I've finished a first version of a program that reads a list of Applied
> > Biosystems fsa files and draws a virtual gel. It does not read the
> > sequence because my users are interested in fragment analysis, but the
> > basic infrastructure is in place to do it.
> > It does what my users need. It's quite slow, but I'm not investing
> > time in optimizing it.
>
> Do you have any example images online for people to look at?
>
> Peter




From chris.lasher at gmail.com  Sun Mar 15 05:43:34 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sun, 15 Mar 2009 01:43:34 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
	<8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
Message-ID: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>

On Fri, Mar 13, 2009 at 8:21 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
>>> On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>>> Another option to consider would be to switch to running git on
>>>> biopython.org, but use the git-cvsserver tool to provide an emulated
>>>> CVS server on top of the git repository.  This sounds possible in
>>>> theory, and would be nice for any "old fashioned" biopython developers
>>>> because it should be fairly transparent - they can continue to treat
>>>> it as CVS and just work on the main trunk.  This would require someone
>>>> competent to do the conversion and alter the server setup - we'd have
>>>> to talk to the OBF team about this.  However, if anyone has first hand
>>>> experience on git-cvsserver perhaps they could comment on whether this
>>>> sounds like a good plan or not.
>>>
>>> I must be missing something, Peter. Why would BioPython continue to
>>> operate with CVS? I suppose I just really hope to see BioPython
>>> running with something other than CVS, and I'd really like to see it
>>> go either under Bazaar or Git.
>
> I'm warming to the idea of git, and had noticed git includes the
> optional git-cvsserver tool which emulates a CVS server while using
> git underneath.  I was wondering if anyone had first hand experience
> of this.  If we did move from CVS to git (still hosted on
> biopython.org), this would seem to offer a nice migration path for
> our "old school" CVS developers - they can carry on as usual.  Of
> course, if none of us care about having to learn a new interface, then
> a simple switch would be less hassle to set up.  For the server side of
> things, we'll need to talk to the OBF team about any such move - as
> far as I know they've only managed CVS to SVN migrations in the past.
>
> Peter
>
>> Hi Chris,
>>
>> The idea is to do the switch in two steps:
>> - first we still have the main branch in CVS while we have git and/or
>> bzr branches synchronized with it for people to branch and contribute
>> - If this works nicely, we will switch to one of these systems
>> completely (while possibly keeping the other branch in sync, but this
>> is not yet decided)
>
> That does seem like a good plan.  Of course, there is the related
> issue of where we host the official repository: externally (e.g. on
> GitHub or Launchpad) or in house (on biopython.org).  I favour keeping
> the official repository on biopython.org but this will require OBF
> technical support (do we have the expertise within Biopython? Bartek?
> Chris?).
>
>> The first step is to some extent operational (I'm currently busy with
>> other stuff, but I'll hopefully get around to it this weekend), but the
>> second step requires a decision on our side (git or bzr?) and action on
>> the side of OBF (there is no git or Bazaar installed on the OBF servers).
>
> There is also the previously semi-agreed solution of switching from
> CVS to SVN on biopython.org, but this would be only a gradual
> improvement.  I gather there are mature tools for using git+svn
> together, so it should be better than using git+cvs together.  Other
> than meaning all the OBF hosted projects are on SVN (I think we are
> the last still on CVS), this is beginning to seem a bit pointless.
>
> Peter
>

Peter et al.,

I started off writing an email about why I think hosting at GitHub or
Launchpad is a better idea, but it got a bit verbose, so I just wrote
up a blog post instead. (Besides, links and images are more fun, and
make the intarwebs go 'round.) Please see
http://igotgenes.blogspot.com/2009/03/why-biopython-needs-to-move-to-github.html
or
http://tinyurl.com/a9o7ae

Chris


From mjldehoon at yahoo.com  Sun Mar 15 10:24:11 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 15 Mar 2009 03:24:11 -0700 (PDT)
Subject: [Biopython-dev] Bio.ExPASy
Message-ID: <76595.11423.qm@web62404.mail.re1.yahoo.com>


Hi everybody,

As discussed previously, I have moved the Bio.Prosite code to Bio.ExPASy, and I've added a ScanProsite module to Bio.ExPASy. I guess Bio.Enzyme should also move to Bio.ExPASy. See

http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html

for the documentation of Biopython as currently in CVS.

--Michiel.


From mjldehoon at yahoo.com  Sun Mar 15 12:53:28 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 15 Mar 2009 05:53:28 -0700 (PDT)
Subject: [Biopython-dev] Fw: Re:  Bio.Entrez catching more errors
Message-ID: <722257.11611.qm@web62401.mail.re1.yahoo.com>


--- On Sun, 3/15/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Whereas I think it's a good idea if Bio.Entrez catches
> more errors, I think the parser is a more suitable place to
> check for errors. See Bio.ExPASy.ScanProsite for an example
> of catching errors with an XML parser; this avoids using a
> File.UndoHandle.
> 
> --Michiel
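
A minimal sketch of checking in the parser rather than in _open,
assuming Python 3 and the standard library's expat bindings. The name
require_well_formed_xml is hypothetical, and this is not the code in
Bio.ExPASy.ScanProsite; it only illustrates that a reply which is not
well-formed XML (plain text, or a typical HTML error page) surfaces as
a parse error, so there is no need to peek at the handle first.

```python
import xml.parsers.expat


def require_well_formed_xml(data):
    """Return data unchanged if it parses as XML, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this chunk as the final (and only) piece of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        raise ValueError(
            "Expected XML but could not parse it (%s): %r" % (err, data[:60])
        )
    return data
```

Note that a well-formed XML document that merely wraps an error message
would still pass this check; spotting that needs knowledge of the
expected elements.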
> 
> --- On Tue, 3/10/09, Peter <biopython at maubp.freeserve.co.uk> wrote:
> 
> > From: Peter <biopython at maubp.freeserve.co.uk>
> > Subject: [Biopython-dev] Bio.Entrez catching more errors
> > To: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> > Date: Tuesday, March 10, 2009, 7:40 PM
> > Hi All,
> > 
> > It occurred to me that the Bio.Entrez._open function can look at the
> > retmode argument (if present) and spot if there is a mismatch between
> > the requested format (e.g. XML, HTML, text or asn.1) and the actual
> > data the NCBI returned.  Something along the following lines could be
> > added to the end of the _open function in Bio/Entrez/__init__.py to
> > achieve this:
> > 
> >     elif "retmode" in params and params["retmode"].lower()=="html" \
> >     and not data.lower().startswith("<html") \
> >     and not data.lower().startswith("<!doctype html") :
> >         raise TypeError("Requested HTML, but didn't get it: %s..." % data)
> >     elif "retmode" in params and params["retmode"].lower()=="xml" \
> >     and not data.lower().startswith("<?xml") :
> >         raise TypeError("Requested XML, but didn't get it: %s..." % data)
> >     elif "retmode" in params and params["retmode"] \
> >     and params["retmode"].lower()!="xml" \
> >     and data.lower().startswith("<?xml") :
> >         raise TypeError("Didn't request XML, but got it: %s..." % data)
> >     elif "retmode" in params and params["retmode"] \
> >     and params["retmode"].lower()!="html" \
> >     and (data.lower().startswith("<html") or \
> >          data.lower().startswith("<!doctype html")):
> >         #Expected for some error pages (e.g. the Bad Gateway caught above)
> >         raise TypeError("Didn't request HTML, but got it: %s..." % data)
> > 
> > I'm sure my XML/HTML detection could be made more robust here - I hope
> > the principle is clear.  My motivation is that I have noticed the NCBI
> > can return HTML error pages, and while we do catch some of these
> > explicitly (e.g. Bad Gateway, or Service Unavailable), I think any
> > HTML page when the user asked for XML, text or asn.1 should be
> > treated as an error.  Similarly, not getting XML when you ask for it etc.
> > 
> > Note that by raising the exception including the message text it
> > should be much easier to diagnose these failures.  As a tiny
> > refinement to the above code, we should only add the "..." if there is
> > more text to follow - this isn't always the case.
> > 
> > e.g. The following give an HTML error page (while some databases like
> > "protein" are better behaved in this respect):
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()
> > 
> > Similarly, these give an XML like fragment (which is not a valid XML
> > file in itself - arguably an NCBI bug; some databases like "protein"
> > are better behaved in this respect):
> > >>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
> > >>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()
> > 
> > My suggested change to Bio.Entrez would also catch the following
> > examples (using an invalid database) where the NCBI ignore the retmode
> > and return an HTML help page:
> > >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
> > >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()
> > 
> > In a less clear cut example, this would flag the following as an error
> > as the NCBI seem to return ASN.1 text instead of HTML here:
> > >>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()
> > 
> > Overall, I think this change should catch lots of errors which
> > otherwise may not be detected until later (e.g. while trying to parse
> > the file).
> > 
> > --------------------------------------------------------------------------------------------------
> > 
> > On another point, should we catch these responses as errors?
> > 
> > >>> efetch(db="snp", id="123456").read()
> > '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> > >>> efetch(db="snp", id="123456", retmode="html").read()
> > '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> > >>> efetch(db="snp", id="123456", retmode="xml").read()
> > '<?xml version="1.0"?>\n<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1: id: 123456 Error occurred: cannot get document summary\n\n</ExchangeSet>'
> > >>> efetch(db="snp", id="123456", retmode="text").read()
> > '1: id: 123456 Error occurred: cannot get document summary\n'
> > 
> > and,
> > >>> print efetch(db="homologene", retmode="html", id="fake").read()
> > <html>
> > <body>
> > <br/><h2>Error occurred: Empty id list - nothing todo</h2>...
> > 
> > Looking for the string "Error occurred: " looks fairly safe here, and
> > should cover a range of entries.  Of course, you can imagine false
> > positives too, e.g. a valid PUBMED plain text record for a tutorial
> > article with a title like "Yikes! An Error Occurred: A beginner's
> > Guide To Defensive Programming." could match.
> > 
> > Peter
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
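
The retmode check proposed above, including the "only add '...' if
there is more text" refinement, could be sketched as a standalone
function. This is a minimal sketch assuming Python 3; check_retmode,
preview, looks_like_html and looks_like_xml are hypothetical names, not
part of Bio.Entrez, and stripping leading whitespace before the prefix
test is an added assumption that is not in the original snippet.

```python
def preview(data, limit=60):
    """Return the start of data, adding '...' only if text was cut off."""
    return data if len(data) <= limit else data[:limit] + "..."


def looks_like_html(data):
    # Assumption: tolerate leading whitespace before the first tag.
    start = data.lstrip().lower()
    return start.startswith("<html") or start.startswith("<!doctype html")


def looks_like_xml(data):
    return data.lstrip().lower().startswith("<?xml")


def check_retmode(data, retmode=None):
    """Raise TypeError if data does not match the requested retmode."""
    if not retmode:
        return data  # nothing was requested, so nothing to compare
    mode = retmode.lower()
    html, xml = looks_like_html(data), looks_like_xml(data)
    if mode == "html" and not html:
        raise TypeError("Requested HTML, but didn't get it: %s" % preview(data))
    if mode == "xml" and not xml:
        raise TypeError("Requested XML, but didn't get it: %s" % preview(data))
    if mode != "xml" and xml:
        raise TypeError("Didn't request XML, but got it: %s" % preview(data))
    if mode != "html" and html:
        raise TypeError("Didn't request HTML, but got it: %s" % preview(data))
    return data
```

As in the proposal, this is only a prefix test; it says nothing about
whether the body is a valid record of the requested type.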


From chapmanb at 50mail.com  Sun Mar 15 18:54:43 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 15 Mar 2009 14:54:43 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
Message-ID: <20090315185443.GA30296@kunkel>

Hi all;
It is good to see the discussion around revision control systems;
Chris and Paulo's posts make some nice points. Source code
management is an important issue that influences perception of
Biopython and barriers to contributing.

My two cents on what we should do is:

- Pick a distributed source code management system. My preference
  is Git, only because it currently has more steam behind it.
  Git/Bazaar will likely end up being like the VHS/Beta debate.

- Test drive use of Git on an official GitHub repository. This would
  involve a few things:

  = Bartek and Giovanni: Can you coordinate on a single GitHub
    Biopython instance and remove the others to eliminate confusion?
  = Write up documentation for contributors. This is where we could use
    some volunteers from those interested to update the web pages.
    The two main places that need updating are:

    http://biopython.org/wiki/Contributing
    http://biopython.org/wiki/CVS
    
    I think we should ensure people are clear on what is being done
    and where you can contribute.

- Ensure GitHub can be synced with current CVS. Bartek, it sounds
  like you have a handle on this.

- Evaluate the success of Git. This is easy to measure in terms of
  new contributors, increased happiness, and what not. At the same
  time we can monitor how GitHub evolves over time.

- If successful, talk to the OpenBio team about hosting Git locally.

Peter, Michiel, et al -- how do you feel?

I think being cautious with the transition, as Peter recommends, is
important. I am old enough to remember Sourceforge being new and
everyone saying how it was stupid not to move there; then over time
Sourceforge got slow with all the users and people moved
away from it. This is just to say -- no one knows how GitHub (or
Launchpad) will evolve. OpenBio is a stable, small, nice community
and to the extent we can use their resources I believe we should.

Overall, the specifics of the above proposal aren't as important as
just doing something unambiguous and then evaluating how it works.
Right now things are a bit confusing, which I think could put off
new developers, who are always welcome.

Looking forward to talking about code instead of revision control,
Brad

> On Fri, Mar 13, 2009 at 8:21 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > On Thu, Mar 12, 2009 at 11:20 PM, Bartek Wilczynski
> > <bartek at rezolwenta.eu.org> wrote:
> >> On Thu, Mar 12, 2009 at 10:07 PM, Chris Lasher <chris.lasher at gmail.com> wrote:
> >>> On Thu, Feb 26, 2009 at 10:00 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> >>>> Another option to consider would be to switch to running git on
> >>>> biopython.org, but use the git-cvsserver tool to provide an emulated
> >>>> CVS server on top of the git repository. This sounds possible in
> >>>> theory, and would be nice for any "old fashioned" biopython developers
> >>>> because it should be fairly transparent - they can continue to treat
> >>>> it as CVS and just work on the main trunk. This would require someone
> >>>> competent to do the conversion and alter the server setup - we'd have
> >>>> to talk to the OBF team about this. However, if anyone has first hand
> >>>> experience on git-cvsserver perhaps they could comment on whether this
> >>>> sounds like a good plan or not.
> >>>
> >>> I must be missing something, Peter. Why would BioPython continue to
> >>> operate with CVS? I suppose I just really hope to see BioPython
> >>> running with something other than CVS, and I'd really like to see it
> >>> go either under Bazaar or Git.
> >
> > I'm warming to the idea of git, and had noticed git includes the
> > optional git-cvsserver tool which emulates a CVS server while using
> > git underneath. I was wondering if anyone had first hand experience
> > of this. If we did move from CVS to git (still hosted on
> > biopython.org), this would seem to offer a nice migration path for
> > our "old school" CVS developers - they can carry on as usual. Of
> > course, if none of us care about having to learn a new interface, then
> > a simple switch would be less hassle to set up. For the server side of
> > things, we'll need to talk to the OBF team about any such move - as
> > far as I know they've only managed CVS to SVN migrations in the past.
> >
> > Peter
> >
> >> Hi Chris,
> >>
> >> The idea is to do the switch in two steps:
> >> - first we still have the main branch in CVS while we have git and/or
> >> bzr branches synchronized with it for people to branch and contribute
> >> - If this works nicely, we will switch to one of these systems
> >> completely (while possibly keeping the other branch in sync, but this
> >> is not yet decided)
> >
> > That does seem like a good plan. Of course, there is the related
> > issue of where we host the official repository (externally, e.g. on
> > github or launchpad) or in house (on biopython.org). I favour keeping
> > the official repository on biopython.org but this will require OBF
> > technical support (do we have the expertise within Biopython? Bartek?
> > Chris?).
> >
> >> The first step is to some extent operational (I'm currently busy with
> >> other stuff, but I'll get around to it hopefully this weekend), but the
> >> second step requires a decision on our side (git or bzr?) and action on
> >> the side of OBF (there is no git or bazaar installed on OBF servers).
> >
> > There is also the previously semi-agreed solution of switching from
> > CVS to SVN on biopython.org, but this would be only a gradual
> > improvement. I gather there are mature tools for using git+svn
> > together, so it should be better than using git+cvs together. Other
> > than meaning all the OBF hosted projects are on SVN (I think we are
> > the last still on CVS), this is beginning to seem a bit pointless.
> >
> > Peter
> >
> 
> Peter et al.,
> 
> I started off writing an email about why I think hosting at GitHub or
> Launchpad is a better idea, but it got a bit verbose, so I just wrote
> up a blog post instead. (Besides, links and images are more fun, and
> make the intarwebs go 'round.) Please see
> http://igotgenes.blogspot.com/2009/03/why-biopython-needs-to-move-to-github.html
> or
> http://tinyurl.com/a9o7ae
> 
> Chris
> 
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From bartek at rezolwenta.eu.org  Sun Mar 15 20:12:46 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Sun, 15 Mar 2009 21:12:46 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090315185443.GA30296@kunkel>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
Message-ID: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>

Hi all,

On Sun, Mar 15, 2009 at 7:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> - Pick a distributed source code management system. My preference
>   is Git, only because it currently has more steam behind it.
>   Git/Bazaar will likely end up being like the VHS/Beta debate.
>
> - Test drive use of Git on an official GitHub repository. This would
>   involve a few things:
>
>   = Bartek and Giovanni: Can you coordinate on a single GitHub
>     Biopython instance and remove the others to eliminate confusion?
>   = Write up documentation for contributors. This is where we could use
>     some volunteers from those interested to update the web pages.
>     The two main places that need updating are:
>
>     http://biopython.org/wiki/Contributing
>     http://biopython.org/wiki/CVS
>
>     I think we should ensure people are clear on what is being done
>     and where you can contribute.
>
> - Ensure GitHub can be synced with current CVS. Bartek, it sounds
>   like you have a handle on this.
>
> - Evaluate the success of Git. This is easy to measure in terms of
>   new contributors, increased happiness, and what not. At the same
>   time we can monitor how GitHub evolves over time.
>

I think Brad (and others) have raised some important points.

- From the technical point of view, I don't see any serious problems:
  - I can set up a new branch on github (the current one includes some
    testing changes done by Giovanni)
  - it will be synchronized daily with changes from CVS
  - I'll set up a script to also save a backup of the official branch
    on the OBF server (to ensure that we do not depend on github)
  - I can write some (short) documentation on how to contribute.

I don't know whether anyone besides me is still interested in
test-driving launchpad/bzr as an alternative. If there is no one
else, I'll close the current testing branches on launchpad.

>
> Peter, Michiel, et al -- how do you feel?

I would also very happily hear from other developers. Especially if
there are any people who would be unhappy if we finally moved away
from CVS.

I'll post when I have a running setup of the cvs2git conversion.

cheers
Bartek


From bartek at rezolwenta.eu.org  Sun Mar 15 23:14:07 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 00:14:07 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
Message-ID: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>

Hi all,

I've now set up a script on my machine to update the biopython git
branch on github once every hour.
(thanks to Giovanni for creating and setting up the account)
It's created using the git fast-import script because of its speed.
You can find it here:

http://github.com/biopython/biopython/

It's a different branch than the one created earlier by Giovanni. The
old one is now called biopython_old
and will soon disappear from github (there were some temporary changes in it)

The script also leaves a copy of the repository on dev.open-bio.org,
just in case :)

I've written a short guide on the wiki :
http://biopython.org/wiki/GitMigration

Please correct or give me comments if you don't like something or if
you feel something is missing.

I'm going to a conference, so I might be slow in responding to emails
next week...

cheers
Bartek


From dalloliogm at gmail.com  Mon Mar 16 09:49:29 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 16 Mar 2009 10:49:29 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
	<8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
Message-ID: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>

On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski <
bartek at rezolwenta.eu.org> wrote:

> Hi all,


>
> I've written a short guide on the wiki :
> http://biopython.org/wiki/GitMigration


I also have a draft for some documentation... I can contribute it later this
morning (now I don't have time).
p.s. the biopython website seems to be offline at the moment...


--

My blog on bioinformatics (now in English): http://bioinfoblog.it


From biopython at maubp.freeserve.co.uk  Mon Mar 16 11:05:38 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 11:05:38 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<8b34ec180903151312q5a3b2bcdwc526aef5d4ca2cfc@mail.gmail.com>
	<8b34ec180903151614k37db9568sc04b10bcdb688139@mail.gmail.com>
	<5aa3b3570903160249l6db16b6ew349e394bc3e126dc@mail.gmail.com>
Message-ID: <320fb6e00903160405p5337f8b1m16d3c3d891950fd6@mail.gmail.com>

On Mon, Mar 16, 2009 at 9:49 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> On Mon, Mar 16, 2009 at 12:14 AM, Bartek Wilczynski <
> bartek at rezolwenta.eu.org> wrote:
>> Hi all,
>>
>> I've written a short guide on the wiki :
>> http://biopython.org/wiki/GitMigration
>
> I also have a draft for some documentation... I can contribute it later this
> morning (now I don't have time).

In the meantime, I have updated the following pages accordingly:

http://biopython.org/wiki/CVS
http://biopython.org/wiki/SVN
http://biopython.org/wiki/Subversion_migration
http://biopython.org/wiki/Git #place holder, will be important if we
do fully move to git
http://biopython.org/wiki/GitMigration #Fixing biopython to Biopython etc

Peter

> p.s. the biopython website seems to be offline at the moment...

All the OBF pages were out for a bit this morning (e.g. OBF helpdesk
#332), but they are back now.


From biopython at maubp.freeserve.co.uk  Mon Mar 16 11:30:12 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 11:30:12 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090315185443.GA30296@kunkel>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
Message-ID: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>

On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> It is good to see the discussion around revision control systems;
> Chris and Paulo's posts make some nice points. Source code
> management is an important issue that influences perception of
> Biopython and barriers to contributing.
>
> My two cents on what we should do is:
>
> - Pick a distributed source code management system. My preference
>   is Git, only because it currently has more steam behind it.
>   Git/Bazaar will likely end up being like the VHS/Beta debate.

I would agree git has more mind share, but I have no technical reason
to choose one over the other.

In terms of read only access, having a mirrored trunk branch on both
git (e.g. github) and bazaar (e.g. launchpad) is possible for
evaluation purposes.

> - Test drive use of Git on an official GitHub repository. This would
>   involve a few things ...

Giovanni has shared the github "Biopython" user information so we
(i.e. Biopython) can use that for any official presence on github -
which is great.  Bartek and Giovanni seem to have this working OK.

I think having the latest CVS trunk in Launchpad automatically is
stalled because they (launchpad) can't cope with a simple
username/password for accessing a remote CVS server.  Is that right,
Bartek?

> - Evaluate the success of Git. This is easy to measure in terms of
>   new contributors, increased happiness, and what not. At the same
>   time we can monitor how GitHub evolves over time.

It may not be that easy to measure in practice...

> - If successful, talk to the OpenBio team about hosting Git locally.

I have contacted the OBF to ask who we should talk to about this idea
(given it will probably involve server access to install new software
and perhaps changing firewall/port settings).

> Peter, Michiel, et al -- how do you feel?

I'm happy in principle with a switch to git, ideally hosted on
biopython.org (see below).

> I think being cautious with the transition, as Peter recommends, is
> important. I am old enough to remember Sourceforge being new and
> everyone saying how it was stupid not to move there; then over time
> Sourceforge got slow with all the users and people moved
> away from it. This is just to say -- no one knows how GitHub (or
> Launchpad) will evolve. OpenBio is a stable, small, nice community
> and to the extent we can use their resources I believe we should.

I did have that same example in mind - having to depend on a third
party like GitHub, LaunchPad or Sourceforge is great until things go
wrong.  The Open Bio Foundation is much smaller, and while they don't
have 100% uptime either, they are normally very responsive to issues
because they only support a small number of projects.  Of course,
ideally we might have both - an OBF hosted (git) repository on
biopython.org, synced to github for people to enjoy its collaborative
additions.

> Overall, the specifics of the above proposal aren't as important as
> just doing something unambiguous and then evaluating how it works.
> Right now things are a bit confusing, which I think could put off
> new developers, who are always welcome.
>
> Looking forward to talking about code instead of revision control,

That would be nice :)

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 16 12:16:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 12:16:06 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
Message-ID: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>

Hi All,

I think we should probably do another release soon - for one thing the
NCBI updated their DTD files, and it would be great if Biopython
shipped with them included (see discussion on Bug 2678).

We still need to work on the documentation for
Bio.Graphics.GenomeDiagram (Bug 2671) and Bio.Motif (Bug 2694), but in
the meantime I think it would be sensible to do a Biopython 1.50 beta
release in the next couple of weeks.

I'd like to include the following changes as part of the beta, but it
would be sensible to have someone else try these out first.  Any
volunteers?

Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g.
align[1:2,5:-5]
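
For anyone wanting to try the idea, here is a minimal sketch of what
row/column indexing like align[1:2,5:-5] could mean, assuming Python 3.
ToyAlignment is a hypothetical stand-in that stores rows as plain
strings; it is not the patch on Bug 2551 nor Biopython's alignment
class.

```python
class ToyAlignment:
    """Hypothetical alignment: a list of equal-length sequence strings."""

    def __init__(self, rows):
        rows = list(rows)
        if len({len(row) for row in rows}) > 1:
            raise ValueError("all rows must have the same length")
        self.rows = rows

    def __getitem__(self, index):
        if isinstance(index, tuple):
            if len(index) != 2:
                raise TypeError("expected align[rows, columns]")
            row_index, col_index = index
            if isinstance(row_index, int):
                # Single row: return a string (or one letter).
                return self.rows[row_index][col_index]
            # Several rows: apply the column index to each selected row.
            return ToyAlignment(row[col_index] for row in self.rows[row_index])
        if isinstance(index, int):
            return self.rows[index]
        return ToyAlignment(self.rows[index])
```

With rows "ACGTACGTACGT", "ACGTTCGTACGA" and "ACCTACGTACGT", the
expression align[1:2, 5:-5] gives a one-row alignment holding "CG".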

Any other nominations for Biopython 1.50?

I'd also like to resolve Bug 2597 (Enforce alphabet letters in Seq
objects), but that might deserve an alpha release given the higher
chance of breaking existing scripts...

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 16 13:18:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 13:18:19 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
Message-ID: <320fb6e00903160618g2b5b6acs6695fab5ef432bc7@mail.gmail.com>

Hi all,

I'm thinking a news post on
http://news.open-bio.org/news/category/obf-projects/biopython/ about
version control would be a good idea at this point.  How about this -
keywords like git, subversion and the other project names would be
links:

<start draft>
Title: Biopython and version control systems

Originally, all the OBF hosted projects used CVS for their source code
repositories.  At the start of 2008, BioPerl and BioJava moved over to
Subversion (SVN), followed by BioSQL.  Biopython was originally going
to do the same, but this didn't actually happen.  Having all the Bio*
projects using the same version control system would have simplified
server administration for the OBF, but wouldn't have actually made
that much difference to Biopython development.  Discussion has since
shifted towards next generation distributed version control systems
like git or bazaar.

Quote from Linus Torvalds,
<italics>
The slogan of Subversion for a while was "CVS done right", or
something like that, and if you start with that kind of slogan,
there's nowhere you can go. There is no way to do CVS right.
</italics>

In addition to creating the Linux kernel, Linus Torvalds more recently
wrote git, a prominent example of a distributed version control
system.  Rather than switching from CVS to SVN, the BioRuby project
chose instead to use git, hosted on github.  Biopython is considering
doing something similar - using a distributed version control system
like git should make it easier for potential Biopython contributors to
manage their own local copies of Biopython under version control.

Initially for evaluation purposes only, Giovanni and Bartek have set up
a Biopython branch on GitHub, which will automatically be updated from
the OBF hosted Biopython CVS repository [Link to wiki page].  If this
is favorably received, then moving Biopython from CVS to git seems
likely at some point this year.

Peter on behalf of the Biopython developers
<end draft>

I hope this has everyone's approval... if not please reply here so we
can revise this before it gets posted.  Note that I've avoided getting
into specifics here, such as hosting arrangements, as the details will
go out of date.

Peter


From bartek at rezolwenta.eu.org  Mon Mar 16 14:24:42 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 15:24:42 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
Message-ID: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>

On Mon, Mar 16, 2009 at 12:30 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> - Pick a distributed source code management system. My preference
> >>   is Git, only because it currently has more steam behind it.
> >>   Git/Bazaar will likely end up being like the VHS/Beta debate.
>
> I would agree git has more mind share, but I have no technical reason
> to choose one over the other.
>
> In terms of read only access, having a mirrored trunk branch on both
> git (e.g. github) and bazaar (e.g. launchpad) is possible for
> evaluation purposes.

It is possible, but I don't know if we should do this. To some extent
having too much choice might be problematic...

We've done some tests on both bzr and git, and it seems that both can
do the job for us. I assume that the purpose of "test-driving" instead
of switching directly to git is to give us the possibility of going
back in case things go really badly. But I don't think that is a likely
event. Bigger projects are using git (or bzr) and doing fine, so we
shouldn't have problems either.

On the other hand, I don't expect that having the possibility to
test-drive two options is going to make the decision any easier. I
don't expect many people to try both options, and even if they do, I
don't think there will be a clear consensus that one is better than the
other. Honestly, we can't expect all developers to learn two tools just
to help us choose... Even though I was myself one of the proponents of
switching to bzr, I think that we should focus on one option, and git
seems to be the one with the bigger mind share among biopythonistas.
So I would vote for dropping the discussion on bzr and focusing on
making sure that no one is left behind with their problems during the
(possibly not too long) transition to git.


>
>> - Test drive use of Git on an official GitHub repository. This would
> >>   involve a few things ...
> >
> > Giovanni has shared the github "Biopython" user information so we
> > (i.e. Biopython) can use that for any official presence on github -
> > which is great.  Bartek and Giovanni seem to have this working OK.
> >
> > I think having the latest CVS trunk in Launchpad automatically is
> > stalled because they (launchpad) can't cope with a simple
> > username/password for accessing a remote CVS server.  Is that right
> Bartek?
>

Yes, we now have the biopython branch on github synchronized with CVS
on an hourly basis.
There would be no problem synchronizing a branch on launchpad in the
same script, but I didn't do it for the reasons explained above.

>> - Evaluate the success of Git. This is easy to measure in terms of
> >>   new contributors, increased happiness, and what not. At the same
> >>   time we can monitor how GitHub evolves over time.
>
> It may not be that easy to measure in practice...
>
Well, if everyone is able to use git, I'd say it's a success. We
don't need a perfect solution. We want to move to _a_ distributed
version control system.

> I did have that same example in mind - having to depend on a third
> party like GitHub, LaunchPad or Sourceforge is great until things go
> wrong.  The Open Bio Foundation is much smaller, and while they don't
> have 100% uptime either, they are normally very responsive to issues
> because they only support a small number of projects.  Of course,
> ideally we might have both - an OBF hosted (git) repository on
> biopython.org, synced to github for people to enjoy its collaborative
> additions.
>

There is one difference between moving to sourceforge and moving to git.
With git, it is much less of a problem to switch hosting. The
fundamental idea is that every branch (including all local developer
branches) can be a "master" branch. So switching to a different hosting
location is a matter of an e-mail on the developer mailing list telling
people to update the location of the "master" in their branches. So I
think that we need to worry less about git hosting than we would need
to worry about cvs (or svn for that matter).
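
To illustrate the point about switching hosting, here is a minimal
Python sketch. It assumes git is installed and on the PATH, that the
remote is called "origin", and that the URLs are placeholders. The
helper names switch_hosting and current_url are made up for this
example; the only git commands used are "git remote set-url" (present
in current git releases) and "git config --get".

```python
import subprocess


def switch_hosting(repo_dir, new_url, remote="origin"):
    """Point an existing clone at a new hosting location.

    Only the URL stored for the remote changes; local history and
    branches are left untouched.
    """
    subprocess.run(
        ["git", "remote", "set-url", remote, new_url],
        cwd=repo_dir,
        check=True,
    )


def current_url(repo_dir, remote="origin"):
    """Return the URL currently configured for the given remote."""
    result = subprocess.run(
        ["git", "config", "--get", "remote.%s.url" % remote],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Each developer would run something like this once in their own clone
after the announcement e-mail; nothing else in the clone has to change.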

cheers
  Bartek


From biopython at maubp.freeserve.co.uk  Mon Mar 16 15:00:16 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 15:00:16 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
Message-ID: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>

On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
>
> On Mon, Mar 16, 2009 at 12:30 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Sun, Mar 15, 2009 at 6:54 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> - Pick a distributed source code management system. My preference
>>>   is Git, only because it currently has more steam behind it.
>>>   Git/Bazaar will likely end up being like the VHS/Beta debate.
>>
>> I would agree git has more mind share, but I have no technical reason
>> to choose one over the other.
>>
>> In terms of read only access, having a mirrored trunk branch on both
>> git (e.g. github) and bazaar (e.g. launchpad) is possible for
>> evaluation purposes.
>
> It is possible, but I don't know if we should do this. To some extent
> having too much choice might be problematic...

True.

> We've done some tests on both bzr and git and it seems that both
> can do the job for us. I assume, that the purpose of "test-driving"
> instead of directly switching to git is to give us a possibility to go
> back in case things go really bad. But I don't think it's a likely
> event. Bigger projects are using git (or bzr) and doing fine, so
> we shouldn't have problems either.

Well yes, having a fall back plan during this migration is essential.

I do think there is a separate need for "test driving" for those of us with
Biopython CVS access who don't have personal experience with git
(or github).  Making the switch before then would be a very bad idea.

I personally need to make time to play with git and github, doing a
couple of *real* branches and merges.  I hope to do so this week; some
of the changes I'd like to do for Biopython 1.50 would make good
candidates... but this is time that might otherwise be spent on bug
fixes, documentation etc.  And there is of course my real job too... ;)

Related to this, what OS and version of git are you (Bartek and Giovanni) using?

> On the other hand I don't expect that having the possibility to
> test-drive two options is going to make the decision any easier.
> I don't expect too many people to try both options and even if it
> happens I don't think there will be a clear acclamation that one
> is better than the other.

I agree.

> Honestly, we can't expect that all developers will learn two tools
> just to help us choose... Even though I was myself one of the
> proponents of switching to bzr.
> I think that we should focus on one option and git seems to be the one
> with bigger mind share among biopythonistas.
> So I would vote for dropping the discussion on bzr and focusing on
> making sure that noone is left behind with their
> problems during the (possibly not too long) transition to git.

I'm happy with dropping discussion on bzr, in favour of git.

(As an aside I always liked the term biopythoneers, but biopythonistas
is fun too.)

>> Giovanni has shared the github "Biopython" user information so we
>> (i.e. Biopython) can use that for any official presence on github -
>> which is great.  Bartek and Giovanni seem to have this working OK.
>>
>> I think having the latest CVS trunk in Launchpad automatically is
>> stalled because they (launchpad) can't cope with a simple
>> username/password for accessing a remote CVS server.  Is that right
>> Bartek?
>
> Yes, we have now the biopython branch on github synchronized with CVS
> on an hourly basis.
> There is no problem with synchronizing a branch on launchpad in the
> same script, but I didn't do it for reasons explained above.

OK.  Do you want to make sure your Launchpad branch is clearly labeled
as not current?

> Well, If everyone will be able to use git I'd say it's a success. We
> don't need a perfect solution. We want to move to _a_ distributed
> version control system.

Well, I suspect there are some silent contributors who don't care
either way - it's not perfect, but CVS works well enough.  Better
the devil you know ... ;)

> ...
> There is one difference between moving to sourceforge and moving to git.
> With git, it is much less of a problem to switch hosting... So I think that we
> need to worry less about git hosting than we would need to worry about
> cvs (or svn for that matter).

That is another good reason to pick git.

Peter


From bartek at rezolwenta.eu.org  Mon Mar 16 16:55:40 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 17:55:40 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
Message-ID: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>

On Mon, Mar 16, 2009 at 4:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> I do think there is a separate need for "test driving" for those of us with
> Biopython CVS access who don't have personal experience with git
> (or github).  Making the switch before then would be a very bad idea.
>
> I personally need to make time to play with git and github, doing a
> couple of *real* branches and merges.  I hope to do so this week; some
> of the changes I'd like to do for Biopython 1.50 would make good
> candidates... but this is time that might otherwise be spent on bug
> fixes, documentation etc.  And there is of course my real job too... ;)
>
> Related to this, what OS and version of git are you (Bartek and Giovanni) using?
>
I'm currently using the binary installations on mac (intel) and ubuntu
(8.10). I haven't experienced any problems, which is to be expected on
unix-like systems. It would be interesting to hear about people's
experiences on windows.

>
> OK.  Do you want to make sure your Launchpad branch is clearly labeled
> as not current?
>

I've removed the bzr branches from launchpad, so there should be no
more confusion.

cheers
Bartek


From nuin at genedrift.org  Mon Mar 16 16:58:26 2009
From: nuin at genedrift.org (Paulo Nuin)
Date: Mon, 16 Mar 2009 12:58:26 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>	<20090315185443.GA30296@kunkel>	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>
Message-ID: <49BE8532.9040701@genedrift.org>

No problem on Vista.

 Git (version 1.5.6.1-preview20080701)

Paulo


Bartek Wilczynski wrote:
> On Mon, Mar 16, 2009 at 4:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>   
>> I do think there is a separate need for "test driving" for those of us with
>> Biopython CVS access who don't have personal experience with git
>> (or github).  Making the switch before then would be a very bad idea.
>>
>> I personally need to make time to play with git and github, doing a
>> couple of *real* branches and merges.  I hope to do so this week; some
>> of the changes I'd like to do for Biopython 1.50 would make good
>> candidates... but this is time that might otherwise be spent on bug
>> fixes, documentation etc.  And there is of course my real job too... ;)
>>
>> Related to this, what OS and version of git are you (Bartek and Giovanni) using?
>>
>>     
> I'm currently using the binary installations on mac (intel) and ubuntu
> (8.10). I haven't
> experienced any problems which is quite expected on unix-like systems.
> It would be
> interesting to hear from people's experiences on windows.
>
>   
>> OK.  Do you want to make sure your Launchpad branch is clearly labeled
>> as not current?
>>
>>     
>
> I've removed the bzr branches from launchpad, so there should be no
> more confusion.
>
> cheers
> Bartek
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>   


From biopython at maubp.freeserve.co.uk  Mon Mar 16 17:07:18 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Mar 2009 17:07:18 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <49BE8532.9040701@genedrift.org>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>
	<49BE8532.9040701@genedrift.org>
Message-ID: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com>

On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>
> No problem on Vista.
>
> Git (version 1.5.6.1-preview20080701)
>
> Paulo

Hi Paulo,

Could you be a bit more precise about the version you are using and
where you got it from? i.e. Are you using cygwin or the Windows native
port, http://code.google.com/p/msysgit/

And did you mean in general you have no problems with git on Windows
Vista, or have you also tried fetching Biopython from github,
building, testing (and installing it)?  For example, are there any
newline issues from the unit tests?  This is one area where CVS and git
may differ slightly...
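
As a rough way to check for this, here is a minimal Python sketch that
counts the two kinds of line ending in a file. The function name
count_line_endings is just an illustration (it is not part of
Biopython or git), and it assumes the file is small enough to read in
one go.

```python
def count_line_endings(path):
    """Count Windows (CRLF) and Unix (bare LF) line endings in a file."""
    with open(path, "rb") as handle:  # binary mode: no newline translation
        data = handle.read()
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF bytes that are not part of a CRLF
    return {"crlf": crlf, "lf": lf}
```

Running this over the same test input file from a CVS checkout and
from a git clone should show whether the two differ in line endings.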

Thanks,

Peter


From dalloliogm at gmail.com  Mon Mar 16 19:57:38 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 16 Mar 2009 20:57:38 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
Message-ID: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>

On Mon, Mar 16, 2009 at 4:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Mon, Mar 16, 2009 at 2:24 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>
> Related to this, what OS and version of git are you (Bartek and Giovanni)
> using?


I am using git 1.5.4.3 on an Ubuntu 8.04 distribution.
At home, I am using a git binary distribution on an Ubuntu 8.10.

At the moment I am having some strange problems, related to the fact that I
previously had a branch named 'biopython' in my account, so it seems that
GitHub doesn't understand well that the old branch has been renamed.
For example, I don't have the 'Fork' button... but it must be a temporary
problem; I have already contacted GitHub's tech support.


> Peter
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From bartek at rezolwenta.eu.org  Mon Mar 16 21:04:57 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 16 Mar 2009 22:04:57 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>
Message-ID: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com>

Hi,
On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
> At the moment I am having some strange problems, relative to the fact that I
> had a branch previously named as 'biopython' in my account, so it seems
> don't understand well the fact that the old branch has been renamed.
> For example, I don't have the 'Fork' button.... but it must be a temporary
> problem, I already contacted the github's tech support.
>

This is connected with the change I made in the repository. Namely, I
renamed the branch created by Giovanni to biopython-old and created a
new one (the "official" one) called biopython again.

The "rename" feature was flagged as experimental, with warnings that it
can affect the branches forked from the branch previously created by
Giovanni. I don't think we should expect to use it again.

These two branches were incompatible, since they were made with
different scripts (different revision numbers). So if you need to
retain some changes you made to the old branch, please export them from
your local copy as changesets and apply them to a new fork made from
the new repository.

I'm sorry for the inconvenience.

cheers
Bartek


From chapmanb at 50mail.com  Mon Mar 16 22:42:40 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 16 Mar 2009 18:42:40 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
Message-ID: <20090316224240.GA57054@sobchak.mgh.harvard.edu>

Hey everyone;
Wow, y'all are quick. Bartek, Giovanni and Peter -- thanks for all
the hard work and organization. Consolidating a couple of threads
below...

> >> I've written a short guide on the wiki :
> >> http://biopython.org/wiki/GitMigration
> >
> > I also have a draft for some documentation... I can contribute it later this
> > morning (now I don't have time).
> 
> In the meantime, I have updated the following pages accordingly:

The documentation looks awesome. My only suggestion would be to
change the navigation link that currently points to CVS to point to a
generic page like SourceCode. Then that landing page could link
to the current CVS and explain we are working to transition to 
Git, with links to those pages. Currently, the Git docs are a
bit buried from the front page.

Peter, I don't appear to have wiki permissions to edit the navigation
bar; do you?

Peter:
> I'm thinking a news post on
> http://news.open-bio.org/news/category/obf-projects/biopython/ about
> version control would be a good idea at this point.  How about this -

This is great, and I would move the last paragraph describing
the Git repository to the beginning; start with what we are doing and
then describe the rationale. This should help for those with ADD, and
also give more prominent credit to Bartek, Giovanni and you for the
work that went into this.

> > - Evaluate the success of Git. This is easy to measure in terms of
> >   new contributors, increased happiness, and what not. At the same
> >   time we can monitor how GitHub evolves over time.
> 
> It may not be that easy to measure in practice...

How about these two metrics:

- How do current developers like it? Beyond the initial learning
  curve, does it work at least as well as CVS for day to day stuff?

- Does it lower the entry barriers to contributing to Biopython? The
  main reason to do this is to ease the initial work for coders who
  feel CVS/Patches/Bugzilla is too much. If we find new contributors
  through this, it's a win.

Modest expectations are good. If either of these fail miserably, then 
we can re-evaluate.

Brad


From chapmanb at 50mail.com  Mon Mar 16 22:55:58 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 16 Mar 2009 18:55:58 -0400
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
Message-ID: <20090316225558.GC57054@sobchak.mgh.harvard.edu>

Peter;

> I think we should probably do another release soon 

Good call. +1 from me.

> I'd like to include the following changes as part of the beta, but it
> would be sensible to have someone else try these out first.  Any
> volunteers?
> 
> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files

The code for this looked good when I reviewed it earlier. I will
test it out with some solexa reads from here this week; any reason
not to check the patch and files into CVS? Then I can fire up my
coal-powered revision control system, feed two punch cards into the
mouth of the machine, hope the vacuum tubes don't burn out again,
and check it out locally.

Brad


From tiagoantao at gmail.com  Tue Mar 17 00:11:50 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 00:11:50 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>

I've been reading this thread and mainly staying silent but there is
one question that is not clear in my mind but I believe it is
important:

How is the "official" biopython trunk controlled? Currently what is on
CVS is the gospel, and Peter and Michiel essentially have control of
what is there and what is labelled as a "biopython distribution". How
will this work now?
The second question, related to the first, is how different branches
(belonging to different people) will be managed. I can see people
starting to work on the same code in different directions and then
having problems merging everything together.

Maybe these questions stem from my ignorance of distributed version
control. But, if not, I think they should be resolved before
advancing.

My suggestion: write (or at least informally agree) the policy before
advancing. While distributed version control seems a good idea (no
opposition), it also seems a good way to create new problems.

BTW, I would be tempted to suggest that a labelled release would be a
good starting point for a distributed revision control bootstrap.

On Mon, Mar 16, 2009 at 10:42 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hey everyone;
> Wow, y'all are quick. Bartek, Giovanni and Peter -- thanks for all
> the hard work and organization. Consolidating a couple of threads
> below...
>
>> >> I've written a short guide on the wiki :
>> >> http://biopython.org/wiki/GitMigration
>> >
>> > I also have a draft for some documentation... I can contribute it later this
>> > morning (now I don't have time).
>>
>> In the meantime, I have updated the following pages accordingly:
>
> The documentation looks awesome. My only suggestion would be to
> change the navigation link that currently points to CVS to point to a
> generic page like SourceCode. Then that landing page could link
> to the current CVS and explain we are working to transition to
> Git, with links to those pages. Currently, the Git docs are a
> bit buried from the front page.
>
> Peter, I don't appear to have wiki permissions to edit the navigation
> bar; do you?
>
> Peter:
>> I'm thinking a news post on
>> http://news.open-bio.org/news/category/obf-projects/biopython/ about
>> version control would be a good idea at this point.  How about this -
>
> This is great, and I would move the last paragraph describing
> the Git repository to the beginning; start with what we are doing and
> then describe the rationale. This should help for those with ADD, and
> also give more prominent credit to Bartek, Giovanni and you for the
> work that went into this.
>
>> > - Evaluate the success of Git. This is easy to measure in terms of
>> >   new contributors, increased happiness, and what not. At the same
>> >   time we can monitor how GitHub evolves over time.
>>
>> It may not be that easy to measure in practice...
>
> How about these two metrics:
>
> - How do current developers like it? Beyond the initial learning
>   curve, does it work at least as well as CVS for day to day stuff?
>
> - Does it lower the entry barriers to contributing to Biopython? The
>   main reason to do this is to ease the initial work for coders who
>   feel CVS/Patches/Bugzilla is too much. If we find new contributors
>   through this, it's a win.
>
> Modest expectations are good. If either of these fail miserably, then
> we can re-evaluate.
>
> Brad


-- 
 "A man who dares to waste one hour of time has not discovered the
value of life" - Charles Darwin


From dschruth at u.washington.edu  Mon Mar 16 23:15:39 2009
From: dschruth at u.washington.edu (David Schruth)
Date: Mon, 16 Mar 2009 16:15:39 -0700
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <20090316225558.GC57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
Message-ID: <49BEDD9B.6030905@u.washington.edu>

I've got some 454 and Solid data you could test it on too.

Has anybody else looked into how these other two Next Gen formats might 
complicate things?

Brad Chapman wrote:
> Peter;
>
>   
>> I think we should probably do another release soon 
>>     
>
> Good call. +1 from me.
>
>   
>> I'd like to include the following changes as part of the beta, but it
>> would be sensible to have someone else try these out first.  Any
>> volunteers?
>>
>> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
>>     
>
> The code for this looked good when I reviewed it earlier. I will
> test it out with some solexa reads from here this week; any reason
> not to check the patch and files into CVS? Then I can fire up my
> coal-powered revision control system, feed two punch cards into the
> mouth of the machine, hope the vacuum tubes don't burn out again,
> and check it out locally.
>
> Brad

From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 00:40:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Mar 2009 20:40:01 -0400
Subject: [Biopython-dev] [Bug 2790] New: Genepop parser creates a full
	representation of the file on memory
Message-ID: <bug-2790-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2790

           Summary: Genepop parser creates a full representation of the file
                    on memory
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: PopGen
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: tiagoantao at gmail.com


The genepop parser creates a full representation of the file in memory.

This is fine for most users (e.g. with 100-200 individuals and 100 markers),
but more and more people now turn up with thousands of individuals and/or
thousands of loci. In some cases the whole file doesn't fit in memory.

An alternative (iterator-based) interface has to be created which only
maintains a subset of the file in memory.
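
A minimal sketch of what such an iterator could look like, for
discussion only. The function name iter_genepop and the shape of the
yielded tuples are assumptions made for this example and are not the
existing Bio.PopGen interface. It assumes the usual Genepop layout: a
title line, then locus names (one per line or comma separated), then
"Pop" lines separating populations, with each individual written as
"name , genotype genotype ...".

```python
def iter_genepop(handle):
    """Yield (population_index, individual_name, genotypes) lazily.

    `handle` is any iterable of text lines. Only the locus names and
    the current line are held in memory, never the whole file.
    """
    lines = iter(handle)
    next(lines, None)  # first line is a free-text title; skipped here
    loci = []
    pop_index = -1  # no "Pop" line seen yet, so still in the header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.lower() == "pop":
            pop_index += 1
            continue
        if pop_index < 0:
            # Header: locus names, one per line or comma separated.
            loci.extend(n.strip() for n in line.split(",") if n.strip())
            continue
        # Genotypes contain no commas, so split on the last comma.
        name, _, genotype_text = line.rpartition(",")
        yield pop_index, name.strip(), dict(zip(loci, genotype_text.split()))
```

Genotypes are kept as raw strings here (e.g. "0102") so that the
sketch does not have to guess between 2-digit and 3-digit allele
encodings.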


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From idoerg at gmail.com  Tue Mar 17 00:49:39 2009
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 16 Mar 2009 17:49:39 -0700
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <49BEDD9B.6030905@u.washington.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
	<49BEDD9B.6030905@u.washington.edu>
Message-ID: <1237250979.20135.5.camel@lafa>

I have.

For one thing, GenBank has some new files that break the current parser.

LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007


This is a typical header for an environmental sequence (notice the ENV).
Note that this does not necessarily have to be a next-gen sequence. It
can also be Sanger. The point is, it's not genome associated, but
obtained using metagenomic methods.

To our business: the "rc" breaks the parser.
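
To make the problem concrete, here is a minimal sketch that splits the LOCUS
line above on whitespace and looks at the token after the length. This is
not the Bio.GenBank Scanner code; split_locus_line and its field names are
made up for illustration, and it assumes the line has exactly eight
whitespace-separated tokens. Ordinary nucleotide records have "bp" in the
unit position, whereas this record has "rc".

```python
def split_locus_line(line):
    """Split a GenBank LOCUS line into named fields (illustrative only)."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "LOCUS":
        raise ValueError("Unexpected LOCUS line: %r" % line)
    keys = ("name", "length", "unit", "molecule",
            "topology", "division", "date")
    return dict(zip(keys, fields[1:]))


line = ("LOCUS       ABDH01000000           55108 rc    "
        "DNA     linear   ENV 26-NOV-2007")
info = split_locus_line(line)
# A parser that insists on "bp" (or "aa") here would reject this record.
print(info["unit"], info["division"])
```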


The file itself is attached. Note that in the end it does not have a
sequence, but rather a WGS field that points to sequence files.

I'll actually be happy to take this one.

./I


On Mon, 2009-03-16 at 16:15 -0700, David Schruth wrote:
> I've got some 454 and Solid data you could test it on too.
> 
> Has anybody else looked into how these other two Next Gen formats might 
> complicate things?
> 
> Brad Chapman wrote:
> > Peter;
> >
> >   
> >> I think we should probably do another release soon 
> >>     
> >
> > Good call. +1 from me.
> >
> >   
> >> I'd like to include the following changes as part of the beta, but it
> >> would be sensible to have someone else try these out first.  Any
> >> volunteers?
> >>
> >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
> >>     
> >
> > The code for this looked good when I reviewed it earlier. I will
> > test it out with some solexa reads from here this week; any reason
> > not to check the patch and files into CVS? Then I can fire up my
> > coal-powered revision control system, feed two punch cards into the
> > mouth of the machine, hope the vacuum tubes don't burn out again,
> > and check it out locally.
> >
> > Brad
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >   
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
-- 
Iddo Friedberg, Ph.D.
CALIT2 Atkinson Hall MC #0446
University of California San Diego
9500 Gilman Drive
La Jolla, CA 92093-0446 USA
+1 (858) 534-0570
http://iddo-friedberg.org
-------------- next part --------------
LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007
DEFINITION  Termite gut metagenome, whole genome shotgun sequencing project.
ACCESSION   ABDH00000000
VERSION     ABDH00000000.1  GI:161074815
PROJECT     GenomeProject:19107
DBLINK      Project:19107
KEYWORDS    WGS.
SOURCE      termite gut metagenome
  ORGANISM  termite gut metagenome
            unclassified sequences; metagenomes; organismal metagenomes.
REFERENCE   1  (bases 1 to 55108)
  AUTHORS   Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
            Richardson,T.H., Stege,J.T., Cayouette,M., McHardy,A.C.,
            Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M.,
            Martin,H.G., Kunin,V., Dalevi,D., Madejska,J., Kirton,E., Platt,D.,
            Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N.C.,
            Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C.,
            Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.D., Chang,C.,
            Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and
            Leadbetter,J.R.
  TITLE     Metagenomic and functional analysis of hindgut microbiota of a
            wood-feeding higher termite
  JOURNAL   Nature 450 (7169), 560-565 (2007)
   PUBMED   18033299
REFERENCE   2  (bases 1 to 55108)
  AUTHORS   Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
            Richardson,T.H., Stege,J.T., Cayouette,M., Djordjevic,G.,
            Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Garcia Martin,H.,
            Kunin,V., McHardy,A.C., Dalevi,D., Madejska,J., Kirton,E.,
            Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N.,
            Kyrpides,N., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M.,
            Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.,
            Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P.
            and Leadbetter,J.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (27-JUN-2007) Microbial Ecology Program, US DOE Joint
            Genome Institute, 2800 Mitchell Drive B100, Walnut Creek, CA
            94598-1698, USA
COMMENT     The termite gut metagenome whole genome shotgun (WGS) project has
            the project accession ABDH00000000.  This version of the project
            (01) has the accession number ABDH01000000, and consists of
            sequences ABDH01000001-ABDH01055108.
            URL -- http://www.jgi.doe.gov
            JGI Project ID:4001605
            Contact: Philip Hugenholtz (PHugenholtz at lbl.gov)
            sampling site latitude: N10.11.260; sampling site longitude:
            W083.51.345; sampling site altitude: 310 m AMSL; sample type: lumen
            content; host species: Nasutitermes sp.; anatomic site: gut,
            proctodeal segment 3, lumen; association type: symbiosis; sample
            treatment and preservation: termites were collected, transported to
            laboratory alive within 36 hours, P3 gut lumen fluid was extracted
            and stored frozen in buffered saline solution until DNA extraction.
            The JGI and collaborators endorse the principles for the
            distribution and use of large scale sequencing data adopted by the
            larger genome sequencing community and urge users of this data to
            follow them. It is our intention to publish the work of this
            project in a timely fashion and we welcome collaborative
            interaction on the project and analysis.
            (http://www.genome.gov/page.cfm?pageID=10506376).
FEATURES             Location/Qualifiers
     source          1..55108
                     /organism="termite gut metagenome"
                     /mol_type="genomic DNA"
                     /isolation_source="Nasutitermes sp. proctodeal segment 3
                     gut lumen"
                     /db_xref="taxon:433724"
                     /environmental_sample
                     /country="Costa Rica"
                     /lat_lon="10.1877 N 83.8558 W"
                     /note="metagenomic"
WGS         ABDH01000001-ABDH01055108
//

From chris.lasher at gmail.com  Tue Mar 17 03:45:33 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Mon, 16 Mar 2009 23:45:33 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
Message-ID: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>

2009/3/16 Tiago Antão <tiagoantao at gmail.com>

> I've been reading this thread and mainly staying silent but there is
> one question that is not clear in my mind but I believe it is
> important:
>
> How is the "official" biopython trunk controlled? Currently what is on
> CVS is the gospel and Peter and Michiel essentially have control of
> what is there and what is labelled as a "biopython distribution". How
> will this work now?


In a distributed workflow, there is no technical official repository. The
"official repository" is socially enforced. Technically, there is no
official repository of the Linux kernel anymore. However, there is an
"official" version, which is Linus Torvalds's repository. It is socially
enforced. I think Michiel and Peter still head the Biopython project--at
least they have the most clout, I would say. Therefore, we will probably
look to one of their branches as the "official" branch of Biopython. When
one of them wants to step down in duty, we will socially pass the torch on
to the next taker.

See "6.3 Using gatekeepers" at
http://doc.bazaar-vcs.org/latest/en/user-guide/index.html#team-collaboration-distributed-style
See also
http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/


> The second question, related to the first is how will different
> branches (of different persons) be managed? I am seeing people
> starting working on the same code in different directions and then
> having problems merging everything together.


People are supposed to work in different directions; this is the point of
distributed workflows. Merging tends not to be so difficult, and compared to
centralized models like CVS and SVN, it's a cinch. We will help provide
documentation for proper merging habits (e.g., merge early, merge often, and
no rebasing after pushing, etc.). There are also screencasts popping up (in
particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
we will link to for educational purposes. And of course, other developers
will be around to help out in tricky merges.

Chris


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 04:11:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 00:11:34 -0400
Subject: [Biopython-dev] [Bug 2791] New: GenBank Scanner does not parse
	environmental (ENV) files
Message-ID: <bug-2791-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2791

           Summary: GenBank Scanner does not parse environmental (ENV) files
           Product: Biopython
           Version: 1.49
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: idoerg at gmail.com
                CC: idoerg at gmail.com


GenBank Scanner does not parse environmental (ENV) files. It breaks on the
'rc' characters in the LOCUS line.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 04:14:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 00:14:50 -0400
Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse
	environmental (ENV) files
In-Reply-To: <bug-2791-42@http.bugzilla.open-bio.org/>
Message-ID: <200903170414.n2H4Eoit008338@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2791


idoerg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 04:32:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 00:32:30 -0400
Subject: [Biopython-dev] [Bug 2791] GenBank Scanner does not parse
	environmental (ENV) files
In-Reply-To: <bug-2791-42@http.bugzilla.open-bio.org/>
Message-ID: <200903170432.n2H4WUQn009490@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2791


idoerg at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|biopython-dev at biopython.org |idoerg at gmail.com
             Status|ASSIGNED                    |NEW


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Tue Mar 17 08:46:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 08:46:03 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
Message-ID: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>

On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
>
>> I've been reading this thread and mainly staying silent but there is
>> one question that is not clear in my mind but I believe it is
>> important:
>>
>> How is the "official" biopython trunk controlled? Currently what is on
>> CVS is the gospel and Peter and Michiel essentially have control of
>> what is there and what is labelled as a "biopython distribution". How
>> will this work now?
>
> In a distributed workflow, there is no technical official repository. The
> "official repository" is socially enforced. Technically, there is no
> official repository of the Linux kernel anymore. However, there is an
> "official" version, which is Linus Torvalds's repository. It is socially
> enforced. I think Michiel and Peter still head the Biopython project--at
> least they have the most clout, I would say. Therefore, we will probably
> look to one of their branches as the "official" branch of Biopython. When
> one of them wants to step down in duty, we will socially pass the torch on
> to the next taker.

I think it is essential we have a clearly labeled official trunk
(perhaps with branches for releases), which will be used for all the
official releases (tar balls, zip files and windows installers).  Our
main webpage should make this very clear.

We could potentially continue to have a shared official branch (e.g.
belonging to the generic github biopython user), and give all the
existing CVS contributors write access - and continue to manage this
as before.  So for example, if Frank wanted to check in some minor
changes to Bio.Nexus he could just do it.  Future contributors
patches/branches might get taken up by a developer on a personal
branch for testing, before being merged into the official branch.

i.e. We can initially continue as before - right now I don't have a
feel for how much work the role of an official branch maintainer would
be, and it is difficult to guess without more hands on experience
using the new tools.

>> The second question, related to the first is how will different
>> branches (of different persons) be managed? I am seeing people
>> starting working on the same code in different directions and then
>> having problems merging everything together.
>
> People are supposed to work in different directions; this is the point of
> distributed workflows. Merging tends not to be so difficult, and compared to
> centralized models like CVS and SVN, it's a cinch. We will help provide
> documentation for proper merging habits (e.g., merge early, merge often, and
> no rebasing after pushing, etc.). There are also screencasts popping up (in
> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
> we will link to for educational purposes. And of course, other developers
> will be around to help out in tricky merges.

Well, yes, in theory we have the same problem now with CVS - and while
the tools may make merging easier, some communication is essential
when working on the key modules which impact large parts of the code
base.

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 17 08:58:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 08:58:00 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903170158o757a4fc4naae80f83850d6093@mail.gmail.com>

>
> The documentation looks awesome. My only suggestion would be to
> change the navigation link that currently points to CVS to point to a
> generic page like SourceCode. Then that landing page could link
> to the current CVS and explain we are working to transition to
> Git, with links to those pages. Currently, the Git docs are a
> bit buried from the front page.
>
> Peter, I don't appear to have wiki permissions to edit the navigation
> bar; do you?

I'm not sure how to do it (although I probably have the relevant
permissions).  I can probably give you admin rights - you use the
"Chapmanb" username on the wiki, right?

> Peter:
>> I'm thinking a news post on
>> http://news.open-bio.org/news/category/obf-projects/biopython/ about
> version control would be a good idea at this point.  How about this -
>
> This is great, and I would move the last paragraph describing
> the Git repository to the beginning; start with what we are doing and
> then describe the rationale. This should help for those with ADD, and
> also give more prominent credit to Bartek, Giovanni and you for the
> work that went into this.

OK.  New version, with the markup for the links included:

Initially for evaluation purposes only, Giovanni and Bartek have set up
a mirror of <a href="http://github.com/biopython/biopython/tree/master">Biopython
on GitHub</a>, which is automatically updated from the OBF hosted <a
href="http://www.biopython.org/wiki/CVS">Biopython CVS repository</a>.
 See our <a href="http://biopython.org/wiki/GitMigration">git
migration wiki page</a> for details.  If this is favorably received,
then moving Biopython from CVS to git seems likely at some point this
year.

Originally, all the OBF hosted projects used <a
href="http://www.nongnu.org/cvs/">CVS</a> for their source code
repositories.  At the start of 2008, <a
href="http://www.bioperl.org">BioPerl</a> and <a
href="http://www.biojava.org">BioJava</a> moved over to <a
href="http://subversion.tigris.org/">Subversion (SVN)</a>, followed by
<a href="http://www.biosql.org">BioSQL</a>.  <a
href="http://www.biopython.org">Biopython</a> was originally going to
do the same, but this didn't actually happen.  Having all the Bio*
projects using the same version control system would have simplified
server administration for the OBF, but using SVN wouldn't really have
made that much difference to Biopython development.  Discussion on the
<a href="http://biopython.org/pipermail/biopython-dev/">Biopython
development mailing list</a> has since shifted towards next-generation
distributed version control systems like <a
href="http://git-scm.com/">git</a> or <a
href="http://bazaar-vcs.org/">Bazaar</a>.

Quote from Linus Torvalds,
<blockquote>The slogan of Subversion for a while was "CVS done right",
or something like that, and if you start with that kind of slogan,
there's nowhere you can go. There is no way to do CVS
right.</blockquote>

In addition to creating the Linux kernel, Linus Torvalds more recently
wrote <a href="http://git-scm.com/">git</a>, a prominent example of a
distributed version control system.  Rather than switching from CVS to
SVN, the <a href="http://www.bioruby.org">BioRuby</a> project chose
instead to use git, hosted on <a href="http://github.com">github</a>
(see the <a href="http://github.com/bioruby/bioruby/tree/master">BioRuby
repository</a>).  Biopython is considering doing something similar -
using a <em>distributed</em> version control system like git should
make it easier for potential Biopython contributors to manage their
own local copies of Biopython under version control.

Peter, on behalf of the Biopython developers


From biopython at maubp.freeserve.co.uk  Tue Mar 17 09:06:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 09:06:31 +0000
Subject: [Biopython-dev] history on github - where are the tags?
Message-ID: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>

Hi Bartek et al,

I've just been looking over the github mirror of CVS, and wanted to
see how it presents the history of individual files.  For example, this
page looks at the Bio/SeqRecord.py history using ViewCVS:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython

For comparison, in GitHub,
http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py

As you can see, all the comments and changes are there - which is
great.  But I can't see the CVS tag information, which I assume would
be converted into git tags.  Is this information present in the git
repository, but not shown by github, or was it lost during the
migration?  This might seem like a little thing, but I have found it
incredibly important for tracing bugs reported in older releases, for
example in narrowing down when something changed.

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 17 09:41:22 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 09:41:22 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>
	<5aa3b3570903161257h75b4289bn6cebed8312834fc9@mail.gmail.com>
	<8b34ec180903161404s506757c2k80597a12a362cfc1@mail.gmail.com>
Message-ID: <320fb6e00903170241i5b4a122ax1f33ff18450771df@mail.gmail.com>

On Mon, Mar 16, 2009 at 9:04 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> Hi,
> On Mon, Mar 16, 2009 at 8:57 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>>
>> At the moment I am having some strange problems, related to the fact that I
>> had a branch previously named 'biopython' in my account, so it seems
>> github doesn't fully understand that the old branch has been renamed.
>> For example, I don't have the 'Fork' button.... but it must be a temporary
>> problem, I already contacted the github's tech support.
>
> This is connected with the change I made in the repository. Namely I
> renamed the branch created by Giovanni to biopython-old and created
> a new one (the "official" one) called biopython again.
>
> The "rename" feature was flagged as experimental, and I don't think we
> would expect to use it anymore, and there were warnings that it can affect
> the branches forked from the branch previously created by Giovanni.

We may need to do another rename, if we have to repeat the CVS to git
migration.  For example, see my other email about the CVS tags (missing?).
Another potential question is can you re-map the CVS usernames as part of
the migration?  e.g. Can you somehow replace CVS users "bartek", "peterc",
... with github users "barwil", "peterjc", ...?  Not essential, but it
would be nice.

I would suggest as a precaution we rename it sooner rather than later
(while only a few people will be inconvenienced), going from biopython to
biopython-cvs-mirror (or similar).  If this does end up being the actual
trunk branch, we can just fork it under a new branch name like "biopython"
or "biopython-official" etc.
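
On the username re-mapping question above, one way it could be expressed is
as an author-conversion file with one "cvsuser=Name <email>" line per
committer, which is the style that git cvsimport reads through its -A
option. This is only a sketch: the thread does not say which conversion tool
was used, author_map_lines is a made-up helper, and the example.invalid
addresses are placeholders rather than real contact details.

```python
def author_map_lines(mapping):
    """Format {cvs_user: (name, email)} as 'cvs_user=Name <email>' lines."""
    return [
        "%s=%s <%s>" % (user, name, email)
        for user, (name, email) in sorted(mapping.items())
    ]


# CVS and github usernames are the ones mentioned above; emails are placeholders.
cvs_to_github = {
    "bartek": ("barwil", "barwil@example.invalid"),
    "peterc": ("peterjc", "peterjc@example.invalid"),
}

for entry in author_map_lines(cvs_to_github):
    print(entry)
```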

Peter


From lpritc at scri.ac.uk  Tue Mar 17 09:59:32 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 09:59:32 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
Message-ID: <C5E52504.1F20A%lpritc@scri.ac.uk>

Hi all,

This has been an occasionally frustrating thread to read...

On 17/03/2009 08:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
>> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
>> 

>>> How is the "official" biopython trunk controlled? Currently what is on
>>> CVS is the gospel and Peter and Michiel essentially have control of
>>> what is there and what is labelled as a "biopython distribution". How
>>> will this work now?
 
>> In a distributed workflow, there is no technical official repository. The
>> "official repository" is socially enforced.

That was true before.  Unless I misread the Biopython licencing, there was
no real barrier to putting a branched copy of the code on your own
server/site, with your own modifications.  What git does is provide tools to
make merging of that sort of code easier (along with a number of other
nice features, such as authentication of contributions).  The presence of
git does not ensure that your changes, or anyone else's, will be merged with
any other repository, and nor does it ensure the quality of contributed
code.  Git, while nice, and ideal for a number of tasks, is no magic bullet.

To an extent, the 'official' repository is, pragmatically, the one that is
most stable and well-tested.  If my hypothetical branched version had become
more stable and widely-used than the 'official' trunk, and become the most
frequently downloaded and implemented, and received new contributions in its
own right, it might then be considered de facto 'the distribution'; nasty
online spats with the original authors notwithstanding.  The 'social
enforcement' of politeness (i.e. *I* don't take credit for *your* work)
prevents this to an extent, as it ought to under any versioning system.

There's a competing tendency to consider that the coders who spent the most
time creating the code understand it the best, and are in the best position
to maintain it directly.  This is true to a large degree, and entirely
applicable to Biopython's contributed modules.  git can potentially
facilitate that sort of contribution to the 'official' trunk in a way that
CVS can't, due to its permissions bottleneck.  However, the mechanics of
incorporating that contributed code are more or less the same: the people
with control of the 'official' trunk review the code and decide whether to
include it.  This is true whether the code is submitted as a patch to
Bugzilla, emailed to a developer, put up on public CVS on your site, or in a
forked git repository.  The same is true of your own git repository - you
don't have to include someone else's forked code if you don't want to.

What possibly needs to change is not the version control system, but the way
in which people think about their contribution.  Contributions can be made
productively under any versioning system, and the key questions remain the
same in all cases: Does the new code work (are there tests)? Does the new
code break any old code?  Is there documentation?  Is the API consistent?

"What version control system are we using?" is a minor detail, unless it is
inherently broken, hinders any of the above, or causes some other
deal-breaking issue (for Linus Torvalds, this included speed issues for
merges).

>> I think Michiel and Peter still head the Biopython project--at
>> least they have the most clout, I would say. Therefore, we will probably
>> look to one of their branches as the "official" branch of Biopython. When
>> one of them wants to step down in duty, we will socially pass the torch on
>> to the next taker.

It has always been thus.  Now, instead of passing on the user authentication
to the CVS server at OBF, the user authentication to the biopython github
account will be passed on, instead:

> I think it is essential we have a clearly labeled official trunk
> (perhaps with branches for releases), which will be used for all the
> official releases (tar balls, zip files and windows installers).  Our
> main webpage should make this very clear.
> 
> We could potentially continue to have a shared official branch (e.g.
> belonging to the generic github biopython user), and give all the
> existing CVS contributors write access - and continue to manage this
> as before.  So for example, if Frank wanted to check in some minor
> changes to Bio.Nexus he could just do it.  Future contributors
> patches/branches might get taken up by a developer on a personal
> branch for testing, before being merged into the official branch.
> 
> i.e. We can initially continue as before - right now I don't have a
> feel for how much work the role of an official branch maintainer would
> be, and it is difficult to guess without more hands on experience
> using the new tools.
 
Plus ca change (avec git)...

>>> The second question, related to the first is how will different
>>> branches (of different persons) be managed? I am seeing people
>>> starting working on the same code in different directions and then
>>> having problems merging everything together.
>> 
>> People are supposed to work in different directions; this is the point of
>> distributed workflows.

I may have a different understanding of 'different directions' than you
mean, but I don't think that it's good for a community project if people
work in different directions.  I also don't think that that is the point of
distributed workflows; on the contrary, I think that they are intended to
make it easier to work independently towards a common goal.  Even if that is
by working on loosely- or non-interacting parts of the whole.

>> Merging tends not to be so difficult, and compared to
>> centralized models like CVS and SVN, it's a cinch. We will help provide
>> documentation for proper merging habits (e.g., merge early, merge often, and
>> no rebasing after pushing, etc.). There are also screencasts popping up (in
>> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
>> we will link to for educational purposes.
>> And of course, other developers will be around to help out in tricky merges.

This characterises one of the frustrating aspects of this thread (not
getting at you personally, Chris) - the occasional implicit assumption that
'things will be inherently *better* if we use git'.  Developers are around
to help now, even using CVS (which also has clear, long-standing stable
documentation - and even an O'Reilly book).  Several people don't seem to
think that that - and the way that code is reviewed and incorporated into
the main distribution - is good enough, and I don't think that this will
change just because the version control system has changed.  Nor will
changing revision control system generate significant free time to write,
test and document code.  But we may have the recession to do that last one
for us.

> Well, yes, in theory we have the same problem now with CVS - and while
> the tools may make merging easier, some communication is essential
> when working on the key modules which impact large parts of the code
> base.

I would put it more strongly than that: communication is essential in all
aspects of the project.  A number of related blog posts make statements
along the lines of "I don't use Biopython, or post to the mailing lists, but
I think that they're doing *this* wrong", or "I submitted code, but it
didn't get taken up immediately".  Now, venting and ranting on a blog is
fine, but it's not really *communicating*, any more than it was when I
thought that the BioSQL GenBank upload code was broken, fixed it (for my
purposes) and told no-one.  Git won't change the communication issue (in
either direction) any more than it changes the code review process.

FWIW, I think that git looks like a good way to go, and that it could help
encourage people to make local modifications of Biopython for their own
benefit and in their own interests and expert area, in a way that is visible
to the core distribution (unlike the patch submission process that is now
implemented).  In that way it could facilitate more rapid expansion of the
core distribution.  However, the bottlenecks of ensuring code quality,
testing and documentation will only ease if that is taken up by the
individuals/groups making those contributions, in addition to the core
developers.

And yes, I know I'm late with the new GenomeDiagram docs... ;)

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that
addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on
this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________


From bartek at rezolwenta.eu.org  Tue Mar 17 10:06:33 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 17 Mar 2009 11:06:33 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903170302v7dca4f04w85a11d3f0fbe6314@mail.gmail.com>
Message-ID: <8b34ec180903170306ocf4b9e7s6d34cacdfb7e423b@mail.gmail.com>

Hi,

I'll look into this. I'm now heading for a plane, so I can't do it now.

cheers
 Bartek
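
In the meantime, anyone with a clone of the mirror can check locally
whether the CVS tags survived the conversion. What follows is only a
minimal sketch, assuming git is installed and on the PATH. The helper
names (parse_tags, list_tags, tags_containing) are hypothetical and not
part of Biopython; the only git commands used are `git tag --list` and
`git tag --contains`.

```python
import subprocess


def parse_tags(output):
    """Split the text printed by `git tag` into a list of tag names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_tags(repo_dir, pattern=None):
    """Return the tags in the clone at repo_dir, optionally filtered by a glob."""
    cmd = ["git", "tag", "--list"]
    if pattern:
        cmd.append(pattern)
    result = subprocess.run(
        cmd, cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return parse_tags(result.stdout)


def tags_containing(repo_dir, commit):
    """Return the tags whose history includes the given commit.

    This helps narrow down which tagged releases already contain a change.
    """
    result = subprocess.run(
        ["git", "tag", "--contains", commit],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_tags(result.stdout)
```

Calling list_tags(".") from inside a clone returns whatever tags the
conversion carried over; an empty list would suggest they were not
migrated.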

On Tue, Mar 17, 2009 at 11:02 AM, Bartek Wilczynski <barwil at gmail.com> wrote:
> Hi,
>
> I'll look into this. I'm now heading for a plane, so I can't do it now.
>
> cheers
>  Bartek
>
> On Tue, Mar 17, 2009 at 10:06 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi Bartek et al,
>>
>> I've just been looking over the github mirror of CVS, and wanted to
>> see how it presented the history of individual files.  For example, this
>> page looks at the Bio/SeqRecord.py history using ViewCVS:
>> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython
>>
>> For comparison, in GitHub,
>> http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py
>>
>> As you can see, all the comments and changes are there - which is
>> great.  But I can't see the CVS tag information, which I assume would
>> be converted into git tags.  Is this information present in the git
>> repository, but not shown by github, or was it lost during the
>> migration?  This might seem like a little thing, but I have found it
>> incredibly important for tracing bugs reported in older releases, for
>> example in narrowing down when something changed.
>>
>> Peter
>>
>
>
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
>


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From biopython at maubp.freeserve.co.uk  Tue Mar 17 10:17:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:17:25 +0000
Subject: [Biopython-dev] gitignore file for github
Message-ID: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>

Hi all,

I think we should add a .gitignore file to the github mirror copy
repository, which should ignore:

* the build subdirectory and all its contents
* all *.pyc files (recursively, e.g. for the unit tests)
* all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)

Is there anything else this should include?  There are a few output
files created by the unit tests that we might want to include...

Otherwise all these files show up as "unstaged" to use git's
terminology, and there is a risk of someone accidentally committing
them.

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 17 10:57:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 10:57:37 +0000
Subject: [Biopython-dev] gitignore file for github
In-Reply-To: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
References: <320fb6e00903170317q683202c6ycd799de0ba748ef4@mail.gmail.com>
Message-ID: <320fb6e00903170357s14a20each59f50f5e155298b0@mail.gmail.com>

On Tue, Mar 17, 2009 at 10:17 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> I think we should add a .gitignore file to the github mirror copy
> repository, which should ignore:
>
> * the build subdirectory and all its contents
> * all *.pyc files (recursively, e.g. for the unit tests)
> * all LaTeX temporary files recursively under Doc (e.g. *.aux, *.log)
>
> Is there anything else this should include?  There are a few output
> files created by the unit tests that we might want to include...

This seems to work pretty well:

#Ignore the build directory (and its sub-directories):
build
#Ignore backup files from some Unix editors,
*~
#Ignore all compiled python files (e.g. from running the unit tests):
*.pyc
#The graphics unit tests produce output files for human inspection
#(at the time of writing, only PDF files are created but I expect
#this to change).
Tests/Graphics/*.pdf
Tests/Graphics/*.eps
Tests/Graphics/*.svg
Tests/Graphics/*.png

I've uploaded this as part of one of my test branches on github,
http://github.com/peterjc/biopython-seqio-quality/tree/master
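
To see roughly which paths the patterns above would catch, here is a
small sketch. It only approximates git's matching rules (no negation, no
"**", no trailing-slash handling), and the function name is_ignored is
hypothetical. The authoritative check is git's own `git check-ignore`
command, run inside a working copy.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .gitignore above (comment lines omitted).
PATTERNS = [
    "build",
    "*~",
    "*.pyc",
    "Tests/Graphics/*.pdf",
    "Tests/Graphics/*.eps",
    "Tests/Graphics/*.svg",
    "Tests/Graphics/*.png",
]


def is_ignored(path, patterns=PATTERNS):
    """Approximate gitignore matching for a repository-relative path.

    The path must use forward slashes, e.g. "Tests/Graphics/out.pdf".
    """
    parts = path.split("/")
    for pattern in patterns:
        pat_parts = pattern.split("/")
        if len(pat_parts) == 1:
            # No slash in the pattern: it may match a file or directory
            # name at any depth, so test every path component.
            if any(fnmatchcase(part, pattern) for part in parts):
                return True
        elif len(parts) >= len(pat_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pat_parts)
        ):
            # Slash in the pattern: match component by component from the
            # top of the repository, so "*" never crosses a "/".
            return True
    return False


if __name__ == "__main__":
    for example in ["build/lib/Bio/Seq.py", "Bio/Seq.py", "Doc/Tutorial.pdf"]:
        print(example, is_ignored(example))
```

Note that the PDF rule is deliberately limited to Tests/Graphics, so
documentation PDFs elsewhere in the tree are not ignored.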

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 10:59:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 06:59:22 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903171059.n2HAxMms006144@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-17 06:59 EST -------
I've made these changes available on a test github branch,
http://github.com/peterjc/biopython-seqio-quality/tree/master

This doesn't include all the example files for the unit tests yet.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From tiagoantao at gmail.com  Tue Mar 17 11:18:52 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 17 Mar 2009 11:18:52 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
	<320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
Message-ID: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>

On Tue, Mar 17, 2009 at 8:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I think it is essential we have a clearly labeled official trunk
> (perhaps with branches for releases), which will be used for all the
> official releases (tar balls, zip files and windows installers).  Our
> main webpage should make this very clear.

I agree.

I would like to take this opportunity just to make my opinion clear (I
normally tend to list hypotheses and refrain from giving my own opinions).

1. I don't think there is a pressing need to go from CVS to whatever.
While CVS is not perfect I don't think it has been a big hurdle. But
if people want to go in that direction, I have no strong feelings
against it either.
2. The hurdle was that _policy_ was too conservative: Some time ago it
was not acceptable even to consider a development branch. That
stifles things (although it ensures stability, which is good).
Fortunately things are more negotiable now. The point is: the main
issues are policy, not technology.
3. Like it or not, different mechanisms (i.e. centralized versus
distributed VCSs) facilitate different policies. Distributed version
control facilitates branching to a massive degree.
4. I think a middle ground is a good idea: While there is an official
distribution (e.g. the one that is labelled biopython 1.50 and that
will end up on most users' computers) which is aggressively controlled,
there should be space for people to try out new things.
5. People that try out new things should be aware (to avoid
disappointment) that their new code might not be accepted on the
official trunk, for many reasons: not enough documentation, no test
cases, design not acceptable, poorly-commented code, whatever. It
would be very sad if people started working on something and spent
lots of time on their branch, only to see their code refused on
the "official" trunk.

So, in my view things work like this:
A. The "official" version on biopython.org is controlled by a "head
honcho", currently Peter with input from biopython-dev. This is the
version that most users will ever see in practice.
B. The official version has a lot of quality enforcement on top.
C. People should be free to branch away and try new things.
D. People that branch away should be aware that their stuff might not
be accepted on the official distribution. If they want it accepted
they should come to biopython-dev and have a cup of tea with the
community.
E. Maybe some contact points should be defined for modules?
F. People who want their code included in the "official" distribution
should seriously think about branching from the "official" branch and not
from any other.

I would really like to see an "official" git branch which should be
created, in my opinion from a stable release and either by Peter or
Michiel (or any other long term CVS-write user). In my case I would
branch to maintain some of the PopGen code.


Tiago


From lpritc at scri.ac.uk  Tue Mar 17 12:19:28 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 12:19:28 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID: <C5E545D0.1F230%lpritc@scri.ac.uk>

On 17/03/2009 11:18, "Tiago Antão" <tiagoantao at gmail.com> wrote:

> On Tue, Mar 17, 2009 at 8:46 AM, Peter <biopython at maubp.freeserve.co.uk>
> wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers).  Our
>> main webpage should make this very clear.
> 
> I agree.
> 
> I would like to take this opportunity just to make my opinion clear (I
> normally tend to list hypotheses and refrain from giving my own opinions).

[...]

+1 for Tiago's opinion.

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405




From biopython at maubp.freeserve.co.uk  Tue Mar 17 12:44:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 12:44:05 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
	<320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
Message-ID: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>

2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
> On Tue, Mar 17, 2009 at 8:46 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> I think it is essential we have a clearly labeled official trunk
>> (perhaps with branches for releases), which will be used for all the
>> official releases (tar balls, zip files and windows installers).  Our
>> main webpage should make this very clear.
>
> I agree.
>
> I would like to take this opportunity just to make my opinion clear (I
> normally tend to list hypotheses and refrain from giving my own opinions).
>
> 1. I don't think there is a pressing need to go from CVS to whatever.
> While CVS is not perfect I don't think it has been a big hurdle. But
> if people want to go in that direction, I have no strong feelings
> against it also.

On a purely pragmatic level, yes, CVS has been enough.  This is one
real reason why there hasn't been a great deal of pressure on us to
move - it wasn't "broken" for how Biopython worked, although it does
make branching non-trivial.  Moving from CVS to a distributed version
control system (DVCS) won't make much difference for those of us with
CVS access - the big benefit as I see it is for potential contributors
who can easily make a branch to try out their ideas, and keep it in
sync with the master branch.  This could transform how new modules or
bug fixes get contributed, hopefully for the better.

> 2. The hurdle was that _policy_ was too conservative: Some time ago it
> was not acceptable even to consider a development branch. That
> stifles things (although it ensures stability, which is good).
> Fortunately things are more negotiable now. The point is: the main
> issues are policy, not technology.

Historically Biopython has worked from a single stable branch (Brad -
can you comment about the history of this effective policy?).  I
recall saying something in the last year or so about not wanting to do
any branching in CVS while the SVN migration seemed imminent, but this
was primarily to avoid any complication in the migration itself,
rather than any deep objection to branches themselves.

> 3. Like it or not, different mechanisms (ie centralized versus
> distributed VCSs) facilitate different policies. Distributed version
> control facilitates branching to a massive degree.

True.

> 4. I think a middle ground is a good idea: While there is an official
> distribution (e.g. the one that is labelled biopython 1.50 and that
> will end up on most users' computers) which is aggressively controlled,
> there should be space for people to try out new things.

I'm not quite sure what you mean by aggressively controlled.  Moving to
a DVCS really should make public experimental branches much easier.

> 5. People that try out new things should be aware (to avoid
> disappointment) that their new code might not be accepted on the
> official trunk, for many reasons: not enough documentation, no test
> cases, design not acceptable, poorly-commented code, whatever. It
> would be very sad if people started working on something and spent
> lots of time on their branch, only to see their code refused on
> the "official" trunk.

That is a risk - especially if anyone were to go off and work in
complete isolation without even posting anything to this mailing list.

> So, in my view things work like this:
> A. The "official" version on biopython.org is controlled by a "head
> honcho", currently Peter with input from biopython-dev. This is the
> version that most users will ever see in practice.

That could work - although having anyone as a single bottle neck is a
risk, assuming you get someone to agree to the role in the first place
;)  I am generally happy with the current arrangement where module
owners have a degree of autonomy over their modules.  I wouldn't want
to have to approve every single minor change you (Tiago) make to
Bio.PopGen - but I suppose occasional review and merging of code from
Tiago's branch on request wouldn't be too onerous.

> B. The official version has a lot of quality enforcement on top.

What does that mean?  e.g. a strict policy about unit tests before
anything goes into the main branch?

> C. People should be free to branch away and try new things.

Given the Biopython license (as Leighton pointed out) this is already
the case with CVS.  It's just that using a DVCS should make this easier,
especially for keeping branches in sync with the official branch, and
hopefully for any merges back.

> D. People that branch away should be aware that their stuff might not
> be accepted on the official distribution. If they want it accepted
> they should come to biopython-dev and have a cup of tea with the
> community.

I agree.  I like tea.

> E. Maybe some contact points should be defined for modules?

Do you mean something more explicit about documenting who currently
maintains each module?

> F. People who want their code included in the "official" distribution
> should seriously think about branching from the "official" branch and not
> from any other.

I agree.

> I would really like to see an "official" git branch which should be
> created, in my opinion from a stable release and either by Peter or
> Michiel (or any other long term CVS-write user).

I think we'll have that - and in the short term the CVS mirror on
github can be used.

> In my case I would branch to maintain some of the PopGen code.

Great.

Peter


From chapmanb at 50mail.com  Tue Mar 17 12:49:30 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 17 Mar 2009 08:49:30 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <C5E52504.1F20A%lpritc@scri.ac.uk>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
Message-ID: <20090317124930.GE57054@sobchak.mgh.harvard.edu>

Hi everyone;
Nice to see the discussion around trying out git. Leighton and
Tiago, you both brought up some definite concerns in moving
to a distributed version control system.

Git aims to help solve the problem of a "them versus us" community.
When you read posts critical of Biopython, you will find a lot of
complaints about "they didn't do this." This is confusing, as anyone
using, coding with, interested in, or contributing to Biopython is a
member of the community. CVS can help create this division, since it
appears as a walled off repository only the core developers can
access.

Git frees up the source code and lowers this barrier to contributing. Now
instead of saying "why didn't the developers integrate the code I
sent to the mailing list and write tests and documentation for it,"
we can all turn the question back on ourselves and ask why we didn't
create a branch with our new contribution and do it, soliciting help
from others in Biopython.

With solving the problems come potential concerns. This coincidental
blog post from yesterday intelligently covers a lot of the issues:

http://www.pointy-stick.com/blog/2009/03/16/dark-side-distributed-version-control/

The one we should be most concerned about is fragmentation. The
community of Python coders in bioinformatics is too small to be
split up; surely we are better served by resolving any differences
and producing one high quality reusable code base.

Tiago's assessment of how things should work practically looks
exactly right. Hard working core developers, like Peter and
Michiel, will be maintaining the trunk which we roll releases off
of. Contributors can either submit patches as now, or create short
branches which get merged back in. The advantage of branches is that
others can test and develop the branched code, and that the software
should help deal with some of the pain of merging.

There is a lot of good material in this thread for new potential
developers. Tiago, it would make sense to condense what you've
written and include it with the Contributing guide:

http://biopython.org/wiki/Contributing

We should also create a page on the wiki, linked from the developer
documentation:

http://biopython.org/wiki/Documentation#Documentation_for_Developers

that describes active development branches and their goals
(called, say, ActiveBranches). Tiago, I thought you did a page for PopGen
earlier like this but I can't find it right now. We should keep
communication at a high level to avoid confusing fragmentation.

This is a difficult change in terms of how things work; we are
asking the right questions to create a good environment for improvement.

Brad

> Hi all,
> 
> This has been an occasionally frustrating thread to read...
> 
> On 17/03/2009 08:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
> 
> > On Tue, Mar 17, 2009 at 3:45 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
> >> 2009/3/16 Tiago Antão <tiagoantao at gmail.com>
> >> 
> 
> >>> How is the "official" biopython trunk controlled? Currently what is on
> >>> CVS is the gospel and Peter and Michiel essentially have control of
> >>> what is there and what is labelled as a "biopython distribution". How
> >>> will this work now?
>  
> >> In a distributed workflow, there is no technical official repository. The
> >> "official repository" is socially enforced.
> 
> That was true before.  Unless I misread the Biopython licencing, there was
> no real barrier to putting a branched copy of the code on your own
> server/site, with your own modifications.  What git does is provide tools to
> make merging of that sort of code easier (along with a number of other
> nice features, such as authentication of contributions).  The presence of
> git does not ensure that your changes, or anyone else's, will be merged with
> any other repository, and nor does it ensure the quality of contributed
> code.  Git, while nice, and ideal for a number of tasks, is no magic bullet.
> 
> To an extent, the 'official' repository is, pragmatically, the one that is
> most stable and well-tested.  If my hypothetical branched version had become
> more stable and widely-used than the 'official' trunk, and become the most
> frequently downloaded and implemented, and received new contributions in its
> own right, it might then be considered de facto 'the distribution'; nasty
> online spats with the original authors notwithstanding.  The 'social
> enforcement' of politeness (i.e. *I* don't take credit for *your* work)
> prevents this to an extent, as it ought to under any versioning system.
> 
> There's a competing tendency to consider that the coders who spent the most
> time creating the code understand it the best, and are in the best position
> to maintain it directly.  This is true to a large degree, and entirely
> applicable to Biopython's contributed modules.  git can potentially
> facilitate that sort of contribution to the 'official' trunk in a way that
> CVS can't, due to its permissions bottleneck.  However, the mechanics of
> incorporating that contributed code are more or less the same: the people
> with control of the 'official' trunk review the code and decide whether to
> include it.  This is true whether the code is submitted as a patch to
> Bugzilla, emailed to a developer, put up on public CVS on your site, or in a
> forked git repository.  The same is true of your own git repository - you
> don't have to include someone else's forked code if you don't want to.
> 
> What possibly needs to change is not the version control system, but the way
> in which people think about their contribution.  Contributions can be made
> productively under any versioning system, and the key questions remain the
> same in all cases: Does the new code work (are there tests)? Does the new
> code break any old code?  Is there documentation?  Is the API consistent?
> 
> "What version control system are we using?" is a minor detail, unless it is
> inherently broken, hinders any of the above, or causes some other
> deal-breaking issue (for Linus Torvalds, this included speed issues for
> merges).
> 
> >> I think Michiel and Peter still head the Biopython project--at
> >> least they have the most clout, I would say. Therefore, we will probably
> >> look to one of their branches as the "official" branch of Biopython. When
> >> one of them wants to step down in duty, we will socially pass the torch on
> >> to the next taker.
> 
> It has always been thus.  Now, instead of passing on the user authentication
> to the CVS server at OBF, the user authentication to the biopython github
> account will be passed on, instead:
> 
> > I think it is essential we have a clearly labeled official trunk
> > (perhaps with branches for releases), which will be used for all the
> > official releases (tar balls, zip files and windows installers).  Our
> > main webpage should make this very clear.
> > 
> > We could potentially continue to have a shared official branch (e.g.
> > belonging to the generic github biopython user), and give all the
> > existing CVS contributors write access - and continue to manage this
> > as before.  So for example, if Frank wanted to check in some minor
> > changes to Bio.Nexus he could just do it.  Future contributors
> > patches/branches might get taken up by a developer on a personal
> > branch for testing, before being merged into the official branch.
> > 
> > i.e. We can initially continue as before - right now I don't have a
> > feel for how much work the role of an official branch maintainer would
> > be, and it is difficult to guess without more hands on experience
> > using the new tools.
>  
> Plus ca change (avec git)...
> 
> >>> The second question, related to the first is how will different
> >>> branches (of different persons) be managed? I am seeing people
> >>> starting working on the same code in different directions and then
> >>> having problems merging everything together.
> >> 
> >> People are supposed to work in different directions; this is the point of
> >> distributed workflows.
> 
> I may have a different understanding of 'different directions' than you
> mean, but I don't think that it's good for a community project if people
> work in different directions.  I also don't think that that is the point of
> distributed workflows; on the contrary, I think that they are intended to
> make it easier to work independently towards a common goal.  Even if that is
> by working on loosely- or non-interacting parts of the whole.
> 
> >> Merging tends not to be so difficult, and compared to
> >> centralized models like CVS and SVN, it's a cinch. We will help provide
> >> documentation for proper merging habits (e.g., merge early, merge often, and
> >> no rebasing after pushing, etc.). There are also screencasts popping up (in
> >> particular Scott Chacon's re-make of his Gitcasts, now at learn.github) that
> >> we will link to for educational purposes.
> >> And of course, other developers will be around to help out in tricky merges.
> 
> This characterises one of the frustrating aspects of this thread (not
> getting at you personally, Chris) - the occasional implicit assumption that
> 'things will be inherently *better* if we use git'.  Developers are around
> to help now, even using CVS (which also has clear, long-standing stable
> documentation - and even an O'Reilly book).  Several people don't seem to
> think that that - and the way that code is reviewed and incorporated into
> the main distribution - is good enough, and I don't think that this will
> change just because the version control system has changed.  Nor will
> changing revision control system generate significant free time to write,
> test and document code.  But we may have the recession to do that last one
> for us.
> 
> > Well, yes, in theory we have the same problem now with CVS - and while
> > the tools may make merging easier, some communication is essential
> > when working on the key modules which impact large parts of the code
> > base.
> 
> I would put it more strongly than that: communication is essential in all
> aspects of the project.  A number of related blog posts make statements
> along the lines of "I don't use Biopython, or post to the mailing lists, but
> I think that they're doing *this* wrong", or "I submitted code, but it
> didn't get taken up immediately".  Now, venting and ranting on a blog is
> fine, but it's not really *communicating*, any more than it was when I
> thought that the BioSQL GenBank upload code was broken, fixed it (for my
> purposes) and told no-one.  Git won't change the communication issue (in
> either direction) any more than it changes the code review process.
> 
> FWIW, I think that git looks like a good way to go, and that it could help
> encourage people to make local modifications of Biopython for their own
> benefit and in their own interests and expert area, in a way that is visible
> to the core distribution (unlike the patch submission process that is now
> implemented).  In that way it could facilitate more rapid expansion of the
> core distribution.  However, the bottlenecks of ensuring code quality,
> testing and documentation will only ease if that is taken up by the
> individuals/groups making those contributions, in addition to the core
> developers.
> 
> And yes, I know I'm late with the new GenomeDiagram docs... ;)
> 
> L.
> 


From tiagoantao at gmail.com  Tue Mar 17 13:10:18 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 13:10:18 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
	<6d941f120903161711p71c7c940t1eabe933c0fa43e5@mail.gmail.com>
	<128a885f0903162045l474d0df3w2b8fad7f7f129a3b@mail.gmail.com>
	<320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<6d941f120903170418k46481c8bj8c20d510314f57ee@mail.gmail.com>
	<320fb6e00903170544i401fefa4gbfa2b2d542e94816@mail.gmail.com>
Message-ID: <6d941f120903170610g161342f0ief365d68f25707c1@mail.gmail.com>

Hi,

> I'm not quite sure what you mean by aggressively controlled. Moving to
> a DVCS really should make public experimental branches much easier.


I mean that the official release is very controlled (a good thing!).
Development branches should be more free.

> That is a risk - especially if anyone were to go off and work in
> complete isolation without even posting anything to this mailing list.

I think our obligation is to inform people of the issue. If then
people go away and don't communicate, then it becomes their problem. I
think just a couple of sentences on the Contributing page on the wiki
would be more than enough.


> That could work - although having anyone as a single bottleneck is a
> risk, assuming you get someone to agree to the role in the first place
> ;) I am generally happy with the current arrangement where module
> owners have a degree of autonomy over their modules. I wouldn't want
> to have to approve every single minor change you (Tiago) make to
> Bio.PopGen - but I suppose occasional review and merging of code from
> Tiago's branch on request wouldn't be too onerous.

I agree. I am just trying to make this an "explicit" policy, so that
everybody knows the rules of the game. If people don't agree, then that
should be discussed and changed. But the point is, these kinds of
management issues should be written down somewhere in a transparent
way.


>> B. The official version has a lot of quality enforcement on top.
>
> What does that mean? e.g. a strict policy about unit tests before
> anything goes into the main branch?

I was reading http://biopython.org/wiki/Contributing and the main
points are already there (in the "submitting code" part).
But the point is: the official version should be stable and reliable
(as it is now, IMHO).


>> E. Maybe some contact points should be defined for modules?
>
> Do you mean something more explicit about documenting who currently
> maintains each module?

That is my point. Does that make sense?


From chapmanb at 50mail.com  Tue Mar 17 13:04:53 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 17 Mar 2009 09:04:53 -0400
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <49BEDD9B.6030905@u.washington.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
	<49BEDD9B.6030905@u.washington.edu>
Message-ID: <20090317130453.GF57054@sobchak.mgh.harvard.edu>

Hi David;

> I've got some 454 and Solid data you could test it on too.
> 
> Has anybody else looked into how these other two Next Gen formats might 
> complicate things?

Sweet. We definitely want to support output from them as well; it is 
great to have someone on board who is working with data from other
machines.

Peter did a pretty thorough investigation of the different formats
and wrote it up in the docs to the proposed QualityIO module:

http://github.com/peterjc/biopython-seqio-quality/blob/6fdf27393cb7318b229ff8587721e83544da968d/Bio/SeqIO/QualityIO.py

Does this make sense with your experience?

If you feel comfortable with git, Peter set up a new branch with his
code for this:

http://github.com/peterjc/biopython-seqio-quality/tree/master

and we'd be more than happy to have you testing it. Alternatively,
if you want to submit some smaller data files we can use in testing, you
could attach them to the current enhancement request:

http://bugzilla.open-bio.org/show_bug.cgi?id=2767

Thanks for the help,
Brad

> 
> Brad Chapman wrote:
> > Peter;
> >
> >   
> >> I think we should probably do another release soon 
> >>     
> >
> > Good call. +1 from me.
> >
> >   
> >> I'd like to include the following changes as part of the beta, but it
> >> would be sensible to have someone else try these out first.  Any
> >> volunteers?
> >>
> >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
> >>     
> >
> > The code for this looked good when I reviewed it earlier. I will
> > test it out with some solexa reads from here this week; any reason
> > not to check the patch and files into CVS? Then I can fire up my
> > coal-powered revision control system, feed two punch cards into the
> > mouth of the machine, hope the vacuum tubes don't burn out again,
> > and check it out locally.
> >
> > Brad


From tiagoantao at gmail.com  Tue Mar 17 13:19:38 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 13:19:38 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>

On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> There is a lot of good material in this thread for new potential
> developers. Tiago, it would make sense to condense what you've
> written and include it with the Contributing guide:
>
> http://biopython.org/wiki/Contributing


I can go ahead and try to put a summary of our discussions on that
page, if nobody opposes. The change can be rewritten afterwards or
deleted anyway. The only issue is that I can only do that on the
weekend and not before (travelling abroad from Wednesday to Friday).
What I think is needed is actually a final decision on how things will
progress. Will there be an official git branch? Will the official
repository still be CVS? Where will it be hosted? These are all important
questions, but I think there has been enough discussion to arrive at a
decision.

> (called, say, ActiveBranches). Tiago, I thought you did a page for PopGen
> earlier like this but I can't find it right now. We should keep
> communication at a high level to avoid confusing fragmentation.

Coincidentally I was editing that page today. I took the liberty of
creating a link from the documentation page to it. So it should be
reachable now.

Tiago


From p.j.a.cock at googlemail.com  Tue Mar 17 14:44:08 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 17 Mar 2009 14:44:08 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
Message-ID: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>

2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
> I can go ahead and try to put a summary of our discussions on that
> page, if nobody opposes. The change can be rewritten afterwards or
> deleted anyway. The only issue is that I can only do that on the
> weekend and not before (travelling abroad from Wednesday to Friday).

Sure - by the weekend I hope we'll have come to a consensus.

> What I think is needed is actually a final decision on how things will
> progress. Will there be an official git branch? Will the official
> repository still be CVS? Where will it be hosted? These are all important
> questions, but I think there has been enough discussion to arrive at a
> decision.

I think it is still too early for a final decision, but here is my
suggested plan:

In the short term (at least until Biopython 1.50 beta is out, perhaps
until Biopython 1.50 proper is out), CVS will remain the official
repository.  Bartek will continue automatically updating the
mirrored copy on github, which will otherwise be treated as READ ONLY.
If need be, he may have to reimport the whole history (the tag issue
troubles me - see the other thread), so there may be some bumps along
this road.  Contributions/bug fixes can continue via bugzilla with a
patch, and contributors can also try providing a URL to their own git
branch if they prefer.  During this period I hope most (ideally all)
our active developers with CVS access will create an account on
github, and try out forking from the CVS mirror, creating their own
branches, checking in some changes, and doing some simple merges - for
example pulling code from other Biopython developers' public branches.
This should give us the confidence to trust git and github enough to
use it for real.

i.e. For roughly the next month, we will continue as before with
CVS for the real work, but will also try out github.

Once Biopython 1.50 final is out (hopefully by the end of April 2009,
probably sooner), we need to decide if we will actually make the move
to git on github.  At this point, I would expect this to happen by
declaring CVS read only, a static archive (and emergency fall back).
Bartek would turn off his automatic syncing.  We would then continue
working on the github branch with the full CVS history, with a core of
Biopython developers having write access to the "official" branch,
doing new work under their own personal branches for eventual merging
into the main trunk.

I'd still like to have a copy of the "official" git repository running
on biopython.org, but this may not be that easy without some technical
expertise in house to do this.  From initial discussion with the OBF
team about the idea of running git on their servers, my impression is
that if we can do it ourselves, we may.  Jason Stajich actually suggested
we use github independently.

Peter

P.S. Could you all update your entry on the wiki participants page
(and if you have one, your wiki user page) to include a link to your
github account:
http://biopython.org/wiki/Participants


From biopython at maubp.freeserve.co.uk  Tue Mar 17 14:46:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 14:46:53 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <49BEDD9B.6030905@u.washington.edu>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
	<20090316225558.GC57054@sobchak.mgh.harvard.edu>
	<49BEDD9B.6030905@u.washington.edu>
Message-ID: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com>

2009/3/16 David Schruth <dschruth at u.washington.edu>:
> I've got some 454 and Solid data you could test it on too.
>
> Has anybody else looked into how these other two Next Gen formats might
> complicate things?

Roche 454 sequencers produce their own binary SFF files (Standard
Flowgram Format), but they provide tools which turn these into
standard Sanger style files using PHRED qualities.  In theory, we
might be able to parse the SFF files directly, see for example
http://blog.malde.org/index.php/2008/11/14/454-sequencing-and-parsing-the-sff-binary-format/
and the links given.  In practice, most sequencing centers using Roche
454 will be happy to provide FASTQ or FASTA+QUAL files, and the code
on Bug 2767 (or the associated experimental branch on github) should
work fine on these.
http://bugzilla.open-bio.org/show_bug.cgi?id=2767
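
As a rough illustration of what "Sanger style files using PHRED qualities"
means in practice, here is a minimal sketch in plain Python.  It is not the
code from Bug 2767; the function names are made up for this example.  It
only assumes the standard Sanger FASTQ convention, where each quality is
stored as one ASCII character with an offset of 33.

```python
def sanger_phred_scores(quality_string):
    """Convert a Sanger FASTQ quality string to a list of PHRED scores.

    Hypothetical helper for illustration; not part of Biopython.
    """
    # Sanger-style FASTQ stores each PHRED score as one ASCII character
    # with an offset of 33, so '!' means 0 and 'I' means 40.
    return [ord(char) - 33 for char in quality_string]


def error_probability(phred):
    """A PHRED score Q corresponds to an error probability of 10**(-Q/10)."""
    return 10 ** (-phred / 10.0)


scores = sanger_phred_scores("!5I")
print(scores)
print([error_probability(q) for q in scores])
```

A score of 20 therefore means a 1 in 100 chance that the base call is wrong,
and a score of 40 means 1 in 10,000.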

You are free to try out the proposed code yourself now, but if you
have some particular 454 files you'd like me to check, please email me
(off the mailing list).  If you can share some real data which we
could include in Biopython for a unit test that would also be great
(but unless you tell me this explicitly, I'll only make sure we can
parse your files).

Regarding SOLiD files, they work in colour space and I am under the
impression that it doesn't make sense to convert them to sequence
space until after doing the assembly or genome mapping (in colour
space).  See for example
http://solidsoftwaretools.com/gf/project/mapreads/ i.e. it may not be
appropriate to parse SOLiD reads into Biopython SeqRecord objects, and
thus they wouldn't belong in Bio.SeqIO.  That isn't to say we wouldn't
want a parser elsewhere in Biopython; perhaps under Bio.Sequencing would
be best.

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 17 14:57:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 14:57:49 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090316224240.GA57054@sobchak.mgh.harvard.edu>
References: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com>
	<8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com>
	<320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com>
	<128a885f0903121407g133ed8ctda57b21ff8adb70e@mail.gmail.com>
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>
	<20090315185443.GA30296@kunkel>
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>
	<20090316224240.GA57054@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903170757s183f6f59x40549f7e3a853f06@mail.gmail.com>

On Mon, Mar 16, 2009 at 10:42 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter wrote:
>> I'm thinking a news post on
>> http://news.open-bio.org/news/category/obf-projects/biopython/ about
>> version control would be a good idea at this point. How about this -
>
> This is great, and I would move the last paragraph describing
> the Git repository to the beginning; start with what we are doing and
> then describe the rationale. This should help for those with ADD, and
> also give more prominent credit to Bartek, Giovanni and you for the
> work that went into this.

Good idea about the reordering - done, and published:
http://news.open-bio.org/news/2009/03/biopython-and-version-control-systems/
It will also show up on http://biopython.org/wiki/News via the RSS feed.

Peter


From rodrigo_faccioli at uol.com.br  Tue Mar 17 15:30:48 2009
From: rodrigo_faccioli at uol.com.br (Rodrigo faccioli)
Date: Tue, 17 Mar 2009 12:30:48 -0300
Subject: [Biopython-dev] PDB Parser error
Message-ID: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>

I built a relational database in PostgreSQL. This database stores some
information from PDB files: the sequence, atoms and sbonds. Now I'm
building a parser for this database, which I want to load into a
Biopython PDB parser structure. The idea is to keep my whole source code
based on the Biopython PDB parser, because it will be necessary to do
some operations with this information.

So I studied the Bio.PDB directory and read the _parse_coordinates method
in PDBParser.py, where there are several methods that initialize the
structure. I ran them in my code. However, it shows the message below.
Traceback (most recent call last):
  File "src/testefcfrpPDB.py", line 32, in <module>
    main()
  File "src/testefcfrpPDB.py", line 30, in main
    structure = FcfrpPDB.getPDBFile(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDB.py", line 67, in getPDBFile
    return fcfrpPDBParser.loadStructureFromDatabase(id)
  File "/home/faccioli/workspace/blast/src/FcfrpPDBParser.py", line 48, in loadStructureFromDatabase
    self._structure_builder.init_atom(D_Atoms[i].get_id(), D_Atoms[i].get_coord(), D_Atoms[i].get_bfactor(),D_Atoms[i].get_occupancy() ,D_Atoms[i].get_altloc(), D_Atoms[i].get_fullname(), D_Atoms[i].get_serial_number())
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/StructureBuilder.py", line 182, in init_atom
    if residue.has_id(name):
  File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", line 96, in has_id
    return self.child_dict.has_key(id)
TypeError: list objects are unhashable

This is my first post to the Biopython developers' list, and I don't know
what the process is for sending code.

Thanks for any help.

-- 
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218


From lpritc at scri.ac.uk  Tue Mar 17 15:42:55 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Mar 2009 15:42:55 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903170746g632f56a5hfae8a4960e77fa85@mail.gmail.com>
Message-ID: <C5E5757F.1F268%lpritc@scri.ac.uk>

Hi,

On 17/03/2009 14:46, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> 2009/3/16 David Schruth <dschruth at u.washington.edu>:
>> I've got some 454 and Solid data you could test it on too.
>> 
>> Has anybody else looked into how these other two Next Gen formats might
>> complicate things?

> Regarding SOLiD files, they work in colour space and I am under the
> impression that it doesn't make sense to convert them to sequence
> space until after doing the assembly or genome mapping (in colour
> space).  See for example
> http://solidsoftwaretools.com/gf/project/mapreads/ i.e. it may not be
> appropriate to parse SOLiD reads into Biopython SeqRecord objects, and
> thus they wouldn't belong in Bio.SeqIO.  That isn't to say we wouldn't
> want a parser elsewhere in Biopython; perhaps under Bio.Sequencing would
> be best.

That's my understanding and practical experience, too.  For lurkers' benefit
SOLiD data looks like this:

>4_48_57_F3
T33111210002200023033000000211000101
>4_48_89_F3
T22002312223133113013303322223322223
>4_48_95_F3
T22300102100203322101021130203000201

where each of the four colour values (0, 1, 2, 3) encodes the transition
between two adjacent bases, so each colour corresponds to four of the 16
possible dimers (AA, AC, AG, AT, CA, ...), i.e. each colour value is
degenerate for four possible dimers.  This system is described at
http://www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_057559.pdf.

The use of an appropriate colour->dimer mapping makes it possible, in
principle, to go from colour space to nucleotide sequence, so long as a
single base of the sequence is known.  In reality a single colour space read
error silently makes the rest of the SOLiD read mapping incorrect.
Practical use of SOLiD data involves mapping the sequence reads to a
reference sequence (either by converting the reference to colour space, or
dynamic programming) prior to conversion to 'base space'.
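
To make the colour->dimer mapping and the error propagation concrete, here
is a minimal sketch in plain Python.  It is not Biopython code and the
function name is made up for this example.  It assumes the usual SOLiD
two-base encoding (0 = same base, 1 = A<->C or G<->T, 2 = A<->G or C<->T,
3 = A<->T or C<->G), which is equivalent to XOR-ing 2-bit base codes.

```python
# With A=0, C=1, G=2, T=3, the colour of a dimer is the XOR of the codes
# of its two bases, so a read can be decoded left to right once the
# leading (primer) base is known.
BASES = "ACGT"


def colour_to_bases(read):
    """Decode a colour-space read such as 'T3311' into base space.

    Hypothetical helper for illustration only.
    """
    previous = BASES.index(read[0])  # known leading base, not part of the read
    decoded = []
    for colour in read[1:]:
        previous ^= int(colour)      # next base code = previous code XOR colour
        decoded.append(BASES[previous])
    return "".join(decoded)


print(colour_to_bases("T3311"))  # correct colours
print(colour_to_bases("T3211"))  # second colour misread as 2 instead of 3
```

The two calls differ in a single colour, yet every decoded base from that
position onwards differs, which is why mapping is normally done in colour
space before converting to base space.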

The mapping process is probably better handled by dedicated applications,
and I think the role for Biopython in this is to parse their output.  GFF
is, awkwardly enough, a popular output format for this kind of analysis.

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that
addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on
this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________


From biopython at maubp.freeserve.co.uk  Tue Mar 17 16:01:25 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 16:01:25 +0000
Subject: [Biopython-dev] PDB Parser error
In-Reply-To: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>
References: <3715adb70903170830x61bb6e3bl4412a8cf1504d80c@mail.gmail.com>
Message-ID: <320fb6e00903170901v6533910bl57ddd534dc05cf51@mail.gmail.com>

On Tue, Mar 17, 2009 at 3:30 PM, Rodrigo faccioli
<rodrigo_faccioli at uol.com.br> wrote:
> I built a relational database in PostgreSQL. This database stores some
> information from PDB files: the sequence, atoms and sbonds. Now I'm
> building a parser for this database, which I want to load into a
> Biopython PDB parser structure. The idea is to keep my whole source code
> based on the Biopython PDB parser, because it will be necessary to do
> some operations with this information.
>
> So I studied the Bio.PDB directory and read the _parse_coordinates method
> in PDBParser.py, where there are several methods that initialize the
> structure. I ran them in my code. However, it shows the message below.
> Traceback (most recent call last):
>   File "src/testefcfrpPDB.py", line 32, in <module>
> ...
>   File "/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/PDB/Entity.py", line 96, in has_id
>     return self.child_dict.has_key(id)
> TypeError: list objects are unhashable
>
> This is my first post to the Biopython developers' list, and I don't know
> what the process is for sending code.

It's hard to say without seeing your full code (and even then, without
the database it would be difficult to reproduce it).  As you have a
TypeError, I suspect something has the wrong data type - maybe
a list that should be a string or something.
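
For illustration, the same kind of error can be reproduced without Bio.PDB
at all.  The sketch below uses a plain dictionary as a stand-in for the
child_dict lookup in the traceback; the names are made up for this example
and it does not use the real Bio.PDB classes.  Note that newer Python
versions word the message as "unhashable type: 'list'".

```python
# Stand-in for the dictionary that the entity uses to look up children.
child_dict = {"CA": "alpha carbon"}


def has_id(key):
    """Return True if key is present; raises TypeError for unhashable keys."""
    return key in child_dict


def lookup_fails(key):
    """Return True if looking up key raises TypeError (e.g. key is a list)."""
    try:
        has_id(key)
    except TypeError:
        return True
    return False


print(has_id("CA"))            # a string is hashable, lookup works
print(lookup_fails(["CA"]))    # a list is not hashable, lookup raises
print(lookup_fails(("CA",)))   # a tuple is hashable, lookup works
```

So it is worth checking whether one of the values passed to init_atom
(the atom name, for instance) comes back from the database as a list
rather than a string.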

If you want to share the full file testefcfrpPDB.py you could post it
on http://pastebin.com/ or something (do you have your own website?).

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 17 17:59:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 17:59:43 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
Message-ID: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>

I wrote:
> In the short term (at least until Biopython 1.50 beta is out, perhaps
> until Biopython 1.50 proper is out), CVS will remain the official
> repository. ... During this period I hope most (ideally all) our active
> developers with CVS access will create an account on github, and
> try out forking from the CVS mirror, creating their own branches,
> checking in some changes, and doing some simple merges - for
> example pulling code from other Biopython developers' public
> branches.  This should give us the confidence to trust git and
> github enough to use it for real.

Brad and I have been trying this out in practice, and it seems to work OK.

I started a fork to test the patches for Bug 2767, adding quality
parsers to Bio.SeqIO,
http://github.com/peterjc/biopython-seqio-quality/tree/master
I made a few incremental checkins, pushed to github one by one.

Brad then took a fork of this in order to make some minor changes and
fix a typo in the documentation :
http://github.com/chapmanb/biopython-seqio-quality/tree/master

At this point the "network" diagrams showed up the two branches as
diverging.  Brad then sent me a "pull" request, suggesting I might
want to pull his work into my branch.

Using the git command line tool, I was able to pull and merge Brad's
changes (as I had made no changes in the meantime this could be done
automatically), and then push the merged version back up to github on
my branch.  At this point my branch and Brad's agreed once again, and
the "network" diagram no longer shows both.  Note that my branch now
includes a commit from Brad.

At this point, Brad may choose to delete his branch, or perhaps make
further changes.

Now all this worked, but I was wondering if the github web interface
could have simplified any of this, if I'd only known where to click.
For example, does github offer any way to view a diff between two
branches?  Or, as I suspect, do they simply expect you to use the git
tools directly for this?

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Mar 17 18:06:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Mar 2009 14:06:00 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903171806.n2HI60op012464@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-17 14:06 EST -------
(In reply to comment #10)
> I've made these changes available on a test github branch,
> http://github.com/peterjc/biopython-seqio-quality/tree/master
> 
> This doesn't include all the example files for the unit tests yet.
> 

I've now checked this into CVS.  The extra example files will follow later...
leaving this bug open until that is done.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From dalloliogm at gmail.com  Tue Mar 17 18:35:04 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 17 Mar 2009 19:35:04 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
Message-ID: <5aa3b3570903171135nb49de80h6c6ee0930c147d29@mail.gmail.com>

On Tue, Mar 17, 2009 at 6:59 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Brad and I have been trying this out in practice, and it seems to work OK.
>
> I started a fork to test the patches for Bug 2767, adding quality
> parsers to Bio.SeqIO,
> http://github.com/peterjc/biopython-seqio-quality/tree/master
> I made a few incremental checkins, pushed to github one by one.
>
> Brad then took a fork of this in order to make some minor changes and
> fix a typo in the documentation :
> http://github.com/chapmanb/biopython-seqio-quality/tree/master

Yes, basically this is the way it should be working.
I usually do something similar, although I mostly follow the procedure
explained here:
- http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html
(section 'Using git for collaboration')

I fetch the other user's branch under a local name, using the refspec
master:otheruser-incoming, then compare the two branches with gitk or
with git log master..otheruser-incoming.


>
>
> At this point the "network" diagrams showed up the two branches as
> diverging. Brad then sent me a "pull" request, suggesting I might
> want to pull his work into my branch.
>
> Using the git command line tool, I was able to pull and merge Brad's
> changes (as I had made no changes in the meantime this could be done
> automatically),

If you go to the 'Fork Queue' on github, it should show other people's commits.
However, I don't trust doing this with a web interface... moreover, it
sometimes seems not to work properly (it is not clear how it decides
whether a commit will 'apply cleanly' or not).

On the same page, there is a 'pull merge request' button, which (I have
never tried it) should send a merge request to the selected recipients.

> and then push the merged version back up to github on
> my branch. At this point my branch and brad's agreed once again, and
> the "network" diagram no longer shows both. Note that my branch now
> includes a commit from Brad.

Yes, this is right. The graph only shows the commits which differ, so
it included your two branches as a single one.

If you feel comfortable with the git mechanisms, maybe later you could
create a second branch in the 'biopython/biopython' repository, called
'accepted-github-changes' or something like that, to collect all the
changes that can be submitted to CVS.


> At this point, Brad may choose to delete his branch, or perhaps make
> further changes.

I wonder if a good strategy is to create branches only to test
specific changes, and then delete them.
If Brad keeps his branch, he will have to remember to update it later,
which may be less trouble than deleting the branch and recreating it
when necessary.

> Now all this worked, but I was wondering if the github web interface
> could have simplified any of this, if I'd only known where to click.
> For example, does github offer any way to view a diff between two
> branches? Or, as I suspect, do they simply expect you to use the git
> tools directly for this?

To my knowledge, there are no such tools :-(.
You must rely on the commit messages to identify the differences
between branches.
Maybe they will implement such a feature at some point.

>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


--

My blog on bioinformatics (now in English): http://bioinfoblog.it


From dalloliogm at gmail.com  Tue Mar 17 18:36:24 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 17 Mar 2009 19:36:24 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
Message-ID: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>

2009/3/17 Peter Cock <p.j.a.cock at googlemail.com>

> 2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
>
> I'd still like to have a copy of the "official" git repository running
> on biopython.org, but this may not be that easy without some technical
> expertise in house to do this.  From initial discussion with the OBF
> team about the idea of running git on their servers, my impression is
> if we can do it ourselves, we may.  Jason Stajich actually suggested
> we use github independently.


Well, it is not strictly necessary to have git installed on their
computers to create a mirror.
You can just create the clone on your own computer, copy the raw files over,
and then push new changes there over ssh.
Since git is a distributed source control system, it doesn't require
configuring a server component as cvs does :-)

To my knowledge, the pygr project (also a bioinformatics suite in Python)
has an official repository hosted on Gitorious, and a mirror on github to
collect patches from there.


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From tiagoantao at gmail.com  Tue Mar 17 19:09:13 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 19:09:13 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
Message-ID: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>

OK, in order to exercise and try github development I have forked a
branch to work on the PopGen code. The idea of the branch is to serve
as a platform for merging with the "official" branch. So, the idea is:

1. Official branch - The stable thingy
2. PopGen stabilizer branch - The place to merge contributions from
PopGen development branches. The idea is that people can go crazy on
their own branches and this intermediate one serves as a point to
stabilize (unit test, documentation, QA, ...) before the commit to the
official one
3. Crazy branches - Develop your crazy idea. I have 3 ideas myself:
One for Jason's structure code, one for my LDNe code and another for
statistics. Many more are welcome...

The development procedure would be like this:
A. People would have all the fun on their development branches
B. When they felt confident they would submit their code to the
stabilizer branch, where we would check that all the important things
were there: unit test, code comments, QA, documentation
C. When things were in good shape, we would propose changes to the
official branch

And, by the way, bug fixes to existing production code would also be done
on the stabilizer branch.

Does this make any sense?

In my view, with things like git, a policy like this encourages
innovation while preserving the stability and robustness of the official
branch.

Tiago

On Tue, Mar 17, 2009 at 6:36 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
>
> 2009/3/17 Peter Cock <p.j.a.cock at googlemail.com>
>>
>> 2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
>>
>> I'd still like to have a copy of the "official" git repository running
>> on biopython.org, but this may not be that easy without some technical
>> expertise in house to do this. From initial discussion with the OBF
>> team about the idea of running git on their servers, my impression is
>> if we can do it ourselves, we may. Jason Stajich actually suggested
>> we use github independently.
>
> Well, it is not strictly necessary to have git installed on their
> computers to create a mirror.
> You can just create the clone on your own computer, copy the raw files over,
> and then push new changes there over ssh.
> Since git is a distributed source control system, it doesn't require
> configuring a server component as cvs does :-)
>
> To my knowledge, the pygr project (also a bioinformatics suite in Python)
> has an official repository hosted on Gitorious, and a mirror on github to
> collect patches from there.
>
>
>
>
> --
>
> My blog on bioinformatics (now in English): http://bioinfoblog.it
>


-- 
 "A man who dares to waste one hour of time has not discovered the
value of life" - Charles Darwin


From mailinglist.honeypot at gmail.com  Tue Mar 17 19:21:57 2009
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Tue, 17 Mar 2009 15:21:57 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
Message-ID: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>

Hi,

I really just lurk around here, but a slight correction/point:

> A. People would have all the fun on their development branches
> B. When they felt confident they would submit their code to the
> stabilizer branch, where we would check that all the important things
> were there: unit test, code comments, QA, documentation
> C. When things were in good shape, we would propose changes to the
> official branch

I'm very much a git noob and, from following this thread a bit, it
seems that many of us are, so for the noobs:

I think somewhere around B, the person wanting to commit new code  
would have to rebase[1] their branch against the official "stabilizer  
branch" (that they had originally forked from). This would put the  
onus of fixing any breaks and keeping track of recent developments on  
the branch you propose to merge into (since you originally branched),  
on the person who is writing the new code.

This makes it easier for the "official keepers of the one true branch"  
to accept new patches, since they know the patch will work on the  
latest version.

Anyway, I think I just wanted to point out that rebase was there since  
I don't think there's anything really equivalent in the CVS/SVN world.
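
For the noobs, a toy Python model of what rebasing does to the history (a sketch only: real git works on a commit graph and can stop on conflicts, and the function name rebase and the commit labels here are made up). The commits that exist only on your branch are replayed, as new commits, on top of the current tip of the branch you forked from:

```python
def rebase(upstream, topic):
    """Toy model of `git rebase`: replay topic-only commits on top of upstream.

    Both arguments are lists of commit labels, oldest first, sharing a common
    prefix (the history at the time the topic branch was forked).
    """
    # Find the length of the shared history (the fork point).
    fork = 0
    while fork < min(len(upstream), len(topic)) and upstream[fork] == topic[fork]:
        fork += 1
    # Replayed commits get new identities, marked here with a trailing quote.
    replayed = [commit + "'" for commit in topic[fork:]]
    return upstream + replayed


stabilizer = ["a", "b", "c", "d"]  # the stabilizer branch has moved on to d
my_work = ["a", "b", "x", "y"]     # forked at b, then added x and y

print(rebase(stabilizer, my_work))  # ['a', 'b', 'c', 'd', "x'", "y'"]
```

After this, merging my_work into the stabilizer branch is trivial, because my commits already sit on top of its latest version.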

-steve

[1] rebase : http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html


From tiagoantao at gmail.com  Tue Mar 17 19:27:10 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 17 Mar 2009 19:27:10 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
	<7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>
Message-ID: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com>

2009/3/17 Steve Lianoglou <mailinglist.honeypot at gmail.com>:
> I think somewhere around B, the person wanting to commit new code would have
> to rebase[1] their branch against the official "stabilizer branch" (that


So, if I understand well, anyone wanting to submit a change to the
official version would be responsible for rebasing, right?

PS - being a git noob and a longtime cvs/svn user and manager, I much
appreciated Randal Schwartz's Google talk at:
http://www.youtube.com/watch?v=8dhZ9BXQgc4 Around the 30 minute mark
it is especially informative.


From mailinglist.honeypot at gmail.com  Tue Mar 17 19:34:11 2009
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Tue, 17 Mar 2009 15:34:11 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
	<7063D4EA-D827-4D91-A15C-53F148660D96@gmail.com>
	<6d941f120903171227o54bf9d36s645404de9962eed3@mail.gmail.com>
Message-ID: <711E86ED-F220-4E97-84BC-9E94753E111A@gmail.com>

On Mar 17, 2009, at 3:27 PM, Tiago Antão wrote:
> 2009/3/17 Steve Lianoglou <mailinglist.honeypot at gmail.com>:
>> I think somewhere around B, the person wanting to commit new code  
>> would have
>> to rebase[1] their branch against the official "stabilizer  
>> branch" (that
>
> So, if I understand well, anyone wanting to submit a change to the
> official version would be responsible for rebasing, right?

And if I understand it well, then I think you're right.

I think that's a reasonable policy. That puts the responsibility to  
ensure that any new code I write works with whatever has been approved  
already on me, and not you.

While this may require a bit of extra responsibility from the committer,
I'd be surprised if it would be enough to deter any new would-be
committers from taking a shot at contributing code (maybe it would? I
guess it's debatable).

> PS - being a git noob and a longtime cvs/svn user and manager, I much
> appreciated Randal Schwartz's Google talk at:
> http://www.youtube.com/watch?v=8dhZ9BXQgc4 Around the 30 minute mark
> it is especially informative.

Sweet.

To be honest, the only video I ever saw of git was Linus' SVN-bash  
google talk, which somehow put me off from considering git longer than  
I should have, so this is a good link to have :-)

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

http://cbio.mskcc.org/~lianos


From biopython at maubp.freeserve.co.uk  Tue Mar 17 20:21:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 17 Mar 2009 20:21:45 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<5aa3b3570903171136k3dc616a3hc937381d940cd305@mail.gmail.com>
	<6d941f120903171209g751b5b86p797e79b333972301@mail.gmail.com>
Message-ID: <320fb6e00903171321y4b94f220h7d2d1172ee085e15@mail.gmail.com>

2009/3/17 Tiago Antão <tiagoantao at gmail.com>:
> OK, in order to exercise and try github development I have forked a
> branch to work on the PopGen code. The idea of the branch is to serve
> as a platform for merging with the "official" branch. So, the idea is:
>
> 1. Official branch - The stable thingy
> 2. PopGen stabilizer branch - The place to merge contributions from
> PopGen development branches. The idea is that people can go crazy on
> their own branches and this intermediate one serves as a point to
> stabilize (unit test, documentation, QA, ...) before the commit to the
> official one
> 3. Crazy branches - Develop your crazy idea. I have 3 ideas myself:
> One for Jason's structure code, one for my LDNe code and another for
> statistics. Many more welcomed....
>
> The development procedure would be like this:
> A. People would have all the fun on their development branches
> B. When they felt confident they would submit their code to the
> stabilizer branch, where we would check that all the important things
> were there: unit test, code comments, QA, documentation
> C. When things were in good shape, we would propose changes to the
> official branch
>
> And, by the way, bug fixes of existing production would also be done
> on the stabilizer branch.
>
> Does this make any sense?

Totally.  But keep in mind the current "official" git branch (the one
being updated from CVS) may get nuked if we have to redo the import to
fix the missing version tags - so I would suggest you name your
branches with "test" or "provisional" or something temporary in the
text for now.

> In my view, with things like git, a policy like this encourages both
> innovation while preserving stability and robustness of the official
> branch.

Yes - and this seems like the right approach for Bio.PopGen, with you acting
as the gatekeeper.

Peter


From chapmanb at 50mail.com  Tue Mar 17 21:34:14 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 17 Mar 2009 17:34:14 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
Message-ID: <20090317213414.GK57054@sobchak.mgh.harvard.edu>

Hi Peter;

> Using the git command line tool, I was able to pull and merge Brad's
> changes (as I had made no changes in the meantime this could be done
> automatically), and then push the merged version back up to github on
> my branch.  At this point my branch and brad's agreed once again, and
> the "network" diagram no longer shows both.  Note that my branch now
> includes a commit from Brad.

Sweet. Glad that worked. I deleted my branch (edit->delete
repository).

While doing so, I noticed that there is also a 'Repository
Collaborators' section within the 'edit' page. So, another working
model is to have multiple users simultaneously editing one forked
revision. If you are already communicating on the work through the
mailing list or wiki, this is more like CVS/SVN than the branching
model.

> Now all this worked, but I was wondering if the github web interface
> could have simplified any of this, if I'd only known where to click.
> For example, does github offer any way to view a diff between two
> branches?  Or, as I suspect, do they simply expect you to use the git
> tools directly for this?

What was the command you used for this? git diff is still befuddling
to me.

Brad


From bugzilla-daemon at portal.open-bio.org  Wed Mar 18 14:18:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 18 Mar 2009 10:18:39 -0400
Subject: [Biopython-dev] [Bug 2777] [Solution is one line change!] Entity
	sorting altered by detach_child() calls
In-Reply-To: <bug-2777-42@http.bugzilla.open-bio.org/>
Message-ID: <200903181418.n2IEIdIm003158@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2777


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-18 10:18 EST -------
Fix checked into CVS as Bio/PDB/Entity.py revision 1.26, marking as fixed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Wed Mar 18 15:07:42 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Mar 2009 15:07:42 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.50 (beta)
In-Reply-To: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
References: <320fb6e00903160516yd63f61fu21ca7560562dd6dd@mail.gmail.com>
Message-ID: <320fb6e00903180807u4a0f7a5aqaa91f20b40891ca4@mail.gmail.com>

On Mon, Mar 16, 2009 at 12:16 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files

That's in CVS now, Brad and I have used it a bit, but further testing
before the beta wouldn't hurt.

> Bug 2551 - Adding advanced __getitem__ to generic alignment, e.g.
> align[1:2,5:-5]

Anyone want to try this out?
http://bugzilla.open-bio.org/show_bug.cgi?id=2551

> Any other nominations for Biopython 1.50?

Other candidates with patches that have since come to mind:

Bug 2733 - Running unit tests where Biopython wasn't built from source
http://bugzilla.open-bio.org/show_bug.cgi?id=2733
This patch seemed OK from both my and Bruce's testing.

Bug 2738 - Speed up GenBank parsing, in particular location parsing
http://bugzilla.open-bio.org/show_bug.cgi?id=2738
I would want to run some tests with EMBL files before committing this.

Bug 2745 - Bio.GenBank.LocationParserError with a GenBank CON file
http://bugzilla.open-bio.org/show_bug.cgi?id=2745
I'd like to change CONTIG line parsing to just use a string (or a list
of strings).

Peter


From nuin at genedrift.org  Wed Mar 18 19:50:28 2009
From: nuin at genedrift.org (Paulo Nuin)
Date: Wed, 18 Mar 2009 15:50:28 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com>
References: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>	
	<8b34ec180903121620w9c2ec46i8fed9ccb4781370e@mail.gmail.com>	
	<320fb6e00903130521s69c5b3eg55b71191b1e8ff21@mail.gmail.com>	
	<128a885f0903142243r372026d7vdf5bbe998db3a326@mail.gmail.com>	
	<20090315185443.GA30296@kunkel>	
	<320fb6e00903160430h125d11a3jd100497d3e25ffb8@mail.gmail.com>	
	<8b34ec180903160724h2e239fafi22d8f5fa9c1de7cc@mail.gmail.com>	
	<320fb6e00903160800s36b8231fo57e0a11506f8635d@mail.gmail.com>	
	<8b34ec180903160955m3d427927wce61940f51cf5337@mail.gmail.com>	
	<49BE8532.9040701@genedrift.org>
	<320fb6e00903161007p3e36b6d3j29e4c319c762576a@mail.gmail.com>
Message-ID: <49C15084.8040208@genedrift.org>

Peter wrote:
> On Mon, Mar 16, 2009 at 4:58 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>   
>> No problem on Vista.
>>
>> Git (version 1.5.6.1-preview20080701)
>>
>> Paulo
>>     
>
> Hi Paulo,
>
> Could you be a bit more precise about the version you are using and
> where you got it from? i.e. Are you using cygwin or the Windows native
> port, http://code.google.com/p/msysgit/
>   
I'm using msysgit version 1.5.6.


> And did you mean in general you have no problems with git on Windows
> Vista, or have you also tried fetching Biopython from github,
> building, testing (and installing it)?  For example, are there any new
> line issues from the unit tests?  This is one area where CVS and git
> may differ slightly...
>   
I'm using Github to store a couple of projects and this version is
working great. The Eclipse add-on is also fine. I cloned Biopython but
haven't tried installing or building it.

Paulo


From bugzilla-daemon at portal.open-bio.org  Thu Mar 19 13:42:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Mar 2009 09:42:23 -0400
Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not
	support the output file argument
In-Reply-To: <bug-2654-42@http.bugzilla.open-bio.org/>
Message-ID: <200903191342.n2JDgN3p016978@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2654


yvan.strahm at bccs.uib.no changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yvan.strahm at bccs.uib.no


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Mar 19 17:08:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 19 Mar 2009 13:08:16 -0400
Subject: [Biopython-dev] [Bug 2654] Bio.Blast.NCBIStandalone does not
	support the output file argument
In-Reply-To: <bug-2654-42@http.bugzilla.open-bio.org/>
Message-ID: <200903191708.n2JH8GqS032350@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2654


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-19 13:08 EST -------
Fixed in Bio/Blast/NCBIStandalone.py CVS revision 1.86
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython

Note that the three tools themselves all use -o (lower case) for the output
file, but refer to it slightly differently:

$ ./rpsblast --help | grep " -o "
  -o  Output File for Alignment [File Out]  Optional
$ ./blastpgp --help | grep " -o "
  -o  Output File for Alignment [File Out]  Optional
$ ./blastall --help | grep " -o "
  -o  BLAST report Output File [File Out]  Optional

Our function for rpsblast already supported this argument under the name
"align_outfile" which I have therefore also used for blastpgp (this is a good
name as blastpgp outputs more than one type of file).

For blastall "align_outfile" doesn't seem entirely appropriate, and although it
is inconsistent I have gone for "outfile" instead.

Example usage:

# Imports (setting up the input parameters is omitted)
import os
from Bio.Blast import NCBIStandalone, NCBIXML
out_handle, err_handle = NCBIStandalone.blastall(blastall_exe, "blastp",
                                                 blastdb_nr, query_file,
                                                 expectation=0.000001,
                                                 nprocessors=1, filter="F",
                                                 outfile=output_file,
                                                 alignments=5, descriptions=5)
assert "" == err_handle.read()
assert "" == out_handle.read() #Important so we wait for BLAST to finish!
err_handle.close()
out_handle.close()
assert os.path.isfile(output_file)

count = 0
for blast_record in NCBIXML.parse(open(output_file)) :
    count += 1
print "Found %i BLAST results" % count
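
To make the naming discussion above concrete, here is a rough sketch of how a keyword argument such as "outfile" might end up as the -o switch on the blastall command line. The helper build_blastall_command is hypothetical and written only for this illustration; it is not the code in Bio/Blast/NCBIStandalone.py. The switches used (-p, -d, -i, -e, -o) are the standard blastall ones:

```python
def build_blastall_command(blastall_exe, program, database, infile,
                           outfile=None, expectation=None):
    """Return the blastall command line as a list of arguments.

    Hypothetical helper for illustration; not Biopython's implementation.
    """
    cmd = [blastall_exe, "-p", program, "-d", database, "-i", infile]
    if expectation is not None:
        cmd += ["-e", str(expectation)]
    if outfile is not None:
        # blastall, blastpgp and rpsblast all use lower case -o for this
        cmd += ["-o", outfile]
    return cmd


cmd = build_blastall_command("blastall", "blastp", "nr", "query.fasta",
                             outfile="results.txt")
print(" ".join(cmd))  # blastall -p blastp -d nr -i query.fasta -o results.txt
```

Whatever the Python-side keyword is called ("outfile" or "align_outfile"), the tool only ever sees -o followed by the file name.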


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Thu Mar 19 19:00:51 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 19 Mar 2009 19:00:51 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090317213414.GK57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>

On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
>
>> Using the git command line tool, I was able to pull and merge Brad's
>> changes (as I had made no changes in the meantime this could be done
>> automatically), and then push the merged version back up to github on
>> my branch. At this point my branch and brad's agreed once again, and
>> the "network" diagram no longer shows both. Note that my branch now
>> includes a commit from Brad.
>
> Sweet. Glad that worked. I deleted my branch (edit->delete
> repository).

How long did it take to process?  I deleted mine (after attempting to
merge against the CVS mirror).  The delete was still in progress over
12 hours later!

> While doing so, I noticed that there is also a 'Repository
> Collaborators' section within the 'edit' page. So, another working
> model is to have multiple users simultaneously editing one forked
> revision. If you are already communicating on the work through the
> mailing list or wiki, this is more like CVS/SVN than the branching
> model.

Yes, this should be a fairly simple way to give all our current CVS
developers direct access to a master branch on github.

>> Now all this worked, but I was wondering if the github web interface
>> could have simplified any of this, if I'd only known where to click.
>> For example, does github offer any way to view a diff between two
>> branches? Or, as I suspect, do they simply expect you to use the git
>> tools directly for this?
>
> What was the command you used for this? git diff is still befuddling
> to me.

I didn't actually figure that out (how to do a diff between two
branches on github).  And this afternoon github seems to be down, so I
haven't played with it any more.

Peter


From chris.lasher at gmail.com  Fri Mar 20 04:52:49 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Fri, 20 Mar 2009 00:52:49 -0400
Subject: [Biopython-dev] Help pages in Biopython wiki
Message-ID: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>

Would it be possible to get the help documentation installed for the
Biopython wiki?

http://biopython.org/wiki/Help

Chris


From lpritc at scri.ac.uk  Fri Mar 20 08:42:44 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Fri, 20 Mar 2009 08:42:44 +0000
Subject: [Biopython-dev] Help pages in Biopython wiki
In-Reply-To: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>
Message-ID: <C5E90784.1F50A%lpritc@scri.ac.uk>

Hi Chris,

That page doesn't exist, yet (click on the 'page' tab to see this), and no
pages link to it (see here:
http://biopython.org/wiki/Special:WhatLinksHere/Help)

What help were you expecting to see there?

L.

On 20/03/2009 04:52, "Chris Lasher" <chris.lasher at gmail.com> wrote:

> Would it be possible to get the help documentation installed for the
> Biopython wiki?
> 
> http://biopython.org/wiki/Help
> 
> Chris

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405




From biopython at maubp.freeserve.co.uk  Fri Mar 20 10:41:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Mar 2009 10:41:49 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
Message-ID: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>

On Thu, Mar 19, 2009 at 7:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> Sweet. Glad that worked. I deleted my branch (edit->delete
>> repository).
>
> How long did it take to process?  I deleted mine (after attempting to
> merge against the CVS mirror).  The delete was still in progress over
> 12 hours later!

And the branch delete is still on-going :(

> ...  And this afternoon github seems to be down, so I haven't played with it any more.

It's back online again, but right now for me github is a bit of a damp squid [*].
As my initial branch/fork of biopython still exists but is being deleted, it
seems in the meantime I can't create a new branch of biopython.  Odd, and rather
frustrating.  Hopefully it will sort itself out shortly, and I can have another
play with merging branches...

Peter

[*] For the benefit of non-native English speakers, or anyone whose sense
of humour works differently to mine, this was a pun, based on the English phrase
"damp squib" for a disappointing event, and the fact that github's error page
has some kind of cartoon squid/octopus-cat creature on it.


From dalloliogm at gmail.com  Fri Mar 20 11:15:21 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 20 Mar 2009 12:15:21 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
Message-ID: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>

On Fri, Mar 20, 2009 at 11:41 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Mar 19, 2009 at 7:00 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Tue, Mar 17, 2009 at 9:34 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> Sweet. Glad that worked. I deleted my branch (edit->delete
>>> repository).
>>
>> How long did it take to process?  I deleted mine (after attempting to
>> merge against the CVS mirror).  The delete was still in progress over
>> 12 hours later!
>
> And the branch delete is still on-going :(
>
>> ...  And this afternoon github seems to be down, so I haven't played with it any more.
>
> It's back online again, but right now for me github is a bit of a damp squid [*].
> As my initial branch/fork of biopython still exists but is being deleted, it
> seems in the meantime I can't create a new branch of biopython.

mmm are you referring to this:
- http://github.com/peterjc/biopython-seqio-quality/network
?

I can see it, and also fetch/pull changes from it.

I see that you have renamed your fork as seqio-quality. Ok, but I
think it is better to keep the fork's name as 'biopython', and then
create many branches inside it.

For example:

# first create a fork on github, then clone it
# (<yourforkurl> is a placeholder for the URL of your fork):
git clone <yourforkurl>
cd biopython

# make some commits to your master branch:
touch testfile.txt
git add testfile.txt
git commit -a -m 'test file added'
# push the changes to your github repository ('origin' refers to
# github; see $(CWD)/biopython/.git/config)
git push origin master


# without arguments, git branch shows the list of branches and the current one:
git branch
# create a branch called 'experimental-seqio-quality':
git branch experimental-seqio-quality
# switch to it:
git checkout experimental-seqio-quality
# check that experimental-seqio-quality is the current working branch:
git branch

# now you are working in the branch called 'experimental-seqio-quality'.
# All the changes you commit here will not be saved in the 'master' branch
# or the others, as long as you don't merge them:
touch seqio-parser
git add seqio-parser
git commit -a -m 'added seqioparser'
git push origin experimental-seqio-quality
# after pushing, git will create a new branch in github. Look for
# example at my fork here:
# - http://github.com/biopython/biopython/network

############

Here is how you can merge and compare your branch with someone else's
or with the biopython one:

# add a reference to the official biopython repository
git remote add biopython git://github.com/biopython/biopython.git

# obtain the set of changes from the biopython repository and inspect them
git fetch biopython
# commits that are in biopython/master but not yet in your master:
git log master..biopython/master
git diff master biopython/master
# merge them into your own master branch:
git checkout master
git merge biopython/master

git remote add peter git://github.com/peterjc/biopython-seqio-quality.git
git fetch peter # there should be a way to do this without having to fetch
git diff master peter/master

For reference, look at this guide:
http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo


> Odd, and rather frustrating.  Hopefully it will sort itself out shortly,
> and I can have another play with merging branches...
>
> Peter
>
> [*] For the benefit of non-native English speakers, or anyone whose sense
> of humour works differently to mine, this was a pun, based on the English phrase
> "damp squib" for a disappointing event, and the fact that github's error page
> has some kind of cartoon squid/octopus-cat creature on it.
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


--

My blog on bioinformatics (now in English): http://bioinfoblog.it


From cymon.cox at googlemail.com  Fri Mar 20 11:16:27 2009
From: cymon.cox at googlemail.com (Cymon Cox)
Date: Fri, 20 Mar 2009 11:16:27 +0000
Subject: [Biopython-dev] Test - ignore
Message-ID: <7265d4f0903200416o7c8135ddrfae4aad723bd17b7@mail.gmail.com>


From biopython at maubp.freeserve.co.uk  Fri Mar 20 11:32:15 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Mar 2009 11:32:15 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
Message-ID: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>

>> As my initial branch/fork of biopython still exists but is being
>> deleted, it seems in the meantime I can't create a new branch
>> of biopython.
>
> mmm are you referring to this:
> - http://github.com/peterjc/biopython-seqio-quality/network
> ?
>
> I can see it, and also fetch/pull changes from it..

True, the network page is still there for me. But
http://github.com/peterjc/biopython-seqio-quality/ which redirects to
http://github.com/peterjc/biopython-seqio-quality/tree/master
shows me just a "This repository is being deleted" page.

> I see that you have renamed your fork as seqio-quality. Ok, but I
> think it is better to keep the fork's name as 'biopython', and then
> create many branches inside it.

I don't think I had entirely understood github's use of fork versus branch.
I'll have to do some more reading and try again once my account has
settled down.  Thanks for the details in your email.

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 12:18:53 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 08:18:53 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201218.n2KCIrSX026346@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2009-03-20 08:18 EST -------
(In reply to comment #7)
> (In reply to comment #6)
> > If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
> > it from there. If not, it tries to download it. This may fail if the servers
> > are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
> > Biopython is installed), you won't run into this problem.
> 
> I was just looking at this on my Windows XP Python 2.3 machine, and when it
> tried to download missing DTD files it was just using a filename as the URL.

In hindsight, I wonder if trying to download missing DTD files is really a good
idea. Suppose a user does a large number of Entrez queries, and saves the
results as XML files. Then, he tries to parse each of those XML files. If a DTD
file is missing, then Bio.Entrez will try to download the same DTD file for
each XML file it is trying to parse. This is not only wasteful, but also
bypasses Entrez's rule of no more than three accesses per second. In addition,
this is fragile. The XML files typically contain a full URL to the needed DTD.
But many of Entrez's DTD files contain references to other DTD files, and those
references can be relative. When Bio.Entrez gets such a relative path to where
the DTD file is located, it is difficult to figure out the absolute path to the
DTD. Now we are looking for it in http://www.ncbi.nlm.nih.gov/dtd/, but this
does not seem to contain all required DTDs.

It may therefore make sense not to download the DTD file, but to raise an
Exception with a helpful error message, specifying which DTD file is missing,
where it can possibly be found, and where the DTD file can be installed. It
requires some more effort from the user, but it is more robust, won't break
Entrez' rules, and is more efficient.
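
A minimal sketch of the "raise an exception instead of downloading" idea,
assuming Python 3. The names MissingDTDError, open_local_dtd and dtd_dir are
hypothetical and chosen only for illustration; they are not part of Bio.Entrez.

```python
import os


class MissingDTDError(Exception):
    """Hypothetical exception raised when a required DTD is not installed."""


def open_local_dtd(filename, dtd_dir):
    """Open a DTD from dtd_dir, or raise an error explaining what to do."""
    path = os.path.join(dtd_dir, filename)
    if not os.path.isfile(path):
        # No download is attempted here, so parsing many saved XML files
        # never sends repeated requests to the NCBI servers.
        raise MissingDTDError(
            "The DTD file %s is needed to parse this XML file, but it was "
            "not found in %s. Please obtain it (for example from the URL "
            "given in the DOCTYPE line of the XML file) and save it in "
            "that directory." % (filename, dtd_dir)
        )
    return open(path, "rb")
```

The message names the missing file and the directory where it is expected,
which covers two of the three pieces of information suggested above.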


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From chapmanb at 50mail.com  Fri Mar 20 12:55:18 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 20 Mar 2009 08:55:18 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
Message-ID: <20090320125518.GA351@sobchak.mgh.harvard.edu>

Hi all;

> >> As my initial branch/fork of biopython still exists but is being
> >> deleted, it seems in the meantime I can't create a new branch
> >> of biopython.
[...]
> True, the network page is still there for me. But
> http://github.com/peterjc/biopython-seqio-quality/ which redirects to
> http://github.com/peterjc/biopython-seqio-quality/tree/master
> shows me just a "This repository is being deleted" page.

Peter, the repository deletion was very quick for me, so it looks like it
got stuck somewhere with the GitHub downtime. Does this help for getting it
removed:

http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/

> > I see that you have renamed your fork as seqio-quality. Ok, but I
> > think it is better to keep the fork's name as 'biopython', and then
> > create many branches inside it.
> 
> I don't think I had entirely understood github's use of fork versus branch.
> I'll have to do some more reading and try again once my account has
> settled down.  Thanks for the details in your email.

Wow, now I am mad confused. I thought forks and branches were
conceptually the same. Giovanni, it seems like you are suggesting one
branch (the GitHub fork) and then a second branch (the git branch 
command). We were thinking of a standard case as:

1. Fork the Biopython trunk at GitHub. Name this something so it
makes sense what the fork/branch is for.
2. Work on the fork/branch. If you want, invite others to work on it
with you.
3. When finished, be sure you are up to date with the master
Biopython trunk.
4. Submit the fork/branch for inclusion in Biopython.
5. Once included, delete the fork/branch.

Which parts of this fall out of "standard" git practice? In general,
we should strive to keep this as simple as possible. If using Git is
complicated then we are losing a lot of our advantage over CVS/patches.

Giovanni, the example commands were very helpful; I added details to the Git
page on how to see diffs of branches:

http://biopython.org/wiki/GitMigration#Evaluating_changes

Brad


From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 13:57:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 09:57:00 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201357.n2KDv0JJ001146@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 09:57 EST -------
(In reply to comment #10)
> 
> In hindsight, I wonder if trying to download missing DTD files is really a
> good idea. Suppose a user does a large number of Entrez queries, and saves
> the results as XML files. Then, he tries to parse each of those XML files.
> If a DTD file is missing, then Bio.Entrez will try to download the same DTD
> file for each XML file it is trying to parse. This is not only wasteful, but
> also bypasses Entrez's rule of no more than three accesses per second.

Very true.  We should be able to enforce the access limit here without too much
trouble.  More generally, it would make sense for the DTD file to be saved -
ideally to the python site-packages but as we may not have write access, at
least to a cache.
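
As a rough illustration of enforcing the access limit, here is a small sketch
assuming Python 3. The Throttle class is hypothetical (it is not existing
Bio.Entrez code); it spaces calls evenly so that no more than three happen in
any one second. The clock and sleep arguments exist only so the behaviour can
be checked without real waiting.

```python
import time


class Throttle:
    """Allow at most max_calls calls per period seconds (hypothetical helper)."""

    def __init__(self, max_calls=3, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        # Spacing calls by period / max_calls keeps the rate under the limit.
        self.min_interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling wait() immediately before each download attempt would be enough to
keep repeated DTD fetches within the limit, although it does nothing about the
wasted bandwidth, which is why saving the file still seems preferable.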

> In addition, this is fragile. The XML files typically contain a full url to
> the needed DTD.   But many of Entrez's DTD files contain references to other
> DTD files, and those references can be relative. When Bio.Entrez gets such a
> relative path to where the DTD file is located, it is difficult to figure out
> the absolute path to the DTD. Now we are looking for it in
> http://www.ncbi.nlm.nih.gov/dtd/, but this does not seem to contain all
> required DTDs.

When I looked into the DTD URLs, I didn't see the NCBI using any relative
links, but they may have changed things since.  Additionally the NCBI have a
(different but overlapping) set of DTD files at:
http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/

Can we get some python XML/DTD library to resolve these links for us?
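
For what it is worth, the standard library can resolve a relative reference
as long as the URL of the referring DTD is known. A small sketch, assuming
Python 3 (urllib.parse.urljoin); the function name resolve_dtd_url and the
file names in the usage note are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_dtd_url(referring_url, reference):
    """Resolve a (possibly relative) DTD reference against the referring URL."""
    # urljoin leaves absolute references untouched and resolves relative
    # ones, including "../" segments, against referring_url.
    return urljoin(referring_url, reference)
```

For example, resolve_dtd_url("http://www.ncbi.nlm.nih.gov/dtd/", "example.dtd")
gives "http://www.ncbi.nlm.nih.gov/dtd/example.dtd". This does not help when
the parser only sees the relative path and has lost track of which DTD it came
from, which is the difficult case described in comment #10.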

> It may therefore make sense not to download the DTD file, but to raise an
> Exception with a helpful error message, specifying which DTD file is missing,
> where it can possibly be found, and where the DTD file can be installed. It
> requires some more effort from the user, but it is more robust, won't break
> Entrez' rules, and is more efficient.

Biopython 1.49 generally failed to download missing DTD files.  Right now the
current code in CVS does much better at coping with missing DTD files, but in a
very wasteful way.  In either version, it does at least issue warnings,
indicating something is not right.

As a user, I would prefer Bio.Entrez to download missing DTD files on demand
AND SAVE THEM.  As a developer I can see this is rather complicated, and you
are right Michiel - a simple error message with instructions is much more
straightforward.

Note that the error might also suggest upgrading to the latest Biopython, or
reporting the issue to us - but it would then be a very long error message!

If you want to switch to a helpful error message for missing DTD files, I'm OK
with that.  We could also ship the current code for Biopython 1.50.




From dalloliogm at gmail.com  Fri Mar 20 14:25:41 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 20 Mar 2009 15:25:41 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
	<20090320125518.GA351@sobchak.mgh.harvard.edu>
Message-ID: <5aa3b3570903200725p1437ceem6a538af640c52ced@mail.gmail.com>

On Fri, Mar 20, 2009 at 1:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
>
>> >> As my initial branch/fork of biopython still exists but is being
>> >> deleted, it seems in the meantime I can't create a new branch
>> >> of biopython.
> [...]
>> True, the network page is still there for me. But
>> http://github.com/peterjc/biopython-seqio-quality/ which redirects to
>> http://github.com/peterjc/biopython-seqio-quality/tree/master
>> shows me just a "This repository is being deleted" page.
>
> Peter, the repository deletion was very quick for me, so it looks like it
> got stuck somewhere with the GitHub downtime. Does this help for getting it
> removed:
>
> http://originblog.wordpress.com/2008/04/28/github-tips-removing-a-remote-branch/
>
>> > I see that you have renamed your fork as seqio-quality. Ok, but I
>> > think it is better to keep the fork's name as 'biopython', and then
>> > create many branches inside it.
>>
>> I don't think I had entirely understood github's use of fork versus branch.
>> I'll have to do some more reading and try again once my account has
>> settled down.  Thanks for the details in your email.
>
> Wow, now I am mad confused. I thought forks and branches were
> conceptually the same.

Consider that the term "fork" is specific to github, and has nothing
to do with git. There is no 'git fork' command.
When you do a 'fork' in github, what it does is create a personal
'space' on your account on github, to host all your personalizations,
including new commits and also new branches of development.
It is a kind of 'working space' that holds all the work you have done.

I understand it seems a bit complicated at first :-( but I think that,
without using github, it is even more difficult to understand these
things.

In your account you can have more than one experimental branch. For
example, I can create a branch called 'experimental-xyz-parser',
another called 'personal-modifications', and keep the master branch as
it is (or rename it).

If you want to contribute to my 'xyz parser', you can fetch this
branch into your space, with commands like:
$: git remote add giovanni <my url on github>
$: git fetch giovanni
$: git checkout -b experimental-xyz-parser giovanni/experimental-xyz-parser

This should create a branch called 'experimental-xyz-parser' on your
computer, so you can work with it, make modifications, and later push
it to github (where it will appear in the network graph).


> Giovanni, it seems like you are suggesting one
> branch (the GitHub fork) and then a second branch (the git branch
> command). We were thinking of a standard case as:
>
> 1. Fork the Biopython trunk at GitHub. Name this something so it
> makes sense what the fork/branch is for.
> 2. Work on the fork/branch. If you want, invite others to work on it
> with you.
> 3. When finished, be sure you are up to date with the master
> Biopython trunk.
> 4. Submit the fork/branch for inclusion in Biopython.
> 5. Once included, delete the fork/branch.
>
> Which parts of this fall out of "standard" git practice? In general,
> we should strive to keep this as simple as possible. If using Git is
> complicated then we are losing a lot of our advantage over CVS/patches.
>
> Giovanni, the example commands were very helpful; I added details to the Git
> page on how to see diffs of branches:
>
> http://biopython.org/wiki/GitMigration#Evaluating_changes
>
> Brad


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 14:50:49 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 10:50:49 -0400
Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL
	files
In-Reply-To: <bug-2767-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201450.n2KEonrB005712@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2767


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 10:50 EST -------
Code is in CVS with unit tests.  Marking as fixed.




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 14:53:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 10:53:37 -0400
Subject: [Biopython-dev] [Bug 2770] suggestion: raise a warning if
	Entrez.email is not set
In-Reply-To: <bug-2770-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201453.n2KErbfO006014@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2770


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 10:53 EST -------
Resolved as won't fix (unless the NCBI change their guidelines).




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 15:49:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 11:49:52 -0400
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201549.n2KFnqs8011031@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 11:49 EST -------
(In reply to comment #0)
> (1) All the Bio.Graphics "write to file/handle" functions to accept any of the
> supported file formats (like Bio.Graphics.GenomeDiagram), which would require
> renderPM at run time for the bitmap formats (see Bug 2710).  They should share
> some code for mapping format names to ReportLab rendering module.  This would
> be easy to do without changing the existing mix of method names.

That should be working in CVS now.

> (2) Update the docstrings for the "write to file/handle" functions to make it
> clear they can accept a filename OR a handle (a result of the underlying
> reportlab renderer's drawToFile function's behaviour - see note below).

This was done in CVS some time ago (comment 2)

> (3) Standardise on the method naming (and perhaps deprecate the old methods). 
> Using "write" seems to be a sensible choice based on the current names used in
> Bio.Graphics.

This one is more difficult.  GenomeDiagram uses a two step system - draw then
write, where draw creates the ReportLab drawing object, and write saves it to a
file.  I'm going to leave this for another day...

Marking bug as fixed.




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 17:32:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:32:50 -0400
Subject: [Biopython-dev] [Bug 2795] New: Add commit, rollback,
	close to DBServer object
Message-ID: <bug-2795-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795

           Summary: Add commit, rollback, close to DBServer object
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


The DBServer object is defined in file BioSQL/BioSeqDatabase.py and it might
make sense to add the following methods to it:

    def commit(self):
        """Commits the current transaction to the database."""
        return self.adaptor.commit()

    def rollback(self):
        """Rolls back the current transaction."""
        return self.adaptor.rollback()

    def close(self):
        """Close the connection. No further activity possible."""
        return self.adaptor.close()

I think the adaptor is intended to hide internal implementation details, so we
shouldn't be forcing people to use it directly for transaction support.
Consider this example from http://www.biopython.org/wiki/BioSQL currently:

from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="MySQLdb", user="root",
                     passwd = "", host = "localhost", db="bioseqdb")
db = server["orchids"]
handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
rettype="genbank")
db.load(SeqIO.parse(handle, "genbank"))
server.adaptor.commit()

The last line would become just:

server.commit()

This seems cleaner.  Patch to follow...
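
To show how the three methods might be used together, here is a
self-contained sketch, assuming Python 3. RecordingAdaptor is a hypothetical
stand-in that only records calls, the DBServer class is cut down to the three
proposed methods, and load_safely is an illustrative helper, not part of
BioSQL.

```python
class RecordingAdaptor:
    """Hypothetical stand-in for the BioSQL adaptor; it only records calls."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


class DBServer:
    """Cut-down DBServer showing only the three proposed methods."""

    def __init__(self, adaptor):
        self.adaptor = adaptor

    def commit(self):
        """Commits the current transaction to the database."""
        return self.adaptor.commit()

    def rollback(self):
        """Rolls back the current transaction."""
        return self.adaptor.rollback()

    def close(self):
        """Close the connection. No further activity possible."""
        return self.adaptor.close()


def load_safely(server, load):
    """Run load(); commit on success, roll back on failure, always close."""
    try:
        load()
        server.commit()
    except Exception:
        server.rollback()
        raise
    finally:
        server.close()
```

With the real classes, load would be something like
lambda: db.load(SeqIO.parse(handle, "genbank")), and user code would never
need to touch server.adaptor.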




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 17:34:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:34:14 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback,
	close to DBServer object
In-Reply-To: <bug-2795-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201734.n2KHYEZR018864@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 13:34 EST -------
Created an attachment (id=1263)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view)
BioSQL patch

Patch to implement the change described.  Tested with MySQL only.

Cymon - what do you think of this?  And does it work on PostgreSQL?




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 17:59:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 13:59:14 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback,
	close to DBServer object
In-Reply-To: <bug-2795-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201759.n2KHxENC020654@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795


------- Comment #2 from cymon.cox at gmail.com  2009-03-20 13:59 EST -------
(In reply to comment #1)
> Created an attachment (id=1263)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=1263&action=view) [details]
> BioSQL patch
> 
> Patch to implement the change described.  Tested with MySQL only.
> 
> Cymon - what do you think of this?  And does it work on PostgreSQL?

I think it makes sense, and works on PostgreSQL with the psycopg2 driver.
C.




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 18:07:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 14:07:55 -0400
Subject: [Biopython-dev] [Bug 2795] Add commit, rollback,
	close to DBServer object
In-Reply-To: <bug-2795-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201807.n2KI7t37021424@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2795


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 14:07 EST -------
(In reply to comment #2)
> I think it makes sense, and works on PostgreSQL with the psycopg2 driver.
> C.

Great, checked in, marking as fixed.  We should update the wiki once Biopython
1.50 is out...




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 18:52:44 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 14:52:44 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to
	stderr, not stdout
In-Reply-To: <bug-2754-42@http.bugzilla.open-bio.org/>
Message-ID: <200903201852.n2KIqiBO024589@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2754


------- Comment #10 from eric.talevich at gmail.com  2009-03-20 14:52 EST -------
Here's the github branch where I'm working on this bug:

http://github.com/etal/biopython/tree/master

I've applied the two patches attached here and converted the test script from
print-and-compare to unittest. The tests pass now, but I haven't added checks
for specific parsing errors, just the general PDBConstructionError raised when
parsing the example file with PERMISSIVE=0.

The warnings are hidden during tests, as expected, but in this branch the
PDBParser warnings are noticeably more annoying during normal use. Fixing this
will require more tweaking in Bio/PDB/PDBParser.py -- I'll do that in the same
branch, since I don't think you'd want to merge one fix without the other. Same
goes for the __debug__ protection in StructureBuilder.py.




From bugzilla-daemon at portal.open-bio.org  Fri Mar 20 20:08:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 20 Mar 2009 16:08:37 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to
	stderr, not stdout
In-Reply-To: <bug-2754-42@http.bugzilla.open-bio.org/>
Message-ID: <200903202008.n2KK8bpj029413@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2754


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-20 16:08 EST -------
(In reply to comment #10)
> Here's the github branch where I'm working on this bug:
> http://github.com/etal/biopython/tree/master

I've had a quick look on github, and this looks interesting. I hope we can
get it into Biopython proper before too long.

Peter




From biopython at maubp.freeserve.co.uk  Fri Mar 20 20:44:34 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Mar 2009 20:44:34 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<6d941f120903170619n4cb8d4dfr8a72f8ac1e0e896d@mail.gmail.com>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
	<20090320125518.GA351@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com>

On Fri, Mar 20, 2009 at 12:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Peter, the repository deletion was very quick for me, so it looks like it
> got stuck somewhere with the GitHub downtime.

They've fixed it - I picked a bad day to delete a "fork".

Giovanni wrote:
>> > I see that you have renamed your fork as seqio-quality. Ok, but I
>> > think it is better to keep the fork's name as 'biopython', and then
>> > create many branches inside it.

Agreed - when I did that, I hadn't appreciated github's distinction between
branches and forks.

Peter wrote:
>> I don't think I had entirely understood github's use of fork versus branch.
>> I'll have to do some more reading and try again once my account has
>> settled down.  Thanks for the details in your email.

Brad wrote:
> Wow, now I am mad confused. I thought forks and branches were
> conceptually the same. Giovanni, it seems like you are suggesting one
> branch (the GitHub fork) and then a second branch (the git branch
> command). We were thinking of a standard case as:
>
> 1. Fork the Biopython trunk at GitHub. Name this something so it
> makes sense what the fork/branch is for.
> 2. Work on the fork/branch. If you want, invite others to work on it
> with you.
> 3. When finished, be sure you are up to date with the master
> Biopython trunk.
> 4. Submit the fork/branch for inclusion in Biopython.
> 5. Once included, delete the fork/branch.

If I understand correctly, a potential contributor does this:
1. Fork Biopython trunk at GitHub, which will give you your own
public repository (aka a "fork" in github's terminology), called
by default contributorname/biopython, containing initially a
single master branch, e.g.
http://github.com/peterjc/biopython/tree/master
2. Using the git command line tool, create a branch within your
repository to work on a problem, say bug2551, and upload this
branch to your github account. e.g.
http://github.com/peterjc/biopython/tree/bug2551 (I presume)
3. Work on your code, and commit changes to your bug2551 branch
and push these up to your github account.
4. Once you are happy, submit this bug2551 branch for inclusion in
Biopython (in the short term via Bugzilla, but if/when we have moved
to github fully, as a pull request to the main biopython master,
or if appropriate the master of the maintainer of that module).
5. Once the changes are in the main Biopython, you can delete
the bug2551 branch (but not the whole "fork" which may contain
other branches).

Almost the same... I'll try this shortly (maybe Monday).
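
A rough sketch of the steps above as git commands, wrapped in a small
Python helper so they can be printed or passed to subprocess.run. The
helper name workflow_commands, the https clone URL pattern and the commit
message are assumptions made for illustration; step 4 (submitting the
branch) happens outside git and is left out.

```python
import shlex


def workflow_commands(user, branch, main_branch="master"):
    """Return the git commands (as argv lists) for one bug-fix branch.

    Hypothetical helper: the clone URL pattern and the commit message
    are assumptions made for this example.
    """
    fork_url = "https://github.com/%s/biopython.git" % user
    return [
        # Step 1: clone your own fork (created on the github web site).
        # The remaining commands are run inside the cloned directory.
        ["git", "clone", fork_url],
        # Step 2: create a topic branch inside the fork.
        ["git", "checkout", "-b", branch],
        # Step 3: commit work and push the branch to github.
        ["git", "commit", "-a", "-m", "Work on %s" % branch],
        ["git", "push", "origin", branch],
        # Step 5: once the changes are merged, delete only the branch,
        # locally and on github; the fork itself stays.
        ["git", "checkout", main_branch],
        ["git", "branch", "-d", branch],
        ["git", "push", "origin", "--delete", branch],
    ]


for command in workflow_commands("peterjc", "bug2551"):
    print(shlex.join(command))
```

The point of the sketch is only that the branch, not the fork, is the
unit that gets created and deleted for each bug.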

Peter


From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 04:13:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 00:13:10 -0400
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200903210413.n2L4DAgf028509@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp  2009-03-21 00:13 EST -------
(In reply to comment #11)
I've changed Parser.py to show an informative error message about the missing
DTD file, where it can most likely be found, and where to install it. Since
this is probably the best we can do, I'm marking this bug as fixed.




From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 04:24:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 00:24:43 -0400
Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files
	from dbSNP (snp database)
In-Reply-To: <bug-2771-42@http.bugzilla.open-bio.org/>
Message-ID: <200903210424.n2L4OhOA029253@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2771


------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp  2009-03-21 00:24 EST -------
(In reply to comment #0)
> >>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml')
> >>> cont = handle.read()
> >>> print cont
> '<?xml version="1.0"?>
> <ExchangeSet...>
> ...
> </ExchangeSet>
> 
With Bio.Entrez currently in CVS, Entrez.read does not raise an exception, but
simply returns an empty record. The problem is that EFetch from the SNP
database uses an XML Schema instead of a DTD to describe the contents of the
XML file, as shown in the first few lines of the XML file:

<?xml version="1.0"?>
<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"
xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum
http://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">

The last url shows the XML Schema.
All other Entrez Utilities I've seen so far use a DTD instead of an XML Schema.
Hence, Entrez.read only has a DTD parser to find out how to interpret the XML
file. In principle, Bio.Entrez can be modified to add an XML Schema parser.
While this is not trivial, it is probably not super difficult. Marco, would you
be willing to write such a parser? If you have a parser for the XML Schema, I
can show you how to integrate it with Bio.Entrez.
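
As a starting point, here is a minimal sketch of how one might tell the two
cases apart before choosing a parser. It uses only the standard library's
expat bindings; the function name find_grammar and its return convention
are assumptions for this example and are not part of Bio.Entrez.

```python
import xml.parsers.expat

XSI_NAMESPACE = "http://www.w3.org/2001/XMLSchema-instance"


def find_grammar(data):
    """Return ("dtd", url), ("schema", url) or (None, None) for XML text."""
    found = {"seen_root": False, "dtd": None, "schema": None}

    def start_doctype(name, system_id, public_id, has_internal_subset):
        # Called for a <!DOCTYPE ...> declaration, i.e. a DTD.
        found["dtd"] = system_id

    def start_element(name, attributes):
        if found["seen_root"]:
            return  # only the root element is of interest
        found["seen_root"] = True
        # With namespace_separator=" ", expat reports "uri localname".
        location = attributes.get(XSI_NAMESPACE + " schemaLocation")
        if location:
            # The attribute holds "namespace location" pairs.
            found["schema"] = location.split()[-1]

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartDoctypeDeclHandler = start_doctype
    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    if found["dtd"]:
        return "dtd", found["dtd"]
    if found["schema"]:
        return "schema", found["schema"]
    return None, None
```

For the ExchangeSet header shown above this should report the eudocsum.xsd
URL as the schema, whereas a file with a DOCTYPE line reports its DTD.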




From mjldehoon at yahoo.com  Sat Mar 21 04:47:07 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Mar 2009 21:47:07 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com>
Message-ID: <334920.51680.qm@web62402.mail.re1.yahoo.com>


I think it is good if we catch more errors in Bio.Entrez, but I think the error catching should be done by the parser, not when retrieving.

As you show, NCBI Entrez returns error messages in various different formats: plain text, HTML, incorrect XML, broken XML. Since there are many ways to access NCBI Entrez, there may be other styles of error messages that we don't know about. Then there is the added complication of accessing NCBI Entrez to get information in formats other than XML, e.g. GenBank files. And all this may be changed over time by NCBI.

Since the error message is ill-defined, code trying to identify error messages won't be robust. On the other hand, the format of files expected by a given parser is well-defined: Either the file agrees with the format expected by the parser, or it doesn't; if it doesn't, then that's an error. We may not be able to extract the exact error message returned by NCBI, but a parser for format XYZ can tell you that the file is not in format XYZ. Maybe the XML parser can say it doesn't look like an XML file, but that's about it.

Once NCBI Entrez starts to return errors in a uniform format, we can modify our parsers to find out the exact error message. Until that happens, trying to do so on our side will not be robust.
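
To make the idea concrete, here is a minimal sketch of a parser-side check.
It uses the standard library's ElementTree rather than the actual
Bio.Entrez parser, and the function name parse_xml_reply is an assumption
for this example. The parser does not try to recognise NCBI's error
message; it only reports that the data is not in the format it expects.

```python
import xml.etree.ElementTree as ElementTree


def parse_xml_reply(data):
    """Parse data as XML, or raise ValueError saying it is not XML."""
    try:
        return ElementTree.fromstring(data)
    except ElementTree.ParseError as error:
        # Show the start of the data so the user can see what came back.
        preview = data[:60]
        raise ValueError("Not an XML file (%s): %r" % (error, preview))
```

Note that a well-formed XML reply that merely contains an error text still
passes this check, which is the "that's about it" limitation described
above.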

--Michiel


--- On Tue, 3/10/09, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: [Biopython-dev] Bio.Entrez catching more errors
> To: "BioPython-Dev Mailing List" <biopython-dev at lists.open-bio.org>
> Date: Tuesday, March 10, 2009, 7:40 PM
> Hi All,
> 
> It occurred to me that the Bio.Entrez._open function can look at the
> retmode argument (if present) and spot if there is a mismatch between
> the requested format (e.g. XML, HTML, text or asn.1) and the actual
> data the NCBI returned.  Something along the following lines could be
> added to the end of the _open function in Bio/Entrez/__init__.py to
> achieve this:
> 
>     elif "retmode" in params and params["retmode"].lower()=="html" \
>     and not data.lower().startswith("<html") \
>     and not data.lower().startswith("<!doctype html") :
>         raise TypeError("Requested HTML, but didn't get it: %s..." % data)
>     elif "retmode" in params and params["retmode"].lower()=="xml" \
>     and not data.lower().startswith("<?xml") :
>         raise TypeError("Requested XML, but didn't get it: %s..." % data)
>     elif "retmode" in params and params["retmode"] \
>     and params["retmode"].lower()!="xml" \
>     and data.lower().startswith("<?xml") :
>         raise TypeError("Didn't request XML, but got it: %s..." % data)
>     elif "retmode" in params and params["retmode"] \
>     and params["retmode"].lower()!="html" \
>     and (data.lower().startswith("<html") or \
>          data.lower().startswith("<!doctype html")):
>         #Expected for some error pages (e.g. the Bad Gateway caught above)
>         raise TypeError("Didn't request HTML, but got it: %s..." % data)
> 
> I'm sure my XML/HTML detection could be made more robust here - I hope
> the principle is clear.  My motivation is that I have noticed the NCBI
> can return HTML error pages, and while we do catch some of these
> explicitly (e.g. Bad Gateway, or Service Unavailable), I think any
> HTML page when the user asked for XML, text or asn.1 should be
> treated as an error.  Similarly, not getting XML when you ask for it etc.
> 
> Note that by raising the exception including the message text it
> should be much easier to diagnose these failures.  As a tiny
> refinement to the above code, we should only add the "..." if there is
> more text to follow - this isn't always the case.
>
> e.g. The following give an HTML error page (while some databases like
> "protein" are better behaved in this respect):
> >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="text").read()
> >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="asn.1").read()
> 
> Similarly, these give an XML like fragment (which is not a valid XML
> file in itself - arguably an NCBI bug; some databases like "protein"
> are better behaved in this respect):
> >>> print Entrez.efetch(db="pubmed", id="nonexistant", retmode="xml").read()
> >>> print Entrez.efetch(db="homologene", id="nonexistant", retmode="xml").read()
> >>> print Entrez.efetch(db="cdd", id="nonexistant", retmode="xml").read()
> >>> print Entrez.efetch(db="taxonomy", id="nonexistant", retmode="xml").read()
> 
> My suggested change to Bio.Entrez would also catch the following
> examples (using an invalid database) where the NCBI ignore the retmode
> and return an HTML help page:
> >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="xml").read()
> >>> print Entrez.efetch(db="nonexistant", id="123456", retmode="text").read()
> 
> In a less clear cut example, this would flag the following as an error
> as the NCBI seem to return ASN.1 text instead of HTML here:
> >>> print Entrez.efetch(db="nucleotide", retmode="html", id="123456").read()
> 
> Overall, I think this change should catch lots of errors which
> otherwise may not be detected until later (e.g. while trying to parse
> the file).
> 
> --------------------------------------------------------------------------------------------------
> 
> On another point, should we catch these responses as errors?
>
> >>> efetch(db="snp", id="123456").read()
> '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> >>> efetch(db="snp", id="123456", retmode="html").read()
> '<html><head><title>PmFetch response</title></head><body>\n<pre>\n1: id: 123456 Error occurred: cannot get document summary\n</pre></body></html>'
> >>> efetch(db="snp", id="123456", retmode="xml").read()
> '<?xml version="1.0"?>\n<ExchangeSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\nxmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum"\nxsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/docsum\nhttp://www.ncbi.nlm.nih.gov/SNP/docsum/eudocsum.xsd">1: id: 123456 Error occurred: cannot get document summary\n\n</ExchangeSet>'
> >>> efetch(db="snp", id="123456", retmode="text").read()
> '1: id: 123456 Error occurred: cannot get document summary\n'
> 
> and,
> >>> print efetch(db="homologene", retmode="html", id="fake").read()
> <html>
> <body>
> <br/><h2>Error occurred: Empty id list - nothing todo</h2>...
>
> Looking for the string "Error occurred: " looks fairly safe here, and
> should cover a range of entries.  Of course, you can imagine false
> positives too, e.g. a valid PUBMED plain text record for a tutorial
> article with a title like "Yikes! An Error Occurred: A beginner's
> Guide To Defensive Programming." could match.
> 
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From mjldehoon at yahoo.com  Sat Mar 21 04:54:08 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Mar 2009 21:54:08 -0700 (PDT)
Subject: [Biopython-dev] Bio.Enzyme (was: Re:  Bio.ExPASy)
In-Reply-To: <76595.11423.qm@web62404.mail.re1.yahoo.com>
Message-ID: <517737.76119.qm@web62403.mail.re1.yahoo.com>


I've created a simplified version of the parser in Bio.Enzyme in Bio.ExPASy.Enzyme. The idea behind it is to collect all parsers related to ExPASy databases in Bio.ExPASy so that they can be found more easily by users.

Bio.ExPASy.Enzyme works essentially the same as Bio.Enzyme, but I've done a few things a bit differently. The biggest change is probably that Bio.Enzyme stores information as attributes to a record, whereas Bio.ExPASy.Enzyme has a Record derived from a dictionary, and stores information in the dictionary (same as Bio.Medline). Does anybody have any objection if Bio.ExPASy.Enzyme becomes the "official" parser for ExPASy's Enzyme database? If not, I'll modify the documentation and tests accordingly, and start the deprecation process for Bio.Enzyme.
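
To illustrate the difference in user code, here is a simplified sketch of
the two styles. The class names AttributeRecord and DictRecord are
stand-ins invented for this example (they are not the real classes), and
the ID and DE keys and their values are only illustrative.

```python
class AttributeRecord(object):
    """Attribute style, as in Bio.Enzyme (simplified stand-in)."""

    def __init__(self):
        self.ID = ""
        self.DE = ""


class DictRecord(dict):
    """Dictionary style, as in Bio.ExPASy.Enzyme (simplified stand-in)."""

    def __init__(self):
        dict.__init__(self)
        self["ID"] = ""
        self["DE"] = ""


old_style = AttributeRecord()
old_style.ID = "1.1.1.1"
old_style.DE = "Alcohol dehydrogenase."

new_style = DictRecord()
new_style["ID"] = "1.1.1.1"
new_style["DE"] = "Alcohol dehydrogenase."

# The dictionary style supports generic dict operations directly.
for key in sorted(new_style):
    print("%s: %s" % (key, new_style[key]))
print(new_style.get("CA", "(no catalytic activity stored)"))
```

With the dictionary style, code that handles records from different
parsers can loop over keys or use get() with a default, without knowing
the attribute names in advance.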

--Michiel

--- On Sun, 3/15/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: [Biopython-dev] Bio.ExPASy
> To: biopython-dev at biopython.org
> Date: Sunday, March 15, 2009, 6:24 AM
> Hi everybody,
> 
> As discussed previously, I have moved the Bio.Prosite code
> to Bio.ExPASy, and I've added a ScanProsite module to
> Bio.ExPASy. I guess Bio.Enzyme should also move to
> Bio.ExPASy. See
> 
> http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html
> 
> for the documentation of Biopython as currently in CVS.
> 
> --Michiel.


From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 05:05:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 01:05:19 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <bug-2759-42@http.bugzilla.open-bio.org/>
Message-ID: <200903210505.n2L55Jb0031713@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2759


------- Comment #8 from eric.talevich at gmail.com  2009-03-21 01:05 EST -------
Marco & Peter, have either of you applied these patches to a git branch yet? My
branch for Bug 2754 and related changes also converts test_PDB.py to unittest. 
(I silence the warnings by calling warnings.simplefilter('ignore') in the setUp
method.) I'd like to try cherry-picking this commit if it's available on
github.
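
For reference, a minimal sketch of the setUp approach. The function
parse_with_warning is a hypothetical stand-in for a PDBParser call that
warns, and wrapping the filter change in warnings.catch_warnings (so the
filters are restored after each test) is an addition for this example,
not necessarily what the branch does.

```python
import unittest
import warnings


def parse_with_warning():
    """Stand-in for a parser call that emits a warning (hypothetical)."""
    warnings.warn("discontinuous chain", UserWarning)
    return "structure"


class QuietParserTest(unittest.TestCase):
    def setUp(self):
        # Save the current filters and restore them after each test.
        context = warnings.catch_warnings()
        context.__enter__()
        self.addCleanup(context.__exit__, None, None, None)
        # Silence all warnings while the test runs.
        warnings.simplefilter("ignore")

    def test_parse(self):
        self.assertEqual(parse_with_warning(), "structure")


def run_quiet_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuietParserTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling warnings.simplefilter('ignore') on its own also works, but then
the filter stays in place for any tests that run afterwards in the same
process.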




From mjldehoon at yahoo.com  Sat Mar 21 05:33:42 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 20 Mar 2009 22:33:42 -0700 (PDT)
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090320125518.GA351@sobchak.mgh.harvard.edu>
Message-ID: <587027.97686.qm@web62408.mail.re1.yahoo.com>


> Which parts of this fall out of "standard" git
> practice? In general,
> we should strive to keep this as simple as possible. If
> using Git is
> complicated then we are losing a lot of our advantage over
> CVS/patches.

I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though.

--Michiel.


From idoerg at gmail.com  Sat Mar 21 05:55:36 2009
From: idoerg at gmail.com (Iddo Friedberg)
Date: Fri, 20 Mar 2009 22:55:36 -0700
Subject: [Biopython-dev] It's out!
Message-ID: <49C48158.9060004@gmail.com>

I'm first to announce this.... hehehe

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp163v1

-- 
Iddo Friedberg Ph.D.
Atkinson Hall MC 0446
University of California San Diego
9500 Gilman Dr.
La Jolla, CA 92093-0446 USA
http://iddo-friedberg.net


From dalloliogm at gmail.com  Sat Mar 21 13:57:54 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 21 Mar 2009 14:57:54 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com>
References: <C5E52504.1F20A%lpritc@scri.ac.uk>
	<320fb6e00903170744j543f643fg6ef8d677287e2361@mail.gmail.com>
	<320fb6e00903171059r7a5528d5i19bf5fed9cfd8a63@mail.gmail.com>
	<20090317213414.GK57054@sobchak.mgh.harvard.edu>
	<320fb6e00903191200q4ccff93v7e082990d115bc09@mail.gmail.com>
	<320fb6e00903200341n7df020a7j95c611ab0a886ccb@mail.gmail.com>
	<5aa3b3570903200415m2f46a45fs8be270f28357a994@mail.gmail.com>
	<320fb6e00903200432s59ddf9a8vfd8230c0a07cd598@mail.gmail.com>
	<20090320125518.GA351@sobchak.mgh.harvard.edu>
	<320fb6e00903201344w64b303a1q1b1aac2740bac04a@mail.gmail.com>
Message-ID: <5aa3b3570903210657v46b1b1bbj80c013b83ff635e3@mail.gmail.com>

On Fri, Mar 20, 2009 at 9:44 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> If I understand correctly, a potential contributor does this:
> 1. Fork Biopython trunk at GitHub, which will give you your own
> public repository (aka a "fork" in github's terminology), called
> by default contributorname/biopython, containing initially a
> single master branch, e.g.
> http://github.com/peterjc/biopython/tree/master
> 2. Using the git command line tool, create a branch within your
> repository to work on a problem, say bug2551, and upload this
> branch to your github account. e.g.
> http://github.com/peterjc/biopython/tree/bug2551 (I presume)
> 3. Work on your code, and commit changes to your bug2551 branch
> and push these up to your github account.
> 4. Once you are happy, submit this bug2551 branch for inclusion in
> Biopython (in the short term via Bugzilla, but if/when we have moved
> to github fully, as a pull request to the main biopython master,
> or if appropriate the master of the maintainer of that module).
> 5. Once the changes are in the main Biopython, you can delete
> the bug2551 branch (but not the whole "fork" which may contain
> other branches).


Yes, I think this is the procedure.
It is a good idea to create a branch with a bug's name, so more people
can work at the same time on the same fix.


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From bugzilla-daemon at portal.open-bio.org  Sat Mar 21 14:32:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Mar 2009 10:32:41 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <bug-2759-42@http.bugzilla.open-bio.org/>
Message-ID: <200903211432.n2LEWfXP000985@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2759


------- Comment #9 from dalloliogm at gmail.com  2009-03-21 10:32 EST -------
(In reply to comment #8)
> Marco & Peter, have either of you applied these patches to a git branch yet? My
> branch for Bug 2754 and related changes also converts test_PDB.py to unittest. 
> (I silence the warnings by calling warnings.simplefilter('ignore') in the setUp
> method.) I'd like to try cherry-picking this commit if it's available on
> github.

ok... Is your branch this one:
-
http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
?


This was my proposal:
-
http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py


I have structured the unittest in a different way, so every test case
represents a pdb file with some known values for PDB exposure etc., but the
result should be the same.




From dalloliogm at gmail.com  Sat Mar 21 14:40:05 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 21 Mar 2009 15:40:05 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com>
References: <20090320125518.GA351@sobchak.mgh.harvard.edu>
	<587027.97686.qm@web62408.mail.re1.yahoo.com>
Message-ID: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com>

On Sat, Mar 21, 2009 at 6:33 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> Which parts of this fall out of "standard" git
>> practice? In general,
>> we should strive to keep this as simple as possible. If
>> using Git is
>> complicated then we are losing a lot of our advantage over
>> CVS/patches.
>
> I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff.


ok, but I assure you that if you don't want to learn the advanced features,
it can be used just as you used cvs.
The only difference, maybe, is that you work with a local copy
(offline) and push the changes only when you are sure about them.


If you keep a mirror on github to collect patches and enhancements, it
has some advantages:

- more than one person can work on a patch at the same time
- it is a lot easier to create customized branches of biopython. So if
someone needs to create a custom version of biopython for their own
purposes, it will always be easy to keep it compatible with the
official code.
- people can play with the code and propose enhancements, without
having to ask for write access. This means that more people can become
familiar with biopython's code and propose fixes.

Have a look at this video, which shows that the Ruby on Rails
project grew more quickly after it moved to github:

- http://python.genedrift.org/2009/03/15/ror-commits/

(the jump should be at minute 5.10 or so)


> I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though.

Let's say I want to propose a patch to biopython. One of you
developers will probably need to look at it and propose some changes
to adapt it to the rest of biopython.
Isn't this the situation you are describing (multiple developers
working on interrelated parts of the code)?

Another example is the popgen module.
Since it is a pretty big module, and independent from the rest, an
'experimental popgen branch' of biopython has been created, based on
what was the latest biopython cvs at the time.
However, in the time that has passed since this branch was created,
biopython's cvs has changed: so maybe now
the experimental popgen branch is no longer compatible with the
official code, if some module or convention has been changed.

So, git and github make it easier to create a new development branch
and keep it compatible with the original one.

> --Michiel.
>
>
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From eric.talevich at gmail.com  Sat Mar 21 15:23:56 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 21 Mar 2009 11:23:56 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <200903211432.n2LEWfXP000985@portal.open-bio.org>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
Message-ID: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>

On Sat, Mar 21, 2009 at 10:32 AM, <bugzilla-daemon at portal.open-bio.org> wrote:

> http://bugzilla.open-bio.org/show_bug.cgi?id=2759
>
>
> ------- Comment #9 from dalloliogm at gmail.com  2009-03-21 10:32 EST -------
> (In reply to comment #8)
> > Marco & Peter, have either of you applied these patches to a git branch
> yet? My
> > branch for Bug 2754 and related changes also converts test_PDB.py to
> unittest.
> > (I silence the warnings by calling warnings.simplefilter('ignore') in the
> setUp
> > method.) I'd like to try cherry-picking this commit if it's available on
> > github.
>
> ok... Is your branch this one:
> -
>
> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
> ?
>
>
> This was my proposal:
> -
>
> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py
>
>
> I have structured the unittest in a different way, so every test case
> represents a pdb file with some known values for PDB exposure etc..: but
> the
> result should be the same.
>
>

Oh, I see now that these are meant to be separate files. Yes, that's my
branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the
NeighborSearch test moved elsewhere. In that case, there's no merging
problem here, and the only change needed in test_PDBexposure.py is to
silence the warnings... right?


From dalloliogm at gmail.com  Sat Mar 21 16:14:45 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 21 Mar 2009 17:14:45 +0100
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
	<3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
Message-ID: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>

On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Sat, Mar 21, 2009 at 10:32 AM, <bugzilla-daemon at portal.open-bio.org>wrote:
>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2759
>>
>>
>> ------- Comment #9 from dalloliogm at gmail.com  2009-03-21 10:32 EST -------
>> (In reply to comment #8)
>> > Marco & Peter, have either of you applied these patches to a git branch
>> yet? My
>> > branch for Bug 2754 and related changes also converts test_PDB.py to
>> unittest.
>> > (I silence the warnings by calling warnings.simplefilter('ignore') in the
>> setUp
>> > method.) I'd like to try cherry-picking this commit if it's available on
>> > github.
>>
>> ok... Is your branch this one:
>> -
>>
>> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
>> ?
>>
>>
>> This was my proposal:
>> -
>>
>> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py
>>
>>
>> I have structured the unittest in a different way, so every test case
>> represents a pdb file with some known values for PDB exposure etc..: but
>> the
>> result should be the same.
>>
>>
>
> Oh, I see now that these are meant to be separate files. Yes, that's my
> branch. Perhaps test_PDB.py should be renamed to test_PDBParser.py, and the
> NeighborSearch test moved elsewhere. In that case, there's no merging
> problem here, and the only change needed in test_PDBexposure.py is to
> silence the warnings... right?

well, it also depends on what Peter thinks.
Mine was only a proof of concept to see if the unittest could be
refactored in that way.
In principle, it should be equivalent to the original one and
execute the same tests.

If you want to use it, the problem is that it makes use of a decorator
(@classmethod) which is not supported by earlier versions of
python.

This can be resolved by moving all the instructions in setUpAll into
setUp, like here:
- http://github.com/dalloliogm/biopython/commit/83864b8a1269aaf52ac193d7bf9ed9ca5edc5a30

(however, this way the setUp instructions - like opening and parsing
the PDB file - will be repeated for every test).


> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From eric.talevich at gmail.com  Sat Mar 21 17:13:52 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 21 Mar 2009 13:13:52 -0400
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
	<3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
	<5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>
Message-ID: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com>

On Sat, Mar 21, 2009 at 12:14 PM, Giovanni Marco Dall'Olio <
dalloliogm at gmail.com> wrote:

> On Sat, Mar 21, 2009 at 4:23 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > On Sat, Mar 21, 2009 at 10:32 AM, <bugzilla-daemon at portal.open-bio.org
> >wrote:
> >
> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2759
> >>
> >>
> >> ok... Is your branch this one:
> >> -
> >>
> http://github.com/etal/biopython/commit/65f5cf9fa8d6d63976b0942e00bd9aecef7e4197
> >> ?
> >>
> >>
> >> This was my proposal:
> >> -
> >>
> http://github.com/dalloliogm/biopython/blob/alternative-pdb-exposure-test/Tests/test_PDBexposure.py
> >>
>
>
> If you want to use it, the problem is that it makes use of a decorator
> function (@classmethod) which is not supported by earlier versions of
> Python.
>
>
Decorators and @classmethod were added in Python 2.4. Since support for
Python 2.3 is being dropped after the release of Biopython 1.50 (I believe),
it should be safe to apply the decorator to post-1.50 branches. If this
needs to be in 1.50, the older way of "mymethod = classmethod(mymethod)"
would work fine in Py2.3, although I personally would just move the PDB
loading steps to setUp, since the parser is pretty quick and the code for
that is easy to read.
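
To make the two spellings concrete, here is a minimal sketch. The class and method names are made up for illustration and are not taken from the test under discussion. The first class uses the older rebinding form, the second the decorator; they should behave the same, though a file containing the second form would not even compile on Python 2.3.

```python
class OldStyleFixture(object):
    """Pre-2.4 spelling: rebind the name to a classmethod wrapper."""
    loaded = 0

    def load(cls):
        # Count how many times the (hypothetical) fixture was loaded.
        cls.loaded += 1
        return cls.loaded
    load = classmethod(load)


class DecoratedFixture(object):
    """Python 2.4+ spelling: same behaviour via the decorator syntax."""
    loaded = 0

    @classmethod
    def load(cls):
        cls.loaded += 1
        return cls.loaded
```

Either way the method is called on the class itself, e.g. OldStyleFixture.load(), without creating an instance first.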

I'll finish up my work on Bug 2754 and merge/rebase it before trying to
integrate this code -- that should bring the parse warnings under control
and make it easier for Peter to dispatch this bug.


From biopython at maubp.freeserve.co.uk  Sat Mar 21 21:16:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 21 Mar 2009 21:16:43 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com>
References: <20090320125518.GA351@sobchak.mgh.harvard.edu>
	<587027.97686.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00903211416r457e303bnc0515b576bbe6c9a@mail.gmail.com>

On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I haven't been following this topic closely, and as an
> "outsider" using git seems more complicated than using
> cvs or svn. And to be honest, I don't know if Biopython
> actually needs the branching and forking stuff. I think
> that this is more useful for bigger projects, where
> multiple developers may be working on interrelated
> parts of code at the same time. That hardly ever
> happens in Biopython, though.

Certainly git and github are much more powerful, and
therefore more complicated.  There is no denying that.

However, if we move to git on github, I would expect
those of us with CVS access to all be given write
access to the official Biopython branch (probably
using the collaborators feature).  If that is done, I
think you won't find things so different from now.
i.e. Initially at least, it would be business as usual -
our core official developers would be trusted to work
directly on the main branch as now (with discussions
before commits as appropriate), and do not have to
worry about forking/branching etc (unless they want
to).

In terms of the actual command(s) you'd have to type
in at the terminal to commit a change to the online
repository, this goes from one step:

cvs commit -m "Comment here" file1.py file2.py

... to two steps.  First you have to commit changes
locally (to git on your personal machine) and then
push them to the main Biopython branch on the public
server (on github).  Once I'm back at work where I
have git installed, I'll write this up on the wiki -
assuming Brad doesn't beat me to it ;)

The big change is for non-core developers, i.e.
potential contributors (like Eric who is currently trying
some Bio.PDB changes).  For them, using git allows
them to work on their changes and keep in sync with
the master repository with much more ease.

Peter


From chris.lasher at gmail.com  Sun Mar 22 02:33:11 2009
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sat, 21 Mar 2009 22:33:11 -0400
Subject: [Biopython-dev] Help pages in Biopython wiki
In-Reply-To: <C5E90784.1F50A%lpritc@scri.ac.uk>
References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>
	<C5E90784.1F50A%lpritc@scri.ac.uk>
Message-ID: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com>

On Fri, Mar 20, 2009 at 4:42 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:

> Hi Chris,
>
> That page doesn't exist, yet (click on the 'page' tab to see this), and no
> pages link to it (see here:
> http://biopython.org/wiki/Special:WhatLinksHere/Help)
>
> What help were you expecting to see there?


Hi Leighton,

I'm fairly certain there are pages one can install with a MediaWiki instance
that provide the standard help. They look like this:
http://www.mediawiki.org/wiki/Help:Contents

They contain the standard documentation about how to edit, format, create
new pages, etc. Useful things for new community members and people like me
who forget the nuances of each wiki software's markup language from time to
time. :-)

Chris


From biopython at maubp.freeserve.co.uk  Sun Mar 22 10:18:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 10:18:49 +0000
Subject: [Biopython-dev] Help pages in Biopython wiki
In-Reply-To: <128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com>
References: <128a885f0903192152m7d1e24fdh3ace50021851b36e@mail.gmail.com>
	<C5E90784.1F50A%lpritc@scri.ac.uk>
	<128a885f0903211933w2fd8986ek53ad8d083cca3534@mail.gmail.com>
Message-ID: <320fb6e00903220318g7e214c8bmf1e6012e5db505fd@mail.gmail.com>

On Sun, Mar 22, 2009 at 2:33 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
> Hi Leighton,
>
> I'm fairly certain there are pages one can install with a MediaWiki instance
> that provide the standard help. They look like this:
> http://www.mediawiki.org/wiki/Help:Contents
>
> They contain the standard documentation about how to edit, format, create
> new pages, etc. Useful things for new community members and people like me
> who forget the nuances of each wiki software's markup language from time to
> time. :-)
>
> Chris

I'm glad Leighton asked - otherwise I would have.

Would it suffice to create a manual help page, saying this is a wiki
and we are happy for people to create their own account to fix any
minor errors they spot, and just link to
http://www.mediawiki.org/wiki/Help:Contents for help?

Peter


From biopython at maubp.freeserve.co.uk  Sun Mar 22 10:51:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 10:51:17 +0000
Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure
In-Reply-To: <3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com>
References: <bug-2759-42@http.bugzilla.open-bio.org/>
	<200903211432.n2LEWfXP000985@portal.open-bio.org>
	<3f6baf360903210823o7a597a92va0edd2a281deb465@mail.gmail.com>
	<5aa3b3570903210914id0bad69xc5459de68b64ec55@mail.gmail.com>
	<3f6baf360903211013k423b925avc4a3e714ce36ff85@mail.gmail.com>
Message-ID: <320fb6e00903220351u53563f03m4c54359278c5b7f0@mail.gmail.com>

On Sat, Mar 21, 2009 at 5:13 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> Giovanni wrote:
>> If you want to use it, the problem is that it makes use of a decorator
>> function (@classmethod) which is not supported by earlier versions of
>> Python.
>
> Decorators and @classmethod were added in Python 2.4. Since support for
> Python 2.3 is being dropped after the release of BioPython 1.50 (I believe),
> it should be safe to apply the decorator to post-1.50 branches. If this
> needs to be in 1.50, the older way of "mymethod = classmethod(mymethod)"
> would work fine in Py2.3, although I personally would just move the PDB
> loading steps to setUp, since the parser is pretty quick and the code for
> that is easy to read.

Extra PDB unit tests would be nice to have in Biopython 1.50, which means
they must work on Python 2.3, so no decorators please.

I agree with Eric that it is simpler just to use setUp for PDB file
parsing.  Yes, it is slower as for each test method the PDB file is
reloaded - but you also make sure it is a clean object structure,
which is important as some operations we will be testing may change
the object.  e.g. HSExposure:
http://bugzilla.open-bio.org/show_bug.cgi?id=2759#c4
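
A small sketch of the point about setUp, using a stand-in for the parsed structure. Here load_structure is hypothetical and returns a plain dict rather than a Bio.PDB object, so the example runs without any PDB file. One test modifies the object and the other still sees a clean copy, whichever order they run in.

```python
import unittest


def load_structure():
    """Stand-in for parsing a PDB file (hypothetical helper)."""
    return {"residues": ["ALA", "GLY", "SER"]}


class ExposureTests(unittest.TestCase):

    def setUp(self):
        # Runs before every test method, so each test gets a fresh object.
        self.structure = load_structure()

    def test_operation_that_modifies(self):
        self.structure["residues"].pop()
        self.assertEqual(len(self.structure["residues"]), 2)

    def test_sees_clean_structure(self):
        # Unaffected by the pop() above, whatever the execution order.
        self.assertEqual(len(self.structure["residues"]), 3)


def run_tests():
    """Run the two tests and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ExposureTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The cost is one reload per test method, which is the trade-off described above.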

Peter


From biopython at maubp.freeserve.co.uk  Sun Mar 22 10:44:42 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 10:44:42 +0000
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <334920.51680.qm@web62402.mail.re1.yahoo.com>
References: <320fb6e00903101640s5db8ed9hc1335d02f5e4123@mail.gmail.com>
	<334920.51680.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com>

On Sat, Mar 21, 2009 at 4:47 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> I think it is good if we catch more errors in Bio.Entrez, but I think
> the error catching should be done by the parser, not when
> retrieving.

We could do that - maybe some common functions for checking
the first line to see if it looks like HTML or XML would help.  It means
lots of changes to lots of parsers, but would help outside the use
case of Bio.Entrez - so this is perhaps worth doing anyway.

What about the fairly common situation (at least, it's something
I've done fairly often) where Bio.Entrez.efetch() is used to fetch
records which are saved directly to file without verification - e.g.
to be parsed by another program?  Unless the error is caught in
Bio.Entrez.efetch() it may be out of our control.

> As you show, NCBI Entrez returns error messages in various
> different formats: plain text, HTML, incorrect XML, broken XML.
> Since there are many ways to access NCBI Entrez, there may
> be other styles of error messages that we don't know about.
> Then there is the added complication of accessing NCBI Entrez
> to get information in formats other than XML, e.g. GenBank files.
> And all this may be changed over time by NCBI.
>
> Since the error message is ill-defined, code trying to identify
> error messages won't be robust.

All very true.  But the main point in my original email was on
something slightly different...

> On the other hand, the format of files expected by a given
> parser is well-defined: Either the file agrees with the format
> expected by the parser, or it doesn't; if it doesn't, then that's
> an error.

It's not that simple - we are often dealing with loosely defined
file formats, and you may be able to reasonably interpret one
file in several different formats (giving different/incorrect data).

Some parsers are very tolerant at the moment, for example
GenBank files can have a legitimate free format comment
before the records, so the parser skips anything until it
recognizes a GenBank locus id line.

> We may not be able to extract the exact error message
> returned by NCBI, but a parser for format XYZ can tell
> you that the file is not in format XYZ.

Some parsers may be able to do this, but not all.

> Maybe the XML parser can say it doesn't look like an
> XML file, but that's about it.

This is an easy case because XML is so strictly defined.
Spotting a non-XML file is pretty trivial.

> Once NCBI Entrez starts to return errors in a uniform
> format, we can modify our parsers to find out the
> exact error message. Until that happens, trying to do
> so on our side will not be robust.

I agree that pulling out error messages (the second half
of my original email in the thread) is error prone.  You
might argue that catching any errors is still worthwhile,
as long as there are no false positives.

The first half of the email (the main point) was based
on a special case: HTML and XML are pretty easy to
identify.  If you ask for HTML and don't get it, it is an
error (and vice versa).  If you ask for XML and don't
get it, it is an error (and vice versa).  The fact that
the NCBI currently often return an HTML or XML error
page when a plain text format was requested is then
easily detected as an error (simply from the file type).
This will still work even if the NCBI do change their
error formats or wording - it should be pretty robust.
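
As a rough illustration of the file-type check described above, here is a deliberately crude sketch. The function names are hypothetical, not existing Bio.Entrez functions, and it assumes that looking at the first non-blank line is enough to spot an XML declaration or the start of an HTML page.

```python
def sniff_markup(text):
    """Return "xml", "html" or None based on the first non-blank line."""
    for line in text.splitlines():
        line = line.strip().lower()
        if not line:
            continue
        if line.startswith("<?xml"):
            return "xml"
        if line.startswith("<!doctype html") or line.startswith("<html"):
            return "html"
        # First real line is neither, so treat the data as plain text.
        return None
    return None


def check_plain_text_reply(text):
    """Raise ValueError if a plain text request came back as HTML or XML."""
    kind = sniff_markup(text)
    if kind is not None:
        raise ValueError("Expected plain text but got %s" % kind.upper())
    return text
```

The check never looks at the wording of the error page, which is why it should keep working if that wording changes.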

Peter


From bugzilla-daemon at portal.open-bio.org  Sun Mar 22 11:36:38 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Mar 2009 07:36:38 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to
	stderr, not stdout
In-Reply-To: <bug-2754-42@http.bugzilla.open-bio.org/>
Message-ID: <200903221136.n2MBacSc000608@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2754


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-22 07:36 EST -------
I had a thought last night about this - how about we keep PERMISSIVE=1 as the
default but offer a "very permissive" mode:

PERMISSIVE=2 (or more), silently ignore problems, continue parsing.
PERMISSIVE=1 (or True), use stderr via the warnings module, continue parsing.
PERMISSIVE=0 (or False), raise exceptions, halt parsing.

It would offer an alternative way to silence the warnings in the unit tests,
and could be controlled at the level of individual tests - for example where we
want to make sure certain errors are caught.

It might also be useful in ordinary scripts.
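
A minimal sketch of how the three levels might be dispatched in one place. The function and class names here are hypothetical and are not the existing Bio.PDB code; the only assumption is that every parse problem passes through a single handler.

```python
import warnings


class ConstructionWarning(UserWarning):
    """Hypothetical warning category for recoverable parse problems."""


class ConstructionError(Exception):
    """Hypothetical exception raised in strict mode."""


def handle_problem(message, permissive=1):
    """Dispatch one parse problem according to the proposed levels."""
    if not permissive:
        # PERMISSIVE=0 (or False): raise an exception, halt parsing.
        raise ConstructionError(message)
    if permissive == 1:
        # PERMISSIVE=1 (or True, since True == 1): warn and continue.
        warnings.warn(message, ConstructionWarning)
    # PERMISSIVE=2 (or more): say nothing and continue.
```

A unit test could then pass permissive=2 to keep the output quiet, or permissive=0 where it wants to check that a particular problem is caught.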




From tiagoantao at gmail.com  Sun Mar 22 11:50:50 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sun, 22 Mar 2009 11:50:50 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <587027.97686.qm@web62408.mail.re1.yahoo.com>
References: <20090320125518.GA351@sobchak.mgh.harvard.edu>
	<587027.97686.qm@web62408.mail.re1.yahoo.com>
Message-ID: <6d941f120903220450y4005b63bvd23dcb4981edec7b@mail.gmail.com>

On Sat, Mar 21, 2009 at 5:33 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I haven't been following this topic closely, and as an "outsider" using git seems more complicated than using cvs or svn. And to be honest, I don't know if Biopython actually needs the branching and forking stuff. I think that this is more useful for bigger projects, where multiple developers may be working on interrelated parts of code at the same time. That hardly ever happens in Biopython, though.


I would actually take this argument and reverse it:
The reason why biopython has been a small project, and above all, slow
to develop and innovate is excessive centralization. Using a
distributed technology allows for people to try new ideas and to get
things moving (while still maintaining an official rock stable version
with maybe glacial policies).
Let's not kid ourselves: biopython lacks a lot of stuff that is
fundamental in modern computational biology. The current status quo is
essentially maintaining a frozen set of functionality (most new code
is really just code cleanup and optimization).

While I would be cautious with a distributed environment and would
agree that checks have to be put in place to ensure that the official
product is rock solid, has documentation and is reasonably future
proof, I nonetheless warmly welcome this new development.

It is also good, for a change, to have an active discussion on the
list: now this actually seems like a proper, live community.

Tiago


From eric.talevich at gmail.com  Sun Mar 22 15:25:23 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 22 Mar 2009 11:25:23 -0400
Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print
	to stderr, not stdout
In-Reply-To: <200903221136.n2MBacSc000608@portal.open-bio.org>
References: <bug-2754-42@http.bugzilla.open-bio.org/>
	<200903221136.n2MBacSc000608@portal.open-bio.org>
Message-ID: <3f6baf360903220825g2b871432yba5749dab4c2ba34@mail.gmail.com>

On Sun, Mar 22, 2009 at 7:36 AM, <bugzilla-daemon at portal.open-bio.org> wrote:

> http://bugzilla.open-bio.org/show_bug.cgi?id=2754
>
>
>
> ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-03-22 07:36 EST -------
> I had a thought last night about this - how about we keep PERMISSIVE=1 as
> the default but offer a "very permissive" mode:
>
> PERMISSIVE=2 (or more), silently ignore problems, continue parsing.
> PERMISSIVE=1 (or True), use stderr via the warnings module, continue
> parsing.
> PERMISSIVE=0 (or False), raise exceptions, halt parsing.
>
> It would offer an alternative way to silence the warnings in the unit
> tests, and could be controlled at the level of individual tests - for
> example where we want to make sure certain errors are caught.
>
> It might also be useful in ordinary scripts.
>
>

I like the idea. I still have to comb through the documentation for the
warnings module some more, but I think it should be possible to do all of
this through that API -- loading PERMISSIVE=0 turns the warnings into full
exceptions, =1 makes them messages on stderr, and =2 switches them off.
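
Here is a rough sketch of what I have in mind, assuming the parser reports problems with warnings.warn(). The names ACTIONS and parse_with_level are made up for illustration, and warnings.catch_warnings needs Python 2.6 or later, so this is not something that would run on the Python versions Biopython 1.50 supports.

```python
import warnings

# Map each PERMISSIVE level onto a standard warnings filter action.
ACTIONS = {0: "error", 1: "default", 2: "ignore"}


def parse_with_level(parse, permissive=1):
    """Call parse() with warnings filtered according to permissive."""
    level = max(0, min(int(permissive), 2))
    with warnings.catch_warnings():
        # The filter change is undone when the with block exits.
        warnings.simplefilter(ACTIONS[level])
        return parse()
```

With the "error" action the first warning is raised as an exception, which halts parsing; "default" prints to stderr; "ignore" drops the message.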

At some point I'd like to make a script called something like pdbtidy.py
which parses a potentially not-quite-conformant PDB file in a permissive
mode, lists all complaints (including things like discontinuously-numbered
residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed
version of the file. The model for this is HTML Tidy. Do you think this
would have a place in the Biopython distribution?


From biopython at maubp.freeserve.co.uk  Sun Mar 22 15:53:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 22 Mar 2009 15:53:21 +0000
Subject: [Biopython-dev]  PDB tidy script, was: [Bug 275
Message-ID: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com>

On Bug 2754 comment 12, I wrote:
http://bugzilla.open-bio.org/show_bug.cgi?id=2754#c12
>> I had a thought last night about this - how about we keep PERMISSIVE=1
>> as the default but offer a "very permissive" mode:
>>
>> PERMISSIVE=2 (or more), silently ignore problems, continue parsing.
>> PERMISSIVE=1 (or True), use stderr via the warnings module, continue
>> parsing.
>> PERMISSIVE=0 (or False), raise exceptions, halt parsing.
>>
>> It would offer an alternative way to silence the warnings in the unit
>> tests, and could be controlled at the level of individual tests - for
>> example where we want to make sure certain errors are caught.
>>
>> It might also be useful in ordinary scripts.

Eric replied:
> I like the idea. I still have to comb through the documentation for the
> warnings module some more, but I think it should be possible to do all of
> this through that API -- loading PERMISSIVE=0 turns the warnings into full
> exceptions, =1 makes them messages on stderr, and =2 switches them off.

It doesn't really matter - all the PDB construction warnings/errors go
through _handle_PDB_exception(), so this would be the least invasive
way to implement this.

> At some point I'd like to make a script called something like pdbtidy.py
> which parses a potentially not-quite-conformant PDB file in a permissive
> mode, lists all complaints (including things like discontinuously-numbered
> residues, atom collisions, psi-phi outliers, etc.), and writes out a fixed
> version of the file. The model for this is HTML Tidy. Do you think this
> would have a place in the Biopython distribution?

It sounds useful to me, it can probably go in the scripts subdirectory,
along with the PDB surface exposure script.

One drawback is that currently Bio.PDB's header parsing leaves a lot to
be desired, and very little of the header is output when saving a PDB file
(Thomas' focus is/was very much on the 3D data).

Peter


From lpritc at scri.ac.uk  Mon Mar 23 09:02:53 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 23 Mar 2009 09:02:53 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com>
Message-ID: <C5ED00BD.1F6E4%lpritc@scri.ac.uk>

On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" <dalloliogm at gmail.com>
wrote:

> Have a look at this video, where it shows that the Ruby On Rails
> project has grown quicker when it has moved to github:
> 
> - http://python.genedrift.org/2009/03/15/ror-commits/
> 
> (the jump should be on minute 5.10 or so)

I've seen this argument a couple of times, now - mostly on blogs - and I'm
not sure that it's all that clear-cut.

The RoR video shows a greater number of individual names associated with
commits, after the move to github.  However, it's not clear whether this is
because a large number of individuals have suddenly decided to contribute to
the project, or whether the move to a version control system in which author
attribution remains with contributed code - as opposed to the bottleneck of
having to be submitted with the id of someone with write access - is
responsible.  I don't think there's enough evidence to say 'the move to
github caused an increase in contributions'.

As a counter-example, the number of people who have recorded contributions
to Biopython code is 46 (from the CONTRIB file on CVS).  I don't think that
there are that many ids associated with committing the codebase on there.
My name's only associated with GenomeDiagram in the commit comments, not as
an author/committer of the code - at least, as far as the CVS application is
concerned - for example.  This might change with git.  Of course, I might be
misunderstanding git's attribution model, or how the stats for that RoR
video were compiled...

L.


-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405




From p.j.a.cock at googlemail.com  Mon Mar 23 10:14:10 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 23 Mar 2009 10:14:10 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <C5ED00BD.1F6E4%lpritc@scri.ac.uk>
References: <5aa3b3570903210740n7f818560x47991ed97ed616df@mail.gmail.com>
	<C5ED00BD.1F6E4%lpritc@scri.ac.uk>
Message-ID: <320fb6e00903230314y212be042gfd2f0b86f8738f2d@mail.gmail.com>

On Mon, Mar 23, 2009 at 9:02 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> On 21/03/2009 14:40, "Giovanni Marco Dall'Olio" <dalloliogm at gmail.com>
> wrote:
>
>> Have a look at this video, where it shows that the Ruby On Rails
>> project has grown quicker when it has moved to github:
>>
>> - http://python.genedrift.org/2009/03/15/ror-commits/
>>
>> (the jump should be on minute 5.10 or so)
>
> I've seen this argument a couple of times, now - mostly on blogs - and I'm
> not sure that it's all that clear-cut.
>
> The RoR video shows a greater number of individual names associated with
> commits, after the move to github.  However, it's not clear whether this is
> because a large number of individuals have suddenly decided to contribute to
> the project, or whether the move to a version control system in which author
> attribution remains with contributed code - as opposed to the bottleneck of
> having to be submitted with the id of someone with write access - is
> responsible.  I don't think there's enough evidence to say 'the move to
> github caused an increase in contributions'.
>
> As a counter-example, the number of people who have recorded contributions
> to Biopython code is 46 (from the CONTRIB file on CVS).  I don't think that
> there are that many ids associated with committing the codebase on there.
> My name's only associated with GenomeDiagram in the commit comments, not as
> an author/committer of the code - at least, as far as the CVS application is
> concerned - for example.  This might change with git.  Of course, I might be
> misunderstanding git's attribution model, or how the stats for that RoR
> video were compiled...

Leighton has a good point about the attribution, and the dangers in
over-interpreting such a video.  Git/github will make it
easier to see who contributed patches (if a patch is pulled into
another branch, both the person doing the merge and the person who
originally checked in the patch get recorded), and that may indirectly
encourage more contributions.  As Leighton points out, we do try and
give credit now in CVS commit comments, but these are checked in by a
core developer.  I imagine this happened with RoR, but compiling this
information for that video would probably have been too much work.  As
well as switching tools, you are also changing the metric.

Something else to consider is how you are measuring activity: the git
and github documentation and press encourages people to commit more
often - for example while working on a bug fix or a new feature, I
might make three incremental commits on my local copy of the
repository, before I am happy enough to push this to the online
repository.  This would then show as three commits, wouldn't it - but
on CVS it would probably be just one.  i.e. on CVS I suspect you
naturally tend to get a smaller number of larger commits than with
git.  This difference will probably vary from person to person (I
haven't counted or anything, but with CVS I think I tend to commit
lots of smaller changes, while Michiel for example tends to make fewer
but larger commits).  i.e. if the RoR video shows a sudden jump in the
number of commits, that doesn't necessarily mean more code changes.
Scaling by number of lines changed would be another metric which is
perhaps more robust - but has drawbacks of its own.

Peter


From eric.talevich at gmail.com  Mon Mar 23 20:39:05 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 23 Mar 2009 16:39:05 -0400
Subject: [Biopython-dev] PDB tidy script, was: [Bug 275
In-Reply-To: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com>
References: <320fb6e00903220853u7b594ee3na86560e34f742b5f@mail.gmail.com>
Message-ID: <3f6baf360903231339i22438e3bia554a0b7bdda7a5d@mail.gmail.com>

On Sun, Mar 22, 2009 at 11:53 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

>
> One drawback is that currently Bio.PDB's header parsing leaves a lot to
> be desired, and very little of the header is output when saving a PDB file
> (Thomas' focus is/was very much on the 3D data).
>
> Peter
>


I haven't been on this list long enough to know -- is Thomas still
supporting the PDB module? If so, would he give his blessing to some more
invasive changes to the PDB module, such as unifying PDBParser and
parse_pdb_header? That separation has always seemed curiously vestigial to
me. Now that github gives us some flexibility with public branches, it would
be nice to have a discussion on some longer-term plans for Bio.PDB. I do a
fair amount of work with PDB files and PyMol at my lab, and if the Biopython
core devs are open to it, I can start merging enhancements into my public
branch on github. However, if there's already a plan for the module, it's
obviously best for me not to publish a divergent branch.

-Eric


From biopython at maubp.freeserve.co.uk  Mon Mar 23 21:05:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 23 Mar 2009 21:05:21 +0000
Subject: [Biopython-dev] PDB tidy script
Message-ID: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com>

On Mon, Mar 23, 2009 at 8:39 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> On Sun, Mar 22, 2009 at 11:53 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
>>
>> One drawback is that currently Bio.PDB's header parsing leaves a lot to
>> be desired, and very little of the header is output when saving a PDB file
>> (Thomas' focus is/was very much on the 3D data).
>>
>> Peter
>
> I haven't been on this list long enough to know -- is Thomas still
> supporting the PDB module? If so, would he give his blessing to some more
> invasive changes to the PDB module, such as unifying PDBParser and
> parse_pdb_header? That separation has always seemed curiously vestigial to
> me.
> Now that github gives us some flexibility with public branches, it would
> be nice to have a discussion on some longer-term plans for Bio.PDB. I do a
> fair amount of work with PDB files and PyMol at my lab, and if the Biopython
> core devs are open to it, I can start merging enhancements into my public
> branch on github. However, if there's already a plan for the module, it's
> obviously best for me not to publish a divergent branch.

If you look back over the history, there initially was no header parsing;
it was a contribution from Kristian Rother, and I would agree, it is rather
disjoint from the rest of the code.  One thing I personally wanted last
time I was working with PDB files was to have secondary structure
information (from the alpha helix and beta sheet lines in the header)
mapped onto the residue objects automatically.

And yes, Thomas is supporting the PDB module, but his time has
been rather limited of late.  When I asked him about some of the
open enhancement requests in bugzilla recently (off list) he
said we needed "a separate class to parse all the info in the header,
not a slew of additions to the core parser class (which is designed
to deal with the 3D data only)."

I would suggest you try and get Thomas involved now for his input
on the design (before you start coding), but if need be press ahead
anyway for your own use, and he can always comment on your
public branch.  I hope the two of you can work together on this, and
if/when Thomas does stand down (or delegate), you could then be
in an excellent position to take over as the Bio.PDB maintainer if
that's what you wanted.

Peter


From sbassi at clubdelarazon.org  Tue Mar 24 06:24:38 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 24 Mar 2009 03:24:38 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and writing
	qual files
Message-ID: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>

I have a .fasta file and its corresponding .qual file.
I run seqclean on the fasta file and I got a shorter .fasta file as
output (that is expected).
Using the .cln file from seqclean, I want to "trim" the .qual file the
same way my new fasta is trimmed.
I can read the cln and parse the information of "where to trim".
For example, in one original sequence of 1000 bp, I may need to trim
from 150 to 800.
The problem is that I can't modify qual values using the new SeqIO
qual parser (at least the size of the list can't be modified). I read
the example in the doc, where it is cut doing something like:
sub_rec = fullrec[150:800]
But this works only when there is a sequence (that is, when the file is
read as "fastq"); it doesn't work when the file is read as "qual",
because there is no sequence, and in that case I can't modify the
length of the list in letter_annotations['phred_quality']. It is true
that I can modify qual values in the list, but I want to modify the
list size.
Here is the error:
Traceback (most recent call last):
  File "/home/sbassi/bioinfo/INTA/qualparser.py", line 18, in <module>
    s.letter_annotations['phred_quality'] = [0,0,0,0,10,1]
  File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py",
line 33, in __setitem__
    "strings) of length %i." % self._length)
TypeError: We only allow python sequences (lists, tuples or strings)
of length 5.


(Note: 5 was the size of the original qual record, when I tried to set
it to [0,0,0,0,10,1], I get this).
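
The error above comes from a length check on the per-letter annotations.
A minimal sketch of that kind of check follows. The class name
LengthCheckedDict is hypothetical and this is an illustration only, not
Biopython's own code; it just shows why values can be edited in place
while a list of a different length is refused:

```python
class LengthCheckedDict(dict):
    """Hypothetical stand-in: a dict whose values must all share one length."""

    def __init__(self, length):
        super().__init__()
        self._length = int(length)

    def __setitem__(self, key, value):
        # Reject anything that has no length, or has the wrong length.
        if not hasattr(value, "__len__") or len(value) != self._length:
            raise TypeError(
                "We only allow python sequences (lists, tuples or strings) "
                "of length %i." % self._length
            )
        super().__setitem__(key, value)


annotations = LengthCheckedDict(length=5)
annotations["phred_quality"] = [0, 0, 0, 10, 1]  # five values: accepted
annotations["phred_quality"][0] = 40  # editing a value in place is fine

try:
    annotations["phred_quality"] = [0, 0, 0, 0, 10, 1]  # six values: refused
    resize_error = None
except TypeError as err:
    resize_error = str(err)
```

The check only runs on assignment to a key, which matches the behaviour
described above: individual quality values can change, the list size
cannot.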

So my question is: Does it make sense to allow the user to modify the
size of the list in letter_annotations['phred_quality'] in qual
sequences? I think this is a nice feature for qual SeqIO.parse. If I
can modify the list size, then I can save the modified version with
SeqIO.write(x,fh,"qual") and have a qual file with a new size.

I am using 1.49 with new files from CVS.


-- 
Sebastián Bassi. Diplomado en Ciencia y Tecnología.

Non standard disclaimer: READ CAREFULLY. By reading this email,
you agree, on behalf of your employer, to release me from all
obligations and waivers arising from any and all NON-NEGOTIATED
agreements, licenses, terms-of-service, shrinkwrap, clickwrap,
browsewrap, confidentiality, non-disclosure, non-compete and
acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.


From biopython at maubp.freeserve.co.uk  Tue Mar 24 09:49:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 09:49:33 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
Message-ID: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>

On Tue, Mar 24, 2009 at 6:24 AM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> I have a .fasta file and its corresponding .qual file.
> I run seqclean on the fasta file and I got a shorter .fasta file as
> output (that is expected).

Whose seqclean script are you using?  If it doesn't output the trimmed
qual file, can it work with FASTQ output instead?

> Using the .cln file from seqclean, I want to "trim" the .qual file the
> same way my new fasta is trimmed.
> I can read the cln and parse the information of "where to trim".
> For example, in one original sequence of 1000 bp, I may need to trim
> from 150 to 800.
> The problem is that I can't modify qual values using the new SeqIO
> qual parser (at least the size of the list can't be modified). I read
> the example in the doc, where it is cut doing something like:
> sub_rec = fullrec[150:800]
> But, this works only when there is a sequence (so, when read it as
> "fastq"), but it doesn't work when the sequence is read as "qual"
> (because there is no sequence ...
> So my question is: Does it make sense to allow the user to modify the
> size of the list in letter_annotations['phred_quality'] in qual
> sequences? I think this is a nice feature for qual SeqIO.parse.

This was one area of the new SeqRecord slicing I was a little unsure
about - slicing a qual file's SeqRecord (or any SeqRecord with a None
for the sequence).  I hadn't done anything about it immediately as I
couldn't think of a use case for it - so that's solved ;)

One solution would be to introduce an UnknownSeq object, which
would be much nicer to deal with than a None object, as it would have
a length and support slicing.  I've mentioned this idea before, but
haven't yet put forward any actual code.  This seems most elegant.

Another option would be to special case handle slicing a SeqRecord
with a None sequence, where we'd slice its per-letter-annotation. For
now, you can force this with the current code by:

# Not recommended, short-term hack
s.letter_annotations._length = 6
s.letter_annotations['phred_quality'] = [0,0,0,0,10,1]

Right now, without changing Biopython, I have another workaround for
you: Use the paired reader in Bio.SeqIO.QualityIO on the untrimmed
FASTA and QUAL files, which will give you SeqRecords with both the
sequence and the quality - and trim these by slicing the SeqRecord.
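
The idea behind this workaround is to keep sequence and qualities
together so that one slice trims both. In Biopython the pairing is done
by the PairedFastaQualIterator reader in Bio.SeqIO.QualityIO; the sketch
below only mimics the idea with the standard library. The function names
and the 0-based, end-exclusive clear ranges are assumptions made for
illustration, not the .cln file conventions:

```python
def read_fasta_like(text):
    """Parse FASTA-style text into {identifier: list of body lines}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return records


def trim_pair(fasta_text, qual_text, clear_ranges):
    """Trim sequences and qualities together using {id: (start, end)}."""
    seqs = {k: "".join(v) for k, v in read_fasta_like(fasta_text).items()}
    quals = {
        k: [int(q) for q in " ".join(v).split()]
        for k, v in read_fasta_like(qual_text).items()
    }
    trimmed = {}
    for name, seq in seqs.items():
        qual = quals[name]
        if len(seq) != len(qual):
            raise ValueError("Sequence and quality lengths differ for %s" % name)
        start, end = clear_ranges.get(name, (0, len(seq)))
        # The same slice is applied to both, so they stay in step.
        trimmed[name] = (seq[start:end], qual[start:end])
    return trimmed


fasta = ">read1\nACGTACGTAC\n"
qual = ">read1\n10 12 30 31 32\n33 34 35 11 9\n"
result = trim_pair(fasta, qual, {"read1": (2, 8)})
```

Here result["read1"] holds the six-letter trimmed sequence and its six
quality values, which could then be written back out as separate FASTA
and QUAL files.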

Peter


From sbassi at clubdelarazon.org  Tue Mar 24 14:59:51 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 24 Mar 2009 11:59:51 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
Message-ID: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>

On Tue, Mar 24, 2009 at 6:49 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Whose seqclean script are you using?  If it doesn't output the trimmed
> qual file, can it work with FASTQ output instead?

I am using the seqclean found here:
http://compbio.dfci.harvard.edu/tgi/software/
It doesn't output a trimmed qual file because seqclean accepts only
fasta as input. Oh, wait!!! Looking at my seqclean directory I found
a cln2qual script. So I looked at the README to see what it is, and I
found:

"If after seqclean one needs to trim the corresponding quality values too,
according to the new coordinates or trash codes found by seqclean, the
utility script "cln2qual" is included (see the usage message). It expects
a fasta-like file containing space delimited quality values for each nucleotide
of the original sequences. It should be run after the seqclean, as it parses the
trimming ("clear range") coordinates and trash codes from the cleaning report
and applies them to the quality records."

So this utility does what I was about to do with Biopython.

But anyway, regarding this:

> This was one area of the new SeqRecord slicing I was a little unsure
> about - slicing a qual file's SeqRecord (or any SeqRecord with a None
> for the sequence).  I hadn't done anything about it immediately as I
> couldn't think of a use case for it - so that's solved ;)
> One solution would be to introduce an UnknownSeq object, which
....

I agree with the need for an UnknownSeq object to modify the size of
the qual file.

Best,
SB.


From biopython at maubp.freeserve.co.uk  Tue Mar 24 15:13:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 15:13:40 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
Message-ID: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>

On Tue, Mar 24, 2009 at 2:59 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> But anyway, regarding this:
>
>> This was one area of the new SeqRecord slicing I was a little unsure
>> about - slicing a qual file's SeqRecord (or any SeqRecord with a None
>> for the sequence).  I hadn't done anything about it immediately as I
>> couldn't think of a use case for it - so that's solved ;)
>> One solution would be to introduce an UnknownSeq object, which
>> ....
>
> I agree with the need for an UnknownSeq object to modify the size of
> the qual file.

Suppose you read in a qual file (or a GenBank file with no sequence, just a
CONTIG line), and instead of None, the SeqRecord object(s) had a new
UnknownSeq object saying they were made up of a given number of "N"
characters using a DNA alphabet. What would you expect to get if you
used Bio.SeqIO to write out the file in FASTA format?  To my mind there
are two sensible options - write out the file using the "NNN....N"
sequence, or raise an error.
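
The two options can be sketched side by side. The function below is
hypothetical (its name, the allow_unknown flag and the 60-column line
width are all assumptions for illustration) and is not how Bio.SeqIO is
implemented:

```python
def format_unknown_as_fasta(name, length, character="N",
                            allow_unknown=True, width=60):
    """Return FASTA text for a sequence known only by its length."""
    if not allow_unknown:
        # Option 2: refuse to invent sequence data.
        raise ValueError("Record %s has no sequence, only a length" % name)
    # Option 1: write a run of the placeholder character.
    sequence = character * length
    lines = [">%s" % name]
    for i in range(0, length, width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


example = format_unknown_as_fasta("read1", 70)

try:
    format_unknown_as_fasta("read1", 70, allow_unknown=False)
    refused = False
except ValueError:
    refused = True
```

With the default settings a 70-letter record becomes a header line, one
line of 60 "N" characters and one line of 10.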

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 24 15:23:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 15:23:20 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
Message-ID: <320fb6e00903240823o53267d8bn36908f001708f974@mail.gmail.com>

On Tue, Mar 24, 2009 at 9:49 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> This was one area of the new SeqRecord slicing I was a little unsure
> about - slicing a qual file's SeqRecord (or any SeqRecord with a None
> for the sequence).  I hadn't done anything about it immediately as I
> couldn't think of a use case for it - so that's solved ;)
>
> One solution would be to introduce an UnknownSeq object, which
> would be much nicer to deal with than a None object, as it would have
> a length and support slicing.  I've mentioned this idea before, but
> haven't yet put forward any actual code.  This seems most elegant.
>
> Another option would be to special case handle slicing a SeqRecord
> with a None sequence, where we'd slice its per-letter-annotation.

That should now be working with the change I've just checked into CVS,
but the combination of slicing per-letter-annotation while the sequence
is None is a real pain.

I'm almost tempted to back out the qual parser for the next release
(FASTQ support is fine), but let's see if we can reach a consensus on
a new UnknownSeq class instead (see my earlier email on this - what
would you expect to happen if you read in a QUAL file and tried to
save it as a FASTA file?).

Peter


From sbassi at clubdelarazon.org  Tue Mar 24 15:33:56 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Tue, 24 Mar 2009 12:33:56 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
Message-ID: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>

On Tue, Mar 24, 2009 at 12:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
....
> characters using a DNA alphabet. What would you expect to get if you
> used Bio.SeqIO to write out the file in FASTA format?  To my mind there
> are two sensible options - write out the file using the "NNN....N"
> sequence, or raise an error.

"N" is OK (with the same length of the qual file), that is what ABI
does when the QV is low. This is not the same case but I always think
of "N" as "unknown".
Raising an error is not bad, because I don't see the need to go from a
non-sequence qual to a fasta (it doesn't make sense). But just because I
don't see the need doesn't mean nobody else has a reason.
Best,

-- 
Sebastián Bassi. Diplomado en Ciencia y Tecnología.


From bugzilla-daemon at portal.open-bio.org  Tue Mar 24 18:25:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 24 Mar 2009 14:25:17 -0400
Subject: [Biopython-dev] [Bug 2799] New: UnknownSeq object (e.g. for QUAL
	files)
Message-ID: <bug-2799-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2799

           Summary: UnknownSeq object (e.g. for QUAL files)
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Sometimes we want to represent an unknown sequence with a known length, e.g.
"N"*length for nucleotides.  This enhancement is about adding an UnknownSeq
object to Biopython which would have the following init arguments:

* length
* alphabet
* character (single letter string, defaulting to "X" for protein and "N" for
nucleotides, "?" otherwise)

Currently the Bio.SeqIO "qual" parser produces SeqRecord objects where the seq
is None, yet there is a known length.  This can also occur in GenBank files
where there is a CONTIG line but no sequence.  This makes supporting slicing (Bug
2507) complicated.  Adding a new UnknownSeq class would solve this elegantly.

In general, the UnknownSeq object should act as a Seq object whose sequence is
the character*length.

Slicing or adding UnknownSeq objects should give a new UnknownSeq object. 

Complement, reverse complement, transcribe and back transcribe can also return
new UnknownSeq objects of the same length (alphabet permitting).  Translation
can return an UnknownSeq object using "X" and a protein alphabet (with the
length roughly one third of the nucleotide length - whatever is consistent with
the Seq translate method).

Adding an UnknownSeq object to a Seq would have to give a new Seq object (or an
error?).  One use-case example here would be joining together contigs with
unknown regions of a given length (strings of N's).
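
A minimal sketch of the behaviour described so far, assuming a
hypothetical class name UnknownSeqSketch. It leaves out the alphabet and
the complement/transcribe/translate methods, uses a plain string as a
stand-in for a Seq object, and is not the eventual Biopython
implementation:

```python
class UnknownSeqSketch:
    """Hypothetical sequence with a known length but unknown letters."""

    def __init__(self, length, character="N"):
        if length < 0:
            raise ValueError("Length must not be negative")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self._length = int(length)
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # range() can count the positions a slice selects, so no
            # string has to be built just to measure the result.
            new_length = len(range(*index.indices(self._length)))
            return UnknownSeqSketch(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("index out of range")
        return self._character

    def __add__(self, other):
        if (isinstance(other, UnknownSeqSketch)
                and other._character == self._character):
            return UnknownSeqSketch(len(self) + len(other), self._character)
        # Mixed with known sequence: fall back to an ordinary string.
        return str(self) + str(other)

    def __radd__(self, other):
        return str(other) + str(self)


gap = UnknownSeqSketch(1000)
clear = gap[150:800]  # still an UnknownSeqSketch, now 650 long
joined = "ACGT" + UnknownSeqSketch(3) + "TT"  # contigs joined by a gap
```

Slicing and unknown-plus-unknown addition stay cheap because only the
length is stored; the full string is built only when str() is called or
when known sequence is joined on.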

This bug is a placeholder for patches or pointers to possible implementations
(e.g. I intend to try some ideas on a branch on github).  I expect most of the
discussion to be on the (dev) mailing list, rather than bugzilla.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From tiagoantao at gmail.com  Tue Mar 24 18:42:56 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 24 Mar 2009 18:42:56 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090317124930.GE57054@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>

Hi,

On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> There is a lot of good material in this thread for new potential
> developers. Tiago, it would make sense to condense what you've
> written and include it with the Contributing guide:

Just a followup on this: I think it makes no sense to put much of the
new content before there is an official step of moving to github.

What I am doing is putting a framework in place, for test purposes, to see
how these suggestions might work.

I've created a fork
http://github.com/tiagoantao/biopython-popgen-test/
with several branches

The proposed idea is:
1. The master branch should be a clearing house and stability point
for things to be suggested for submission to the official branch. All
code here should have unit tests, all unit tests should pass and
documentation should exist. It is also a place to correct bugs that
are discovered in the official trunk (if these are simple to correct
and don't require the creation of a temporary branch to sort them
out).
2. There is a stats branch to work on Bio.PopGen.Stats. If you want to
work on statistics you can follow/fork from the statistics branch. Any
code that people might have should be discussed if they want it to
make it into the official release.
3. Less interesting to others, I will personally create a genepop
branch to make an enhancement to the existing parser and on the
ability to call the genepop binary.

So:
People work on their very personal branches (like my genepop one).
Development branches that might have shared interests (like the stats
one) should be forked or given shared commit access, and the people
interested should discuss among themselves.
Whenever some content is deemed ready it is then put on the popgen
master branch (along with tests and documentation). When the
master branch is in a stable state, then the changes are proposed to
the official one.

In my view, this protects the people working on the official thing
from the potential chaos of new developments, while creating a
framework which allows people to test innovations...

Tiago


From biopython at maubp.freeserve.co.uk  Tue Mar 24 18:54:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 24 Mar 2009 18:54:28 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
Message-ID: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>

2009/3/24 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> On Tue, Mar 17, 2009 at 12:49 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> There is a lot of good material in this thread for new potential
>> developers. Tiago, it would make sense to condense what you've
>> written and include it with the Contributing guide:
>
> Just a followup on this: I think it makes no sense to put much of the
> new content before there is an official step of moving to github.

True - but we do need enough pointers for people to help try things out.

> What I am doing, is just to put, for test purposes a framework to see
> how these suggestions might work....
> In my view, this protects the people working on the official thing
> from the potential chaos of new developments, while creating a
> framework which allows people to test innovations...

That sounds great, and a good model for other (self contained) modules under
active development.  I'm thinking along similar lines for Bio.SeqIO and AlignIO
(and by implication, the SeqRecord and the Alignment classes).

I would assume (although you didn't say this) you would also pull changes to
the official trunk into your branches periodically - at very least after
each official Biopython release.

Peter


From bartek at rezolwenta.eu.org  Tue Mar 24 23:58:30 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 00:58:30 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
Message-ID: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>

Hi all,

Sorry for being quiet all that time, but the conference (+jet lag both
ways) proved to be more engaging than I thought.

For the tags, they were not pushed to github before, because I didn't
know I need to specifically do it with git push --tags.

Now they are pushed to the repository and you can fetch them to local
copies by git pull -t in any local directory which resulted from
cloning the official branch.

They probably won't get automatically transferred to derived branches;
I guess you need to pull them from the original (official) branch.

cheers
Bartek

> On Tue, Mar 17, 2009 at 10:06 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi Bartek et al,
>>
>> I've just been looking over the github mirror of CVS, and wanted to
>> see how it presented the history of individual files.  For example, this
>> page looks at the Bio/SeqRecord.py history using ViewCVS:
>> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython
>>
>> For comparison, in GitHub,
>> http://github.com/biopython/biopython/commits/master/Bio/SeqRecord.py
>>
>> As you can see, all the comments and changes are there - which is
>> great.  But I can't see the CVS tag information, which I assume would
>> be converted into git tags.  Is this information present in the git
>> repository, but not shown by github, or was it lost during the
>> migration?  This might seem like a little thing, but I have found it
>> incredibly important for tracing bugs reported in older releases, for
>> example in narrowing down when something changed.
>>
>> Peter
>>


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From biopython at maubp.freeserve.co.uk  Wed Mar 25 10:01:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 10:01:45 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
Message-ID: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>

On Tue, Mar 24, 2009 at 3:33 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On Tue, Mar 24, 2009 at 12:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> ....
>> characters using a DNA alphabet. What would you expect to get if you
>> used Bio.SeqIO to write out the file in FASTA format?  To my mind there
>> are two sensible options - write out the file using the "NNN....N"
>> sequence, or raise an error.
>
> "N" is OK (with the same length of the qual file), that is what ABI
> does when the QV is low. This is not the same case but I always think
> of "N" as "unknown".
> Raising an error is not bad, because I don't see the need to go from a
> non-sequence qual to a fasta (it doesn't make sense). But just because I
> don't see the need doesn't mean nobody else has a reason.
> Best,

I've filed an enhancement bug for adding an UnknownSeq object,
perhaps as part of the Bio.Seq module: Bug 2799
http://bugzilla.open-bio.org/show_bug.cgi?id=2799

I've done an initial patch (which I plan to upload on Bugzilla) which
is available now on github on a new branch:
http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq

Note this doesn't do anything special (yet) when writing output files, so
they will by default record a string of whatever unknown sequence character
was used.

It would make sense for GenBank/EMBL in SeqIO to also take advantage of
the UnknownSeq object, because here the sequence is essentially optional
(consider files with just a CONTIG line), but should always have a length.

Sebastian - could you have a quick play with this github code (using the new
UnknownSeq class), and the current CVS code (using None), and make sure
both support the slicing operations you were trying earlier?  Thanks.

Peter


From biopython at maubp.freeserve.co.uk  Wed Mar 25 10:28:46 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 10:28:46 +0000
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
Message-ID: <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>

On Tue, Mar 24, 2009 at 11:58 PM, Bartek Wilczynski wrote:
>
> Hi all,
>
> Sorry for being quiet all that time, but the conference (+jet lag both
> ways) proved to be more engaging than I thought.

That's fine - sleep is important ;)

> For the tags, they were not pushed to github before, because I didn't
> know I need to specifically do it with git push --tags.

I assume you've updated your cron job so this will happen
automatically in future (e.g. when we do Biopython 1.50 beta).

> Now they are pushed to the repository and you can fetch them to local
> copies by git pull -t in any local directory which resulted from
> cloning the official branch.

Yes, I've checked and I can get the tags with:
git pull -t ...
or,
git pull --tags ...

They also show up in github (near the top, drop down menu next to
branches) and in gitx (and I assume other GUI clients).

They have commit comments like "This commit was manufactured by
cvs2svn to create tag 'biopython-146'", which is fine.

However, all the tags seem to have associated with them the deletion
of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
If you can work out how this happened, would it be trivial to back
these tags out and redo it?

> They probably won't get automatically transferred to derived branches,
> I guess you need to pull them from the original (official) branch.

That makes sense.

Peter


From mjldehoon at yahoo.com  Wed Mar 25 11:47:59 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 25 Mar 2009 04:47:59 -0700 (PDT)
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com>
Message-ID: <559251.50851.qm@web62401.mail.re1.yahoo.com>


> What about the fairly common situation (at least, it's something
> I've done fairly often) where Bio.Entrez.efetch() is used
> to fetch records which are saved directly to file without
> verification - e.g. to be parsed by another program?
> Unless the error is caught in Bio.Entrez.efetch()
> it may be out of our control.

That is easy: just run the output returned by NCBI through the appropriate parser. If the parser is happy, proceed to save the NCBI output in a file.

> The first half of the email (the main point) was based
> on a special case: HTML and XML are pretty easy to
> identify.  If you ask for HTML and don't get it, it is
> an error (and vice versa).  If you ask for XML and don't
> get it, it is an error (and vice versa).  The fact that
> the NCBI currently often return an HTML or XML error
> page when a plain text format was requested is then
> easily detected as an error (simply from the file type).
> This will still work even if the NCBI do change their
> error formats or wording - it should be pretty robust.

Have a look at serialset.xml in the Bio.Entrez test cases ... this is the output obtained from NCBI using efetch from the journals database with retmode='xml'. The file looks like XML, but it doesn't start with "<?xml". However, Bio.Entrez.read parses it correctly, so while it's not pretty, to me this would not count as an error.

--Michiel.


From biopython at maubp.freeserve.co.uk  Wed Mar 25 12:15:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 12:15:21 +0000
Subject: [Biopython-dev] Bio.Entrez catching more errors
In-Reply-To: <559251.50851.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00903220344t1057bf74mcdc1f2256d8b29b4@mail.gmail.com>
	<559251.50851.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00903250515vd885b34s629dd9253d4f9186@mail.gmail.com>

On Wed, Mar 25, 2009 at 11:47 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> What about the fairly common situation (at least, it's something
>> I've done fairly often) where Bio.Entrez.efetch() is used
>> to fetch records which are saved directly to file without
>> verification - e.g. to be parsed by another program?
>> Unless the error is caught in Bio.Entrez.efetch()
>> it may be out of our control.
>
> That is easy: just run the output returned by NCBI through
> the appropriate parser. If the parser is happy, proceed to
> save the NCBI output in a file.

Possible, but you'd need to cache the handle's data in order
to be able to save it after parsing.  The UndoHandle doesn't
do this.
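
A minimal sketch of that caching step, assuming the whole response fits
in memory. The function name validate_then_save is hypothetical, and
xml.etree.ElementTree stands in for whichever parser suits the requested
format (it is not the Bio.Entrez parser):

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def validate_then_save(handle, path, parse=ET.parse):
    """Read everything from handle, check it parses, then write it to path."""
    data = handle.read()
    if isinstance(data, str):
        data = data.encode("utf-8")
    # Parse an in-memory copy; the parser raises if the data is bad.
    parse(io.BytesIO(data))
    # Only reached when parsing succeeded.
    with open(path, "wb") as output:
        output.write(data)
    return len(data)


with tempfile.TemporaryDirectory() as folder:
    good_path = os.path.join(folder, "good.xml")
    bad_path = os.path.join(folder, "bad.xml")
    saved_bytes = validate_then_save(io.StringIO("<eSummaryResult/>"), good_path)
    good_saved = os.path.exists(good_path)
    try:
        validate_then_save(io.StringIO("Error: Bad Gateway"), bad_path)
        rejected = False
    except ET.ParseError:
        rejected = True
    bad_saved = os.path.exists(bad_path)
```

For batched downloads the same check could be applied to each batch
before appending it to the output file, at the cost of holding one batch
in memory at a time.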

You could save the data to a file, and then check the parser
can read it back - however, this would be complicated if you
are downloading data in batches to go into a single file.

>> The first half of the email (the main point) was based
>> on a special case: HTML and XML are pretty easy to
>> identify.  If you ask for HTML and don't get it, it is
>> an error (and vice versa).  If you ask for XML and don't
>> get it, it is an error (and vice versa).  The fact that
>> the NCBI currently often return an HTML or XML error
>> page when a plain text format was requested is then
>> easily detected as an error (simply from the file type).
>> This will still work even if the NCBI do change their
>> error formats or wording - it should be pretty robust.
>
> Have a look at serialset.xml in the Bio.Entrez test cases ... this
> is the output obtained from NCBI using efetch from the journals
> database with retmode='xml'. The file looks like XML, but it
> doesn't start with "<?xml". However, Bio.Entrez.read parses it
> correctly, so while it's not pretty to me this would not count as
> an error.

I do concede my sample code for detecting XML or HTML could
be improved, and this provides a good test case for a difficult
XML file.  Maybe when we expect XML (or HTML), all we should
check is the file starts with "<"?  e.g.

   elif "retmode" in params and params["retmode"].lower()=="html" \
   and not data.lower().startswith("<") :
       raise TypeError("Requested HTML, but didn't get it: %s..." % data)
   elif "retmode" in params and params["retmode"].lower()=="xml" \
   and not data.lower().startswith("<") :
       raise TypeError("Requested XML, but didn't get it: %s..." % data)
   elif "retmode" in params and params["retmode"] \
   and params["retmode"].lower()!="xml" \
   and data.lower().startswith("<?xml") :
       raise TypeError("Didn't request XML, but got it: %s..." % data)
   elif "retmode" in params and params["retmode"] \
   and params["retmode"].lower()!="html" \
   and (data.lower().startswith("<html") or \
        data.lower().startswith("<!doctype html")):
       #Expected for some error pages (e.g. the Bad Gateway caught above)
       raise TypeError("Didn't request HTML, but got it: %s..." % data)

The above code isn't expected to catch all possible errors - just the
most common ones.  One thing this version won't catch is a mix up
between XML and HTML (e.g. requested XML, given HTML error page)
but the two do overlap somewhat anyway.
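
For reference, the same checks could be wrapped into a self-contained
helper along these lines. The name check_retmode is made up for this
sketch, it is written for current Python 3, and stripping leading
whitespace is an extra assumption; it mirrors the fragment above rather
than describing what Bio.Entrez actually does.

```python
def check_retmode(params, data):
    """Raise TypeError if data clearly does not match the requested retmode."""
    retmode = (params.get("retmode") or "").lower()
    # Assumption of this sketch: ignore any leading whitespace.
    start = data.lstrip().lower()
    preview = data[:50]
    looks_like_xml = start.startswith("<?xml")
    looks_like_html = start.startswith(("<html", "<!doctype html"))
    if retmode in ("html", "xml") and not start.startswith("<"):
        raise TypeError("Requested %s, but didn't get it: %s..."
                        % (retmode.upper(), preview))
    if retmode and retmode != "xml" and looks_like_xml:
        raise TypeError("Didn't request XML, but got it: %s..." % preview)
    if retmode and retmode != "html" and looks_like_html:
        # Expected for some error pages (e.g. a Bad Gateway page)
        raise TypeError("Didn't request HTML, but got it: %s..." % preview)
```

An XML file without a declaration (like serialset.xml) passes, because
only the leading "<" is required when XML was requested.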

Peter


From bartek at rezolwenta.eu.org  Wed Mar 25 12:16:08 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 13:16:08 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
Message-ID: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>

On Wed, Mar 25, 2009 at 11:28 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> I assume you've updated your cron job so this will happen
> automatically in future (e.g. when we do Biopython 1.50 beta).

Yes, naturally.

>
> However, all the tags seem to have associated with them the deletion
> of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
> If you can work out how this happened, would it be trivial to back
> these tags out and redo it?
>
That's really odd. I don't know exactly where it comes from, but I've
done some detective work and here are my findings:

For the AUTHORS  file, it was indeed deleted in a commit by Jeff Chang (2001):
http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65
Which "renames" the AUTHORS file into CONTRIB file.

The AUTHORS file is in the biopython tags prior to 1.00a1 and then it
should not be there anymore (it's in CVS's attic).
I don't know how it came back...

Similarly, the Bio/UniGene/UniGene.py file was removed by Jeff in a commit:
http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2

And similarly, UniGene.py is no longer in CVS repo (but it's still in
the attic).

What these files have in common, is that there are some commits to
them after they've been moved to Attic (sic!)

http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py
http://github.com/biopython/biopython/commits/master/AUTHORS

I don't know exactly how this could happen, but this inconsistency in
CVS might be causing cvs2git to actually include these guys.

I'll increase the verbosity of the log messages in my cron script, so
maybe I'll see some indication of a problem.

If nobody has a reason for these files to be included in the current
trunk, I'll go ahead and remove them from git.

cheers
Bartek

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From biopython at maubp.freeserve.co.uk  Wed Mar 25 12:20:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 12:20:05 +0000
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
	<8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
Message-ID: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>

>> However, all the tags seem to have associated with them the deletion
>> of the files AUTHORS and Bio/UniGene/UniGene.py which is rather odd.
>> If you can work out how this happened, would it be trivial to back
>> these tags out and redo it?
>>
> That's really odd. I don't know exactly where it comes from, but I've
> done some detective work and here are my findings:
>
> For the AUTHORS file, it was indeed deleted in a commit by Jeff Chang (2001):
> http://github.com/biopython/biopython/tree/c9dfca8631c23b47bddb519dce9e98d07079eb65
> Which "renames" the AUTHORS file into CONTRIB file.
>
> The AUTHORS file is in the biopython tags prior to 1.00a1 and then it
> should not be there anymore (it's in CVS's attic).
> I don't know how it came back...
>
> Similarly, the Bio/UniGene/UniGene.py file was removed by Jeff in a commit:
> http://github.com/biopython/biopython/commit/8b940e38d0fbb7c471366f844318c32b08bdd8c2
>
> And similarly, UniGene.py is no longer in CVS repo (but it's still in
> the attic).
>
> What these files have in common, is that there are some commits to
> them after they've been moved to Attic (sic!)
>
> http://github.com/biopython/biopython/commits/master/Bio/UniGene/UniGene.py
> http://github.com/biopython/biopython/commits/master/AUTHORS
>
> I don't know exactly how this could happen, but this inconsistency in
> CVS might be causing cvs2git to actually include these guys.

It does sound like a hidden hiccup in our CVS repository... very strange.

Peter


From bartek at rezolwenta.eu.org  Wed Mar 25 12:43:00 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 13:43:00 +0100
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
	<8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
	<320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>
Message-ID: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com>

On Wed, Mar 25, 2009 at 1:20 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

>> I don't know exactly how this could happen, but this inconsistency in
>> CVS might be causing cvs2git to actually include these guys.
>
> It does sound like a hidden hiccup in our CVS repository... very strange.


I would rather call it a glitch in a transition. I was actually quite
surprised that the transition went so smoothly.
Now we can see that actually some things did not transfer too well...

I did a thorough check comparing checkouts from the current CVS and git
trunks, and there are also some other differences.
As you can see below, apart from these two files present only in
git, a number of directories are missing in git. I've checked:
they are all empty directories leftover because you cannot delete a
directory from CVS (some of them, like Bio.Tools have actually a
number of directories in them, but they are all empty).

I think that it's actually a desired behavior (removing empty
directories) but if anyone is missing any of these dirs, please let me
know.

The diff:

Only in git_branch/: AUTHORS
Only in biopython/Bio: Ais
Only in biopython/Bio: CDD
Only in biopython/Bio: cmmCIF
Only in biopython/Bio: config
Only in biopython/Bio: dbdefs
Only in biopython/Bio: ECell
Only in biopython/Bio: expressions
Only in biopython/Bio: formatdefs
Only in biopython/Bio: Gobase
Only in biopython/Bio: iodefs
Only in biopython/Bio: Kabat
Only in biopython/Bio: LocusLink
Only in biopython/Bio: MultiProc
Only in biopython/Bio/PDB: mmCIF_lex
Only in biopython/Bio: Rebase
Only in biopython/Bio/SCOP: tests
Only in biopython/Bio: sources
Only in biopython/Bio: Tools
Only in git_branch/Bio/UniGene: UniGene.py
Only in biopython/Doc/cookbook: biopython_test
Only in biopython/Doc/cookbook: genbank_to_fasta
Only in biopython/Doc/cookbook: LogisticRegression
Only in biopython: Experimental
Only in git_branch/: .git
Only in biopython/Martel: examples
Only in biopython/Tests: CDD
Only in biopython/Tests: ECell
Only in biopython/Tests: Gobase
Only in biopython/Tests: Kabat
Only in biopython/Tests: LocusLink
Only in biopython/Tests: Ndb
Only in biopython/Tests: UnitTests
Only in biopython/Tests: WIT
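
The "all empty" check on the CVS-only directories could be scripted
roughly like this. It is just a sketch in current Python 3; has_no_files
is a made-up name, and skipping the "CVS" bookkeeping directories is an
assumption about what should count as empty.

```python
import os


def has_no_files(top, ignore_dirs=("CVS",)):
    """Return True if no files exist anywhere below the directory top.

    Directories named in ignore_dirs are skipped entirely; skipping the
    "CVS" bookkeeping directories is an assumption of this sketch.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune ignored directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in ignore_dirs]
        if filenames:
            return False
    return True
```

Running this on each "Only in biopython/..." directory from the diff
would confirm that nothing but empty subdirectories is being dropped.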

cheers
Bartek


From biopython at maubp.freeserve.co.uk  Wed Mar 25 12:47:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 12:47:02 +0000
Subject: [Biopython-dev] history on github - where are the tags?
In-Reply-To: <8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com>
References: <320fb6e00903170206h570989bbgb6b9a761d2aa70ed@mail.gmail.com>
	<8b34ec180903241649p7e81a2cew6587512c0cef16f@mail.gmail.com>
	<8b34ec180903241658k21a76269r789600f92c17fbbb@mail.gmail.com>
	<320fb6e00903250328y19165a77t470124ce490cea3d@mail.gmail.com>
	<8b34ec180903250516v75efdd2i95cb77145b4d3001@mail.gmail.com>
	<320fb6e00903250520nedc0aaj84c10a1b2a72e8a2@mail.gmail.com>
	<8b34ec180903250543g3029edb4h33d332371ef4e469@mail.gmail.com>
Message-ID: <320fb6e00903250547s7d88a1b3h8c52dd852047edb6@mail.gmail.com>

On Wed, Mar 25, 2009 at 12:43 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> I did a thorough check comparing checkouts from the current CVS and git
> trunks, and there are also some other differences.
> As you can see below, apart from these two files present only in
> git, a number of directories are missing in git. I've checked:
> they are all empty directories leftover because you cannot delete a
> directory from CVS (some of them, like Bio.Tools have actually a
> number of directories in them, but they are all empty).
>
> I think that it's actually a desired behavior (removing empty
> directories) but if anyone is missing any of these dirs, please let me
> know.

I don't care about the missing empty directories - if/once we move
to git, we would have deleted them anyway.  So if that has been done
automatically, that's fine in my opinion.

Peter


From tiagoantao at gmail.com  Wed Mar 25 15:39:42 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 25 Mar 2009 15:39:42 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
Message-ID: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>

On Tue, Mar 24, 2009 at 6:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> In my view, this protects the people working on the official thing
>> from the potential chaos of new developments, while creating a
>> framework which allow for people to test innovations...
>
> That sounds great, and a good model for other (self contained) modules under


Just a minor point: any development branches should be seen as highly
unstable. I say this just because I am restarting work on
statistics and I am seeing massive refactoring going on. So if people
track development branches, they should be prepared for chaos ;) .
Which is exactly the opposite of what they should expect from the
official branch ;)


From biopython at maubp.freeserve.co.uk  Wed Mar 25 15:45:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 15:45:00 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
Message-ID: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>

2009/3/25 Tiago Antão <tiagoantao at gmail.com>:
> Just a minor point: any development branches should be seen as highly
> unstable. I say this just because I am restarting work on
> statistics and I am seeing massive refactoring going on. So if people
> track development branches, they should be prepared for chaos ;) .
> Which is exactly the opposite of what they should expect from the
> official branch ;)

We should probably all write something on the wiki page for our
personal forks, describing what you're using it for, what are the main
branches likely to be of interest etc.

Peter


From bartek at rezolwenta.eu.org  Wed Mar 25 16:33:13 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 25 Mar 2009 17:33:13 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
Message-ID: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com>

2009/3/25 Peter <biopython at maubp.freeserve.co.uk>:
>
> We should probably all write something on the wiki page for our
> personal forks, describing what you're using it for, what are the main
> branches likely to be of interest etc.

Hi,

I'll be happy to write some draft version of guidelines for developers
and contributors to the wiki.

It just seems that currently there are some problems with biopython
wiki. Does anyone know what is the problem?
Is it some kind of internal OBF issue or is it because of increased
interest in biopython after the application note was
published? Do we have access to any access statistics to the website?

cheers
Bartek


From biopython at maubp.freeserve.co.uk  Wed Mar 25 16:41:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 16:41:00 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<8b34ec180903250933y5a4bdf6elae31f683d2848205@mail.gmail.com>
Message-ID: <320fb6e00903250941o6e99e06egb672b62f2d661e15@mail.gmail.com>

On Wed, Mar 25, 2009 at 4:33 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
>
> 2009/3/25 Peter <biopython at maubp.freeserve.co.uk>:
>>
>> We should probably all write something on the wiki page for our
>> personal forks, describing what you're using it for, what are the main
>> branches likely to be of interest etc.
>
> Hi,
>
> I'll be happy to write some draft version of guidelines for developers
> and contributors to the wiki.

Certainly add a section to the git migration page.

> It just seems that currently there are some problems with biopython
> wiki. Does anyone know what is the problem?
> Is it some kind of internal OBF issue or is it because of increased
> interest in biopython after the application note was
> published? Do we have access to any access statistics to the website?

It seems to be all the OBF pages (e.g. bioperl.org too), and it's been
more than an hour so I'll drop their support team an email.

Peter


From sbassi at clubdelarazon.org  Wed Mar 25 16:59:28 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Wed, 25 Mar 2009 13:59:28 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
Message-ID: <9e2f512b0903250959h26081e4ak3246252d02be2ee0@mail.gmail.com>

On Wed, Mar 25, 2009 at 7:01 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
....
> Sebastian - could you have a quick play with this github code (using the new
> UnknownSeq class), and the current CVS code (using None), and make sure
> both support the slicing operations you were trying earlier?  Thanks.

OK, I'll try both today and report back to the list.


From eric.talevich at gmail.com  Wed Mar 25 21:44:30 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 25 Mar 2009 17:44:30 -0400
Subject: [Biopython-dev] PDB tidy script
In-Reply-To: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com>
References: <320fb6e00903231405l479ddcc6of9cd0c1aa8fd98d4@mail.gmail.com>
Message-ID: <3f6baf360903251444l3064963bp788750ed7a67e4d4@mail.gmail.com>

On Mon, Mar 23, 2009 at 5:05 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

>
> If you look back over the history, there initially was no header parsing,
> it was a contribution from Kristian Rother, and I would agree, it is rather
> disjoint from the rest of the code.  One thing I personally wanted last
> time I was working with PDB files was to have secondary structure
> information (from the alpha helix and beta sheet lines in the header)
> mapped onto the residue objects automatically.
>
> And yes, Thomas is supporting the PDB module, but his time has
> been rather limited of late.  When I asked him about some of the
> open enhancement requests in bugzilla recently (off list) he said
> we needed "a separate class to parse all the info in the header,
> not a slew of additions to the core parser class (which is designed
> to deal with the 3D data only)."
>
>
I can understand both those wishes. Looking at the features currently
available in the module, the best approach might be to leave the 3D parser
and PDB.Entity-derived classes alone and add another wrapper class
containing the header, sequence (maybe), secondary and tertiary structure as
separate attributes.

When working in the REPL, I've wished for a simpler function to load PDB
files by path and figure out the name automatically; this would be an easy
way to do it without violating Thomas's parser -- just use
parse_pdb_header() in the wrapper, and use the name from there as the first
argument to PDBParser.get_structure(). For example (quick & dirty):

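import os
from Bio.PDB import PDBParser, parse_pdb_header

class PDBLoader:
    def __init__(self, path):
        # Header fields (name, author, resolution, ...) become attributes
        self.__dict__ = parse_pdb_header(path)
        if not self.name:
            # Fall back to the file name, minus its extension
            self.name = os.path.basename(path).split('.')[0]
        parse_3d = PDBParser()
        self.structure = parse_3d.get_structure(self.name, path)
        # self.secondary = ?
        # link 1/2/3ary data in various ways ...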

>>> pdb = PDBLoader('a_structure.pdb')
>>> dir(pdb)
['__doc__', '__init__', '__module__', 'author', 'compound',
'deposition_date', 'head', 'journal_reference', 'name', 'release_date',
'resolution', 'source', 'structure', 'structure_method',
'structure_reference']


In that case, it would be reasonable to let get_structure and
parse_pdb_header take an open file-like object as an alternative to the PDB
file's path to avoid opening and closing the same file repeatedly. There's
also some cleanup to do in parse_pdb_header.py alongside this.
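
Accepting either a path or an open handle could look something like the
sketch below (current Python 3). The helper name open_or_reuse is made up
for illustration, and rewinding a caller-supplied handle before each pass
is an assumption about how the header and 3D parsers would share it.

```python
import contextlib


@contextlib.contextmanager
def open_or_reuse(path_or_handle, mode="r"):
    """Yield an open handle for either a filename or a file-like object."""
    if isinstance(path_or_handle, str):
        # We opened the file, so we close it when the block ends.
        with open(path_or_handle, mode) as handle:
            yield handle
    else:
        # Caller's handle: rewind it if possible, and leave it open.
        seekable = getattr(path_or_handle, "seekable", None)
        if seekable is not None and seekable():
            path_or_handle.seek(0)
        yield path_or_handle
```

Both parsers could then wrap their reading code in
"with open_or_reuse(source) as handle:" and a wrapper class would open
the file only once.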

Does this sound reasonable?

-Eric


From chapmanb at 50mail.com  Wed Mar 25 21:55:48 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 25 Mar 2009 17:55:48 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
Message-ID: <20090325215548.GB21577@sobchak.mgh.harvard.edu>

Hey all;
Good discussion on this; I touch on a few points from different
threads below.

Michiel:
> I haven't been following this topic closely, and as an "outsider"
> using git seems more complicated than using cvs or svn. And to be
> honest, I don't know if Biopython actually needs the branching and
> forking stuff. I think that this is more useful for bigger projects,
> where multiple developers may be working on interrelated parts of code
> at the same time. That hardly ever happens in Biopython, though.

Tiago:
> I would actually take this argument and reverse it:
[...]
> Using a distributed technology allows for people to try new ideas and 
> to get things moving (while still maintaining an official rock stable 
> version with maybe glacial policies).

I fall in between these two viewpoints. Git has more complications and,
unless we manage those, we risk introducing additional barriers to
contribution. Imagine looking at biopython on github and seeing 10
different branches for different users, many of which may be old and
out of date. This could lead to the impression that we are not
organized toward a single goal. If you are still interested, how
do you know which ones could use your help and what they are for?

The solution to this is documentation on the wiki. We rely too much on
the mailing list and expect people to keep up. Peter read my mind on
this:

Peter:
> We should probably all write something on the wiki page for our
> personal forks, describing what you're using it for, what are the main
> branches likely to be of interest etc.

I started a page over the weekend doing this:

http://biopython.org/wiki/Active_projects

It's a skeleton so add or subtract away. My idea for this is that it
is for longer projects that could use outside help. It's not reasonable
to spend time writing up things you'll be finishing in a week or so; for
that bugzilla does fine keeping interested parties up to date.

Another idea on this page is a specific wish list of libraries for
future work. This is a starting point for anyone who comes into
Biopython fresh and would like to take something on. Also, it encourages
people who have developed external libraries to deal with problems we
are interested in to consider folding them into Biopython.

Me:
> > There is a lot of good material in this thread for new potential
> > developers. Tiago, it would make sense to condense what you've
> > written and include it with the Contributing guide:

Tiago:
> Just a followup on this: I think it makes no sense to put much of the
> new content before there is an official step of moving to github.

We are serious about moving to Git and need to have the documentation in
place so others can learn it. You wrote up a lot of good stuff, and it
will be lost on the mailing list.

Brad


From bugzilla-daemon at portal.open-bio.org  Wed Mar 25 22:43:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 25 Mar 2009 18:43:57 -0400
Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. for QUAL files)
In-Reply-To: <bug-2799-42@http.bugzilla.open-bio.org/>
Message-ID: <200903252243.n2PMhvoT007523@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2799


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-25 18:43 EST -------
I've made my first attempt at this available as a personal branch on github,
http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From sbassi at clubdelarazon.org  Wed Mar 25 23:15:05 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Wed, 25 Mar 2009 20:15:05 -0300
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
Message-ID: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>

On Wed, Mar 25, 2009 at 7:01 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Sebastian - could you have a quick play with this github code (using the new
> UnknownSeq class), and the current CVS code (using None), and make sure
> both support the slicing operations you were trying earlier?  Thanks.

First I tried the CVS code (with None in seq), it worked.
Then I tried the git code and it also worked. One thing I noticed is
that I got "?" instead of "N" as the "sequence" of the UnknownSeq.
From a practical point of view, both versions are the same, but the
concept of UnknownSeq looks more solid than None, because if I don't
know about biopython internals, I would never try to slice a None
seq. With "None":
len(s) returns:

Traceback (most recent call last):
  File "/home/sbassi/bioinfo/INTA/qualparser.py", line 21, in <module>
    print len(s)
  File "/home/sbassi/test/virtualenv-1.3.2/t6/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/Bio/SeqRecord.py", line 481, in __len__
    return len(self.seq)
TypeError: object of type 'NoneType' has no len()

So I would never try to do:
new_s = s[10:30]

But with the UnknownSeq object, len(s) returns an actual length, so it
is more intuitive that it can be sliced.
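
To illustrate the concept, here is a toy placeholder class (current
Python 3). It is not the UnknownSeq code from the github branch; the
class name PlaceholderSeq and its details are assumptions made up for
this sketch. It only shows why an object with a known length slices
naturally where None cannot.

```python
class PlaceholderSeq:
    """Toy sequence of known length but unknown content."""

    def __init__(self, length, character="?"):
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # range() works out the sliced length without building a string.
            new_length = len(range(*index.indices(self._length)))
            return PlaceholderSeq(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character
```

With this, len(s) and s[10:30] both behave as a user would expect, and
only the length is stored however long the sequence is.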

I liked the github interface, may I setup my own repository?

Best,

-- 
Sebastián Bassi. Diplomado en Ciencia y Tecnología.

Non standard disclaimer: READ CAREFULLY. By reading this email,
you agree, on behalf of your employer, to release me from all
obligations and waivers arising from any and all NON-NEGOTIATED
agreements, licenses, terms-of-service, shrinkwrap, clickwrap,
browsewrap, confidentiality, non-disclosure, non-compete and
acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release
me from any BOGUS AGREEMENTS on behalf of your employer.


From biopython at maubp.freeserve.co.uk  Wed Mar 25 23:30:14 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Mar 2009 23:30:14 +0000
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
	<9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>
Message-ID: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com>

On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi:
>> Sebastian - could you have a quick play with this github code (using the new
>> UnknownSeq class), and the current CVS code (using None), and make sure
>> both support the slicing operations you were trying earlier?  Thanks.
>
> First I tried the CVS code (with None in seq), it worked.

OK, good.  That will do in the very short term - the UnknownSeq needs
some more testing and general approval before I'd check that in.

> Then I tried the git code and it also worked. One thing I noticed is
> that I got "?" instead of "N" as the "sequence" of the UnknownSeq.

I felt we shouldn't use an "N" unless we are confident the sequence
is nucleotides.  In practice, this is probably a safe assumption for
FASTQ and QUAL files - unless anyone can think of a counter example?
Do you think it is safe to assume FASTQ and QUAL files are just for
nucleotides?

I mean, you could translate a CDS from transcriptome sequencing,
and for the sake of argument give each amino acid a quality score
from the three nucleotide quality scores, and then save this as a protein
FASTQ file.  But I've never heard of anyone actually doing this ;)
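
Purely for the sake of argument, collapsing three nucleotide scores into
one per amino acid might be sketched as below (current Python 3). The
function name codon_qualities is made up, and taking the minimum of each
triplet is an arbitrary assumption, not an established convention.

```python
def codon_qualities(nucleotide_qualities):
    """Collapse per-base quality scores into one score per codon.

    Taking the minimum of each triplet is an arbitrary choice made for
    this sketch; any trailing partial codon is ignored.
    """
    usable = len(nucleotide_qualities) - len(nucleotide_qualities) % 3
    return [min(nucleotide_qualities[i:i + 3]) for i in range(0, usable, 3)]
```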

> From a practical point of view, both versions are the same, but the
> concept of UnknownSeq looks more solid than None, because if I don't know
> about biopython internals, I would never try to slice a None
> seq. With "None":
> len(s) returns:
>
> Traceback (most recent call last):
> ...
> TypeError: object of type 'NoneType' has no len()
>
> So I would never try to do:
> new_s = s[10:30]
>
> But with the UnknownSeq object, len(s) returns an actual length, so it
> is more intuitive that it can be sliced.

I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord
__getitem__ code nicer, and it means you can do len(SeqRecord) too,
which was problematic if the sequence was None.

>
> I liked the github interface, may I setup my own repository?
>

Yes - this is one of the nice things about git, it makes it easy for anyone
to make their own local branch of Biopython, but keep it under version
control and pull in changes from the master branch (or another git user)
quite easily.  It should also make it easy to offer changes back to the
main project (assuming we do switch to hosting it on git, for now it is
still being done via CVS).  However, bear in mind this is still only a test
migration, and it is still possible we'll have to redo the CVS to git
migration.  There is a long (and ongoing) thread on this mailing list
about all this already, with an evolving wiki page:
http://biopython.org/wiki/GitMigration

Peter


From bartek at rezolwenta.eu.org  Thu Mar 26 01:02:59 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Thu, 26 Mar 2009 02:02:59 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
Message-ID: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>

On Wed, Mar 25, 2009 at 10:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hey all;
> Good discussion on this; I touch on a few points from different
> threads below.
>
Indeed, I'm very happy that we got the ball rolling and more people
now take part in the discussion.

> I fall in between these two viewpoints. Git has more complications and,
> unless we manage those, we risk introducing additional barriers to
> contribution. Imagine looking at biopython on git hub and seeing 10
> different branches for different users, many of which may be old and
> out of date. This could lead to the impression that we are not
> organized toward a single goal. If you are still interested, how
> do you know which ones could use your help and what they are for?
>
> The solution to this is documentation on the wiki. We rely too much on
> the mailing list and expect people to keep up. Peter read my mind on
> this:
>
> Peter:
>> We should probably all write something on the wiki page for our
>> personal forks, describing what you're using it for, what are the main
>> branches likely to be of interest etc.
>
> I started a page over the weekend doing this:
>
> http://biopython.org/wiki/Active_projects
>
> It's a skeleton so add or subtract away. My idea for this is that it
> is for longer projects that could use outside help. It's not reasonable
> to spend time writing up things you'll be finishing in a week or so; for
> that bugzilla does fine keeping interested parties up to date.
>
> Another idea on this page is a specific wish list of libraries for
> future work. This is a starting point for anyone who comes into
> Biopython fresh and would like to take something on. Also, it encourages
> people who have developed external libraries to deal with problems we
> are interested in to consider folding them into Biopython.

Great ideas. I fully agree that we need clear documentation if we want
more people to contribute.

>
> Me:
>> > There is a lot of good material in this thread for new potential
>> > developers. Tiago, it would make sense to condense what you've
>> > written and include it with the Contributing guide:
>
> Tiago:
>> Just a followup on this: I think it makes no sense to put much of the
>> new content before there is an official step of moving to github.
>
> We are serious about moving to Git and need to have the documentation in
> place so others can learn it. You wrote up a lot of good stuff, and it
> will be lost on the mailing list.

Continuing on that topic. I think there are three (more or less
separate) issues here:
1) Describing git usage technically, to make sure all developers have
a smooth transition to git from CVS
2) Describing typical ways to use git in biopython. This is very
important to clarify how we are going to use cool features of
git/github in biopython. I'm not advocating here to write it very
precisely and I'm fully aware that it's going to change over time as
we learn to use things better, but writing things up will help us
understand how we want to use git/github.
3) General contributing guide with coding style and testing framework etc.

I think that point 3 is quite well separated from the other two
points, which are more git related. I think it is also nicely handled
by the current wiki page:
http://biopython.org/wiki/Contributing. It might be mildly adapted to
include some info on git branches, but these will be minor things.

Points 1 and 2 are not so easily separable, but I don't think it's a
major problem. Current version of the
http://biopython.org/wiki/GitMigration
touches upon them, but it is meant as temporary info, so it does
not describe how things should be done after we really make the
switch. I think we need to separate these issues (temporary
arrangements vs. final desired procedures), so I made a new wiki page:
 http://biopython.org/wiki/GitUsage
which is meant as an early draft of such guidelines. This page is
meant to serve as a technical tutorial describing typical tasks in
biopython development.

Please feel free to modify/expand this page and/or send comments to
the mailing list.
I've tried to keep it close to our current development model, but
there is a lot of room for discussion and I'm very open to new ideas.

cheers
  Bartek


From lpritc at scri.ac.uk  Thu Mar 26 11:21:26 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 26 Mar 2009 11:21:26 +0000
Subject: [Biopython-dev] Biopython on Twitter
Message-ID: <C5F115B6.1FA1D%lpritc@scri.ac.uk>

Hi all,

There's a fair old bit of chatter on the latest bandwagon: Twitter, about
Biopython 
(http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython).
Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it
might be useful to have a Biopython Twitter account as a way of getting news
out automatically (there's a python-twitter API:
http://code.google.com/p/python-twitter/), and as a way of facilitating
conversation or community around Biopython - suitable representatives of the
official edifice/holders of the password no doubt to be discussed ;)

Anyhoo, to avoid it being squatted in the interim, I've set up an account in
Biopython's name, with Peter's email account (thanks, Peter) - he also knows
the password.  

If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of
Gopher and OS/2 Warp in short order, it can just die on the vine - but given
the number of tweets mentioning Biopython, it would be a shame for that to
happen too soon ;)

The Biopython Twitter home page is at http://twitter.com/Biopython

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that
addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on
this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________


From tiagoantao at gmail.com  Thu Mar 26 12:13:20 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 26 Mar 2009 12:13:20 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090325215548.GB21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903260513v734b5dd8kd8d148bebec9674b@mail.gmail.com>

Hi,

On Wed, Mar 25, 2009 at 9:55 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> The solution to this is documentation on the wiki. We rely too much on
> the mailing list and expect people to keep up. Peter read my mind on
> this:

I fully agree on this. There is lots of implicit policy that is either
not documented at all or only to be read here on the mailing list. All
should be on the wiki. Clear, transparent, explicit, for everybody to
see (at least that is my personal opinion).


> We are serious about moving to Git and need to have the documentation in
> place so others can learn it. You wrote up a lot of good stuff, and it
> will be lost on the mailing list.

I am planning on changing http://biopython.org/wiki/PopGen_dev and
"GITify" it completely. I will draft a document with a policy for
updates (just as a starting point, please feel free to disagree), the
currently existing branches and so on.

I will include a set of tips on how to pull stuff from git. Regarding
this part, I note:
a. maybe this can be moved, in the future, to the general biopython documentation
b. I am far from being a git specialist. Corrections will surely be
needed and encouraged.

I will write back here when the changes are done.

Tiago


From jblanca at btc.upv.es  Thu Mar 26 12:24:59 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Thu, 26 Mar 2009 13:24:59 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com>
Message-ID: <200903261324.59655.jblanca@btc.upv.es>

First of all, sorry for sending the last mail to the BioPython general list.

On Thursday 26 March 2009 13:05:25 Peter wrote:
> Can you give me an example of where you want to pull out a single
> character from a SeqRecord, and its quality?  I would consider things
> like this quite elegant:
>
> for letter, quality in zip(record.seq,
>         record.letter_annotations["phred_quality"]):
>     # do stuff
I'm implementing a Contig class similar to the Alignment class but with the 
added capability of supporting sequences that do not start and end at the 
same position and with the capability of masking the sequences.
I'm implementing the __getitem__ method.
When a column is requested, I take an int index into every sequence and
return the result of adding them all together. I could solve the problem as
you suggest. The problem is that this Contig class can also work with Seqs
and strs (to simplify its use when we don't need a full SeqRecord). If
SeqRecord behaved more like a Seq or a str I wouldn't need to check for the
special SeqRecord case in the Contig.__getitem__ method.
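
To make the column request concrete, here is a minimal sketch, assuming a
hypothetical Contig class (the names Contig, add and the start offset are
illustrative only, not the class under development). Rows are plain strings
placed at an offset, and a column is the concatenation of the letters from
the rows that cover it:

```python
class Contig:
    """Hypothetical sketch: rows are plain strings placed at a start offset."""

    def __init__(self):
        self._rows = []  # list of (start_offset, sequence) pairs

    def add(self, sequence, start=0):
        """Add a row whose first letter sits at contig position `start`."""
        self._rows.append((start, sequence))

    def __getitem__(self, column):
        # Only integer column indices are handled in this sketch.
        letters = []
        for start, sequence in self._rows:
            local = column - start
            if 0 <= local < len(sequence):
                # An int index into a str gives a single-letter str.
                letters.append(sequence[local])
            # Rows that do not cover this column contribute nothing.
        return "".join(letters)


contig = Contig()
contig.add("ACGTAC", start=0)
contig.add("GTACGG", start=2)
contig.add("TTAC", start=3)
print(contig[1])  # only the first row covers column 1
print(contig[3])  # all three rows cover column 3
```

This works for str rows because indexing gives back something that can be
joined; rows whose int index returns a different type would need their own
branch, which is the special case mentioned above.
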
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From chapmanb at 50mail.com  Thu Mar 26 12:57:07 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 26 Mar 2009 08:57:07 -0400
Subject: [Biopython-dev] biopython on github
In-Reply-To: <8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
	<8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
Message-ID: <20090326125707.GE21577@sobchak.mgh.harvard.edu>

Hi all;

Bartek:
> Continuing on that topic. I think there are three (more or less
> separate) issues here:
> 1) Describing git usage technically, to make sure all developers have
> a smooth transition to git from CVS
> 2) Describing typical ways to use git in biopython. 
[...]
> 3) General contributing guide with coding style and testing framework etc.
> 
> I think that point 3 is quite well separated from the other two
> points, which are more git related. I think it is also nicely handled
> by the current wiki page: http://biopython.org/wiki/Contributing. 
[...]
> Points 1 and 2 are not so easily separable, but I don't think it's a
> major problem. Current version of the
> http://biopython.org/wiki/GitMigration
>  touches upon them, but it is meant as a temporary info, so it does
> not describe how things should be done after we really make the
> switch. I think we need to separate these issues (temporary
> arrangements vs. final desired procedures), so I made a new wiki page:
>  http://biopython.org/wiki/GitUsage
> which is meant as an early draft of such guidelines. This page is
> meant to serve as a technical tutorial describing typical tasks in
> biopython development.

Great writeup, and I agree with you on everything up until the last
point. Why do we need two pages with overlapping information? This
means we have to do more work to keep them in sync and creates confusion.
GitMigration is/was our documentation page. If it is the name that
makes it seem temporary, we should kill GitMigration and re-route all
wiki links to GitUsage. Then we can continue forward with getting
the documentation up to par on GitUsage.

Having the disclaimer that the page and migration is in process is
enough of a warning. When we move to git permanently, we can just
remove the warnings, update the final links and we will be done.

Brad


From tiagoantao at gmail.com  Thu Mar 26 13:09:31 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 26 Mar 2009 13:09:31 +0000
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
	<8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
	<20090326125707.GE21577@sobchak.mgh.harvard.edu>
Message-ID: <6d941f120903260609q247ad2b0o4c810fa7afda7449@mail.gmail.com>

I've added some text regarding git on
http://biopython.org/wiki/PopGen_dev
(see "Code and Contributing" and "Existing Development branches").
Feel free to criticise. I've included a link to the wonderful GitUsage page.
Giovanni: if you feel that I've deleted/changed something I should not
have, please say.


On Thu, Mar 26, 2009 at 12:57 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
>
> Bartek:
>> Continuing on that topic. I think there are three (more or less
>> separate) issues here:
>> 1) Describing git usage technically, to make sure all developers have
>> a smooth transition to git from CVS
>> 2) Describing typical ways to use git in biopython.
> [...]
>> 3) General contributing guide with coding style and testing framework etc.
>>
>> I think that point 3 is quite well separated from the other two
>> points, which are more git related. I think it is also nicely handled
>> by the current wiki page: http://biopython.org/wiki/Contributing.
> [...]
>> Points 1 and 2 are not so easily separable, but I don't think it's a
>> major problem. Current version of the
>> http://biopython.org/wiki/GitMigration
>>  touches upon them, but it is meant as a temporary info, so it does
>> not describe how things should be done after we really make the
>> switch. I think we need to separate these issues (temporary
>> arrangements vs. final desired procedures), so I made a new wiki page:
>>  http://biopython.org/wiki/GitUsage
>> which is meant as an early draft of such guidelines. This page is
>> meant to serve as a technical tutorial describing typical tasks in
>> biopython development.
>
> Great writeup, and I agree with you on everything up until the last
> point. Why do we need two pages with overlapping information? This
> means we have to do more work to keep them in sync and creates confusion.
> GitMigration is/was our documentation page. If it is the name that
> makes it seem temporary, we should kill GitMigration and re-route all
> wiki links to GitUsage. Then we can continue forward with getting
> the documentation up to par on GitUsage.
>
> Having the disclaimer that the page and migration is in process is
> enough of a warning. When we move to git permanently, we can just
> remove the warnings, update the final links and we will be done.
>
> Brad


-- 
 "A man who dares to waste one hour of time has not discovered the
value of life" - Charles Darwin


From bartek at rezolwenta.eu.org  Thu Mar 26 14:49:54 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Thu, 26 Mar 2009 15:49:54 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <20090326125707.GE21577@sobchak.mgh.harvard.edu>
References: <320fb6e00903170146x59218aa0m857cab797ad3f440@mail.gmail.com>
	<C5E52504.1F20A%lpritc@scri.ac.uk>
	<20090317124930.GE57054@sobchak.mgh.harvard.edu>
	<6d941f120903241142m2f39213yfd180fcdc7ab7f0e@mail.gmail.com>
	<320fb6e00903241154g1a0f468cy512b29504b8b637a@mail.gmail.com>
	<6d941f120903250839i62f6d8f9i8a5f5b85ff694848@mail.gmail.com>
	<320fb6e00903250845u23dea2a6o5330bfdec0d577ef@mail.gmail.com>
	<20090325215548.GB21577@sobchak.mgh.harvard.edu>
	<8b34ec180903251802h30661c80q51aab573f5c07c5@mail.gmail.com>
	<20090326125707.GE21577@sobchak.mgh.harvard.edu>
Message-ID: <8b34ec180903260749q2b59594fo1d34cd1f721ff3b7@mail.gmail.com>

Hi,

On Thu, Mar 26, 2009 at 1:57 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Great writeup, and I agree with you on everything up until the last
> point. Why do we need two pages with overlapping information? This
> means we have to do more work to keep them in sync and creates confusion.
> GitMigration is/was our documentation page. If it is the name that
> makes it seem temporary, we should kill GitMigration and re-route all
> wiki links to GitUsage. Then we can continue forward with getting
> the documentation up to par on GitUsage.
>
> Having the disclaimer that the page and migration is in process is
> enough of a warning. When we move to git permanently, we can just
> remove the warnings, update the final links and we will be done.
>

I agree that two pages with mostly the same stuff is too much. My
original idea was to first extract the "non-temporary" info from
the GitMigration page and expand it into the GitUsage page. It needs a
lot of work but at least the extraction part is done. Now I would
suggest not to kill the GitMigration, but to remove most things from
it and just leave the stuff relevant for the (hopefully not too long)
transitional period.

After a second of thought I decided to go ahead and change the
GitMigration so that it does not overlap with GitUsage. See for
yourself here:
http://biopython.org/wiki/GitMigration

We can revert the changes if people don't like it.

cheers
  Bartek


From biopython at maubp.freeserve.co.uk  Thu Mar 26 15:07:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Mar 2009 15:07:33 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <200903261324.59655.jblanca@btc.upv.es>
References: <200903261248.02279.jblanca@btc.upv.es>
	<320fb6e00903260505j387279b7kfa4c69c33efe5487@mail.gmail.com>
	<200903261324.59655.jblanca@btc.upv.es>
Message-ID: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>

On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 13:05:25 Peter wrote:
>> Can you give me an example of where you want to pull out a single
>> character from a SeqRecord, and its quality?  I would consider things
>> like this quite elegant:
>>
>> for letter, quality in zip(record.seq,
>>         record.letter_annotations["phred_quality"]):
>>     # do stuff
>
> I'm implementing a Contig class similar to the Alignment class but with the
> added capability of supporting sequences that do not start and end at the
> same position and with the capability of masking the sequences.
> I'm implementing the __getitem__ method.
> When I request a column I get for all sequences a int slice and I return the
> result of adding them all. I could solve the problem as you suggest. The
> problem is that this Contig class can work also with Seqs and strs (to
> simplify its use when we don't need a full SeqRecord). If SeqRecord behaves
> more like a Seq or a str I wouldn't need to check for the special SeqRecord
> case in the Contig.__getitem__ method.
> Best regards,

If you pull out a column from a Seq or string based alignment, there is no
annotation to worry about, and you can return the column as a Seq or string.
As things stand, if it was a SeqRecord based alignment, having my_string[i],
my_seq[i] and my_seqrecord[i] all return a single letter string is actually
rather nice for generic code - as long as you are happy returning a Seq or a
string for the column.

However, if I understand you, when pulling a column from a SeqRecord
based alignment in addition to the column's sequence you'd like to get the
per-letter-annotations as well.  This assumes that all the SeqRecord objects
in the alignment have the same per-letter-annotation present - some might
have quality and others might not!  But how would you want to store this
new column object?  Using a string or a Seq doesn't support any annotation
 - you *could* use a SeqRecord with no id, name, description, features,
annotation - just a sequence and any common per-letter-annotation.  Is this
what you had in mind?
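
A minimal sketch of that idea, assuming each record is reduced to a plain
(sequence, letter_annotations) pair as a stand-in for a SeqRecord. The
function name column_with_annotations is hypothetical. Only annotation keys
present in every record survive into the column:

```python
def column_with_annotations(records, index):
    """Return (letters, annotations) for one alignment column.

    `records` is a list of (sequence, letter_annotations) pairs, a plain
    stand-in for SeqRecord objects used only for this sketch.
    """
    letters = "".join(seq[index] for seq, _ in records)

    # Keep only the per-letter-annotation keys that every record has.
    common = set(records[0][1])
    for _, annotations in records[1:]:
        common &= set(annotations)

    column_annotations = {
        key: [annotations[key][index] for _, annotations in records]
        for key in sorted(common)
    }
    return letters, column_annotations


records = [
    ("ACGT", {"phred_quality": [10, 20, 30, 40]}),
    ("AGGT", {"phred_quality": [11, 21, 31, 41], "other": [1, 2, 3, 4]}),
]
letters, annotations = column_with_annotations(records, 1)
print(letters, annotations)
```

If even one record lacks quality scores, the intersection is empty and the
column carries no per-letter-annotation at all.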

Peter


From jblanca at btc.upv.es  Thu Mar 26 15:14:13 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Thu, 26 Mar 2009 16:14:13 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261324.59655.jblanca@btc.upv.es>
	<320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>
Message-ID: <200903261614.13454.jblanca@btc.upv.es>

On Thursday 26 March 2009 16:07:33 Peter wrote:
> On Thu, Mar 26, 2009 at 12:24 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> > On Thursday 26 March 2009 13:05:25 Peter wrote:

> However, if I understand you, when pulling a column from a SeqRecord
> based alignment in addition to the column's sequence you'd like to get the
> per-letter-annotations as well.  This assumes that all the SeqRecord
> objects in the alignment have the same per-letter-annotation present - some
> might have quality and others might not!  But how would you want to store
> this new column object?  Using a string or a Seq doesn't support any
> annotation - you *could* use a SeqRecord with no id, name, description,
> features, annotation - just a sequence and any common
> per-letter-annotation.  Is this what you had in mind?
Yes, that's exactly what I have in mind. Do you see any problem with that 
approach?

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From biopython at maubp.freeserve.co.uk  Thu Mar 26 15:32:23 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Mar 2009 15:32:23 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <200903261614.13454.jblanca@btc.upv.es>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261324.59655.jblanca@btc.upv.es>
	<320fb6e00903260807m64d36b55n41cce7510a6809e3@mail.gmail.com>
	<200903261614.13454.jblanca@btc.upv.es>
Message-ID: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>

On Thu, Mar 26, 2009 at 3:14 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 16:07:33 Peter wrote:
>> However, if I understand you, when pulling a column from a SeqRecord
>> based alignment in addition to the column's sequence you'd like to get the
>> per-letter-annotations as well.  This assumes that all the SeqRecord
>> objects in the alignment have the same per-letter-annotation present - some
>> might have quality and others might not!  But how would you want to store
>> this new column object?  Using a string or a Seq doesn't support any
>> annotation - you *could* use a SeqRecord with no id, name, description,
>> features, annotation - just a sequence and any common
>> per-letter-annotation.  Is this what you had in mind?
>
> Yes, that's exactly what I have in mind. Do you see any problem with that
> approach?

Well yes.  For your code to work on SeqRecord objects (based on the
verbal description earlier), it needs at least the following changes
to the SeqRecord:

The SeqRecord __getitem__ would have to return a SeqRecord when given
a single integer index, holding a single letter sequence.  What about
the name/id/description and annotations (e.g. organism) - do they
really apply to a single letter from the sequence?  Technically
writing the code to offer this isn't such a problem, but I am
unconvinced this is the best behaviour for normal usage.

Also closely related to this, what would you expect __iter__ to
iterate over?  Currently it acts like iteration over the record's
sequence.

You'd also want the SeqRecord to support __add__ (and __radd__) so
that two SeqRecord objects can be added together.  I have thought
about this before, and it is a *much* more complicated issue due to
the meta data.  In general the only safe and unambiguous choice is to
exclude it from the combined record:
* sequence - just add (using normal rules for adding Seq objects)
* name/id/description - if the two agree, use that?  Otherwise default
to a blank value?
* annotations - for each keyed value, you could combine the entries?
Or just throw them all away?
* letter_annotations - if an entry is present in both you can combine
it.  Otherwise throw them away?
* features - these could be combined, adjusting the locations for one
record's features as appropriate
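
One conservative reading of these rules can be sketched as follows. This is
only an illustration under stated assumptions: MiniRecord and add_records
are hypothetical stand-ins, not the SeqRecord API, features are reduced to
(start, end, label) tuples, and the general annotations are simply dropped:

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    """Hypothetical stand-in for a SeqRecord, for illustration only."""
    seq: str
    id: str = ""
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)  # (start, end, label) tuples


def add_records(left, right):
    """Combine two records, keeping only what is unambiguous."""
    offset = len(left.seq)
    # Per-letter-annotations are kept only when both records have them.
    shared = set(left.letter_annotations) & set(right.letter_annotations)
    return MiniRecord(
        seq=left.seq + right.seq,
        # Keep the id only if the two records agree, otherwise blank it.
        id=left.id if left.id == right.id else "",
        letter_annotations={
            key: list(left.letter_annotations[key])
            + list(right.letter_annotations[key])
            for key in shared
        },
        # Shift the right-hand record's feature locations by len(left.seq).
        features=list(left.features)
        + [(start + offset, end + offset, label)
           for start, end, label in right.features],
    )


a = MiniRecord("ACGT", "r1", {"phred_quality": [1, 2, 3, 4]}, [(0, 2, "x")])
b = MiniRecord("GG", "r2", {"phred_quality": [5, 6], "extra": [0, 0]},
               [(0, 1, "y")])
c = add_records(a, b)
print(c)
```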

I'm not ruling out adding SeqRecord addition, but I don't want to rush
it while we are trying to get Biopython 1.50 done.

Peter


From biopython at maubp.freeserve.co.uk  Thu Mar 26 15:49:49 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 26 Mar 2009 15:49:49 +0000
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <C5F115B6.1FA1D%lpritc@scri.ac.uk>
References: <C5F115B6.1FA1D%lpritc@scri.ac.uk>
Message-ID: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>

On Thu, Mar 26, 2009 at 11:21 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> Hi all,
>
> There's a fair old bit of chatter on the latest bandwagon: Twitter, about
> Biopython
> (http://search.twitter.com/search?max_id=1393366734&page=1&q=biopython).
> Seeing as both BioPerl and the OBF have 'official' Twitter accounts, it
> might be useful to have a Biopython Twitter account as a way of getting news
> out automatically (there's a python-twitter API:
> http://code.google.com/p/python-twitter/), and as a way of facilitating
> conversation or community around Biopython - suitable representatives of the
> official edifice/holders of the password no doubt to be discussed ;)
>
> Anyhoo, to avoid it being squatted in the interim, I've set up an account in
> Biopython's name, with Peter's email account (thanks, Peter) - he also knows
> the password.
>
> If no-one likes the idea or thinks it worthwhile, or Twitter goes the way of
> Gopher and OS/2 Warp in short order, it can just die on the vine - but given
> the number of tweets mentioning Biopython, it would be a shame for that to
> happen too soon ;)
>
> The Biopython Twitter home page is at http://twitter.com/Biopython

Quite a few people have started following this already - which is fun.  I see
the OBF news page entries are automatically pushed to their twitter account,
http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
to http://twitter.com/bioperl - I'll get in touch to see how they did it so
we can have the Biopython news feed automatically echoed to twitter as well.

This serves as a good point to remind/inform you that there are RSS, Atom
etc feeds for the Biopython news - links on http://biopython.org/wiki/News

e.g.
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2
http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom
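
As a rough sketch of what echoing such a feed might involve, the following
turns RSS 2.0 items into short "title link" messages using only the Python
standard library. The function name feed_to_messages, the sample feed text
and the 140 character limit are assumptions made for illustration; posting
the messages anywhere is deliberately left out:

```python
import xml.etree.ElementTree as ET

# Made-up feed text for illustration; a real feed would be downloaded first.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example news</title>
<item>
  <title>Example release announcement</title>
  <link>http://example.org/news/1</link>
</item>
</channel></rss>"""


def feed_to_messages(rss_text, limit=140):
    """Turn each RSS <item> into a 'title link' string of at most `limit` chars."""
    root = ET.fromstring(rss_text)
    messages = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Reserve space for the link and the separating space.
        room = limit - len(link) - 1
        if len(title) > room:
            title = title[: max(room - 3, 0)] + "..."
        messages.append(f"{title} {link}")
    return messages


for message in feed_to_messages(SAMPLE_RSS):
    print(message)
```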

We could probably also echo the CVS (or git) RSS feed into twitter, but I
suspect that would drown out any more interesting tweets.  The RSS feed
is listed on http://biopython.org/wiki/CVS and shown on the wiki too at:
http://biopython.org/wiki/Tracking_CVS_commits (not sure how often this
gets updated).  The feed itself is here:
http://biopython.open-bio.org/CVS2RSS/biopython.rss

Peter


From lpritc at scri.ac.uk  Thu Mar 26 16:31:07 2009
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 26 Mar 2009 16:31:07 +0000
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
Message-ID: <C5F15E4B.1FA8B%lpritc@scri.ac.uk>

Hi all,

It's great to see that people have picked up on the Biopython Twitter
account already - I hope that it proves useful in the longer term.

Regarding the social etiquette of Twitter, and the ease with which
'following' can be taken to imply 'approval', I wonder if it would be a good
policy to restrict the Twitter accounts that Biopython follows only to those
representing organisations or groups.  Following some individuals and not
others might be seen to privilege a self-selecting group, cabal or 'elite',
even the accidental suggestion of which I think would be best avoided.

On 26/03/2009 15:49, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
> 
> Quite a few people have started following this already - which is fun.  I see
> the OBF news page entries are automatically pushed to their twitter account,
> http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
> to http://twitter.com/bioperl - I'll get in touch to see how they did
> it so we can

[...]

> We could probably also echo the CVS (or git) RSS feed into twitter, but I
> suspect that would drown out any more interesting tweets.

Signal to noise is apparently not an issue that bothers very many Tweeters,
but I see no harm in starting a trend ;)

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


From jblanca at btc.upv.es  Fri Mar 27 08:22:27 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 27 Mar 2009 09:22:27 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
Message-ID: <200903270922.27152.jblanca@btc.upv.es>

On Thursday 26 March 2009 16:32:23 Peter wrote:

> The SeqRecord __getitem__ would have to return a SeqRecord when given
> a single integer index, holding a single letter sequence.  What about
> the name/id/description and annotations (e.g. organism) - do they
> really apply to a single letter from the sequence?  Technically
> writing the code to offer this isn't such a problem, but I am
> unconvinced this is the best behaviour for normal usage.
You're right, I was not thinking about the rest of the properties because I don't 
need them. They're a problem when slicing and adding SeqRecords. But they're 
also a problem in standard slicing. Should the annotations be kept when the 
SeqRecord is sliced? Are they still relevant? None of the behaviours will be 
ok for all the cases.

> Also closely related to this, what would you expect __iter__ to
> iterate over?  Currently it acts like iteration over the record's
> sequence.
The SeqRecord can already hold a sequence of length one, so we have the same 
problem. In fact I could do seq_rec[n:n+1] and I would obtain the SeqRecord 
that I want. 
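
A minimal sketch of the difference between an integer index and a length-one slice. MiniRecord is a hypothetical stand-in, not Biopython's SeqRecord, and it assumes that an integer index returns a bare letter while a slice returns a new record:

```python
class MiniRecord:
    """Hypothetical stand-in for a SeqRecord: a sequence plus an id."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields a new record, even when it covers one letter.
            return MiniRecord(self.seq[index], self.id)
        # An integer index yields the bare letter, as for a string.
        return self.seq[index]


rec = MiniRecord("ACGT", "geneX")
n = 2
letter = rec[n]          # a plain one-letter string
single = rec[n:n + 1]    # a record of length one that keeps the id
```

The length-one slice carries the same id question as any longer slice, which is the point made above.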

> You'd also want the SeqRecord to support __add__ (and __radd__) so
> that two SeqRecord objects can be added together.  I have thought
> about this before, and it is a *much* more complicated issue due to
> the meta data.  In general the only safe and unambiguous choice is to
> exclude it from the combined record:
> * sequence - just add (using normal rules for adding Seq objects)
> * name/id/description - if the two agree, use that?  Otherwise default
> to a blank value?
> * annotations - for each keyed value, you could combine the entries?
> Or just throwing them all away?
> * letter_annotations - if an entry is present in both you can combine
> it.  Otherwise throw them away?
> * features - these could be combined, adjusting the locations for one
> record's features as appropriate
As I said before, I think that the same problem arises when you take a 
slice. If I have the sequence of a gene named X with some annotations and I 
slice a part, is it still named geneX? Should the annotations be kept?
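
The conservative addition rules quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: MiniRecord and add_records are hypothetical names, features are reduced to (start, end, label) tuples, and the general annotations dictionary is simply left out (the "throw them all away" option):

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    """Hypothetical stand-in for a SeqRecord (not the Biopython class)."""
    seq: str
    id: str = ""
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)  # (start, end, label) tuples


def add_records(a, b):
    """Combine two records using the conservative rules listed above."""
    offset = len(a.seq)
    # Keep the id only when both records agree; otherwise leave it blank.
    new_id = a.id if a.id == b.id else ""
    # Keep a per-letter annotation only when both records carry it.
    shared = set(a.letter_annotations) & set(b.letter_annotations)
    letters = {key: list(a.letter_annotations[key]) + list(b.letter_annotations[key])
               for key in shared}
    # Shift the second record's features by the length of the first.
    shifted = [(s + offset, e + offset, label) for s, e, label in b.features]
    return MiniRecord(a.seq + b.seq, new_id, letters, a.features + shifted)


left = MiniRecord("ACGT", "geneX", {"quality": [30, 31, 32, 33]}, [(0, 2, "start")])
right = MiniRecord("TTA", "geneY", {"quality": [20, 21, 22]}, [(1, 3, "stop")])
combined = add_records(left, right)
```

Here the ids disagree, so the combined record gets a blank id, while the shared per-letter annotation and the shifted features survive.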

> I'm not ruling out adding SeqRecord addition, but I don't want to rush
> it while we are trying to get Biopython 1.50 done.
That's quite sensible. I think it is a good thing to discuss all these 
issues; I keep learning a lot from you.
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From biopython at maubp.freeserve.co.uk  Fri Mar 27 10:29:10 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 10:29:10 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <200903270922.27152.jblanca@btc.upv.es>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
Message-ID: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>

On Fri, Mar 27, 2009 at 8:22 AM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 16:32:23 Peter wrote:
>
>> You'd also want the SeqRecord to support __add__ (and __radd__) so
>> that two SeqRecord objects can be added together.  I have thought
>> about this before, and it is a *much* more complicated issue due to
>> the meta data.  In general the only safe and unambiguous choice is to
>> exclude it from the combined record:
>> * sequence - just add (using normal rules for adding Seq objects)
>> * name/id/description - if the two agree, use that?  Otherwise default
>> to a blank value?
>> * annotations - for each keyed value, you could combine the entries?
>> Or just throwing them all away?
>> * letter_annotations - if an entry is present in both you can combine
>> it.  Otherwise throw them away?
>> * features - these could be combined, adjusting the locations for one
>> record's features as appropriate
>
> As I said before, I think that the same problem arises when you take a
> slice. If I have the sequence of a gene named X with some annotations and I
> slice a part, is it still named geneX? Should the annotations be kept?

The problems about the annotation when slicing a SeqRecord are similar, but
I think things are worse when adding two SeqRecords together.

For slicing, there are a few sub-cases:
- per-letter-annotation can be sliced too - easy.
- features - we retain only features fully inside the new sub-sequence (the
  border line features which cross the slice boundary are a small problem -
  excluding them is the simplest solution to code and explain).
- id/name - debatable.  Currently kept.
- description - debatable.  Consider a description which says "whole genome",
  that doesn't really apply to a partial sequence.  On the other hand, it may.
  Currently kept for the sub-record.
- annotations - again debatable.    Without context information, we can't guess.
  The only sensible options are keep it all (as in CVS) or none of it.

I think it is worth keeping the id/name in general (consider typical use cases
like cropping a domain from a gene, or cropping columns off an alignment).
I would be OK with dropping the contents of the annotations dictionary and
description in order to avoid ambiguity, but this would prevent certain tasks.
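
As a rough illustration of the slicing sub-cases listed above, here is a sketch in plain Python. MiniRecord and slice_record are hypothetical names (this is not the code in CVS), features are reduced to (start, end, label) tuples, shifting the retained feature coordinates is an assumption, and the annotations dictionary is dropped (the "none of it" option):

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    """Hypothetical stand-in for a SeqRecord (not the Biopython class)."""
    seq: str
    id: str = ""
    description: str = ""
    annotations: dict = field(default_factory=dict)
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)  # (start, end, label) tuples


def slice_record(rec, start, end):
    """Slice rec[start:end]; assumes 0 <= start <= end <= len(rec.seq)."""
    # Per-letter annotation is sliced along with the sequence.
    letters = {key: values[start:end]
               for key, values in rec.letter_annotations.items()}
    # Keep only features fully inside the slice, shifted to the new origin.
    inside = [(s - start, e - start, label)
              for s, e, label in rec.features
              if start <= s and e <= end]
    # id and description are kept; the annotations dict is dropped.
    return MiniRecord(rec.seq[start:end], rec.id, rec.description,
                      {}, letters, inside)


parent = MiniRecord(
    "ACGTACGTAC", "geneX", "whole gene",
    {"organism": "Yersinia pestis"},
    {"quality": list(range(10))},
    [(1, 3, "a"), (2, 8, "b"), (4, 6, "c")],
)
child = slice_record(parent, 2, 7)
```

Features "a" and "b" cross the slice boundary and are excluded; only "c" survives, and the kept description ("whole gene") shows why keeping it is debatable.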

Peter


From sbassi at clubdelarazon.org  Fri Mar 27 13:31:01 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Fri, 27 Mar 2009 10:31:01 -0300
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
Message-ID: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>

On Fri, Mar 27, 2009 at 7:29 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
....
> - id/name - debatable.  Currently kept.
> - description - debatable.  Consider a description which says "whole genome",
>  that doesn't really apply to a partial sequence.  On the other hand, it may.
>  Currently kept for the sub-record.

I think it is up to the user to keep the id/name/description
fields updated when slicing a sequence.

.....
> I would be OK with dropping the contents of the annotations dictionary and
> description in order to avoid ambiguity, but this would prevent certain tasks.

Another option is to make this behavior optional (I mean, select to
keep or to drop the annotations, but by default I would drop them).


From biopython at maubp.freeserve.co.uk  Fri Mar 27 13:57:30 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 13:57:30 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
Message-ID: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>

On Fri, Mar 27, 2009 at 1:31 PM, Sebastian Bassi wrote:
> I think it is up to the user to keep the id/name/description
> fields updated when slicing a sequence.

If you make a new SeqRecord by first slicing a Seq object (which is
how you have to do it with Biopython 1.49 or older), then dealing
with ALL the annotation is explicitly in the hands of the user.

Or are you saying when slicing a SeqRecord you wouldn't expect
the id/name/description to be preserved for the sub-record?

> .....
>> I would be OK with dropping the contents of the annotations
>> dictionary and description in order to avoid ambiguity, but
>> this would prevent certain tasks.
>
> Another option is to make this behavior optional (I mean, select to
> keep or to drop the annotations, but by default I would drop them).

How would you make it optional?  As an extra non-standard argument
to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
That seems nasty.
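
A quick check suggests the keyword form is not merely nasty: as far as I know, Python's subscript syntax does not accept keyword arguments at all. The helper name is_valid_syntax below is just for illustration:

```python
def is_valid_syntax(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Names are not looked up at compile time, so my_record need not exist.
plain_slice = is_valid_syntax("my_record[10:50]")
keyword_slice = is_valid_syntax("my_record[10:50, annotation=False]")
```

So any option would have to be passed some other way than through the square brackets.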

I am sympathetic to dropping the annotations dictionary when creating
a "child" SeqRecord when slicing its parent.  There is also the database
cross reference list (which I forgot in my last email).  Again, I wouldn't
object to dropping this for a sliced sub-record.

If we did drop the annotations and dbxrefs when slicing, the user can
manually choose to explicitly copy them from the parent object if they
do want them.

Peter


From jblanca at btc.upv.es  Fri Mar 27 14:02:57 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 27 Mar 2009 15:02:57 +0100
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
Message-ID: <200903271502.57872.jblanca@btc.upv.es>

On Friday 27 March 2009 14:57:30 Peter wrote:

> How would you make it optional?  As an extra non-standard argument
> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
> That seems nasty.
That's very nasty, not pythonic, and adds complexity to the API.

> I am sympathetic to dropping the annotations dictionary when creating
> a "child" SeqRecord when slicing its parent.  There is also the database
> cross reference list (which I forgot in my last email).  Again, I wouldn't
> object to dropping this for a sliced sub-record.
>
> If we did drop the annotations and dbxrefs when slicing, the user can
> manually choose to explicitly copy them from the parent object if they
> do want them.
I also think that dropping all that stuff when slicing or adding is the best 
behaviour.

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


From sbassi at clubdelarazon.org  Fri Mar 27 14:17:55 2009
From: sbassi at clubdelarazon.org (Sebastian Bassi)
Date: Fri, 27 Mar 2009 11:17:55 -0300
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
Message-ID: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>

On Fri, Mar 27, 2009 at 10:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> How would you make it optional?  As an extra non-standard argument
> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
> That seems nasty.

Yes it is nasty this way, I never meant to do it in __getitem__.
Anyway I can't think of a nice and intuitive way to do it.

> If we did drop the annotations and dbxrefs when slicing, the user can
> manually choose to explicitly copy them from the parent object if they
> do want them.

Yes, that is OK.


From biopython at maubp.freeserve.co.uk  Fri Mar 27 14:24:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 14:24:13 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
	<9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
Message-ID: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>

On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On Fri, Mar 27, 2009 at 10:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> How would you make it optional?  As an extra non-standard argument
>> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
>> That seems nasty.
>
> Yes it is nasty this way, I never meant to do it in __getitem__.
> Anyway I can't think of a nice and intuitive way to do it.

Me neither right now.

>> If we did drop the annotations and dbxrefs when slicing, the user can
>> manually choose to explicitly copy them from the parent object if they
>> do want them.
>
> Yes, that is OK.

Jose agrees, so that makes a mini consensus (at least amongst everyone who
has tried the CVS code and posted to this thread).  I've made that change
in CVS, see Bio/SeqRecord.py revision 1.31.
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqRecord.py?cvsroot=biopython

As I said before, I want to preserve the id and name - preserving these
would be key for cross referencing the sub-record back to its parent.

Do either of you think we should also discard the description?

Peter


From eric.talevich at gmail.com  Fri Mar 27 15:16:19 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 27 Mar 2009 11:16:19 -0400
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
	<9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
	<320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>
Message-ID: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com>

On Fri, Mar 27, 2009 at 10:24 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Fri, Mar 27, 2009 at 2:17 PM, Sebastian Bassi
> <sbassi at clubdelarazon.org> wrote:
> > On Fri, Mar 27, 2009 at 10:57 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> >> How would you make it optional?  As an extra non-standard argument
> >> to __getitem__?  e.g. something like my_record[10:50, annotation=False]?
> >> That seems nasty.
> >
> > Yes it is nasty this way, I never meant to do it in __getitem__.
> > Anyway I can't think of a nice and intuitive way to do it.
>
> Me neither right now.
>
> >> If we did drop the annotations and dbxrefs when slicing, the user can
> >> manually choose to explicitly copy them from the parent object if they
> >> do want them.
> >
> > Yes, that is OK.
>
>
One way to allow non-default options for adding and slicing is to provide a
couple of functions at the class or module level (classmethod, staticmethod,
plain ol' function) that have the necessary keyword arguments. These
functions would do the same thing by default as the corresponding syntax,
and the syntax-friendly magic methods would just pass their arguments
straight to these functions. This makes the syntax pretty for the common
cases, and makes the nonstandard stuff visually obvious.

Examples:

my_record.slice(10, 50) == my_record[10:50]
my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated
annotations

my_record.add(other_record) == my_record + other_record
my_record.add(other_record, annotation=True) == my_record + other_record,
keeping annotations
my_record.slice(10, 50, annotation=True).add(
    my_record.slice(100, 200, annotation=True),
    annotation=True) == my_record[10:50] + my_record[100:200], keeping all
annotations (a pain otherwise)
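
A minimal sketch of how this delegation might look, assuming a hypothetical MiniRecord class in place of SeqRecord. The method names slice and add and the annotation keyword follow the examples above; everything else (including what happens on an annotation key clash) is an arbitrary choice for illustration:

```python
class MiniRecord:
    """Hypothetical stand-in for a SeqRecord (not the Biopython class)."""

    def __init__(self, seq, id="", annotations=None):
        self.seq = seq
        self.id = id
        self.annotations = dict(annotations or {})

    def slice(self, start, end, annotation=False):
        """Return a sub-record; copy the annotations only on request."""
        kept = self.annotations if annotation else {}
        return MiniRecord(self.seq[start:end], self.id, kept)

    def add(self, other, annotation=False):
        """Join two records; merge the annotations only on request."""
        merged = {}
        if annotation:
            # Arbitrary choice: on a key clash the right-hand record wins.
            merged = {**self.annotations, **other.annotations}
        new_id = self.id if self.id == other.id else ""
        return MiniRecord(self.seq + other.seq, new_id, merged)

    def __getitem__(self, index):
        # The magic method just forwards to the explicit method.
        if isinstance(index, slice):
            if index.step is not None:
                raise ValueError("this sketch ignores slice steps")
            return self.slice(index.start, index.stop)
        return self.seq[index]

    def __add__(self, other):
        return self.add(other)


record = MiniRecord("ACGTACGT", "geneX", {"organism": "Yersinia pestis"})
plain = record[2:6]                               # default: annotations dropped
annotated = record.slice(2, 6, annotation=True)   # explicit opt-in
joined = record[0:2] + record[6:8]
```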


From biopython at maubp.freeserve.co.uk  Fri Mar 27 15:51:53 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Mar 2009 15:51:53 +0000
Subject: [Biopython-dev] [BioPython] about the SeqRecord slicing
In-Reply-To: <3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com>
References: <200903261248.02279.jblanca@btc.upv.es>
	<200903261614.13454.jblanca@btc.upv.es>
	<320fb6e00903260832m65c6888dpc856d033ceceda5@mail.gmail.com>
	<200903270922.27152.jblanca@btc.upv.es>
	<320fb6e00903270329r74a48dcerf8e00a0ba3776af4@mail.gmail.com>
	<9e2f512b0903270631l2b806f55oc02b1e1396bd0bfb@mail.gmail.com>
	<320fb6e00903270657j1aa06199o4996f11c25bf2a3b@mail.gmail.com>
	<9e2f512b0903270717s13c82d19v7c48dddda4a8fcb@mail.gmail.com>
	<320fb6e00903270724r432b4daco920648d921890623@mail.gmail.com>
	<3f6baf360903270816x4fcfd8ccg5906a9edb53709d4@mail.gmail.com>
Message-ID: <320fb6e00903270851i47db9121p6d272b5f7095a5d3@mail.gmail.com>

On Fri, Mar 27, 2009 at 3:16 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> One way to allow non-default options for adding and slicing is to provide a
> couple of functions at the class or module level (classmethod, staticmethod,
> plain ol' function) that have the necessary keyword arguments. These
> functions would do the same thing by default as the corresponding syntax,
> and the syntax-friendly magic methods would just pass their arguments
> straight to these functions. This makes the syntax pretty for the common
> cases, and makes the nonstandard stuff visually obvious.
>
> Examples:
>
> my_record.slice(10, 50) == my_record[10:50]
> my_record.slice(10, 50, annotation=True) == my_record[10:50] plus updated
> annotations
> ...

I think I understand your idea, but I'm not very keen on adding slice
and add methods as alternatives to __getitem__ and __add__.

As things stand (with CVS after the change an hour ago), if you want
the annotations dictionary copied with a slice you must do this
explicitly:

>>> from Bio import SeqIO
>>> my_record = SeqIO.read(open("NC_005816.gb"),"genbank")
>>> my_record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1,
complete sequence.', dbxrefs=['Project:10638'])
>>> len(my_record)
9609
>>> len(my_record.features)
29
>>> len(my_record.annotations)
11
>>> len(my_record.dbxrefs)
1

Doing a slice will not copy/preserve the annotations dict or dbxrefs list:

>>> sub_record = my_record[1000:2000]
>>> sub_record
SeqRecord(seq=Seq('GAAAAAAGAGTATGACGTGCATCTTGATGAAAATCTGGTGAACTTCGACAAACA...GGA',
IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816',
description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1,
complete sequence.', dbxrefs=[])
>>> len(sub_record)
1000
>>> len(sub_record.features)
2
>>> assert not sub_record.annotations and not sub_record.dbxrefs

You can then choose to blindly reuse the annotations and dbxrefs if you want to:

>>> sub_record.annotations = my_record.annotations #shares the dict
>>> sub_record.dbxrefs = my_record.dbxrefs #shares the list

or as a simple copy:

>>> sub_record.annotations = my_record.annotations.copy()
>>> sub_record.dbxrefs = my_record.dbxrefs[:]

The good thing about this is it makes you think about the annotations,
and which (if any) are appropriate to transfer to the sub-record.  As
per my earlier email, maybe we should do the same with the
description?
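
The difference between sharing the dict and taking a simple copy can be seen with plain dictionaries (the keys and values below are made up for illustration):

```python
parent_annotations = {"organism": "Yersinia pestis", "references": ["ref1"]}

shared = parent_annotations          # the very same dict object
copied = parent_annotations.copy()   # shallow copy: new dict, same values

# A change made through the shared name is visible on the parent.
shared["organism"] = "edited"

# A key added to the copy does not appear on the parent.
copied["source"] = "sub-record"

# A shallow copy still shares mutable values such as lists.
copied["references"].append("ref2")
```

If fully independent annotations are wanted, copy.deepcopy from the standard library would also duplicate the nested lists.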

Peter


From chapmanb at 50mail.com  Sun Mar 29 01:06:52 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sat, 28 Mar 2009 21:06:52 -0400
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <C5F15E4B.1FA8B%lpritc@scri.ac.uk>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk>
Message-ID: <20090329010652.GA914@kunkel>

Hi all;
It is great we are exploring getting news out about Biopython in
additional ways. One thing this can really help with is recognizing
contributions to Biopython. Another is pointing out interesting
discussion threads on the mailing lists and getting others involved.

Do you think it would be worthwhile to "advertise" on the main list
for someone interested in coordinating news and communication? They
could do things like:

- Send updates through twitter on day to day activities, like:

  Bartek and Tiago cleaned up documentation on Git submissions 
    (link to wiki page)
  Peter, Jose and Sebastian are discussing slicing on SeqRecords
    (link to mailing list discussion)

- Send out monthly news reports on new items in Biopython, in the
  style of Peter's update recently:
  http://news.open-bio.org/news/2009/03/biopython-next-gen-sequencing/
  (but it should also give credit to the fine people who coded it)

Perhaps there are members who are interested in Biopython and follow
what is going on but aren't coders. This would be a way to get
involved, and also take some of the burden off Peter. What do 
y'all think?

Brad
 

> 
> It's great to see that people have picked up on the Biopython Twitter
> account already - I hope that it proves useful in the longer term.
> 
> Regarding the social etiquette of Twitter, and the ease with which
> 'following' can be taken to imply 'approval' I wonder if it would be a good
> policy to restrict the Twitter accounts that Biopython follows only to those
> representing organisations or groups.  Following some individuals and not
> others might be seen to privilege a self-selecting group, cabal or 'elite',
> even the accidental suggestion of which I think would be best avoided.
> 
> On 26/03/2009 15:49, "Peter" <biopython at maubp.freeserve.co.uk> wrote:
> > 
> > Quite a few people have started following this already - which is fun.  I see
> > the OBF news page entries are automatically pushed to their twitter account,
> > http://twitter.com/obf_news plus the BioPerl tagged entries are also pushed
> > to http://twitter.com/bioperl - I'll get in touch to see how they did
> > it so we can
> 
> [...]
> 
> > We could probably also echo the CVS (or git) RSS feed into twitter, but I
> > suspect that would drown out any more interesting tweets.
> 
> Signal to noise is apparently not an issue that bothers very many Tweeters,
> but I see no harm in starting a trend ;)
> 
> L.
> 
> -- 
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
> 
> 


From biopython at maubp.freeserve.co.uk  Sun Mar 29 22:58:47 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Mar 2009 23:58:47 +0100
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <20090329010652.GA914@kunkel>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk> <20090329010652.GA914@kunkel>
Message-ID: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>

On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> It is great we are exploring getting news out about Biopython in
> additional ways. One thing this can really help with is recognizing
> contributions to Biopython. Another is pointing out interesting
> discussion threads on the mailing lists and getting others involved.

Do you think the recent release notes and NEWS file entries have been
a bit too impersonal?  We can certainly be a bit more explicit if people
think that is a good thing. For example, should we mention Bartek by name
in the paragraph on the new Bio.Motif module?

This is linked to from the wiki's news page BTW:
http://biopython.open-bio.org/SRC/biopython/NEWS
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython

> Do you think it would be worthwhile to "advertise" on the main list
> for someone interested in coordinating news and communication?
> ... Perhaps there are members who are interested in Biopython and
> follow what is going on but aren't coders. This would be a way to
> get involved, ...

Are you up for the job yourself Brad?  From your own blog we know
you can and do write regularly anyway ;)   Would you like an account
on the OBF news server? Email me off list and we can sort that out.

In terms of micro-blogging via twitter, you sound like you have a better
feel for this than me - I don't even have a personal twitter account.

Monthly news posts (perhaps cc'd to the announcement email list)
would be a nice idea - especially if we can encourage more lurkers
to speak up.  For a while BioPerl had something like this going
(digest emails or something), but it needs a pretty dedicated person
or team.  In the meantime as you've noticed I've started making
more use of our news facility myself...

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 10:26:09 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:26:09 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
Message-ID: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>

Hi all,

NumPy 1.3 is about to be released, so we should try and make sure the
forthcoming Biopython 1.50 release works with it.  Of particular interest,
this will be the first version of NumPy to support Python 2.6 on Windows,
so we will hopefully be able to include a Python 2.6 Windows installer
for Biopython 1.50 :)

There is a release candidate out for NumPy 1.3, but so far no Windows
installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3
beta release instead.  The good news is everything seems to compile with
MinGW, but unfortunately test_Cluster.py is failing on the second line of
Bio/Cluster/__init__.py, "from cluster import *".  This could be a hiccup
with NumPy itself - I am using their beta after all, or perhaps they have
changed something.

To try and narrow down the problem, has anyone else tried NumPy 1.3 (beta or
release candidate) with the latest Biopython from CVS (on any platform)?

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 10:29:02 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:29:02 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
Message-ID: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:26 AM, Peter wrote:
> Hi all,
>
> NumPy 1.3 is about to be released, so we should try and make sure the
> forthcoming Biopython 1.50 release works with it.  Of particular interest,
> this will be the first version of NumPy to support Python 2.6 on Windows,
> so we will hopefully be able to include a Python 2.6 Windows installer
> for Biopython 1.50 :)
>
> There is a release candidate out for NumPy 1.3, but so far no Windows
> installer for Python 2.6, but in the meantime I've just tried the NumPy 1.3
> beta release instead.

David Cournapeau has just updated sourceforge - so I will try again with
the actual release candidate instead of just the beta...

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 10:38:58 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 11:38:58 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
Message-ID: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>

On Mon, Mar 30, 2009 at 11:29 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> David Cournapeau has just updated sourceforge - so I will try again with
> the actual release candidate instead of just the beta...

Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows
XP, Python 2.6 using the python.org installer, with Biopython compiled
with cygwin mingw32 as normal, same error - test_Cluster.py is failing
on the second line of Bio/Cluster/__init__.py, "from cluster import
*".

So the question stands - has anyone else tried Biopython (from CVS)
with NumPy 1.3 (beta or release candidate) on any platform?  I should
be able to check it tonight on a Linux machine myself without too much
trouble... but a few more data points wouldn't hurt ;)
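
For anyone wanting to report a data point, a small helper along these lines might make it easier to capture the full traceback of a failing import. It uses only the standard library; try_import is a hypothetical name, and the module names in the loop are simply the ones under discussion:

```python
import importlib
import traceback


def try_import(module_name):
    """Try to import a module; return (ok, traceback_text)."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Keep the whole traceback so it can be pasted into a report.
        return False, traceback.format_exc()
    return True, ""


for name in ("numpy", "Bio.Cluster"):
    ok, details = try_import(name)
    print(name, "imports fine" if ok else "FAILED")
    if not ok:
        print(details)
```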

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 11:15:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 12:15:06 +0100
Subject: [Biopython-dev] test_Nexus.py and NamedTemporaryFile mode
Message-ID: <320fb6e00903300415i350610c0i4c2aeed1834011da@mail.gmail.com>

I've been running the test suite again on Windows, and was reminded of
this open issue with NamedTemporaryFile on Windows...

On Fri, Feb 13, 2009 at 5:02 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Feb 10, 2009 at 11:25 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>>> The test_Nexus tearDown used to make sure the temp output
>>> files were removed.  This is important on Windows which
>>> does not do this automatically.  I see you now allocate
>>> "random" filenames using tempfile.NamedTemporaryFile(...)
>>> so presumably we would need to record these so that the
>>> tearDown method knows what temp files to remove.
>>
>> From reading the Python documentation, the file created by
>> tempfile.NamedTemporaryFile is removed automatically
>> when the file handle is closed, even on Windows.
>
> That's good to know.  On a related point, I've just found
> test_Nexus.py is failing on Windows XP with Python 2.6 (but is fine
> with Python 2.3, 2.4 and 2.5):
>
> C:\repository\biopython\Tests>c:\python26\python test_Nexus.py
> Test Nexus module ... ERROR
> Test Tree module. ... ok
>
> ======================================================================
> ERROR: Test Nexus module
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_Nexus.py", line 114, in test_NexusTest1
>     f1=tempfile.NamedTemporaryFile(mode='r+w+b')
>   File "c:\python26\lib\tempfile.py", line 445, in NamedTemporaryFile
>     file = _os.fdopen(fd, mode, bufsize)
> OSError: [Errno 22] Invalid argument
>
> ----------------------------------------------------------------------
> Ran 2 tests in 0.016s
>
> FAILED (errors=1)

You can recreate this at the python 2.6 prompt with the one line:
f1=tempfile.NamedTemporaryFile(mode='r+w+b')

I couldn't solve this from looking at the Python documentation, but
after some Google searching the answer seems to be just to use the
default mode (w+b):
f1=tempfile.NamedTemporaryFile()

This works on Windows with Python 2.3 to 2.6, and also on Mac OS X
and Linux (only one version of Python tested here).  Fix checked
into CVS.
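
For reference, a minimal sketch of the default-mode approach (not the
actual test_Nexus.py code). The helper name roundtrip is hypothetical; it
only shows that a NamedTemporaryFile opened with the default mode can be
written, rewound and read back, and that closing the handle removes the
file:

```python
import os
import tempfile


def roundtrip(data):
    """Write bytes to a temp file opened in the default mode and read them back.

    Hypothetical helper for illustration only.
    """
    # No mode argument: the default is "w+b" (binary read/write).
    f1 = tempfile.NamedTemporaryFile()
    try:
        name = f1.name
        f1.write(data)
        f1.seek(0)  # rewind before reading back
        content = f1.read()
    finally:
        f1.close()  # with the default delete=True this removes the file
    return content, os.path.exists(name)


content, still_there = roundtrip(b"#NEXUS\n")
```

Since the file disappears on close, a tearDown method would not need to
track the generated filenames.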

Peter


From cy at cymon.org  Mon Mar 30 11:42:00 2009
From: cy at cymon.org (Cymon Cox)
Date: Mon, 30 Mar 2009 12:42:00 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
Message-ID: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>

Hi Folks,

I've been trying to formalize a bunch of randomly scattered bits of code to
support the use of the alignment programme Muscle
(http://www.drive5.com/muscle/). I prefer to use this software in preference
to Clustalw - subjectively, it seems to give the most accurate alignments.
(Whether Biopython would want to support a second alignment programme/external
dependency is another matter...)

Anyway, while doing so, I realised just how awkward the current interface to
Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.

Currently, if we have a bunch of SeqRecords, say after downloading from
GenBank or being pulled from a BioSQL db, we have to write them to disk
and call clustalw on the file:

>>> from Bio import Clustalw
>>> from Bio.Clustalw import MultipleAlignCL
>>> cline = MultipleAlignCL("f002", command="clustalw")
>>> align = Clustalw.do_alignment(cline)

It seems to me more appropriate to be able to call clustalw directly on a
bunch of SeqRecords:

eg (suggested implementation)
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import MultipleAlignment
>>> align = MultipleAlignment(records, executable="clustalw")

Secondly, the biopython interface does not support calling Clustalw to
perform profile alignments,

(suggested implementation)
# The scaffold alignment:
>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
# The sequences we want to add to it:
>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>> from Bio.Align import ProfileAlignment
>>> align = ProfileAlignment(align, records, executable="clustalw")

Calls to MultipleAlignment and ProfileAlignment would take a **options
parameter to collect any additional command line options.

Thirdly, should an alignment object have an
Alignment.refine_alignment(executable="clustalw")
method?

Any thoughts?

Cheers, C.
-- 
____________________________________________________________________

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77


From chapmanb at 50mail.com  Mon Mar 30 13:00:27 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 09:00:27 -0400
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
	<320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
Message-ID: <20090330130027.GB36526@sobchak.mgh.harvard.edu>

Hi Peter;
Things work on FreeBSD 7.1 with python2.5 and the numpy release
candidate:

> python2.5
Python 2.5.4 (r254:67916, Feb 18 2009, 08:20:57) [GCC 4.2.1 20070719  [FreeBSD]] on freebsd7
>>> import numpy
>>> numpy.__version__
'1.3.0rc1'

> python2.5 test_Cluster.py
test_clusterdistance (__main__.TestCluster) ... ok
test_distancematrix_kmedoids (__main__.TestCluster) ... ok
test_kcluster (__main__.TestCluster) ... ok
test_matrix_parse (__main__.TestCluster) ... ok
test_median_mean (__main__.TestCluster) ... ok
test_somcluster (__main__.TestCluster) ... ok
test_treecluster (__main__.TestCluster) ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.009s

OK

The whole test suite passes as well. Maybe this is a windows issue?
Brad


> On Mon, Mar 30, 2009 at 11:29 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > David Cournapeau has just updated sourceforge - so I will try again with
> > the actual release candidate instead of just the beta...
> 
> Nope - using numpy-1.3.0rc1-win32-superpack-python2.6.exe on Windows
> XP, Python 2.6 using the python.org installer, with Biopython compiled
> with cygwin mingw32 as normal, same error - test_Cluster.py is failing
> on the second line of Bio/Cluster/__init__.py, "from cluster import
> *".
> 
> So the question stands - has anyone else tried Biopython (from CVS)
> with NumPy 1.3 (beta or release candidate) on any platform?  I should
> be able to check it tonight on a Linux machine myself without too much
> trouble... but a few more data points wouldn't hurt ;)
> 
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From biopython at maubp.freeserve.co.uk  Mon Mar 30 13:23:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 14:23:31 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <20090330130027.GB36526@sobchak.mgh.harvard.edu>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
	<320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
	<20090330130027.GB36526@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com>

On Mon, Mar 30, 2009 at 2:00 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
> Things work on FreeBSD 7.1 with python2.5 and the numpy release
> candidate:
> ...
> The whole test suite passes as well. Maybe this is a windows issue?
> Brad

Thanks Brad - nice to know we have Biopython being tested on a
fourth major OS (FreeBSD, in addition to Linux, Mac OS X and
Windows XP).

I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and
test_Cluster and the rest of the Biopython tests passed.  This looks like
a Windows and/or Python 2.6 problem - I should be able to try a Linux
machine with Python 2.6 tonight...

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 14:37:18 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 15:37:18 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
Message-ID: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>

On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox <cy at cymon.org> wrote:
>
> Hi Folks,
>
> I've been trying to formalize a bunch of randomly scattered bits of code to
> support the use of the alignment programme Muscle
> (http://www.drive5.com/muscle/). I prefer to use this software in preference
> to Clustalw - subjectively, it seems to give the most accurate alignments.
> (Whether Biopython would want to support a second alignment programme
> /external dependency is another matter...)

A wrapper for MUSCLE wouldn't hurt - although there is scope for some
rearrangement of our command line tool wrappers rather than adding more
and more top level modules.  Maybe under Bio.Align, and move the Clustalw
wrapper there too.

> Anyway, while doing so, I realised just how awkward the current interface to
> Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.

What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm:
(1) use SeqIO to prepare the FASTA input file.
(2) run the command line tool (e.g. MUSCLE).
(3) use AlignIO (or SeqIO) to read the alignment output file.

Actually I think that Bio.Clustalw interface is now a bit out of place,
as it hides some of this from you.  (Note that Bio.Clustalw predates
Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly
tool neutral).

> Currently, if we have a bunch of SeqRecords, say after downloading from
> GenBank or being pulled from a BioSQL db, we have to write them to disk
> and call clustalw on the file:
>
>>>> from Bio import Clustalw
>>>> from Bio.Clustalw import MultipleAlignCL
>>>> cline = MultipleAlignCL("f002", command="clustalw")
>>>> align = Clustalw.do_alignment(cline)

Well yes. Typically for any alignment tool you'd have to write the
unaligned records in FASTA format.  Some tools may let you handle
this via standard input, so you may be able to use a pipe instead
of a file - but the issues are similar.

> It seems to me more appropriate to be able to call clustalw directly on a
> bunch of SeqRecords:
>
> eg (suggested implementation)
>>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>>> from Bio.Align import MultipleAlignment
>>>> align = MultipleAlignment(records, executable="clustalw")

i.e. Have a Biopython wrapper use a temp file to record the
given records in a format appropriate for the command line
tool selected, and capture the output?  In the case of
ClustalW or MUSCLE this means making a temp FASTA input
file.  For ClustalW we'd then have to open the output file, read
it, and then delete it.  For other tools we may be able to just
capture its output on stdout and not have to clean up a temp
output file.
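
To make the temp file idea concrete, here is a rough sketch using only the
standard library. Everything in it is an assumption for illustration:
run_on_temp_fasta is a hypothetical name, records are plain (identifier,
sequence) pairs rather than SeqRecord objects, and the "tool" is a stand-in
Python one-liner that echoes its input file to stdout rather than a real
aligner. It covers the simpler case where the tool writes to stdout, so
there is no output file to clean up:

```python
import os
import subprocess
import sys
import tempfile


def run_on_temp_fasta(records, command):
    """Write (identifier, sequence) pairs to a temp FASTA file, run
    command with that filename appended, and return the captured stdout.

    Hypothetical helper; not part of Biopython.
    """
    fd, path = tempfile.mkstemp(suffix=".fasta")
    try:
        with os.fdopen(fd, "w") as handle:
            for identifier, sequence in records:
                handle.write(">%s\n%s\n" % (identifier, sequence))
        completed = subprocess.run(
            command + [path], capture_output=True, text=True, check=True
        )
        return completed.stdout
    finally:
        os.remove(path)  # always clean up the temp input file


# Stand-in "tool": a Python one-liner that echoes the file it is given.
echo_tool = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(open(sys.argv[1]).read())",
]
output = run_on_temp_fasta([("a", "ACGT"), ("b", "AC-T")], echo_tool)
```

A tool like ClustalW that insists on writing an output file would need an
extra step to open, read and delete that file as well.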

All the possible command line tools have their own arguments,
range of file formats, behaviour with respect to default filenames
etc.  Trying to capture all this in a single wrapper seems rather
ambitious.  For example, how would you handle gap penalties?
Keep in mind that different tools may use the same name for
a gap extension penalty but interpret the values differently.

Also, while I can see this might be nice for short alignments
(which are quick to run), it's rather implicit or magic.  I personally
prefer to have to deal with the files explicitly myself - but then I
have been dealing with large alignments which I want to keep
on disk.

> Secondly, the biopython interface does not support calling
> Clustalw to perform profile alignments,
>
> (suggested implementation)
> # The scaffold alignment:
>>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
> # The sequences we want to add to it:
>>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
>>>> from Bio.Align import ProfileAlignment
>>>> align = ProfileAlignment(align, records, executable="clustalw")
>
> Calls to MultipleAlignment and ProfileAlignment would take a
> **options parameter to collect any additional command line options.
>
> Thirdly, should an alignment object have an
> Alignment.refine_alignment(executable="clustalw")
> method?
>
> Any thoughts?

I may have misunderstood you, but the ideas you've sketched out
seem very very broad/ambitious - and actually take us further away
from the SeqIO/AlignIO interface by hiding all the filenames and
handles from the user.  I think these should be kept explicit.

Peter


From eric.talevich at gmail.com  Mon Mar 30 18:34:09 2009
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 30 Mar 2009 14:34:09 -0400
Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser project
Message-ID: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>

Hi folks,

I noticed earlier this month that several Biopython developers had signed up
as potential mentors in OBF's Summer of Code application. Although OBF
apparently wasn't selected as a mentoring organization this year, some other
bioinformatics-related groups were -- in particular, the National
Evolutionary Synthesis Center's page mentions involvement with the Bio*
projects:

http://socghop.appspot.com/org/show/google/gsoc2009/nescent

The project I'd like to work on is a phyloXML parser for Biopython.
NESCent's idea list includes a similar entry for BioRuby (links below). I
asked the mentor, Christian Zmasek, if it would be acceptable to do the
project with Biopython instead of BioRuby, and he said it would, but he'd
prefer to have a Biopython specialist on board as another mentor.

Would any of you be interested in being a mentor for this project? I imagine
it would have some things in common with the existing Nexus parser, as a
starting point.

http://www.phyloxml.org/
https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby

Thanks,
Eric


From chapmanb at 50mail.com  Mon Mar 30 21:00:07 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:00:07 -0400
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
Message-ID: <20090330210007.GC72956@sobchak.mgh.harvard.edu>

Cymon;
I wrote a bunch of the Clustalw stuff a long while ago, and it
sounds like Peter has a good handle on integrating it with AlignIO
so I will leave that to him.

On the choosing aligners side of things, have you tried MAFFT?

http://align.bmr.kyushu-u.ac.jp/mafft/software/

It's updated regularly and seems to have good buzz in the community.
I haven't had to do lots of multiple alignments recently, but it's
worked well for the few I've done.

Having support for multiple aligners is good stuff; I second Peter's
suggestion of having these live under Bio.Align.

Brad

> On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox <cy at cymon.org> wrote:
> >
> > Hi Folks,
> >
> > I've been trying to formalize a bunch of randomly scattered bits of code to
> > support the use of the alignment programme Muscle
> > (http://www.drive5.com/muscle/). I prefer to use this software in preference
> > to Clustalw - subjectively, it seems to give the most accurate alignments.
> > (Whether Biopython would want to support a second alignment programme
> > /external dependency is another matter...)
> 
> A wrapper for MUSCLE wouldn't hurt - although there is scope for some
> rearrangement of our command line tool wrappers rather than adding more
> and more top level modules.  Maybe under Bio.Align, and move the Clustalw
> wrapper there too.
> 
> > Anyway, while doing so, I realised just how awkward the current interface to
> > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.
> 
> What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm:
> (1) use SeqIO to prepare the FASTA input file.
> (2) run the command line tool (e.g. MUSCLE).
> (3) use AlignIO (or SeqIO) to read the alignment output file.
> 
> Actually I think that Bio.Clustalw interface is now a bit out of place,
> as it hides some of this from you.  (Note that Bio.Clustalw predates
> Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly
> tool neutral).
> 
> > Currently, if we have a bunch of SeqRecords, say after downloading from
> > GenBank or being pulled from a BioSQL db, we have to write them to disk
> > and call clustalw on the file:
> >
> >>>> from Bio import Clustalw
> >>>> from Bio.Clustalw import MultipleAlignCL
> >>>> cline = MultipleAlignCL("f002", command="clustalw")
> >>>> align = Clustalw.do_alignment(cline)
> 
> Well yes. Typically for any alignment tool you'd have to write the
> unaligned records in FASTA format.  Some tools may let you handle
> this via standard input, so you may be able to use a pipe instead
> of a file - but the issues are similar.
> 
> > It seems to me more appropriate to be able to call clustalw directly on a
> > bunch of SeqRecords:
> >
> > eg (suggested implementation)
> >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
> >>>> from Bio.Align import MultipleAlignment
> >>>> align = MultipleAlignment(records, executable="clustalw")
> 
> i.e. Have a Biopython wrapper use a temp file to record the
> given records in a format appropriate for the command line
> tool selected, and capture the output?  In the case of
> ClustalW or MUSCLE this means making a temp FASTA input
> file.  For ClustalW we'd then have to open the output file, read
> it, and then delete it.  For other tools we may be able to just
> capture its output on stdout and not have to clean up a temp
> output file.
> 
> All the possible command line tools have their own arguments,
> range of file formats, behaviour with respect to default filenames
> etc.  Trying to capture all this in a single wrapper seems rather
> ambitious.  For example, how would you handle gap penalties?
> Keep in mind that different tools may use the same name for
> a gap extension penalty but interpret the values differently.
> 
> Also, while I can see this might be nice for short alignments
> (which are quick to run), it's rather implicit or magic.  I personally
> prefer to have to deal with the files explicitly myself - but then I
> have been dealing with large alignments which I want to keep
> on disk.
> 
> > Secondly, the biopython interface does not support calling
> > Clustalw to perform profile alignments,
> >
> > (suggested implementation)
> > # The scaffold alignment:
> >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
> > # The sequences we want to add to it:
> >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
> >>>> from Bio.Align import ProfileAlignment
> >>>> align = ProfileAlignment(align, records, executable="clustalw")
> >
> > Calls to MultipleAlignment and ProfileAlignment would take a
> > **options parameter to collect any additional command line options.
> >
> > Thirdly, should an alignment object have an
> > Alignment.refine_alignment(executable="clustalw")
> > method?
> >
> > Any thoughts?
> 
> I may have misunderstood you, but the ideas you've sketched out
> seem very very broad/ambitious - and actually take us further away
> from the SeqIO/AlignIO interface by hiding all the filenames and
> handles from the user.  I think these should be kept explicit.
> 
> Peter


From chapmanb at 50mail.com  Mon Mar 30 21:14:48 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:14:48 -0400
Subject: [Biopython-dev] Google Summer of Code -- phyloXML parser	project
In-Reply-To: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>
References: <3f6baf360903301134p421a41f2if2b8980e9e166451@mail.gmail.com>
Message-ID: <20090330211448.GF72956@sobchak.mgh.harvard.edu>

Hi Eric;
I would be happy to help with mentoring. I have been helping another
student with his application and could definitely give you feedback
on yours. Based on good ones coming through the list, it should be
detailed with a week by week description of what you plan to be working
on and specific deliverables. They also have a short description of the
motivation and your qualifications.

This is my first time doing this, so I don't know much about the
selection process. If more than one Biopython project was selected,
I couldn't realistically mentor both; I am not even sure if that
is a possibility. Either way, Google recommends having two mentors
per student so it would be good to have someone else step up as
well.

Let me know if you have any specific questions while you are getting
things together this week,
Brad

> Hi folks,
> 
> I noticed earlier this month that several Biopython developers had signed up
> as potential mentors in OBF's Summer of Code application. Although OBF
> apparently wasn't selected as a mentoring organization this year, some other
> bioinformatics-related groups were -- in particular, the National
> Evolutionary Synthesis Center's page mentions involvement with the Bio*
> projects:
> 
> http://socghop.appspot.com/org/show/google/gsoc2009/nescent
> 
> The project I'd like to work on is a phyloXML parser for Biopython.
> NESCent's idea list includes a similar entry for BioRuby (links below). I
> asked the mentor, Christian Zmasek, if it would be acceptable to do the
> project with Biopython instead of BioRuby, and he said it would, but he'd
> prefer to have a Biopython specialist on board as another mentor.
> 
> Would any of you be interested in being a mentor for this project? I imagine
> it would have some things in common with the existing Nexus parser, as a
> starting point.
> 
> http://www.phyloxml.org/
> https://www.nescent.org/wg/phyloinformatics/index.php?title=Phyloinformatics_Summer_of_Code_2009#phyloXML_support_in_BioRuby
> 
> Thanks,
> Eric


From chapmanb at 50mail.com  Mon Mar 30 21:33:17 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 30 Mar 2009 17:33:17 -0400
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk> <20090329010652.GA914@kunkel>
	<320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>
Message-ID: <20090330213317.GG72956@sobchak.mgh.harvard.edu>

Hi Peter;
Thanks for the feedback. I was definitely not being critical of your
postings, or fishing for extra jobs for myself. On the contrary, I
was inspired by the news items and brainstorming some ways to get
additional people involved.

People who express an interest in Biopython and don't get involved
often list the following reasons:

- Not feeling like they are technically able to contribute. Perhaps
  they are just learning Python, or don't feel comfortable with the
  Biopython library itself.

- Traditional academia doesn't offer recognition for contributing to
  open source projects. While we can't change academia, we can try
  and come up with ways to improve the visibility of contributors and
  make sure they are recognized in the larger bioinformatics
  community.

My thought was that a "news coordinator" would give one or more
interested people a chance to help the community, learn more about
Biopython by being involved, and also increase name recognition for
everyone coding, bug fixing and discussing.

In terms of how it is done, those were only my random suggestions.
Certainly if someone took it up they could be as creative as they
want about how to go about it.

Brad


> On Sun, Mar 29, 2009 at 2:06 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
> > Hi all;
> > It is great we are exploring getting news out about Biopython in
> > additional ways. One thing this can really help with is recognizing
> > contributions to Biopython. Another is pointing out interesting
> > discussion threads on the mailing lists and getting others involved.
> 
> Do you think the recent release notes and NEWS file entries have been
> a bit too impersonal?  We can certainly be a bit more explicit if people
> think that is a good thing. For example, should we mention Bartek by name
> in the paragraph on the new Bio.Motif module?
> 
> This is linked to from the wiki's news page BTW:
> http://biopython.open-bio.org/SRC/biopython/NEWS
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/NEWS?cvsroot=biopython
> 
> > Do you think it would be worthwhile to "advertise" on the main list
> > for someone interested in coordinating news and communication?
> > ... Perhaps there are members who are interested in Biopython and
> > follow what is going on but aren't coders. This would be a way to
> > get involved, ...
> 
> Are you up for the job yourself Brad?  From your own blog we know
> you can and do write regularly anyway ;)   Would you like an account
> on the OBF news server? Email me off list and we can sort that out.
> 
> In terms of micro-blogging via twitter, you sound like you have a better
> feel for this than me - I don't even have a personal twitter account.
> 
> Monthly news posts (perhaps cc'd to the announcement email list)
> would be a nice idea - especially if we can encourage more lurkers
> to speak up.  For a while BioPerl had something like this going
> (digest emails or something), but it needs a pretty dedicated person
> or team.  In the meantime as you've noticed I've started making
> more use of our news facility myself...
> 
> Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 21:58:52 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 22:58:52 +0100
Subject: [Biopython-dev] Biopython on Twitter
In-Reply-To: <20090330213317.GG72956@sobchak.mgh.harvard.edu>
References: <320fb6e00903260849n683d3e39kf68fd91727970dc7@mail.gmail.com>
	<C5F15E4B.1FA8B%lpritc@scri.ac.uk> <20090329010652.GA914@kunkel>
	<320fb6e00903291558o6299575dq80eea647b1c6a900@mail.gmail.com>
	<20090330213317.GG72956@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00903301458s7216ec97gc4ac71a03d0fd350@mail.gmail.com>

On Mon, Mar 30, 2009 at 10:33 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi Peter;
> Thanks for the feedback. I was definitely not being critical of your
> postings, ...

I hadn't had that impression, but that's still nice to hear ;)

> ... or fishing for extra jobs for myself.

Darn - I thought you'd be an excellent choice.

> On the contrary, I was inspired by the news items and
> brainstorming some ways to get additional people involved.

Well unless anyone already lurking on the dev mailing list steps
forward (*hint hint*), do you (Brad) want to try asking on the main
discussion list to see if there are any takers?

> People who express an interest in Biopython and don't get
> involved often list the following reasons:
>
> - Not feeling like they are technically able to contribute. Perhaps
>   they are just learning Python, or don't feel comfortable with the
>   Biopython library itself.

I find once they get over any shyness, even just having beginners
asking questions can be valuable in itself.  It shows us potential
blind spots, or areas of the documentation which need clarification
(or writing) - plus of course it can bring about discussions etc.

> - Traditional academia doesn't offer recognition for contributing to
>   open source projects. While we can't change academia, we can try
>   and come up with ways to improve the visibility of contributors and
>   make sure they are recognized in the larger bioinformatics
>   community.
>
> My thought was that a "news coordinator" would give one or more
> interested people a chance to help the community, learn more about
> Biopython by being involved, and also increase name recognition for
> everyone coding, bug fixing and discussing.

Some of us are very aware of this issue (academic recognition for
contributions to projects like Biopython), and different employers
will take different attitudes here.  In some cases making our
contributors more visible won't always be a good idea...  In my
case work on Biopython was a definite plus point in landing my
current job, but there are of course still limits to how much work
time I can reasonably spend on this (and limits to how much time
I spend out of work - like right now on this email).

> In terms of how it is done, those were only my random suggestions.
> Certainly if someone took it up they could be as creative as they
> want about how to go about it.
>
> Brad

It's certainly worth a go :)

Peter


From biopython at maubp.freeserve.co.uk  Mon Mar 30 22:35:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 30 Mar 2009 23:35:05 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com>
References: <320fb6e00903300326x4cb5eb95r87dbd5c95d5379d9@mail.gmail.com>
	<320fb6e00903300329ra19fe06j1cd12477e591afdf@mail.gmail.com>
	<320fb6e00903300338v35b14fa2yc0d2ba68925808da@mail.gmail.com>
	<20090330130027.GB36526@sobchak.mgh.harvard.edu>
	<320fb6e00903300623j1f17fe6fia6ded742a7c610ec@mail.gmail.com>
Message-ID: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com>

On Mon, Mar 30, 2009 at 2:23 PM, Peter wrote:
> I've just used NumPy 1.3.0rc1 with Python 2.4.3 on a Linux box, and
> test_Cluster and the rest of the Biopython tests passed.  This looks like
> a Windows and/or Python 2.6 problem - I should be able to try a Linux
> machine with Python 2.6 tonight...

I've just tried it on Ubuntu Jaunty (Alpha 6), with Python 2.6.1+
(already installed), the wise and clustalw packages installed, Numpy
1.3.0rc1 installed from source, and Biopython CVS installed from
source.  Again, test_Cluster.py and the rest of our tests pass
(ignoring those with additional external dependencies like BioSQL,
fdist, simcoal2).

So, whatever is going wrong on test_Cluster.py seems to be specific to
Windows (XP) and Python 2.6 - and possibly just my Windows development
machine.

Peter


From mjldehoon at yahoo.com  Tue Mar 31 00:08:34 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 30 Mar 2009 17:08:34 -0700 (PDT)
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com>
Message-ID: <730606.962.qm@web62408.mail.re1.yahoo.com>


> So, whatever is going wrong on test_Cluster.py seems to be
> specific to
> Windows (XP) and Python 2.6 - and possibly just my Windows
> development
> machine.
> 
I believe that the problem is that msvcr90.dll is missing. This is the C runtime from Microsoft. Earlier Pythons used msvcr71.dll, if I'm not mistaken.
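
One way to see the underlying error message is to import the failing
module on its own and report the exception text. This is only a sketch:
try_import is a hypothetical helper, and the module name to pass in on the
affected machine (for example "Bio.Cluster.cluster") is an assumption based
on the failing import line reported earlier in the thread:

```python
import importlib


def try_import(module_name):
    """Return None if the module imports cleanly, else the error text.

    Hypothetical diagnostic helper for illustration only.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as err:
        # For a compiled extension this text usually names the load failure.
        return str(err)
    return None


ok_result = try_import("json")
bad_result = try_import("no_such_module_for_this_sketch")
```

A message about a failed DLL load, rather than a missing module, would be
consistent with a C runtime problem.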

--Michiel


From biopython at maubp.freeserve.co.uk  Tue Mar 31 09:12:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 10:12:21 +0100
Subject: [Biopython-dev] Testing Biopython with NumPy 1.3
In-Reply-To: <730606.962.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00903301535j21ae6659r931c9be0fd17faf3@mail.gmail.com>
	<730606.962.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00903310212o29bba163ma9d68a901eabc2c9@mail.gmail.com>

On Tue, Mar 31, 2009 at 1:08 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> So, whatever is going wrong on test_Cluster.py seems to be
>> specific to Windows (XP) and Python 2.6 - and possibly just
>> my Windows development machine.
>>
> I believe that the problem is that msvcr90.dll is missing. This
> is the C runtime from Microsoft. Earlier Pythons used
> msvcr71.dll, if I'm not mistaken.

You may be right - there is some stuff on the numpy mailing list
about this and manifest files etc when using mingw32.  It may
be simplest to try the appropriate MS compiler instead...

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 31 10:28:35 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 11:28:35 +0100
Subject: [Biopython-dev] Python's new DVCS chosen
Message-ID: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>

Hi all,

This might be of interest (although I'm sure some of you already
know).  Earlier this month on the python-dev mailing list, Guido
van Rossum wrote:

> Dear Python developers,
>
> The decision is made! I've selected a DVCS to use for Python.
> We're switching to Mercurial (Hg).
>
> The implementation and schedule is still up in the air -- I am
> hoping that we can switch before the summer.
> ...

http://mail.python.org/pipermail/python-dev/2009-March/087931.html
See also PEP-374, http://www.python.org/dev/peps/pep-0374/

Interestingly, Mercurial (Hg) didn't get much of a mention in our
discussions here.

Peter


From bartek at rezolwenta.eu.org  Tue Mar 31 11:05:07 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 31 Mar 2009 13:05:07 +0200
Subject: [Biopython-dev] Python's new DVCS chosen
In-Reply-To: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
Message-ID: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>

Hi,
On Tue, Mar 31, 2009 at 12:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> This might be of interest (although I'm sure some of you already
> know).  Earlier this month on the python-dev mailing list, Guido
> van Rossum wrote:
>> We're switching to Mercurial (Hg).
> Interestingly, Mercurial (Hg) didn't get much of a mention in our
> discussions here.

Their evaluation of the different options (in PEP 374) was mentioned on
the list by Bruce, so everyone was able to form their own opinion.

As Guido explains in another paragraph:
>It's hard to explain my reasons for choosing -- like most language
>decisions (especially the difficult ones) it's mostly a matter of gut
>feelings. One thing I know is that it's better to decide now than to
>spend another year discussing the pros and cons. All that could be
>said has been said, pretty much, and my mind is made up.

He seems to find all the candidates good enough. It's then a matter of
consensus among the developers. Git happened to have many opponents
on the python-dev list, but it happened to have more supporters on
biopython-dev.

I think we have made a consensus decision to try out git/github and I
think it's extremely counter-productive to re-open the discussion on
our choice now. I'm not a git fanboy, but because there are _no_
universal criteria to choose between git vs. bzr vs. Hg we should not
spend more time on this issue.

cheers
  Bartek


From cy at cymon.org  Tue Mar 31 11:25:27 2009
From: cy at cymon.org (Cymon Cox)
Date: Tue, 31 Mar 2009 12:25:27 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> 
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
Message-ID: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>

Hi Peter,

2009/3/30 Peter <biopython at maubp.freeserve.co.uk>

> On Mon, Mar 30, 2009 at 12:42 PM, Cymon Cox <cy at cymon.org> wrote:
> >
> > Hi Folks,
> >
> > I've been trying to formalize a bunch of randomly scattered bits of code
> to
> > support the use of the alignment programme Muscle
> > (http://www.drive5.com/muscle/). I prefer to use this software in
> preference
> > to Clustalw - subjectively, it seems to give the most accurate
> alignments.
> > (Whether Biopython would want to support a second alignment programme
> > /external dependency is another matter...)
>
> A wrapper for MUSCLE wouldn't hurt - although there is scope for some
> rearrangement of our command line tool wrappers rather than adding more
> and more top level modules.  Maybe under Bio.Align, and move the Clustalw
> wrapper there too.


Agreed - it would seem more appropriate to have the alignment interfaces in
Bio.Align.


> > Anyway, while doing so, I realised just how awkward the current interface
> to
> > Clustalw is, which doesn't fit the SeqIO/AlignIO paradigm well.
>
> What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm:
> (1) use SeqIO to prepare the FASTA input file.
> (2) run the command line tool (e.g. MUSCLE).
> (3) use AlignIO (or SeqIO) to read the alignment output file.


Well, yes - we can always not use the biopython interface.


> Actually I think that Bio.Clustalw interface is now a bit out of place,
> as it hides some of this from you. (Note that Bio.Clustalw predates
> Bio.AlignIO, and that by working with handles Bio.AlignIO is fairly
> tool neutral).
>
> > Currently, if we have a bunch of SeqRecords, say after downloading from
> > GenBank or being pulled from a BioSQL db, we have to write them to disk
> > and call clustalw on the file:
> >
> >>>> from Bio import Clustalw
> >>>> from Bio.Clustalw import MultipleAlignCL
> >>>> cline = MultipleAlignCL("f002", command="clustalw")
> >>>> align = Clustalw.do_alignment(cline)
>
> Well yes. Typically for any alignment tool you'd have to write the
> unaligned records in FASTA format.  Some tools may let you handle
> this via standard input, so you may be able to use a pipe instead
> of a file - but the issues are similar.
>
> > It seems to me more appropriate to be able to call clustalw directly on a
> > bunch of SeqRecords:
> >
> > eg (suggested implementation)
> >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
> >>>> from Bio.Align import MultipleAlignment
> >>>> align = MultipleAlignment(records, executable="clustalw")
>
> i.e. Have a Biopython wrapper use a temp file to record the
> given records to in a format appropriate for the command line
> tool selected, and capturing the output?  In the case of
> ClustalW or MUSCLE this means making a temp FASTA input
> file.  For ClustalW we'd then have to open the output file, read
> it, and then delete it.


Yes, that's what I'm suggesting.

Here's my reasoning: it seems to me the input and output formats of the data
required by a particular alignment tool are incidental and should be hidden
from the user. At present the Clustalw interface forces you to write a fasta
formatted file of your records to disk, and then has Clustalw write an
aligned matrix to disk in a format specified by the user. If the latter is
Clustal format, then the record is parsed and an alignment object is
returned, else None is returned. In either case, an output file(s) remains
on disk.

So, say we have a bunch of sequences in pir format and we'd like them
aligned and saved in stockholm format:

from Bio import SeqIO
from Bio import AlignIO
from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL
records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")
AlignIO.write([records], open("temp.fasta", "w"), "fasta")
cline = MultipleAlignCL("temp.fasta", command="clustalw")
align = Clustalw.do_alignment(cline)
AlignIO.write([align], open("temp.sth", "w"), "stockholm")

we end up with 4 output files on disk: temp.aln,  temp.dnd,  temp.fasta,
temp.sth - 3 of which are incidental.

(BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir"
in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the
subprocess to return, which it never does: pid, sts = os.waitpid(self.pid,
0))

As I say, I'd like to see this:
>>> from Bio.Align import MultipleAlignment
>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir"))
>>> align = MultipleAlignment(records, executable="clustalw")
>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm")

ie resulting in one file temp.sth, which we've explicitly written to disk.


>  For other tools we may be able to just
> capture its output on stdout and not have to clean up a temp
> output file.
>
> All the possible command line tools have their own arguments,
> range of file formats, behaviour with respect to default filenames
> etc.  Trying to capture all this in a single wrapper seems rather
> ambitious.  For example, how would you handle gap penalties?
> Keep in mind that different tools may use the same name for
> a gap extension penalty but interpret the values differently.


Sorry, I wasn't very clear about what I intended:

MultipleAlignment(records, executable="clustalw", <keyword args>)
returns Clustalw.do_alignment(records, <keyword args>)
and
MultipleAlignment(records, executable="muscle", <keyword args>)
returns Muscle.do_alignments(records, <keyword args>)

I'm not suggesting unifying all programme options into a single interface,
just wrap the individual alignment tool modules in a common call,
MultipleAlignment(), align_records(), or whatever...
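
Roughly what I have in mind, as a sketch only: the two per-tool functions below are placeholders (neither exists with this signature), and the common call simply looks up the right one and passes the keyword arguments through untouched.

```python
# Hypothetical stand-ins for the per-tool modules; each would build its own
# command line, run the tool and return an alignment object.
def clustalw_do_alignment(records, **options):
    return ("clustalw", list(records), options)


def muscle_do_alignment(records, **options):
    return ("muscle", list(records), options)


# Registry mapping the executable name to the tool-specific function.
ALIGNERS = {
    "clustalw": clustalw_do_alignment,
    "muscle": muscle_do_alignment,
}


def multiple_alignment(records, executable="clustalw", **options):
    """Dispatch to the wrapper for the named tool, passing options through."""
    try:
        aligner = ALIGNERS[executable]
    except KeyError:
        raise ValueError("Unsupported alignment tool: %r" % executable)
    return aligner(records, **options)
```

The options stay tool-specific; nothing here tries to translate, say, a gap penalty from one programme to another.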

As for the keyword options, at present the Clustalw interface supports the
manual setting of some attributes to the MultipleAlignCL instance, but there
is no type or value checking. I think as many options as possible should be
supported through keyword arguments - tedious, but doable.

> Also, while I can see this might be nice for short alignments
> (which are quick to run), its rather implicit or magic.


Not sure what you mean here? Why would the size of alignment matter? And as
for it being magic, it seems to me it does, and only does, what it says on
the label - aligns the data.


>  I personally
> prefer to have to deal with the files explicitly myself - but then I
> have been dealing with large alignments which I want to keep
> on disk.


I tend to build many (small - <100 taxa) single gene alignments - in one
use-case, 280 of them...

> Secondly, the biopython interface does not support calling
> > Clustalw to perform profile alignments,
> >
> > (suggested implementation)
> > # The scaffold alignment:
> >>>> align = AlignIO.read(open("blah.nex", "r"), "nexus")
> > # The sequences we want to add to it:
> >>>> records = list(SeqIO.parse(open("f002", "r"), "fasta"))
> >>>> from Bio.Align import ProfileAlignment
> >>>> align = ProfileAlignment(align, records, executable="clustalw")
> >
> > Calls to MultipleAlignment and ProfileAlignment would take a
> > **options parameter to collect any additional command line options.
>

I'm very keen to see profile alignments supported - be it either in Clustalw
or Muscle, or both.

>
> > Thirdly, should an alignment object have a
> > Alignment.refine_alignment(executable="clustalw")
> > method?
> >
> > Any thoughts?
>
> I may have misunderstood you, but the ideas you've sketched out
> seem very very broad/ambitious - and actually take us further away
> from the SeqIO/AlignIO interface by hiding all the filenames and
> handles from the user.  I think these should be kept explicit.


OK, well having had my say, I'm quite happy to write the Muscle module in
the style of the current Clustalw interface, or whatever style is most
appropriate for exposing the filename handles. But I'm not sure what that
would be - perhaps you could elaborate on this a bit...

Cheers, C.
-- 
____________________________________________________________________

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77


From biopython at maubp.freeserve.co.uk  Tue Mar 31 11:27:07 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 12:27:07 +0100
Subject: [Biopython-dev] Python's new DVCS chosen
In-Reply-To: <8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>
References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
	<8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>
Message-ID: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com>

On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> I think we have made a consensus decision to try out git/github and I
> think it's extremely counter-productive to re-open the discussion on
> our choice now. I'm not a git fanboy, but because there are _no_
> universal criteria to choose between git vs. bzr vs. Hg we should not
> spend more time on this issue.

I hadn't intended to reopen the debate - it was just a post for interest's sake.

As you can probably tell from looking at the biopython network graph
on github (which I got to work on Linux but only with Adobe's flash
plugin - gnash etc didn't seem to cope), I've been getting to grips
with git (and github).

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 31 12:56:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 13:56:21 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
	<7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>
Message-ID: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>

On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox <cy at cymon.org> wrote:
>> What I typically do fits pretty nicely with the SeqIO/AlignIO paradigm:
>> (1) use SeqIO to prepare the FASTA input file.
>> (2) run the command line tool (e.g. MUSCLE).
>> (3) use AlignIO (or SeqIO) to read the alignment output file.
>
> Well, yes - we can always not use the biopython interface.

Ideally step (2) in the above would be handled via a Biopython
command line wrapper, offering keyword arguments etc.

>> i.e. Have a Biopython wrapper use a temp file to record the
>> given records to in a format appropriate for the command line
>> tool selected, and capturing the output?  In the case of
>> ClustalW or MUSCLE this means making a temp FASTA input
>> file.  For ClustalW we'd then have to open the output file, read
>> it, and then delete it.
>
> Yes, that's what I'm suggesting.
>
> Here's my reasoning: it seems to me the input and output formats of the data
> required by a particular alignment tool are incidental and should be hidden
> from the user.

OK - I see this as doing some implicit behind the scenes magic.
Arguably this kind of thing is still nice to have if it makes things
simpler for the user.

I may over use this mantra, but "Explicit is better than implicit",
from the Zen of Python.  http://www.python.org/dev/peps/pep-0020/

> At present the Clustalw interface forces you to write a fasta
> formatted file of your records to disk, and then has Clustalw
> write an aligned matrix to disk in a format specified by the user.

The Clustalw tool only takes FASTA formatted input, so if you have
a bunch of sequences in memory you are forced to convert them
into FASTA format to use them as input.  The question is where
does this conversion take place - explicitly by the user, or implicitly
by a wrapper.

> If the latter is Clustal format, then the record is parsed and an alignment
> object is returned, else None is returned. In either case, an output file(s)
> remains on disk.

It should be a fairly simple enhancement to look at the arguments
to see if another output format we can parse was selected (e.g.
PHYLIP?) and also parse that.  Do you think that would be a
sensible addition to Bio.Clustalw.do_alignment?  It's never been
an issue for me, as if you are using the Bio.Clustalw.do_alignment
interface you probably don't care about the output file format.

> So, say we have a bunch of sequences in pir format and we'd like them
> aligned and saved in stockholm format:
>
> from Bio import SeqIO
> from Bio import AlignIO
> from Bio import Clustalw
> from Bio.Clustalw import MultipleAlignCL
> records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")
> AlignIO.write([records], open("temp.fasta", "w"), "fasta")

The above line is wrong - it should be:
SeqIO.write(records, open("temp.fasta", "w"), "fasta")
At this point your PIR sequences are not yet aligned, so they'll (probably)
have different lengths, so shouldn't be treated as an alignment.  If it
doesn't raise an error maybe it should...

Also you don't explicitly close the handle this way.

> cline = MultipleAlignCL("temp.fasta", command="clustalw")
> align = Clustalw.do_alignment(cline)
> AlignIO.write([align], open("temp.sth", "w"), "stockholm")

> we end up with 4 output files on disk: temp.aln, temp.dnd, temp.fasta,
> temp.sth - 3 of which are incidental.

Yes - but as ClustalW doesn't read in PIR files, and doesn't output
Stockholm files on its own, this has to happen.  It's just a question
of who does it (the user, or the wrapper code).

> (BTW, using the above procedure on the files "B_nuc.pir" and "Cw_prot.pir"
> in Tests/NBRF hangs on RH and Ubuntu linux: it seems to be waiting for the
> subprocess to return, which it never does: pid, sts = os.waitpid(self.pid,
> 0))

I would guess this is because you never properly closed the
temp.fasta file, so it may not have been flushed to disk when the
Clustalw tool was called.
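
To illustrate the point about closing the handle, here is a stdlib-only sketch. Note that write_fasta is a hypothetical stand-in for SeqIO.write, and the records are plain (name, sequence) pairs rather than SeqRecord objects.

```python
import os
import tempfile


def write_fasta(records, handle):
    """Write (name, sequence) pairs to an open handle in FASTA format."""
    count = 0
    for name, sequence in records:
        handle.write(">%s\n%s\n" % (name, sequence))
        count += 1
    return count


records = [("alpha", "ACGTACGT"), ("beta", "ACGTTCGT")]
fasta_path = os.path.join(tempfile.mkdtemp(), "temp.fasta")

handle = open(fasta_path, "w")
try:
    write_fasta(records, handle)
finally:
    # Closing flushes any buffered data, so an external tool started after
    # this point sees the complete file.
    handle.close()

# Only now is it safe to hand fasta_path to the command line tool.
with open(fasta_path) as check:
    fasta_text = check.read()
```

With open(...) passed inline to the write call there is no name left to close, so when the data reaches the disk is up to the interpreter.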

> As I say, I'd like to see this:
>>>> from Bio.Align import MultipleAlignment
>>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir"))
>>>> align = MultipleAlignment(records, executable="clustalw")
>>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm")
>
> ie resulting in one file temp.sth, which we've explicitly written to disk.

So you'd like the wrapper to take care of creating and deleting the
temp input FASTA file, and also deleting the temp output ClustalW
file after parsing it.  This can probably be done quite cleanly using
python's NamedTemporaryFile object.
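
A minimal sketch of that idea, using only the standard library. The function name run_with_temp_files and the make_command callback are made up for illustration, and the "tool" here is a Python one-liner that upper-cases its input rather than a real aligner.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_files(input_text, make_command):
    """Run a tool on temporary input/output files and return the output text.

    make_command is a callable taking (input_path, output_path) and returning
    the argument list to execute. Both temp files are removed afterwards.
    """
    # delete=False so each file can be closed (and flushed) before the tool
    # opens it by name; we remove the files ourselves in the finally block.
    in_file = tempfile.NamedTemporaryFile(mode="w", suffix=".fasta", delete=False)
    out_file = tempfile.NamedTemporaryFile(suffix=".aln", delete=False)
    try:
        in_file.write(input_text)
        in_file.close()
        out_file.close()
        subprocess.check_call(make_command(in_file.name, out_file.name))
        with open(out_file.name) as handle:
            return handle.read()
    finally:
        for temp in (in_file, out_file):
            temp.close()  # harmless if already closed
            if os.path.exists(temp.name):
                os.remove(temp.name)


seen_paths = []  # recorded only so the cleanup can be checked afterwards


def fake_tool(input_path, output_path):
    """Stand-in for a real aligner: copies the input to the output, upper-cased."""
    seen_paths.extend([input_path, output_path])
    script = ("import sys; data = open(sys.argv[1]).read(); "
              "open(sys.argv[2], 'w').write(data.upper())")
    return [sys.executable, "-c", script, input_path, output_path]


result = run_with_temp_files(">seq1\nacgt\n", fake_tool)
```

A real wrapper would also have to deal with any extra files the tool writes by itself (ClustalW's .dnd guide tree, for example), which this sketch ignores.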

>> For other tools we may be able to just capture its output on
>> stdout and not have to clean up a temp output file.
>>
>> All the possible command line tools have their own arguments,
>> range of file formats, behaviour with respect to default filenames
>> etc.  Trying to capture all this in a single wrapper seems rather
>> ambitious.  For example, how would you handle gap penalties?
>> Keep in mind that different tools may use the same name for
>> a gap extension penalty but interpret the values differently.
>
> Sorry, I wasn't very clear about what I intended:
>
> MultipleAlignment(records, executable="clustalw", <keyword args>)
> returns Clustalw.do_alignment(records, <keyword args>)
> and
> MultipleAlignment(records, executable="muscle", <keyword args>)
> returns Muscle.do_alignments(records, <keyword args>)
>
> I'm not suggesting unifying all programme options into a single interface,
> just wrap the individual alignment tool modules in a common call,
> MultipleAlignment(), align_records(), or whatever...

I see.

> As for the keyword options, at present the Clustalw interface supports the
> manual setting of some attributes to the MultipleAlignCL instance, but there
> is no type or value checking. I think as many options as possible should be
> supported through keyword arguments - tedious, but doable.
>
>> Also, while I can see this might be nice for short alignments
>> (which are quick to run), its rather implicit or magic.
>
> Not sure what you mean here? Why would the size of alignment matter?

Size of alignment influences the compute time, and therefore is an issue for
anyone doing things at the python prompt.  Moreover, if the alignments are
big and slow, you generally want to make sure the output file is kept on disk,
as you'll probably want to read it more than once.

> And as for it being magic, its seems to me it does, and only does, what
> it says on the label - aligns the data.

The magic is the behind the scenes creation/deletion of the input/output
files, and the conversion between file formats.

>> I personally prefer to have to deal with the files explicitly myself
>> - but then I have been dealing with large alignments which I want
>> to keep on disk.
>
> I tend to build many (small - <100 taxa) single gene alignments - in one
> use-case, 280 of them...

In your case I would assume the alignment takes minutes to run.  You tend
to care more about preserving the output files if they take hours to create ;)

>> > Secondly, the biopython interface does not support calling
>> > Clustalw to perform profile alignments,

That is something we should probably add.

> OK, well having had my say, I'm quite happy to write the Muscle module in
> the style of the current Clustalw interface, or whatever style is most
> appropriate for exposing the filename handles. But I'm not sure what that
> would be - perhaps you could elaborate on this a bit...

I've elaborated, perhaps too much? ;)

Basically you seem to be thinking about a high level abstraction for
multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO
module), while I am more focused on the low level abstraction for
wrapping any command line tool.  This isn't to say we can't have both,
but to me it makes sense to start with the low level stuff first.

We (unfortunately) have several styles of command line tool wrappers
in Biopython already - this is a wart that has been on my mental to do
list for some time.  I think we should focus on dealing with command
line strings, and keep this separate from how the tools are invoked
(e.g. subprocess or os.system), preparation of input files, and how
any output is parsed.  As long as this core is in place, more advanced
wrappers are possible on top of this basic infrastructure (Tiago may
have some comments here from his Bio.PopGen work).

Essentially all our command line wrappers start by building a command
line string.  In some cases this command line string is exposed to the
user (e.g. Bio.EMBOSS), and they can choose how they want to invoke
it.  For example, they can explicitly opt to use the Python subprocess
module and pipes if they want to - or use a standard invocation from
Bio.Applications (we may want to add a couple of variations to this
module).
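
To make the separation concrete, here is a toy command line object. The class is hypothetical (it is not the Bio.Application API), and the -infile/-outfile options are just ClustalW-style examples. All it does is build a string; how that string gets run is left entirely to the caller.

```python
class CommandLine(object):
    """Hypothetical command line builder; not the Bio.Application API."""

    def __init__(self, program, **options):
        self.program = program
        self.options = dict(options)

    def set_option(self, name, value):
        """Add or replace a single -name=value option."""
        self.options[name] = value

    def __str__(self):
        # Options are sorted so the string is reproducible. Values are not
        # quoted here, so paths containing spaces would need extra care.
        parts = [self.program]
        for name in sorted(self.options):
            parts.append("-%s=%s" % (name, self.options[name]))
        return " ".join(parts)


cline = CommandLine("clustalw", infile="temp.fasta")
cline.set_option("outfile", "temp.aln")
command_string = str(cline)
```

The resulting string could be handed to subprocess, to os.system, or written into a job script for a cluster, and it can always be printed when debugging.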

Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool
for you. In the case of Bio.Blast.NCBIStandalone, if you don't want
the handles because you've told Blast to save its output to a file,
our wrapper still returns the standard output and standard error
handles - it is forced on you (see Bug 2654).   Also, there is no easy
way to see what the actual command line string was, which can make
debugging hard, and also prevents certain things (e.g. submitting the
command line as a task to a cluster of workstations).  At least
Bio.Clustalw offers a command line string object (MultipleAlignCL),
it's just the do_alignment helper function I'm not so keen on.

The Bio.Clustalw.do_alignment wrapper is rather unusual in that it
automatically parses the output - while most of our wrappers don't.
Decoupling the parsing is more modular - it makes it easy for the user
to use any parser for the output from a command line tool (either
using stdout, or by reading an output file).  I like this, and it fits
with the handle based approach in most of our parsers.
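
As a small illustration of why handles make the parsing tool neutral, here is a stdlib-only sketch. The parse_fasta function is a simplified stand-in, not the Bio.SeqIO or Bio.AlignIO parser, and the "captured stdout" is simulated with StringIO.

```python
import io


def parse_fasta(handle):
    """Yield (name, sequence) pairs from any file-like handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# The same parser works on captured stdout (simulated here with StringIO)
# or on a handle obtained by opening an output file.
captured_stdout = io.StringIO(u">a\nAC-GT\n>b\nACGGT\n")
parsed = list(parse_fasta(captured_stdout))
```

The parser never needs to know which tool produced the text or where it came from.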

So, I would suggest we think about adding new wrappers under Bio.Align
(e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
perhaps all together in Bio.Align.Applications or something) based on
the Bio.Application module as used in Bio.EMBOSS.  We could then
deprecate Bio.Clustalw, which should also help tidy up the top level
name space.  Initially at least, I wouldn't include any clever wrapper
code at all.

Once we have the basic command line objects done, these could be used
to later add another layer on top implementing Cymon's ideas for
multiple alignment wrappers taking care of intermediate files and
inter-converting file formats on the fly, although I remain to be
convinced about the value of this.  If you can pull it off (cross
platform, on several versions of python) then a user friendly high
level interface would be impressive.

Peter


From bartek at rezolwenta.eu.org  Tue Mar 31 13:14:39 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 31 Mar 2009 15:14:39 +0200
Subject: [Biopython-dev] Python's new DVCS chosen
In-Reply-To: <320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com>
References: <320fb6e00903310328x3c2d8bc0n8138f551da7ea4a2@mail.gmail.com>
	<8b34ec180903310405x5d5353f0q2de270a3c16bdc95@mail.gmail.com>
	<320fb6e00903310427s46e45337g42ced1a8e9c3a37f@mail.gmail.com>
Message-ID: <8b34ec180903310614k1fe4a08bkac19c2cc96b36fad@mail.gmail.com>

On Tue, Mar 31, 2009 at 1:27 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Mar 31, 2009 at 12:05 PM, Bartek Wilczynski
> <bartek at rezolwenta.eu.org> wrote:
>> I think we have made a consensus decision to try out git/github and I
>> think it's extremely counter-productive to re-open the discussion on
>> our choice now. I'm not a git fanboy, but because there are _no_
>> universal criteria to choose between git vs. bzr vs. Hg we should not
>> spend more time on this issue.
>
> I hadn't intended to reopen the debate - it was just a post for interests sake.
>
That's a relief. Maybe I'm becoming overly sensitive on the subject.

> As you can probably tell from looking at the biopython network graph
> on github (which I got to work on Linux but only with Adobe's flash
> plugin - gnash etc didn't seem to cope), I've been getting to grips
> with git (and github).
>
I haven't checked for a while, but it seems that we've got quite a number
of people making changes on different branches. That's cool.

I'd like to encourage people to share their impressions of git+github
with others on the list.
If there are any issues, it's better to discuss them early.

cheers
  Bartek


From biopython at maubp.freeserve.co.uk  Tue Mar 31 14:10:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 15:10:00 +0100
Subject: [Biopython-dev] Easy Git - git for mere mortals?
Message-ID: <320fb6e00903310710x693527f2k25b49d958543939d@mail.gmail.com>

Hi all,

Have any of you tried out easygit (eg)?  If it is as good as it sounds
on their website, it might be a sensible option for those migrating
from CVS/SVN to git for the first time.
http://www.gnome.org/~newren/eg/

Reading the easygit documentation, it sounds like git gives the user
plenty of ways to shoot themselves in the foot (especially if used to
CVS/SVN), and a lot of what easygit does is catch some of these
potential mistakes.  They also stress you can mix and match git and
easy git, so it can act as a stepping stone to using git directly.

This presentation seems a fairly gentle introduction (with plenty of
for interest stuff in the second half that can be ignored),
http://www.gnome.org/~newren/eg/presentations/git-introduction.pdf

There are quite a few other wrappers for git too - all referred to as
"porcelain", which apparently follows from Linus's division of end
user commands in git into external "porcelain" and internal
"plumbing".  The "porcelain" are the bits of a bathroom the end user
sees (like the sink), while they normally only interact with the "ugly
plumbing" when something goes wrong (like dropping an ear ring down
the sink).  This kind of quirky language doesn't really make the
documentation any clearer in my opinion.  Still, I'm sure things are
improving gradually (or at least, I hope they are).  For the moment
I've come to the conclusion the git man pages are not really suitable
for beginners.

Peter

P.S. For the moment, let's keep the wiki page focused on using git
itself directly - too many choices will confuse things.


From cy at cymon.org  Tue Mar 31 14:49:20 2009
From: cy at cymon.org (Cymon Cox)
Date: Tue, 31 Mar 2009 15:49:20 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com> 
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com> 
	<7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com> 
	<320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>
Message-ID: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>

Hi Peter,

2009/3/31 Peter <biopython at maubp.freeserve.co.uk>

> On Tue, Mar 31, 2009 at 12:25 PM, Cymon Cox <cy at cymon.org> wrote:
>

> > At present the Clustalw interface forces you to write a fasta
> > formatted file of your records to disk, and then has Clustalw
> > write an aligned matrix to disk in a format specified by the user.
>
> The Clustalw tool only takes FASTA formatted input, so if you have
> a bunch of sequences in memory you are forced to convert them
> into FASTA format to use them as input.  The question is where
> does this conversion take place - explicitly by the user, or implicitly
> by a wrapper.


Agreed - that's the question...


> > If the latter is Clustal format, then the record is parsed and an
> alignment
>  > object is returned, else None is returned. In either case, an output
> file(s)
> > remains on disk.
>
> It should be a fairly simple enhancement to look at the arguments
> to see if another output format we can parse was selected, e.g.
> PHYLIP?) and also parse that.  Do you think that would be a
> sensible addition to Bio.Clustalw.do_alignment?


No - I don't think there should be any output file (of any format) at all; an
alignment object should always be returned and the user should explicitly write
to the format they want using AlignIO. (But I think this becomes clearer below...)


>  Its never been
> an issue for me as if you are using the Bio.Clustalw.do_alignment
> interface you probably don't care about the output file format.


Quite. (Unless you are trying to write to a format not supported by
biopython e.g. GCG, GDE, of course.)


> > So, say we have a bunch of sequences in pir format and we'd like them
> > aligned and saved in stockholm format:
> >
> > from Bio import SeqIO
> > from Bio import AlignIO
> > from Bio import Clustalw
> > from Bio.Clustalw import MultipleAlignCL
> > records = SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"), "pir")
> > AlignIO.write([records], open("temp.fasta", "w"), "fasta")
>
> The above line is wrong


Doh! Grrr...

Yeah, perhaps it should have raised an error - I'll follow this up elsewhere
- but even with the corrected line and explicitly opening and closing the
file handles, I still can't get clustalw to align this file... (later...)

> we end up with 4 output files on disk: temp.aln,  temp.dnd,  temp.fasta,
> > temp.sth - 3 of which are incidental.
>
> Yes - but as the ClustalW doesn't read in PIR files, and doesn't output
> Stockholm files on its own, so this has to happen.  It's just a question
> of who does it (the user, or the wrapper code).


Yep...


> > As I say, I'd like to see this:
>  >>>> from Bio.Align import MultipleAlignment
> >>>> records = list(SeqIO.parse(open("Tests/NBRF/DMA_nuc.pir", "r"),
> "pir"))
> >>>> align = MultipleAlignment(records, executable="clustalw")
> >>>> AlignIO.write([align], open("temp.sth", "w"), "stockholm")
> >
> > ie resulting in one file temp.sth, which we've explicitly written to
> disk.
>
> So you'd like the wrapper to take care of creating and deleting the
> temp input FASTA file, and also deleting the temp output ClustalW
> file after parsing it.  This can probably be done quite cleanly using
> python's NamedTemporaryFile object.
>

Yep.


> >> Also, while I can see this might be nice for short alignments
>  >> (which are quick to run), its rather implicit or magic.
> >
> > Not sure what you mean here? Why would the size of alignment matter?
>
> Size of alignment influences the compute time, and therefore is an issue
> for
> anyone doing things at the python prompt.  Moreover, if the alignments are
> big and slow, you generally want to make sure the output file is kept on
> disk,
> as you'll probably want to read it more than once.


Agreed, but should the call to align the data (ie to clustalw) be writing
the output to disk or should the user be making an explicit call using
AlignIO?


> > And as for it being magic, its seems to me it does, and only does, what
> > it says on the label - aligns the data.
>
> The magic is the behind the scenes creation/deletion of the input/output
> files, and the conversion between file formats.


Fair enough - then magic it be... :)


> > OK, well having had my say, I'm quite happy to write the Muscle module in
> > the style of the current Clustalw interface, or whatever style is most
> > appropriate for exposing the filename handles. But I'm not sure what that
> > would be - perhaps you could elaborate on this a bit...
>
> I've elaborated, perhaps too much? ;)
>
> Basically you seem to be thinking about a high level abstraction for
> multiple alignment tools (dependent on the Bio.SeqIO and Bio.AlignIO
> modules), while I am more focused on the low level abstraction for
> wrapping any command line tool.  This isn't to say we can't have both,
> but to me it makes sense to start with the low level stuff first.
>
> We (unfortunately) have several styles of command line tool wrappers
> in Biopython already - this is a wart that has been on my mental to do
> list for some time.  I think we should focus on dealing with command
> line strings, and keep this separate from how the tools are invoked
> (e.g. subprocess or os.system), preparation of input files, and how
> any output is parsed.  As long as this core is in place, more advanced
> wrappers are possible on top of this basic infrastructure (Tiago may
> have some comments here from his Bio.PopGen work).
>
> Essentially all our command line wrappers start by building a command
> line string.  In some cases this command line string is exposed to the
> user (e.g. Bio.EMBOSS), and they can choose how they want to invoke
> it.  For example, they can explicitly opt to use the Python subprocess
> module and pipes if they want to - or use a standard invocation from
> Bio.Applications (we may want to add a couple of variations to this
> module).
>
> Other wrappers (e.g. Bio.Blast.NCBIStandalone) instead call the tool
> for you. In the case of Bio.Blast.NCBIStandalone, if you don't want
> the handles because you've told Blast to save its output to a file,
> our wrapper still returns the standard output and standard error
> handles - it is forced on you (see Bug 2654).   Also, there is no easy
> way to see what the actual command line string was, which can make
> debugging hard, and also prevents certain things (e.g. submitting the
> command line as a task to a cluster of workstations).  At least
> Bio.Clustalw offers a command line string object (MultipleAlignCL),
> it's just the do_alignment helper function I'm not so keen on.
>
> The Bio.Clustalw.do_alignment wrapper is rather unusual in that it
> automatically parses the output - while most of our wrappers don't.
> Decoupling the parsing is more modular - it makes it easy for the user
> to use any parser for the output from a command line tool (either
> using stdout, or by reading an output file).  I like this, and it fits
> with the handle based approach in most of our parsers.


Thanks for your thoughts on this, it helps clarify some things...


> So, I would suggest we think about adding new wrappers under Bio.Align
> (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
> perhaps all together in Bio.Align.Applications or something) based on
> the Bio.Application module as used in Bio.EMBOSS.  We could then
> deprecate Bio.Clustalw, which should also help tidy up the top level
> name space.  Initially at least, I wouldn't include any clever wrapper
> code at all.


OK, I'll aim for this with the Muscle code...

Cheers, C.
-- 
____________________________________________________________________

Cymon J. Cox

Centro de Ciencias do Mar
Faculdade de Ciencias do Mar e Ambiente (FCMA)
Universidade do Algarve
Campus de Gambelas
8005-139 Faro
Portugal

Phone: +0351 289800909 ext 7909
Fax: +0351 289800051
Email: cy at cymon.org, cymon at ualg.pt, cymon.cox at gmail.com
HomePage : http://biology.duke.edu/bryology/cymon.html
-8.63/-6.77


From biopython at maubp.freeserve.co.uk  Tue Mar 31 15:24:32 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 16:24:32 +0100
Subject: [Biopython-dev] Multiple alignment - Clustalw etc...
In-Reply-To: <7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>
References: <7265d4f0903300442h276df25ay1d78fb04180c5b5b@mail.gmail.com>
	<320fb6e00903300737i73f6efaex7b0a22ee685c74c1@mail.gmail.com>
	<7265d4f0903310425p60a8f80ewb8aee8cc6b4a663c@mail.gmail.com>
	<320fb6e00903310556h670634c2rcaa56c254ade07c5@mail.gmail.com>
	<7265d4f0903310749x154623few2689a0285f5f6983@mail.gmail.com>
Message-ID: <320fb6e00903310824v6fb0e1d2gff32b3effccd00b1@mail.gmail.com>

On Tue, Mar 31, 2009 at 3:49 PM, Cymon Cox <cy at cymon.org> wrote:
>>>
>>> If the latter is Clustal format, then the record is parsed and an
>>> alignment object is returned, else None is returned. In either
>>> case, an output file(s) remains on disk.
>>
>> It should be a fairly simple enhancement to look at the arguments
>> to see if another output format we can parse was selected (e.g.
>> PHYLIP?) and also parse that.  Do you think that would be a
>> sensible addition to Bio.Clustalw.do_alignment?
>
> No - I don't think there should be any output file (of any format) at all; an
> alignment object should always be returned and the user should explicitly write
> to the format they want using AlignIO. (But I think this becomes clearer below...)

Well there must be an output file, since ClustalW won't write its output
alignment to stdout.  Of course, you would have a wrapper which
deletes the output file after it has been parsed into an Alignment object.
However, we shouldn't change the existing Bio.Clustalw.do_alignment
function to do this (or to delete the .dnd guide tree), since people may
be using the call for these "side effects".
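
A minimal sketch of such a clean-up wrapper, kept separate from
Bio.Clustalw.do_alignment. The names align_via_temp_files, build_cmd,
parse_output and run are hypothetical and not part of Biopython; the
guide tree clean-up assumes the tool names its .dnd file after the input
file, as ClustalW does by default.

```python
import os
import subprocess
import tempfile


def align_via_temp_files(fasta_text, build_cmd, parse_output,
                         run=subprocess.check_call):
    """Run an alignment tool on FASTA text and remove the temp files.

    build_cmd(in_name, out_name) returns the argument list to execute,
    parse_output(handle) turns the tool's output into the return value.
    """
    # Write the input to a named temp file the external tool can open.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta",
                                     delete=False) as in_handle:
        in_handle.write(fasta_text)
        in_name = in_handle.name
    stem = os.path.splitext(in_name)[0]
    out_name = stem + ".aln"
    tree_name = stem + ".dnd"  # assumed guide tree name (ClustalW default)
    try:
        run(build_cmd(in_name, out_name))
        # Parse before anything is deleted.
        with open(out_name) as out_handle:
            return parse_output(out_handle)
    finally:
        # Remove every incidental file, whether or not the tool succeeded.
        for name in (in_name, out_name, tree_name):
            if os.path.exists(name):
                os.remove(name)
```

With ClustalW, build_cmd might return something like
["clustalw", "-INFILE=" + in_name, "-OUTFILE=" + out_name], and
parse_output could call Bio.AlignIO.read(handle, "clustal").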

>> It's never been
>> an issue for me as if you are using the Bio.Clustalw.do_alignment
>> interface you probably don't care about the output file format.
>
> Quite. (Unless you are trying to write to a format not supported by
> biopython e.g. GCG, GDE, of course.)

What I was saying was Bio.Clustalw.do_alignment knows the requested
output format, and if it is ClustalW it automatically parses the output file
and returns the alignment.  Since this code was written, Bio.AlignIO was
added and could potentially be used to parse PHYLIP (etc) output from
the Clustalw tool.  And one day maybe GCG etc too.

i.e. Right now Bio.Clustalw.do_alignment will return an alignment if it is in
ClustalW format, or None if it isn't.  I'm suggesting Bio.Clustalw.do_alignment
could return an alignment when Bio.AlignIO can parse the requested file
format, or None if it can't.
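
As a rough sketch of that idea (not the current Bio.Clustalw code), the
decision could be a simple lookup. The function name and the table below
are hypothetical, and which ClustalW -OUTPUT values Bio.AlignIO can really
parse is an assumption to be checked against the installed versions.

```python
# Hypothetical mapping from ClustalW -OUTPUT values to Bio.AlignIO format
# names. Only formats assumed to be parseable are listed.
PARSEABLE_OUTPUTS = {
    "CLUSTAL": "clustal",
    "PHYLIP": "phylip",
    "NEXUS": "nexus",
}


def alignio_format_for(output_option=None):
    """Return the AlignIO format name to parse with, or None if unknown."""
    if output_option is None:
        # No -OUTPUT given: ClustalW writes its own Clustal format.
        return "clustal"
    return PARSEABLE_OUTPUTS.get(output_option.upper())
```

A do_alignment style helper would then parse the output file only when
this returns a format name, and return None otherwise (e.g. for GCG).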

This would only be a small enhancement, and may not be worth bothering
with if we are thinking about deprecating Bio.Clustalw with a replacement
under Bio.Align.

>> Size of alignment influences the compute time, and therefore is an issue
>> for anyone doing things at the python prompt.  Moreover, if the alignments
>> are big and slow, you generally want to make sure the output file is kept
>> on disk, as you'll probably want to read it more than once.
>
> Agreed, but should the call to align the data (ie to clustalw) be writing
> the output to disk or should the user be making an explicit call using
> AlignIO?

The command line tool ClustalW will itself write the output to disk.  I don't
recall off hand, but other tools like Muscle may give the option of writing
to a file or to stdout.  In either case, the tool writes to a handle, and the
user may want to *read* this handle using Bio.AlignIO.

If I want the tool's output to go straight to a file, I'd get the tool to do it.
The only reason I can see to be *writing* the alignment with Bio.AlignIO
would be for file conversion (or after manipulating the alignment), and that
would be done by the user's python code.

If you are talking about the data preparation (i.e. the input file rather than
the output file), then I think it is up to the user's code to prepare a suitable
input FASTA file (e.g. from SeqRecord objects with Bio.SeqIO) before
calling the command line tool.

>>> And as for it being magic, it seems to me it does, and only does, what
>>> it says on the label - aligns the data.
>>
>> The magic is the behind the scenes creation/deletion of the input/output
>> files, and the conversion between file formats.
>
> Fair enough - then magic it be... :)

:)

>> > OK, well having had my say, I'm quite happy to write the Muscle module in
>> > the style of the current Clustalw interface, or whatever style is most
>> > appropriate for exposing the filename handles. But I'm not sure what that
>> > would be - perhaps you could elaborate on this a bit...
>>
>> I've elaborated, ...
>
> Thanks for your thoughts on this, it helps clarify some things...

Oh good.  If you don't agree with any of that, do say so by the way.

>> So, I would suggest we think about adding new wrappers under Bio.Align
>> (e.g. Bio.Align.Clustalw, Bio.Align.Muscle, Bio.Align.TCoffee - or
>> perhaps all together in Bio.Align.Applications or something) based on
>> the Bio.Application module as used in Bio.EMBOSS.  We could then
>> deprecate Bio.Clustalw, which should also help tidy up the top level
>> name space.  Initially at least, I wouldn't include any clever wrapper
>> code at all.
>
> OK, I'll aim for this with the Muscle code...

That sounds good.  Now can I tempt you into trying out github at the same
time, so we can see your proposed code evolve in public?

Could I add at this point that I don't think the wrapper should set any default
arguments - leave that up to the command line tool itself.  Otherwise you can
get the situation where the Biopython defaults get out of sync with the tool's
own default values (an issue with our online qblast wrapper, since the NCBI
change their default settings over time).
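
To illustrate, here is a minimal sketch of a command line builder that
emits only the options the caller set explicitly. build_cmdline is a
hypothetical helper, not the Bio.Application API, and the -NAME=value
spelling assumes a ClustalW style command line.

```python
def build_cmdline(executable, **options):
    """Return an argument list holding only explicitly set options."""
    args = [executable]
    for name, value in sorted(options.items()):
        if value is None or value is False:
            # Unset: say nothing, so the tool applies its own default.
            continue
        if value is True:
            args.append("-%s" % name)  # bare switch
        else:
            args.append("-%s=%s" % (name, value))
    return args


cmd = build_cmdline("clustalw", INFILE="temp.fasta", OUTPUT=None)
print(" ".join(cmd))
```

Because the result is just a list of strings, it can be printed for
debugging, passed to subprocess, or submitted to a cluster unchanged.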

As an aside, I have used Muscle with Biopython thanks to its option for
strict Clustal output, which can be parsed by Bio.AlignIO fine.  For this I
just generated my own command line on the fly, but I was only using a
couple of the command line arguments.

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Mar 31 17:05:50 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 31 Mar 2009 13:05:50 -0400
Subject: [Biopython-dev] [Bug 2799] UnknownSeq object (e.g. for QUAL files)
In-Reply-To: <bug-2799-42@http.bugzilla.open-bio.org/>
Message-ID: <200903311705.n2VH5oKe025136@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2799


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-03-31 13:05 EST -------
Checked into CVS from
http://github.com/peterjc/biopython/tree/bug2799-UnknownSeq

Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v  <--  Seq.py
new revision: 1.67; previous revision: 1.66
done
Checking in Bio/SeqRecord.py;
/home/repository/biopython/biopython/Bio/SeqRecord.py,v  <--  SeqRecord.py
new revision: 1.32; previous revision: 1.31
done
Checking in Bio/GenBank/__init__.py;
/home/repository/biopython/biopython/Bio/GenBank/__init__.py,v  <-- 
__init__.py
new revision: 1.106; previous revision: 1.105
done
Checking in Bio/SeqIO/InsdcIO.py;
/home/repository/biopython/biopython/Bio/SeqIO/InsdcIO.py,v  <--  InsdcIO.py
new revision: 1.9; previous revision: 1.8
done
Checking in Bio/SeqIO/QualityIO.py;
/home/repository/biopython/biopython/Bio/SeqIO/QualityIO.py,v  <-- 
QualityIO.py
new revision: 1.8; previous revision: 1.7
done
Checking in Tests/test_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_SeqIO.py,v  <--  test_SeqIO.py
new revision: 1.50; previous revision: 1.49
done
Checking in Tests/output/test_GenBank;
/home/repository/biopython/biopython/Tests/output/test_GenBank,v  <-- 
test_GenBank
new revision: 1.41; previous revision: 1.40
done
Checking in Tests/output/test_SeqIO;
/home/repository/biopython/biopython/Tests/output/test_SeqIO,v  <--  test_SeqIO
new revision: 1.36; previous revision: 1.35
done


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Tue Mar 31 17:12:37 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 18:12:37 +0100
Subject: [Biopython-dev] SeqIO and qual: Question about reading and
	writing qual files
In-Reply-To: <320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com>
References: <9e2f512b0903232324qb509c60v4154d3e1bffb089e@mail.gmail.com>
	<320fb6e00903240249h4d0bf648rfd5de741e582f687@mail.gmail.com>
	<9e2f512b0903240759n3c7f8b8fpc96bccd4d629082d@mail.gmail.com>
	<320fb6e00903240813x5fdb3589qef340129b5e267c0@mail.gmail.com>
	<9e2f512b0903240833g7768de97q8f10fe72cde7e64a@mail.gmail.com>
	<320fb6e00903250301v59319214pa3246e0a49899e87@mail.gmail.com>
	<9e2f512b0903251615x7c14c90en3b3a9b2b6ff86186@mail.gmail.com>
	<320fb6e00903251630t45da293fl4d8d111b7e7eedc9@mail.gmail.com>
Message-ID: <320fb6e00903311012y393761dev975a39464ab82043@mail.gmail.com>

On Thu, Mar 26, 2009 at 12:30 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Mar 25, 2009 at 11:15 PM, Sebastian Bassi:
>>> Sebastian - could you have a quick play with this github code (using the new
>>> UnknownSeq class), and the current CVS code (using None), and make sure
>>> both support the slicing operations you were trying earlier?  Thanks.
>>
>> ...
>>
>> From a practical point of view, both versions are the same, but the
>> concept of UnknownSeq looks more solid than None, because if I don't know
>> about biopython internals, I would never try to slice a None
>> seq. With "None" ...
>> But with the UnknownSeq object, len(s) returns an actual length, so it
>> is more intuitive that it can be sliced.
>
> I agree the UnknownSeq is more intuitive - plus it makes the SeqRecord
> __getitem__ code nicer, and it means you can do len(SeqRecord) too,
> which was problematic if the sequence was None.

I've checked this into CVS after this discussion (and a little off thread).
I wasn't comfortable with using None for a sequence, and doing this
while also wanting to support len(...) and slicing on such SeqRecord
objects was basically horrible.
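
For anyone who has not looked at the code, the idea can be sketched
without Biopython. PlaceholderSeq below is a hypothetical stand-in and not
the actual UnknownSeq implementation; it only shows why a length-aware
object supports len(...) and slicing where None cannot.

```python
class PlaceholderSeq(object):
    """A sequence of known length whose letters are unknown."""

    def __init__(self, length, character="?"):
        if length < 0:
            raise ValueError("length must not be negative")
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __str__(self):
        return self._character * self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Only the length of the slice matters, not its contents.
            new_length = len(range(*index.indices(self._length)))
            return PlaceholderSeq(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character


seq = PlaceholderSeq(10)
print(len(seq), str(seq[2:5]))
```

Passing character="N" gives the nucleotide flavour discussed below,
without changing any of the length or slicing behaviour.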

>> Then I tried the git code and it also worked. One thing I noticed is
>> that I got "?" instead of "N" as the "sequence" of the UnknownSeq.
>
> I felt we shouldn't use an "N" unless we are confident the sequence
> is nucleotides.  In practice, this is probably a safe assumption for
> FASTQ and QUAL files - unless anyone can think of a counter example?
> Do you think it is safe to assume FASTQ and QUAL files are just for
> nucleotides?
>
> I mean, you could translate a CDS from transcriptome sequencing,
> and for the sake of argument give each amino acid a quality score
> from the three nucleotide quality scores, and then save this as a protein
> FASTQ file.  But I've never heard of anyone actually doing this ;)

So, should we assume QUAL files (and perhaps FASTQ files) are
nucleotides when reading them in, and enforce this when writing
them out?  This would mean the QUAL files' UnknownSeq objects
would use the letter "N" instead of "?".

Or is it more generic to leave it as it is, and not make or force any
assumptions about the nature of the sequence?

Peter


From biopython at maubp.freeserve.co.uk  Tue Mar 31 21:38:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 31 Mar 2009 22:38:48 +0100
Subject: [Biopython-dev] Plan for Biopython 1.50 (beta)
Message-ID: <320fb6e00903311438g6fb0813bt18a035d485a6bb99@mail.gmail.com>

Hi all,

OK guys, after a brief chat off the mailing list, I'm hoping to do the
Biopython 1.50 beta release roughly this weekend, somewhere between
Friday 4 and Monday 6 April.  Until then please consider CVS "frozen"
for anything other than documentation changes or unit test additions,
or at a push really tiny changes.  Once I'm ready to actually do the
release, I'll send out an email requesting no further CVS commits.

Those of you that have committed changes, please check the NEWS file
and DEPRECATED file are up to date - thanks.

After the release of Biopython 1.50 beta, we'll reopen CVS again for
small changes and documentation.  While the beta is being tested by
our user base, I'd like us to push to finish any missing documentation
- in particular for new modules Bio.Motif (Bartek) and
Bio.Graphics.GenomeDiagram (me and/or Leighton), plus the new
SeqRecord slicing and UnknownSeq class (me).

Depending on the feedback from the beta, I'd hope we can do the final
release of Biopython 1.50 well before the end of April, and then
reopen CVS for new code.

That would also be a good point to evaluate moving from CVS to git.
In the meantime, while CVS is (semi) frozen you can all try using
github for keeping your pending submissions under version control ;)

Peter