[Biopython-dev] [Bug 2968] New: Modifications to Emboss eprimer3 parser and associated files

Wed Dec 9 17:57:37 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2968

           Summary: Modifications to Emboss eprimer3 parser and associated
                    files
           Product: Biopython
           Version: 1.52
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk

The existing Emboss primer3/eprimer3 code has several issues, and some scope
for improvement:

- The existing Primer3.py parser code can only parse output produced when
eprimer3 is applied to a single sequence.  Given output for multiple input
sequences, the parser groups all primers for all sequences into a single
record, so downstream analysis may associate primers with the wrong sequences.
- The current parser lacks an iterator for stepping through multiple-sequence
output.
- The current parser creates 'ghost' primers, with length zero and an empty
string as sequence, for all primer pairs; it does not do this for internal
oligos.  A more intuitive solution might be to return None for absent
primers/oligos.
- The current data model stores all primer data as individual attributes.  It
might be more useful to group the attributes of each primer into their natural
associations.
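
To illustrate the multiple-sequence problem, here is a minimal sketch of how
output could be split into one block per input sequence before parsing.  It is
not the code in the commit linked from this report.  It assumes that each
sequence's results begin with a header line starting "# EPRIMER3 RESULTS FOR";
the function name split_records is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: every per-sequence block starts with a line with this prefix.
HEADER = "# EPRIMER3 RESULTS FOR"


def split_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (sequence_name, body_lines) for each per-sequence block.

    Hypothetical helper: lines seen before the first header are ignored.
    """
    name = None
    body: List[str] = []
    for line in lines:
        if line.startswith(HEADER):
            # A new header closes the previous block, if there was one.
            if name is not None:
                yield name, body
            name = line[len(HEADER):].strip()
            body = []
        elif name is not None:
            body.append(line.rstrip("\n"))
    # Emit the final block once the input is exhausted.
    if name is not None:
        yield name, body
```

Because it is a generator, large multi-sequence output can be processed one
block at a time, and each block's primers stay attached to their own sequence
name.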

I have written new code for Emboss/Primer3.py that adds iterator/multiple
sequence parsing functionality to the parser, and extensively revises the
object model for the data.  The Record and Primers objects are retained, but
each primer/oligo is now represented by a Primer object that collects the
relevant data together.  The Record object has a new attribute that allows the
sequence to be recorded directly, rather than having to be parsed from the
comments attribute.  The new data model retains the old attribute-based access
for compatibility, but adds direct access to the Primer objects (where present)
by .forward, .reverse and .oligo attributes, and by keywords.
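
A rough sketch of the kind of data model described above follows.  It is an
illustration only, not the code in the linked commit: the field names on
Primer, the flat names forward_seq and forward_length, and the reading of "by
keywords" as dictionary-style lookup are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Primer:
    """Data for one primer or oligo, grouped together (field names assumed)."""
    start: int
    length: int
    tm: float
    gc: float
    sequence: str


class Primers:
    """One primer set; any of forward, reverse and oligo may be None."""

    _KEYS = ("forward", "reverse", "oligo")

    def __init__(self, size: int = 0,
                 forward: Optional[Primer] = None,
                 reverse: Optional[Primer] = None,
                 oligo: Optional[Primer] = None) -> None:
        self.size = size
        self.forward = forward
        self.reverse = reverse
        self.oligo = oligo

    def __getitem__(self, key: str) -> Optional[Primer]:
        # Keyword access, e.g. primers["forward"]; absent primers give None.
        if key not in self._KEYS:
            raise KeyError(key)
        return getattr(self, key)

    # Flat attribute access kept for compatibility (names are illustrative).
    @property
    def forward_seq(self) -> str:
        return self.forward.sequence if self.forward else ""

    @property
    def forward_length(self) -> int:
        return self.forward.length if self.forward else 0
```

With this layout a caller can test "primers.reverse is None" directly, rather
than checking for a zero-length primer with an empty sequence.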

One change to the unit test was required, to account for absent primers being
reported as None rather than as objects with 'null' attributes.  I've added two
further test output files (60 kB in total, which may be rather large for the
distribution) and doctests that use them.

The code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/b4701079ced297d7af5aa75b46738280c8783fe0

This enhancement request also relates to bug 2966.
