[Biopython-dev] [Bug 2622] New: Parsing between position locations like 5933^5934 in GenBank/EMBL files

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Oct 21 07:28:57 EDT 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2622

           Summary: Parsing between position locations like 5933^5934 in
                    GenBank/EMBL files
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


GenBank and EMBL files can contain features with locations like 123^456,
handled in Biopython as BetweenPosition objects.

Quoting ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
> A site between two residues, such as an endonuclease cleavage site, is
> indicated by listing the two bases separated by a carat (e.g., 23^24).
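
As a minimal sketch of the notation quoted above, the two residue numbers in
a location such as 23^24 can be pulled apart with a regular expression. The
helper name parse_between is hypothetical and is not part of Biopython; it
only handles a bare "left^right" string, not compound locations.

```python
import re

# Hypothetical helper, not a Biopython API: recognises a bare "left^right"
# location string such as "23^24" (two integers separated by a caret).
_BETWEEN_RE = re.compile(r"(\d+)\^(\d+)")


def parse_between(location):
    """Return the two 1-based residue numbers of a 'left^right' location."""
    match = _BETWEEN_RE.fullmatch(location.strip())
    if match is None:
        raise ValueError("not a between-position location: %r" % location)
    return int(match.group(1)), int(match.group(2))


print(parse_between("23^24"))
```

Both numbers are returned as positions exactly as written in the file, with
no conversion to zero-based counting.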

A small GenBank file containing such locations is NC_005816.gbk, available
here:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Yersinia_pestis_biovar_Microtus_91001/NC_005816.gbk

e.g.
     variation       5933^5934
                     /note="compared to AL109969"
                     /replace="a"
     variation       5933^5934
                     /note="compared to AF053945"
                     /replace="aa"

For a larger example, see NC_005027.gbk
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Pirellula_sp/NC_005027.gbk

e.g.
     misc_feature    41855^41856
                     /note="cosmid pircos-a3a12/ cosmid pircos-a1d04 joining
                     point"
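
Feature lines like the ones quoted above could be located in a feature table
with a short scan. This is only a sketch: the function name between_features
and the pattern are assumptions for illustration, the sample text is copied
from the excerpts in this report, and only simple locations consisting of a
single "left^right" pair are recognised.

```python
import re

# Assumed layout: five spaces, the feature key, then the location on the same
# line. Qualifier lines (indented further) do not match this pattern.
_FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\d+)\^(\d+)\s*$")

SAMPLE = """\
     variation       5933^5934
                     /note="compared to AL109969"
                     /replace="a"
     variation       5933^5934
                     /note="compared to AF053945"
                     /replace="aa"
     misc_feature    41855^41856
                     /note="cosmid pircos-a3a12/ cosmid pircos-a1d04 joining
                     point"
"""


def between_features(text):
    """Return (feature key, left, right) for each simple between location."""
    hits = []
    for line in text.splitlines():
        match = _FEATURE_RE.match(line)
        if match:
            key, left, right = match.groups()
            hits.append((key, int(left), int(right)))
    return hits


for hit in between_features(SAMPLE):
    print(hit)
```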

See also one of the Biopython unit test examples, SC10H5.embl, a pre-2006 style
EMBL file from BioPerl.

As the following example script and its output will show, Biopython CVS (and,
I presume, several releases) does not parse these locations sensibly. There
are at least two issues. First, there is a numerical error: 5933^5934 is
treated as 5932^11866 (position versus extension; note that 5932 + 5934 =
11866, so the second number appears to be used as an extension rather than as
a position). Second, these locations might be better represented without
separate start/end objects.
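
The arithmetic behind the first issue can be sketched as follows. This is one
reading of "position versus extension", not the actual Biopython parser code:
the function end_if_extension is hypothetical and simply adds the second
number to the zero-based start. The BetweenSite class is likewise only an
illustration of holding both numbers in one object; it is not a proposed API.

```python
from dataclasses import dataclass


def end_if_extension(first, second):
    """Mimic the suspected fault: treat 'second' as an extension, not a position.

    'first' becomes a zero-based start, and 'second' is added to it.
    """
    start = first - 1
    return start, start + second


@dataclass(frozen=True)
class BetweenSite:
    """Hypothetical single object for a 'left^right' site (1-based numbers)."""

    left: int
    right: int

    def __str__(self):
        return "%d^%d" % (self.left, self.right)


print(end_if_extension(5933, 5934))  # (5932, 11866)
print(BetweenSite(5933, 5934))       # 5933^5934
```

Keeping both numbers in one object means the site can be written back out
unchanged, without deriving an end from a start.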

