[Biopython-dev] [Bug 2799] New: UnknownSeq object (e.g. for QUAL files)

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Mar 24 18:25:17 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2799

           Summary: UnknownSeq object (e.g. for QUAL files)
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


Sometimes we want to represent an unknown sequence with a known length, e.g.
"N"*length for nucleotides.  This enhancement is about adding an UnknownSeq
object to Biopython which would have the following init arguments:

* length
* alphabet
* character (single letter string, defaulting to "X" for protein and "N" for
nucleotides, "?" otherwise)
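
As a rough illustration of the three init arguments listed above, here is a minimal sketch in plain Python.  The class name UnknownSeqSketch is hypothetical, and the alphabet is simplified to a plain string ("protein", "nucleotide" or anything else) rather than a real alphabet object, so this is an assumption-laden outline and not Biopython code.

```python
class UnknownSeqSketch:
    """Hypothetical stand-in for the proposed UnknownSeq (not Biopython code)."""

    # Assumed default placeholder letters, taken from the list above.
    DEFAULT_CHARACTERS = {"protein": "X", "nucleotide": "N"}

    def __init__(self, length, alphabet="generic", character=None):
        if length < 0:
            raise ValueError("length must not be negative")
        if character is None:
            # Fall back to "?" when the alphabet is neither protein nor nucleotide.
            character = self.DEFAULT_CHARACTERS.get(alphabet, "?")
        if len(character) != 1:
            raise ValueError("character must be a single letter")
        self.length = length
        self.alphabet = alphabet  # a plain string in this sketch
        self.character = character

    def __len__(self):
        return self.length

    def __str__(self):
        # Only build the full string when it is actually requested.
        return self.character * self.length


unknown_dna = UnknownSeqSketch(5, "nucleotide")
print(len(unknown_dna), str(unknown_dna))
```

Storing only the length and the letter keeps memory use constant however long the unknown region is.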

Currently the Bio.SeqIO "qual" parser produces SeqRecord objects where the seq
is None, yet there is a known length.  This can also occur in GenBank files
where there is a CONTIG line but no sequence.  This makes supporting slicing (Bug
2507) complicated.  Adding a new UnknownSeq class would solve this elegantly.

In general, the UnknownSeq object should act as a Seq object whose sequence is
the character*length.

Slicing or adding UnknownSeq objects should give a new UnknownSeq object. 
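
One way the slicing and addition behaviour could be sketched, using a hypothetical UnknownSeqSketch class that tracks only a length and a character (alphabet handling is left out here for brevity).  Requiring both operands to share the same character is an assumption made for this sketch.

```python
class UnknownSeqSketch:
    """Hypothetical stand-in for the proposed UnknownSeq (not Biopython code)."""

    def __init__(self, length, character="N"):
        self.length = length
        self.character = character

    def __len__(self):
        return self.length

    def __str__(self):
        return self.character * self.length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Work out the sliced length without building any string.
            new_length = len(range(*index.indices(self.length)))
            return UnknownSeqSketch(new_length, self.character)
        if not -self.length <= index < self.length:
            raise IndexError("sequence index out of range")
        return self.character

    def __add__(self, other):
        # Assumption: only combine when both use the same placeholder letter.
        if isinstance(other, UnknownSeqSketch) and other.character == self.character:
            return UnknownSeqSketch(self.length + other.length, self.character)
        return NotImplemented


region = UnknownSeqSketch(10)
print(len(region[2:5]), len(region + UnknownSeqSketch(4)))
```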

Complement, reverse complement, transcribe and back transcribe can also return
new UnknownSeq objects of the same length (alphabet permitting).  Translation
can return an UnknownSeq object using "X" and a protein alphabet (with the
length roughly one third of the nucleotide length - whatever is consistent with
the Seq translate method).
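
The following sketch shows how these methods might look on a hypothetical UnknownSeqSketch class.  Two assumptions are made here: the placeholder character is its own complement (true for "N"), and translation uses floor division by three for the protein length, which may or may not match what the Seq translate method does with trailing partial codons.

```python
class UnknownSeqSketch:
    """Hypothetical stand-in for the proposed UnknownSeq (not Biopython code)."""

    def __init__(self, length, alphabet="nucleotide", character="N"):
        self.length = length
        self.alphabet = alphabet  # a plain string in this sketch
        self.character = character

    def __len__(self):
        return self.length

    def __str__(self):
        return self.character * self.length

    def _require_nucleotide(self):
        if self.alphabet != "nucleotide":
            raise ValueError("operation needs a nucleotide alphabet")

    def complement(self):
        self._require_nucleotide()
        # Assumption: the placeholder letter (e.g. "N") is its own complement.
        return UnknownSeqSketch(self.length, self.alphabet, self.character)

    def reverse_complement(self):
        # Reversing a run of identical letters changes nothing.
        return self.complement()

    def translate(self):
        self._require_nucleotide()
        # Assumption: one "X" per complete codon, ignoring any remainder.
        return UnknownSeqSketch(self.length // 3, "protein", "X")


unknown_dna = UnknownSeqSketch(10)
print(str(unknown_dna.translate()))
```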

Adding an UnknownSeq object to a Seq would have to give a new Seq object (or an
error?).  One use-case example here would be joining together contigs with
unknown regions of a given length (strings of N's).
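
A small sketch of the contig-joining use case, under the assumption that mixing an unknown region with a known sequence gives an ordinary sequence rather than an error.  A plain Python string stands in for a Seq object here, and UnknownSeqSketch is a hypothetical name.

```python
class UnknownSeqSketch:
    """Hypothetical stand-in for the proposed UnknownSeq (not Biopython code)."""

    def __init__(self, length, character="N"):
        self.length = length
        self.character = character

    def __len__(self):
        return self.length

    def __str__(self):
        return self.character * self.length

    def __add__(self, other):
        if isinstance(other, UnknownSeqSketch) and other.character == self.character:
            # Unknown + unknown stays unknown.
            return UnknownSeqSketch(self.length + other.length, self.character)
        if isinstance(other, str):
            # Unknown + known: fall back to an ordinary (string) sequence.
            return str(self) + other
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            # Known + unknown: also an ordinary (string) sequence.
            return other + str(self)
        return NotImplemented


# Join two contigs separated by a gap of five unknown bases.
scaffold = "ACGT" + UnknownSeqSketch(5) + "GGCC"
print(scaffold)
```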

This bug is a placeholder for patches or pointers to possible implementations
(e.g. I intend to try some ideas on a branch on github).  I expect most of the
discussion to be on the (dev) mailing list, rather than bugzilla.




