[Biopython] SeqRecord subclassing or composition

Peter Cock p.j.a.cock at googlemail.com
Wed Mar 9 04:04:26 EST 2011


On Wed, Mar 9, 2011 at 3:07 AM, Uri Laserson <laserson at mit.edu> wrote:
> I am trying to implement a data type for my work.  Each object will have a
> sequence (derived from a single read) and lots of annotations and features.
>  However, I want to implement some extra interface that is problem-specific
> to make my analysis more convenient.
>
> I am debating whether to subclass SeqRecord and simply implement the extra
> interface or define a new object that wraps a SeqRecord object and pass on
> the subset of native SeqRecord calls and/or simply access the underlying
> SeqRecord directly.
>
> One additional factor is that I want to be able to read/write INSDC-style
> files for the data (e.g., GenBank).  Therefore, if I use the SeqIO parser,
> it will return native SeqRecords.  If I go the inheritance route, how do I
> cast a SeqRecord object to my new subclass?

There is (currently at least) no option in SeqIO parse/read
to override the use of the SeqRecord object. So you'd need
code to 'upgrade' a SeqRecord into your class. Probably
the simplest route would be for it's __init__ method to
take a single argument (a SeqRecord). Then you could
have:

def my_parse(...):
    for seq_record in SeqIO.parse(...):
        yield MyClass(seq_record)

def my_read(...):
    return MyClass(SeqIO.read(...))

etc

> So, I am debating between inheritance
>
> class ImmuneChain(SeqRecord):
>    def __init__(self, *args, **kw):
>        SeqRecord.__init__(self,*args,**kw)
>        # But how do I cast a SeqRecord to an ImmuneChain?

Unless you modify the methods/atttributes too much, a
ImmuneChain subclass of SeqRecord should be usable
as is with SeqIO.write etc. You don't need to 'cast'.

Also note the above __init__ method can be more specific,
you might have say 10 init args for ImmuneChain,  only
some of which you pass to the SeqRecord init.

You could even have a single __init__ argument of a
SeqRecord, and copy all its attributes.

> or composition
>
> class ImmuneChain(object):
>    def __init__(self, *args, **kw):
>        if isinstance(args[0],SeqRecord):
>            self._record = args[0]
>        else:
>            # Initialize the underlying SeqRecord manually
>            self._record.seq = ...

With the above approach you'd have to pass the
private record to SeqIO.write etc (anything which
needs a SeqRecord). That could be done inside
methods of the ImmuneChain object (e.g. you
could expose the format method of the SeqRecord).

>
> Any thoughts?
>

You could alternatively go for a procedural style where
you write your code as functions taking SeqRecord
objects (perhaps expecting particular information in
the annotation).

Peter



More information about the Biopython mailing list