[Biopython-dev] Parsing TRANSFAC matrices with Bio.Motif

Michiel de Hoon mjldehoon at yahoo.com
Tue Aug 7 04:39:15 EDT 2012


Hi Bartek,

Thanks for your reply.

--- On Tue, 8/7/12, Bartek Wilczynski <bartek at rezolwenta.eu.org> wrote:
> If you do, then you get access to a number of interconnected
> datasets, including information about what they call "matrices",
> "sites" and "transcription factors" and "classes". I think that if
> we want to support their filetypes, we probably should think whether
> we should support the matrix file only or maybe the other ones as
> well.

I would suggest supporting just the matrices for now.

> The confusing part is that many programs use "transfac-like"
> formats, i.e. files very similar to the part in the "matrix"
> file that corresponds to the PWM itself. (For example see
> http://www.benoslab.pitt.edu/stamp/help.html).

This also means that if Bio.Motif can parse TRANSFAC files, then it
can parse the transfac-like formats, at least to some degree.
Personally, I am actually more interested in the SwissRegulon database,
which uses a transfac-like format.
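
To illustrate why one parser could cover both cases, here is a rough sketch of reading just the matrix block. This is not existing Bio.Motif code. The function name and the example data are made up, and it assumes the block starts with a "P0" (or "PO") header row naming the letters, followed by numbered rows of counts and an optional trailing consensus letter.

```python
def parse_transfac_like_matrix(text):
    """Return a dict mapping each letter to its list of per-position counts.

    Hypothetical helper: only the matrix block is read; all other
    TRANSFAC fields (ID, AC, ...) are skipped.
    """
    letters = None
    counts = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        if key in ("P0", "PO"):
            # Header row: the remaining fields name the columns (A C G T).
            letters = fields[1:]
            counts = {letter: [] for letter in letters}
        elif letters is not None and key.isdigit():
            # Matrix row: position number, one count per letter, and
            # possibly a consensus letter, which zip() silently drops.
            for letter, value in zip(letters, fields[1:]):
                counts[letter].append(float(value))
        elif letters is not None and key in ("XX", "//"):
            # A separator after the matrix rows ends the block.
            break
    if counts is None:
        raise ValueError("no P0/PO header row found")
    return counts


# Made-up example data, for illustration only.
EXAMPLE = """\
ID  example
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
XX
//
"""

matrix = parse_transfac_like_matrix(EXAMPLE)
```

A file that contains only the matrix block and a full record with extra annotation lines would both go through the same code path here, since unrecognized lines are ignored.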

> Then comes the thing with annotations. I would rather
> vote for something more similar to SeqRecord and Seq,
> where a new class (MotifRecord?) would hold all the
> annotation data from TRANSFAC or somesuch DB, and the
> Motif would remain more sequence-like.

Are you suggesting that MotifRecord subclasses Bio.Motif._Motif.Motif?
For example, we could have a Bio.Motif.Parsers.TRANSFAC.Motif class that subclasses Bio.Motif._Motif.Motif. Then Bio.Motif._Motif.Motif remains sequence-like, and Bio.Motif.Parsers.TRANSFAC.Motif takes care of the annotations.

Alternatively, we could have Bio.Motif.Parsers.TRANSFAC.read return a Bio.Motif.Parsers.TRANSFAC.Record object that holds the motif as an attribute (so record.motif would be an instance of Bio.Motif._Motif.Motif).
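
To make the two options concrete, here is a minimal sketch of both. The Motif class below is only a simplified stand-in for Bio.Motif._Motif.Motif, and the names TransfacMotif, Record, and the "annotations" attribute are placeholders I made up for illustration, not a proposal for the final API.

```python
class Motif:
    """Simplified stand-in for Bio.Motif._Motif.Motif (hypothetical)."""

    def __init__(self, counts=None):
        # counts maps each letter to a list of per-position counts.
        self.counts = counts or {}

    def __len__(self):
        # Motif length = number of positions in any one column list.
        return len(next(iter(self.counts.values()), []))


# Option 1: a TRANSFAC-specific subclass that adds the annotations.
class TransfacMotif(Motif):
    def __init__(self, counts=None, annotations=None):
        Motif.__init__(self, counts)
        self.annotations = dict(annotations or {})


# Option 2: a separate record that holds a plain Motif as an attribute.
class Record:
    def __init__(self, motif, annotations=None):
        self.motif = motif
        self.annotations = dict(annotations or {})


# Made-up data, for illustration only.
counts = {"A": [1, 2], "C": [0, 1], "G": [2, 0], "T": [0, 0]}
annotations = {"ID": "example"}

subclassed = TransfacMotif(counts, annotations)
record = Record(Motif(counts), annotations)
```

With option 1, the object returned by the parser can be passed anywhere a Motif is expected. With option 2, callers have to go through record.motif, but the Motif class itself does not need to know anything about TRANSFAC.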

Best,
-Michiel


