[Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?

Tue Aug 22 13:14:34 UTC 2006

Chris Fields wrote:
> Many Bio::DB* modules access the database to get the raw data, and this 
> is attached to a Bio::*IO stream class in some way (for most cases).  
> There are a few that get around this; for instance, Bio::DB::Taxonomy* 
> uses no specialized SeqIO-like class.  

Yes. Since Taxonomy is what I'm familiar with, I was thinking of doing
it the same way, especially given how many completely different kinds
of information you would want to get out of a TFBS database. I'll look
into how it is 'normally' done if anyone suggests that would be better.

> Like you mentioned, you could extend Bio::Matrix::PSM::IO::transfac 
> specifically to encompass the 'instance' sequences (the other PSM::IO 
> modules wouldn't have the same methods available to them), use 
> SimpleAlign or SeqFeature::SimilarityPair (I agree the former is 
> probably better).

SimpleAlign is better because we're talking about a multiple alignment,
almost always with more than 2 sequences, so SimilarityPair would not
be appropriate...

> Or have the Bio::DB module set up to grab either your 
> 'instance' sequences by ID (where you could possibly implement 
> RandomAccessI)

... though having said that you'd still want access to the individual 
sequences by ID.

> Does the TFBS package have any overlap here?  I haven't used them (they 
> require PDL which is a pain to install on WinXP) but they are supposed 
> to be fully integrated with Bioperl.

TFBS::DB::Local_TRANSFAC parses only the pure matrix information; even 
Bio::Matrix::PSM::IO::transfac parses out more of the information and 
makes it available in a useful way.

Transfac is far more complicated, interesting and useful than just the
matrix.dat file, though.