[Bioperl-l] PSI-BLAST Matrix Parser?

Stefan Kirov skirov at utk.edu
Wed Sep 8 14:07:21 EDT 2004


This seems reasonable to me. The one thing you need to consider is the
structure that should contain the matrix. The current design of
Bio::Matrix::PSM::Psm and Bio::Matrix::PSM::SiteMatrix does not allow
this, as SiteMatrix is a DNA-only object.
There are two ways to go: either change SiteMatrix to accept protein
matrix data, or add a protein matrix class to Bio::Matrix::PSM (say
Bio::Matrix::PSM::ProtMatrix) to hold the data, and make
Bio::Matrix::PSM::Psm inherit from that class and be able to contain the
object (Psm is actually a container right now).
So you will have something like:
  my $psmIO = Bio::Matrix::PSM::IO->new(-file => $file, -format => 'psi-blast'); # this would call the actual parser (Bio::Matrix::PSM::IO::psiblast)
  my $header = $psmIO->.... # I guess there will be some header data

  while (my $psm = $psmIO->next_psm) {
    my $psimatrix = $psm->protmatrix; # this would be a Bio::Matrix::PSM::ProtMatrix object
    $psimatrix->.....; # now process the data parsed into this object through its methods...
  }
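
To make the second option a bit more concrete, here is a minimal sketch of
what such a protein matrix container might hold. It is only an illustration:
the package name My::ProtMatrixSketch, all of its methods, the residue column
order and the scores are assumptions made up for this example, not an
existing Bioperl API and not data from your file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed protein matrix class. The package
# name and every method below are illustrative, not an existing Bioperl API.
package My::ProtMatrixSketch;

# Assumed column order for the 20 amino acids.
our @RESIDUES = qw(A R N D C Q E G H I L K M F P S T W Y V);

sub new {
    my ($class) = @_;
    return bless { rows => [] }, $class;
}

# Store one position: the query residue plus one score per amino acid.
sub add_position {
    my ($self, $residue, @scores) = @_;
    die "expected " . scalar(@RESIDUES) . " scores\n"
        unless @scores == @RESIDUES;
    my %score_for;
    @score_for{@RESIDUES} = @scores;
    push @{ $self->{rows} }, { residue => $residue, score => \%score_for };
    return scalar @{ $self->{rows} };    # 1-based index of the new position
}

# Number of positions stored so far.
sub width {
    my ($self) = @_;
    return scalar @{ $self->{rows} };
}

# Fetch the row for a 1-based position, or die if it is out of range.
sub _row {
    my ($self, $pos) = @_;
    die "no position $pos\n" if $pos < 1 || $pos > $self->width;
    return $self->{rows}[ $pos - 1 ];
}

# Query residue recorded at a position.
sub query_residue {
    my ($self, $pos) = @_;
    return $self->_row($pos)->{residue};
}

# Score of amino acid $aa at a position.
sub score {
    my ($self, $pos, $aa) = @_;
    return $self->_row($pos)->{score}{$aa};
}

# Amino acid with the highest score at a position; on ties the residue
# that comes first in @RESIDUES is kept.
sub best_residue {
    my ($self, $pos) = @_;
    my $score = $self->_row($pos)->{score};
    my $best  = $RESIDUES[0];
    for my $aa (@RESIDUES) {
        $best = $aa if $score->{$aa} > $score->{$best};
    }
    return $best;
}

package main;

my $m = My::ProtMatrixSketch->new;

# Made-up scores for two positions, given in @RESIDUES order.
$m->add_position('M', -1, -2, -3, -4, -2, -1, -3, -3, -2, 1,
                      2, -2, 6, 0, -3, -2, -1, -2, -1, 1);
$m->add_position('K', -1, 2, 0, -1, -3, 1, 1, -2, -1, -3,
                      -3, 5, -2, -4, -1, 0, -1, -3, -2, -3);

for my $pos (1 .. $m->width) {
    printf "position %d: query %s, best %s\n",
        $pos, $m->query_residue($pos), $m->best_residue($pos);
}
```

A parser for the matrix file would then only need to call something like
add_position once per row, and Psm could hand the resulting object back to
the caller.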

If you do this maybe you should get an account and commit it yourself?
Does this make sense to you?
Stefan

James Thompson wrote:

>Stefan,
>
>Thanks for the response. For reading in the actual alignment I would use
>Bio::AlignIO to read the PSI-BLAST output as it's just another alignment file,
>but the matrix file that I'm talking about is slightly different. Now that
>I've perused CVS more and learned more about how the Bio::Matrix::PSM modules
>work, I think I have a clearer picture of what I'd like to do.
>
>If you run PSI-BLAST with the -Q option, it will take the matrix that it
>used for the position-specific search and output it to a file. I've put up a
>link to one of my matrix files here if you'd like to look at it:
>
>http://bioinformatics.rit.edu/~tex/atp1.matrix
>
>Basically I'd like to make some Bio::Matrix::PSM::Psm objects (or at least
>a PsmI-compliant object), and I think that the correct way to do this would
>be to add a file format parser to Bio::Matrix::PSM::IO. Currently in Bioperl
>there are three format parsers:
>   - mast
>   - meme
>   - transfac
>
>None of these work with the PSI-BLAST matrix files.  I'd like to write a new
>matrix file parser (perhaps called psi-blast?) in the spirit of the three other
>parsers.
>
>If I were to write this, could someone commit it for me? 
>
>James Thompson
>
>On Tue, 7 Sep 2004, Stefan A Kirov wrote:
>
>>I am not sure what object you are going to store your data in... Are you
>>going to develop your own class to hold the data or use an existing one?
>>Also is there any reason not to use Bio::AlignIO (it reads PSI-Blast as
>>far as I know)?
>>Stefan
>>
>>
>>On Tue, 7 Sep 2004, James Thompson wrote:
>>
>>>Dear Bioperl-ers,
>>>
>>>I'd like to parse the output of a PSI-BLAST matrix, and I was wondering if
>>>there was a Bioperl way of parsing these files. If not, I'd like to make my
>>>code general enough to be committed, and I'd like some advice on where exactly
>>>to put such a module. From my cursory knowledge of Bioperl, I think that adding
>>>another format parser to Bio::Matrix::PSM::IO would be a good way to go.
>>>
>>>I have a couple of questions:
>>>- Does anyone know what the PSI-BLAST matrix format is called?
>>>- Is this the correct place in which to put code for parsing this type of file?
>>>
>>>The file format represents a position-specific scoring matrix with some added
>>>statistical information. Here's a general overview of the information available
>>>from the matrix file:
>>>
>>>Last position-specific scoring matrix computed, weighted observed percentages
>>>rounded down, information per position, and relative weight of gapless real
>>>matches to pseudocounts.
>>>
>>>Any help is greatly appreciated.
>>>
>>>James Thompson
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at portal.open-bio.org
>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov at utk.edu
sao at ornl.gov


