Bioperl: Bio::Tools::Blast vs. Bio::GSC::Tool::Blast

Nigel Brown brown@ebi.ac.uk
Mon, 8 Jun 1998 19:16:49 +0100 (BST)


A while ago I mentioned the MView software for displaying search results or
pre-computed multiple alignments in an HTML page. The underlying parsers
might also be of interest in this discussion...

Background
----------

The idea here was to have some boiler-plate code for (recursively) breaking
down a (nested) flatfile into records (and sub-records) *on demand*. An
initial pass finds and indexes the top-level record locations, one
intention being that these could be saved for subsequent random access to an
old BLAST run or an EMBL feature table, or whatever. New parsers are
subclassed from these and are relatively easy to build and test, since most
work is reusing the boiler-plate and then just embedding the necessary
regexps for detailed parsing.
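
To make the indexing idea concrete, here is a minimal sketch. It is not the
actual Parse code: the function names index_records and fetch_record are
made up for illustration, and it assumes an EMBL-style flatfile in which
each top-level record ends with a line containing only "//". One pass
records the byte offset of each record; any record can then be read back
on demand with seek.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: one pass over the file, noting the byte offset at
# which each top-level record starts. Assumes "//" terminates a record.
sub index_records {
    my ($fh) = @_;
    my (@offsets, $in_record);
    my $pos = tell($fh);                     # offset of the line about to be read
    while (defined(my $line = <$fh>)) {
        push @offsets, $pos unless $in_record;
        $in_record = ($line !~ m{^//\s*$});  # a "//" line closes the record
        $pos = tell($fh);
    }
    return @offsets;
}

# Hypothetical helper: random access to one record via a saved offset.
sub fetch_record {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $text = '';
    while (defined(my $line = <$fh>)) {
        $text .= $line;
        last if $line =~ m{^//\s*$};
    }
    return $text;
}

# Usage on an in-memory "file" holding two toy records.
my $data = "ID one\nSQ aaa\n//\nID two\nSQ ccc\n//\n";
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my @offsets = index_records($fh);
print "records: ", scalar(@offsets), "\n";
print fetch_record($fh, $offsets[1]);        # second record only
```

The offsets list is the part that could be saved to disk and reused later
without rescanning the whole file.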

There is NO attempt at providing any kind of higher-level behaviour such as
initiating runs or embedding HTML - that's for the caller. If the caller
wants some datum, they access it explicitly as an object attribute (ouch!),
i.e., there are no nice access methods.
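
For anyone unsure what that means in practice, a rough sketch follows. The
class Demo::Record and the field names id and score are invented for this
example and are not taken from the parser code; the point is only that the
caller reaches straight into the blessed hash rather than calling a method.

```perl
use strict;
use warnings;

{
    package Demo::Record;    # hypothetical class, for illustration only
    sub new {
        my ($class, %fields) = @_;
        my $self = { %fields };
        return bless $self, $class;
    }
}

# The caller reads (and can overwrite) attributes directly: nothing
# checks the field name or protects the value.
my $hit = Demo::Record->new(id => 'P12345', score => 87);
print "id=$hit->{id} score=$hit->{score}\n";
$hit->{score} = 90;
```

This is quick to write but couples the caller to the internal field names,
hence the "ouch".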

Parsers already there
---------------------

The ftp archive (see below) contains the various test datafiles I use for
MView production, plus simple test scripts for loading them, together with
the expected output.

examples/blastn:
  blastn_1.4.9.big.dat
  blastn_1.4.9.dat
  blastn_2.0a19MP-WashU.dat

examples/blastp:
  blast2_2.0.4.dat
  blast2_2.0a13MP-WashU.dat
  blastp_1.4.7.dat
  blastp_1.4.9+hist.dat
  psi-blast_2.0.2.dat
  psi-blast_2.0.4.dat

examples/blastx:
  blastx_1.4.9.dat

examples/fasta:
  fasta_1.6c24.dat
  fasta_2.0u.dat
  fasta_2.0u.dna.dat
  fasta_3.0t76.dat
  tfastx_2.0u.dat
  tfastx_3.0t.dat

examples/hssp:
  9wga.hssp

examples/multi:
  clu_1.51.dat
  clu_1.60.dat
  clu_1.70.dat
  msf.1.dat
  msf.2.dat
  msf.3.dat
  pear.dat

I also have parsers for EMBL/GenBank, concentrating on the Feature Table,
but these are less complete since I have no need of them right now - this
stuff is driven by my needs rather than untainted altruism.


Todo if I ever find time and this doesn't become obsolete
---------------------------------------------------------

Write Pods.

Write a meta-programming tool for defining regexps and actions (a la
Icarus, I suppose, but simpler?) that would be used to synthesize the actual
parser subclass from the boiler-plate stuff.

Some kind of self-documenting meta-level description of any format (ASN.1?).

Systematize the BLAST and FASTA class hierarchies better.
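
As a rough illustration of the regexps-and-actions item above, the sketch
below drives parsing from a table of (regexp, action) pairs. Nothing here
exists in the current code: the rule table, the function parse_lines, the
field names, and the sample lines (loosely modelled on a BLAST header) are
all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical rule table: each rule pairs a regexp with an action that
# stores the captured value in the record hash passed as first argument.
my @rules = (
    [ qr/^Query=\s*(\S+)/,          sub { $_[0]{query}    = $_[1] } ],
    [ qr/^Database:\s*(.+?)\s*$/,   sub { $_[0]{database} = $_[1] } ],
    [ qr/^\s*\(([\d,]+) letters\)/, sub { ($_[0]{length} = $_[1]) =~ tr/,//d } ],
);

# Apply the first matching rule to each line; unmatched lines are skipped.
sub parse_lines {
    my ($rules, @lines) = @_;
    my %record;
    LINE: for my $line (@lines) {
        for my $rule (@$rules) {
            my ($re, $action) = @$rule;
            if (my @captures = $line =~ $re) {
                $action->(\%record, @captures);
                next LINE;
            }
        }
    }
    return \%record;
}

my $rec = parse_lines(\@rules,
    "Query= test_seq",
    "         (1,234 letters)",
    "Database: swissprot",
);
print "$_ = $rec->{$_}\n" for sort keys %$rec;
```

A generator could emit a parser subclass from such a table, so that
supporting a new format would mostly mean writing the table.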


The code
--------

MView:
  http://columba.ebi.ac.uk:8765/mview/

Parser stuff (anon ftp):
  www.sander.ebi.ac.uk:/pub/nige/Parse.tar.gz


Example test script
-------------------

This test script parses PSI-BLAST output, whose subrecords are nested like this:

HEADER
RANKING
SEARCH
  PROTEIN
    PSUM
    PHIT
    PHIT
    PHIT
    ...
  PROTEIN
    PSUM
    PHIT
    PHIT
    ...
SEARCH
...
PARAMETERS
WARNINGS


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#!/usr/bin/env perl5

$^W = 1;

use strict;

use lib '/people/nbrown/work/perl/lib';
use Universal;
use Parse::Search::BLAST2;

my @datfiles = qw(test.bp2);

@datfiles = @ARGV if @ARGV;

my ($file, $entry, $ob, $search, $frag);


foreach $file (@datfiles) {
    open(DATA, "< $file") or die "can't open $file: $!\n";
    while ($entry = Parse::Search::BLAST2::scan_entry($file, *DATA)) {
	
	$entry->print; print "\n";
	
	print $entry->string('Header'); print "\n";
	
	print "COUNTS:  ", join(", ", $entry->count), "\n";
	if ($entry->count('WARNING')) {
	    foreach $ob ($entry->parse('WARNING')) {
		$ob->print; print "\n";
	    }
	}
	
	foreach $ob ($entry->parse('HEADER')) {
	    $ob->print; print "\n";
	    #print $ob->string; print "\n";
	} 
	
	foreach $search ($entry->parse('SEARCH')) {
	    $search->print; print "\n";
	    
	    foreach $ob ($search->parse('RANKING')) {
		#print $ob->string; print "\n";
		$ob->print; print "\n";
	    } 

	    foreach $ob ($search->parse('PROTEIN')) {
		$ob->print; print "\n";
		
		foreach $frag ($ob->parse('PSUM')) {
		    $frag->print; print "\n";
		}
		
		foreach $frag ($ob->parse('PHIT')) {
		    $frag->print; print "\n";
		}

	    }

	} 
	
	$entry->free;
    }
    close DATA;
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Enjoy!
n

-- 
---------------------------------------------------------------------------
 Nigel P. Brown, Ph.D.                               Nigel.Brown@ebi.ac.uk 
 http://www.sander.ebi.ac.uk/~brown/    Tel: +44 (0)1223 494 451  FAX: 468
 European Bioinformatics Institute,   Hinxton,    Cambridge CB10 1SD,   UK
---------------------------------------------------------------------------