[Biopython-dev] should we make a BLAT parser?

Brandon King kingb at caltech.edu
Thu Jul 7 14:30:23 EDT 2005

Hi Yair,
    I'm new to the developers list, but I do think it would be a great
idea to create a BLAT parser based on the NCBIStandalone module. I have
to do about a million BLATs soon. I have code for processing many BLAST
results from the NCBIStandalone module, but I don't have anything nearly
as good for BLAT. Being able to use the same analysis code for BLAST/BLAT
would be great (assuming the change you're talking about will return
result objects the same way that the NCBIStandalone module does?).

-Brandon King

Yair Benita wrote:

>I noticed a while ago that someone asked for a BLAT parser.
>I just had to do a few thousand BLATs and I didn't really like the psl
>output format it uses. It is a bit confusing in my opinion. So I used the
>blast-like output and with minor changes to the NCBIStandalone module I was
>able to parse it with no problems.
>Should we introduce modifications in the NCBIStandalone file or make a new
>separate file for parsing BLAT output?
>The main changes are in the header and footer of the file. I append examples
>below. There were a few other minor changes.
>----- header blat ------
>BLASTN 2.2.4 [blat]
>Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
>----- header blast ------
>BLASTX 2.2.6 [Apr-09-2003]
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>"Gapped BLAST and PSI-BLAST: a new generation of protein database search
>programs",  Nucleic Acids Res. 25:3389-3402.
>----- footer blat ------
>  Database: localhost:4303
>----- footer blast ------
>  Database: nr
>    Posted date:  Aug 11, 2004  8:59 AM
>  Number of letters in database: 663,053,178
>  Number of sequences in database:  1,971,122
>Lambda     K      H
>   0.310    0.133    0.405
>Lambda     K      H
>   0.267   0.0410    0.140
>Matrix: BLOSUM62
>Gap Penalties: Existence: 11, Extension: 1
>Number of Hits to DB: 111,495,368
>Number of Sequences: 1971122
>Number of extensions: 811791
>Number of successful extensions: 2455
>Number of sequences better than 1.0e-01: 0
>Number of HSP's better than  0.1 without gapping: 2446
>Number of HSP's successfully gapped in prelim test: 0
>Number of HSP's that attempted gapping in prelim test: 0
>Number of HSP's gapped (non-prelim): 2455
>length of database: 663,053,178
>effective HSP length: 2
>effective length of database: 659,110,934
>effective search space used: 15818662416
>frameshift window, decay const: 50,  0.1
>T: 12
>A: 40
>X1: 16 ( 7.2 bits)
>X2: 38 (14.6 bits)
>X3: 64 (24.7 bits)
>S1: 42 (21.7 bits)
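
As a rough illustration of the header difference quoted above, here is a minimal sketch of how a parser might tell the two kinds of report apart from the first line alone. It assumes that BLAT's blast-like output always carries the literal tag "[blat]" where BLAST prints a build date, which is only what the two sample headers suggest. The names HEADER_RE and classify_header are hypothetical and are not part of NCBIStandalone.

```python
import re

# Matches a first line such as "BLASTN 2.2.4 [blat]" or
# "BLASTX 2.2.6 [Apr-09-2003]": program, version, bracketed tag.
# HEADER_RE and classify_header are hypothetical names for this sketch.
HEADER_RE = re.compile(
    r"^(?P<program>\S+)\s+(?P<version>\S+)\s+\[(?P<tag>[^\]]+)\]"
)


def classify_header(first_line):
    """Return (program, version, source); source is 'blat' or 'blast'.

    Assumption: a bracketed tag equal to "blat" marks BLAT output, and
    any other tag (normally a build date) marks genuine BLAST output.
    """
    match = HEADER_RE.match(first_line.strip())
    if match is None:
        raise ValueError("unrecognized header line: %r" % first_line)
    source = "blat" if match.group("tag").lower() == "blat" else "blast"
    return match.group("program"), match.group("version"), source


if __name__ == "__main__":
    for line in ("BLASTN 2.2.4 [blat]", "BLASTX 2.2.6 [Apr-09-2003]"):
        print(classify_header(line))
```

A check like this would let one entry point decide whether to expect the long BLAST footer or the single "Database:" line that BLAT writes, whichever way the module-versus-separate-file question is settled.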
