[Biopython-dev] should we make a BLAT parser?

Yair Benita y.benita at wanadoo.nl
Thu Jul 7 09:17:23 EDT 2005


I noticed a while ago that someone asked for a BLAT parser.
I just had to run a few thousand BLAT searches and I didn't really like the
psl output format it uses. It is a bit confusing in my opinion. So I used the
BLAST-like output instead, and with minor changes to the NCBIStandalone module
I was able to parse it with no problems.

Should we introduce the modifications into the NCBIStandalone file, or make a
new separate file for parsing BLAT output?

The main changes are in the header and footer of the file; I append examples
of both below. There were a few other minor changes.
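
To illustrate the header difference, here is a minimal sketch of how a parser
might tell the two kinds of output apart from the first line alone. It is not
the actual change made to NCBIStandalone; the function name classify_header and
the regular expression are hypothetical, and it assumes BLAT always writes the
literal tag [blat] where BLAST writes its build date.

```python
import re

# Hypothetical helper, not part of Bio.Blast.NCBIStandalone.
# Matches lines such as "BLASTN 2.2.4 [blat]" or "BLASTX 2.2.6 [Apr-09-2003]".
_HEADER_RE = re.compile(r"^(T?BLAST[NPX])\s+(\S+)\s+\[([^\]]+)\]")


def classify_header(first_line):
    """Return (program, version, source); source is 'blat' or 'blast'."""
    match = _HEADER_RE.match(first_line.strip())
    if match is None:
        raise ValueError("not a BLAST-like header: %r" % first_line)
    program, version, tag = match.groups()
    # Assumption: BLAT puts "blat" in the brackets, BLAST puts a build date.
    source = "blat" if tag.lower() == "blat" else "blast"
    return program, version, source
```

A parser could call this on the first line and, for 'blat', skip the checks
that expect the long Altschul reference block.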

Yair

----- header blat ------
BLASTN 2.2.4 [blat]

Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool

----- header blast ------
BLASTX 2.2.6 [Apr-09-2003]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

----- footer blat ------
  Database: localhost:4303

----- footer blast ------
  Database: nr
    Posted date:  Aug 11, 2004  8:59 AM
  Number of letters in database: 663,053,178
  Number of sequences in database:  1,971,122
  
Lambda     K      H
   0.310    0.133    0.405

Gapped
Lambda     K      H
   0.267   0.0410    0.140


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 111,495,368
Number of Sequences: 1971122
Number of extensions: 811791
Number of successful extensions: 2455
Number of sequences better than 1.0e-01: 0
Number of HSP's better than  0.1 without gapping: 2446
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 2455
length of database: 663,053,178
effective HSP length: 2
effective length of database: 659,110,934
effective search space used: 15818662416
frameshift window, decay const: 50,  0.1
T: 12
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (21.7 bits)
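
The footer difference suggests that a shared parser should treat every
statistic after the Database line as optional. The sketch below is only an
illustration under that assumption, not the NCBIStandalone code: parse_footer
is a hypothetical name, it collects simple "key: value" lines into a dict, and
it deliberately ignores the Lambda/K/H tables, which have no colon.

```python
def parse_footer(lines):
    """Collect 'key: value' footer fields; absent statistics are just missing."""
    fields = {}
    for line in lines:
        # Split on the first colon only, so "localhost:4303" stays intact.
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def has_statistics(fields):
    """True when the footer carries BLAST-style search statistics."""
    return "Matrix" in fields or "Number of letters in database" in fields
```

With the BLAT footer above, parse_footer gives only a Database entry and
has_statistics is false, so the caller can skip the statistics section instead
of raising an error.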



