[Biopython-dev] should we make a BLAT parser?
Brandon King
kingb at caltech.edu
Thu Jul 7 20:44:37 EDT 2005
FYI, I just looked through my code and realized I had already written a BLAT
parser that loads the data into simple objects. I'm not sure whether it would
be useful, but if you're interested, I can tell you more about it.
-Brandon King
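Brandon's parser itself is not shown in the thread. As a hedged sketch only, assuming the input is BLAT's default PSL output (one hit per line, 21 tab-separated columns), loading hits into simple objects could look like the following. `PslHit` and `parse_psl` are hypothetical names, and only a subset of the PSL columns is kept.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Hypothetical illustration, not Brandon's code: a simple object per PSL row.


@dataclass
class PslHit:
    """One PSL alignment row; only some of the 21 columns are kept."""
    matches: int
    mismatches: int
    strand: str
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str
    t_size: int
    t_start: int
    t_end: int
    block_sizes: List[int]


def _int_list(field: str) -> List[int]:
    # PSL list columns are comma-separated with a trailing comma, e.g. "60,42,"
    return [int(x) for x in field.split(",") if x]


def parse_psl(lines: Iterable[str]) -> Iterator[PslHit]:
    """Yield a PslHit for each data line, skipping header and blank lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have exactly 21 columns and start with the numeric match count.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        yield PslHit(
            matches=int(fields[0]),
            mismatches=int(fields[1]),
            strand=fields[8],
            q_name=fields[9],
            q_size=int(fields[10]),
            q_start=int(fields[11]),
            q_end=int(fields[12]),
            t_name=fields[13],
            t_size=int(fields[14]),
            t_start=int(fields[15]),
            t_end=int(fields[16]),
            block_sizes=_int_list(fields[18]),
        )
```

Passing an open file handle to `parse_psl` would yield one `PslHit` per alignment line; lines such as `psLayout version 3` and the column-title rows are skipped because they lack 21 columns led by a number.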
Yair Benita wrote:
> Since the only differences are in the header/footer and some spaces
> and numbers, parsing it is essentially the same as parsing BLAST output.
> Tomorrow I will post all the changes needed. On my machine I just
> made a copy of NCBIStandalone and modified it to fit the BLAT
> output, but the correct way to do this is to modify the original
> NCBIStandalone to handle all these outputs. The thing is, I don't
> fully understand how this parser works (with all those uhandles,
> scanners, consumers, etc.), so I'd rather someone who does make the
> changes in CVS.
>
> Yair
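To make the uhandle/scanner/consumer terms more concrete, here is a hypothetical, heavily simplified sketch of the pattern. It is not the real NCBIStandalone code. The scanner only recognises line types and calls event methods on a consumer, the consumer decides what to build, and the "uhandle" is assumed to be an undo handle that lets the scanner push back a line it has read but does not own. Under this design, supporting BLAT mostly means teaching the scanner the different header and footer lines.

```python
# Hypothetical illustration of the scanner/consumer split; these classes are
# NOT Biopython's NCBIStandalone API, just a minimal model of the idea.


class UndoLines:
    """Minimal line reader that can push lines back (an 'undo handle')."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._saved = []  # pushed-back lines, most recent last

    def readline(self):
        # Return a pushed-back line first; "" signals end of input.
        if self._saved:
            return self._saved.pop()
        return next(self._lines, "")

    def saveline(self, line):
        self._saved.append(line)


class Scanner:
    """Recognises line types and fires consumer events; builds no objects."""

    def feed(self, lines, consumer):
        uhandle = UndoLines(lines)
        self._scan_header(uhandle, consumer)
        self._scan_footer(uhandle, consumer)

    def _scan_header(self, uhandle, consumer):
        while True:
            line = uhandle.readline()
            if not line:
                return
            if line.startswith("Database:"):
                uhandle.saveline(line)  # not a header line: leave it for the footer
                return
            if line.startswith("BLAST"):
                consumer.version(line.strip())
            elif line.startswith("Reference:"):
                consumer.reference(line.strip())

    def _scan_footer(self, uhandle, consumer):
        line = uhandle.readline()
        while line:
            if line.startswith("Database:"):
                consumer.database(line.strip())
            line = uhandle.readline()


class DictConsumer:
    """Receives scanner events and builds the result, here just a dict."""

    def __init__(self):
        self.data = {}

    def version(self, line):
        self.data["version"] = line

    def reference(self, line):
        # Only the first line of a multi-line reference reaches this method.
        self.data["reference"] = line.split(":", 1)[1].strip()

    def database(self, line):
        self.data["database"] = line.split(":", 1)[1].strip()
```

Feeding the BLAT header and footer lines quoted in this thread to `Scanner().feed(lines, DictConsumer())` would fill the consumer's dict with the version, reference and database; a different consumer could build full result objects from the same events without touching the scanner.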
>
> On Jul 7, 2005, at 20:30, Brandon King wrote:
>
>> Hi Yair,
>> I'm new to the developers list, but I do think it would be a great
>> idea to create a BLAT parser based on the NCBIStandalone module. I have
>> to do about a million BLATs soon. I have code for processing many BLAST
>> results from the NCBIStandalone, but I don't have anything nearly as
>> good for BLAT. Being able to use the same analysis code for BLAST/BLAT
>> would be great (assuming the change you're talking about will return
>> result objects the same way the NCBIStandalone module does).
>>
>> -Brandon King
>>
>> Yair Benita wrote:
>>
>>
>>> I noticed a while ago that someone asked for a BLAT parser.
>>> I just had to do a few thousand BLATs and I didn't really like the
>>> PSL output format it uses. It is a bit confusing in my opinion. So I
>>> used the BLAST-like output instead, and with minor changes to the
>>> NCBIStandalone module I was able to parse it with no problems.
>>>
>>> Should we modify the NCBIStandalone file, or make a new
>>> separate file for parsing BLAT output?
>>>
>>> The main changes are in the header and footer of the file. I append
>>> examples below. There were a few other minor changes.
>>>
>>> Yair
>>>
>>> ----- header blat ------
>>> BLASTN 2.2.4 [blat]
>>>
>>> Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
>>>
>>> ----- header blast ------
>>> BLASTX 2.2.6 [Apr-09-2003]
>>>
>>>
>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
>>> programs", Nucleic Acids Res. 25:3389-3402.
>>>
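In the two headers above, the BLAT output still reports a `BLASTN` program name, so the bracketed tag (`[blat]` versus a release date) is what tells the formats apart. A minimal sketch, assuming the first header line always has the form `PROGRAM VERSION [TAG]`; `parse_header_line` is a hypothetical helper, not part of NCBIStandalone.

```python
import re

# Hypothetical helper: split the first header line into program, version and
# the bracketed tag. BLAT's tag is "blat"; BLAST's is a release date.
_HEADER_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]")


def parse_header_line(line):
    """Return (program, version, tag, is_blat), or None if the line doesn't match."""
    m = _HEADER_RE.match(line.strip())
    if m is None:
        return None
    program, version, tag = m.groups()
    return program, version, tag, tag.lower() == "blat"
```

Keying on the tag rather than the program name keeps the check valid even though BLAT imitates BLASTN's first line.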
>>> ----- footer blat ------
>>> Database: localhost:4303
>>>
>>> ----- footer blast ------
>>> Database: nr
>>> Posted date: Aug 11, 2004 8:59 AM
>>> Number of letters in database: 663,053,178
>>> Number of sequences in database: 1,971,122
>>>
>>> Lambda K H
>>> 0.310 0.133 0.405
>>>
>>> Gapped
>>> Lambda K H
>>> 0.267 0.0410 0.140
>>>
>>>
>>> Matrix: BLOSUM62
>>> Gap Penalties: Existence: 11, Extension: 1
>>> Number of Hits to DB: 111,495,368
>>> Number of Sequences: 1971122
>>> Number of extensions: 811791
>>> Number of successful extensions: 2455
>>> Number of sequences better than 1.0e-01: 0
>>> Number of HSP's better than 0.1 without gapping: 2446
>>> Number of HSP's successfully gapped in prelim test: 0
>>> Number of HSP's that attempted gapping in prelim test: 0
>>> Number of HSP's gapped (non-prelim): 2455
>>> length of database: 663,053,178
>>> effective HSP length: 2
>>> effective length of database: 659,110,934
>>> effective search space used: 15818662416
>>> frameshift window, decay const: 50, 0.1
>>> T: 12
>>> A: 40
>>> X1: 16 ( 7.2 bits)
>>> X2: 38 (14.6 bits)
>>> X3: 64 (24.7 bits)
>>> S1: 42 (21.7 bits)
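Because the BLAT footer carries only the `Database:` line, a parser shared by both formats has to treat every BLAST statistic as optional. A hedged sketch under that assumption; `parse_footer` and `database_letters` are hypothetical helper names, not the NCBIStandalone API.

```python
# Hypothetical helpers: collect "Key: value" footer lines, then read optional
# statistics that may be absent (as they are in BLAT's footer).


def parse_footer(lines):
    """Map each 'Key: value' footer line to a dict entry, splitting at the first colon."""
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def database_letters(fields):
    """Database size in letters, or None when the footer omits it."""
    raw = fields.get("Number of letters in database")
    return int(raw.replace(",", "")) if raw is not None else None
```

Splitting at the first colon keeps values such as `localhost:4303` and `Aug 11, 2004 8:59 AM` intact, and lines without a colon (the Lambda/K/H tables) are simply ignored.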
>>>
>>>
>>> _______________________________________________
>>> Biopython-dev mailing list
>>> Biopython-dev at biopython.org
>>> http://biopython.org/mailman/listinfo/biopython-dev