From y.benita at wanadoo.nl  Thu Jul  7 09:17:23 2005
From: y.benita at wanadoo.nl (Yair Benita)
Date: Thu Jul  7 09:08:30 2005
Subject: [Biopython-dev] should we make a BLAT parser?
Message-ID: <BEF2F803.71D7%y.benita@wanadoo.nl>

I noticed a while ago that someone asked for a BLAT parser.
I just had to run a few thousand BLATs and I didn't really like the psl
output format it uses. It is a bit confusing in my opinion. So I used the
blast-like output instead, and with minor changes to the NCBIStandalone
module I was able to parse it with no problems.

Should we introduce the modifications in the NCBIStandalone file or make a
new separate file for parsing BLAT output?

The main changes are in the header and footer of the file. I append examples
below. There were a few other minor changes.

Yair

----- header blat ------
BLASTN 2.2.4 [blat]

Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool

----- header blast ------
BLASTX 2.2.6 [Apr-09-2003]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

----- footer blat ------
  Database: localhost:4303

----- footer blast ------
  Database: nr
    Posted date:  Aug 11, 2004  8:59 AM
  Number of letters in database: 663,053,178
  Number of sequences in database:  1,971,122
  
Lambda     K      H
   0.310    0.133    0.405

Gapped
Lambda     K      H
   0.267   0.0410    0.140


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 111,495,368
Number of Sequences: 1971122
Number of extensions: 811791
Number of successful extensions: 2455
Number of sequences better than 1.0e-01: 0
Number of HSP's better than  0.1 without gapping: 2446
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 2455
length of database: 663,053,178
effective HSP length: 2
effective length of database: 659,110,934
effective search space used: 15818662416
frameshift window, decay const: 50,  0.1
T: 12
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (21.7 bits)
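
The header difference shown above can be sketched in a few lines of
Python. This is only an illustration under assumptions: the helper name
describe_header and the regular expression are hypothetical, they are not
part of NCBIStandalone, and they are not the changes described in this
message (those are not shown here). The sketch only reads the first
header line and reports whether the bracketed tag is "blat".

```python
import re

# Hypothetical helper, not part of Bio.Blast.NCBIStandalone.
# Matches the first header line of a blast-like report, e.g.
#   "BLASTN 2.2.4 [blat]"  or  "BLASTX 2.2.6 [Apr-09-2003]"
HEADER_RE = re.compile(
    r"^(?P<program>T?BLAST[NPX])\s+(?P<version>\S+)\s+\[(?P<tag>[^\]]+)\]"
)


def describe_header(first_line):
    """Return (program, version, is_blat) for a blast-like header line."""
    match = HEADER_RE.match(first_line.strip())
    if match is None:
        raise ValueError("not a BLAST-like header: %r" % first_line)
    # BLAT puts the literal word "blat" where NCBI BLAST puts a build date.
    is_blat = match.group("tag").lower() == "blat"
    return match.group("program"), match.group("version"), is_blat
```

A parser could use the is_blat flag to skip the long reference and
statistics blocks that BLAT does not write.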


From kingb at caltech.edu  Thu Jul  7 14:30:23 2005
From: kingb at caltech.edu (Brandon King)
Date: Thu Jul  7 14:22:21 2005
Subject: [Biopython-dev] should we make a BLAT parser?
In-Reply-To: <BEF2F803.71D7%y.benita@wanadoo.nl>
References: <BEF2F803.71D7%y.benita@wanadoo.nl>
Message-ID: <42CD74BF.1050200@caltech.edu>

Hi Yair,
    I'm new to the developers list, but I do think it would be a great
idea to create a BLAT parser based on the NCBIStandalone module. I have
to do about a million BLATs soon. I have code for processing many BLAST
results from the NCBIStandalone, but I don't have anything nearly as
good for BLAT. Being able to use the same analysis code for BLAST/BLAT
would be great (assuming the change you're talking about will return
result objects the same way the NCBIStandalone module does?).

-Brandon King


From y.benita at wanadoo.nl  Thu Jul  7 18:31:05 2005
From: y.benita at wanadoo.nl (Yair Benita)
Date: Thu Jul  7 18:22:06 2005
Subject: [Biopython-dev] should we make a BLAT parser?
In-Reply-To: <42CD74BF.1050200@caltech.edu>
References: <BEF2F803.71D7%y.benita@wanadoo.nl> <42CD74BF.1050200@caltech.edu>
Message-ID: <E62EADD4-3FA9-470B-8B12-2313D3F99E46@wanadoo.nl>

Since the only differences are in the header/footer and some spaces
and numbers, it is essentially just like parsing BLAST output.
Tomorrow I will post all the changes needed. On my machine I just
made a copy of NCBIStandalone and modified it to fit the BLAT
output, but the correct way to do this is to modify the original
NCBIStandalone to handle all these outputs. The thing is, I don't
fully understand how this parser works (with all those uhandles,
scanners, consumers, etc.), so I would rather have someone who does
make the changes in CVS.

Yair
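
For readers unfamiliar with the scanner/consumer idea mentioned above,
here is a toy sketch of how one footer scanner could accept both the
one-line BLAT footer and the long NCBI BLAST footer. Everything in it is
an assumption made for illustration: the names FooterConsumer and
scan_footer are made up, and this is not the real NCBIStandalone API or
the actual modification discussed in this thread.

```python
# Toy illustration only: these class and function names are hypothetical
# and do not correspond to the real NCBIStandalone scanner/consumer classes.

class FooterConsumer:
    """Collects footer events into a plain dict."""

    def __init__(self):
        self.footer = {}

    def database(self, name):
        self.footer["database"] = name

    def statistic(self, key, value):
        self.footer[key] = value


def scan_footer(lines, consumer):
    """Walk footer lines and fire one consumer event per recognised line.

    Only the "Database:" line is required, so a BLAT footer (one line)
    and an NCBI BLAST footer (many lines) go through the same code.
    Unrecognised lines are ignored.
    """
    seen_database = False
    for line in lines:
        text = line.strip()
        if text.startswith("Database:"):
            consumer.database(text.split(":", 1)[1].strip())
            seen_database = True
        elif text.startswith("Matrix:"):
            consumer.statistic("matrix", text.split(":", 1)[1].strip())
        elif text.startswith("Number of sequences in database:"):
            count = text.split(":", 1)[1].replace(",", "").strip()
            consumer.statistic("num_sequences", int(count))
    if not seen_database:
        raise ValueError("footer has no Database: line")
```

The point of the split is that the scanner only recognises lines, while
the consumer decides what to store, so optional lines can be missing
without changing the objects handed back to the caller.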


From kingb at caltech.edu  Thu Jul  7 20:44:37 2005
From: kingb at caltech.edu (Brandon King)
Date: Thu Jul  7 20:35:38 2005
Subject: [Biopython-dev] should we make a BLAT parser?
In-Reply-To: <E62EADD4-3FA9-470B-8B12-2313D3F99E46@wanadoo.nl>
References: <BEF2F803.71D7%y.benita@wanadoo.nl> <42CD74BF.1050200@caltech.edu>
	<E62EADD4-3FA9-470B-8B12-2313D3F99E46@wanadoo.nl>
Message-ID: <42CDCC75.4000705@caltech.edu>

FYI, I just looked at my code and realized I wrote a BLAT parser that
loads the data into simple objects. I don't know if that might be
useful? If you're interested, I can tell you more about it.

-Brandon King
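
The parser mentioned above is not shown in this thread, so the following
is only a guess at what "simple objects" could look like. It assumes the
input is BLAT's tab-separated psl output with its standard 21 columns;
the names PslHit and parse_psl_line are hypothetical.

```python
from collections import namedtuple

# Field names follow the 21-column psl layout written by BLAT;
# the Python-style spellings are this sketch's own choice.
PSL_FIELDS = (
    "matches mismatches rep_matches n_count "
    "q_num_insert q_base_insert t_num_insert t_base_insert "
    "strand q_name q_size q_start q_end "
    "t_name t_size t_start t_end "
    "block_count block_sizes q_starts t_starts"
).split()

PslHit = namedtuple("PslHit", PSL_FIELDS)

# The last three columns are comma-separated lists with a trailing comma.
_LIST_FIELDS = {"block_sizes", "q_starts", "t_starts"}
_TEXT_FIELDS = {"strand", "q_name", "t_name"}


def parse_psl_line(line):
    """Turn one psl data line (no header lines) into a PslHit."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(PSL_FIELDS):
        raise ValueError("expected %d columns, got %d"
                         % (len(PSL_FIELDS), len(columns)))
    values = []
    for name, raw in zip(PSL_FIELDS, columns):
        if name in _LIST_FIELDS:
            values.append([int(x) for x in raw.split(",") if x])
        elif name in _TEXT_FIELDS:
            values.append(raw)
        else:
            values.append(int(raw))
    return PslHit(*values)
```

Each hit is then an ordinary tuple with named attributes, for example
hit.q_name or hit.block_sizes, which is easy to feed into existing
analysis code.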


From hoffman at ebi.ac.uk  Wed Jul 13 03:58:38 2005
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Wed Jul 13 03:49:26 2005
Subject: [Biopython-dev] C++ extension discussion on python-dev
Message-ID: <Pine.LNX.4.62.0507130857280.24177@qnzvnan.rov.np.hx>

Just FYI (and I'm sure some of you already know) there has been a lot
of discussion of C++ extensions and distutils on python-dev recently,
which might be useful in shedding some light on the issues we have
with the same.

Or you can just wait for the inevitable changes to distutils. ;)
-- 
Michael Hoffman <hoffman@ebi.ac.uk>
European Bioinformatics Institute

From jpaint at u.washington.edu  Fri Jul 29 20:14:05 2005
From: jpaint at u.washington.edu (Jay Painter)
Date: Fri Jul 29 20:05:05 2005
Subject: [Biopython-dev] mmLib/Biopython
Message-ID: <1122682445.8441.35.camel@zen>

Hi all,

I see there's been a small discussion about merging mmLib into
BioPython.  I'd personally really like to see that happen, so if there
is any interest from the BioPython developers, we could start working on
a plan to do so.  The only way such a merge would really be practical is
to have both the current BioPython structure modules and mmLib modules
be independent libraries at first, and then slowly integrate features of
the two over a number of release cycles.  I can't really have any of my
applications breaking, and I'm sure there are plenty of BioPython users
who would prefer not to have to change their code either.

Peace,
Jay Painter