From y.benita at wanadoo.nl Thu Jul 7 09:17:23 2005
From: y.benita at wanadoo.nl (Yair Benita)
Date: Thu Jul 7 09:08:30 2005
Subject: [Biopython-dev] should we make a BLAT parser?
Message-ID:

I noticed a while ago that someone asked for a BLAT parser.
I just had to do a few thousand BLATs and I didn't really like the psl
output format it used. It is a bit confusing in my opinion. So I used the
blast-like output, and with minor changes to the NCBIStandalone module I was
able to parse it with no problems.

Should we introduce modifications in the NCBIStandalone file or make a new
separate file for parsing BLAT output?

The main changes are in the header and footer of the file. I append examples
below. There were a few other minor changes.

Yair

----- header blat ------
BLASTN 2.2.4 [blat]

Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool

----- header blast ------
BLASTX 2.2.6 [Apr-09-2003]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

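The two header samples above differ in the bracketed token on the first line: BLAT writes the literal word "blat" where NCBI BLAST writes a build date. A parser that wants to accept both could branch on that token. The following is only a minimal sketch under that assumption; `classify_header` and `HEADER_RE` are hypothetical names for illustration and are not part of the NCBIStandalone module.

```python
import re

# Hypothetical helper for illustration; not part of Bio.Blast.NCBIStandalone.
# Matches lines such as "BLASTN 2.2.4 [blat]" or "BLASTX 2.2.6 [Apr-09-2003]".
HEADER_RE = re.compile(
    r"^(?P<program>T?BLAST[NPX])\s+(?P<version>\S+)\s+\[(?P<tag>[^\]]+)\]"
)


def classify_header(first_line):
    """Return (program, version, source) for a BLAST-like header line.

    source is "blat" when the bracketed token is the word "blat",
    and "blast" otherwise (assumed here to be an NCBI build date).
    """
    match = HEADER_RE.match(first_line.strip())
    if match is None:
        raise ValueError("not a BLAST-like header line: %r" % first_line)
    source = "blat" if match.group("tag").lower() == "blat" else "blast"
    return match.group("program"), match.group("version"), source
```

With a check like this, the header scanner could skip the single-line BLAT reference instead of expecting the multi-line Altschul et al. citation, while the rest of the report is handled by the same code path.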
----- footer blat ------
  Database: localhost:4303

----- footer blast ------
  Database: nr
    Posted date: Aug 11, 2004 8:59 AM
  Number of letters in database: 663,053,178
  Number of sequences in database: 1,971,122

Lambda     K        H
   0.310    0.133    0.405

Gapped
Lambda     K        H
   0.267    0.0410   0.140


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 111,495,368
Number of Sequences: 1971122
Number of extensions: 811791
Number of successful extensions: 2455
Number of sequences better than 1.0e-01: 0
Number of HSP's better than 0.1 without gapping: 2446
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 2455
length of database: 663,053,178
effective HSP length: 2
effective length of database: 659,110,934
effective search space used: 15818662416
frameshift window, decay const: 50, 0.1
T: 12
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (21.7 bits)

From kingb at caltech.edu Thu Jul 7 14:30:23 2005
From: kingb at caltech.edu (Brandon King)
Date: Thu Jul 7 14:22:21 2005
Subject: [Biopython-dev] should we make a BLAT parser?
In-Reply-To:
References:
Message-ID: <42CD74BF.1050200@caltech.edu>

Hi Yair,
I'm new to the developers list, but I do think it would be a great
idea to create a BLAT parser based on the NCBIStandalone module. I have
to do about a million BLATs soon. I have code for processing many BLAST
results from the NCBIStandalone, but I don't have anything nearly as
good for BLAT. Being able to use the same analysis code for BLAST/BLAT
would be great (assuming the change you're talking about will return
result objects the same way that you can with the NCBIStandalone
module?).

-Brandon King

From y.benita at wanadoo.nl Thu Jul 7 18:31:05 2005
From: y.benita at wanadoo.nl (Yair Benita)
Date: Thu Jul 7 18:22:06 2005
Subject: [Biopython-dev] should we make a BLAT parser?
In-Reply-To: <42CD74BF.1050200@caltech.edu>
References: <42CD74BF.1050200@caltech.edu>
Message-ID:

Since the only differences are in the header/footer and some spaces
and numbers, it is essentially just like parsing a BLAST output.
Tomorrow I will post all the changes needed. On my machine I just
made a copy of the NCBIStandalone and modified it to fit the BLAT
output, but the correct way to do this is to modify the original
NCBIStandalone to handle all these outputs. The thing is I don't
fully understand how this parser works (with all those uhandles,
scanners, consumers, etc.), so I would rather someone who does make the
changes in the CVS.

Yair

From kingb at caltech.edu Thu Jul 7 20:44:37 2005
From: kingb at caltech.edu (Brandon King)
Date: Thu Jul 7 20:35:38 2005
Subject: [Biopython-dev] should we make a BLAT parser?
In-Reply-To:
References: <42CD74BF.1050200@caltech.edu>
Message-ID: <42CDCC75.4000705@caltech.edu>

FYI, I just looked at my code and realized I wrote a BLAT parser that
loads the data into simple objects. I don't know if that might be
useful? If you're interested, I can tell you more about it.

-Brandon King

From hoffman at ebi.ac.uk Wed Jul 13 03:58:38 2005
From: hoffman at ebi.ac.uk (Michael Hoffman)
Date: Wed Jul 13 03:49:26 2005
Subject: [Biopython-dev] C++ extension discussion on python-dev
Message-ID:

Just FYI (and I'm sure some of you already know) there has been a lot
of discussion of C++ extensions and distutils on python-dev recently,
which might be useful in shedding some light on the issues we have
with the same.

Or you can just wait for the inevitable changes to distutils. ;)
--
Michael Hoffman
European Bioinformatics Institute

From jpaint at u.washington.edu Fri Jul 29 20:14:05 2005
From: jpaint at u.washington.edu (Jay Painter)
Date: Fri Jul 29 20:05:05 2005
Subject: [Biopython-dev] mmLib/Biopython
Message-ID: <1122682445.8441.35.camel@zen>

Hi all,

I see there's been a small discussion about merging mmLib into
BioPython. I'd personally really like to see that happen, so if there
is any interest from the BioPython developers, we could start working on
a plan to do so.

The only way such a merge would really be practical is to have both the
current BioPython structure modules and mmLib modules be independent
libraries at first, and then slowly integrate features of the two over a
number of release cycles. I can't really have any of my applications
breaking, and I'm sure there are plenty of BioPython users who would
prefer not to have to change their code either.

Peace,
Jay Painter