From hase at umbc.edu Mon Aug 1 23:27:54 2005
From: hase at umbc.edu (HASE)
Date: Mon Aug 1 23:18:09 2005
Subject: [Biopython-dev] Bioinformatics Software Development Survey
Message-ID: <2010.68.49.173.177.1122953274.squirrel@68.49.173.177>

Hello,

As part of our research at UMBC, we are studying the characteristics of
software development in the bioinformatics domain. We believe that this
study should be guided by the people who are actively involved in
bioinformatics. This research is our first step towards enabling the
production of high quality bioinformatics software with less time and
effort. Therefore, your feedback is very important to us.

We seek your input in the form of a survey questionnaire that will take
around 15 minutes of your time. We solicit general demographic
information, information about the products that you have developed,
your work practices, and your software development process. So, if you
are a bioinformatics professional doing software development or a
software developer working in the bioinformatics domain, please provide
us with your valuable input.

We assure you that this information will be used only for academic
purposes and will be completely confidential.

Please follow the link below to start the survey:
http://www.is.umbc.edu/bio-survey/

We appreciate your participation in advance.

Regards,
HASE (Human Aspects of Software Engineering)
1000 Hilltop Circle
Department of Information Systems
University of Maryland Baltimore County
Baltimore, MD, 21250
hase@umbc.edu

From y.benita at wanadoo.nl Mon Aug 8 10:49:16 2005
From: y.benita at wanadoo.nl (Yair Benita)
Date: Mon Aug 8 10:39:13 2005
Subject: [Biopython-dev] comments on BLAT parser
Message-ID:

Hi All,
Jeff Chang and I made a few changes to the NCBIStandalone module and you
may now use it to parse BLAT output. Just a few comments on that:

1. BLAT can be run either using the BLAT program or the gfServer
gfClient programs. The testing was done using gfServer-gfClient
version 32.

2. Use the option -out=blast to get the output file in BLAST format.

3. When using BLAT to compare a DNA query to a DNA database, everything
works perfectly well. However, when comparing a protein query to a
translated DNA database, there is a bug in the BLAST output. The subject
coordinates are wrong if the hit is on the opposite strand. This bug is
known and will be fixed in the next release of BLAT. For now, if you
compare proteins to a translated DNA database, use the psl format.

Below is an example for parsing the blat output (note that the query_end
and sbjct_end have also been added to the NCBIStandalone module).

Yair

##################################################
from Bio.Blast import NCBIStandalone

BlatFile = "blat_output.txt"
blast_out = open(BlatFile,'r')
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)


while 1:
    b_record = b_iterator.next()

    if b_record is None:
        break

    print "Query used:", b_record.query

    for hitX in b_record.alignments:
        print "\t Target: ", hitX.title

        for hspX in hitX.hsps:
            print "\t\tQuery location: %s to %s" % ( hspX.query_start,
                hspX.query_end)
            print "\t\ttarget location: %s to %s" % ( hspX.sbjct_start,
                hspX.sbjct_end)
            print "\t\tstrand:", hspX.strand
            print "\t\tscore: %s" % hspX.score
            print "\t\tbits: %s" % hspX.bits
            print "\t\texpect: %s" % hspX.expect
            print "\t\tidentity:", hspX.identities
            print "\t\t" + "-"*20

blast_out.close()
##################################################

From gvwilson at cs.utoronto.ca Tue Aug 9 09:12:58 2005
From: gvwilson at cs.utoronto.ca (Greg Wilson)
Date: Tue Aug 9 12:55:10 2005
Subject: [Biopython-dev] re: software skills course
Message-ID:

Hi,

I'm working with support from the Python Software Foundation to develop
an open source course on basic software development skills for people
with backgrounds in science and engineering. I have a beta version of
the course notes ready for review, and would like to pull in people in
sci&eng to look it over and give me feedback. If you know anyone who
fits this bill (particularly people who might be interested in following
along with a trial run of the course this fall), I'd be grateful for
pointers.

Thanks,
Greg Wilson

From kingb at caltech.edu Fri Aug 12 17:20:26 2005
From: kingb at caltech.edu (Brandon King)
Date: Fri Aug 12 17:10:42 2005
Subject: [Biopython-dev] comments on BLAT parser
In-Reply-To:
References:
Message-ID: <42FD129A.6010609@caltech.edu>

Hi Yair,
Thanks for the update! That should come in handy!

-Brandon King

Yair Benita wrote:

>Hi All,
>Jeff Chang and I made a few changes to the NCBIStandalone module and you may
>now use it to parse BLAT output. Just a few comments on that:
>
>1. BLAT can be run either using the BLAT program or the gfServer gfClient
>programs. The testing was done using gfServer-gfClient version 32.
>
>2. Use the option -out=blast to get the output file in BLAST format.
>
>3. When using BLAT to compare a DNA query to a DNA database, everything
>works perfectly well. However, when comparing a protein query to a
>translated DNA database, there is a bug in the BLAST output. The subject
>coordinates are wrong if the hit is on the opposite strand. This bug is
>known and will be fixed in the next release of BLAT. For now, if you compare
>proteins to a translated DNA database, use the psl format.
>
>Below is an example for parsing the blat output (note that the query_end and
>sbjct_end have also been added to the NCBIStandalone module).
>
>Yair
>
>##################################################
>from Bio.Blast import NCBIStandalone
>
>BlatFile = "blat_output.txt"
>blast_out = open(BlatFile,'r')
>b_parser = NCBIStandalone.BlastParser()
>b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)
>
>
>while 1:
>    b_record = b_iterator.next()
>
>    if b_record is None:
>        break
>
>    print "Query used:", b_record.query
>
>    for hitX in b_record.alignments:
>        print "\t Target: ", hitX.title
>
>        for hspX in hitX.hsps:
>            print "\t\tQuery location: %s to %s" % ( hspX.query_start,
>                hspX.query_end)
>            print "\t\ttarget location: %s to %s" % ( hspX.sbjct_start,
>                hspX.sbjct_end)
>            print "\t\tstrand:", hspX.strand
>            print "\t\tscore: %s" % hspX.score
>            print "\t\tbits: %s" % hspX.bits
>            print "\t\texpect: %s" % hspX.expect
>            print "\t\tidentity:", hspX.identities
>            print "\t\t" + "-"*20
>
>blast_out.close()
>##################################################
>
>
>
>_______________________________________________
>Biopython-dev mailing list
>Biopython-dev@biopython.org
>http://biopython.org/mailman/listinfo/biopython-dev
>

From dalke at dalkescientific.com Thu Aug 18 18:21:19 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu Aug 18 18:13:02 2005
Subject: [Biopython-dev] warning in NCBIWWW.py
Message-ID:

Using NCBIWWW.py I get

Bio/Blast/NCBIWWW.py:1070: UserWarning: qblast works only with blastn
and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")

I'm using blastp. Do I really need that warning?

Jeff commented

    # Is this warning useful? If the program is blastn or blastp, the
    # warning is not necessary because the program is supported. If
    # it is not, the script will raise an exception anyway.
    # - Jeff
    import warnings
    warnings.warn("qblast works only with blastn and blastp for now.")
    assert program == 'blastn' or program == 'blastp'

That should more likely be

    if program not in ("blastn", "blastp"):
        raise AssertionError("qblast only supports 'blastn' and 'blastp' "
                             "applications, not %r" % (program,))

assert statements are turned off if Python is run with -O, so I only use
them to double check internal invariants, not as a way to validate user
input.

Also, like someone mentioned earlier, qblast uses the lower-level
httplib instead of urllib/urllib2 to do the HTTP connection to qblast.
That doesn't work on sites that depend on proxies to talk to the outside
world.

                    Andrew
                    dalke@dalkescientific.com
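
The point about assert versus an explicit check can be sketched as a
small standalone helper. This is only an illustration under stated
assumptions, not the code in NCBIWWW.py: the names check_program and
SUPPORTED_PROGRAMS are hypothetical, and ValueError is swapped in for
the AssertionError suggested above because it is the more usual
exception for bad user input.

```python
# Hypothetical names for illustration; not taken from Bio.Blast.NCBIWWW.
SUPPORTED_PROGRAMS = ("blastn", "blastp")


def check_program(program):
    """Return program unchanged if supported, else raise ValueError."""
    if program not in SUPPORTED_PROGRAMS:
        # An explicit raise still runs under "python -O", whereas an
        # assert statement is stripped out in that mode.
        raise ValueError(
            "qblast only supports 'blastn' and 'blastp' "
            "applications, not %r" % (program,)
        )
    return program


if __name__ == "__main__":
    check_program("blastp")  # supported: no warning, no exception
    try:
        check_program("tblastn")
    except ValueError as err:
        print("rejected: %s" % err)
```

With a check like this a supported program passes silently, so the
unconditional warning is no longer needed, and an unsupported one fails
with a message that names the offending value.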