From jelle.feringa at ezct.net  Fri May  5 05:17:27 2006
From: jelle.feringa at ezct.net (Jelle Feringa / EZCT Architecture & Design Research)
Date: Fri, 5 May 2006 11:17:27 +0200
Subject: [Biopython-dev] issues compiling KDTree.cpp / win32
Message-ID: <009501c67024$baa13b20$0b01a8c0@JELLE>

Hi,

I know there have been some issues compiling KDTree.cpp and have made a brave
effort as well (not necessarily a trivial thing to do for the
compiler-challenged) but haven't been able to compile it successfully. Would
anyone more successful in compiling the _CKDTree.pyd module be so kind to
share it with me? I'm pretty desperate for a well-implemented kd-tree, to be
frank.

Apart from that, I'm putting together a python module for scripting Rhino, a
terrific nurbs CAD modeler. It would be great to include the kdtree module
into this module. Giving the appropriate credits, would I be allowed to do so?

Many thanks in advance,

-jelle

From mdehoon at c2b2.columbia.edu  Tue May  9 19:59:55 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Tue, 09 May 2006 16:59:55 -0700
Subject: [Biopython-dev] new website & documentation
Message-ID: <44612CFB.7030008@c2b2.columbia.edu>

Hi everybody,

As you may know, the Biopython website is transitioning to a new server of
the Open Bioinformatics Foundation. The OBF folks suggested that we use a
wiki-based website (just like bioperl and biojava) instead of quixote, which
we have been using so far. A prototype wiki-based Biopython website is now
running at http://biopython.open-bio.org/wiki/Biopython. Feel free to
contribute to this website as you see fit. Don't be shy.

Now the question is whether the complete Biopython documentation should be
wiki-based. For an example, see the Bioperl HOWTOs. This would make it easier
to update the Biopython documentation, and hopefully result in better and
more extensive documentation (which wouldn't hurt). On the downside, we'd
lose the PDFs.
(Or is it possible to generate PDFs from the wiki website? Wiki gurus, let us
know.) Another option would be to convert the generic Biopython documentation
to wiki (e.g., the tutorial), but keep specialized modules in the current
format.

Opinions?

--Michiel.

From biopython-dev at maubp.freeserve.co.uk  Wed May 10 10:24:17 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython Dev))
Date: Wed, 10 May 2006 15:24:17 +0100
Subject: [Biopython-dev] Biopython for GDS SOFT?
In-Reply-To:
References:
Message-ID: <4461F791.5020607@maubp.freeserve.co.uk>

Ramon Aragues wrote:
> Hi,
>
> I've seen your post on the biopython discussion list (from 2005) about
> GDS SOFT files.
>
> Has any work been done on it since then? Can I use biopython for
> parsing GDS SOFT files?

Yes, I checked in some revisions to the Bio/Geo files in Jan 2006, and it
seemed to work for me.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Geo/?cvsroot=biopython

I also added the examples the NCBI provided to document their 2005 file
format changes to the BioPython GEO test suite. I was only interested in some
very basic exploration at the time, and then moved on to using Sean Davis'
GEOquery for R/BioConductor instead. I found this made statistical analysis
and visualisation much easier. In any event, looking at GDS SOFT files was
rather an aside to my main research interests...

> I am already using biopython in my framework ( http://sbi.imim.es/piana
> ) so it would be great if I can use biopython for this as well.

This should be possible once you update the Bio/Geo files to those from CVS
or the next release of BioPython. Please let us (me) know how you get on...
However, I think you will be on your own in terms of statistical data
analysis with python.

> Cheers!
>
> Ramon

Good luck

Peter

From bill at barnard-engineering.com  Mon May 15 12:22:30 2006
From: bill at barnard-engineering.com (Bill Barnard)
Date: Mon, 15 May 2006 09:22:30 -0700
Subject: [Biopython-dev] "Online" tests, was [Bug 1972]
In-Reply-To: <442D93EE.9040707@maubp.freeserve.co.uk>
References: <200603210825.k2L8Pg3S006352@portal.open-bio.org>
	<441FEEE6.8040402@maubp.freeserve.co.uk>
	<1143045514.813.26.camel@tioga.barnard-engineering.com>
	<442403DE.7090607@biopython.org>
	<1143750836.5736.27.camel@tioga.barnard-engineering.com>
	<442D93EE.9040707@maubp.freeserve.co.uk>
Message-ID: <1147710150.1517.37.camel@tioga.barnard-engineering.com>

On Fri, 2006-03-31 at 21:41 +0100, Peter (BioPython Dev) wrote:
> Bill Barnard wrote:
> > I've made a first cut unit test, tentatively named
> > test_Parsers_for_newest_formats, ...
>
> Sounds good to me. But not a very snappy name - how about something
> shorter like test_OnlineFormats.py instead?

Works for me.

> I think the Blast test should actually submit a short protein/nucleotide
> sequence known to be in the online database. Maybe do some basic sanity
> testing like check it returns at least N results and the best hit is at
> least a certain score.

I have one such test working.

> >>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded
> >>there are multiple parsers to be checked (e.g. record and feature parsers).

I have tests using EUtils that check the record and feature parsers for
GenBank and Fasta files.

> I'll volunteer to add cases for GenBank, Fasta and GEO files.

I've not yet looked at GEO files, so there's that yet to do. I expect to have
some free time soon so I may look at the GEO and other parsers at that time.
I'd certainly be interested in any feedback you have on these tests, or to
see additional test cases. Finding the parsers to test and implementing the
tests seems pretty straightforward.
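[Editorial note: the retrieve-then-parse round-trip these tests exercise -
fetch a record by ID, parse it, check the ID survives - can be sketched
without the network. This is a minimal, self-contained illustration using a
canned FASTA string; the GI number and sequence are made up, and the
hand-rolled parser merely stands in for the Bio.Fasta parsers used in the
real test suite.]

    # Sketch of the fetch -> parse -> sanity-check pattern, offline.
    def parse_fasta_record(text):
        """Split a single FASTA record into (title, sequence)."""
        lines = text.strip().splitlines()
        title = lines[0][1:]           # drop the leading '>'
        sequence = ''.join(lines[1:])  # join the wrapped sequence lines
        return title, sequence

    # Canned text standing in for an efetch(rettype="fasta") download.
    canned = """>gi|12345|ref|XP_000001| hypothetical protein
    MKTAYIAKQR
    QISFVKSHFS
    """

    title, seq = parse_fasta_record(canned)
    # Same style of check as the test suite: the requested ID round-trips.
    assert "12345" in title.split('|')
    assert seq.replace(' ', '') == "MKTAYIAKQRQISFVKSHFS"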
I've attached the test code to this email, and added a .txt file extension
just in case the listserv won't allow attaching a code file.

Bill

p.s. I'm still looking for a more interesting project I can contribute to
Biopython. Doing some of the tests and such is useful to me in learning my
way around the codebase; I hope to find something requiring a bit more
creativity. If anyone has any suggestions about areas that need some
attention I'll happily consider them. I am considering writing some code for
HMMs that would build on some prototype code I wrote.

-------------- next part --------------
"""Test to make sure all parsers retrieve and read current data formats.
"""
import requires_internet
import sys
import unittest

from Bio import ParserSupport
# ExpasyTest
from Bio import Prosite
from Bio.Prosite import Prodoc
from Bio.SwissProt import SProt
# PubMedTest
from Bio import PubMed, Medline
# EUtilsTest
from Bio import EUtils
from Bio.EUtils import DBIdsClient
from Bio import GenBank
from Bio import Fasta
# BlastTest
try:
    import cStringIO as StringIO
except ImportError:
    import StringIO
from Bio.Blast import NCBIWWW, NCBIXML

def run_tests(argv):
    test_suite = testing_suite()
    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    runner.run(test_suite)

def testing_suite():
    """Generate the suite of tests.
    """
    test_suite = unittest.TestSuite()
    test_loader = unittest.TestLoader()
    test_loader.testMethodPrefix = 't_'
    tests = [ExpasyTest, PubMedTest, EUtilsTest, BlastTest]
    for test in tests:
        cur_suite = test_loader.loadTestsFromTestCase(test)
        test_suite.addTest(cur_suite)
    return test_suite

class ExpasyTest(unittest.TestCase):
    """Test that parsers can read the current Expasy database formats
    """
    def setUp(self):
        pass

    def t_parse_prosite_record(self):
        """Retrieve a Prosite record and parse it
        """
        prosite_dict = Prosite.ExPASyDictionary(parser=Prosite.RecordParser())
        accession = 'PS00159'
        entry = prosite_dict[accession]
        self.assertEqual(entry.accession, accession)

    def t_parse_prodoc_record(self):
        """Retrieve a Prodoc record and parse it
        """
        prodoc_dict = Prodoc.ExPASyDictionary(parser=Prodoc.RecordParser())
        accession = 'PDOC00933'
        entry = prodoc_dict[accession]
        self.assertEqual(entry.accession, accession)

    def t_parse_sprot_record(self):
        """Retrieve a SwissProt record and parse it into Record format
        """
        sprot_record_dict = SProt.ExPASyDictionary(parser=SProt.RecordParser())
        accession = 'Q5TYW8'
        entry = sprot_record_dict[accession]
        self.failUnless(accession in entry.accessions)

    def t_parse_sprot_seq(self):
        """Retrieve a SwissProt record and parse it into Sequence format
        """
        sprot_seq_dict = SProt.ExPASyDictionary(parser=SProt.SequenceParser())
        accession = 'Q5TYW8'
        entry = sprot_seq_dict[accession]
        self.assertEqual(entry.id, accession)

class PubMedTest(unittest.TestCase):
    """Test that Medline parsers can read the current PubMed format
    """
    def setUp(self):
        pass

    def t_parse_pubmed_record(self):
        """Retrieve a PubMed record and parse it
        """
        pubmed_dict = PubMed.Dictionary(parser=Medline.RecordParser())
        pubmed_id = '3136164'
        entry = pubmed_dict[pubmed_id]
        self.assertEqual(entry.pubmed_id, pubmed_id)

class EUtilsTest(unittest.TestCase):
    """Test that GenBank, Fasta parsers can read EUtils retrieved db formats
    """
    # Database primary IDs
    # http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?
    # lists all E-Utility database names. Primary IDs, which are
    # always an integer, are listed below for a few databases:
    #   Entrez Database   Primary ID    E-Utility Database Name
    #   3D Domains        3D SDI        domains
    #   Domains           PSSM-ID       cdd
    #   Genome            Genome ID     genome
    #   Nucleotide        GI number     nucleotide
    #   OMIM              MIM number    omim
    #   PopSet            Popset ID     popset
    #   Protein           GI number     protein
    #   ProbeSet          GEO ID        geo
    #   PubMed            PMID          pubmed
    #   Structure         MMDB ID       structure
    #   SNP               SNP ID        snp
    #   Taxonomy          TAXID         taxonomy
    #   UniGene           UniGene ID    unigene
    #   UniSTS            UniSTS ID     unists
    # see Bio.EUtils.Config.py for supported EUtils databases
    def setUp(self):
        self.client = DBIdsClient.DBIdsClient()

    def t_parse_nucleotide_gb(self):
        """Use EUtils to retrieve and parse an NCBI nucleotide GenBank record
        """
        db = "nucleotide"
        gi = "57165207"
        result = self.client.search(gi, db, retmax=1)
        sfp = result[0].efetch(retmode="text", rettype="gb", \
                               seq_start=5458145, seq_stop=5458942)
        text = sfp.readlines()
        sfp.close()
        self.assertEqual(result.dbids.db, db) # sanity check, not a parser test
        locus_line = text[0]
        # now parse the text with GenBank parsers
        parser_input = StringIO.StringIO(''.join(text))
        # RecordParser
        parser = GenBank.RecordParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assertEqual(record.gi, gi)
        # split locus line to eliminate whitespace differences
        self.assertEqual(record._locus_line().split(), locus_line.split())
        # FeatureParser
        parser = GenBank.FeatureParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assertEqual(record.annotations['gi'], gi)

    def t_parse_protein_fasta(self):
        """Use EUtils to retrieve and parse an NCBI protein Fasta record
        """
        db = "protein"
        gi = "21220767"
        result = self.client.search(gi, db, retmax=1)
        sfp = result[0].efetch(retmode="text", rettype="fasta")
        text = sfp.readlines()
        sfp.close()
        self.assertEqual(result.dbids.db, db) # sanity check, not a parser test
        # now parse the text with Fasta parsers
        parser_input = StringIO.StringIO(''.join(text))
        # RecordParser
        parser = Fasta.RecordParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assert_(gi in record.title.split('|'))
        self.assertEqual(record.title, text[0][1:-1]) # strip > and \n from text[0]
        # SequenceParser
        parser = Fasta.SequenceParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assert_(gi in record.description.split('|'))
        self.assertEqual(record.description, text[0][1:-1]) # strip > and \n from text[0]

class BlastTest(unittest.TestCase):
    """Test that the Blast XML parser can read the current NCBI formats
    """
    def setUp(self):
        pass

    def t_parse_blast_xml(self):
        """Use NCBIWWW to retrieve Blast results and use NCBIXML to parse it
        """
        query = '''>sp|P09405|NUCL_MOUSE Nucleolin (Protein C23) - Mus musculus (Mouse).
VKLAKAGKTHGEAKKMAPPPKEVEEDSEDEEMSEDEDDSSGEEEVVIPQKKGKKATTTPA
KKVVVSQTKKAAVPTPAKKAAVTPGKKAVATPAKKNITPAKVIPTPGKKGAAQAKALVPT'''
        result_handle = NCBIWWW.qblast('blastp', 'swissprot', \
                                       query, expect=0.0001, format_type="XML")
        blast_results = result_handle.read()
        result_handle.close()
        blast_out = StringIO.StringIO(blast_results)
        parser = NCBIXML.BlastParser()
        b_record = parser.parse(blast_out)
        ### write output file for testing purposes ~35 kB
        ## import os
        ## save_fname = os.path.expanduser('~/tmp/blast_out.xml')
        ## save_file = open(save_fname, 'w')
        ## save_file.write(blast_results)
        ## save_file.close()
        ###
        # When I ran this on 4 May 2006 I got max_score of 601 & 21 hits
        expected_score_cutoff = 600
        expected_min_hits = 20
        max_score = 0.0
        for alignment in b_record.alignments:
            for hsp in alignment.hsps:
                if hsp.score > max_score:
                    max_score = hsp.score
        msg = 'max score (%g) < expected_score_cutoff(%g)' % \
              (max_score, expected_score_cutoff)
        self.assert_(max_score >= expected_score_cutoff, msg)
        msg = 'N(%d) < expected_min_hits(%d)' % \
              (len(b_record.alignments), expected_min_hits)
        self.assert_(len(b_record.alignments) >= expected_min_hits, msg)

if __name__ == "__main__":
    sys.exit(run_tests(sys.argv))

From biopython-dev at maubp.freeserve.co.uk  Mon May 15 13:25:13 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython-dev))
Date: Mon, 15 May 2006 18:25:13 +0100
Subject: [Biopython-dev] "Online" tests, was [Bug 1972]
In-Reply-To: <1147710150.1517.37.camel@tioga.barnard-engineering.com>
References: <200603210825.k2L8Pg3S006352@portal.open-bio.org>
	<441FEEE6.8040402@maubp.freeserve.co.uk>
	<1143045514.813.26.camel@tioga.barnard-engineering.com>
	<442403DE.7090607@biopython.org>
	<1143750836.5736.27.camel@tioga.barnard-engineering.com>
	<442D93EE.9040707@maubp.freeserve.co.uk>
	<1147710150.1517.37.camel@tioga.barnard-engineering.com>
Message-ID: <4468B979.4090504@maubp.freeserve.co.uk>

>>>>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded
>>>>there are multiple parsers to be checked (e.g. record and feature parsers).
>
> I have tests using EUtils that check the record and feature parsers for
> GenBank and Fasta files.

You don't check the iteration over files with multiple records... that
shouldn't be a problem, as other than changing the number of blank lines
between records I can't think of anything the NCBI might change, but you
never know.

>>I'll volunteer to add cases for GenBank, Fasta and GEO files.
>
> I've not yet looked at GEO files, so there's that yet to do. I expect to
> have some free time soon so I may look at the GEO and other parsers at
> that time.

Checking the basic "did it load something with the expected ID" shouldn't be
too tricky for GEO Soft files.

> I'd certainly be interested in any feedback you have on these
> tests, or to see additional test cases. Finding the parsers to test and
> implementing the tests seems pretty straightforward.

You are only testing XML parsing with blastp - do you think it's worth
including tests of any of the other variants? Especially the slightly more
exotic ones offered by the NCBI.
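[Editorial note: the multi-record iteration Peter mentions can be illustrated
offline. This sketch uses a hand-rolled splitter over an in-memory two-record
FASTA string, standing in for the Bio.Fasta iterator running over a real NCBI
download; the record names and sequences are invented.]

    def iter_fasta(text):
        """Yield (title, sequence) for each '>'-delimited record."""
        title, seq_lines = None, []
        for line in text.splitlines():
            if line.startswith('>'):
                if title is not None:
                    yield title, ''.join(seq_lines)
                title, seq_lines = line[1:], []
            elif line.strip():
                seq_lines.append(line.strip())
        if title is not None:
            yield title, ''.join(seq_lines)

    # Two records separated by a blank line, as in a bulk efetch download.
    sample = ">a\nACGT\n\n>b\nGGCC\nTTAA\n"
    records = list(iter_fasta(sample))
    assert len(records) == 2            # the blank line did not split a record
    assert records[1] == ('b', 'GGCCTTAA')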
Also, you are only submitting a single query, so only a single blast result
is returned. Can multiple queries be submitted? That would give you an excuse
to test the Blast iterator support.

Loading a PDB file strikes me as another fairly simple one to include. If you
can get a link to the "PDB file of the month" then even better.

Are there any "online" phylogenetic tree resources we should use to validate
the new Nexus parser?

> I've attached the test code to this email, and added a .txt file
> extension just in case the listserv won't allow attaching a code file.

I can see the test code file. I haven't run it yet. Thanks Bill.

Peter

From mcolosimo at mitre.org  Mon May 15 16:14:39 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Mon, 15 May 2006 16:14:39 -0400
Subject: [Biopython-dev] DNA Strider like frame translation
Message-ID: <0F965CC3-12E2-41C9-B725-B4D90538F5B1@mitre.org>

All,

I just spent some time writing this up and I thought I'd share it with
everyone (and hopefully have it committed). Yes, there is already something
similar under Bio.SeqUtils (six_frame_translations). However, mine actually
looks identical to DNA Strider's for both 3/6-frame single-letter
translations. It also has the power to do -1 and 1 frame, or 1 frame, or 1
and 3 frame, etc... translations. If I get around to it, I might add support
for three-letter amino acids.

Marc

p.s. hopefully this gets through.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: frameTranslations.py
Type: text/x-python-script
Size: 5489 bytes
Desc: not available
Url: http://lists.open-bio.org/pipermail/biopython-dev/attachments/20060515/c4e9e1b6/attachment.bin

From idoerg at burnham.org  Mon May 22 13:40:52 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 22 May 2006 10:40:52 -0700
Subject: [Biopython-dev] [Fwd: Re: BOSC 2006 2nd Call for Papers]
Message-ID: <4471F7A4.4050203@burnham.org>

> ------------------------------------------------------------------------
>
> Subject: BOSC 2006 2nd Call for Papers
> From: Darin London
> Date: Mon, 22 May 2006 11:29:45 -0400
> To: bosc at open-bio.org
> CC: Authors at lists.open-bio.org, BioBiz at lists.open-bio.org,
> Biocorba-announce-l at lists.open-bio.org, Biocorba-l at lists.open-bio.org,
> Biograph at lists.open-bio.org, bioinfo-core at lists.open-bio.org,
> biojava-dev at lists.open-bio.org, Biojava-l at lists.open-bio.org,
> bioped-l at lists.open-bio.org, Bioperl-announce-l at lists.open-bio.org,
> Bioperl-l at lists.open-bio.org, bioperl-microarray at lists.open-bio.org,
> bioperl-pipeline at lists.open-bio.org, BioPython at lists.open-bio.org,
> BioPython-announce at lists.open-bio.org,
> Biopython-dev at lists.open-bio.org, BioRuby at lists.open-bio.org,
> BioRuby-ja at lists.open-bio.org, Biosoap-l at lists.open-bio.org,
> BioSQL-l at lists.open-bio.org, BP-announce at lists.open-bio.org,
> DAS at lists.open-bio.org, DAS-announce at lists.open-bio.org,
> DAS2 at lists.open-bio.org, Dynamite at lists.open-bio.org,
> EMBOSS at lists.open-bio.org, emboss-announce at lists.open-bio.org,
> emboss-dev at lists.open-bio.org, Moby-announce at lists.open-bio.org,
> MOBY-dev at lists.open-bio.org, moby-l at lists.open-bio.org,
> obf-developers at lists.open-bio.org, Ontologies at lists.open-bio.org,
> Open-bio-announce at lists.open-bio.org, Open-Bio-l at lists.open-bio.org,
> Open-Bioinformatics-Foundation at lists.open-bio.org
>2nd CALL FOR SPEAKERS > >This is the second and last official call for speakers to submit their >abstracts to speak at BOSC 2006 >in Fortaleza, Brasil. In order to be considered as a potential speaker, >an abstract must be recieved by >Monday, June 5th, 2006. We look forward to a great conference this >year. Please consult >The Official BOSC 2006 Website at: > >http://www.open-bio.org/wiki/BOSC_2006 > >for more details and information. > > >In addition, a BOSC weblog has been setup to make it easier to >desiminate all BOSC >related announcements: > >http://wiki.open-bio.org/boscblog/ > >And if you have an ICAL compatible Calendar, there is an EventDB >calendar set up with all >BOSC related deadlines. > >http://eventful.com/groups/G0-001-000014747-0 > >More information about ISMB can be found at the Official ISMB 2006 Website: > >http://ismb2006.cbi.cnptia.embrapa.br/ > >Thank You, and we look forward to seeing you all, > >The BOSC Organizing Committee. > > > -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 Tel: (858) 646 3100 x3516 Fax: (858) 713 9949 http://iddo-friedberg.org http://BioFunctionPrediction.org From jelle.feringa at ezct.net Fri May 5 09:17:27 2006 From: jelle.feringa at ezct.net (Jelle Feringa / EZCT Architecture & Design Research) Date: Fri, 5 May 2006 11:17:27 +0200 Subject: [Biopython-dev] issues compiling KDTree.cpp / win32 Message-ID: <009501c67024$baa13b20$0b01a8c0@JELLE> Hi, I know there have been some issues compiling KDTree.cpp and have made a brave effort as well (not necessarily a trivial thing to do for the compiler challenged. ) but haven't been able to compile its successfully. Would anyone more succesfull in compiling the _CKDTree.pyd module be so kind to share it with me? I'm pretty desperate for a well implemented kdtree to be frank. Apart from that, I'm putting together a python module for scripting Rhino, a terrific nurbs CAD modeler. 
It would be great to include the kdtree module into this module. Giving the appropriate credits would I be allowed to do so? Many thanks in advance, -jelle From mdehoon at c2b2.columbia.edu Tue May 9 23:59:55 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Tue, 09 May 2006 16:59:55 -0700 Subject: [Biopython-dev] new website & documentation Message-ID: <44612CFB.7030008@c2b2.columbia.edu> Hi everybody, As you may know, the Biopython website is transitioning to a new server of the Open Bioinformatics Foundation. The OBF folks suggested that we use a wiki-based website (just like bioperl and biojava) instead of quixote, which we have been using so far. A prototype wiki-based Biopython website is now running at http://biopython.open-bio.org/wiki/Biopython. Feel free to contribute to this website as you see fit. Don't be shy. Now the question is whether the complete Biopython documentation should be wiki-based. For an example, see the Bioperl HOWTOs. This would make it easier to update the Biopython documentation, and hopefully result in better and more extensive documentation (which wouldn't hurt). On the downside, we'd lose the PDFs. (Or is it possible to generate PDFs from the wiki website? Wiki gurus, let us know). Another option would be to convert the generic Biopython documentation to wiki (e.g., the tutorial), but keep specialized modules in the current format. Opinions? --Michiel. From biopython-dev at maubp.freeserve.co.uk Wed May 10 14:24:17 2006 From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython Dev)) Date: Wed, 10 May 2006 15:24:17 +0100 Subject: [Biopython-dev] Biopython for GDS SOFT? In-Reply-To: References: Message-ID: <4461F791.5020607@maubp.freeserve.co.uk> Ramon Aragues wrote: > Hi, > > I-ve seen your post on the bipython discussion list (from 2005) about > GDS SOFT files. > > Has any work been done on it since then? Can I use biopython for > parsing GDS SOFT files? 
Yes, I checked in some revisions to the Bio/Geo files in Jan 2006, and it seemed to work for me. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Geo/?cvsroot=biopython I also added the examples the NCBI provided to document their 2005 file format changes to the BioPython GEO test suite. I was only interested in some very basic exploration at the time, and then moved on to using Sean Davis' GEOquery for R/BioConductor instead. I found this made statistical analysis and visualisation much easier. In any event, looking at GDS SOFT files was rather an aside to my main research interests... > I am already using biopython in my framework ( http://sbi.imim.es/piana > ) so it would be great if I can use biopython for this as well. This should be possible once you update the Bio/Geo files to those from CVS or the next release of BioPython. Please let us (me) know how you get on... However, I think you will be on your own in terms of statistical data analysis with python. > Cheers! > > Ramon Good luck Peter From bill at barnard-engineering.com Mon May 15 16:22:30 2006 From: bill at barnard-engineering.com (Bill Barnard) Date: Mon, 15 May 2006 09:22:30 -0700 Subject: [Biopython-dev] "Online" tests, was [Bug 1972] In-Reply-To: <442D93EE.9040707@maubp.freeserve.co.uk> References: <200603210825.k2L8Pg3S006352@portal.open-bio.org> <441FEEE6.8040402@maubp.freeserve.co.uk> <1143045514.813.26.camel@tioga.barnard-engineering.com> <442403DE.7090607@biopython.org> <1143750836.5736.27.camel@tioga.barnard-engineering.com> <442D93EE.9040707@maubp.freeserve.co.uk> Message-ID: <1147710150.1517.37.camel@tioga.barnard-engineering.com> On Fri, 2006-03-31 at 21:41 +0100, Peter (BioPython Dev) wrote: > Bill Barnard wrote: > > I've made a first cut unit test, tentatively named > > test_Parsers_for_newest_formats, ... > > Sounds good to me. But not a very snappy name - how about something > shorter like test_OnlineFormats.py instead? Works for me. 
> I think the Blast test should actually submit a short protein/nucleotide > sequence known to be in the online database. Maybe do some basic sanity > testing like check it returns at least N results and the best hit is at > least a certain score. I have one such test working. > > >>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded > >>there are multiple parsers to be checked (e.g. record and feature parsers). I have tests using EUtils that check the record and feature parsers for GenBank and Fasta files. > I'll volunteer to add cases for GenBank, Fasta and GEO files. I've not yet looked at GEO files, so there's that yet to do. I expect to have some free time soon so I may look at the GEO and other parsers at that time. I'd certainly be interested in any feedback you have on these tests, or to see additional test cases. Finding the parsers to test and implementing the tests seems pretty straightforward. I've attached the test code to this email, and added a .txt file extension just in case the listserv won't allow attaching a code file. Bill p.s. I'm still looking for a more interesting project I can contribute to Biopython. Doing some of the tests and such is useful to me in learning my way around the codebase; I hope to find something requiring a bit more creativity. If anyone has any suggestions about areas that need some attention I'll happily consider them. I am considering writing some code for HMMs that would build on some prototype code I wrote. -------------- next part -------------- """Test to make sure all parsers retrieve and read current data formats. 
""" import requires_internet import sys import unittest from Bio import ParserSupport # ExpasyTest from Bio import Prosite from Bio.Prosite import Prodoc from Bio.SwissProt import SProt # PubMedTest from Bio import PubMed, Medline # EUtilsTest from Bio import EUtils from Bio.EUtils import DBIdsClient from Bio import GenBank from Bio import Fasta # BlastTest try: import cStringIO as StringIO except ImportError: import StringIO from Bio.Blast import NCBIWWW, NCBIXML def run_tests(argv): test_suite = testing_suite() runner = unittest.TextTestRunner(sys.stdout, verbosity = 2) runner.run(test_suite) def testing_suite(): """Generate the suite of tests. """ test_suite = unittest.TestSuite() test_loader = unittest.TestLoader() test_loader.testMethodPrefix = 't_' tests = [ExpasyTest, PubMedTest, EUtilsTest, BlastTest] for test in tests: cur_suite = test_loader.loadTestsFromTestCase(test) test_suite.addTest(cur_suite) return test_suite class ExpasyTest(unittest.TestCase): """Test that parsers can read the current Expasy database formats """ def setUp(self): pass def t_parse_prosite_record(self): """Retrieve a Prosite record and parse it """ prosite_dict = Prosite.ExPASyDictionary(parser=Prosite.RecordParser()) accession = 'PS00159' entry = prosite_dict[accession] self.assertEqual(entry.accession, accession) def t_parse_prodoc_record(self): """Retrieve a Prodoc record and parse it """ prodoc_dict = Prodoc.ExPASyDictionary(parser=Prodoc.RecordParser()) accession = 'PDOC00933' entry = prodoc_dict[accession] self.assertEqual(entry.accession, accession) def t_parse_sprot_record(self): """Retrieve a SwissProt record and parse it into Record format """ sprot_record_dict = SProt.ExPASyDictionary(parser=SProt.RecordParser()) accession = 'Q5TYW8' entry = sprot_record_dict[accession] self.failUnless(accession in entry.accessions) def t_parse_sprot_seq(self): """Retrieve a SwissProt record and parse it into Sequence format """ sprot_seq_dict = 
SProt.ExPASyDictionary(parser=SProt.SequenceParser()) accession = 'Q5TYW8' entry = sprot_seq_dict[accession] self.assertEqual(entry.id, accession) class PubMedTest(unittest.TestCase): """Test that Medline parsers can read the current PubMed format """ def setUp(self): pass def t_parse_pubmed_record(self): """Retrieve a PubMed record and parse it """ pubmed_dict = PubMed.Dictionary(parser=Medline.RecordParser()) pubmed_id = '3136164' entry = pubmed_dict[pubmed_id] self.assertEqual(entry.pubmed_id, pubmed_id) class EUtilsTest(unittest.TestCase): """Test that GenBank, Fasta parsers can read EUtils retrieved db formats """ # Database primary IDs # http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi? # lists all E-Utility database names. Primary IDs, which are # always an integer, are listed below for a few databases: # Entrez Database Primary ID E-Utility Database Name # 3D Domains 3D SDI domains # Domains PSSM-ID cdd # Genome Genome ID genome # Nucleotide GI number nucleotide # OMIM MIM number omim # PopSet Popset ID popset # Protein GI number protein # ProbeSet GEO ID geo # PubMed PMID pubmed # Structure MMDB ID structure # SNP SNP ID snp # Taxonomy TAXID taxonomy # UniGene UniGene ID unigene # UniSTS UniSTS ID unists # see Bio.EUtils.Config.py for supported EUtils databases def setUp(self): self.client = DBIdsClient.DBIdsClient() def t_parse_nucleotide_gb(self): """Use EUtils to retrieve and parse an NCBI nucleotide GenBank record """ db = "nucleotide" gi = "57165207" result = self.client.search(gi, db, retmax=1) sfp = result[0].efetch(retmode="text", rettype="gb", \ seq_start=5458145, seq_stop=5458942) text = sfp.readlines() sfp.close() self.assertEqual(result.dbids.db, db) # sanity check, not a parser test locus_line = text[0] # now parse the text with GenBank parsers parser_input = StringIO.StringIO(''.join(text)) # RecordParser parser = GenBank.RecordParser() record = parser.parse(parser_input) parser_input.reset() self.assertEqual(record.gi, gi) # split 
locus line to eliminate whitespace differences self.assertEqual(record._locus_line().split(), locus_line.split()) # FeatureParser parser = GenBank.FeatureParser() record = parser.parse(parser_input) parser_input.reset() self.assertEqual(record.annotations['gi'], gi) def t_parse_protein_fasta(self): """Use EUtils to retrieve and parse an NCBI protein Fasta record """ db = "protein" gi = "21220767" result = self.client.search(gi, db, retmax=1) sfp = result[0].efetch(retmode="text", rettype="fasta") text = sfp.readlines() sfp.close() self.assertEqual(result.dbids.db, db) # sanity check, not a parser test # now parse the text with Fasta parsers parser_input = StringIO.StringIO(''.join(text)) # RecordParser parser = Fasta.RecordParser() record = parser.parse(parser_input) parser_input.reset() self.assert_(gi in record.title.split('|')) self.assertEqual(record.title, text[0][1:-1]) # strip > and \n from text[0] # SequenceParser parser = Fasta.SequenceParser() record = parser.parse(parser_input) parser_input.reset() self.assert_(gi in record.description.split('|')) self.assertEqual(record.description, text[0][1:-1]) # strip > and \n from text[0] class BlastTest(unittest.TestCase): """Test that the Blast XML parser can read the current NCBI formats """ def setUp(self): pass def t_parse_blast_xml(self): """Use NCBIWWW to retrieve Blast results and use NCBIXML to parse it """ query = '''>sp|P09405|NUCL_MOUSE Nucleolin (Protein C23) - Mus musculus (Mouse). 
VKLAKAGKTHGEAKKMAPPPKEVEEDSEDEEMSEDEDDSSGEEEVVIPQKKGKKATTTPA KKVVVSQTKKAAVPTPAKKAAVTPGKKAVATPAKKNITPAKVIPTPGKKGAAQAKALVPT''' result_handle = NCBIWWW.qblast('blastp', 'swissprot', \ query, expect=0.0001, format_type="XML") blast_results = result_handle.read() result_handle.close() blast_out = StringIO.StringIO(blast_results) parser = NCBIXML.BlastParser() b_record = parser.parse(blast_out) ### write output file for testing purposes ~35 kB ## import os ## save_fname = os.path.expanduser('~/tmp/blast_out.xml') ## save_file = open(save_fname, 'w') ## save_file.write(blast_results) ## save_file.close() ### # When I ran this on 4 May 2006 I got max_score of 601 & 21 hits expected_score_cutoff = 600 expected_min_hits = 20 max_score = 0.0 for alignment in b_record.alignments: for hsp in alignment.hsps: if hsp.score > max_score: max_score = hsp.score msg = 'max score (%g) < expected_score_cutoff(%g)' % \ (max_score, expected_score_cutoff) self.assert_(max_score >= expected_score_cutoff, msg) msg = 'N(%d) < expected_min_hits(%d)' % \ (len(b_record.alignments), expected_min_hits) self.assert_(len(b_record.alignments) >= expected_min_hits, msg) if __name__ == "__main__": sys.exit(run_tests(sys.argv)) From biopython-dev at maubp.freeserve.co.uk Mon May 15 17:25:13 2006 From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython-dev)) Date: Mon, 15 May 2006 18:25:13 +0100 Subject: [Biopython-dev] "Online" tests, was [Bug 1972] In-Reply-To: <1147710150.1517.37.camel@tioga.barnard-engineering.com> References: <200603210825.k2L8Pg3S006352@portal.open-bio.org> <441FEEE6.8040402@maubp.freeserve.co.uk> <1143045514.813.26.camel@tioga.barnard-engineering.com> <442403DE.7090607@biopython.org> <1143750836.5736.27.camel@tioga.barnard-engineering.com> <442D93EE.9040707@maubp.freeserve.co.uk> <1147710150.1517.37.camel@tioga.barnard-engineering.com> Message-ID: <4468B979.4090504@maubp.freeserve.co.uk> >>>>In some cases (e.g. 
GenBank, Fasta) once the sample file is downloaded
>>>> there are multiple parsers to be checked (e.g. record and feature
>>>> parsers).
>
> I have tests using EUtils that check the record and feature parsers for
> GenBank and Fasta files.

You don't check the iteration over files with multiple records... that
shouldn't be a problem, as other than changing the number of blank lines
between records I can't think of anything the NCBI might change, but you
never know.

>> I'll volunteer to add cases for GenBank, Fasta and GEO files.
>
> I've not yet looked at GEO files, so there's that yet to do. I expect to
> have some free time soon so I may look at the GEO and other parsers at
> that time.

Checking the basic "did it load something with the expected ID" shouldn't
be too tricky for GEO SOFT files.

> I'd certainly be interested in any feedback you have on these
> tests, or to see additional test cases. Finding the parsers to test and
> implementing the tests seems pretty straightforward.

You are only testing XML parsing with blastp - do you think it's worth
including tests of any of the other variants? Especially the slightly
more exotic ones offered by the NCBI.

Also, you are only submitting a single query, so only a single Blast
result is returned. Can multiple queries be submitted? That would give
you an excuse to test the Blast iterator support.

Loading a PDB file strikes me as another fairly simple one to include.
If you can get a link to the "PDB file of the month" then even better.

Are there any "online" phylogenetic tree resources we should use to
validate the new Nexus parser?

> I've attached the test code to this email, and added a .txt file
> extension just in case the listserv won't allow attaching a code file.

I can see the test code file. I haven't run it yet.

Thanks Bill.
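As an aside, the two "expected ID" checks suggested above (GEO SOFT and PDB) are simple enough to sketch without touching the network or even Biopython itself. This is only an illustration in modern Python 3 (the code in this thread is Python 2); the helper names `soft_entity_ids` and `pdb_id_from_header` are invented for the sketch, but the file conventions they rely on (SOFT entity lines starting with `^`, the PDB idCode in columns 63-66 of the HEADER record) are the real ones:

```python
import io


def soft_entity_ids(handle):
    """Yield (entity_type, entity_id) pairs from a GEO SOFT file.

    SOFT entity indicator lines look like '^SAMPLE = GSM575'.
    """
    for line in handle:
        if line.startswith('^'):
            entity, _, value = line[1:].partition('=')
            yield entity.strip(), value.strip()


def pdb_id_from_header(handle):
    """Return the 4-character idCode from a PDB HEADER record (columns 63-66),
    or None if no HEADER record is found."""
    for line in handle:
        if line.startswith('HEADER'):
            return line[62:66].strip()
    return None


# Quick self-check on a tiny in-memory SOFT example:
soft = io.StringIO("^DATABASE = GeoMiame\n!Database_name = GEO\n^SAMPLE = GSM575\n")
assert list(soft_entity_ids(soft)) == [("DATABASE", "GeoMiame"),
                                       ("SAMPLE", "GSM575")]
```

A real online test would fetch the file first, then assert that the parsed ID matches the accession it asked for.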
Peter

From mcolosimo at mitre.org Mon May 15 20:14:39 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Mon, 15 May 2006 16:14:39 -0400
Subject: [Biopython-dev] DNA Strider like frame translation
Message-ID: <0F965CC3-12E2-41C9-B725-B4D90538F5B1@mitre.org>

All,

I just spent some time writing this up and I thought I'd share it with
everyone (and hopefully have it committed). Yes, there already is
something similar under Bio.SeqUtils (six_frame_translations). However,
mine actually looks identical to DNA Strider's output for both 3- and
6-frame single-letter translations. It also has the power to do, for
example, -1 and 1 frame, or 1 frame, or 1 and 3 frame translations. If I
get around to it, I might add support for three-letter amino acid codes.

Marc

p.s. hopefully this gets through.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: frameTranslations.py
Type: text/x-python-script
Size: 5489 bytes
Desc: not available
URL:

From idoerg at burnham.org Mon May 22 17:40:52 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 22 May 2006 10:40:52 -0700
Subject: [Biopython-dev] [Fwd: Re: BOSC 2006 2nd Call for Papers]
Message-ID: <4471F7A4.4050203@burnham.org>

> ------------------------------------------------------------------------
>
> Subject: BOSC 2006 2nd Call for Papers
> From: Darin London
> Date: Mon, 22 May 2006 11:29:45 -0400
> To: bosc at open-bio.org
> CC: Authors at lists.open-bio.org, BioBiz at lists.open-bio.org,
> Biocorba-announce-l at lists.open-bio.org, Biocorba-l at lists.open-bio.org,
> Biograph at lists.open-bio.org, bioinfo-core at lists.open-bio.org,
> biojava-dev at lists.open-bio.org, Biojava-l at lists.open-bio.org,
> bioped-l at lists.open-bio.org, Bioperl-announce-l at lists.open-bio.org,
> Bioperl-l at lists.open-bio.org, bioperl-microarray at lists.open-bio.org,
> bioperl-pipeline at lists.open-bio.org, BioPython at lists.open-bio.org,
> BioPython-announce at
lists.open-bio.org,
> Biopython-dev at lists.open-bio.org, BioRuby at lists.open-bio.org,
> BioRuby-ja at lists.open-bio.org, Biosoap-l at lists.open-bio.org,
> BioSQL-l at lists.open-bio.org, BP-announce at lists.open-bio.org,
> DAS at lists.open-bio.org, DAS-announce at lists.open-bio.org,
> DAS2 at lists.open-bio.org, Dynamite at lists.open-bio.org,
> EMBOSS at lists.open-bio.org, emboss-announce at lists.open-bio.org,
> emboss-dev at lists.open-bio.org, Moby-announce at lists.open-bio.org,
> MOBY-dev at lists.open-bio.org, moby-l at lists.open-bio.org,
> obf-developers at lists.open-bio.org, Ontologies at lists.open-bio.org,
> Open-bio-announce at lists.open-bio.org, Open-Bio-l at lists.open-bio.org,
> Open-Bioinformatics-Foundation at lists.open-bio.org
>
> 2nd CALL FOR SPEAKERS
>
> This is the second and last official call for speakers to submit their
> abstracts to speak at BOSC 2006 in Fortaleza, Brazil. In order to be
> considered as a potential speaker, an abstract must be received by
> Monday, June 5th, 2006. We look forward to a great conference this year.
> Please consult the Official BOSC 2006 Website at:
>
> http://www.open-bio.org/wiki/BOSC_2006
>
> for more details and information.
>
> In addition, a BOSC weblog has been set up to make it easier to
> disseminate all BOSC-related announcements:
>
> http://wiki.open-bio.org/boscblog/
>
> And if you have an iCal-compatible calendar, there is an EventDB
> calendar set up with all BOSC-related deadlines:
>
> http://eventful.com/groups/G0-001-000014747-0
>
> More information about ISMB can be found at the Official ISMB 2006
> Website:
>
> http://ismb2006.cbi.cnptia.embrapa.br/
>
> Thank you, and we look forward to seeing you all,
>
> The BOSC Organizing Committee.

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org
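Coming back to Marc Colosimo's frame-translation post above: his frameTranslations.py attachment was scrubbed by the list, so here is a minimal, hypothetical stand-in (not Marc's code, and deliberately ignoring his DNA Strider-style layout) showing the core idea of translating an arbitrary subset of the six reading frames, written in modern Python 3 using the standard genetic code:

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINOS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(seq, frame):
    """Translate one reading frame of a DNA sequence.

    frame is +1/+2/+3 (forward strand) or -1/-2/-3 (reverse complement),
    following the DNA Strider convention.  Unknown codons become 'X'.
    """
    s = seq.upper()
    if frame < 0:
        # reverse complement for the minus-strand frames
        s = s.translate(_COMPLEMENT)[::-1]
    start = abs(frame) - 1
    return "".join(CODON_TABLE.get(s[i:i + 3], "X")
                   for i in range(start, len(s) - 2, 3))


def translate_frames(seq, frames=(1, 2, 3, -1, -2, -3)):
    """Return {frame: translation} for any chosen subset of the six frames."""
    return {f: translate_frame(seq, f) for f in frames}
```

For example, `translate_frames("ATGGCCTAA", frames=(1,))` gives `{1: "MA*"}`; the Strider-style pretty-printing (aligning translations under the nucleotide sequence) would be layered on top of something like this.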