From jelle.feringa at ezct.net  Fri May  5 05:17:27 2006
From: jelle.feringa at ezct.net (Jelle Feringa / EZCT Architecture & Design Research)
Date: Fri, 5 May 2006 11:17:27 +0200
Subject: [Biopython-dev] issues compiling KDTree.cpp / win32
Message-ID: <009501c67024$baa13b20$0b01a8c0@JELLE>

Hi,

I know there have been some issues compiling KDTree.cpp and have made a brave
effort as well (not necessarily a trivial thing to do for the
compiler-challenged) but haven't been able to compile it successfully. Would
anyone more successful in compiling the _CKDTree.pyd module be so kind to
share it with me? I'm pretty desperate for a well-implemented kd-tree, to be
frank.

Apart from that, I'm putting together a python module for scripting Rhino, a
terrific nurbs CAD modeler. It would be great to include the kdtree module
into this module. Giving the appropriate credits, would I be allowed to do so?

Many thanks in advance,

-jelle

From mdehoon at c2b2.columbia.edu  Tue May  9 19:59:55 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Tue, 09 May 2006 16:59:55 -0700
Subject: [Biopython-dev] new website & documentation
Message-ID: <44612CFB.7030008@c2b2.columbia.edu>

Hi everybody,

As you may know, the Biopython website is transitioning to a new server of
the Open Bioinformatics Foundation. The OBF folks suggested that we use a
wiki-based website (just like bioperl and biojava) instead of quixote, which
we have been using so far. A prototype wiki-based Biopython website is now
running at http://biopython.open-bio.org/wiki/Biopython. Feel free to
contribute to this website as you see fit. Don't be shy.

Now the question is whether the complete Biopython documentation should be
wiki-based. For an example, see the Bioperl HOWTOs. This would make it easier
to update the Biopython documentation, and hopefully result in better and
more extensive documentation (which wouldn't hurt). On the downside, we'd
lose the PDFs.
(Or is it possible to generate PDFs from the wiki website? Wiki gurus, let us
know.) Another option would be to convert the generic Biopython documentation
to wiki (e.g., the tutorial), but keep specialized modules in the current
format.

Opinions?

--Michiel.

From biopython-dev at maubp.freeserve.co.uk  Wed May 10 10:24:17 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython Dev))
Date: Wed, 10 May 2006 15:24:17 +0100
Subject: [Biopython-dev] Biopython for GDS SOFT?
In-Reply-To:
References:
Message-ID: <4461F791.5020607@maubp.freeserve.co.uk>

Ramon Aragues wrote:
> Hi,
>
> I've seen your post on the biopython discussion list (from 2005) about
> GDS SOFT files.
>
> Has any work been done on it since then? Can I use biopython for
> parsing GDS SOFT files?

Yes, I checked in some revisions to the Bio/Geo files in Jan 2006, and it
seemed to work for me.

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Geo/?cvsroot=biopython

I also added the examples the NCBI provided to document their 2005 file
format changes to the BioPython GEO test suite. I was only interested in some
very basic exploration at the time, and then moved on to using Sean Davis'
GEOquery for R/BioConductor instead. I found this made statistical analysis
and visualisation much easier. In any event, looking at GDS SOFT files was
rather an aside to my main research interests...

> I am already using biopython in my framework ( http://sbi.imim.es/piana
> ) so it would be great if I can use biopython for this as well.

This should be possible once you update the Bio/Geo files to those from CVS
or the next release of BioPython. Please let us (me) know how you get on...
However, I think you will be on your own in terms of statistical data
analysis with python.

> Cheers!
>
> Ramon

Good luck

Peter

From bill at barnard-engineering.com  Mon May 15 12:22:30 2006
From: bill at barnard-engineering.com (Bill Barnard)
Date: Mon, 15 May 2006 09:22:30 -0700
Subject: [Biopython-dev] "Online" tests, was [Bug 1972]
In-Reply-To: <442D93EE.9040707@maubp.freeserve.co.uk>
References: <200603210825.k2L8Pg3S006352@portal.open-bio.org>
	<441FEEE6.8040402@maubp.freeserve.co.uk>
	<1143045514.813.26.camel@tioga.barnard-engineering.com>
	<442403DE.7090607@biopython.org>
	<1143750836.5736.27.camel@tioga.barnard-engineering.com>
	<442D93EE.9040707@maubp.freeserve.co.uk>
Message-ID: <1147710150.1517.37.camel@tioga.barnard-engineering.com>

On Fri, 2006-03-31 at 21:41 +0100, Peter (BioPython Dev) wrote:
> Bill Barnard wrote:
> > I've made a first cut unit test, tentatively named
> > test_Parsers_for_newest_formats, ...
>
> Sounds good to me. But not a very snappy name - how about something
> shorter like test_OnlineFormats.py instead?

Works for me.

> I think the Blast test should actually submit a short protein/nucleotide
> sequence known to be in the online database. Maybe do some basic sanity
> testing like check it returns at least N results and the best hit is at
> least a certain score.

I have one such test working.

> >>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded
> >>there are multiple parsers to be checked (e.g. record and feature parsers).

I have tests using EUtils that check the record and feature parsers for
GenBank and Fasta files.

> I'll volunteer to add cases for GenBank, Fasta and GEO files.

I've not yet looked at GEO files, so there's that yet to do. I expect to have
some free time soon so I may look at the GEO and other parsers at that time.
I'd certainly be interested in any feedback you have on these tests, or to
see additional test cases. Finding the parsers to test and implementing the
tests seems pretty straightforward.
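[Editorial note: the retrieve-then-parse round-trip these tests exercise -
fetch a record by ID, parse it, check the ID survives - can be sketched
without the network. This is a minimal, self-contained illustration using a
canned FASTA string; the GI number and sequence are made up, and the
hand-rolled parser merely stands in for the Bio.Fasta parsers used in the
real test suite.]

    # Sketch of the fetch -> parse -> sanity-check pattern, offline.
    def parse_fasta_record(text):
        """Split a single FASTA record into (title, sequence)."""
        lines = text.strip().splitlines()
        title = lines[0][1:]           # drop the leading '>'
        sequence = ''.join(lines[1:])  # join the wrapped sequence lines
        return title, sequence

    # Canned text standing in for an efetch(rettype="fasta") download.
    canned = """>gi|12345|ref|XP_000001| hypothetical protein
    MKTAYIAKQR
    QISFVKSHFS
    """

    title, seq = parse_fasta_record(canned)
    # Same style of check as the test suite: the requested ID round-trips.
    assert "12345" in title.split('|')
    assert seq.replace(' ', '') == "MKTAYIAKQRQISFVKSHFS"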
I've attached the test code to this email, and added a .txt file extension
just in case the listserv won't allow attaching a code file.

Bill

p.s. I'm still looking for a more interesting project I can contribute to
Biopython. Doing some of the tests and such is useful to me in learning my
way around the codebase; I hope to find something requiring a bit more
creativity. If anyone has any suggestions about areas that need some
attention I'll happily consider them. I am considering writing some code for
HMMs that would build on some prototype code I wrote.

-------------- next part --------------
"""Test to make sure all parsers retrieve and read current data formats.
"""
import requires_internet
import sys
import unittest

from Bio import ParserSupport
# ExpasyTest
from Bio import Prosite
from Bio.Prosite import Prodoc
from Bio.SwissProt import SProt
# PubMedTest
from Bio import PubMed, Medline
# EUtilsTest
from Bio import EUtils
from Bio.EUtils import DBIdsClient
from Bio import GenBank
from Bio import Fasta
# BlastTest
try:
    import cStringIO as StringIO
except ImportError:
    import StringIO
from Bio.Blast import NCBIWWW, NCBIXML

def run_tests(argv):
    test_suite = testing_suite()
    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    runner.run(test_suite)

def testing_suite():
    """Generate the suite of tests.
    """
    test_suite = unittest.TestSuite()
    test_loader = unittest.TestLoader()
    test_loader.testMethodPrefix = 't_'
    tests = [ExpasyTest, PubMedTest, EUtilsTest, BlastTest]
    for test in tests:
        cur_suite = test_loader.loadTestsFromTestCase(test)
        test_suite.addTest(cur_suite)
    return test_suite

class ExpasyTest(unittest.TestCase):
    """Test that parsers can read the current Expasy database formats
    """
    def setUp(self):
        pass

    def t_parse_prosite_record(self):
        """Retrieve a Prosite record and parse it
        """
        prosite_dict = Prosite.ExPASyDictionary(parser=Prosite.RecordParser())
        accession = 'PS00159'
        entry = prosite_dict[accession]
        self.assertEqual(entry.accession, accession)

    def t_parse_prodoc_record(self):
        """Retrieve a Prodoc record and parse it
        """
        prodoc_dict = Prodoc.ExPASyDictionary(parser=Prodoc.RecordParser())
        accession = 'PDOC00933'
        entry = prodoc_dict[accession]
        self.assertEqual(entry.accession, accession)

    def t_parse_sprot_record(self):
        """Retrieve a SwissProt record and parse it into Record format
        """
        sprot_record_dict = SProt.ExPASyDictionary(parser=SProt.RecordParser())
        accession = 'Q5TYW8'
        entry = sprot_record_dict[accession]
        self.failUnless(accession in entry.accessions)

    def t_parse_sprot_seq(self):
        """Retrieve a SwissProt record and parse it into Sequence format
        """
        sprot_seq_dict = SProt.ExPASyDictionary(parser=SProt.SequenceParser())
        accession = 'Q5TYW8'
        entry = sprot_seq_dict[accession]
        self.assertEqual(entry.id, accession)

class PubMedTest(unittest.TestCase):
    """Test that Medline parsers can read the current PubMed format
    """
    def setUp(self):
        pass

    def t_parse_pubmed_record(self):
        """Retrieve a PubMed record and parse it
        """
        pubmed_dict = PubMed.Dictionary(parser=Medline.RecordParser())
        pubmed_id = '3136164'
        entry = pubmed_dict[pubmed_id]
        self.assertEqual(entry.pubmed_id, pubmed_id)

class EUtilsTest(unittest.TestCase):
    """Test that GenBank, Fasta parsers can read EUtils retrieved db formats
    """
    # Database primary IDs
    # http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?
    # lists all E-Utility database names. Primary IDs, which are
    # always an integer, are listed below for a few databases:
    #   Entrez Database   Primary ID    E-Utility Database Name
    #   3D Domains        3D SDI        domains
    #   Domains           PSSM-ID       cdd
    #   Genome            Genome ID     genome
    #   Nucleotide        GI number     nucleotide
    #   OMIM              MIM number    omim
    #   PopSet            Popset ID     popset
    #   Protein           GI number     protein
    #   ProbeSet          GEO ID        geo
    #   PubMed            PMID          pubmed
    #   Structure         MMDB ID       structure
    #   SNP               SNP ID        snp
    #   Taxonomy          TAXID         taxonomy
    #   UniGene           UniGene ID    unigene
    #   UniSTS            UniSTS ID     unists
    # see Bio.EUtils.Config.py for supported EUtils databases
    def setUp(self):
        self.client = DBIdsClient.DBIdsClient()

    def t_parse_nucleotide_gb(self):
        """Use EUtils to retrieve and parse an NCBI nucleotide GenBank record
        """
        db = "nucleotide"
        gi = "57165207"
        result = self.client.search(gi, db, retmax=1)
        sfp = result[0].efetch(retmode="text", rettype="gb", \
                               seq_start=5458145, seq_stop=5458942)
        text = sfp.readlines()
        sfp.close()
        self.assertEqual(result.dbids.db, db) # sanity check, not a parser test
        locus_line = text[0]
        # now parse the text with GenBank parsers
        parser_input = StringIO.StringIO(''.join(text))
        # RecordParser
        parser = GenBank.RecordParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assertEqual(record.gi, gi)
        # split locus line to eliminate whitespace differences
        self.assertEqual(record._locus_line().split(), locus_line.split())
        # FeatureParser
        parser = GenBank.FeatureParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assertEqual(record.annotations['gi'], gi)

    def t_parse_protein_fasta(self):
        """Use EUtils to retrieve and parse an NCBI protein Fasta record
        """
        db = "protein"
        gi = "21220767"
        result = self.client.search(gi, db, retmax=1)
        sfp = result[0].efetch(retmode="text", rettype="fasta")
        text = sfp.readlines()
        sfp.close()
        self.assertEqual(result.dbids.db, db) # sanity check, not a parser test
        # now parse the text with Fasta parsers
        parser_input = StringIO.StringIO(''.join(text))
        # RecordParser
        parser = Fasta.RecordParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assert_(gi in record.title.split('|'))
        self.assertEqual(record.title, text[0][1:-1]) # strip > and \n from text[0]
        # SequenceParser
        parser = Fasta.SequenceParser()
        record = parser.parse(parser_input)
        parser_input.reset()
        self.assert_(gi in record.description.split('|'))
        self.assertEqual(record.description, text[0][1:-1]) # strip > and \n from text[0]

class BlastTest(unittest.TestCase):
    """Test that the Blast XML parser can read the current NCBI formats
    """
    def setUp(self):
        pass

    def t_parse_blast_xml(self):
        """Use NCBIWWW to retrieve Blast results and use NCBIXML to parse it
        """
        query = '''>sp|P09405|NUCL_MOUSE Nucleolin (Protein C23) - Mus musculus (Mouse).
VKLAKAGKTHGEAKKMAPPPKEVEEDSEDEEMSEDEDDSSGEEEVVIPQKKGKKATTTPA
KKVVVSQTKKAAVPTPAKKAAVTPGKKAVATPAKKNITPAKVIPTPGKKGAAQAKALVPT'''
        result_handle = NCBIWWW.qblast('blastp', 'swissprot', \
                                       query, expect=0.0001, format_type="XML")
        blast_results = result_handle.read()
        result_handle.close()
        blast_out = StringIO.StringIO(blast_results)
        parser = NCBIXML.BlastParser()
        b_record = parser.parse(blast_out)
        ### write output file for testing purposes ~35 kB
        ## import os
        ## save_fname = os.path.expanduser('~/tmp/blast_out.xml')
        ## save_file = open(save_fname, 'w')
        ## save_file.write(blast_results)
        ## save_file.close()
        ###
        # When I ran this on 4 May 2006 I got max_score of 601 & 21 hits
        expected_score_cutoff = 600
        expected_min_hits = 20
        max_score = 0.0
        for alignment in b_record.alignments:
            for hsp in alignment.hsps:
                if hsp.score > max_score:
                    max_score = hsp.score
        msg = 'max score (%g) < expected_score_cutoff(%g)' % \
              (max_score, expected_score_cutoff)
        self.assert_(max_score >= expected_score_cutoff, msg)
        msg = 'N(%d) < expected_min_hits(%d)' % \
              (len(b_record.alignments), expected_min_hits)
        self.assert_(len(b_record.alignments) >= expected_min_hits, msg)

if __name__ == "__main__":
    sys.exit(run_tests(sys.argv))

From biopython-dev at maubp.freeserve.co.uk  Mon May 15 13:25:13 2006
From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython-dev))
Date: Mon, 15 May 2006 18:25:13 +0100
Subject: [Biopython-dev] "Online" tests, was [Bug 1972]
In-Reply-To: <1147710150.1517.37.camel@tioga.barnard-engineering.com>
References: <200603210825.k2L8Pg3S006352@portal.open-bio.org>
	<441FEEE6.8040402@maubp.freeserve.co.uk>
	<1143045514.813.26.camel@tioga.barnard-engineering.com>
	<442403DE.7090607@biopython.org>
	<1143750836.5736.27.camel@tioga.barnard-engineering.com>
	<442D93EE.9040707@maubp.freeserve.co.uk>
	<1147710150.1517.37.camel@tioga.barnard-engineering.com>
Message-ID: <4468B979.4090504@maubp.freeserve.co.uk>

>>>>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded
>>>>there are multiple parsers to be checked (e.g. record and feature parsers).
>
> I have tests using EUtils that check the record and feature parsers for
> GenBank and Fasta files.

You don't check the iteration over files with multiple records... that
shouldn't be a problem, as other than changing the number of blank lines
between records I can't think of anything the NCBI might change, but you
never know.

>>I'll volunteer to add cases for GenBank, Fasta and GEO files.
>
> I've not yet looked at GEO files, so there's that yet to do. I expect to
> have some free time soon so I may look at the GEO and other parsers at
> that time.

Checking the basic "did it load something with the expected ID" shouldn't be
too tricky for GEO Soft files.

> I'd certainly be interested in any feedback you have on these
> tests, or to see additional test cases. Finding the parsers to test and
> implementing the tests seems pretty straightforward.

You are only testing XML parsing with blastp - do you think it's worth
including tests of any of the other variants? Especially the slightly more
exotic ones offered by the NCBI.
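[Editorial note: the multi-record iteration Peter mentions can be illustrated
offline. This sketch uses a hand-rolled splitter over an in-memory two-record
FASTA string, standing in for the Bio.Fasta iterator running over a real NCBI
download; the record names and sequences are invented.]

    def iter_fasta(text):
        """Yield (title, sequence) for each '>'-delimited record."""
        title, seq_lines = None, []
        for line in text.splitlines():
            if line.startswith('>'):
                if title is not None:
                    yield title, ''.join(seq_lines)
                title, seq_lines = line[1:], []
            elif line.strip():
                seq_lines.append(line.strip())
        if title is not None:
            yield title, ''.join(seq_lines)

    # Two records separated by a blank line, as in a bulk efetch download.
    sample = ">a\nACGT\n\n>b\nGGCC\nTTAA\n"
    records = list(iter_fasta(sample))
    assert len(records) == 2            # the blank line did not split a record
    assert records[1] == ('b', 'GGCCTTAA')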
Also, you are only submitting a single query, so only a single blast result
is returned. Can multiple queries be submitted? That would give you an excuse
to test the Blast iterator support.

Loading a PDB file strikes me as another fairly simple one to include. If you
can get a link to the "PDB file of the month" then even better.

Are there any "online" phylogenetic tree resources we should use to validate
the new Nexus parser?

> I've attached the test code to this email, and added a .txt file
> extension just in case the listserv won't allow attaching a code file.

I can see the test code file. I haven't run it yet. Thanks Bill.

Peter

From mcolosimo at mitre.org  Mon May 15 16:14:39 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Mon, 15 May 2006 16:14:39 -0400
Subject: [Biopython-dev] DNA Strider like frame translation
Message-ID: <0F965CC3-12E2-41C9-B725-B4D90538F5B1@mitre.org>

All,

I just spent some time writing this up and I thought I'd share it with
everyone (and hopefully have it committed). Yes, there is already something
similar under Bio.SeqUtils (six_frame_translations). However, mine actually
looks identical to DNA Strider's for both 3/6-frame single-letter
translations. It also has the power to do -1 and 1 frame, or 1 frame, or 1
and 3 frame, etc... translations. If I get around to it, I might add support
for three-letter amino acids.

Marc

p.s. hopefully this gets through.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: frameTranslations.py
Type: text/x-python-script
Size: 5489 bytes
Desc: not available
Url: http://lists.open-bio.org/pipermail/biopython-dev/attachments/20060515/c4e9e1b6/attachment.bin

From idoerg at burnham.org  Mon May 22 13:40:52 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 22 May 2006 10:40:52 -0700
Subject: [Biopython-dev] [Fwd: Re: BOSC 2006 2nd Call for Papers]
Message-ID: <4471F7A4.4050203@burnham.org>

> ------------------------------------------------------------------------
>
> Subject: BOSC 2006 2nd Call for Papers
> From: Darin London
> Date: Mon, 22 May 2006 11:29:45 -0400
> To: bosc at open-bio.org
> CC: Authors at lists.open-bio.org, BioBiz at lists.open-bio.org,
> Biocorba-announce-l at lists.open-bio.org, Biocorba-l at lists.open-bio.org,
> Biograph at lists.open-bio.org, bioinfo-core at lists.open-bio.org,
> biojava-dev at lists.open-bio.org, Biojava-l at lists.open-bio.org,
> bioped-l at lists.open-bio.org, Bioperl-announce-l at lists.open-bio.org,
> Bioperl-l at lists.open-bio.org, bioperl-microarray at lists.open-bio.org,
> bioperl-pipeline at lists.open-bio.org, BioPython at lists.open-bio.org,
> BioPython-announce at lists.open-bio.org,
> Biopython-dev at lists.open-bio.org, BioRuby at lists.open-bio.org,
> BioRuby-ja at lists.open-bio.org, Biosoap-l at lists.open-bio.org,
> BioSQL-l at lists.open-bio.org, BP-announce at lists.open-bio.org,
> DAS at lists.open-bio.org, DAS-announce at lists.open-bio.org,
> DAS2 at lists.open-bio.org, Dynamite at lists.open-bio.org,
> EMBOSS at lists.open-bio.org, emboss-announce at lists.open-bio.org,
> emboss-dev at lists.open-bio.org, Moby-announce at lists.open-bio.org,
> MOBY-dev at lists.open-bio.org, moby-l at lists.open-bio.org,
> obf-developers at lists.open-bio.org, Ontologies at lists.open-bio.org,
> Open-bio-announce at lists.open-bio.org, Open-Bio-l at lists.open-bio.org,
> Open-Bioinformatics-Foundation at lists.open-bio.org
>2nd CALL FOR SPEAKERS > >This is the second and last official call for speakers to submit their >abstracts to speak at BOSC 2006 >in Fortaleza, Brasil. In order to be considered as a potential speaker, >an abstract must be recieved by >Monday, June 5th, 2006. We look forward to a great conference this >year. Please consult >The Official BOSC 2006 Website at: > >http://www.open-bio.org/wiki/BOSC_2006 > >for more details and information. > > >In addition, a BOSC weblog has been setup to make it easier to >desiminate all BOSC >related announcements: > >http://wiki.open-bio.org/boscblog/ > >And if you have an ICAL compatible Calendar, there is an EventDB >calendar set up with all >BOSC related deadlines. > >http://eventful.com/groups/G0-001-000014747-0 > >More information about ISMB can be found at the Official ISMB 2006 Website: > >http://ismb2006.cbi.cnptia.embrapa.br/ > >Thank You, and we look forward to seeing you all, > >The BOSC Organizing Committee. > > > -- Iddo Friedberg, Ph.D. Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 Tel: (858) 646 3100 x3516 Fax: (858) 713 9949 http://iddo-friedberg.org http://BioFunctionPrediction.org From jelle.feringa at ezct.net Fri May 5 09:17:27 2006 From: jelle.feringa at ezct.net (Jelle Feringa / EZCT Architecture & Design Research) Date: Fri, 5 May 2006 11:17:27 +0200 Subject: [Biopython-dev] issues compiling KDTree.cpp / win32 Message-ID: <009501c67024$baa13b20$0b01a8c0@JELLE> Hi, I know there have been some issues compiling KDTree.cpp and have made a brave effort as well (not necessarily a trivial thing to do for the compiler challenged. ) but haven't been able to compile its successfully. Would anyone more succesfull in compiling the _CKDTree.pyd module be so kind to share it with me? I'm pretty desperate for a well implemented kdtree to be frank. Apart from that, I'm putting together a python module for scripting Rhino, a terrific nurbs CAD modeler. 
It would be great to include the kdtree module into this module. Giving the appropriate credits would I be allowed to do so? Many thanks in advance, -jelle From mdehoon at c2b2.columbia.edu Tue May 9 23:59:55 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Tue, 09 May 2006 16:59:55 -0700 Subject: [Biopython-dev] new website & documentation Message-ID: <44612CFB.7030008@c2b2.columbia.edu> Hi everybody, As you may know, the Biopython website is transitioning to a new server of the Open Bioinformatics Foundation. The OBF folks suggested that we use a wiki-based website (just like bioperl and biojava) instead of quixote, which we have been using so far. A prototype wiki-based Biopython website is now running at http://biopython.open-bio.org/wiki/Biopython. Feel free to contribute to this website as you see fit. Don't be shy. Now the question is whether the complete Biopython documentation should be wiki-based. For an example, see the Bioperl HOWTOs. This would make it easier to update the Biopython documentation, and hopefully result in better and more extensive documentation (which wouldn't hurt). On the downside, we'd lose the PDFs. (Or is it possible to generate PDFs from the wiki website? Wiki gurus, let us know). Another option would be to convert the generic Biopython documentation to wiki (e.g., the tutorial), but keep specialized modules in the current format. Opinions? --Michiel. From biopython-dev at maubp.freeserve.co.uk Wed May 10 14:24:17 2006 From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython Dev)) Date: Wed, 10 May 2006 15:24:17 +0100 Subject: [Biopython-dev] Biopython for GDS SOFT? In-Reply-To: References: Message-ID: <4461F791.5020607@maubp.freeserve.co.uk> Ramon Aragues wrote: > Hi, > > I-ve seen your post on the bipython discussion list (from 2005) about > GDS SOFT files. > > Has any work been done on it since then? Can I use biopython for > parsing GDS SOFT files? 
Yes, I checked in some revisions to the Bio/Geo files in Jan 2006, and it seemed to work for me. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Geo/?cvsroot=biopython I also added the examples the NCBI provided to document their 2005 file format changes to the BioPython GEO test suite. I was only interested in some very basic exploration at the time, and then moved on to using Sean Davis' GEOquery for R/BioConductor instead. I found this made statistical analysis and visualisation much easier. In any event, looking at GDS SOFT files was rather an aside to my main research interests... > I am already using biopython in my framework ( http://sbi.imim.es/piana > ) so it would be great if I can use biopython for this as well. This should be possible once you update the Bio/Geo files to those from CVS or the next release of BioPython. Please let us (me) know how you get on... However, I think you will be on your own in terms of statistical data analysis with python. > Cheers! > > Ramon Good luck Peter From bill at barnard-engineering.com Mon May 15 16:22:30 2006 From: bill at barnard-engineering.com (Bill Barnard) Date: Mon, 15 May 2006 09:22:30 -0700 Subject: [Biopython-dev] "Online" tests, was [Bug 1972] In-Reply-To: <442D93EE.9040707@maubp.freeserve.co.uk> References: <200603210825.k2L8Pg3S006352@portal.open-bio.org> <441FEEE6.8040402@maubp.freeserve.co.uk> <1143045514.813.26.camel@tioga.barnard-engineering.com> <442403DE.7090607@biopython.org> <1143750836.5736.27.camel@tioga.barnard-engineering.com> <442D93EE.9040707@maubp.freeserve.co.uk> Message-ID: <1147710150.1517.37.camel@tioga.barnard-engineering.com> On Fri, 2006-03-31 at 21:41 +0100, Peter (BioPython Dev) wrote: > Bill Barnard wrote: > > I've made a first cut unit test, tentatively named > > test_Parsers_for_newest_formats, ... > > Sounds good to me. But not a very snappy name - how about something > shorter like test_OnlineFormats.py instead? Works for me. 
> I think the Blast test should actually submit a short protein/nucleotide > sequence known to be in the online database. Maybe do some basic sanity > testing like check it returns at least N results and the best hit is at > least a certain score. I have one such test working. > > >>In some cases (e.g. GenBank, Fasta) once the sample file is downloaded > >>there are multiple parsers to be checked (e.g. record and feature parsers). I have tests using EUtils that check the record and feature parsers for GenBank and Fasta files. > I'll volunteer to add cases for GenBank, Fasta and GEO files. I've not yet looked at GEO files, so there's that yet to do. I expect to have some free time soon so I may look at the GEO and other parsers at that time. I'd certainly be interested in any feedback you have on these tests, or to see additional test cases. Finding the parsers to test and implementing the tests seems pretty straightforward. I've attached the test code to this email, and added a .txt file extension just in case the listserv won't allow attaching a code file. Bill p.s. I'm still looking for a more interesting project I can contribute to Biopython. Doing some of the tests and such is useful to me in learning my way around the codebase; I hope to find something requiring a bit more creativity. If anyone has any suggestions about areas that need some attention I'll happily consider them. I am considering writing some code for HMMs that would build on some prototype code I wrote. -------------- next part -------------- """Test to make sure all parsers retrieve and read current data formats. 
""" import requires_internet import sys import unittest from Bio import ParserSupport # ExpasyTest from Bio import Prosite from Bio.Prosite import Prodoc from Bio.SwissProt import SProt # PubMedTest from Bio import PubMed, Medline # EUtilsTest from Bio import EUtils from Bio.EUtils import DBIdsClient from Bio import GenBank from Bio import Fasta # BlastTest try: import cStringIO as StringIO except ImportError: import StringIO from Bio.Blast import NCBIWWW, NCBIXML def run_tests(argv): test_suite = testing_suite() runner = unittest.TextTestRunner(sys.stdout, verbosity = 2) runner.run(test_suite) def testing_suite(): """Generate the suite of tests. """ test_suite = unittest.TestSuite() test_loader = unittest.TestLoader() test_loader.testMethodPrefix = 't_' tests = [ExpasyTest, PubMedTest, EUtilsTest, BlastTest] for test in tests: cur_suite = test_loader.loadTestsFromTestCase(test) test_suite.addTest(cur_suite) return test_suite class ExpasyTest(unittest.TestCase): """Test that parsers can read the current Expasy database formats """ def setUp(self): pass def t_parse_prosite_record(self): """Retrieve a Prosite record and parse it """ prosite_dict = Prosite.ExPASyDictionary(parser=Prosite.RecordParser()) accession = 'PS00159' entry = prosite_dict[accession] self.assertEqual(entry.accession, accession) def t_parse_prodoc_record(self): """Retrieve a Prodoc record and parse it """ prodoc_dict = Prodoc.ExPASyDictionary(parser=Prodoc.RecordParser()) accession = 'PDOC00933' entry = prodoc_dict[accession] self.assertEqual(entry.accession, accession) def t_parse_sprot_record(self): """Retrieve a SwissProt record and parse it into Record format """ sprot_record_dict = SProt.ExPASyDictionary(parser=SProt.RecordParser()) accession = 'Q5TYW8' entry = sprot_record_dict[accession] self.failUnless(accession in entry.accessions) def t_parse_sprot_seq(self): """Retrieve a SwissProt record and parse it into Sequence format """ sprot_seq_dict = 
SProt.ExPASyDictionary(parser=SProt.SequenceParser()) accession = 'Q5TYW8' entry = sprot_seq_dict[accession] self.assertEqual(entry.id, accession) class PubMedTest(unittest.TestCase): """Test that Medline parsers can read the current PubMed format """ def setUp(self): pass def t_parse_pubmed_record(self): """Retrieve a PubMed record and parse it """ pubmed_dict = PubMed.Dictionary(parser=Medline.RecordParser()) pubmed_id = '3136164' entry = pubmed_dict[pubmed_id] self.assertEqual(entry.pubmed_id, pubmed_id) class EUtilsTest(unittest.TestCase): """Test that GenBank, Fasta parsers can read EUtils retrieved db formats """ # Database primary IDs # http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi? # lists all E-Utility database names. Primary IDs, which are # always an integer, are listed below for a few databases: # Entrez Database Primary ID E-Utility Database Name # 3D Domains 3D SDI domains # Domains PSSM-ID cdd # Genome Genome ID genome # Nucleotide GI number nucleotide # OMIM MIM number omim # PopSet Popset ID popset # Protein GI number protein # ProbeSet GEO ID geo # PubMed PMID pubmed # Structure MMDB ID structure # SNP SNP ID snp # Taxonomy TAXID taxonomy # UniGene UniGene ID unigene # UniSTS UniSTS ID unists # see Bio.EUtils.Config.py for supported EUtils databases def setUp(self): self.client = DBIdsClient.DBIdsClient() def t_parse_nucleotide_gb(self): """Use EUtils to retrieve and parse an NCBI nucleotide GenBank record """ db = "nucleotide" gi = "57165207" result = self.client.search(gi, db, retmax=1) sfp = result[0].efetch(retmode="text", rettype="gb", \ seq_start=5458145, seq_stop=5458942) text = sfp.readlines() sfp.close() self.assertEqual(result.dbids.db, db) # sanity check, not a parser test locus_line = text[0] # now parse the text with GenBank parsers parser_input = StringIO.StringIO(''.join(text)) # RecordParser parser = GenBank.RecordParser() record = parser.parse(parser_input) parser_input.reset() self.assertEqual(record.gi, gi) # split 
locus line to eliminate whitespace differences self.assertEqual(record._locus_line().split(), locus_line.split()) # FeatureParser parser = GenBank.FeatureParser() record = parser.parse(parser_input) parser_input.reset() self.assertEqual(record.annotations['gi'], gi) def t_parse_protein_fasta(self): """Use EUtils to retrieve and parse an NCBI protein Fasta record """ db = "protein" gi = "21220767" result = self.client.search(gi, db, retmax=1) sfp = result[0].efetch(retmode="text", rettype="fasta") text = sfp.readlines() sfp.close() self.assertEqual(result.dbids.db, db) # sanity check, not a parser test # now parse the text with Fasta parsers parser_input = StringIO.StringIO(''.join(text)) # RecordParser parser = Fasta.RecordParser() record = parser.parse(parser_input) parser_input.reset() self.assert_(gi in record.title.split('|')) self.assertEqual(record.title, text[0][1:-1]) # strip > and \n from text[0] # SequenceParser parser = Fasta.SequenceParser() record = parser.parse(parser_input) parser_input.reset() self.assert_(gi in record.description.split('|')) self.assertEqual(record.description, text[0][1:-1]) # strip > and \n from text[0] class BlastTest(unittest.TestCase): """Test that the Blast XML parser can read the current NCBI formats """ def setUp(self): pass def t_parse_blast_xml(self): """Use NCBIWWW to retrieve Blast results and use NCBIXML to parse it """ query = '''>sp|P09405|NUCL_MOUSE Nucleolin (Protein C23) - Mus musculus (Mouse). 
VKLAKAGKTHGEAKKMAPPPKEVEEDSEDEEMSEDEDDSSGEEEVVIPQKKGKKATTTPA KKVVVSQTKKAAVPTPAKKAAVTPGKKAVATPAKKNITPAKVIPTPGKKGAAQAKALVPT''' result_handle = NCBIWWW.qblast('blastp', 'swissprot', \ query, expect=0.0001, format_type="XML") blast_results = result_handle.read() result_handle.close() blast_out = StringIO.StringIO(blast_results) parser = NCBIXML.BlastParser() b_record = parser.parse(blast_out) ### write output file for testing purposes ~35 kB ## import os ## save_fname = os.path.expanduser('~/tmp/blast_out.xml') ## save_file = open(save_fname, 'w') ## save_file.write(blast_results) ## save_file.close() ### # When I ran this on 4 May 2006 I got max_score of 601 & 21 hits expected_score_cutoff = 600 expected_min_hits = 20 max_score = 0.0 for alignment in b_record.alignments: for hsp in alignment.hsps: if hsp.score > max_score: max_score = hsp.score msg = 'max score (%g) < expected_score_cutoff(%g)' % \ (max_score, expected_score_cutoff) self.assert_(max_score >= expected_score_cutoff, msg) msg = 'N(%d) < expected_min_hits(%d)' % \ (len(b_record.alignments), expected_min_hits) self.assert_(len(b_record.alignments) >= expected_min_hits, msg) if __name__ == "__main__": sys.exit(run_tests(sys.argv)) From biopython-dev at maubp.freeserve.co.uk Mon May 15 17:25:13 2006 From: biopython-dev at maubp.freeserve.co.uk (Peter (BioPython-dev)) Date: Mon, 15 May 2006 18:25:13 +0100 Subject: [Biopython-dev] "Online" tests, was [Bug 1972] In-Reply-To: <1147710150.1517.37.camel@tioga.barnard-engineering.com> References: <200603210825.k2L8Pg3S006352@portal.open-bio.org> <441FEEE6.8040402@maubp.freeserve.co.uk> <1143045514.813.26.camel@tioga.barnard-engineering.com> <442403DE.7090607@biopython.org> <1143750836.5736.27.camel@tioga.barnard-engineering.com> <442D93EE.9040707@maubp.freeserve.co.uk> <1147710150.1517.37.camel@tioga.barnard-engineering.com> Message-ID: <4468B979.4090504@maubp.freeserve.co.uk> >>>>In some cases (e.g. 
GenBank, Fasta) once the sample file is downloaded
>>>> there are multiple parsers to be checked (e.g. record and feature
>>>> parsers).
>
> I have tests using EUtils that check the record and feature parsers for
> GenBank and Fasta files.

You don't check the iteration over files with multiple records... that
shouldn't be a problem, as other than changing the number of blank lines
between records I can't think of anything the NCBI might change, but you
never know.

>> I'll volunteer to add cases for GenBank, Fasta and GEO files.
>
> I've not yet looked at GEO files, so there's that yet to do. I expect to
> have some free time soon so I may look at the GEO and other parsers at
> that time.

Checking the basic "did it load something with the expected ID" shouldn't
be too tricky for GEO SOFT files.

> I'd certainly be interested in any feedback you have on these
> tests, or to see additional test cases. Finding the parsers to test and
> implementing the tests seems pretty straightforward.

You are only testing XML parsing with blastp - do you think it's worth
including tests of any of the other variants? Especially the slightly
more exotic ones offered by the NCBI.

Also, you are only submitting a single query, so only a single Blast
result is returned. Can multiple queries be submitted? That would give
you an excuse to test the Blast iterator support.

Loading a PDB file strikes me as another fairly simple one to include.
If you can get a link to the "PDB file of the month" then even better.

Are there any "online" phylogenetic tree resources we should use to
validate the new Nexus parser?

> I've attached the test code to this email, and added a .txt file
> extension just in case the listserv won't allow attaching a code file.

I can see the test code file. I haven't run it yet.

Thanks Bill.
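As an aside, the two "expected ID" checks suggested above (GEO SOFT and PDB) are simple enough to sketch without touching the network or even Biopython itself. This is only an illustration in modern Python 3 (the code in this thread is Python 2); the helper names `soft_entity_ids` and `pdb_id_from_header` are invented for the sketch, but the file conventions they rely on (SOFT entity lines starting with `^`, the PDB idCode in columns 63-66 of the HEADER record) are the real ones:

```python
import io


def soft_entity_ids(handle):
    """Yield (entity_type, entity_id) pairs from a GEO SOFT file.

    SOFT entity indicator lines look like '^SAMPLE = GSM575'.
    """
    for line in handle:
        if line.startswith('^'):
            entity, _, value = line[1:].partition('=')
            yield entity.strip(), value.strip()


def pdb_id_from_header(handle):
    """Return the 4-character idCode from a PDB HEADER record (columns 63-66),
    or None if no HEADER record is found."""
    for line in handle:
        if line.startswith('HEADER'):
            return line[62:66].strip()
    return None


# Quick self-check on a tiny in-memory SOFT example:
soft = io.StringIO("^DATABASE = GeoMiame\n!Database_name = GEO\n^SAMPLE = GSM575\n")
assert list(soft_entity_ids(soft)) == [("DATABASE", "GeoMiame"),
                                       ("SAMPLE", "GSM575")]
```

A real online test would fetch the file first, then assert that the parsed ID matches the accession it asked for.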
Peter

From mcolosimo at mitre.org Mon May 15 20:14:39 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Mon, 15 May 2006 16:14:39 -0400
Subject: [Biopython-dev] DNA Strider like frame translation
Message-ID: <0F965CC3-12E2-41C9-B725-B4D90538F5B1@mitre.org>

All,

I just spent some time writing this up and I thought I'd share it with
everyone (and hopefully have it committed). Yes, there already is
something similar under Bio.SeqUtils (six_frame_translations). However,
mine actually looks identical to DNA Strider's output for both 3- and
6-frame single-letter translations. It also has the power to do, for
example, -1 and 1 frame, or 1 frame, or 1 and 3 frame translations. If I
get around to it, I might add support for three-letter amino acid codes.

Marc

p.s. hopefully this gets through.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: frameTranslations.py
Type: text/x-python-script
Size: 5489 bytes
Desc: not available
URL:

From idoerg at burnham.org Mon May 22 17:40:52 2006
From: idoerg at burnham.org (Iddo Friedberg)
Date: Mon, 22 May 2006 10:40:52 -0700
Subject: [Biopython-dev] [Fwd: Re: BOSC 2006 2nd Call for Papers]
Message-ID: <4471F7A4.4050203@burnham.org>

> ------------------------------------------------------------------------
>
> Subject: BOSC 2006 2nd Call for Papers
> From: Darin London
> Date: Mon, 22 May 2006 11:29:45 -0400
> To: bosc at open-bio.org
> CC: Authors at lists.open-bio.org, BioBiz at lists.open-bio.org,
> Biocorba-announce-l at lists.open-bio.org, Biocorba-l at lists.open-bio.org,
> Biograph at lists.open-bio.org, bioinfo-core at lists.open-bio.org,
> biojava-dev at lists.open-bio.org, Biojava-l at lists.open-bio.org,
> bioped-l at lists.open-bio.org, Bioperl-announce-l at lists.open-bio.org,
> Bioperl-l at lists.open-bio.org, bioperl-microarray at lists.open-bio.org,
> bioperl-pipeline at lists.open-bio.org, BioPython at lists.open-bio.org,
> BioPython-announce at
lists.open-bio.org,
> Biopython-dev at lists.open-bio.org, BioRuby at lists.open-bio.org,
> BioRuby-ja at lists.open-bio.org, Biosoap-l at lists.open-bio.org,
> BioSQL-l at lists.open-bio.org, BP-announce at lists.open-bio.org,
> DAS at lists.open-bio.org, DAS-announce at lists.open-bio.org,
> DAS2 at lists.open-bio.org, Dynamite at lists.open-bio.org,
> EMBOSS at lists.open-bio.org, emboss-announce at lists.open-bio.org,
> emboss-dev at lists.open-bio.org, Moby-announce at lists.open-bio.org,
> MOBY-dev at lists.open-bio.org, moby-l at lists.open-bio.org,
> obf-developers at lists.open-bio.org, Ontologies at lists.open-bio.org,
> Open-bio-announce at lists.open-bio.org, Open-Bio-l at lists.open-bio.org,
> Open-Bioinformatics-Foundation at lists.open-bio.org
>
> 2nd CALL FOR SPEAKERS
>
> This is the second and last official call for speakers to submit their
> abstracts to speak at BOSC 2006 in Fortaleza, Brazil. In order to be
> considered as a potential speaker, an abstract must be received by
> Monday, June 5th, 2006. We look forward to a great conference this year.
> Please consult the Official BOSC 2006 Website at:
>
> http://www.open-bio.org/wiki/BOSC_2006
>
> for more details and information.
>
> In addition, a BOSC weblog has been set up to make it easier to
> disseminate all BOSC-related announcements:
>
> http://wiki.open-bio.org/boscblog/
>
> And if you have an iCal-compatible calendar, there is an EventDB
> calendar set up with all BOSC-related deadlines:
>
> http://eventful.com/groups/G0-001-000014747-0
>
> More information about ISMB can be found at the Official ISMB 2006
> Website:
>
> http://ismb2006.cbi.cnptia.embrapa.br/
>
> Thank you, and we look forward to seeing you all,
>
> The BOSC Organizing Committee.

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: (858) 646 3100 x3516
Fax: (858) 713 9949
http://iddo-friedberg.org
http://BioFunctionPrediction.org
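Coming back to Marc Colosimo's frame-translation post above: his frameTranslations.py attachment was scrubbed by the list, so here is a minimal, hypothetical stand-in (not Marc's code, and deliberately ignoring his DNA Strider-style layout) showing the core idea of translating an arbitrary subset of the six reading frames, written in modern Python 3 using the standard genetic code:

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINOS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(seq, frame):
    """Translate one reading frame of a DNA sequence.

    frame is +1/+2/+3 (forward strand) or -1/-2/-3 (reverse complement),
    following the DNA Strider convention.  Unknown codons become 'X'.
    """
    s = seq.upper()
    if frame < 0:
        # reverse complement for the minus-strand frames
        s = s.translate(_COMPLEMENT)[::-1]
    start = abs(frame) - 1
    return "".join(CODON_TABLE.get(s[i:i + 3], "X")
                   for i in range(start, len(s) - 2, 3))


def translate_frames(seq, frames=(1, 2, 3, -1, -2, -3)):
    """Return {frame: translation} for any chosen subset of the six frames."""
    return {f: translate_frame(seq, f) for f in frames}
```

For example, `translate_frames("ATGGCCTAA", frames=(1,))` gives `{1: "MA*"}`; the Strider-style pretty-printing (aligning translations under the nucleotide sequence) would be layered on top of something like this.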