From golubchi at stats.ox.ac.uk Wed Sep 4 06:41:54 2013
From: golubchi at stats.ox.ac.uk (Tanya Golubchik)
Date: Wed, 04 Sep 2013 11:41:54 +0100
Subject: [Biopython] Memory use - alignment formats
Message-ID: <52270E72.2090306@stats.ox.ac.uk>

Hello,

I'm looking for the most memory-efficient way to write a large number of
very long sequences (several Mb each) to a file. This works easily with a
generator passed to SeqIO.write if I'm writing in a sequential format like
multifasta, but what about, say, phylip?

Is it better (or equivalent) to convert the alignment to a list first
(obviously using a lot of memory in the process), or to write to a
multifasta file and then use SeqIO.convert?

Thanks,
Tanya

From p.j.a.cock at googlemail.com Thu Sep 5 12:16:19 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Sep 2013 17:16:19 +0100
Subject: [Biopython] Memory use - alignment formats
In-Reply-To: <52270E72.2090306@stats.ox.ac.uk>
References: <52270E72.2090306@stats.ox.ac.uk>
Message-ID:

Hi Tanya,

For any alignment-based format, SeqIO will call AlignIO, which means it
will load all the records into memory at once to build an MSA object
holding a list of all the SeqRecord objects. SeqIO handles FASTA files
itself, so it doesn't do this.

There is no simple answer for your specific need with PHYLIP format;
potentially something more memory efficient could be done for the
non-interleaved PHYLIP formats...

If you can work with FASTA and SeqIO instead, that would be best.

Peter
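Peter's point is that FASTA can be streamed record by record, while an alignment format such as PHYLIP goes through AlignIO and is held in memory. The sketch below uses plain Python (no Biopython calls) to show why a sequential, non-interleaved PHYLIP writer could in principle stream too: the only thing it needs up front is the header line with the number of taxa and the alignment length, which the caller would have to supply, for example from a cheap first pass. The function names and the (name, sequence) tuple input are hypothetical, and the output is relaxed PHYLIP (name, space, sequence), not the strict 10-character-name variant.

```python
from io import StringIO


def write_fasta(handle, records, width=60):
    """Write (name, sequence) pairs as FASTA, holding one record at a time."""
    count = 0
    for name, seq in records:
        handle.write(">%s\n" % name)
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


def write_sequential_phylip(handle, records, n_taxa, n_chars):
    """Write (name, sequence) pairs as sequential relaxed PHYLIP.

    The header needs the taxon count and alignment length before any
    sequence is written, so the caller must supply them; the sequences
    themselves are consumed one at a time from the iterator.
    """
    handle.write("%i %i\n" % (n_taxa, n_chars))
    count = 0
    for name, seq in records:
        if len(seq) != n_chars:
            raise ValueError("%s has length %i, expected %i"
                             % (name, len(seq), n_chars))
        handle.write("%s %s\n" % (name, seq))
        count += 1
    if count != n_taxa:
        raise ValueError("wrote %i records, expected %i" % (count, n_taxa))
    return count


def toy_records():
    """Hypothetical stand-in for a generator yielding very long sequences."""
    for i in range(3):
        yield "seq%i" % i, "ACGT" * 5


out = StringIO()
write_sequential_phylip(out, toy_records(), n_taxa=3, n_chars=20)
```

Because both writers consume an iterator, peak memory is roughly one record rather than the whole alignment. Interleaved PHYLIP is different: every block contains a slice of every sequence, so all of them must be available at once.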
From Gerard.Schaafsma at med.lu.se Fri Sep 6 03:38:33 2013
From: Gerard.Schaafsma at med.lu.se (Gerard Schaafsma)
Date: Fri, 06 Sep 2013 09:38:33 +0200
Subject: [Biopython] parsing Entrez SNP XML files
Message-ID: <1378453112.9730.18.camel@gerard-desktop>

Hi,

I am trying to parse XML files which I downloaded from the NCBI site
(ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing
records from the SNP (dbSNP) database.

When I do:

    import sys
    from Bio import Entrez

    handle = open(xmlFile)
    records = Entrez.parse(handle)

    for record in records:
        for k, v in record.items():
            print k, v

I get the following error message:

    NotImplementedError: The Bio.Entrez parser cannot handle XML data that
    make use of XML namespaces

I am using Biopython 1.62 on a PC with Linux 3.2.0-52-generic x86_64
GNU/Linux.

Searching for this error message suggested that it might have something
to do with the DTD files from NCBI, but since I am using the newest
Biopython version, I would expect these to be OK. Moreover, the first 2
lines of the XML file make no mention of any DTD file, just:

Anyone with the same problem, and a solution?
Best regards,
Gerard

--
Gerard Schaafsma
Lund University
Department of Experimental Medical Science
Protein Structure and Bioinformatics Group
Hs 66, BMC D10
Box 117
22100 Lund
Sweden

From p.j.a.cock at googlemail.com Fri Sep 6 04:42:22 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 6 Sep 2013 09:42:22 +0100
Subject: [Biopython] parsing Entrez SNP XML files
In-Reply-To: <1378453112.9730.18.camel@gerard-desktop>
References: <1378453112.9730.18.camel@gerard-desktop>
Message-ID:

On Fri, Sep 6, 2013 at 8:38 AM, Gerard Schaafsma wrote:
> I get the following error message:
>
> NotImplementedError: The Bio.Entrez parser cannot handle XML data that
> make use of XML namespaces

Yes, sadly, unlike most of the NCBI XML files, for dbSNP they don't
provide a DTD file describing the object model, and the Bio.Entrez
parser requires one:
http://bugzilla.open-bio.org/show_bug.cgi?id=2771

Unless the NCBI change this, you will have to use an alternative XML
parser. Python comes with several, including ElementTree, which is quite
popular.

Peter

From mjldehoon at yahoo.com Fri Sep 6 07:37:52 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 6 Sep 2013 04:37:52 -0700 (PDT)
Subject: [Biopython] parsing Entrez SNP XML files
In-Reply-To:
References: <1378453112.9730.18.camel@gerard-desktop>
Message-ID: <1378467472.95535.YahooMailNeo@web164003.mail.gq1.yahoo.com>

It's a bit more complicated than that. Bio.Entrez can parse XML files
that come with a DTD, which is the vast majority of XML files from NCBI
Entrez.
Apparently the dbSNP database uses an XML Schema instead of a DTD, so
Bio.Entrez would need a parser for XML Schemas to be able to parse XML
files from dbSNP. I won't be able to look into this, but any volunteers
are strongly encouraged.

Best,
-Michiel.

From cacaucenturion2 at gmail.com Fri Sep 6 08:42:04 2013
From: cacaucenturion2 at gmail.com (cacaucenturion2)
Date: Fri, 6 Sep 2013 20:42:04 +0800
Subject: [Biopython] Question about Nexus Format Converting
Message-ID: <201309062042038283734@gmail.com>

Hi all,

Does anyone know if it is possible to convert an interleaved NEXUS format
file to a non-interleaved NEXUS format file in Biopython?

Thanks!
Sincerely yours,
Cacau

From winda002 at student.otago.ac.nz Sat Sep 7 01:53:41 2013
From: winda002 at student.otago.ac.nz (David Winter)
Date: Sat, 7 Sep 2013 05:53:41 +0000
Subject: [Biopython] Question about Nexus Format Converting
In-Reply-To: <201309062042038283734@gmail.com>
References: <201309062042038283734@gmail.com>
Message-ID: <9dc9a3c3bc964a66bcae6f0e6ca0b361@SINPR03MB155.apcprd03.prod.outlook.com>

Hi cacaucenturion,

The Nexus module will let you do this. You'd create an empty Nexus
object, read your existing file into it, then write out the Nexus object,
specifying that you want a non-interleaved format (which is actually the
default):

    from Bio.Nexus import Nexus

    new_ali = Nexus.Nexus()
    # An interleaved NEXUS file from the Biopython test suite
    new_ali.read("../biopython/Tests/Nexus/test_Nexus_input.nex")
    # interleave=False writes sequential (non-interleaved) output
    new_ali.write_nexus_data("test_sequential.nex", interleave=False)

Hope that helps,
David

From p.j.a.cock at googlemail.com Sat Sep 7 07:41:53 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 7 Sep 2013 12:41:53 +0100
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
Message-ID:

Dear Biopythoneers,

As you will be aware, with our recent release of Biopython 1.62 we now
officially support Python 3 for the first time (specifically Python 3.3;
we don't recommend 3.0, 3.1 or 3.2 here), while continuing to support
Python 2 as well.

Currently all our documentation is written assuming Python 2, but with
some small changes most things can be written to work under both
variants. The most visible change is how to print things, and that
happens a lot in our examples.
I would like us to switch to using the Python 3 style print function in
our documentation (including the Tutorial and the docstrings embedded in
the code as help text).

In the simplest case, this Python 2 only code:

>>> print variable

would be changed to this using Python 3 style:

>>> print(variable)

Luckily that also works on Python 2. For more complicated examples, to
use the print function under Python 2 you must add this import line to
the start of your file:

>>> from __future__ import print_function

For example, this Python 2 only example:

>>> print "Two plus two is", 4

becomes:

>>> from __future__ import print_function # at start
>>> ...
>>> print("Two plus two is", 4)

or more elegantly,

>>> print("Two plus two is %i" % 4)

Would anyone object to us using the print function style in the
Biopython documentation?

I'm particularly keen to hear from beginners, as this is potentially
confusing.

Thanks,

Peter.

From p.j.a.cock at googlemail.com Sat Sep 7 09:52:54 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 7 Sep 2013 14:52:54 +0100
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

On Sat, Sep 7, 2013 at 12:41 PM, Peter Cock wrote:
> Dear Biopythoneers,
> ...
I tweeted this email:

Biopython Project (@Biopython): Would anyone object to us using
#Python3 print function style in the #Biopython documentation?
http://lists.open-bio.org/pipermail/biopython/2013-September/008751.html
https://twitter.com/Biopython/status/376309705972654080

Two replies already:

Raphael Mattos (@rsmattos): @Biopython I think it's time....
https://twitter.com/rsmattos/status/376321218456338432

Alec Munro (@alecmunro): @Biopython do it!
https://twitter.com/alecmunro/status/376341224544038912

Peter

From dtomso at agbiome.com Sat Sep 7 09:50:18 2013
From: dtomso at agbiome.com (Dan Tomso)
Date: Sat, 7 Sep 2013 13:50:18 +0000
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

Hi, Peter.

This sounds OK to me. Regarding the overall transition to Python 3 -- is
there some sort of master plan with phasing of all the changes? It might
be helpful to have some longer-term context on any rollout.
Best regards,
Dan Tomso
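A minimal, runnable sketch of the print-function forms from Peter's proposal earlier in this thread. With the future import as the first statement, the three print calls behave the same on Python 2.6/2.7 and Python 3. The `show_examples` helper and the use of the standard `file=` keyword to capture output are illustrative additions, not part of Peter's email; the final capture into `io.StringIO` assumes Python 3, because on Python 2 that buffer only accepts unicode text.

```python
from __future__ import print_function  # must be the first statement; a no-op on Python 3

import io


def show_examples(out=None):
    """Print Peter's examples using only print-function syntax.

    out=None makes print() write to sys.stdout, exactly as a bare call would.
    """
    variable = "ACGT"
    print(variable, file=out)                  # was: print variable
    print("Two plus two is", 4, file=out)      # was: print "Two plus two is", 4
    print("Two plus two is %i" % 4, file=out)  # one pre-formatted string


show_examples()  # writes the three lines to the terminal

# The same calls can target any file-like object (Python 3 io.StringIO here).
captured = io.StringIO()
show_examples(captured)
```

Passing several arguments joins them with a single space by default, which is why the second and third calls produce identical text.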
_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From p.j.a.cock at googlemail.com Sat Sep 7 12:25:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 7 Sep 2013 17:25:44 +0100
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

On Sat, Sep 7, 2013 at 2:50 PM, Dan Tomso wrote:
> Hi, Peter.
>
> This sounds OK to me.

Thanks Dan. And another voice of approval on Twitter:

Karin Lagesen (@karinlag): @Biopython @pjacock Go for it!
https://twitter.com/karinlag/status/376356704080105472

> Regarding the overall transition to Python 3 -- is there some sort
> of master plan with phasing of all the changes?
> It might be helpful to have some longer-term context on any roll out.

I'm not quite sure what you are asking, but this might answer you:

http://lists.open-bio.org/pipermail/biopython-dev/2013-May/010633.html
http://lists.open-bio.org/pipermail/biopython-dev/2013-September/010880.html

Using print functions in the Tutorial seems doable for the next release,
Biopython 1.63, if that's what you meant.

Regards,

Peter

From p.j.a.cock at googlemail.com Sat Sep 7 14:12:37 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 7 Sep 2013 19:12:37 +0100
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

On Sat, Sep 7, 2013 at 2:52 PM, Peter Cock wrote:
> On Sat, Sep 7, 2013 at 12:41 PM, Peter Cock wrote:
>> Dear Biopythoneers,
>> ...
> I tweeted this email,
> ...

On Sat, Sep 7, 2013 at 5:25 PM, Peter Cock wrote:
> On Sat, Sep 7, 2013 at 2:50 PM, Dan Tomso wrote:
>> Hi, Peter.
>>
>> This sounds OK to me.
>
> Thanks Dan. And another voice of approval on Twitter:
>
> Karin Lagesen (@karinlag): @Biopython @pjacock Go for it!
> https://twitter.com/karinlag/status/376356704080105472

And another positive voice:

Dave Lunt (@davelunt): @Biopython the docs change sounds good, that very
clear explanation you link to should also be somewhere obvious
https://twitter.com/davelunt/status/376405338511384576

Since there has only been positive reaction, I've made a start at
converting the examples in the Tutorial to use the Python 3 style print
function (maintaining full Python 2 compatibility under Python 2.6 and
2.7 via the future import):

https://github.com/biopython/biopython/commit/34d155a02cbcf7c953fb8238a5412f8c7c0e1cc5
https://github.com/biopython/biopython/commit/74a8b8349b58ae9aa7a727d6e1ab774a4c9008a3

For those curious to see how it looks (but not already familiar with
LaTeX, pdflatex and hevea), you can see a sneak preview here:

http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

(Hopefully those links will once again auto-update every night,
something that was working nicely prior to the server move.)

If you spot any typos, please let us know. Thanks!
Peter

From carlos.borroto at gmail.com Mon Sep 9 10:35:54 2013
From: carlos.borroto at gmail.com (Carlos Borroto)
Date: Mon, 9 Sep 2013 10:35:54 -0400
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

On Sat, Sep 7, 2013 at 7:41 AM, Peter Cock wrote:
> or more elegantly,
>
> >>> print("Two plus two is %i" % 4)

Shouldn't this be done even more elegantly with something like this:

>>> print("Two plus two is {0:d}".format(4))

--Carlos

From p.j.a.cock at googlemail.com Mon Sep 9 10:49:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 9 Sep 2013 15:49:16 +0100
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

On Mon, Sep 9, 2013 at 3:35 PM, Carlos Borroto wrote:
> Shouldn't this be done even more elegantly with something like this:
>
> >>> print("Two plus two is {0:d}".format(4))

Personally I find the % version shorter and clearer, but this may
reflect my past exposure to C.

Where you have more than a couple of placeholders, naming them seems a
bigger win, though.

Peter

From carlos.borroto at gmail.com Mon Sep 9 10:58:30 2013
From: carlos.borroto at gmail.com (Carlos Borroto)
Date: Mon, 9 Sep 2013 10:58:30 -0400
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

On Mon, Sep 9, 2013 at 10:49 AM, Peter Cock wrote:
> Personally I find the % version shorter and clearer, but this
> may reflect my past exposure to C.
I was trying to remember the reason I switched to .format(). I believe
it was for better compatibility with future versions of Python. See
this [1] Stack Overflow answer. Is that correct? Is % being phased out?

I know old habits are hard to kill, but maybe the tutorial should
introduce newcomers to the future-proof way of doing things. That way
they won't have to kill one more "old" habit.

[1] http://stackoverflow.com/questions/517355/string-formatting-in-python

--Carlos

From p.j.a.cock at googlemail.com Mon Sep 9 11:02:59 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 9 Sep 2013 16:02:59 +0100
Subject: [Biopython] Print statements vs functions (Python 2 vs 3)
In-Reply-To:
References:
Message-ID:

On Mon, Sep 9, 2013 at 3:58 PM, Carlos Borroto wrote:
> Is that correct? Is % being phased out?
> I know old habits are hard to kill, but maybe the tutorial should
> introduce newcomers to the future-proof way of doing things. That way
> they won't have to kill one more "old" habit.
I'm not aware of any plans to withdraw the old string % operator in
Python 3, although the implication from PEP 3101 is that this might
eventually happen:
http://www.python.org/dev/peps/pep-3101/

Peter

From jesshedge at yahoo.co.uk Wed Sep 25 08:14:21 2013
From: jesshedge at yahoo.co.uk (Jessica Hedge)
Date: Wed, 25 Sep 2013 13:14:21 +0100 (BST)
Subject: [Biopython] Bio.Phyo.Consensus
Message-ID: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com>

Hello,

I'm trying to use the Bio.Phylo.Consensus module in order to simulate
bootstrap replicates of a multiple sequence alignment. I have both
Biopython (version 1.61) and the Phylo module installed, but the
Consensus module doesn't seem to exist.

Is the Consensus module available in another version of Biopython?

Many thanks,

Jess

From p.j.a.cock at googlemail.com Wed Sep 25 09:25:32 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 25 Sep 2013 14:25:32 +0100
Subject: [Biopython] Bio.Phyo.Consensus
In-Reply-To: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com>
References: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com>
Message-ID:

On Wed, Sep 25, 2013 at 1:14 PM, Jessica Hedge wrote:
> Is the Consensus module available in another version of Biopython?

This is probably a question for Eric (CC'd explicitly).
There isn't currently a module with that exact name under Bio.Phylo,
but the wiki does mention it, so this is probably just an out-of-date
example reflecting non-final code: http://biopython.org/wiki/Phylo

Peter

From eric.talevich at gmail.com Thu Sep 26 01:29:29 2013
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 25 Sep 2013 22:29:29 -0700
Subject: [Biopython] Bio.Phyo.Consensus
In-Reply-To:
References: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com>
Message-ID:

On Wed, Sep 25, 2013 at 6:25 AM, Peter Cock wrote:
> There isn't currently a module with that exact name under Bio.Phylo,
> but the wiki does mention it, so this is probably just an out-of-date
> example reflecting non-final code: http://biopython.org/wiki/Phylo

Ah, sorry for the confusion -- the Bio.Phylo.Consensus function and most
of the other things on that page, starting at the "Tree Construction"
section, refer to Yanbo Ye's development branch for Google Summer of
Code, which just completed. You can get a copy of this code here:

https://github.com/lijax/biopython

The work is not yet merged into the main Biopython source tree, and some
parts of it could potentially change in the near future. I'll add a note
to this effect on the wiki.
-Eric

From thomas.girke at ucr.edu Sun Sep 29 19:55:19 2013
From: thomas.girke at ucr.edu (Thomas Girke)
Date: Sun, 29 Sep 2013 16:55:19 -0700
Subject: [Biopython] Bioinformatics Facility Director Position at UC Riverside
Message-ID: <20130929235519.GA1542@Thomas-Girkes-MacBook-Pro.local>

POSITION ANNOUNCEMENT

The Institute for Integrative Genome Biology (IIGB) at the University of
California, Riverside is seeking a Ph.D.-level bioinformatician to
coordinate its bioinformatics research activities, data analysis
workshop program and computing facility.

TITLE/RANK
Bioinformatics Facility Director. Salary will be competitive and
commensurate with accomplishments.

LOCATION
IIGB at the University of California, Riverside.

BACKGROUND
Successful candidates will join an innovative and multidisciplinary
Institute for Integrative Genome Biology (IIGB) that connects
theoretical and experimental researchers from different departments in
Life, Physical and Mathematical Sciences, Medicine, Engineering and
various campus-based Centers. The IIGB is organized around a 10,000
sq. ft. suite of Instrumentation Facilities that serve as a centralized,
shared-use resource for faculty, staff and students, offering advanced
tools in bioinformatics, microscopy, proteomics and genomics. Its
bioinformatics component is equipped with a modern high-performance
compute (HPC) infrastructure.

QUALIFICATIONS
Applicants must have a Ph.D. from a recognized university in
bioinformatics; or combined degrees in computer science and a biological
science; or a degree in computer science combined with relevant
experience in biological science; or a degree in a biological science
combined with relevant experience in computer science. The successful
candidate will have at least two years of professional hands-on
experience with next-generation sequence data analysis, and basic
knowledge of high-performance computing and cloud computing
technologies.
Another requirement is several years of professional experience with
common programming languages/environments used in scientific data
analysis, such as Python, R or C++. Experience with web development
frameworks and relational database design will also be a plus.

RESPONSIBILITIES
The Bioinformatics Facility Director leads IIGB's bioinformatics
facility staff and manages its computational infrastructure. The
incumbent will engage in collaborative research activities, contribute
to scientific publications, and participate in the preparation of joint
grant applications. The teaching expectations include the development
of a state-of-the-art workshop program on scientific data analysis.

TO APPLY
Review of applications will begin October 11, 2013 and continue until
the position is filled. Interested individuals should:
(1) submit a curriculum vitae,
(2) provide a statement of research interests, and
(3) arrange to have three letters of reference sent on their behalf.

All information should be addressed to:

Thomas Girke
Institute for Integrative Genome Biology
University of California
Riverside, CA 92521
thomas.girke at ucr.edu

WEBSITE
http://facility.bioinformatics.ucr.edu/position-opening

POLICY
The University of California is an Equal Opportunity/Affirmative Action
Employer. In accordance with Federal law, we are making available our
Campus Security Report to all prospective employees.

From devaniranjan at gmail.com Mon Sep 30 12:02:41 2013
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Mon, 30 Sep 2013 12:02:41 -0400
Subject: [Biopython] Generating high seq similar lists
Message-ID:

Hi,

Not directly related to Biopython, but I am trying to use Biopython to
generate a substitution matrix.

I was looking for a list of PDB entries with high sequence similarity
(above 75% but below 90%, or thereabouts) at 1.6 Å or better
resolution, to create my own substitution matrix.
I looked at http://dunbrack.fccc.edu/PISCES.php, but I cannot specify a
range for the sequence similarity. Is there any other way I can get a
similar list?

Thank you,
George
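One way to get the 75 to 90% window George describes is to compute the identity yourself and filter, rather than relying on a server-side cutoff. The sketch below is plain Python and assumes the chain pairs have already been aligned by some other tool (fetching and aligning PDB chains is out of scope). Both function names are hypothetical, and the scoring is a simplified BLOSUM-style log-odds calculation in half-bit units over pairwise alignments, not BLOSUM's exact clustering procedure and not a Biopython API.

```python
import math
from collections import Counter


def percent_identity(a, b):
    """Percent identity of two equal-length aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    same = sum(1 for x, y in columns if x == y)
    return 100.0 * same / len(columns)


def log_odds_matrix(pairs, low=75.0, high=90.0, scale=2.0):
    """Log-odds scores from aligned pairs whose identity is strictly between low and high."""
    pair_counts = Counter()
    for a, b in pairs:
        if not (low < percent_identity(a, b) < high):
            continue  # outside the identity window: skip this pair
        for x, y in zip(a, b):
            if x != "-" and y != "-":
                pair_counts[tuple(sorted((x, y)))] += 1  # unordered residue pair
    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("no aligned pairs fell inside the identity window")
    # Background frequency p_i: q_ii plus half of every q_ij with j != i.
    residue_counts = Counter()
    for (x, y), n in pair_counts.items():
        residue_counts[x] += n
        residue_counts[y] += n
    freq = {r: n / (2.0 * total) for r, n in residue_counts.items()}
    matrix = {}
    for (x, y), n in pair_counts.items():
        observed = n / float(total)
        expected = freq[x] * freq[y] * (1 if x == y else 2)
        score = int(round(scale * math.log(observed / expected, 2)))
        matrix[(x, y)] = matrix[(y, x)] = score
    return matrix


# Toy data only: one pre-aligned pair that is 80% identical.
matrix = log_odds_matrix([("AAAAAAAACC", "AAAAAAAAAA")])
```

With real data you would feed in many pairs drawn from high-resolution structures; residue pairs never observed inside the window simply get no entry, so a practical version would need pseudocounts.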
This works easily with a > generator passed to SeqIO.write if I'm writing in a sequential format like > multifasta, but what about, say, phylip? > > It is better/equivalent to convert the alignment to a list first (obviously > using a lot of memory in the process), or to write to a multifasta file, > then use SeqIO.convert? > > Thanks, > Tanya > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From Gerard.Schaafsma at med.lu.se Fri Sep 6 07:38:33 2013 From: Gerard.Schaafsma at med.lu.se (Gerard Schaafsma) Date: Fri, 06 Sep 2013 09:38:33 +0200 Subject: [Biopython] parsing Entrez SNP XML files Message-ID: <1378453112.9730.18.camel@gerard-desktop> Hi, I am trying to parse XML files which I downloaded from the NCBI site (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing records from the SNP (dbSNP) database. When I do: import sys from Bio import Entrez handle = open(xmlFile) records = Entrez.parse(handle) for record in records: for k, v in record.items(): print k, v I get the following error message: NotImplementedError: The Bio.Entrez parser cannot handle XML data that make use of XML namespaces I am using Biopython 1.62 on a PC with Linux 3.2.0-52-generic x86_64 GNU/Linux Looking for this error message showed that it might have something to do with the DTD files from NCBI, but since I am using the newest Biopython version, I would expect these to be OK. Moreover, in the first 2 lines of the XML file there is no mention of any DTD file, just: Anyone with the same problem, and a solution? 
Best regards, Gerard -- Gerard Schaafsma Lund University Department of Experimental Medical Science Protein Structure and Bioinformatics Group Hs 66, BMC D10 Box 117 22100 Lund Sweden From p.j.a.cock at googlemail.com Fri Sep 6 08:42:22 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 6 Sep 2013 09:42:22 +0100 Subject: [Biopython] parsing Entrez SNP XML files In-Reply-To: <1378453112.9730.18.camel@gerard-desktop> References: <1378453112.9730.18.camel@gerard-desktop> Message-ID: On Fri, Sep 6, 2013 at 8:38 AM, Gerard Schaafsma wrote: > Hi, > > I am trying to parse XML files which I downloaded from the NCBI site > (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing > records from the SNP (dbSNP) database. > > When I do: > > import sys > from Bio import Entrez > > handle = open(xmlFile) > records = Entrez.parse(handle) > > for record in records: > for k, v in record.items(): > print k, v > > I get the following error message: > > NotImplementedError: The Bio.Entrez parser cannot handle XML data that > make use of XML namespaces Yes, sadly unlike most of the NCBI XML files, for dbSNP they don't provide a DTD file describing the object model, and the Bio.Entrez parser requires that: http://bugzilla.open-bio.org/show_bug.cgi?id=2771 Unless the NCBI change this, you will have to use an alternative XML parser - Python comes with several including ElementTree which is quite popular. Peter From mjldehoon at yahoo.com Fri Sep 6 11:37:52 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 6 Sep 2013 04:37:52 -0700 (PDT) Subject: [Biopython] parsing Entrez SNP XML files In-Reply-To: References: <1378453112.9730.18.camel@gerard-desktop> Message-ID: <1378467472.95535.YahooMailNeo@web164003.mail.gq1.yahoo.com> It's a bit more complicated than that. Bio.Entrez can parse XML files that come with a DTD, which is the vast majority of XML files from NCBI Entrez. 
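Peter's ElementTree suggestion can be sketched as follows. This is a minimal example and is not part of Bio.Entrez. It streams through a dbSNP XML file with xml.etree.ElementTree.iterparse, so a large dump does not have to be held in memory at once. The namespace URI, the Rs element and its rsId attribute are assumptions about the dbSNP "docsum" layout. Check the xmlns attribute on the root element of your own files and adjust DBSNP_NS and the tag name to match.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dbSNP "docsum" XML dumps; check the
# xmlns attribute on the root element of your own files.
DBSNP_NS = "http://www.ncbi.nlm.nih.gov/SNP/docsum"


def iter_rs_ids(handle, namespace=DBSNP_NS):
    """Yield the rsId attribute of each <Rs> element, streaming the file.

    ElementTree reports namespaced tags as "{namespace}localname", so the
    tag we compare against must include the namespace in braces.
    """
    rs_tag = "{%s}Rs" % namespace
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == rs_tag:
            yield elem.get("rsId")
            # Drop the children of this finished record to save memory.
            elem.clear()
```

Usage would be along the lines of opening the file in binary mode and looping over iter_rs_ids(handle). For namespaced XML, the key point is the "{namespace}localname" tag form, which is why the comparison includes the braces.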
Apparently the dbSNP database uses an XML Schema instead of a DTD, so Bio.Entrez would need a parser for an XML Schema to be able to parse XML files from dbSNP. I won't be able to look into this, but any volunteers are strongly encouraged. Best, -Michiel. ________________________________ From: Peter Cock To: Gerard Schaafsma Cc: Biopython Mailing List Sent: Friday, September 6, 2013 5:42 PM Subject: Re: [Biopython] parsing Entrez SNP XML files On Fri, Sep 6, 2013 at 8:38 AM, Gerard Schaafsma wrote: > Hi, > > I am trying to parse XML files which I downloaded from the NCBI site > (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/) containing > records from the SNP (dbSNP) database. > > When I do: > > import sys > from Bio import Entrez > > handle = open(xmlFile) > records = Entrez.parse(handle) > > for record in records: > for k, v in record.items(): > print k, v > > I get the following error message: > > NotImplementedError: The Bio.Entrez parser cannot handle XML data that > make use of XML namespaces Yes, sadly unlike most of the NCBI XML files, for dbSNP they don't provide a DTD file describing the object model, and the Bio.Entrez parser requires that: http://bugzilla.open-bio.org/show_bug.cgi?id=2771 Unless the NCBI change this, you will have to use an alternative XML parser - Python comes with several including ElementTree which is quite popular. Peter _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From cacaucenturion2 at gmail.com Fri Sep 6 12:42:04 2013 From: cacaucenturion2 at gmail.com (cacaucenturion2) Date: Fri, 6 Sep 2013 20:42:04 +0800 Subject: [Biopython] Question about Nexus Format Converting Message-ID: <201309062042038283734@gmail.com> Hi all, Does anyone know if it is possible to convert interleaved nexus format file to non-interleaved nexus format file in Biopython? Thanks!
Sincerely yours, Cacau From winda002 at student.otago.ac.nz Sat Sep 7 05:53:41 2013 From: winda002 at student.otago.ac.nz (David Winter) Date: Sat, 7 Sep 2013 05:53:41 +0000 Subject: [Biopython] Question about Nexus Format Converting In-Reply-To: <201309062042038283734@gmail.com> References: <201309062042038283734@gmail.com> Message-ID: <9dc9a3c3bc964a66bcae6f0e6ca0b361@SINPR03MB155.apcprd03.prod.outlook.com> Hi cacaucenturion, The Nexus module will let you do this. You'd create an empty nexus object, read your existing file into it, then write out the nexus object specifying that you want a non-interleaved format (which is actually the default):
from Bio.Nexus import Nexus
new_ali = Nexus.Nexus()
new_ali.read("../biopython/Tests/Nexus/test_Nexus_input.nex")  # interleaved nexus file in test suite
new_ali.write_nexus_data("test_sequential.nex", interleave=False)
Hope that helps, David _____________________ From: biopython-bounces at lists.open-bio.org on behalf of cacaucenturion2 Sent: Saturday, 7 September 2013 12:42 a.m. To: biopython Subject: [Biopython] Question about Nexus Format Converting Hi all, Does anyone know if it is possible to convert interleaved nexus format file to non-interleaved nexus format file in Biopython? Thanks! Sincerely yours, Cacau From p.j.a.cock at googlemail.com Sat Sep 7 11:41:53 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 7 Sep 2013 12:41:53 +0100 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) Message-ID: Dear Biopythoneers, As you will be aware, with our recent release of Biopython 1.62 we now officially support Python 3 for the first time (specifically Python 3.3 - we don't recommend 3.0, 3.1 or 3.2 here), while continuing to support Python 2 as well. Currently all our documentation is written assuming Python 2, but with some small changes most things can be written to work under both variants. The most visible change is how to print things, and that happens a lot in our examples.
I would like us to switch to using the Python 3 style print function in our documentation (including the Tutorial and the docstrings embedded in the code as help text). In the simplest case, this Python 2 only code:
>>> print variable
would be changed to this using Python 3 style:
>>> print(variable)
Luckily that will also work on Python 2 as well. For more complicated examples to use the print function under Python 2 you must add this import line to the start of your file:
>>> from __future__ import print_function
For example, this Python 2 only example:
>>> print "Two plus two is", 4
becomes:
>>> from __future__ import print_function # at start
>>> ...
>>> print("Two plus two is", 4)
or more elegantly,
>>> print("Two plus two is %i" % 4)
Would anyone object to us using the print function style in the Biopython documentation? I'm particularly keen to hear from beginners - as this is potentially confusing. Thanks, Peter. From p.j.a.cock at googlemail.com Sat Sep 7 13:52:54 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 7 Sep 2013 14:52:54 +0100 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: On Sat, Sep 7, 2013 at 12:41 PM, Peter Cock wrote: > Dear Biopythoneers, > > As you will be aware, with our recent release of Biopython 1.62 > we now officially support Python 3 for the first time (specifically > Python 3.3 - we don't recommend 3.0, 3.1 or 3.2 here), while > continuing to support Python 2 as well. > > Currently all our documentation is written assuming Python 2, > but with some small changes most things can be written to > work under both variants. The most visible change is how to > print things, and that happens a lot in our examples. > > I would like us to switch to using the Python 3 style print > function in our documentation (including the Tutorial and > the docstrings embedded in the code as help text).
> > In the simplest case, this Python 2 only code: > >>>> print variable > > would be changed to this using Python 3 style: > >>>> print(variable) > > Luckily that will also work on Python 2 as well. For more > complicated examples to use the print function under > Python 2 you must add this import line to the start of > your file: > >>>> from __future__ import print_function > > For example, this Python 2 only example: > >>>> print "Two plus two is", 4 > > becomes: > >>>> from __future__ import print_function # at start >>>> ... >>>> print("Two plus two is", 4) > > or more elegantly, > >>>> print("Two plus two is %i" % 4) > > Would anyone object to us using the print function style > in the Biopython documentation? > > I'm particularly keen to hear from beginners - as this > is potentially confusing. > > Thanks, > > Peter. I tweeted this email, Biopython Project (@Biopython): Would anyone object to us using #Python3 print function style in the #Biopython documentation? http://lists.open-bio.org/pipermail/biopython/2013-September/008751.html https://twitter.com/Biopython/status/376309705972654080 Two replies already: Raphael Mattos (@rsmattos): @Biopython I think it's time.... https://twitter.com/rsmattos/status/376321218456338432 Alec Munro (@alecmunro): @Biopython do it! https://twitter.com/alecmunro/status/376341224544038912 Peter From dtomso at agbiome.com Sat Sep 7 13:50:18 2013 From: dtomso at agbiome.com (Dan Tomso) Date: Sat, 7 Sep 2013 13:50:18 +0000 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: Hi, Peter. This sounds OK to me. Regarding the overall transition to Python 3--is there some sort of master plan with phasing of all the changes? It might be helpful to have some longer-term context on any roll out. 
Best regards, Dan Tomso -------- Original message -------- From: Peter Cock Date: 09/07/2013 7:42 AM (GMT-05:00) To: Biopython Mailing List Subject: [Biopython] Print statements vs functions (Python 2 vs 3) Dear Biopythoneers, As you will be aware, with our recent release of Biopython 1.62 we now officially support Python 3 for the first time (specifically Python 3.3 - we don't recommend 3.0, 3.1 or 3.2 here), while continuing to support Python 2 as well. Currently all our documentation is written assuming Python 2, but with some small changes most things can be written to work under both variants. The most visible change is how to print things, and that happens a lot in our examples. I would like us to switch to using the Python 3 style print function in our documentation (including the Tutorial and the docstrings embedded in the code as help text). In the simplest case, this Python 2 only code: >>> print variable would be changed to this using Python 3 style: >>> print(variable) Luckily that will also work on Python 2 as well. For more complicated examples to use the print function under Python 2 you must add this import line to the start of your file: >>> from __future__ import print_function For example, this Python 2 only example: >>> print "Two plus two is", 4 becomes: >>> from __future__ import print_function # at start >>> ... >>> print("Two plus two is", 4) or more elegantly, >>> print("Two plus two is %i" % 4) Would anyone object to us using the print function style in the Biopython documentation? I'm particularly keen to hear from beginners - as this is potentially confusing. Thanks, Peter. 
_______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From p.j.a.cock at googlemail.com Sat Sep 7 16:25:44 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 7 Sep 2013 17:25:44 +0100 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: On Sat, Sep 7, 2013 at 2:50 PM, Dan Tomso wrote: > Hi, Peter. > > This sounds OK to me. Thanks Dan. And another voice of approval on Twitter: Karin Lagesen (@karinlag): @Biopython @pjacock Go for it! https://twitter.com/karinlag/status/376356704080105472 > Regarding the overall transition to Python 3 -- is there some sort > of master plan with phasing of all the changes? > It might be helpful to have some longer-term context on any roll out. I'm not quite sure what you are asking, but this might answer you: http://lists.open-bio.org/pipermail/biopython-dev/2013-May/010633.html http://lists.open-bio.org/pipermail/biopython-dev/2013-September/010880.html Using print functions in the Tutorial seems doable for the next release, Biopython 1.63, if that's what you meant. Regards, Peter From p.j.a.cock at googlemail.com Sat Sep 7 18:12:37 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 7 Sep 2013 19:12:37 +0100 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: On Sat, Sep 7, 2013 at 2:52 PM, Peter Cock wrote: > On Sat, Sep 7, 2013 at 12:41 PM, Peter Cock wrote: >> Dear Biopythoneers, >> >> As you will be aware, with our recent release of Biopython 1.62 >> we now officially support Python 3 for the first time (specifically >> Python 3.3 - we don't recommend 3.0, 3.1 or 3.2 here), while >> continuing to support Python 2 as well. >> >> Currently all our documentation is written assuming Python 2, >> but with some small changes most things can be written to >> work under both variants. 
The most visible change is how to >> print things, and that happens a lot in our examples. >> >> I would like us to switch to using the Python 3 style print >> function in our documentation (including the Tutorial and >> the docstrings embedded in the code as help text). >> >> ... >> >> Would anyone object to us using the print function style >> in the Biopython documentation? >> >> I'm particularly keen to hear from beginners - as this >> is potentially confusing. >> >> Thanks, >> >> Peter. > > I tweeted this email, > > Biopython Project (@Biopython): Would anyone object to us using > #Python3 print function style in the #Biopython documentation? > http://lists.open-bio.org/pipermail/biopython/2013-September/008751.html > https://twitter.com/Biopython/status/376309705972654080 > > Two replies already: > > Raphael Mattos (@rsmattos): @Biopython I think it's time.... > https://twitter.com/rsmattos/status/376321218456338432 > > Alec Munro (@alecmunro): @Biopython do it! > https://twitter.com/alecmunro/status/376341224544038912 > > Peter On Sat, Sep 7, 2013 at 5:25 PM, Peter Cock wrote: > On Sat, Sep 7, 2013 at 2:50 PM, Dan Tomso wrote: >> Hi, Peter. >> >> This sounds OK to me. > > Thanks Dan. And another voice of approval on Twitter: > > Karin Lagesen (@karinlag): @Biopython @pjacock Go for it! 
> https://twitter.com/karinlag/status/376356704080105472 And another positive voice: Dave Lunt (@davelunt): @Biopython the docs change sounds good, that very clear explanation you link to should also be somewhere obvious https://twitter.com/davelunt/status/376405338511384576 Since there has only been positive reaction, I've made a start at converting the examples in the Tutorial to use the Python 3 style print function (maintaining full Python 2 compatibility under Python 2.6 and 2.7 via the future import): https://github.com/biopython/biopython/commit/34d155a02cbcf7c953fb8238a5412f8c7c0e1cc5 https://github.com/biopython/biopython/commit/74a8b8349b58ae9aa7a727d6e1ab774a4c9008a3 For those curious to see how it looks (but not already familiar with LaTeX, pdflatex and hevea), you can see a sneak preview here: http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf (Hopefully those links will once again auto-update every night, something that was working nicely prior to the server move) If you spot any typos, please let us know. Thanks! 
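As a small illustration of the print function style adopted for the Tutorial, the sketch below writes the same record summary three ways. The function name report and its arguments are made up for this example. It runs as-is on Python 3. On Python 2.6 or 2.7 it needs from __future__ import print_function as the first statement of the module. The file= keyword takes the place of the Python 2 "print >>handle, ..." syntax, and sep= changes the separator between arguments.

```python
import sys


def report(name, length, handle=None):
    """Write a short summary of one record using the print function.

    On Python 2.6/2.7 this needs ``from __future__ import print_function``
    as the first statement of the module; on Python 3 it works as-is.
    """
    if handle is None:
        handle = sys.stdout
    # Several arguments are joined by a single space, as with the old
    # Python 2 statement: print "Record", name, "has length", length
    print("Record", name, "has length", length, file=handle)
    # One pre-formatted string, using % string formatting.
    print("Record %s has length %i" % (name, length), file=handle)
    # sep= changes the separator, e.g. for tab-separated output.
    print(name, length, sep="\t", file=handle)
```

Passing an open file as handle sends the same lines to that file instead of the screen.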
Peter From carlos.borroto at gmail.com Mon Sep 9 14:35:54 2013 From: carlos.borroto at gmail.com (Carlos Borroto) Date: Mon, 9 Sep 2013 10:35:54 -0400 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: On Sat, Sep 7, 2013 at 7:41 AM, Peter Cock wrote: > or more elegantly, > >>>> print("Two plus two is %i" % 4) Shouldn't this be done even more elegantly with something like this: >>> print( "Two plus two is {0:d}".format( 4 ) ) --Carlos From p.j.a.cock at googlemail.com Mon Sep 9 14:49:16 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Sep 2013 15:49:16 +0100 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: On Mon, Sep 9, 2013 at 3:35 PM, Carlos Borroto wrote: > On Sat, Sep 7, 2013 at 7:41 AM, Peter Cock wrote: >> or more elegantly, >> >>>>> print("Two plus two is %i" % 4) > > Shouldn't this be done even more elegantly with something like this: > >>>> print( "Two plus two is {0:d}".format( 4 ) ) > > --Carlos Personally I find the % version shorter and clearer, but this may reflect my past exposure to C. Where you have more than a couple of place holders, naming them seems a bigger win though. Peter From carlos.borroto at gmail.com Mon Sep 9 14:58:30 2013 From: carlos.borroto at gmail.com (Carlos Borroto) Date: Mon, 9 Sep 2013 10:58:30 -0400 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: On Mon, Sep 9, 2013 at 10:49 AM, Peter Cock wrote: > On Mon, Sep 9, 2013 at 3:35 PM, Carlos Borroto wrote: >> On Sat, Sep 7, 2013 at 7:41 AM, Peter Cock wrote: >>> or more elegantly, >>> >>>>>> print("Two plus two is %i" % 4) >> >> Shouldn't this be done even more elegantly with something like this: >> >>>>> print( "Two plus two is {0:d}".format( 4 ) ) >> >> --Carlos > > Personally I find the % version shorter and clearer, but this > may reflect my past exposure to C.
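To compare the two string formatting styles discussed here, the sketch below builds one summary line four ways: positional and named fields with the % operator, and the same with str.format. The describe function and its arguments are hypothetical. Both styles support named fields, which is the case Peter mentions where naming pays off. str.format (PEP 3101) is available from Python 2.6 when explicit field numbers are used, as here.

```python
def describe(record_id, length, gc):
    """Return the same summary built with four formatting styles."""
    # printf-style % formatting, positional; %% is a literal percent sign
    a = "%s: %i bp, GC %.1f%%" % (record_id, length, gc)
    # % formatting with named fields taken from a dictionary
    b = "%(name)s: %(n)i bp, GC %(gc).1f%%" % {"name": record_id, "n": length, "gc": gc}
    # str.format with explicit field numbers, as in Carlos's example
    c = "{0}: {1:d} bp, GC {2:.1f}%".format(record_id, length, gc)
    # str.format with named fields
    d = "{name}: {n:d} bp, GC {gc:.1f}%".format(name=record_id, n=length, gc=gc)
    return [a, b, c, d]
```

All four strings come out identical. The choice between the styles is then mostly one of readability, especially once a template has several placeholders.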
> > Where you have more than a couple of place holders, > naming them seems a bigger win though. > > Peter I was trying to remember the reason I switched to .format(). I believe it was for better compatibility with future versions of Python. See this[1] stackoverflow answer. Is that correct? Is % being phased out? I know old habits are hard to kill, but maybe the tutorial should introduce newcomers to future-proof way of doing things. That way they won't have to kill one more "old" habit. [1]http://stackoverflow.com/questions/517355/string-formatting-in-python --Carlos From p.j.a.cock at googlemail.com Mon Sep 9 15:02:59 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Sep 2013 16:02:59 +0100 Subject: [Biopython] Print statements vs functions (Python 2 vs 3) In-Reply-To: References: Message-ID: On Mon, Sep 9, 2013 at 3:58 PM, Carlos Borroto wrote: > On Mon, Sep 9, 2013 at 10:49 AM, Peter Cock wrote: >> On Mon, Sep 9, 2013 at 3:35 PM, Carlos Borroto wrote: >>> On Sat, Sep 7, 2013 at 7:41 AM, Peter Cock wrote: >>>> or more elegantly, >>>> >>>>>>> print("Two plus two is %i" % 4) >>> >>> Shouldn't this be done even more elegantly with something like this: >>> >>>>>> print( "Two plus two is {0:d}".format( 4 ) ) >>> >>> --Carlos >> >> Personally I find the % version shorter and clearer, but this >> may reflect my past exposure to C. >> >> Where you have more than a couple of place holders, >> naming them seems a bigger win though. >> >> Peter > > I was trying to remember the reason I switched to .format(). I believe > it was for better compatibility with future versions of Python. See > this[1] stackoverflow answer. Is that correct? Is % being phased out? > I know old habits are hard to kill, but maybe the tutorial should > introduce newcomers to future-proof way of doing things. That way they > won't have to kill one more "old" habit.
> > [1]http://stackoverflow.com/questions/517355/string-formatting-in-python > > --Carlos I'm not aware of any plans to withdraw the old string % operator in Python 3, although the implication from PEP3101 is that might eventually happen. http://www.python.org/dev/peps/pep-3101/ Peter From jesshedge at yahoo.co.uk Wed Sep 25 12:14:21 2013 From: jesshedge at yahoo.co.uk (Jessica Hedge) Date: Wed, 25 Sep 2013 13:14:21 +0100 (BST) Subject: [Biopython] Bio.Phyo.Consensus Message-ID: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com> Hello, I'm trying to use the Bio.Phylo.Consensus module in order to simulate bootstrap replicates of a multiple sequence alignment. I have both Biopython (version 1.61) and Phylo module installed but the Consensus module doesn't seem to exist. Is the Consensus module available in another version of Biopython? Many thanks, Jess From p.j.a.cock at googlemail.com Wed Sep 25 13:25:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 25 Sep 2013 14:25:32 +0100 Subject: [Biopython] Bio.Phyo.Consensus In-Reply-To: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com> References: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com> Message-ID: On Wed, Sep 25, 2013 at 1:14 PM, Jessica Hedge wrote: > Hello, > > I'm trying to use the Bio.Phylo.Consensus module in order to > simulate bootstrap replicates of a multiple sequence alignment. > I have both Biopython (version 1.61) and Phylo module > installed but the Consensus module doesn't seem to exist. > > Is the Consensus module available in another version of Biopython? > > Many thanks, > > Jess This is probably a question for Eric (CC'd explicitly). 
There isn't currently a module with that exact name under Bio.Phylo, but the wiki does mention it - so this is probably just an out of date example reflecting non-final code: http://biopython.org/wiki/Phylo Peter From eric.talevich at gmail.com Thu Sep 26 05:29:29 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 25 Sep 2013 22:29:29 -0700 Subject: [Biopython] Bio.Phyo.Consensus In-Reply-To: References: <1380111261.26815.YahooMailNeo@web171905.mail.ir2.yahoo.com> Message-ID: On Wed, Sep 25, 2013 at 6:25 AM, Peter Cock wrote: > On Wed, Sep 25, 2013 at 1:14 PM, Jessica Hedge > wrote: > > Hello, > > > > I'm trying to use the Bio.Phylo.Consensus module in order to > > simulate bootstrap replicates of a multiple sequence alignment. > > I have both Biopython (version 1.61) and Phylo module > > installed but the Consensus module doesn't seem to exist. > > > > Is the Consensus module available in another version of Biopython? > > > > Many thanks, > > > > Jess > > This is probably a question for Eric (CC'd explicitly). > > There isn't currently a module with that exact name > under Bio.Phylo, but the wiki does mention it - so this > is probably just an out of date example reflecting > non-final code: http://biopython.org/wiki/Phylo > > Peter > Ah, sorry for the confusion -- the Bio.Phylo.Consensus function and most of the other things on that page starting at the "Tree Construction" section refer to Yanbo Ye's development branch for Google Summer of Code, which just completed. You can get a copy of this code here: https://github.com/lijax/biopython The work is not yet merged into the main Biopython source tree, and some parts of it could potentially change in the near future. I'll add a note to this effect on the wiki. 
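Until that code is merged, bootstrap replicates of an alignment can be produced by hand. Each replicate keeps the same rows but is built from alignment columns sampled uniformly with replacement, giving a new alignment of the original length. The sketch below works on a plain list of equal-length aligned strings rather than a Biopython alignment object. The function name bootstrap_alignment is hypothetical and is not the Bio.Phylo.Consensus API from the development branch.

```python
import random


def bootstrap_alignment(seqs, times, rng=None):
    """Yield `times` bootstrap replicates of an alignment.

    `seqs` is a list of aligned sequences as equal-length strings. Each
    replicate is a list of strings of the same length, whose columns are
    drawn from the original alignment uniformly at random with replacement.
    """
    if rng is None:
        rng = random.Random()
    if not seqs:
        raise ValueError("Need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("All aligned sequences must have the same length")
    for _ in range(times):
        # Pick column indices with replacement; duplicates are expected.
        cols = [rng.randrange(length) for _ in range(length)]
        # Rebuild every row from the same sampled columns.
        yield ["".join(s[i] for i in cols) for s in seqs]
```

Passing random.Random(seed) as rng makes the replicates reproducible. Each replicate can then go through the usual tree-building step, and the resulting trees can be summarised into a consensus.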
-Eric From thomas.girke at ucr.edu Sun Sep 29 23:55:19 2013 From: thomas.girke at ucr.edu (Thomas Girke) Date: Sun, 29 Sep 2013 16:55:19 -0700 Subject: [Biopython] Bioinformatics Facility Director Position at UC Riverside Message-ID: <20130929235519.GA1542@Thomas-Girkes-MacBook-Pro.local> POSITION ANNOUNCEMENT The Institute for Integrative Genome Biology (IIGB) at the University of California, Riverside is seeking a Ph.D. level bioinformatician to coordinate its bioinformatics research activities, data analysis workshop program and computing facility. TITLE/RANK Bioinformatics Facility Director. Salary will be competitive and commensurate with accomplishments. LOCATION IIGB at the University of California, Riverside. BACKGROUND Successful candidates will join an innovative and multidisciplinary Institute for Integrative Genome Biology (IIGB) that connects theoretical and experimental researchers from different departments in Life, Physical and Mathematical Sciences, Medicine, Engineering and various campus based Centers. The IIGB is organized around a 10,000 sq.ft. suite of Instrumentation Facilities that serve as a centralized, shared-use resource for faculty, staff and students, offering advanced tools in bioinformatics, microscopy, proteomics and genomics. Its bioinformatic component is equipped with a modern high-performance compute (HPC) infrastructure. QUALIFICATIONS Applicants must have a Ph.D. from a recognized university in bioinformatics; or combined degrees in computer science and a biological science; or a degree in either computer science combined with relevant experience in biological science; or a degree in a biological science combined with relevant experience in computer science. The successful candidate will have at least two years professional hands-on experience with next generation sequence data analysis, and basic knowledge of high-performance computing and cloud computing technologies. 
Another requirement is several years of professional experience with common programming languages/environments used in scientific data analysis, such as Python, R or C++. Experience with web development frameworks and relational database design will also be a plus. RESPONSIBILITIES The Bioinformatics Facility Director leads IIGB's bioinformatics facility staff and manages its computational infrastructure. The incumbent will engage in collaborative research activities, contribute to scientific publications, and participate in the preparation of joint grant applications. The teaching expectations include the development of a state-of-the-art workshop program on scientific data analysis. TO APPLY Review of applications will begin October 11, 2013 and continue until the position is filled. Interested individuals should: (1) submit a curriculum vitae, (2) provide a statement of research interests, and (3) arrange to have three letters of reference sent on their behalf. All information should be addressed to: Thomas Girke Institute for Integrative Genome Biology University of California Riverside, CA 92521 thomas.girke at ucr.edu WEBSITE http://facility.bioinformatics.ucr.edu/position-opening POLICY The University of California is an Equal Opportunity/Affirmative Action Employer. In accordance with Federal law, we are making available our Campus Security Report to all prospective employees. From devaniranjan at gmail.com Mon Sep 30 16:02:41 2013 From: devaniranjan at gmail.com (George Devaniranjan) Date: Mon, 30 Sep 2013 12:02:41 -0400 Subject: [Biopython] Generating high seq similar lists Message-ID: Hi, Not directly related to biopython but I am trying to use biopython to generate a substitution matrix. I was looking for a list of PDB, with high sequence similarity (>75%) but less than 90%-so something like that at 1.6A or better resolution to create my own substitution matrix.
I looked at http://dunbrack.fccc.edu/PISCES.php but I cannot specify a range for the sequence similarity. Is there any other way I can get a similar list? Thank you, George
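For the local filtering step George describes, assume you already have pairwise alignments of the candidate chains from some alignment tool. Keeping only the pairs that fall inside a given identity window is then straightforward. The sketch below measures percent identity over columns where neither sequence has a gap. That is one of several conventions and is an assumption here. "Similarity" in the sense of a substitution score would need a scoring matrix instead. Both function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; this is one
    of several common conventions and is an assumption here.
    """
    if len(a) != len(b):
        raise ValueError("Sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x == y)
    return 100.0 * same / len(compared)


def pairs_in_identity_range(pairs, low=75.0, high=90.0):
    """Keep aligned (a, b) pairs whose identity lies in [low, high)."""
    return [(a, b) for a, b in pairs if low <= percent_identity(a, b) < high]
```

The window is half-open, so a pair at exactly 90% identity is excluded and one at exactly 75% is kept. Adjust the comparisons if you want different boundary behaviour.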