From p.j.a.cock at googlemail.com Fri Feb 1 08:34:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 13:34:46 +0000
Subject: [Biopython] Fwd: Bug in bgzf module
In-Reply-To:
References:
Message-ID:

On Thu, Jan 31, 2013 at 10:57 PM, Petra Kubincová wrote:
> Hi Peter,
>
> well, I don't have much experience with unit tests but I will try to come up
> with something. :)
> I'll let you know if I don't succeed.

That would be great - in the short term I've added something quite simple:
https://github.com/biopython/biopython/commit/5b0d0bd55024d6dbbdea85ff73e6bd2fbbfd5ee1

> And yes, recording an index is exactly the thing I need to do. (I am
> currently working on an interval mapping tool for multiple whole-genome
> alignments, where I need to read a .maf file, write preprocessed data into a
> compressed file and then work just with the index for the compressed file and
> the compressed file itself to do the mapping.)

That reminds me I need to look at Andrew's MAF work:
http://biopython.org/wiki/Multiple_Alignment_Format
https://github.com/biopython/biopython/pull/5

Regards,

Peter

From p.j.a.cock at googlemail.com Mon Feb 4 13:04:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 18:04:40 +0000
Subject: [Biopython] Proof reading the tutorial for the next release?
Message-ID:

Hello all,

If you're also on the Biopython-Dev Mailing List you'll know we're
hoping to release Biopython 1.61 this week.

If anyone here wants to help out, proof-reading the draft tutorial
would be great :)

I've posted the current tutorial as HTML and PDF online,
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

Currently those are being updated manually (it used to be done
automatically every night - something which needs to be re-configured
following a server move).

If you see an error, and want to know if it has already been fixed,
then the source file is Tutorial.tex (it is written using LaTeX), and
you can see the recent changes here on GitHub:
https://github.com/biopython/biopython/commits/master/Doc/Tutorial.tex

Thanks,

Peter

From p.j.a.cock at googlemail.com Tue Feb 5 17:05:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:05:25 +0000
Subject: [Biopython] Biopython 1.61 released
Message-ID:

Dear Biopythoneers,

Source distributions and Windows installers for Biopython 1.61 are now
available from the downloads page on the Biopython website and from
the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment:

We currently support Python 2.5, 2.6 and 2.7 and also test under
Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
extensions). We are still encouraging early adopters to help test on
these platforms, and have included a "beta" installer for Python 3.2
(and Python 3.3 to follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will
continue support for at least one further release (Biopython 1.62).
This could be extended given feedback from our users. Focusing on
Python 2.6 and 2.7 only will make writing Python 3 compatible code
easier.

New Features:

GenomeDiagram has three new sigils (shapes to illustrate features).
OCTO shows an octagonal shape, like the existing BOX sigil but with
the corners cut off. JAGGY shows a box with jagged edges at the start
and end, intended for things like NNNNN regions in draft genomes.
Finally BIGARROW is like the existing ARROW sigil but is drawn
straddling the axis. This is useful for drawing vertically compact
figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral
path through HSV color space.
This can be used to make arbitrary "rainbow" scales, for example to color
features or cross-links on a GenomeDiagram figure.

The Bio.SeqIO module now supports reading sequences from PDB files in
two different ways. The "pdb-atom" format determines the sequence as
it appears in the structure based on the atom coordinate section of
the file (via Bio.PDB, so NumPy is currently required for this).
Alternatively, you can use the "pdb-seqres" format to read the complete
protein sequence as it is listed in the PDB header, if available.

The Bio.SeqUtils module now has a seq1 function to turn a sequence
using three letter amino acid codes into one using the more common one
letter codes. This acts as the inverse of the existing seq3 function.

The multiple-sequence-alignment object used by Bio.AlignIO etc now
supports an annotation dictionary. Additional support for per-column
annotation is planned, with addition and splicing to work like that
for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a
clean deprecation of the old code, the new motif code is stored in a
new module Bio.motifs, and a PendingDeprecationWarning was added to
Bio.Motif.

Experimental Code - SearchIO:

This release also includes Bow's Google Summer of Code work writing a
unified parsing framework for NCBI BLAST (assorted formats including
tabular and XML), HMMER, BLAT, and other sequence searching tools.
This is currently available with the new BiopythonExperimentalWarning
to indicate that this is still somewhat experimental. We're bundling
it with the main release to get more public feedback, but with the big
warning that the API is likely to change. In fact, even the current
name of Bio.SearchIO may change since unless you are familiar with
BioPerl its purpose isn't immediately clear.

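To make the ColorSpiral item above a little more concrete, here is a
minimal standalone sketch of the general idea of walking a spiral
through HSV color space. It uses only the standard library colorsys
module and is not the Bio.Graphics.ColorSpiral API; the function name
spiral_colors and all of its parameters and defaults are made up for
this illustration.

```python
import colorsys


def spiral_colors(n, start_hue=0.0, turns=1.0,
                  sat_start=0.4, sat_end=1.0,
                  val_start=0.85, val_end=0.5):
    """Return n (r, g, b) tuples sampled along a spiral in HSV space.

    Hue is the angle around the HSV cylinder, saturation the distance
    from its axis and value the height, so stepping all three together
    traces a spiral. Names and defaults are illustrative assumptions.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    colors = []
    for i in range(n):
        frac = i / float(n)                      # 0 <= frac < 1
        hue = (start_hue + turns * frac) % 1.0   # angle around the axis
        sat = sat_start + (sat_end - sat_start) * frac
        val = val_start + (val_end - val_start) * frac
        colors.append(colorsys.hsv_to_rgb(hue, sat, val))
    return colors


# Print six colors as hex strings, e.g. for coloring six features.
for rgb in spiral_colors(6):
    print("#%02x%02x%02x" % tuple(int(round(c * 255)) for c in rgb))
```

Each tuple holds floats between 0 and 1, which is the usual form for
RGB colors in plotting code; the real module may well choose its path
through color space differently.
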
Contributors:

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto

Thank you all.

Release announcement here (RSS feed available):
http://news.open-bio.org/news/2013/02/biopython-1-61-released/

P.S. You can follow @Biopython on Twitter
https://twitter.com/Biopython

From w.arindrarto at gmail.com Tue Feb 5 19:03:52 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 6 Feb 2013 01:03:52 +0100
Subject: [Biopython] Biopython 1.61 released
In-Reply-To:
References:
Message-ID:

Hi Peter,

> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.
>
> Please note we are phasing out support for Python 2.5. We will
> continue support for at least one further release (Biopython 1.62).
> This could be extended given feedback from our users. Focusing on
> Python 2.6 and 2.7 only will make writing Python 3 compatible code
> easier.
>
> New Features:
>
> GenomeDiagram has three new sigils (shapes to illustrate features).
> OCTO shows an octagonal shape, like the existing BOX sigil but with
> the corners cut off. JAGGY shows a box with jagged edges at the start
> and end, intended for things like NNNNN regions in draft genomes.
> Finally BIGARROW is like the existing ARROW sigil but is drawn
> straddling the axis. This is useful for drawing vertically compact
> figures where you do not have overlapping genes.
>
> New module Bio.Graphics.ColorSpiral can generate colors along a spiral
> path through HSV color space. This can be used to make arbitrary
> "rainbow" scales, for example to color features or cross-links on a
> GenomeDiagram figure.
>
> The Bio.SeqIO module now supports reading sequences from PDB files in
> two different ways. The "pdb-atom" format determines the sequence as
> it appears in the structure based on the atom coordinate section of
> the file (via Bio.PDB,
> so NumPy is currently required for this). Alternatively, you can use
> the "pdb-seqres" format to read the complete protein sequence as it is
> listed in the PDB header, if available.
>
> The Bio.SeqUtils module now has a seq1 function to turn a sequence
> using three letter amino acid codes into one using the more common one
> letter codes. This acts as the inverse of the existing seq3 function.
>
> The multiple-sequence-alignment object used by Bio.AlignIO etc now
> supports an annotation dictionary. Additional support for per-column
> annotation is planned, with addition and splicing to work like that
> for the SeqRecord per-letter annotation.
>
> The Bio.Motif module has been updated and reorganized. To allow for a
> clean deprecation of the old code, the new motif code is stored in a
> new module Bio.motifs, and a PendingDeprecationWarning was added to
> Bio.Motif.
>
> Experimental Code - SearchIO:
>
> This release also includes Bow's Google Summer of Code work writing a
> unified parsing framework for NCBI BLAST (assorted formats including
> tabular and XML), HMMER, BLAT, and other sequence searching tools.
> This is currently available with the new BiopythonExperimentalWarning
> to indicate that this is still somewhat experimental. We're bundling
> it with the main release to get more public feedback, but with the big
> warning that the API is likely to change. In fact, even the current
> name of Bio.SearchIO may change since unless you are familiar with
> BioPerl its purpose isn't immediately clear.
>
> Contributors:
>
> Brandon Invergo
> Bryan Lunt (first contribution)
> Christian Brueffer (first contribution)
> David Cain
> Eric Talevich
> Grace Yeo (first contribution)
> Jeffrey Chang
> Jingping Li (first contribution)
> Kai Blin (first contribution)
> Leighton Pritchard
> Lenna Peterson
> Lucas Sinclair (first contribution)
> Michiel de Hoon
> Nick Semenkovich (first contribution)
> Peter Cock
> Robert Ernst (first contribution)
> Tiago Antao
> Wibowo "Bow" Arindrarto
>
> Thank you all.
>
> Release announcement here (RSS feed available):
> http://news.open-bio.org/news/2013/02/biopython-1-61-released/
>
> P.S. You can follow @Biopython on Twitter
> https://twitter.com/Biopython

Thanks for doing the release! It feels exciting to see SearchIO code
finally live in the distributions :). Hopefully this will result in
more feedback (and then more improvements ~ likewise for the whole
Biopython as well).

Also, thank you as well to everyone who has criticized / commented /
contributed code to the module :).

cheers,
Bow

From p.j.a.cock at googlemail.com Thu Feb 7 06:33:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:33:25 +0000
Subject: [Biopython] Biopython 1.61 released
In-Reply-To:
References:
Message-ID:

On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock wrote:
> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.

For those of you wanting to try Biopython on Python 3.3 on Windows,
there is now an installer for Biopython 1.61 built against NumPy
1.7.0rc2. NumPy 1.7 is their first release to support Python 3.3, and
the official release is expected to be near-identical to this second
release candidate, see:

http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html

Regards,

Peter

From vincent at vincentdavis.net Sat Feb 9 22:47:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 9 Feb 2013 20:47:20 -0700
Subject: [Biopython] Taxonomic Classification tree
Message-ID:

Any suggestion of how to build a Taxonomic Classification tree. That is,
like a Phylo tree but based on taxa.

Vincent Davis

From nuin at genedrift.org Sat Feb 9 23:03:27 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Sat, 9 Feb 2013 23:03:27 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To:
References:
Message-ID: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>

All phylogenetic trees are based on taxa. You might need to be more
specific.

Paulo

On 2013-02-09, at 10:47 PM, Vincent Davis wrote:

> Any suggestion of how to build a Taxonomic Classification tree. That is,
> like a Phylo tree but based on taxa.
>
> Vincent Davis
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From vincent at vincentdavis.net Sat Feb 9 23:53:13 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 9 Feb 2013 21:53:13 -0700
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
Message-ID:

On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote:

> All phylogenetic trees are based on taxa. You might need to be more
> specific.

Maybe, but Taxonomic Classification is not based on phylogenetics.
What I have is a list of organisms and their Taxonomic Classification.
I want to build a tree based on only the Taxonomic Classification.

Vincent Davis
720-301-3003

From cjfields at illinois.edu Sun Feb 10 00:01:59 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 10 Feb 2013 05:01:59 +0000
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To:
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>

On Feb 9, 2013, at 10:53 PM, Vincent Davis wrote:

> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote:
>
>> All phylogenetic trees are based on taxa. You might need to be more
>> specific.
>
>
> Maybe, but Taxonomic Classification is not based on phylogenetics.
> What I have is a list of organisms and their Taxonomic Classification. I
> want to build a tree based on only the Taxonomic Classification.
>
> Vincent Davis
> 720-301-3003

There's code floating around on the bioperl side for doing this sort of
thing, not sure if biopython has anything along these lines (I would be
surprised if someone hasn't done this yet, though).

chris

From vincent at vincentdavis.net Sun Feb 10 15:16:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:16:20 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
Message-ID:

I am having trouble with NCBIWWW.qblast. I can't get the example to work.
Maybe I need help with reading :-)

From the documentation

from Bio.Blast import NCBIWWW, NCBIXML

result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
save_file = open("temp.xml", "w")
save_file.write(result_handle.read())
save_file.close()
result_handle.close()
result_handle = open("temp.xml")
blast_record = NCBIXML.parse(result_handle)

The temp.xml looks correct but I can get nothing from blast_record. I have
tried passing the handle directly to NCBIXML.parse and still no luck.

How would I, for example, get the first hit "gi|224094601"?

Vincent Davis

From p.j.a.cock at googlemail.com Sun Feb 10 15:35:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 10 Feb 2013 20:35:20 +0000
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To:
References:
Message-ID:

On Sun, Feb 10, 2013 at 8:16 PM, Vincent Davis wrote:
> I am having trouble with NCBIWWW.qblast. I can't get the example to work.
> Maybe I need help with reading :-)
>
> From the documentation
>
> from Bio.Blast import NCBIWWW, NCBIXML
>
> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
> save_file = open("temp.xml", "w")
> save_file.write(result_handle.read())
> save_file.close()
> result_handle.close()
> result_handle = open("temp.xml")
> blast_record = NCBIXML.parse(result_handle)
>
> The temp.xml looks correct but I can get nothing from blast_record. I have
> tried passing the handle directly to NCBIXML.parse and still no luck.
>
> How would I, for example, get the first hit "gi|224094601"?
>
> Vincent Davis

Hi Vincent,

Well, first I would check that the BLAST results were downloaded
ok - can you open the temp.xml file in a text editor (e.g. WordPad
on Windows)? Can you see the hits you are expecting?

Second, the parse function is for iterating over the file - if you
expect just one query's results, try:

blast_record = NCBIXML.read(result_handle)

Peter

From vincent at vincentdavis.net Sun Feb 10 15:41:42 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:41:42 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To:
References:
Message-ID:

Peter,
I verified the file.
I misunderstood the "if you expect just one query's results" part; I was
reading this as meaning that there would be more than one hit.
I figured this was a stupid mistake.

Thanks
Vincent

Vincent Davis
720-301-3003


On Sun, Feb 10, 2013 at 1:35 PM, Peter Cock wrote:

> On Sun, Feb 10, 2013 at 8:16 PM, Vincent Davis wrote:
> > I am having trouble with NCBIWWW.qblast. I can't get the example to work.
> > Maybe I need help with reading :-)
> >
> > From the documentation
> >
> > from Bio.Blast import NCBIWWW, NCBIXML
> >
> > result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
> > save_file = open("temp.xml", "w")
> > save_file.write(result_handle.read())
> > save_file.close()
> > result_handle.close()
> > result_handle = open("temp.xml")
> > blast_record = NCBIXML.parse(result_handle)
> >
> > The temp.xml looks correct but I can get nothing from blast_record.
> > I have tried passing the handle directly to NCBIXML.parse and still
> > no luck.
> >
> > How would I, for example, get the first hit "gi|224094601"?
> >
> > Vincent Davis
>
> Hi Vincent,
>
> Well, first I would check that the BLAST results were downloaded
> ok - can you open the temp.xml file in a text editor (e.g. WordPad
> on Windows)? Can you see the hits you are expecting?
>
> Second, the parse function is for iterating over the file - if you
> expect just one query's results, try:
>
> blast_record = NCBIXML.read(result_handle)
>
> Peter
>

From p.j.a.cock at googlemail.com Sun Feb 10 15:42:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 10 Feb 2013 20:42:56 +0000
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To:
References:
Message-ID:

On Sun, Feb 10, 2013 at 8:41 PM, Vincent Davis wrote:
> Peter,
> I verified the file.
> I misunderstood the "if you expect just one query's results" part; I was
> reading this as meaning that there would be more than one hit.
> I figured this was a stupid mistake.
>
> Thanks
> Vincent

So things are working now :)

Great,

Peter

From vincent at vincentdavis.net Sun Feb 10 15:46:25 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:46:25 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To:
References:
Message-ID:

Yes

Vincent Davis
720-301-3003


On Sun, Feb 10, 2013 at 1:42 PM, Peter Cock wrote:

> On Sun, Feb 10, 2013 at 8:41 PM, Vincent Davis wrote:
> > Peter,
> > I verified the file.
> > I misunderstood the "if you expect just one query's results" part; I was
> > reading this as meaning that there would be more than one hit.
> > I figured this was a stupid mistake.
> >
> > Thanks
> > Vincent
>
> So things are working now :)
>
> Great,
>
> Peter
>

From winda002 at student.otago.ac.nz Sun Feb 10 16:16:53 2013
From: winda002 at student.otago.ac.nz (David Winter)
Date: Mon, 11 Feb 2013 10:16:53 +1300
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad
Message-ID: <51180E45.9000102@student.otago.ac.nz>

Hi Vincent,

It would probably be possible to do this with Biopython, either by

(a) searching the NCBI's taxonomy database with Eutils to get IDs, then
    fetching the corresponding taxonomy records and extracting the
    complete lineage for each of your taxa. You could find the "lowest
    shared taxon" for each one and build a tree

(b) reading the whole NCBI taxonomy using Phylo, and extracting a
    subtree with just your taxa

Both those are probably more work than you need to do though. The
Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml)
can take taxon names or IDs and return a phylogeny.

You should be aware - taxonomy is a dynamic science, and assignments can
change. The NCBI taxonomy is curated by people that know what they're
talking about, but it's not a definitive tree of life or the result of a
particular phylogenetic analysis.

David

--
David Winter
Research Associate
Allan Wilson Centre for Molecular Ecology and Evolution
University of Otago
Dunedin
New Zealand/ Aotearoa

ph + 64 22 018 0449
w: www.david-winter.info
blog: sciblogs.co.nz/the-atavism

On 2/10/2013 6:01 PM, Fields, Christopher J wrote:
> On Feb 9, 2013, at 10:53 PM, Vincent Davis wrote:
>
>> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote:
>>
>>> All phylogenetic trees are based on taxa. You might need to be more
>>> specific.
>>
>>
>> Maybe, but Taxonomic Classification is not based on phylogenetics.
>> What I have is a list of organisms and their Taxonomic Classification. I
>> want to build a tree based on only the Taxonomic Classification.
>>
>> Vincent Davis
>> 720-301-3003
>
>
> There's code floating around on the bioperl side for doing this sort of
> thing, not sure if biopython has anything along these lines (I would be
> surprised if someone hasn't done this yet, though).
>
> chris
>

From vincent at vincentdavis.net Sun Feb 10 17:25:53 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:25:53 -0700
Subject: [Biopython] How to BLAST Optimize for : More dissimilar sequences (discontiguous megablast)
Message-ID:

On the NCBI Blast website there is an option to "Optimize for: More
dissimilar sequences (discontiguous megablast)".
The URL shows this to be BLAST_PROGRAMS="discoMegablast".
Is there a way to do this with NCBIWWW.qblast?

Vincent Davis

From vincent at vincentdavis.net Sun Feb 10 17:31:22 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:31:22 -0700
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <51180E45.9000102@student.otago.ac.nz>
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu> <51180E45.9000102@student.otago.ac.nz>
Message-ID:

On Sun, Feb 10, 2013 at 2:16 PM, David Winter wrote:

>
> Both those are probably more work than you need to do though. The
> Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml)
> can take taxon names or IDs and return a phylogeny.
>

This is what I needed, thanks David

Vincent Davis
720-301-3003

From vincent at vincentdavis.net Sun Feb 10 23:35:49 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 21:35:49 -0700
Subject: [Biopython] How to BLAST Optimize for : More dissimilar sequences (discontiguous megablast)
In-Reply-To:
References:
Message-ID:

On Sun, Feb 10, 2013 at 3:25 PM, Vincent Davis wrote:

> BLAST_PROGRAMS

I got it figured out. Just need to change the defaults.

Vincent Davis
720-301-3003

From hlapp at drycafe.net Wed Feb 13 23:30:15 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 13 Feb 2013 23:30:15 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To:
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
Message-ID: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>

On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote:

> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote:
>
>> All phylogenetic trees are based on taxa.

This is not true. Phylogenetic trees are based on a character matrix.
The rows in such a matrix are called OTUs. OTUs may or may not refer to
a taxon; they could (and nowadays typically do) refer to a gene, a
protein, a (part of a) genome, or some other nucleic acid or amino acid
sequence.

> Maybe but Taxonomic Classification is not based on phylogenetics.

Not strictly, but it aspires to be. I.e., species taxonomies aspire to
group taxa together that are monophyletic. In practice this isn't always
the case, but it's the idea, and is one reason why taxonomies change.

> What I have is a list of organisms and their Taxonomic Classification. I want to build a tree based on only the Taxonomic Classification.

You can obtain this directly from the NCBI taxonomy:

http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi

    -hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From nuin at genedrift.org Thu Feb 14 05:22:19 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 14 Feb 2013 05:22:19 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
Message-ID:

On 2013-02-13, at 11:30 PM, Hilmar Lapp wrote:

>
> On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote:
>
>> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote:
>>
>>> All phylogenetic trees are based on taxa.
>
> This is not true. Phylogenetic trees are based on a character matrix.
> The rows in such a matrix are called OTUs. OTUs may or may not refer to
> a taxon; they could (and nowadays typically do) refer to a gene, a
> protein, a (part of a) genome, or some other nucleic acid or amino acid
> sequence.
>
>>

Around the gene, protein, sequence, phenotypic character there's an OTU,
and there's a taxon. If you are analyzing extraterrestrial species (or
car colours, or fridge models) you might not have a taxon on your OTU,
but otherwise each and every piece of data you analyze has come from a
species, known or not, repeated or unique in your rows.

Semantically, you are correct, but even if you put 1000 genes from the
same species in a matrix, and generate a phylogenetic tree, you still
based your tree on a taxon.

P > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > From vincent at vincentdavis.net Thu Feb 14 12:20:58 2013 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 14 Feb 2013 10:20:58 -0700 Subject: [Biopython] Concatenate to aligned sequences Message-ID: I have 2 fasta files from a mucle alignment. Both have the same number of sequences from the same organism. If I what to concatenate the pairs of sequences what it the best way to do this. Right now I am doing this: def concatenate(fa1, fa2): fa1open = open(fa1, "rU") fa2open = open(fa1, "rU") fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta")) fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta")) fa1open.close() fa2open.close() # check that both files have the same sequnce id's if set(fa1dict.keys()) != set(fa2dict.keys()): print(fa1dict.keys(), fa2dict.keys()) print('The fasta files do not have the same sequences') bothdict = {} bothlist = [] count = 1 for key in fa2dict.keys(): bothdict[key] = fa2dict[key] bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq bothlist.append(bothdict[key]) return bothdict, bothlist Vincent Davis 720-301-3003 From p.j.a.cock at googlemail.com Thu Feb 14 12:29:12 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Feb 2013 17:29:12 +0000 Subject: [Biopython] Concatenate to aligned sequences In-Reply-To: References: Message-ID: On Thu, Feb 14, 2013 at 5:20 PM, Vincent Davis wrote: > I have 2 fasta files from a mucle alignment. Both have the same number of > sequences from the same organism. If I what to concatenate the pairs of > sequences what it the best way to do this. 
> Right now I am doing this: > > def concatenate(fa1, fa2): > fa1open = open(fa1, "rU") > fa2open = open(fa1, "rU") > fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta")) > fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta")) > fa1open.close() > fa2open.close() > # check that both files have the same sequnce id's > if set(fa1dict.keys()) != set(fa2dict.keys()): > print(fa1dict.keys(), fa2dict.keys()) > print('The fasta files do not have the same sequences') > bothdict = {} > bothlist = [] > count = 1 > for key in fa2dict.keys(): > bothdict[key] = fa2dict[key] > bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq > bothlist.append(bothdict[key]) > return bothdict, bothlist > > Vincent Davis > 720-301-3003 Have you tried loading the two alignment files via AlignIO, sorting by name if required, and adding the alignment objects? http://biopython.org/DIST/docs/api/Bio.Align.MultipleSeqAlignment-class.html#__add__ Peter From vincent at vincentdavis.net Thu Feb 14 12:38:43 2013 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 14 Feb 2013 10:38:43 -0700 Subject: [Biopython] Concatenate to aligned sequences In-Reply-To: References: Message-ID: Thanks Vincent Vincent Davis 720-301-3003 On Thu, Feb 14, 2013 at 10:29 AM, Peter Cock wrote: > On Thu, Feb 14, 2013 at 5:20 PM, Vincent Davis > wrote: > > I have 2 fasta files from a mucle alignment. Both have the same number of > > sequences from the same organism. If I what to concatenate the pairs of > > sequences what it the best way to do this. 
> > Right now I am doing this: > > > > def concatenate(fa1, fa2): > > fa1open = open(fa1, "rU") > > fa2open = open(fa1, "rU") > > fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta")) > > fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta")) > > fa1open.close() > > fa2open.close() > > # check that both files have the same sequnce id's > > if set(fa1dict.keys()) != set(fa2dict.keys()): > > print(fa1dict.keys(), fa2dict.keys()) > > print('The fasta files do not have the same sequences') > > bothdict = {} > > bothlist = [] > > count = 1 > > for key in fa2dict.keys(): > > bothdict[key] = fa2dict[key] > > bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq > > bothlist.append(bothdict[key]) > > return bothdict, bothlist > > > > Vincent Davis > > 720-301-3003 > > Have you tried loading the two alignment files via AlignIO, > sorting by name if required, and adding the alignment objects? > > > http://biopython.org/DIST/docs/api/Bio.Align.MultipleSeqAlignment-class.html#__add__ > > Peter > From karolisr at gmail.com Fri Feb 15 12:28:06 2013 From: karolisr at gmail.com (Karolis Ramanauskas) Date: Fri, 15 Feb 2013 11:28:06 -0600 Subject: [Biopython] Concatenate to aligned sequences In-Reply-To: References: Message-ID: Good day, I have written a function that will take a list of alignments and will concatenate them based on the sequence ids. The advantage here is that the lists do not have to contain the same number of sequences, which is helpful when you are trying to create one big alignment for phylogenetic applications and some taxa are missing certain sequences. concatenate function is here: https://github.com/karolisr/krpy/blob/master/kralign.py other functions can be ignored, it only depends on biopython to work. Peace On Thu, Feb 14, 2013 at 11:20 AM, Vincent Davis wrote: > I have 2 fasta files from a mucle alignment. Both have the same number of > sequences from the same organism. If I what to concatenate the pairs of > sequences what it the best way to do this. 
> Right now I am doing this:
>
> def concatenate(fa1, fa2):
>     fa1open = open(fa1, "rU")
>     fa2open = open(fa2, "rU")
>     fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
>     fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
>     fa1open.close()
>     fa2open.close()
>     # check that both files have the same sequence IDs
>     if set(fa1dict.keys()) != set(fa2dict.keys()):
>         print(fa1dict.keys(), fa2dict.keys())
>         print('The fasta files do not have the same sequences')
>     bothdict = {}
>     bothlist = []
>     count = 1
>     for key in fa2dict.keys():
>         bothdict[key] = fa2dict[key]
>         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
>         bothlist.append(bothdict[key])
>     return bothdict, bothlist
>
> Vincent Davis
> 720-301-3003
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From jordan.r.willis at Vanderbilt.Edu Thu Feb 21 21:19:40 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Fri, 22 Feb 2013 02:19:40 +0000
Subject: [Biopython] User Defined Scoring Matrix
Message-ID:

Hello,

Since I'm not sure which tool to exactly use, I will defer to the Biopython community since odds are I will be using it. I'm trying to produce a multiple sequence alignment with a user defined scoring matrix. When I look at ClustalW, there is an option to put in such a matrix, and the help indicates that this should be in "blast" format. When I search for blast format, they indicate that this is hard coded into the software.

My end goal is to produce a phylogenetic tree using this PSSM, but I have no idea how to input this into ClustalW or any multiple sequence alignment software. I don't really care which software to use, which wrappers, or how I have to do it. I have used Biopython to produce this matrix, so I thought it would be relatively easy to implement it in any multiple sequence alignment software.

I'm not having very good luck and any help would be much appreciated.
Jordan

From p.j.a.cock at googlemail.com Fri Feb 22 05:35:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 10:35:41 +0000
Subject: [Biopython] User Defined Scoring Matrix
In-Reply-To:
References:
Message-ID:

On Fri, Feb 22, 2013 at 2:19 AM, Willis, Jordan R wrote:
> Hello,
>
> Since I'm not sure which tool to exactly use, I will defer to the
> Biopython community since odds are I will be using it. I'm trying to produce
> a multiple sequence alignment with a user defined scoring matrix. When I
> look at ClustalW, there is an option to put in such a matrix, and the help
> indicates that this should be in "blast" format. When I search for blast
> format, they indicate that this is hard coded into the software.

I wouldn't start with ClustalW - it is old and still widely used, but even the authors are trying to discourage this. They suggest their new tool Clustal Omega, which has a Biopython wrapper and takes an optional distance matrix as input via the --distmat-in argument.

from Bio.Align.Applications import ClustalOmegaCommandline
help(ClustalOmegaCommandline)

http://biopython.org/DIST/docs/api/Bio.Align.Applications._ClustalOmega.ClustalOmegaCommandline-class.html

> My end goal is to produce a phylogenetic tree using this PSSM, but I have no
> idea how to input this into ClustalW or any multiple sequence alignment
> software. I don't really care which software to use, which wrappers, or how
> I have to do it. I have used Biopython to produce this matrix, so I thought
> it would be relatively easy to implement it in any multiple sequence
> alignment software.
>
> I'm not having very good luck and any help would be much appreciated.
>
> Jordan

There are people far more qualified than me to comment on the goals and if and when you should use a distance based tree (my understanding is distance based trees are the worst kind, but as they are computationally inexpensive they can make sense for large datasets).
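For reference, the Clustal Omega call described in this reply can be assembled as a plain argument list. This is only a sketch: the file names are placeholders, clustalo_command is a hypothetical helper (not part of Biopython), and it builds the command without running it. Note that --distmat-in supplies pairwise distances for the guide tree; it is not a substitution (scoring) matrix, so it may not cover the original question.

```python
import shlex


def clustalo_command(infile, outfile, distmat_in=None, executable="clustalo"):
    """Build (but do not run) a Clustal Omega argument list.

    Hypothetical helper for illustration only; the file names passed in
    are placeholders chosen by the caller.
    """
    args = [executable, "-i", infile, "-o", outfile]
    if distmat_in is not None:
        # Optional pairwise distance matrix, used instead of computing one.
        args.append("--distmat-in=" + distmat_in)
    return args


cmd = clustalo_command("seqs.fasta", "aligned.fasta", distmat_in="pairs.dist")
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be handed to subprocess.run as-is once the clustalo binary is known to be installed, which avoids shell quoting problems with unusual file names.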
Regards, Peter From biocyberman at gmail.com Fri Feb 22 10:18:58 2013 From: biocyberman at gmail.com (Biocyberman) Date: Fri, 22 Feb 2013 16:18:58 +0100 Subject: [Biopython] read and write full ID line of EMBL SeqRecord? Message-ID: Hi there, I am using Biopython version 1.6.1 (latest). My original ID line is: ID ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP. But after reading and writing out, I got this: ID ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP. How do I get the same ID line ? Attached is the python script and input file. Thanks for taking a look. Biocyberman -------------- next part -------------- A non-text attachment was scrubbed... Name: input.embl Type: application/octet-stream Size: 1063 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: checkconvert.py Type: application/octet-stream Size: 249 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Fri Feb 22 11:08:13 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 22 Feb 2013 16:08:13 +0000 Subject: [Biopython] read and write full ID line of EMBL SeqRecord? In-Reply-To: References: Message-ID: On Fri, Feb 22, 2013 at 3:18 PM, Biocyberman wrote: > Hi there, > I am using Biopython version 1.6.1 (latest). > My original ID line is: > ID ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP. > > But after reading and writing out, I got this: > > ID ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP. > > How do I get the same ID line ? > > Attached is the python script and input file. > > Thanks for taking a look. > Biocyberman This is probably part of https://redmine.open-bio.org/issues/2578 (the GenBank and EMBL code overlaps a lot). 
Peter

From ferreirafm at usp.br Fri Feb 22 12:01:02 2013
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Fri, 22 Feb 2013 14:01:02 -0300
Subject: [Biopython] blastdbcmd
Message-ID: <5127A44E.2030403@usp.br>

Hi there Biopythoneers,
As far as I know, there isn't a blastdbcmd submodule in Biopython. So, I've been writing the IDs of the BLAST-matched sequences to a file, fetching them all with a subprocess and reading them with SeqIO afterwards. In some cases, however, I miss a blastdbcmd parser to make things easy. How are you guys dealing with this?
Best,
Fred

From p.j.a.cock at googlemail.com Fri Feb 22 12:23:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 17:23:44 +0000
Subject: [Biopython] blastdbcmd
In-Reply-To: <5127A44E.2030403@usp.br>
References: <5127A44E.2030403@usp.br>
Message-ID:

On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira wrote:
> Hi there Biopythoneers,
> As far as I know, there isn't a blastdbcmd submodule in Biopython. So,
> I've been writing the IDs of the BLAST-matched sequences to a file, fetching
> them all with a subprocess and reading them with SeqIO afterwards. In some
> cases, however, I miss a blastdbcmd parser to make things easy. How are you
> guys dealing with this?
> Best,
> Fred

Are you talking about a command line wrapper for blastdbcmd, to go in Bio/Blast/Applications.py? That seems a good idea.

Personally I find the blastdbcmd tool quite handicapped due to the introduction of generated sequence identifiers, and rarely use it:
http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html

Instead I would use Bio.SeqIO to index the FASTA file used for the database, and get the sequences that way.
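To illustrate the indexing idea, here is a dependency-free sketch of an offset index over a FASTA file: one pass records where each record starts, and later lookups seek straight to that position. It is an illustration of the general technique only, not Biopython's implementation; index_fasta and fetch_record are hypothetical names, and in practice Bio.SeqIO.index(path, "fasta") gives dictionary-style access without writing any of this.

```python
import os
import tempfile


def index_fasta(path):
    """Map each record ID to the byte offset of its '>' header line."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            pos = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split(None, 1)
                if fields:  # skip headers that have no ID at all
                    offsets[fields[0].decode("ascii")] = pos
    return offsets


def fetch_record(path, offsets, record_id):
    """Return the raw FASTA text of one record, read from its offset."""
    with open(path, "rb") as handle:
        handle.seek(offsets[record_id])
        lines = [handle.readline()]  # the header line itself
        for line in iter(handle.readline, b""):
            if line.startswith(b">"):  # next record starts here
                break
            lines.append(line)
    return b"".join(lines).decode("ascii")


# Small demonstration on a temporary two-record file.
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">seq1 first\nACGT\nACGT\n>seq2 second\nTTTT\n")
    demo_path = tmp.name
demo_index = index_fasta(demo_path)
demo_record = fetch_record(demo_path, demo_index, "seq2")
os.remove(demo_path)
print(demo_record)
```

Only the ID-to-offset dictionary is held in memory, so the approach scales to databases much larger than the available RAM.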
Peter

From jgibbons1 at mail.usf.edu Tue Feb 26 11:45:03 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Tue, 26 Feb 2013 11:45:03 -0500
Subject: [Biopython] Filter Blast results
Message-ID:

I know that there is already a script in the Cookbook for filtering out blast queries with no hits, but it involves holding all of the sequence objects in memory, which isn't good if you have to work with a lot of sequences. I came up with the following function, which works, but I would appreciate any input on how to improve it. In particular I don't like that I am appending the sequence objects to file and would like to know of any alternatives.

The main function is:

from Bio import SeqIO
from Bio.Blast import NCBIXML

def filter_no_hits(blast_xml_results, source_fasta, file_format, no_hit_file, hit_file):
    """Scans Blast XML results and if the query sequence has no hits
    prints the sequence record in the no_hit_file, otherwise in the
    hit_file. The source_fasta is the file that was used to perform the
    blast search and is used to retrieve the sequence record"""
    result_handle = open(blast_xml_results)  # open the xml file
    blast_records = NCBIXML.parse(result_handle)  # create the generator object
    indexed_fasta = create_indexed_fasta(source_fasta, file_format)  # create the indexed file object
    for record in blast_records:
        hit_def_list = blast_xml_hit_def(record)  # returns list of hit_def results
        record_id = get_id_str_from_desc(record.query)  # get the record ID to search the indexed file
        record_object = indexed_fasta.get_raw(record_id)  # use the sequence ID to get the raw sequence record
        if is_list_null(hit_def_list):  # if no hits
            append_to_file(no_hit_file, record_object)
        else:  # if hits
            append_to_file(hit_file, record_object)
    result_handle.close()

Sub-functions:

def create_indexed_fasta(path, file_format):
    """Makes a fasta file searchable like a dictionary with the
    sequence Id as the key"""
    return SeqIO.index(path, file_format)

def blast_xml_hit_def(record):
    """Returns a list of hit_def for a record from a NCBI blast XML report"""
    hit_def_list = []
    for alignment in record.alignments:
        hit_def_list.append(alignment.hit_def)
    return hit_def_list

def get_id_str_from_desc(desc):
    """Returns the Id from a fasta record description"""
    parts = desc.split(" ")
    return parts[0]

def is_list_null(lst):
    """Returns True if list is empty and False otherwise"""
    return len(lst) == 0

def append_to_file(path, string):
    # binary append, since get_raw returns bytes on Python 3
    with open(path, 'ab') as f:
        f.write(string)

def record_counter(path, file_format):
    """Input a file path and the format of the file and it returns the
    number of records in the file"""
    counter = 0
    for seq_record in SeqIO.parse(path, file_format):
        counter += 1
    print("%s contains %i records" % (path, counter))
    return counter

Thank you

Justin Gibbons

From p.j.a.cock at googlemail.com Tue Feb 26 11:57:01 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Feb 2013 16:57:01 +0000
Subject: [Biopython] Filter Blast results
In-Reply-To:
References:
Message-ID:

On Tue, Feb 26, 2013 at 4:45 PM, Justin Gibbons wrote:
> I know that there is already a script in the Cookbook for filtering out
> blast queries with no hits, but it involves holding all of the sequence
> objects in memory, which isn't good if you have to work with a lot of
> sequences.

Hi Justin,

Which example are you referring to? It doesn't sound very efficient.
There are some wiki pages with user contributed cookbook recipes:
http://biopython.org/wiki/Category:Cookbook

There is also the "Biopython Tutorial and Cookbook", online here:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Thanks,

Peter

From w.arindrarto at gmail.com Tue Feb 26 12:27:21 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 26 Feb 2013 18:27:21 +0100
Subject: [Biopython] Filter Blast results
In-Reply-To:
References:
Message-ID:

Hi Justin,

For your purpose, you can try using the SearchIO module (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc101), from the latest Biopython (1.61). Here's my attempt to have a similar working function:

from Bio import SearchIO, SeqIO

# get all FASTA IDs in a set ('fasta' here is the path to the FASTA file)
fasta_ids = set([x.id for x in SeqIO.parse('fasta', 'fasta')])
with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        hits = set([x.id for x in qresult])  # get all the hit IDs in a set
        present = fasta_ids.intersection(hits)  # IDs present in both sets
        if present:  # set is not empty
            hit.write(qresult.id + '\n')
        else:
            no_hit.write(qresult.id + '\n')

On another note, if you are always checking against the same FASTA file, you can try to create your own BLAST database consisting of only those sequences and search against it, so any BLAST results you have will always contain at least one of the sequences in your FASTA file. This makes the function slightly simpler:

from Bio import SearchIO

with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        # empty queries evaluate to False
        if qresult:
            hit.write(qresult.id + '\n')
        else:
            no_hit.write(qresult.id + '\n')

The first function still requires you to store all the FASTA IDs in memory, but that should be more reasonable than storing whole SeqRecord objects.
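As a rough illustration of the memory point, the same hit / no-hit split can be streamed with only the standard library. This sketch assumes the legacy BLAST XML layout, where each query is an Iteration element holding an Iteration_query-def and an Iteration_hits list of Hit elements; split_queries is a hypothetical helper and the embedded XML is a made-up two-query sample, not real BLAST output.

```python
import io
import xml.etree.ElementTree as ET


def split_queries(xml_handle):
    """Yield (query_def, has_hits) for each <Iteration> in BLAST XML."""
    for _event, elem in ET.iterparse(xml_handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query_def = elem.findtext("Iteration_query-def", default="")
        has_hits = elem.find("Iteration_hits/Hit") is not None
        yield query_def, has_hits
        elem.clear()  # discard the finished iteration's children


# Made-up sample: query1 has one hit, query2 has none.
demo_xml = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1 some description</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_id>gi|1</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>query2</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

hit_ids, no_hit_ids = [], []
for query_def, has_hits in split_queries(io.BytesIO(demo_xml)):
    query_id = query_def.split(" ")[0]  # ID is the first word of the description
    (hit_ids if has_hits else no_hit_ids).append(query_id)
print(hit_ids, no_hit_ids)
```

In a real script the two lists would be replaced by writes to the hit and no-hit files, so nothing beyond the current query needs to be kept.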
Hope that helps, Bow On Tue, Feb 26, 2013 at 5:57 PM, Peter Cock wrote: > On Tue, Feb 26, 2013 at 4:45 PM, Justin Gibbons wrote: >> I know that there is already a script in the Cookbook for filtering out >> blast queries with no hits, but it involves holding all of the sequence >> objects in memory, which isn't good if you have to work with a lot of >> sequences. > > Hi Justin, > > Which example are you referring too? It doesn't sound very efficient. > > There are some wiki pages with user contributed cookbook recipes: > http://biopython.org/wiki/Category:Cookbook > > There is also the "Biopython Tutorial and Cookbook", online here: > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > > Thanks, > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From robert.j.ahern at mycit.ie Wed Feb 27 12:21:24 2013 From: robert.j.ahern at mycit.ie (Robert Ahern) Date: Wed, 27 Feb 2013 17:21:24 +0000 Subject: [Biopython] (no subject) Message-ID: <9042978694721632165@unknownmsgid> robert.j.ahern at mycit.ie Sent from Windows Mail From p.j.a.cock at googlemail.com Wed Feb 27 17:32:35 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Feb 2013 22:32:35 +0000 Subject: [Biopython] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts In-Reply-To: References: Message-ID: The new bioinformatics mini-symposium this year makes SciPy 2013 especially interesting. 
Peter

---------- Forwarded message ----------
From: *Jonathan Rocher*
Date: Wednesday, February 27, 2013
Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
To: SciPy Users List, numfocus at googlegroups.com, Discussion of Numerical Python

[Apologies for cross-posts]

Dear all,

The annual SciPy Conference (Scientific Computing with Python) allows participants from academic, commercial, and governmental organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development. *The deadline for abstract submissions is March 20th, 2013.*

Submissions are welcome that address general Scientific Computing with Python, one of the two special themes for this year's conference (machine learning & reproducible science), or the domain-specific mini-symposia held during the conference (Meteorology, climatology, and atmospheric and oceanic science, Astronomy and astrophysics, Medical imaging, Bio-informatics).

Please submit your abstract at the SciPy 2013 website abstract submission form. Abstracts will be accepted for posters or presentations. Optional papers to be published in the conference proceedings will be requested following abstract submission. This year the proceedings will be made available prior to the conference to help attendees navigate the conference.

We look forward to an exciting and interesting set of talks, posters, and discussions and hope to see you at the conference.

The SciPy 2013 Program Committee Chairs

Matt McCormick, Kitware, Inc.
Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory

From chapmanb at 50mail.com Thu Feb 28 21:36:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Feb 2013 21:36:34 -0500
Subject: [Biopython] [ANN] SciPy2013: Call for abstracts
In-Reply-To:
References:
Message-ID: <87ppzjsv65.fsf@fastmail.fm>

Peter;
Thanks for sending this out.
I'm helping with the organization of the SciPy bioinformatics session thanks to Peter's recommendation and wrote up a little bit about the types of abstracts that would fit well with the overall theme of SciPy:

http://j.mp/Z4xxXB

This is a great chance to connect with another open source scientific community so definitely send in an abstract if this is of interest; the deadline is coming up next month: March 20th. Austin also has awesome music and barbecue in addition to science and hacking so lots of reasons to attend,
Brad

> The new bioinformatics mini-symposium this year makes SciPy 2013
> especially interesting.
>
> Peter
>
> ---------- Forwarded message ----------
> From: *Jonathan Rocher*
> Date: Wednesday, February 27, 2013
> Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
> To: SciPy Users List, numfocus at googlegroups.com,
> Discussion of Numerical Python
>
>
> [Apologies for cross-posts]
>
> Dear all,
>
> The annual SciPy Conference (Scientific Computing with
> Python) allows
> participants from academic, commercial, and governmental organizations to
> showcase their latest projects, learn from skilled users and developers,
> and collaborate on code development. *The deadline for abstract submissions
> is March 20th, 2013.*
>
> Submissions are welcome that address general Scientific Computing with
> Python, one of the two special themes for this year's conference (machine
> learning & reproducible science), or the domain-specific
> mini-symposia held
> during the conference (Meteorology, climatology, and atmospheric and
> oceanic science, Astronomy and astrophysics, Medical imaging,
> Bio-informatics).
>
> Please submit your abstract at the SciPy 2013 website abstract submission
> form.
> Abstracts will be accepted for posters or presentations. Optional papers to
> be published in the conference proceedings will be requested following
> abstract submission.
> This year the proceedings will be made available prior
> to the conference to help attendees navigate the conference.
>
> We look forward to an exciting and interesting set of talks, posters, and
> discussions and hope to see you at the conference.
> The SciPy 2013 Program Committee Chairs
>
> Matt McCormick, Kitware, Inc.
> Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From p.j.a.cock at googlemail.com Fri Feb 1 13:34:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 13:34:46 +0000
Subject: [Biopython] Fwd: Bug in bgzf module
In-Reply-To:
References:
Message-ID:

On Thu, Jan 31, 2013 at 10:57 PM, Petra Kubincová wrote:
> Hi Peter,
>
> well, I don't have much experience with unit tests but I will try to come up
> with something. :)
> I'll let you know if I won't succeed.

That would be great - in the short term I've added something quite simple:
https://github.com/biopython/biopython/commit/5b0d0bd55024d6dbbdea85ff73e6bd2fbbfd5ee1

> And yes, recording an index is exactly the thing I need to do. (I am
> currently working on interval mapping tool for multiple whole-genome
> alignments, where I need to read .maf file, write preprocessed data into a
> compressed file and then work just with index for the compressed file and
> the compressed file itself to do the mapping.)

That reminds me I need to look at Andrew's MAF work:
http://biopython.org/wiki/Multiple_Alignment_Format
https://github.com/biopython/biopython/pull/5

Regards,

Peter

From p.j.a.cock at googlemail.com Mon Feb 4 18:04:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 18:04:40 +0000
Subject: [Biopython] Proof reading the tutorial for the next release?
Message-ID:

Hello all,

If you're also on the Biopython-Dev Mailing List you'll know we're hoping to release Biopython 1.61 this week. If anyone here wants to help out, proof-reading the draft tutorial would be great :)

I've posted the current tutorial as HTML and PDF online,

http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

Currently those are being updated manually (it used to be done automatically every night - something which needs to be re-configured following a server move).

If you see an error, and want to know if it has already been fixed, then the source file is Tutorial.tex (it is written using LaTeX), and you can see the recent changes here on GitHub:

https://github.com/biopython/biopython/commits/master/Doc/Tutorial.tex

Thanks,

Peter

From p.j.a.cock at googlemail.com Tue Feb 5 22:05:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:05:25 +0000
Subject: [Biopython] Biopython 1.61 released
Message-ID:

Dear Biopythoneers,

Source distributions and Windows installers for Biopython 1.61 are now available from the downloads page on the Biopython website and from the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment:

We currently support Python 2.5, 2.6 and 2.7 and also test under Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C extensions). We are still encouraging early adopters to help test on these platforms, and have included a "beta" installer for Python 3.2 (and Python 3.3 to follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will continue support for at least one further release (Biopython 1.62). This could be extended given feedback from our users. Focusing on Python 2.6 and 2.7 only will make writing Python 3 compatible code easier.
New Features:

GenomeDiagram has three new sigils (shapes to illustrate features). OCTO shows an octagonal shape, like the existing BOX sigil but with the corners cut off. JAGGY shows a box with jagged edges at the start and end, intended for things like NNNNN regions in draft genomes. Finally BIGARROW is like the existing ARROW sigil but is drawn straddling the axis. This is useful for drawing vertically compact figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral path through HSV color space. This can be used to make arbitrary "rainbow" scales, for example to color features or cross-links on a GenomeDiagram figure.

The Bio.SeqIO module now supports reading sequences from PDB files in two different ways. The "pdb-atom" format determines the sequence as it appears in the structure based on the atom coordinate section of the file (via Bio.PDB, so NumPy is currently required for this). Alternatively, you can use the "pdb-seqres" format to read the complete protein sequence as it is listed in the PDB header, if available.

The Bio.SeqUtils module now has a seq1 function to turn a sequence using three letter amino acid codes into one using the more common one letter codes. This acts as the inverse of the existing seq3 function.

The multiple-sequence-alignment object used by Bio.AlignIO etc now supports an annotation dictionary. Additional support for per-column annotation is planned, with addition and splicing to work like that for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a clean deprecation of the old code, the new motif code is stored in a new module Bio.motifs, and a PendingDeprecationWarning was added to Bio.Motif.

Experimental Code -
SearchIO:

This release also includes Bow's Google Summer of Code work writing a unified parsing framework for NCBI BLAST (assorted formats including tabular and XML), HMMER, BLAT, and other sequence searching tools. This is currently available with the new BiopythonExperimentalWarning to indicate that this is still somewhat experimental. We're bundling it with the main release to get more public feedback, but with the big warning that the API is likely to change. In fact, even the current name of Bio.SearchIO may change since unless you are familiar with BioPerl its purpose isn't immediately clear.

Contributors:

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto

Thank you all.

Release announcement here (RSS feed available):
http://news.open-bio.org/news/2013/02/biopython-1-61-released/

P.S. You can follow @Biopython on Twitter
https://twitter.com/Biopython

From w.arindrarto at gmail.com Wed Feb 6 00:03:52 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 6 Feb 2013 01:03:52 +0100
Subject: [Biopython] Biopython 1.61 released
In-Reply-To:
References:
Message-ID:

Hi Peter,

> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions).
> We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.
>
> Please note we are phasing out support for Python 2.5. We will
> continue support for at least one further release (Biopython 1.62).
> This could be extended given feedback from our users. Focusing on
> Python 2.6 and 2.7 only will make writing Python 3 compatible code
> easier.
>
> New Features:
>
> GenomeDiagram has three new sigils (shapes to illustrate features).
> OCTO shows an octagonal shape, like the existing BOX sigil but with
> the corners cut off. JAGGY shows a box with jagged edges at the start
> and end, intended for things like NNNNN regions in draft genomes.
> Finally BIGARROW is like the existing ARROW sigil but is drawn
> straddling the axis. This is useful for drawing vertically compact
> figures where you do not have overlapping genes.
>
> New module Bio.Graphics.ColorSpiral can generate colors along a spiral
> path through HSV color space. This can be used to make arbitrary
> "rainbow" scales, for example to color features or cross-links on a
> GenomeDiagram figure.
>
> The Bio.SeqIO module now supports reading sequences from PDB files in
> two different ways. The "pdb-atom" format determines the sequence as
> it appears in the structure based on the atom coordinate section of
> the file (via Bio.PDB,
> so NumPy is currently required for this). Alternatively, you can use
> the "pdb-seqres" format to read the complete protein sequence as it is
> listed in the PDB header, if available.
>
> The Bio.SeqUtils module now has a seq1 function to turn a sequence
> using three letter amino acid codes into one using the more common one
> letter codes. This acts as the inverse of the existing seq3 function.
>
> The multiple-sequence-alignment object used by Bio.AlignIO etc now
> supports an annotation dictionary.
> Additional support for per-column
> annotation is planned, with addition and splicing to work like that
> for the SeqRecord per-letter annotation.
>
> The Bio.Motif module has been updated and reorganized. To allow for a
> clean deprecation of the old code, the new motif code is stored in a
> new module Bio.motifs, and a PendingDeprecationWarning was added to
> Bio.Motif.
>
> Experimental Code - SearchIO:
>
> This release also includes Bow's Google Summer of Code work writing a
> unified parsing framework for NCBI BLAST (assorted formats including
> tabular and XML), HMMER, BLAT, and other sequence searching tools.
> This is currently available with the new BiopythonExperimentalWarning
> to indicate that this is still somewhat experimental. We're bundling
> it with the main release to get more public feedback, but with the big
> warning that the API is likely to change. In fact, even the current
> name of Bio.SearchIO may change since unless you are familiar with
> BioPerl its purpose isn't immediately clear.
>
> Contributors:
>
> Brandon Invergo
> Bryan Lunt (first contribution)
> Christian Brueffer (first contribution)
> David Cain
> Eric Talevich
> Grace Yeo (first contribution)
> Jeffrey Chang
> Jingping Li (first contribution)
> Kai Blin (first contribution)
> Leighton Pritchard
> Lenna Peterson
> Lucas Sinclair (first contribution)
> Michiel de Hoon
> Nick Semenkovich (first contribution)
> Peter Cock
> Robert Ernst (first contribution)
> Tiago Antao
> Wibowo "Bow" Arindrarto
>
> Thank you all.
>
> Release announcement here (RSS feed available):
> http://news.open-bio.org/news/2013/02/biopython-1-61-released/
>
> P.S. You can follow @Biopython on Twitter
> https://twitter.com/Biopython

Thanks for doing the release! It feels exciting to see SearchIO code finally live in the distributions :). Hopefully this will result in more feedback (and then more improvements ~ likewise for the whole Biopython as well).
Also, thank you as well to everyone who has criticized / commented / contributed code to the module :).

cheers,
Bow

From p.j.a.cock at googlemail.com Thu Feb 7 11:33:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:33:25 +0000
Subject: [Biopython] Biopython 1.61 released
In-Reply-To:
References:
Message-ID:

On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock wrote:
> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.

For those of you wanting to try Biopython on Python 3.3 on Windows, there is now an installer for Biopython 1.61 built against NumPy 1.7.0rc2.

NumPy 1.7 is their first release to support Python 3.3, and the official release is expected to be near-identical to this second release candidate, see:
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html

Regards,

Peter

From vincent at vincentdavis.net Sun Feb 10 03:47:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 9 Feb 2013 20:47:20 -0700
Subject: [Biopython] Taxonomic Classification tree
Message-ID:

Any suggestion of how to build a Taxonomic Classification tree? That is, like a Phylo tree but based on taxa.
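One possible reading of this question is: given each organism's classification as an ordered list of ranks, nest the lists into a tree. The sketch below does that with plain dictionaries and writes the result as a Newick string with internal node labels. It is an illustration under stated assumptions only: build_taxonomy_tree and to_newick are hypothetical helpers, the three lineages are hand-typed examples, all lineages are assumed to share a single root, and names are assumed to contain no spaces or Newick punctuation.

```python
def build_taxonomy_tree(lineages):
    """Nest lineages (names ordered from highest rank down) into dicts."""
    tree = {}
    for lineage in lineages:
        node = tree
        for name in lineage:
            node = node.setdefault(name, {})  # reuse shared ancestors
    return tree


def to_newick(tree):
    """Serialise the nested dicts as Newick with internal node labels."""
    def render(name, children):
        if not children:
            return name  # leaf: just the taxon name
        inner = ",".join(render(n, c) for n, c in sorted(children.items()))
        return "(" + inner + ")" + name
    return ",".join(render(n, c) for n, c in sorted(tree.items())) + ";"


# Hand-typed example lineages (class, order, family, species).
lineages = [
    ["Mammalia", "Primates", "Hominidae", "Homo_sapiens"],
    ["Mammalia", "Primates", "Hominidae", "Pan_troglodytes"],
    ["Mammalia", "Rodentia", "Muridae", "Mus_musculus"],
]
newick = to_newick(build_taxonomy_tree(lineages))
print(newick)
```

A string like this could then be loaded with Bio.Phylo.read(StringIO(newick), "newick") to reuse the Phylo drawing functions, although branch lengths would carry no meaning for a tree built this way.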
Vincent Davis From nuin at genedrift.org Sun Feb 10 04:03:27 2013 From: nuin at genedrift.org (Paulo Nuin) Date: Sat, 9 Feb 2013 23:03:27 -0500 Subject: [Biopython] Taxonomic Classification tree In-Reply-To: References: Message-ID: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> All phylogenetic trees are based on taxa. You might need to be more specific. Paulo On 2013-02-09, at 10:47 PM, Vincent Davis wrote: > Any suggestion of how to build a Taxonomic Classification tree. That is, > like a Phylo tree but based on taxa. > > Vincent Davis > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From vincent at vincentdavis.net Sun Feb 10 04:53:13 2013 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 9 Feb 2013 21:53:13 -0700 Subject: [Biopython] Taxonomic Classification tree In-Reply-To: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> Message-ID: On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote: > All phylogenetic trees are based on taxa. You might need to be more > specific. Maybe but Taxonomic Classification is not based on phylogenetics. What I have is a list of organisms and their Taxonomic Classification. I want to build a tree based on only the Taxonomic Classification. Vincent Davis 720-301-3003 From cjfields at illinois.edu Sun Feb 10 05:01:59 2013 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sun, 10 Feb 2013 05:01:59 +0000 Subject: [Biopython] Taxonomic Classification tree In-Reply-To: References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu> On Feb 9, 2013, at 10:53 PM, Vincent Davis wrote: > On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote: > >> All phylogenetic trees are based on taxa. You might need to be more >> specific. 
>
>
> Maybe but Taxonomic Classification is not based on phylogenetics.
> What I have is a list of organisms and their Taxonomic Classification. I
> want to build a tree based on only the Taxonomic Classification.
>
> Vincent Davis
> 720-301-3003

There's code floating around on the bioperl side for doing this sort of thing, not sure if biopython has anything along these lines (I would be surprised if someone hasn't done this yet, though).

chris

From vincent at vincentdavis.net Sun Feb 10 20:16:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:16:20 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
Message-ID:

I am having trouble with NCBIWWW.qblast. I can get the example to work. Maybe I need help with reading :-)

From the documentation:

    from Bio.Blast import NCBIWWW, NCBIXML

    result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
    save_file = open("temp.xml", "w")
    save_file.write(result_handle.read())
    save_file.close()
    result_handle.close()
    result_handle = open("temp.xml")
    blast_record = NCBIXML.parse(result_handle)

The temp.xml looks correct but I can get nothing from blast_record. I have tried passing the handle directly to NCBIXML.parse and still no luck.

How would I, for example, get the first hit "gi|224094601"?

Vincent Davis

From p.j.a.cock at googlemail.com Sun Feb 10 20:35:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 10 Feb 2013 20:35:20 +0000
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To:
References:
Message-ID:

On Sun, Feb 10, 2013 at 8:16 PM, Vincent Davis wrote:
> I am having trouble with NCBIWWW.qblast I can get the example to work.
> Maybe I need help with reading :-)
>
> From the documentation
> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
> save_file = open("temp.xml", "w")
> save_file.write(result_handle.read())
> save_file.close()
> result_handle.close()
> result_handle = open("temp.xml")
> blast_record = NCBIXML.parse(result_handle)
>
> The temp.xml looks correct but I can get nothing from blast_record. I have
> tried passing the directly to NCBIXML.parse and still no luck.
>
> How would I for example get the first hit "gi|224094601" ?
>
> Vincent Davis

Hi Vincent,

Well, first I would check that the BLAST results were downloaded ok - can you open the temp.xml file in a text editor (e.g. WordPad on Windows)? Can you see the hits you are expecting?

Second, the parse function is for iterating over the file (it returns an iterator giving one record per query, not a single record) - if you expect just one query's results, try:

    blast_record = NCBIXML.read(result_handle)

Peter

From vincent at vincentdavis.net Sun Feb 10 20:41:42 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:41:42 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To:
References:
Message-ID:

Peter,
I verified the file.
I misunderstood the "if you expect just one query's results"; I was reading this as meaning that there would be more than one hit.
I figured this was a stupid mistake.

Thanks
Vincent

Vincent Davis
720-301-3003

On Sun, Feb 10, 2013 at 1:35 PM, Peter Cock wrote:
> On Sun, Feb 10, 2013 at 8:16 PM, Vincent Davis
> wrote:
> > I am having trouble with NCBIWWW.qblast I can get the example to
> work.
> > Maybe I need help with reading :-)
> >
> > From the documentation
> > result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
> > save_file = open("temp.xml", "w")
> > save_file.write(result_handle.read())
> > save_file.close()
> > result_handle.close()
> > result_handle = open("temp.xml")
> > blast_record = NCBIXML.parse(result_handle)
> >
> > The temp.xml looks correct but I can get nothing from blast_record.
I > have > > tried passing the directly to NCBIXML.parse and still no luck. > > > > How would I for example get the first hit "gi|224094601" ? > > > > Vincent Davis > > Hi Vincent, > > Well, first I would check that the BLAST results were downloaded > ok - can you open the temp.xml file in a text editor (e.g. WordPad > on Windows)? Can you see the hits you are expecting? > > Second, the parse function is for iterating over the file - if you > expect just one query's results, try: > > blast_record = NCBIXML.read(result_handle) > > Peter > From p.j.a.cock at googlemail.com Sun Feb 10 20:42:56 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 10 Feb 2013 20:42:56 +0000 Subject: [Biopython] NCBI Blast, what an I going wrong In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 8:41 PM, Vincent Davis wrote: > Peter, > I verified the file, > I miss understood the " if you expect just one query's results" I was > reading this as meaning that there would be more than one hit. > I figured this was a stupid mistake. > > Thanks > Vincent So things are working now :) Great, Peter From vincent at vincentdavis.net Sun Feb 10 20:46:25 2013 From: vincent at vincentdavis.net (Vincent Davis) Date: Sun, 10 Feb 2013 13:46:25 -0700 Subject: [Biopython] NCBI Blast, what an I going wrong In-Reply-To: References: Message-ID: Yes Vincent Davis 720-301-3003 On Sun, Feb 10, 2013 at 1:42 PM, Peter Cock wrote: > On Sun, Feb 10, 2013 at 8:41 PM, Vincent Davis > wrote: > > Peter, > > I verified the file, > > I miss understood the " if you expect just one query's results" I was > > reading this as meaning that there would be more than one hit. > > I figured this was a stupid mistake. 
> >
> > Thanks
> > Vincent
>
> So things are working now :)
>
> Great,
>
> Peter
>

From winda002 at student.otago.ac.nz Sun Feb 10 21:16:53 2013
From: winda002 at student.otago.ac.nz (David Winter)
Date: Mon, 11 Feb 2013 10:16:53 +1300
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad
Message-ID: <51180E45.9000102@student.otago.ac.nz>

Hi Vincent,

It would probably be possible to do this with Biopython, either by

(a) searching the NCBI's taxonomy database with EUtils to get IDs, then fetching the corresponding taxonomy records and extracting the complete lineage for each of your taxa. You could find the "lowest shared taxon" for each one and build a tree

(b) reading the whole NCBI taxonomy using Phylo, and extracting a subtree with just your taxa

Both of those are probably more work than you need to do though. The Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml) can take taxon names or IDs and return a phylogeny.

You should be aware - taxonomy is a dynamic science, and assignments can change. The NCBI taxonomy is curated by people that know what they're talking about, but it's not a definitive tree of life or the result of a particular phylogenetic analysis.

David

--
David Winter
Research Associate
Allan Wilson Centre for Molecular Ecology and Evolution
University of Otago
Dunedin
New Zealand / Aotearoa

ph + 64 22 018 0449
w: www.david-winter.info
blog: sciblogs.co.nz/the-atavism

On 2/10/2013 6:01 PM, Fields, Christopher J wrote:
> On Feb 9, 2013, at 10:53 PM, Vincent Davis
> wrote:
>
>> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote:
>>
>>> All phylogenetic trees are based on taxa. You might need to be more
>>> specific.
>>
>>
>> Maybe but Taxonomic Classification is not based on phylogenetics.
>> What I have is a list of organisms and their Taxonomic Classification. I
>> want to build a tree based on only the Taxonomic Classification.
>>
>> Vincent Davis
>> 720-301-3003
>
>
> There's code floating around on the bioperl side for doing this sort of thing, not sure if biopython has anything along these lines (I would be surprised if someone hasn't done this yet, though).
>
> chris
>

From vincent at vincentdavis.net Sun Feb 10 22:25:53 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:25:53 -0700
Subject: [Biopython] How to BLAST Optimize for : More dissimilar sequences (discontiguous megablast)
Message-ID:

On the NCBI Blast website there is an option to "Optimize for: More dissimilar sequences (discontiguous megablast)".
The URL shows this to be BLAST_PROGRAMS="discoMegablast". Is there a way to do this with NCBIWWW.qblast?

Vincent Davis

From vincent at vincentdavis.net Sun Feb 10 22:31:22 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:31:22 -0700
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <51180E45.9000102@student.otago.ac.nz>
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu> <51180E45.9000102@student.otago.ac.nz>
Message-ID:

On Sun, Feb 10, 2013 at 2:16 PM, David Winter wrote:
>
> Both those are probably more work than you need to do though. The
> Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml)
> can take taxon names or IDs and return a phylogeny.
> This is what I needed thanks David Vincent Davis 720-301-3003 From vincent at vincentdavis.net Mon Feb 11 04:35:49 2013 From: vincent at vincentdavis.net (Vincent Davis) Date: Sun, 10 Feb 2013 21:35:49 -0700 Subject: [Biopython] How to BLAST Optimize for : More dissimilar sequences (discontiguous megablast) In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 3:25 PM, Vincent Davis wrote: > BLAST_PROGRAMS I got it figured out. Just need to change the defaults Vincent Davis 720-301-3003 From hlapp at drycafe.net Thu Feb 14 04:30:15 2013 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 13 Feb 2013 23:30:15 -0500 Subject: [Biopython] Taxonomic Classification tree In-Reply-To: References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> Message-ID: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net> On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote: > On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote: > >> All phylogenetic trees are based on taxa. This is not true. Phylogenetic trees are based on a character matrix. The rows in such a matrix are called OTUs. OTUs may or may not refer to a taxon; they could (and nowadays typically do) refer to a gene, a protein, a (part of a) genome, or some other nucleic acid or amino acid sequence. > Maybe but Taxonomic Classification is not based on phylogenetics. Not strictly, but it aspires to be. I.e., species taxonomies aspire to group taxa together that are monophyletic. In practice this isn't always the case, but it's the idea, and is one reason why taxonomies change. > What I have is a list of organisms and their Taxonomic Classification. I want to build a tree based on only the Taxonomic Classification. 
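
For a list of organisms whose lineages are already known, one way to picture such a tree is to nest the ranks inside each other and write the result out as Newick, a format that Bio.Phylo can read. This is only a minimal sketch using the standard library: it assumes each classification is given as a root-to-tip list of names without spaces, and the function names and example lineages are made up for illustration.

```python
def build_taxonomy_tree(lineages):
    """Nest lineages (root-to-tip lists of names) into a dict of dicts."""
    tree = {}
    for lineage in lineages:
        node = tree
        for name in lineage:
            # Reuse the child if this rank was already seen, else create it.
            node = node.setdefault(name, {})
    return tree


def to_newick(tree):
    """Serialise the nested dict as Newick, labelling internal nodes too."""
    def render(name, children):
        if not children:
            return name
        inner = ",".join(render(n, c) for n, c in sorted(children.items()))
        return "(%s)%s" % (inner, name)

    parts = [render(n, c) for n, c in sorted(tree.items())]
    if len(parts) == 1:
        return parts[0] + ";"
    # Several top-level names: wrap them under an unnamed root.
    return "(%s);" % ",".join(parts)


# Hypothetical input: one root-to-tip lineage per organism.
lineages = [
    ["Eukaryota", "Metazoa", "Homo_sapiens"],
    ["Eukaryota", "Metazoa", "Mus_musculus"],
    ["Eukaryota", "Fungi", "Saccharomyces_cerevisiae"],
]
newick = to_newick(build_taxonomy_tree(lineages))
print(newick)
```

The tree has no branch lengths, since a classification alone does not provide any.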
You can obtain this directly from the NCBI taxonomy:

http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From nuin at genedrift.org Thu Feb 14 10:22:19 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 14 Feb 2013 05:22:19 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
References: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org> <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
Message-ID:

On 2013-02-13, at 11:30 PM, Hilmar Lapp wrote:
>
> On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote:
>
>> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin wrote:
>>
>>> All phylogenetic trees are based on taxa.
>
> This is not true. Phylogenetic trees are based on a character matrix. The rows in such a matrix are called OTUs. OTUs may or may not refer to a taxon; they could (and nowadays typically do) refer to a gene, a protein, a (part of a) genome, or some other nucleic acid or amino acid sequence.
>

Around the gene, protein, sequence, or phenotypic character there's an OTU, and there's a taxon. If you are analyzing extraterrestrial species (or car colours, or fridge models) you might not have a taxon on your OTU, but otherwise each and every piece of data you analyze has come from a species, known or not, repeated or unique in your rows. Semantically, you are correct, but even if you put 1000 genes from the same species in a matrix, and generate a phylogenetic tree, you still based your tree on a taxon.

P

> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>

From vincent at vincentdavis.net Thu Feb 14 17:20:58 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Thu, 14 Feb 2013 10:20:58 -0700
Subject: [Biopython] Concatenate to aligned sequences
Message-ID:

I have 2 fasta files from a MUSCLE alignment. Both have the same number of sequences from the same organism. If I want to concatenate the pairs of sequences, what is the best way to do this?
Right now I am doing this:

    from Bio import SeqIO

    def concatenate(fa1, fa2):
        fa1open = open(fa1, "rU")
        fa2open = open(fa2, "rU")
        fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
        fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
        fa1open.close()
        fa2open.close()
        # check that both files have the same sequence id's
        if set(fa1dict.keys()) != set(fa2dict.keys()):
            print(fa1dict.keys(), fa2dict.keys())
            print('The fasta files do not have the same sequences')
        bothdict = {}
        bothlist = []
        count = 1
        for key in fa2dict.keys():
            bothdict[key] = fa2dict[key]
            bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
            bothlist.append(bothdict[key])
        return bothdict, bothlist

Vincent Davis
720-301-3003

From p.j.a.cock at googlemail.com Thu Feb 14 17:29:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 17:29:12 +0000
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To:
References:
Message-ID:

On Thu, Feb 14, 2013 at 5:20 PM, Vincent Davis wrote:
> I have 2 fasta files from a MUSCLE alignment. Both have the same number of
> sequences from the same organism. If I want to concatenate the pairs of
> sequences, what is the best way to do this?
> Right now I am doing this:
>
> def concatenate(fa1, fa2):
>     fa1open = open(fa1, "rU")
>     fa2open = open(fa2, "rU")
>     fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
>     fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
>     fa1open.close()
>     fa2open.close()
>     # check that both files have the same sequence id's
>     if set(fa1dict.keys()) != set(fa2dict.keys()):
>         print(fa1dict.keys(), fa2dict.keys())
>         print('The fasta files do not have the same sequences')
>     bothdict = {}
>     bothlist = []
>     count = 1
>     for key in fa2dict.keys():
>         bothdict[key] = fa2dict[key]
>         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
>         bothlist.append(bothdict[key])
>     return bothdict, bothlist
>
> Vincent Davis
> 720-301-3003

Have you tried loading the two alignment files via AlignIO, sorting by name if required, and adding the alignment objects? Adding two alignments joins them row by row, so the rows need to be in the same order in both.

http://biopython.org/DIST/docs/api/Bio.Align.MultipleSeqAlignment-class.html#__add__

Peter

From vincent at vincentdavis.net Thu Feb 14 17:38:43 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Thu, 14 Feb 2013 10:38:43 -0700
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To:
References:
Message-ID:

Thanks
Vincent

Vincent Davis
720-301-3003

On Thu, Feb 14, 2013 at 10:29 AM, Peter Cock wrote:
> On Thu, Feb 14, 2013 at 5:20 PM, Vincent Davis
> wrote:
> > I have 2 fasta files from a MUSCLE alignment. Both have the same number of
> > sequences from the same organism. If I want to concatenate the pairs of
> > sequences, what is the best way to do this?
> > Right now I am doing this:
> >
> > def concatenate(fa1, fa2):
> >     fa1open = open(fa1, "rU")
> >     fa2open = open(fa2, "rU")
> >     fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
> >     fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
> >     fa1open.close()
> >     fa2open.close()
> >     # check that both files have the same sequence id's
> >     if set(fa1dict.keys()) != set(fa2dict.keys()):
> >         print(fa1dict.keys(), fa2dict.keys())
> >         print('The fasta files do not have the same sequences')
> >     bothdict = {}
> >     bothlist = []
> >     count = 1
> >     for key in fa2dict.keys():
> >         bothdict[key] = fa2dict[key]
> >         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
> >         bothlist.append(bothdict[key])
> >     return bothdict, bothlist
> >
> > Vincent Davis
> > 720-301-3003
>
> Have you tried loading the two alignment files via AlignIO,
> sorting by name if required, and adding the alignment objects?
>
>
> http://biopython.org/DIST/docs/api/Bio.Align.MultipleSeqAlignment-class.html#__add__
>
> Peter
>

From karolisr at gmail.com Fri Feb 15 17:28:06 2013
From: karolisr at gmail.com (Karolis Ramanauskas)
Date: Fri, 15 Feb 2013 11:28:06 -0600
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To:
References:
Message-ID:

Good day,

I have written a function that will take a list of alignments and will concatenate them based on the sequence ids. The advantage here is that the alignments do not have to contain the same number of sequences, which is helpful when you are trying to create one big alignment for phylogenetic applications and some taxa are missing certain sequences.

The concatenate function is here:

https://github.com/karolisr/krpy/blob/master/kralign.py

The other functions can be ignored; it only depends on Biopython to work.

Peace

On Thu, Feb 14, 2013 at 11:20 AM, Vincent Davis wrote:
> I have 2 fasta files from a MUSCLE alignment. Both have the same number of
> sequences from the same organism. If I want to concatenate the pairs of
> sequences, what is the best way to do this?
> Right now I am doing this:
>
> def concatenate(fa1, fa2):
>     fa1open = open(fa1, "rU")
>     fa2open = open(fa2, "rU")
>     fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
>     fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
>     fa1open.close()
>     fa2open.close()
>     # check that both files have the same sequence id's
>     if set(fa1dict.keys()) != set(fa2dict.keys()):
>         print(fa1dict.keys(), fa2dict.keys())
>         print('The fasta files do not have the same sequences')
>     bothdict = {}
>     bothlist = []
>     count = 1
>     for key in fa2dict.keys():
>         bothdict[key] = fa2dict[key]
>         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
>         bothlist.append(bothdict[key])
>     return bothdict, bothlist
>
> Vincent Davis
> 720-301-3003

From jordan.r.willis at Vanderbilt.Edu Fri Feb 22 02:19:40 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Fri, 22 Feb 2013 02:19:40 +0000
Subject: [Biopython] User Defined Scoring Matrix
Message-ID:

Hello,

Since I'm not sure exactly which tool to use, I will defer to the Biopython community, since odds are I will be using it. I'm trying to produce a multiple sequence alignment with a user defined scoring matrix. When I look at ClustalW, there is an option to put in such a matrix, and the help indicates that this should be in "blast" format. When I search for blast format, they indicate that this is hard coded into the software.

My end goal is to produce a phylogeny tree using this PSSM, but I have no idea how to input this into ClustalW or any multiple sequence alignment software. I don't really care which software to use, which wrappers, or how I have to do it. I have used Biopython to produce this matrix, so I thought it would be relatively easy to implement it in any multiple sequence alignment software.

I'm not having very good luck and any help would be much appreciated.

Jordan

From p.j.a.cock at googlemail.com Fri Feb 22 10:35:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 10:35:41 +0000
Subject: [Biopython] User Defined Scoring Matrix
In-Reply-To:
References:
Message-ID:

On Fri, Feb 22, 2013 at 2:19 AM, Willis, Jordan R wrote:
> Hello,
>
> Since I'm not sure exactly which tool to use, I will defer to the
> Biopython community, since odds are I will be using it. I'm trying to produce
> a multiple sequence alignment with a user defined scoring matrix. When I
> look at ClustalW, there is an option to put in such a matrix, and the help
> indicates that this should be in "blast" format. When I search for blast
> format, they indicate that this is hard coded into the software.

I wouldn't start with ClustalW - it is old and still widely used, but even the authors are trying to discourage this. They suggest their new tool Clustal Omega, and that has a Biopython wrapper and takes an optional distance matrix as input via the --distmat-in argument.

    from Bio.Align.Applications import ClustalOmegaCommandline
    help(ClustalOmegaCommandline)

http://biopython.org/DIST/docs/api/Bio.Align.Applications._ClustalOmega.ClustalOmegaCommandline-class.html

> My end goal is to produce a phylogeny tree using this PSSM, but I have no
> idea how to input this into ClustalW or any multiple sequence alignment
> software. I don't really care which software to use, which wrappers, or how
> I have to do it. I have used Biopython to produce this matrix, so I thought
> it would be relatively easy to implement it in any multiple sequence
> alignment software.
>
> I'm not having very good luck and any help would be much appreciated.
>
> Jordan

There are people far more qualified than me to comment on the goals and if and when you should use a distance based tree (my understanding is distance based trees are the worst kind, but as they are computationally inexpensive they can make sense for large datasets).

Regards,

Peter

From biocyberman at gmail.com Fri Feb 22 15:18:58 2013
From: biocyberman at gmail.com (Biocyberman)
Date: Fri, 22 Feb 2013 16:18:58 +0100
Subject: [Biopython] read and write full ID line of EMBL SeqRecord?
Message-ID:

Hi there,
I am using Biopython version 1.61 (latest).
My original ID line is:

ID   ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP.

But after reading and writing out, I got this:

ID   ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP.

How do I get the same ID line?

Attached is the python script and input file.

Thanks for taking a look.
Biocyberman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: input.embl
Type: application/octet-stream
Size: 1063 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: checkconvert.py
Type: application/octet-stream
Size: 249 bytes
Desc: not available
URL:

From p.j.a.cock at googlemail.com Fri Feb 22 16:08:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 16:08:13 +0000
Subject: [Biopython] read and write full ID line of EMBL SeqRecord?
In-Reply-To:
References:
Message-ID:

On Fri, Feb 22, 2013 at 3:18 PM, Biocyberman wrote:
> Hi there,
> I am using Biopython version 1.61 (latest).
> My original ID line is:
> ID   ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP.
>
> But after reading and writing out, I got this:
>
> ID   ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP.
>
> How do I get the same ID line?
>
> Attached is the python script and input file.
>
> Thanks for taking a look.
> Biocyberman

This is probably part of https://redmine.open-bio.org/issues/2578 (the GenBank and EMBL code overlaps a lot).

Peter

From ferreirafm at usp.br Fri Feb 22 17:01:02 2013
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Fri, 22 Feb 2013 14:01:02 -0300
Subject: [Biopython] blastdbcmd
Message-ID: <5127A44E.2030403@usp.br>

Hi there Biopythoneers,
As far as I know, there isn't a blastdbcmd submodule in Biopython. So, I've been writing the IDs of the BLAST matched sequences to a file, fetching them all with a subprocess and reading with SeqIO afterwards. In some cases, however, I miss a blastdbcmd parser to make things easy. How are you guys dealing with this?
Best,
Fred

From p.j.a.cock at googlemail.com Fri Feb 22 17:23:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 17:23:44 +0000
Subject: [Biopython] blastdbcmd
In-Reply-To: <5127A44E.2030403@usp.br>
References: <5127A44E.2030403@usp.br>
Message-ID:

On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira wrote:
> Hi there Biopythoneers,
> As far as I know, there isn't a blastdbcmd submodule in Biopython. So,
> I've been writing the IDs of the BLAST matched sequences to a file, fetching them
> all with a subprocess and reading with SeqIO afterwards. In some cases,
> however, I miss a blastdbcmd parser to make things easy. How are you guys
> dealing with this?
> Best,
> Fred

Are you talking about a command line wrapper for blastdbcmd, to go in Bio/Blast/Applications.py? That seems a good idea.

Personally I find the blastdbcmd tool quite handicapped due to the introduction of generated sequence identifiers, and rarely use it:
http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html

Instead I would use Bio.SeqIO to index the FASTA file used for the database (SeqIO.index gives dictionary-like access by ID), and get the sequences that way.

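
The indexing idea can be sketched with the standard library alone: scan the FASTA file once, remember the byte offset where each record starts, then seek straight to a record when its ID turns up in the BLAST results. In Biopython itself SeqIO.index(filename, "fasta") provides this kind of lookup; the helper names and the tiny demo file below are hypothetical, and this is an illustration of the idea rather than Biopython's implementation.

```python
import os
import tempfile


def index_fasta(path):
    """Map each record ID (first word after '>') to its byte offset."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            pos = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                record_id = line[1:].split()[0].decode("ascii")
                offsets[record_id] = pos
    return offsets


def fetch_raw(path, offsets, record_id):
    """Return the raw FASTA text of one record by seeking to its offset."""
    chunks = []
    with open(path, "rb") as handle:
        handle.seek(offsets[record_id])
        chunks.append(handle.readline())  # the header line
        for line in handle:
            if line.startswith(b">"):  # next record starts here
                break
            chunks.append(line)
    return b"".join(chunks).decode("ascii")


# Tiny made-up database file to try it on.
fd, demo_path = tempfile.mkstemp(suffix=".fasta")
with os.fdopen(fd, "w") as out:
    out.write(">seq1 first\nACGT\nACGT\n>seq2 second\nTTTT\n")
demo_index = index_fasta(demo_path)
demo_record = fetch_raw(demo_path, demo_index, "seq2")
os.remove(demo_path)
print(demo_record)
```

Only the ID-to-offset dictionary is kept in memory, so this scales to FASTA files much larger than RAM.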
Peter

From jgibbons1 at mail.usf.edu Tue Feb 26 16:45:03 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Tue, 26 Feb 2013 11:45:03 -0500
Subject: [Biopython] Filter Blast results
Message-ID:

I know that there is already a script in the Cookbook for filtering out blast queries with no hits, but it involves holding all of the sequence objects in memory, which isn't good if you have to work with a lot of sequences.

I came up with the following function, which works, but I would appreciate any input for how to improve it. In particular I don't like that I am appending the sequence objects to file and would like to know of any alternatives.

The main function is:

from Bio import SeqIO
from Bio.Blast import NCBIXML

def filter_no_hits(blast_xml_results, source_fasta, file_format, no_hit_file, hit_file):
    """Scans Blast XML results and if the query sequence has no hits
    prints the sequence record in the no_hit_file, otherwise in the
    hit_file. The source_fasta is the file that was used to perform
    the blast search and is used to retrieve the sequence record"""
    result_handle = open(blast_xml_results)  # open the xml file
    blast_records = NCBIXML.parse(result_handle)  # create the generator object
    indexed_fasta = create_indexed_fasta(source_fasta, file_format)  # create the indexed file object
    for record in blast_records:
        hit_def_list = blast_xml_hit_def(record)  # returns list of hit_def results
        record_id = get_id_str_from_desc(record.query)  # get the record ID to search the indexed file
        record_object = indexed_fasta.get_raw(record_id)  # use the sequence ID to get the sequence record
        if is_list_null(hit_def_list):  # if no hits
            append_to_file(no_hit_file, record_object)
        else:  # if hits
            append_to_file(hit_file, record_object)
    result_handle.close()

Sub-functions:

def create_indexed_fasta(path, file_format):
    """Makes a fasta file searchable like a dictionary with the sequence Id as the key"""
    return SeqIO.index(path, file_format)

def blast_xml_hit_def(record):
    """Returns a list of hit_def for a record from a NCBI blast XML report"""
    hit_def_list = []
    for alignment in record.alignments:
        hit_def_list.append(alignment.hit_def)
    return hit_def_list

def get_id_str_from_desc(desc):
    """Returns the Id from a fasta record description"""
    parts = desc.split(" ")
    return parts[0]

def is_list_null(lst):
    """Returns True if list is empty and False otherwise"""
    return len(lst) == 0

def append_to_file(path, string):
    with open(path, 'a') as f:
        f.write(string)

def record_counter(path, file_format):
    """Input a file path and the format of the file and it returns the number of records in the file"""
    counter = 0
    for seq_record in SeqIO.parse(path, file_format):
        counter += 1
    print("%s contains %i records" % (path, counter))
    return counter

Thank you
Justin Gibbons

From p.j.a.cock at googlemail.com Tue Feb 26 16:57:01 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Feb 2013 16:57:01 +0000
Subject: [Biopython] Filter Blast results
In-Reply-To:
References:
Message-ID:

On Tue, Feb 26, 2013 at 4:45 PM, Justin Gibbons wrote:
> I know that there is already a script in the Cookbook for filtering out
> blast queries with no hits, but it involves holding all of the sequence
> objects in memory, which isn't good if you have to work with a lot of
> sequences.

Hi Justin,

Which example are you referring to? It doesn't sound very efficient.

There are some wiki pages with user contributed cookbook recipes:
http://biopython.org/wiki/Category:Cookbook

There is also the "Biopython Tutorial and Cookbook", online here:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Thanks,

Peter

From w.arindrarto at gmail.com Tue Feb 26 17:27:21 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 26 Feb 2013 18:27:21 +0100
Subject: [Biopython] Filter Blast results
In-Reply-To:
References:
Message-ID:

Hi Justin,

For your purpose, you can try using the SearchIO module (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc101), from the latest Biopython (1.61). Here's my attempt to have a similar working function:

from Bio import SearchIO, SeqIO

# get all fasta IDs in a set
fasta_ids = set([x.id for x in SeqIO.parse('fasta', 'fasta')])

with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        # get all the hit IDs in a set
        hits = set([x.id for x in qresult])
        # all IDs present in both sets
        present = fasta_ids.intersection(hits)
        if present:  # set is not empty
            hit.write(qresult.id + '\n')
        else:
            no_hit.write(qresult.id + '\n')

On another note, if you are always checking against the same FASTA file, you can try to create your own BLAST database consisting of only those sequences and search against them, so any BLAST results you have will always at least contain one of the sequences in your FASTA file. This makes the function slightly simpler:

from Bio import SearchIO

with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        # empty queries evaluate to False
        if qresult:
            hit.write(qresult.id + '\n')
        else:
            no_hit.write(qresult.id + '\n')

The first function still requires you to store all the FASTA IDs in memory, but that should be more reasonable than storing whole SeqRecord objects.

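
The second loop can be wrapped in a small function that only ever holds one query result at a time. This is a sketch that assumes, as described above, that each item yielded by SearchIO.parse has an id attribute and evaluates to False when it has no hits; partition_queries and the FakeResult stand-in are hypothetical names, used here so the example runs without a BLAST file.

```python
import io


def partition_queries(qresults, hit_handle, no_hit_handle):
    """Write each query ID to one of two handles, one ID per line.

    qresults can be any iterable (for example the generator returned by
    SearchIO.parse); items are consumed one at a time, so nothing
    accumulates in memory. Returns (number with hits, number without).
    """
    n_hit = n_no_hit = 0
    for qresult in qresults:
        if qresult:  # assumed truthy only when there is at least one hit
            hit_handle.write(qresult.id + "\n")
            n_hit += 1
        else:
            no_hit_handle.write(qresult.id + "\n")
            n_no_hit += 1
    return n_hit, n_no_hit


class FakeResult(object):
    """Stand-in for a query result: an ID plus a list of hit IDs."""

    def __init__(self, id, hits):
        self.id = id
        self.hits = hits

    def __len__(self):
        # An empty hit list makes the object falsy, like an empty query.
        return len(self.hits)


demo = [
    FakeResult("q1", ["gi|1"]),
    FakeResult("q2", []),
    FakeResult("q3", ["gi|2", "gi|3"]),
]
hit_out, no_hit_out = io.StringIO(), io.StringIO()
counts = partition_queries(iter(demo), hit_out, no_hit_out)
print(counts)
```

With Biopython, the same function would be called on SearchIO.parse('blast_results.xml', 'blast-xml') with the two open file handles from the with block.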
Hope that helps, Bow On Tue, Feb 26, 2013 at 5:57 PM, Peter Cock wrote: > On Tue, Feb 26, 2013 at 4:45 PM, Justin Gibbons wrote: >> I know that there is already a script in the Cookbook for filtering out >> blast queries with no hits, but it involves holding all of the sequence >> objects in memory, which isn't good if you have to work with a lot of >> sequences. > > Hi Justin, > > Which example are you referring too? It doesn't sound very efficient. > > There are some wiki pages with user contributed cookbook recipes: > http://biopython.org/wiki/Category:Cookbook > > There is also the "Biopython Tutorial and Cookbook", online here: > http://biopython.org/DIST/docs/tutorial/Tutorial.html > http://biopython.org/DIST/docs/tutorial/Tutorial.pdf > > Thanks, > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From robert.j.ahern at mycit.ie Wed Feb 27 17:21:24 2013 From: robert.j.ahern at mycit.ie (Robert Ahern) Date: Wed, 27 Feb 2013 17:21:24 +0000 Subject: [Biopython] (no subject) Message-ID: <9042978694721632165@unknownmsgid> robert.j.ahern at mycit.ie Sent from Windows Mail From p.j.a.cock at googlemail.com Wed Feb 27 22:32:35 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Feb 2013 22:32:35 +0000 Subject: [Biopython] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts In-Reply-To: References: Message-ID: The new bioinformatics mini-symposium this year makes SciPy 2013 especially interesting. 
Peter

---------- Forwarded message ----------
From: Jonathan Rocher
Date: Wednesday, February 27, 2013
Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
To: SciPy Users List, numfocus at googlegroups.com, Discussion of Numerical Python

[Apologies for cross-posts]

Dear all,

The annual SciPy Conference (Scientific Computing with Python) allows participants from academic, commercial, and governmental organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.

The deadline for abstract submissions is March 20th, 2013.

Submissions are welcome that address general Scientific Computing with Python, one of the two special themes for this year's conference (machine learning & reproducible science), or the domain-specific mini-symposia held during the conference (Meteorology, climatology, and atmospheric and oceanic science; Astronomy and astrophysics; Medical imaging; Bio-informatics).

Please submit your abstract at the SciPy 2013 website abstract submission form. Abstracts will be accepted for posters or presentations. Optional papers to be published in the conference proceedings will be requested following abstract submission. This year the proceedings will be made available prior to the conference to help attendees navigate the conference.

We look forward to an exciting and interesting set of talks, posters, and discussions and hope to see you at the conference.

The SciPy 2013 Program Committee Chairs

Matt McCormick, Kitware, Inc.
Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory