From p.j.a.cock at googlemail.com  Fri Feb  1 08:34:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 13:34:46 +0000
Subject: [Biopython] Fwd: Bug in bgzf module
In-Reply-To: <CANJ6P8J7PBXSQngbLJ2QKkFHEpFWwaQ8opiQRBPtu01eiUK2KQ@mail.gmail.com>
References: <CANJ6P8KTPF0DCoOGvFfVAXQkwJtZezncpr4HDDTYn4HAQJjUnQ@mail.gmail.com>
	<CANJ6P8LBkbR89pROYfka4P82TFAPvdLSOiwjEr3gxNgxx=wghw@mail.gmail.com>
	<CAKVJ-_48X3YGXN7ky+LmtQf8YFyscm6e0wtJWZs2ZM8yLyj3Bg@mail.gmail.com>
	<CANJ6P8J7PBXSQngbLJ2QKkFHEpFWwaQ8opiQRBPtu01eiUK2KQ@mail.gmail.com>
Message-ID: <CAKVJ-_6tvf3U3MJp0O2Cd6sPsgoMP_yaQtWnkG3yro5vsoXneA@mail.gmail.com>

On Thu, Jan 31, 2013 at 10:57 PM, Petra Kubincová
<petra.kubincova at gmail.com> wrote:
> Hi Peter,
>
> well, I don't have much experience with unit tests but I will try to come up
> with something. :)
> I'll let you know if I don't succeed.

That would be great - in the short term I've added something quite simple:
https://github.com/biopython/biopython/commit/5b0d0bd55024d6dbbdea85ff73e6bd2fbbfd5ee1

> And yes, recording an index is exactly the thing I need to do. (I am
> currently working on interval mapping tool for multiple whole-genome
> alignments, where I need to read .maf file, write preprocessed data into a
> compressed file and then work just with index for the compressed file and
> the compressed file itself to do the mapping.)

That reminds me I need to look at Andrew's MAF work:
http://biopython.org/wiki/Multiple_Alignment_Format
https://github.com/biopython/biopython/pull/5

Regards,

Peter


From p.j.a.cock at googlemail.com  Mon Feb  4 13:04:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 18:04:40 +0000
Subject: [Biopython] Proof reading the tutorial for the next release?
Message-ID: <CAKVJ-_4+H+qb-nB-6PBZzBxoRgGDrtWy54jE3qT4MQcio2mZ_w@mail.gmail.com>

Hello all,

If you're also on the Biopython-Dev Mailing List you'll know
we're hoping to release Biopython 1.61 this week. If anyone
here wants to help out, proof-reading the draft tutorial would
be great :)

I've posted the current tutorial as HTML and PDF online,
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

Currently those are being updated manually (it used to be
done automatically every night - something which needs
to be re-configured following a server move). If you see an
error, and want to know if it has already been fixed, then
the source file is Tutorial.tex (it is written using LaTeX), and
you can see the recent changes here on GitHub:

https://github.com/biopython/biopython/commits/master/Doc/Tutorial.tex

Thanks,

Peter

From p.j.a.cock at googlemail.com  Tue Feb  5 17:05:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:05:25 +0000
Subject: [Biopython] Biopython 1.61 released
Message-ID: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>

Dear Biopythoneers,

Source distributions and Windows installers for Biopython 1.61 are now
available from the downloads page on the Biopython website and from
the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment:

We currently support Python 2.5, 2.6 and 2.7 and also test under
Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
extensions). We are still encouraging early adopters to help test on
these platforms, and have included a "beta" installer for Python 3.2
(and Python 3.3 to follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will
continue support for at least one further release (Biopython 1.62).
This could be extended given feedback from our users. Focusing on
Python 2.6 and 2.7 only will make writing Python 3 compatible code
easier.

New Features:

GenomeDiagram has three new sigils (shapes to illustrate features).
OCTO shows an octagonal shape, like the existing BOX sigil but with
the corners cut off. JAGGY shows a box with jagged edges at the start
and end, intended for things like NNNNN regions in draft genomes.
Finally BIGARROW is like the existing ARROW sigil but is drawn
straddling the axis. This is useful for drawing vertically compact
figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral
path through HSV color space. This can be used to make arbitrary
"rainbow" scales, for example to color features or cross-links on a
GenomeDiagram figure.

The Bio.SeqIO module now supports reading sequences from PDB files in
two different ways. The "pdb-atom" format determines the sequence as
it appears in the structure based on the atom coordinate section of
the file (via Bio.PDB, so NumPy is currently required for this).
Alternatively, you can use the "pdb-seqres" format to read the complete
protein sequence as it is listed in the PDB header, if available.

The Bio.SeqUtils module now has a seq1 function to turn a sequence
using three letter amino acid codes into one using the more common one
letter codes. This acts as the inverse of the existing seq3 function.

The multiple-sequence-alignment object used by Bio.AlignIO etc now
supports an annotation dictionary. Additional support for per-column
annotation is planned, with addition and splicing to work like that
for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a
clean deprecation of the old code, the new motif code is stored in a
new module Bio.motifs, and a PendingDeprecationWarning was added to
Bio.Motif.

Experimental Code - SearchIO:

This release also includes Bow's Google Summer of Code work writing a
unified parsing framework for NCBI BLAST (assorted formats including
tabular and XML), HMMER, BLAT, and other sequence searching tools.
This is currently available with the new BiopythonExperimentalWarning
to indicate that this is still somewhat experimental. We're bundling
it with the main release to get more public feedback, but with the big
warning that the API is likely to change. In fact, even the current
name of Bio.SearchIO may change since unless you are familiar with
BioPerl its purpose isn't immediately clear.

Contributors:

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto

Thank you all.

Release announcement here (RSS feed available):
http://news.open-bio.org/news/2013/02/biopython-1-61-released/

P.S. You can follow @Biopython on Twitter
https://twitter.com/Biopython


From w.arindrarto at gmail.com  Tue Feb  5 19:03:52 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 6 Feb 2013 01:03:52 +0100
Subject: [Biopython] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CADEGkF4-H0cc2zC245gaK3AbN8kZRyByD0xe5o8RfX1patj-qA@mail.gmail.com>

Hi Peter,

Thanks for doing the release! It feels exciting to see SearchIO code
finally live in the distributions :). Hopefully this will result in
more feedback (and then more improvements ~ likewise for the whole
Biopython as well).

Also, thank you as well to everyone who has criticized / commented /
contributed code to the module :).

cheers,
Bow


From p.j.a.cock at googlemail.com  Thu Feb  7 06:33:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:33:25 +0000
Subject: [Biopython] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CAKVJ-_7bXcXkxFQ9Xx0W3CDwd_QzhYiRKpsFHGL9n5YoSFDtXQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.

For those of you wanting to try Biopython on Python 3.3 on Windows,
there is now an installer for Biopython 1.61 built against NumPy 1.7.0rc2.

NumPy 1.7 is their first release to support Python 3.3, and the
official release is expected to be near-identical to this second
release candidate, see:
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html

Regards,

Peter


From vincent at vincentdavis.net  Sat Feb  9 22:47:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 9 Feb 2013 20:47:20 -0700
Subject: [Biopython] Taxonomic Classification tree
Message-ID: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>

Any suggestions on how to build a Taxonomic Classification tree? That is,
like a Phylo tree but based on taxa.

Vincent Davis

From nuin at genedrift.org  Sat Feb  9 23:03:27 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Sat, 9 Feb 2013 23:03:27 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
Message-ID: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>

All phylogenetic trees are based on taxa. You might need to be more specific.

Paulo


On 2013-02-09, at 10:47 PM, Vincent Davis <vincent at vincentdavis.net> wrote:

> Any suggestion of how to build a Taxonomic Classification tree. That is,
> like a Phylo tree but based on taxa.
> 
> Vincent Davis


From vincent at vincentdavis.net  Sat Feb  9 23:53:13 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 9 Feb 2013 21:53:13 -0700
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
Message-ID: <CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>

On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:

> All phylogenetic trees are based on taxa. You might need to be more
> specific.


Maybe but Taxonomic Classification is not based on phylogenetics.
What I have is a list of organisms and their Taxonomic Classification. I
want to build a tree based on only the Taxonomic Classification.

Vincent Davis
720-301-3003

From cjfields at illinois.edu  Sun Feb 10 00:01:59 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 10 Feb 2013 05:01:59 +0000
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>

On Feb 9, 2013, at 10:53 PM, Vincent Davis <vincent at vincentdavis.net>
 wrote:

> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> 
>> All phylogenetic trees are based on taxa. You might need to be more
>> specific.
> 
> 
> Maybe but Taxonomic Classification is not based on phylogenetics.
> What I have is a list of organisms and their Taxonomic Classification. I
> want to build a tree based on only the Taxonomic Classification.
> 
> Vincent Davis
> 720-301-3003


There's code floating around on the bioperl side for doing this sort of thing, not sure if biopython has anything along these lines (I would be surprised if someone hasn't done this yet, though).

chris


From vincent at vincentdavis.net  Sun Feb 10 15:16:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:16:20 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
Message-ID: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>

I am having trouble with NCBIWWW.qblast. I can't get the example to work.
Maybe I need help with reading :-)

From the documentation:
result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
save_file = open("temp.xml", "w")
save_file.write(result_handle.read())
save_file.close()
result_handle.close()
result_handle = open("temp.xml")
blast_record = NCBIXML.parse(result_handle)

The temp.xml looks correct but I can get nothing from blast_record. I have
tried passing the result handle directly to NCBIXML.parse and still no luck.

How would I for example get the first hit "gi|224094601" ?

Vincent Davis

From p.j.a.cock at googlemail.com  Sun Feb 10 15:35:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 10 Feb 2013 20:35:20 +0000
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
Message-ID: <CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>

On Sun, Feb 10, 2013 at 8:16 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> I am having trouble with NCBIWWW.qblast. I can't get the example to work.
> Maybe I need help with reading :-)
>
> From the documentation:
> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
> save_file = open("temp.xml", "w")
> save_file.write(result_handle.read())
> save_file.close()
> result_handle.close()
> result_handle = open("temp.xml")
> blast_record = NCBIXML.parse(result_handle)
>
> The temp.xml looks correct but I can get nothing from blast_record. I have
> tried passing the result handle directly to NCBIXML.parse and still no luck.
>
> How would I for example get the first hit "gi|224094601" ?
>
> Vincent Davis

Hi Vincent,

Well, first I would check that the BLAST results were downloaded
ok - can you open the temp.xml file in a text editor (e.g. WordPad
on Windows)? Can you see the hits you are expecting?

Second, the parse function is for iterating over the file - if you
expect just one query's results, try:

blast_record = NCBIXML.read(result_handle)
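
As a rough sketch of the difference (the helper names
read_single_blast_record and first_hit_id are made up for this example,
temp.xml is the file saved in your snippet, and I am assuming each
alignment carries a hit_id attribute as in Bio.Blast.Record), something
along these lines should give you the top hit:

```python
def read_single_blast_record(xml_path):
    """Return the one BLAST record stored in an XML file (single query)."""
    # Imported inside the function so that first_hit_id() below can be
    # tried out even on a machine without Biopython installed.
    from Bio.Blast import NCBIXML

    handle = open(xml_path)
    try:
        # NCBIXML.read() returns exactly one record. NCBIXML.parse() would
        # instead return an iterator that must be looped over, even when
        # the file holds the results of only one query.
        return NCBIXML.read(handle)
    finally:
        handle.close()


def first_hit_id(blast_record):
    """Return the identifier of the top hit, or None if there are no hits."""
    if not blast_record.alignments:
        return None
    return blast_record.alignments[0].hit_id


# Hypothetical usage with the file from the original example:
#   record = read_single_blast_record("temp.xml")
#   print(first_hit_id(record))
```

If the file held several queries, you would loop over
NCBIXML.parse(handle) instead and call first_hit_id on each record.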

Peter

From vincent at vincentdavis.net  Sun Feb 10 15:41:42 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:41:42 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
	<CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
Message-ID: <CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>

Peter,
I verified the file.
I misunderstood the "if you expect just one query's results" part; I was
reading this as meaning that there would be more than one hit.
I figured this was a stupid mistake.

Thanks
Vincent


Vincent Davis
720-301-3003



From p.j.a.cock at googlemail.com  Sun Feb 10 15:42:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 10 Feb 2013 20:42:56 +0000
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
	<CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
	<CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>
Message-ID: <CAKVJ-_61oeLHPvLDyP8ptEBk7-PRvqg1=Qy0Cb3yFqcW0zVozg@mail.gmail.com>

On Sun, Feb 10, 2013 at 8:41 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> Peter,
> I verified the file,
> I misunderstood the "if you expect just one query's results" part; I was
> reading this as meaning that there would be more than one hit.
> I figured this was a stupid mistake.
>
> Thanks
> Vincent

So things are working now :)

Great,

Peter

From vincent at vincentdavis.net  Sun Feb 10 15:46:25 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:46:25 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CAKVJ-_61oeLHPvLDyP8ptEBk7-PRvqg1=Qy0Cb3yFqcW0zVozg@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
	<CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
	<CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>
	<CAKVJ-_61oeLHPvLDyP8ptEBk7-PRvqg1=Qy0Cb3yFqcW0zVozg@mail.gmail.com>
Message-ID: <CALyJZZU0EZe+HyMTgAaq1111qH83-WehM=DpNin3A0peSMvO9A@mail.gmail.com>

Yes

Vincent Davis
720-301-3003



From winda002 at student.otago.ac.nz  Sun Feb 10 16:16:53 2013
From: winda002 at student.otago.ac.nz (David Winter)
Date: Mon, 11 Feb 2013 10:16:53 +1300
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad
Message-ID: <51180E45.9000102@student.otago.ac.nz>


Hi Vincent,

It would probably be possible to do this with Biopython, either by

(a) searching the NCBI's taxonomy database with Eutils to get IDs, then 
fetching the corresponding taxonomy records and extracting the complete 
lineage for each of your taxa. You could then find the "lowest shared 
taxon" for each one and build a tree, or

(b) reading the whole NCBI taxonomy using Phylo, and extracting a subtree 
with just your taxa.

Both those are probably more work than you need to do though. The 
Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml) 
can take taxon names or IDs and return a phylogeny.

You should be aware - taxonomy is a dynamic science, and assignments can 
change. The NCBI taxonomy is curated by people that know what they're 
talking about, but it's not a definitive tree of life or the result of a 
particular phylogenetic analysis.
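
To give a rough idea of the tree-building step in option (a), here is a
minimal sketch in plain Python. It assumes you already have each
organism's lineage as a list of names from root to species; the function
names and the abbreviated lineages in the comments are invented for
illustration and are not part of Biopython:

```python
def build_taxonomy_tree(lineages):
    """Merge root-to-leaf lineages into a nested dict keyed by taxon name."""
    tree = {}
    for lineage in lineages:
        node = tree
        for taxon in lineage:
            # Shared prefixes reuse the same node, so organisms branch
            # apart at their lowest shared taxon.
            node = node.setdefault(taxon, {})
    return tree


def to_newick(tree):
    """Write the nested dict as a Newick string (topology and labels only)."""
    def render(name, children):
        label = name.replace(" ", "_")
        if not children:
            return label
        inner = ",".join(render(n, c) for n, c in sorted(children.items()))
        return "(" + inner + ")" + label

    top = [render(n, c) for n, c in sorted(tree.items())]
    if len(top) == 1:
        return top[0] + ";"
    # Several unrelated roots: join them under an unnamed root node.
    return "(" + ",".join(top) + ");"


# Hypothetical, heavily abbreviated lineages:
#   lineages = [
#       ["Eukaryota", "Metazoa", "Chordata", "Homo sapiens"],
#       ["Eukaryota", "Metazoa", "Chordata", "Mus musculus"],
#       ["Eukaryota", "Viridiplantae", "Arabidopsis thaliana"],
#   ]
#   print(to_newick(build_taxonomy_tree(lineages)))
```

The resulting Newick string can then be loaded with Bio.Phylo (for
example Phylo.read on a StringIO handle with the "newick" format) if you
want to draw or manipulate the tree.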

David


--
David Winter
Research Associate
Allan Wilson Centre for Molecular Ecology and Evolution
University of Otago
Dunedin
New Zealand/ Aotearoa

ph + 64 22 018 0449
w: www.david-winter.info
blog: sciblogs.co.nz/the-atavism




From vincent at vincentdavis.net  Sun Feb 10 17:25:53 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:25:53 -0700
Subject: [Biopython] How to BLAST Optimize for : More dissimilar sequences
	(discontiguous megablast)
Message-ID: <CALyJZZXxMpBPJc=gJ6WSh_dK3KvVpfuX+Ve9-JaAMyP6YYzhDg@mail.gmail.com>

On the NCBI BLAST website there is an option to "Optimize for: More
dissimilar sequences (discontiguous megablast)". The URL shows this to
be BLAST_PROGRAMS="discoMegablast". Is there a way to do this
with NCBIWWW.qblast?

Vincent Davis

From vincent at vincentdavis.net  Sun Feb 10 17:31:22 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:31:22 -0700
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <51180E45.9000102@student.otago.ac.nz>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>
	<51180E45.9000102@student.otago.ac.nz>
Message-ID: <CALyJZZXy48=JLTmaf5LGf=KTp83WibTSNTtidTjuEWcqjC=Tyg@mail.gmail.com>

On Sun, Feb 10, 2013 at 2:16 PM, David Winter
<winda002 at student.otago.ac.nz>wrote:
>
> Both those are probably more work than you need to do though. The
> Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml)
> can take taxon names or IDs and return a phylogeny.
>

This is what I needed thanks David


Vincent Davis
720-301-3003

From vincent at vincentdavis.net  Sun Feb 10 23:35:49 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 21:35:49 -0700
Subject: [Biopython] How to BLAST Optimize for : More dissimilar
 sequences (discontiguous megablast)
In-Reply-To: <CALyJZZXxMpBPJc=gJ6WSh_dK3KvVpfuX+Ve9-JaAMyP6YYzhDg@mail.gmail.com>
References: <CALyJZZXxMpBPJc=gJ6WSh_dK3KvVpfuX+Ve9-JaAMyP6YYzhDg@mail.gmail.com>
Message-ID: <CALyJZZVObn32Sq8s=yGLbwEOqjbwa0LFT+-SF=g_Wy9fXdV8hg@mail.gmail.com>

On Sun, Feb 10, 2013 at 3:25 PM, Vincent Davis <vincent at vincentdavis.net>wrote:

> BLAST_PROGRAMS


I got it figured out. Just need to change the defaults

Vincent Davis
720-301-3003

From hlapp at drycafe.net  Wed Feb 13 23:30:15 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 13 Feb 2013 23:30:15 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
Message-ID: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>


On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote:

> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> 
>> All phylogenetic trees are based on taxa. 

This is not true. Phylogenetic trees are based on a character matrix. The rows in such a matrix are called OTUs. OTUs may or may not refer to a taxon; they could (and nowadays typically do) refer to a gene, a protein, a (part of a) genome, or some other nucleic acid or amino acid sequence. 

> Maybe but Taxonomic Classification is not based on phylogenetics.

Not strictly, but it aspires to be. I.e., species taxonomies aspire to group taxa together that are monophyletic. In practice this isn't always the case, but it's the idea, and is one reason why taxonomies change. 

> What I have is a list of organisms and their Taxonomic Classification. I want to build a tree based on only the Taxonomic Classification.

You can obtain this directly from the NCBI taxonomy:

http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From nuin at genedrift.org  Thu Feb 14 05:22:19 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 14 Feb 2013 05:22:19 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
	<94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
Message-ID: <A3481D23-EDB6-4148-B21D-57814B6F1144@genedrift.org>


On 2013-02-13, at 11:30 PM, Hilmar Lapp <hlapp at drycafe.net> wrote:

> 
> On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote:
> 
>> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>> 
>>> All phylogenetic trees are based on taxa. 
> 
> This is not true. Phylogenetic trees are based on a character matrix. The rows in such a matrix are called OTUs. OTUs may or may not refer to a taxon; they could (and nowadays typically do) refer to a gene, a protein, a (part of a) genome, or some other nucleic acid or amino acid sequence. 
> 
>> 

Around the gene, protein, sequence, or phenotypic character there's an OTU, and there's a taxon. If you are analyzing extraterrestrial species (or car colours, or fridge models) you might not have a taxon on your OTU but otherwise each and every piece of data you analyze has come from a species, known or not, repeated or unique in your rows. Semantically, you are correct, but even if you put 1000 genes from the same species in a matrix, and generate a phylogenetic tree, you still based your tree on a taxon. 

P




From vincent at vincentdavis.net  Thu Feb 14 12:20:58 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Thu, 14 Feb 2013 10:20:58 -0700
Subject: [Biopython] Concatenate to aligned sequences
Message-ID: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>

I have 2 FASTA files from a MUSCLE alignment. Both have the same number of
sequences from the same organism. If I want to concatenate the pairs of
sequences, what is the best way to do this?
Right now I am doing this:

from Bio import SeqIO

def concatenate(fa1, fa2):
    # load both files into dictionaries keyed by sequence ID
    with open(fa1) as fa1open, open(fa2) as fa2open:
        fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
        fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
    # check that both files have the same sequence IDs
    if set(fa1dict.keys()) != set(fa2dict.keys()):
        print(fa1dict.keys(), fa2dict.keys())
        print('The fasta files do not have the same sequences')
    bothdict = {}
    bothlist = []
    for key in fa2dict.keys():
        # sequence from the second file first, then the first file
        bothdict[key] = fa2dict[key]
        bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
        bothlist.append(bothdict[key])
    return bothdict, bothlist

Vincent Davis
720-301-3003

From p.j.a.cock at googlemail.com  Thu Feb 14 12:29:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 17:29:12 +0000
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
References: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
Message-ID: <CAKVJ-_5CmuVNAC=zuih_UXSwWpzitVj4tm=KOcSYaObVDKJhnA@mail.gmail.com>

On Thu, Feb 14, 2013 at 5:20 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> I have 2 FASTA files from a MUSCLE alignment. Both have the same number of
> sequences from the same organism. If I want to concatenate the pairs of
> sequences, what is the best way to do this?
> Right now I am doing this:
>
> from Bio import SeqIO
>
> def concatenate(fa1, fa2):
>     # load both files into dictionaries keyed by sequence ID
>     with open(fa1) as fa1open, open(fa2) as fa2open:
>         fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
>         fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
>     # check that both files have the same sequence IDs
>     if set(fa1dict.keys()) != set(fa2dict.keys()):
>         print(fa1dict.keys(), fa2dict.keys())
>         print('The fasta files do not have the same sequences')
>     bothdict = {}
>     bothlist = []
>     for key in fa2dict.keys():
>         # sequence from the second file first, then the first file
>         bothdict[key] = fa2dict[key]
>         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
>         bothlist.append(bothdict[key])
>     return bothdict, bothlist
>
> Vincent Davis
> 720-301-3003

Have you tried loading the two alignment files via AlignIO,
sorting by name if required, and adding the alignment objects?

http://biopython.org/DIST/docs/api/Bio.Align.MultipleSeqAlignment-class.html#__add__

Peter
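
The AlignIO route suggested above could look roughly like the sketch
below. It assumes both files parse as alignments in the same format and
that the record IDs match between them. The function names
check_same_ids and concatenate_alignments are made up for illustration;
AlignIO.read, the alignment's sort() method and alignment addition are
the Biopython features described in the linked API page.

```python
def check_same_ids(ids1, ids2):
    """Raise ValueError unless both ID lists are identical, in the same order."""
    ids1, ids2 = list(ids1), list(ids2)
    if ids1 != ids2:
        raise ValueError("alignments differ in sequence IDs: %r vs %r" % (ids1, ids2))


def concatenate_alignments(path1, path2, fmt="fasta"):
    """Load two alignment files and join them column-wise, row by row."""
    # Needs Biopython; imported here so check_same_ids has no dependencies.
    from Bio import AlignIO

    aln1 = AlignIO.read(path1, fmt)
    aln2 = AlignIO.read(path2, fmt)
    # Adding alignments pairs rows by position, not by name, so sort both
    # by record ID first and confirm the rows really line up.
    aln1.sort()
    aln2.sort()
    check_same_ids([rec.id for rec in aln1], [rec.id for rec in aln2])
    return aln1 + aln2
```

The combined alignment can then be saved with AlignIO.write. The ID
check is there because adding two alignments only requires the same
number of rows, so a silent mismatch would otherwise go unnoticed.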

From vincent at vincentdavis.net  Thu Feb 14 12:38:43 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Thu, 14 Feb 2013 10:38:43 -0700
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To: <CAKVJ-_5CmuVNAC=zuih_UXSwWpzitVj4tm=KOcSYaObVDKJhnA@mail.gmail.com>
References: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
	<CAKVJ-_5CmuVNAC=zuih_UXSwWpzitVj4tm=KOcSYaObVDKJhnA@mail.gmail.com>
Message-ID: <CALyJZZXioDLvHr9_Rr6t15aivvHv4Sj7j+PsrQFagDv099EkBQ@mail.gmail.com>

Thanks
Vincent

Vincent Davis
720-301-3003



From karolisr at gmail.com  Fri Feb 15 12:28:06 2013
From: karolisr at gmail.com (Karolis Ramanauskas)
Date: Fri, 15 Feb 2013 11:28:06 -0600
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
References: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
Message-ID: <CACT_pJGUkGD-ERheWmKyBEBJA23Oms01d3WZsEFvKJMNE4bnYw@mail.gmail.com>

Good day,

I have written a function that will take a list of alignments and will
concatenate them based on the sequence ids. The advantage here is that
the lists do not have to contain the same number of sequences, which
is helpful when you are trying to create one big alignment for
phylogenetic applications and some taxa are missing certain sequences.

The concatenate function is here:
https://github.com/karolisr/krpy/blob/master/kralign.py
The other functions can be ignored; it only depends on Biopython to work.

Peace


From jordan.r.willis at Vanderbilt.Edu  Thu Feb 21 21:19:40 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Fri, 22 Feb 2013 02:19:40 +0000
Subject: [Biopython] User Defined Scoring Matrix
Message-ID: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CAE9963@ITS-HCWNEM108.ds.vanderbilt.edu>

Hello,

Since I'm not sure exactly which tool to use, I will defer to the Biopython community since odds are I will be using it. I'm trying to produce a multiple sequence alignment with a user defined scoring matrix. When I look at ClustalW, there is an option to put in such a matrix, and the help indicates that this should be in "blast" format. When I search for blast format, they indicate that this is hard coded into the software.

My end goal is to produce a phylogenetic tree using this PSSM, but I have no idea how to input this into ClustalW or any multiple sequence alignment software. I don't really care which software to use, which wrappers, or how I have to do it. I have used Biopython to produce this matrix, so I thought it would be relatively easy to implement it in any multiple sequence alignment software.

I'm not having very good luck and any help would be much appreciated.

Jordan


From p.j.a.cock at googlemail.com  Fri Feb 22 05:35:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 10:35:41 +0000
Subject: [Biopython] User Defined Scoring Matrix
In-Reply-To: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CAE9963@ITS-HCWNEM108.ds.vanderbilt.edu>
References: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CAE9963@ITS-HCWNEM108.ds.vanderbilt.edu>
Message-ID: <CAKVJ-_6SYCDRo7Fbi32h9kcxpyHWmH1-TC6qWgpp6Qtvvk2Qvg@mail.gmail.com>

On Fri, Feb 22, 2013 at 2:19 AM, Willis, Jordan R
<jordan.r.willis at vanderbilt.edu> wrote:
> Hello,
>
> Since I'm not sure exactly which tool to use, I will defer to the
> Biopython community since odds are I will be using it. I'm trying to produce
> a multiple sequence alignment with a user defined scoring matrix. When I
> look at ClustalW, there is an option to put in such a matrix, and the help
> indicates that this should be in "blast" format. When I search for blast
> format, they indicate that this is hard coded into the software.

I wouldn't start with ClustalW - it is old and still widely used, but even
the authors are trying to discourage this. They suggest their new tool
Clustal Omega, which has a Biopython wrapper and takes an optional
distance matrix as input via the --distmat-in argument.

from Bio.Align.Applications import ClustalOmegaCommandline
help(ClustalOmegaCommandline)

http://biopython.org/DIST/docs/api/Bio.Align.Applications._ClustalOmega.ClustalOmegaCommandline-class.html

> My end goal is to produce a phylogenetic tree using this PSSM, but I have no
> idea how to input this into ClustalW or any multiple sequence alignment
> software. I don't really care which software to use, which wrappers, or how
> I have to do it. I have used Biopython to produce this matrix, so I thought
> it would be relatively easy to implement it in any multiple sequence
> alignment software.
>
> I'm not having very good luck and any help would be much appreciated.
>
> Jordan

There are people far more qualified than me to comment on the
goals and if and when you should use a distance based tree (my
understanding is distance based trees are the worst kind, but as
they are computationally inexpensive they can make sense for large
datasets).

Regards,

Peter
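
For readers who want to call Clustal Omega without the Biopython
wrapper, the same run could be sketched with the standard library as
below. The file names and the two function names are placeholders made
up for this example, and it assumes the clustalo binary is on the PATH.
Note that --distmat-in takes a pairwise distance matrix between the
input sequences, which is not the same thing as a residue scoring
matrix.

```python
import subprocess


def build_clustalo_command(infile, outfile, distmat_in=None, executable="clustalo"):
    """Return the argument list for a Clustal Omega run."""
    cmd = [executable, "-i", infile, "-o", outfile]
    if distmat_in is not None:
        # Pre-computed pairwise distances, so Clustal Omega skips its own
        # distance calculation when building the guide tree.
        cmd.append("--distmat-in=" + distmat_in)
    return cmd


def run_clustalo(infile, outfile, distmat_in=None):
    """Run Clustal Omega and raise CalledProcessError if it fails."""
    subprocess.run(build_clustalo_command(infile, outfile, distmat_in), check=True)
```

Building the argument list separately from running it makes it easy to
print and inspect the command before anything is executed.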

From biocyberman at gmail.com  Fri Feb 22 10:18:58 2013
From: biocyberman at gmail.com (Biocyberman)
Date: Fri, 22 Feb 2013 16:18:58 +0100
Subject: [Biopython] read and write full ID line of EMBL SeqRecord?
Message-ID: <CADVdCiKUci1DqQ4aMTeMNgkxyRPYqJJBuTogHpoB3q1cgf=QuA@mail.gmail.com>

Hi there,
I am using Biopython version 1.61 (latest).
My original ID line is:
ID   ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP.

But after reading and writing out, I got this:

ID   ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP.

How do I get the same ID line ?

Attached is the python script and input file.

Thanks for taking a look.
Biocyberman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: input.embl
Type: application/octet-stream
Size: 1063 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20130222/e7dfedbf/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: checkconvert.py
Type: application/octet-stream
Size: 249 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20130222/e7dfedbf/attachment-0001.obj>

From p.j.a.cock at googlemail.com  Fri Feb 22 11:08:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 16:08:13 +0000
Subject: [Biopython] read and write full ID line of EMBL SeqRecord?
In-Reply-To: <CADVdCiKUci1DqQ4aMTeMNgkxyRPYqJJBuTogHpoB3q1cgf=QuA@mail.gmail.com>
References: <CADVdCiKUci1DqQ4aMTeMNgkxyRPYqJJBuTogHpoB3q1cgf=QuA@mail.gmail.com>
Message-ID: <CAKVJ-_5HOWmz=1--31GjPrH6qFLkFTjD3J4FNXP7ZFR0j45-rA@mail.gmail.com>

On Fri, Feb 22, 2013 at 3:18 PM, Biocyberman <biocyberman at gmail.com> wrote:
> Hi there,
> I am using Biopython version 1.61 (latest).
> My original ID line is:
> ID   ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP.
>
> But after reading and writing out, I got this:
>
> ID   ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP.
>
> How do I get the same ID line ?
>
> Attached is the python script and input file.
>
> Thanks for taking a look.
> Biocyberman

This is probably part of https://redmine.open-bio.org/issues/2578
(the GenBank and EMBL code overlaps a lot).

Peter

From ferreirafm at usp.br  Fri Feb 22 12:01:02 2013
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Fri, 22 Feb 2013 14:01:02 -0300
Subject: [Biopython] blastdbcmd
Message-ID: <5127A44E.2030403@usp.br>

Hi there Biopythoneers,
As far as I know, there isn't a blastdbcmd submodule in Biopython.
So, I've been writing the IDs of the BLAST-matched sequences to a file,
fetching them all with a subprocess and reading them with SeqIO afterwards.
In some cases, however, I miss a blastdbcmd parser to make things easy.
How are you guys dealing with this?
Best,
Fred
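
The workflow described above (IDs to a file, blastdbcmd in a
subprocess, then SeqIO) might look something like this. It is only a
sketch of one way to do it, not the poster's actual script: the
function names and default file names are invented here, and it
assumes blastdbcmd from NCBI BLAST+ is on the PATH and is left at its
default FASTA output.

```python
import subprocess


def build_blastdbcmd_command(db, id_file, out_fasta, executable="blastdbcmd"):
    """Return the argument list that fetches every ID listed in id_file."""
    return [executable, "-db", db, "-entry_batch", id_file, "-out", out_fasta]


def fetch_hits(db, ids, id_file="hit_ids.txt", out_fasta="hits.fasta"):
    """Write the IDs to a file, fetch them from the BLAST database, parse the result."""
    # Needs Biopython; imported here so the command builder has no dependencies.
    from Bio import SeqIO

    with open(id_file, "w") as handle:
        for seq_id in ids:
            handle.write(seq_id + "\n")  # one identifier per line
    subprocess.run(build_blastdbcmd_command(db, id_file, out_fasta), check=True)
    return list(SeqIO.parse(out_fasta, "fasta"))
```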

From p.j.a.cock at googlemail.com  Fri Feb 22 12:23:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 17:23:44 +0000
Subject: [Biopython] blastdbcmd
In-Reply-To: <5127A44E.2030403@usp.br>
References: <5127A44E.2030403@usp.br>
Message-ID: <CAKVJ-_4LosCsm4940My0Y5O6L45a-NqxUa6sziUK-wkKm51mJA@mail.gmail.com>

On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira
<ferreirafm at usp.br> wrote:
> Hi there Biopythoneers,
> As far as I know, there isn't a blastdbcmd submodule in Biopython. So,
> I've been writing the IDs of the BLAST-matched sequences to a file, fetching
> them all with a subprocess and reading them with SeqIO afterwards. In some
> cases, however, I miss a blastdbcmd parser to make things easy. How are you
> guys dealing with this?
> Best,
> Fred

Are you talking about a command line wrapper for blastdbcmd, to go in
Bio/Blast/Applications.py? That seems a good idea.

Personally I find the blastdbcmd tool quite handicapped due to the
introduction of generated sequence identifiers, and rarely use it:
http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html

Instead I would use Bio.SeqIO to index the FASTA file used for the
database, and get the sequences that way.

Peter
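
A rough sketch of the indexing approach mentioned above, assuming the
hit IDs in the BLAST output match the record IDs in the FASTA file
that was used to build the database. The function names fetch_records
and write_hits are invented for this example; SeqIO.index and
SeqIO.write are the Biopython calls doing the work.

```python
def fetch_records(index, wanted_ids):
    """Yield the records for the wanted IDs, silently skipping unknown IDs."""
    for seq_id in wanted_ids:
        if seq_id in index:
            yield index[seq_id]


def write_hits(db_fasta, wanted_ids, out_fasta):
    """Copy the wanted records from the database FASTA file to out_fasta."""
    # Needs Biopython; imported here so fetch_records works on any mapping.
    from Bio import SeqIO

    # The index only keeps file offsets in memory, not the sequences.
    index = SeqIO.index(db_fasta, "fasta")
    try:
        return SeqIO.write(fetch_records(index, wanted_ids), out_fasta, "fasta")
    finally:
        index.close()
```

write_hits returns the number of records written, which is a quick
check that none of the wanted IDs were missing from the FASTA file.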

From jgibbons1 at mail.usf.edu  Tue Feb 26 11:45:03 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Tue, 26 Feb 2013 11:45:03 -0500
Subject: [Biopython] Filter Blast results
Message-ID: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>

I know that there is already a script in the Cookbook for filtering out
blast queries with no hits, but it involves holding all of the sequence
objects in memory, which isn't good if you have to work with a lot of
sequences. I came up with the following function, which works, but I would
appreciate any input for how to improve it. In particular I don't like that
I am appending the sequence objects to file and would like to know of any
alternatives.

The main function is:

from Bio.Blast import NCBIXML

def filter_no_hits(blast_xml_results, source_fasta, file_format,
                   no_hit_file, hit_file):
    """Scans Blast XML results and if the query sequence has no hits prints
    the sequence record in the no_hit_file, otherwise in the hit_file. The
    source_fasta is the file that was used to perform the blast search and
    is used to retrieve the sequence record"""

    result_handle = open(blast_xml_results)  # open the xml file
    blast_records = NCBIXML.parse(result_handle)  # create the generator object
    # create the indexed file object
    indexed_fasta = create_indexed_fasta(source_fasta, file_format)

    for record in blast_records:
        # returns list of hit_def results
        hit_def_list = blast_xml_hit_def(record)
        # get the record ID to search the indexed file
        record_id = get_id_str_from_desc(record.query)
        # use the sequence ID to get the raw sequence record
        record_object = indexed_fasta.get_raw(record_id)

        if is_list_null(hit_def_list):  # if no hits
            append_to_file(no_hit_file, record_object)
        else:  # if hits
            append_to_file(hit_file, record_object)
    result_handle.close()

Sub-functions:

from Bio import SeqIO

def create_indexed_fasta(path, file_format):
    """Makes a fasta file searchable like a dictionary with the sequence Id
    as the key"""
    return SeqIO.index(path, file_format)

def blast_xml_hit_def(record):
    """Returns a list of hit_def for a record from a NCBI blast XML report"""
    hit_def_list = []
    for alignment in record.alignments:
        hit_def_list.append(alignment.hit_def)
    return hit_def_list

def get_id_str_from_desc(desc):
    """Returns the Id from a fasta record description"""
    parts = desc.split(" ")
    return parts[0]

def is_list_null(lst):
    """Returns True if list is empty and False otherwise"""
    return len(lst) == 0

def append_to_file(path, raw_record):
    # get_raw() returns the record as raw bytes, so append in binary mode
    with open(path, 'ab') as f:
        f.write(raw_record)

def record_counter(path, file_format):
    """Input a file path and the format of the file and it returns the
    number of records in the file"""
    counter = 0
    for seq_record in SeqIO.parse(path, file_format):
        counter += 1
    print("%s contains %i records" % (path, counter))
    return counter

Thank you

Justin Gibbons

From p.j.a.cock at googlemail.com  Tue Feb 26 11:57:01 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Feb 2013 16:57:01 +0000
Subject: [Biopython] Filter Blast results
In-Reply-To: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>
References: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>
Message-ID: <CAKVJ-_43zAgKRABh4O13mx6H9iiD_gpQW+z538BfXH+8yL7gww@mail.gmail.com>

On Tue, Feb 26, 2013 at 4:45 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> I know that there is already a script in the Cookbook for filtering out
> blast queries with no hits, but it involves holding all of the sequence
> objects in memory, which isn't good if you have to work with a lot of
> sequences.

Hi Justin,

Which example are you referring to? It doesn't sound very efficient.

There are some wiki pages with user contributed cookbook recipes:
http://biopython.org/wiki/Category:Cookbook

There is also the "Biopython Tutorial and Cookbook", online here:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Thanks,

Peter

From w.arindrarto at gmail.com  Tue Feb 26 12:27:21 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 26 Feb 2013 18:27:21 +0100
Subject: [Biopython] Filter Blast results
In-Reply-To: <CAKVJ-_43zAgKRABh4O13mx6H9iiD_gpQW+z538BfXH+8yL7gww@mail.gmail.com>
References: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>
	<CAKVJ-_43zAgKRABh4O13mx6H9iiD_gpQW+z538BfXH+8yL7gww@mail.gmail.com>
Message-ID: <CADEGkF7n3o4jjEydxvJEy7tMgXZZz-s9NbENY9vrhCUbZJCQ8g@mail.gmail.com>

Hi Justin,

For your purpose, you can try using the SearchIO module
(http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc101), from
the latest Biopython (1.61).

Here's my attempt to have a similar working function:

from Bio import SearchIO, SeqIO

# get all fasta IDs in a set
fasta_ids = set([x.id for x in SeqIO.parse('fasta', 'fasta')])

with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        hits = set([x.id for x in qresult]) # get all the hit IDs in a set
        # IDs present in both sets
        present = fasta_ids.intersection(hits)

        if present: # set is not empty
            hit.write(qresult.id + '\n') # one query ID per line
        else:
            no_hit.write(qresult.id + '\n')

On another note, if you are always checking against the same FASTA
file, you can try to create your own BLAST database consisting of only
those sequences and search against it, so any BLAST results you have
will always contain at least one of the sequences in your FASTA file.

This makes the functions slightly simpler:

from Bio import SearchIO

with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        # empty queries evaluate to False
        if qresult:
            hit.write(qresult.id + '\n') # one query ID per line
        else:
            no_hit.write(qresult.id + '\n')

The first function still requires you to store all the FASTA IDs in
memory (the second only holds the current query result), but that should
be more reasonable than storing whole SeqRecord objects.

Hope that helps,
Bow
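
If the full sequence records are wanted in the two output files rather
than just the query IDs, the SearchIO loop above could be combined with
an indexed FASTA file, keeping both output files open for the whole run
instead of re-opening them to append. This is a sketch under the
assumption that each qresult.id matches a record ID in the FASTA file
used for the search; split_queries and filter_blast_xml are names made
up for this example.

```python
def split_queries(qresults, get_raw, hit_handle, no_hit_handle):
    """Write each query's raw record to one of two open binary handles.

    Returns a (queries_with_hits, queries_without_hits) tuple of counts.
    """
    n_hit = n_no_hit = 0
    for qresult in qresults:
        raw = get_raw(qresult.id)  # raw bytes of the query's FASTA record
        if len(qresult) > 0:  # the query has at least one hit
            hit_handle.write(raw)
            n_hit += 1
        else:
            no_hit_handle.write(raw)
            n_no_hit += 1
    return n_hit, n_no_hit


def filter_blast_xml(xml_path, fasta_path, hit_path, no_hit_path):
    """Split the query FASTA file according to the BLAST XML results."""
    # Needs Biopython; imported here so split_queries can be used on its own.
    from Bio import SearchIO, SeqIO

    index = SeqIO.index(fasta_path, "fasta")  # keeps only file offsets in memory
    try:
        with open(hit_path, "wb") as hit, open(no_hit_path, "wb") as no_hit:
            qresults = SearchIO.parse(xml_path, "blast-xml")
            return split_queries(qresults, index.get_raw, hit, no_hit)
    finally:
        index.close()
```

Only one query result and the index offsets are held in memory at any
time, and each output file is opened exactly once.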


From p.j.a.cock at googlemail.com  Wed Feb 27 17:32:35 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 27 Feb 2013 22:32:35 +0000
Subject: [Biopython] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for
	abstracts
In-Reply-To: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
References: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
Message-ID: <CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>

The new bioinformatics mini-symposium this year makes SciPy 2013
especially interesting.

Peter

---------- Forwarded message ----------
From: *Jonathan Rocher*
Date: Wednesday, February 27, 2013
Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
To: SciPy Users List <scipy-user at scipy.org>, numfocus at googlegroups.com,
Discussion of Numerical Python <numpy-discussion at scipy.org>


[Apologies for cross-posts]

Dear all,

The annual SciPy Conference (Scientific Computing with Python)
<http://conference.scipy.org/scipy2013/about.php> allows
participants from academic, commercial, and governmental organizations to
showcase their latest projects, learn from skilled users and developers,
and collaborate on code development. *The deadline for abstract submissions
is March 20th, 2013.*

Submissions are welcome that address general Scientific Computing with
Python, one of the two special themes for this year's conference (machine
learning & reproducible science), or the domain-specific mini-symposia
<http://conference.scipy.org/scipy2013/about.php> held
during the conference (Meteorology, climatology, and atmospheric and
oceanic science, Astronomy and astrophysics, Medical imaging,
Bio-informatics).

Please submit your abstract at the SciPy 2013 website abstract submission
form <http://conference.scipy.org/scipy2013/speaking_submission.php>.
Abstracts will be accepted for posters or presentations. Optional papers to
be published in the conference proceedings will be requested following
abstract submission. This year the proceedings will be made available prior
to the conference to help attendees navigate the conference.

We look forward to an exciting and interesting set of talks, posters, and
discussions and hope to see you at the conference.
The SciPy 2013 Program Committee Chairs

Matt McCormick, Kitware, Inc.
Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory

From chapmanb at 50mail.com  Thu Feb 28 21:36:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Feb 2013 21:36:34 -0500
Subject: [Biopython] [ANN] SciPy2013: Call for abstracts
In-Reply-To: <CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>
References: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
	<CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>
Message-ID: <87ppzjsv65.fsf@fastmail.fm>


Peter;
Thanks for sending this out. I'm helping with the organization of the
SciPy bioinformatics session thanks to Peter's recommendation and wrote
up a little bit about the types of abstracts that would fit well with
the overall theme of SciPy:

http://j.mp/Z4xxXB

This is a great chance to connect with another open source scientific
community so definitely send in an abstract if this is of interest; the
deadline is coming up next month: March 20th. Austin also has awesome
music and barbecue in addition to science and hacking so lots of reasons
to attend,
Brad



From p.j.a.cock at googlemail.com  Fri Feb  1 13:34:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 13:34:46 +0000
Subject: [Biopython] Fwd: Bug in bgzf module
In-Reply-To: <CANJ6P8J7PBXSQngbLJ2QKkFHEpFWwaQ8opiQRBPtu01eiUK2KQ@mail.gmail.com>
References: <CANJ6P8KTPF0DCoOGvFfVAXQkwJtZezncpr4HDDTYn4HAQJjUnQ@mail.gmail.com>
	<CANJ6P8LBkbR89pROYfka4P82TFAPvdLSOiwjEr3gxNgxx=wghw@mail.gmail.com>
	<CAKVJ-_48X3YGXN7ky+LmtQf8YFyscm6e0wtJWZs2ZM8yLyj3Bg@mail.gmail.com>
	<CANJ6P8J7PBXSQngbLJ2QKkFHEpFWwaQ8opiQRBPtu01eiUK2KQ@mail.gmail.com>
Message-ID: <CAKVJ-_6tvf3U3MJp0O2Cd6sPsgoMP_yaQtWnkG3yro5vsoXneA@mail.gmail.com>

On Thu, Jan 31, 2013 at 10:57 PM, Petra Kubincová
<petra.kubincova at gmail.com> wrote:
> Hi Peter,
>
> well, I don't have much experience with unit tests but I will try to come up
> with something. :)
> I'll let you know if I won't succeed.

That would be great - in the short term I've added something quite simple:
https://github.com/biopython/biopython/commit/5b0d0bd55024d6dbbdea85ff73e6bd2fbbfd5ee1

> And yes, recording an index is exactly the thing I need to do. (I am
> currently working on interval mapping tool for multiple whole-genome
> alignments, where I need to read .maf file, write preprocessed data into a
> compressed file and then work just with index for the compressed file and
> the compressed file itself to do the mapping.)

That reminds me I need to look at Andrew's MAF work:
http://biopython.org/wiki/Multiple_Alignment_Format
https://github.com/biopython/biopython/pull/5

Regards,

Peter


From p.j.a.cock at googlemail.com  Mon Feb  4 18:04:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Feb 2013 18:04:40 +0000
Subject: [Biopython] Proof reading the tutorial for the next release?
Message-ID: <CAKVJ-_4+H+qb-nB-6PBZzBxoRgGDrtWy54jE3qT4MQcio2mZ_w@mail.gmail.com>

Hello all,

If you're also on the Biopython-Dev Mailing List you'll know
we're hoping to release Biopython 1.61 this week. If anyone
here wants to help out, proof-reading the draft tutorial would
be great :)

I've posted the current tutorial as HTML and PDF online,
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html
http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf

Currently those are being updated manually (it used to be
done automatically every night - something which needs
to be re-configured following a server move). If you see an
error, and want to know if it has already been fixed, then
the source file is Tutorial.tex (it is written using LaTeX), and
you can see the recent changes here on GitHub:

https://github.com/biopython/biopython/commits/master/Doc/Tutorial.tex

Thanks,

Peter


From p.j.a.cock at googlemail.com  Tue Feb  5 22:05:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 22:05:25 +0000
Subject: [Biopython] Biopython 1.61 released
Message-ID: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>

Dear Biopythoneers,

Source distributions and Windows installers for Biopython 1.61 are now
available from the downloads page on the Biopython website and from
the Python Package Index (PyPI).

The updated Biopython Tutorial and Cookbook is online (PDF).

Platforms/Deployment:

We currently support Python 2.5, 2.6 and 2.7 and also test under
Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
extensions). We are still encouraging early adopters to help test on
these platforms, and have included a "beta" installer for Python 3.2
(and Python 3.3 to follow soon) under 32-bit Windows.

Please note we are phasing out support for Python 2.5. We will
continue support for at least one further release (Biopython 1.62).
This could be extended given feedback from our users. Focusing on
Python 2.6 and 2.7 only will make writing Python 3 compatible code
easier.

New Features:

GenomeDiagram has three new sigils (shapes to illustrate features).
OCTO shows an octagonal shape, like the existing BOX sigil but with
the corners cut off. JAGGY shows a box with jagged edges at the start
and end, intended for things like NNNNN regions in draft genomes.
Finally BIGARROW is like the existing ARROW sigil but is drawn
straddling the axis. This is useful for drawing vertically compact
figures where you do not have overlapping genes.

New module Bio.Graphics.ColorSpiral can generate colors along a spiral
path through HSV color space. This can be used to make arbitrary
"rainbow" scales, for example to color features or cross-links on a
GenomeDiagram figure.

The Bio.SeqIO module now supports reading sequences from PDB files in
two different ways. The "pdb-atom" format determines the sequence as
it appears in the structure based on the atom coordinate section of
the file (via Bio.PDB, so NumPy is currently required for this).
Alternatively, you can use the "pdb-seqres" format to read the
complete protein sequence as it is listed in the PDB header, if
available.

The Bio.SeqUtils module now has a seq1 function to turn a sequence
using three letter amino acid codes into one using the more common one
letter codes. This acts as the inverse of the existing seq3 function.

The multiple-sequence-alignment object used by Bio.AlignIO etc now
supports an annotation dictionary. Additional support for per-column
annotation is planned, with addition and splicing to work like that
for the SeqRecord per-letter annotation.

The Bio.Motif module has been updated and reorganized. To allow for a
clean deprecation of the old code, the new motif code is stored in a
new module Bio.motifs, and a PendingDeprecationWarning was added to
Bio.Motif.

Experimental Code - SearchIO:

This release also includes Bow's Google Summer of Code work writing a
unified parsing framework for NCBI BLAST (assorted formats including
tabular and XML), HMMER, BLAT, and other sequence searching tools.
This is currently available with the new BiopythonExperimentalWarning
to indicate that this is still somewhat experimental. We're bundling
it with the main release to get more public feedback, but with the big
warning that the API is likely to change. In fact, even the current
name of Bio.SearchIO may change since unless you are familiar with
BioPerl its purpose isn't immediately clear.

Contributors:

Brandon Invergo
Bryan Lunt (first contribution)
Christian Brueffer (first contribution)
David Cain
Eric Talevich
Grace Yeo (first contribution)
Jeffrey Chang
Jingping Li (first contribution)
Kai Blin (first contribution)
Leighton Pritchard
Lenna Peterson
Lucas Sinclair (first contribution)
Michiel de Hoon
Nick Semenkovich (first contribution)
Peter Cock
Robert Ernst (first contribution)
Tiago Antao
Wibowo "Bow" Arindrarto

Thank you all.

Release announcement here (RSS feed available):
http://news.open-bio.org/news/2013/02/biopython-1-61-released/

P.S. You can follow @Biopython on Twitter
https://twitter.com/Biopython


From w.arindrarto at gmail.com  Wed Feb  6 00:03:52 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 6 Feb 2013 01:03:52 +0100
Subject: [Biopython] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CADEGkF4-H0cc2zC245gaK3AbN8kZRyByD0xe5o8RfX1patj-qA@mail.gmail.com>

Hi Peter,

Thanks for doing the release! It feels exciting to see SearchIO code
finally live in the distributions :). Hopefully this will result in
more feedback (and then more improvements ~ likewise for the whole
Biopython as well).

Also, thank you as well to everyone who has criticized / commented /
contributed code to the module :).

cheers,
Bow


From p.j.a.cock at googlemail.com  Thu Feb  7 11:33:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:33:25 +0000
Subject: [Biopython] Biopython 1.61 released
In-Reply-To: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
References: <CAKVJ-_6jxFnV8HozDT8sc7xey88y8hXkW8dQUSw0yZDO-q00FA@mail.gmail.com>
Message-ID: <CAKVJ-_7bXcXkxFQ9Xx0W3CDwd_QzhYiRKpsFHGL9n5YoSFDtXQ@mail.gmail.com>

On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.

For those of you wanting to try Biopython on Python 3.3 on Windows,
there is now an installer for Biopython 1.61 built against NumPy 1.7.0rc2.

NumPy 1.7 is their first release to support Python 3.3, and the
official release is expected to be near-identical to this second
release candidate, see:
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html

Regards,

Peter


From vincent at vincentdavis.net  Sun Feb 10 03:47:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 9 Feb 2013 20:47:20 -0700
Subject: [Biopython] Taxonomic Classification tree
Message-ID: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>

Any suggestions on how to build a Taxonomic Classification tree? That is,
like a Phylo tree but based on taxa.

Vincent Davis


From nuin at genedrift.org  Sun Feb 10 04:03:27 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Sat, 9 Feb 2013 23:03:27 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
Message-ID: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>

All phylogenetic trees are based on taxa. You might need to be more specific.

Paulo


On 2013-02-09, at 10:47 PM, Vincent Davis <vincent at vincentdavis.net> wrote:

> Any suggestion of how to build a Taxonomic Classification tree. That is,
> like a Phylo tree but based on taxa.
> 
> Vincent Davis


From vincent at vincentdavis.net  Sun Feb 10 04:53:13 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 9 Feb 2013 21:53:13 -0700
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
Message-ID: <CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>

On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:

> All phylogenetic trees are based on taxa. You might need to be more
> specific.


Maybe but Taxonomic Classification is not based on phylogenetics.
What I have is a list of organisms and their Taxonomic Classification. I
want to build a tree based on only the Taxonomic Classification.

Vincent Davis
720-301-3003


From cjfields at illinois.edu  Sun Feb 10 05:01:59 2013
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Sun, 10 Feb 2013 05:01:59 +0000
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
Message-ID: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>

On Feb 9, 2013, at 10:53 PM, Vincent Davis <vincent at vincentdavis.net>
 wrote:

> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> 
>> All phylogenetic trees are based on taxa. You might need to be more
>> specific.
> 
> 
> Maybe but Taxonomic Classification is not based on phylogenetics.
> What I have is a list of organisms and their Taxonomic Classification. I
> want to build a tree based on only the Taxonomic Classification.
> 
> Vincent Davis
> 720-301-3003


There's code floating around on the bioperl side for doing this sort of thing, not sure if biopython has anything along these lines (I would be surprised if someone hasn't done this yet, though).

chris


From vincent at vincentdavis.net  Sun Feb 10 20:16:20 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:16:20 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
Message-ID: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>

I am having trouble with NCBIWWW.qblast. I can get the example to work.
Maybe I need help with reading :-)

From the documentation:
result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
save_file = open("temp.xml", "w")
save_file.write(result_handle.read())
save_file.close()
result_handle.close()
result_handle = open("temp.xml")
blast_record = NCBIXML.parse(result_handle)

The temp.xml looks correct but I can get nothing from blast_record. I have
tried passing the handle directly to NCBIXML.parse and still no luck.

How would I, for example, get the first hit "gi|224094601"?

Vincent Davis


From p.j.a.cock at googlemail.com  Sun Feb 10 20:35:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 10 Feb 2013 20:35:20 +0000
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
Message-ID: <CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>

On Sun, Feb 10, 2013 at 8:16 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> I am having trouble with NCBIWWW.qblast  I can get the the example to work.
> Maybe I need help with reading :-)
>
> From the documentation
> result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
> save_file = open("temp.xml", "w")
> save_file.write(result_handle.read())
> save_file.close()
> result_handle.close()
> result_handle = open("temp.xml")
> blast_record = NCBIXML.parse(result_handle)
>
> The temp.xml looks correct but I can get nothing from blast_record. I have
> tried passing the directly to NCBIXML.parse and still no luck.
>
> How would I for example get the first hit "gi|224094601" ?
>
> Vincent Davis

Hi Vincent,

Well, first I would check that the BLAST results were downloaded
ok - can you open the temp.xml file in a text editor (e.g. WordPad
on Windows)? Can you see the hits you are expecting?

Second, the parse function is for iterating over the file - if you
expect just one query's results, try:

blast_record = NCBIXML.read(result_handle)
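
To make the difference concrete, here is a toy sketch of how a parse
function and a read function typically relate. It is not the
Bio.Blast.NCBIXML code: the two functions, the QUERY/HIT line format and
the second hit name are all made up for illustration. Only the pattern
(parse gives an iterator of records, read gives exactly one record) is
the point.

```python
# Toy stand-ins for the parse/read pattern. These are NOT the
# Bio.Blast.NCBIXML functions, only an illustration of how they relate.

def parse(lines):
    """Yield one record (a dict) per query; a record starts at a QUERY line."""
    record = None
    for line in lines:
        line = line.strip()
        if line.startswith("QUERY "):
            if record is not None:
                yield record
            record = {"query": line[6:], "hits": []}
        elif line.startswith("HIT ") and record is not None:
            record["hits"].append(line[4:])
    if record is not None:
        yield record


def read(lines):
    """Return the only record in the input, or raise if there is not exactly one."""
    records = parse(lines)
    try:
        first = next(records)
    except StopIteration:
        raise ValueError("No records found")
    try:
        next(records)
    except StopIteration:
        return first
    raise ValueError("More than one record found")


# Hypothetical input: one query with two hits (second hit name is invented).
one_query = ["QUERY 8332116", "HIT gi|224094601", "HIT placeholder_second_hit"]

result = parse(one_query)   # a generator; nothing is parsed until you loop over it
record = read(one_query)    # the single record itself
first_hit = record["hits"][0]
```

With the real parser the record returned by NCBIXML.read should expose
the hits as blast_record.alignments, so something like
blast_record.alignments[0].title ought to give the description of the
first hit.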

Peter


From vincent at vincentdavis.net  Sun Feb 10 20:41:42 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:41:42 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
	<CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
Message-ID: <CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>

Peter,
I verified the file.
I misunderstood the "if you expect just one query's results" part; I was
reading this as meaning that there would be more than one hit.
I figured this was a stupid mistake.

Thanks
Vincent


Vincent Davis
720-301-3003


From p.j.a.cock at googlemail.com  Sun Feb 10 20:42:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 10 Feb 2013 20:42:56 +0000
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
	<CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
	<CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>
Message-ID: <CAKVJ-_61oeLHPvLDyP8ptEBk7-PRvqg1=Qy0Cb3yFqcW0zVozg@mail.gmail.com>

On Sun, Feb 10, 2013 at 8:41 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> Peter,
> I verified the file,
> I miss understood the " if you expect just one query's results" I was
> reading this as meaning that there would be more than one hit.
> I figured this was a stupid mistake.
>
> Thanks
> Vincent

So things are working now :)

Great,

Peter


From vincent at vincentdavis.net  Sun Feb 10 20:46:25 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 13:46:25 -0700
Subject: [Biopython] NCBI Blast, what an I going wrong
In-Reply-To: <CAKVJ-_61oeLHPvLDyP8ptEBk7-PRvqg1=Qy0Cb3yFqcW0zVozg@mail.gmail.com>
References: <CALyJZZV8FwjHJsiS0_sGJYNggvhBtyivKd8QiNgpqsC+ACudsQ@mail.gmail.com>
	<CAKVJ-_7J49FT8QA+iwyFg_Aswt_QG+zqgzMz0teKOx4QPeTOsw@mail.gmail.com>
	<CALyJZZXAs0N47UkMazwdGEtuE5ASDC+TgFBai2BEGV6NuDbY8Q@mail.gmail.com>
	<CAKVJ-_61oeLHPvLDyP8ptEBk7-PRvqg1=Qy0Cb3yFqcW0zVozg@mail.gmail.com>
Message-ID: <CALyJZZU0EZe+HyMTgAaq1111qH83-WehM=DpNin3A0peSMvO9A@mail.gmail.com>

Yes

Vincent Davis
720-301-3003


From winda002 at student.otago.ac.nz  Sun Feb 10 21:16:53 2013
From: winda002 at student.otago.ac.nz (David Winter)
Date: Mon, 11 Feb 2013 10:16:53 +1300
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad
Message-ID: <51180E45.9000102@student.otago.ac.nz>


Hi Vincent,

It would probably be possible to do this with Biopython, either by

(a) searching the NCBI's taxonomy database with Eutils to get IDs, then
fetching the corresponding taxonomy records and extracting the complete
lineage for each of your taxa. You could then find the "lowest shared
taxon" for each one and build a tree, or

(b) reading the whole NCBI taxonomy using Phylo, and extracting a subtree
with just your taxa.

Both those are probably more work than you need to do though. The 
Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml) 
can take taxon names or IDs and return a phylogeny.

You should be aware - taxonomy is a dynamic science, and assignments can 
change. The NCBI taxonomy is curated by people that know what they're 
talking about, but it's not a definitive tree of life or the result of a 
particular phylogenetic analysis.
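
If you do want to go down route (a), the tree-building step itself is
small once you have the lineages. Below is a rough sketch in plain
Python, assuming each lineage is already a root-first list of taxon
names. The function names and the abbreviated, hand-typed lineages are
made up for the example, and fetching the lineages from the NCBI is not
shown.

```python
def build_tree(lineages):
    """Merge root-first lineages into nested dicts, sharing common prefixes."""
    tree = {}
    for lineage in lineages:
        node = tree
        for taxon in lineage:
            # Re-use the existing child if this taxon was already seen here.
            node = node.setdefault(taxon, {})
    return tree


def _format_node(name, children):
    """Format one node and its descendants as Newick with internal labels."""
    if not children:
        return name
    inner = ",".join(_format_node(n, c) for n, c in sorted(children.items()))
    return "(%s)%s" % (inner, name)


def to_newick(tree):
    """Serialise the nested dicts as Newick; assumes a single shared root."""
    if len(tree) != 1:
        raise ValueError("expected a single shared root taxon")
    (name, children), = tree.items()
    return _format_node(name, children) + ";"


# Abbreviated example lineages (most ranks left out to keep this short).
lineages = [
    ["Eukaryota", "Metazoa", "Chordata", "Homo_sapiens"],
    ["Eukaryota", "Metazoa", "Chordata", "Mus_musculus"],
    ["Eukaryota", "Viridiplantae", "Arabidopsis_thaliana"],
]

tree = build_tree(lineages)
newick = to_newick(tree)
```

The resulting string has no branch lengths, only the nesting implied by
the classification, and it should be readable by anything that accepts
Newick with internal node labels.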

David


--
David Winter
Research Associate
Allan Wilson Centre for Molecular Ecology and Evolution
University of Otago
Dunedin
New Zealand/ Aotearoa

ph + 64 22 018 0449
w: www.david-winter.info
blog: sciblogs.co.nz/the-atavism



From vincent at vincentdavis.net  Sun Feb 10 22:25:53 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:25:53 -0700
Subject: [Biopython] How to BLAST Optimize for : More dissimilar sequences
	(discontiguous megablast)
Message-ID: <CALyJZZXxMpBPJc=gJ6WSh_dK3KvVpfuX+Ve9-JaAMyP6YYzhDg@mail.gmail.com>

On the NCBI BLAST website there is an option to "Optimize for: More
dissimilar sequences (discontiguous megablast)". The URL shows this to
be BLAST_PROGRAMS="discoMegablast". Is there a way to do this
with NCBIWWW.qblast?
Vincent Davis


From vincent at vincentdavis.net  Sun Feb 10 22:31:22 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 15:31:22 -0700
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <51180E45.9000102@student.otago.ac.nz>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
	<118F034CF4C3EF48A96F86CE585B94BF6CE1FE3E@CHIMBX5.ad.uillinois.edu>
	<51180E45.9000102@student.otago.ac.nz>
Message-ID: <CALyJZZXy48=JLTmaf5LGf=KTp83WibTSNTtidTjuEWcqjC=Tyg@mail.gmail.com>

On Sun, Feb 10, 2013 at 2:16 PM, David Winter
<winda002 at student.otago.ac.nz> wrote:
>
> Both those are probably more work than you need to do though. The
> Interactive Tree of Life page (http://itol.embl.de/other_trees.shtml)
> can take taxon names or IDs and return a phylogeny.
>

This is what I needed, thanks David.


Vincent Davis
720-301-3003


From vincent at vincentdavis.net  Mon Feb 11 04:35:49 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sun, 10 Feb 2013 21:35:49 -0700
Subject: [Biopython] How to BLAST Optimize for : More dissimilar
 sequences (discontiguous megablast)
In-Reply-To: <CALyJZZXxMpBPJc=gJ6WSh_dK3KvVpfuX+Ve9-JaAMyP6YYzhDg@mail.gmail.com>
References: <CALyJZZXxMpBPJc=gJ6WSh_dK3KvVpfuX+Ve9-JaAMyP6YYzhDg@mail.gmail.com>
Message-ID: <CALyJZZVObn32Sq8s=yGLbwEOqjbwa0LFT+-SF=g_Wy9fXdV8hg@mail.gmail.com>

On Sun, Feb 10, 2013 at 3:25 PM, Vincent Davis <vincent at vincentdavis.net>wrote:

> BLAST_PROGRAMS


I got it figured out. Just need to change the defaults

Vincent Davis
720-301-3003


From hlapp at drycafe.net  Thu Feb 14 04:30:15 2013
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 13 Feb 2013 23:30:15 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
Message-ID: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>


On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote:

> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> 
>> All phylogenetic trees are based on taxa. 

This is not true. Phylogenetic trees are based on a character matrix. The rows in such a matrix are called OTUs. OTUs may or may not refer to a taxon; they could (and nowadays typically do) refer to a gene, a protein, a (part of a) genome, or some other nucleic acid or amino acid sequence. 

> Maybe but Taxonomic Classification is not based on phylogenetics.

Not strictly, but it aspires to be. I.e., species taxonomies aspire to group taxa together that are monophyletic. In practice this isn't always the case, but it's the idea, and is one reason why taxonomies change. 

> What I have is a list of organisms and their Taxonomic Classification. I want to build a tree based on only the Taxonomic Classification.

You can obtain this directly from the NCBI taxonomy:

http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From nuin at genedrift.org  Thu Feb 14 10:22:19 2013
From: nuin at genedrift.org (Paulo Nuin)
Date: Thu, 14 Feb 2013 05:22:19 -0500
Subject: [Biopython] Taxonomic Classification tree
In-Reply-To: <94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
References: <CALyJZZXrqwxZsVSXunYCZWqCeQ25HsYNbXjFmqkqZCV_QEfQQQ@mail.gmail.com>
	<848FC12D-5B1C-4D1F-94B1-4EC97845FFEE@genedrift.org>
	<CALyJZZXdrr0VmFWsrUUa+ULr6+m2BiJb_F5sv8cEG5ikzJdENw@mail.gmail.com>
	<94D820E8-1E6A-438F-B923-63288D7DBAC6@drycafe.net>
Message-ID: <A3481D23-EDB6-4148-B21D-57814B6F1144@genedrift.org>


On 2013-02-13, at 11:30 PM, Hilmar Lapp <hlapp at drycafe.net> wrote:

> 
> On Feb 9, 2013, at 11:53 PM, Vincent Davis wrote:
> 
>> On Sat, Feb 9, 2013 at 9:03 PM, Paulo Nuin <nuin at genedrift.org> wrote:
>> 
>>> All phylogenetic trees are based on taxa. 
> 
> This is not true. Phylogenetic trees are based on a character matrix. The rows in such a matrix are called OTUs. OTUs may or may not refer to a taxon; they could (and nowadays typically do) refer to a gene, a protein, a (part of a) genome, or some other nucleic acid or amino acid sequence. 
> 
>> 

Around the gene, protein, sequence, phenotypic character there's an OTU, and there's a taxon. If you are analyzing extraterrestrial species (or car colours, or fridge models) you might not have a taxon on your OTU but otherwise each and every piece of data you analyze has come from a species, known or not, repeated or unique in your rows. Semantically, you are correct, but even if you put 1000 genes from the same species in a matrix, and generate a phylogenetic tree, you still based your tree on a taxon.

P


From vincent at vincentdavis.net  Thu Feb 14 17:20:58 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Thu, 14 Feb 2013 10:20:58 -0700
Subject: [Biopython] Concatenate to aligned sequences
Message-ID: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>

I have 2 fasta files from a MUSCLE alignment. Both have the same number of
sequences from the same organism. If I want to concatenate the pairs of
sequences, what is the best way to do this?
Right now I am doing this:

from Bio import SeqIO

def concatenate(fa1, fa2):
    # load both files into dictionaries keyed by sequence id
    fa1open = open(fa1)
    fa2open = open(fa2)
    fa1dict = SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
    fa2dict = SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
    fa1open.close()
    fa2open.close()
    # check that both files have the same sequence ids
    if set(fa1dict.keys()) != set(fa2dict.keys()):
        print(fa1dict.keys(), fa2dict.keys())
        print('The fasta files do not have the same sequences')
    bothdict = {}
    bothlist = []
    for key in fa2dict.keys():
        # reuse the record from the second file, with the joined sequence
        bothdict[key] = fa2dict[key]
        bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
        bothlist.append(bothdict[key])
    return bothdict, bothlist

Vincent Davis
720-301-3003


From p.j.a.cock at googlemail.com  Thu Feb 14 17:29:12 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 14 Feb 2013 17:29:12 +0000
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
References: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
Message-ID: <CAKVJ-_5CmuVNAC=zuih_UXSwWpzitVj4tm=KOcSYaObVDKJhnA@mail.gmail.com>

On Thu, Feb 14, 2013 at 5:20 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> I have 2 fasta files from a mucle alignment. Both have the same number of
> sequences from the same organism. If I what to concatenate the pairs of
> sequences what it the  best way to do this.
> Right now I am doing this:
>
> def concatenate(fa1, fa2):
>     fa1open = open(fa1, "rU")
>     fa2open = open(fa1, "rU")
>     fa1dict =  SeqIO.to_dict(SeqIO.parse(fa1open, "fasta"))
>     fa2dict =  SeqIO.to_dict(SeqIO.parse(fa2open, "fasta"))
>     fa1open.close()
>     fa2open.close()
>     # check that both files have the same sequnce id's
>     if set(fa1dict.keys()) != set(fa2dict.keys()):
>         print(fa1dict.keys(), fa2dict.keys())
>         print('The fasta files do not have the same sequences')
>     bothdict = {}
>     bothlist = []
>     count = 1
>     for key in fa2dict.keys():
>         bothdict[key] = fa2dict[key]
>         bothdict[key].seq = fa2dict[key].seq + fa1dict[key].seq
>         bothlist.append(bothdict[key])
>     return bothdict, bothlist
>
> Vincent Davis
> 720-301-3003

Have you tried loading the two alignment files via AlignIO,
sorting by name if required, and adding the alignment objects?

http://biopython.org/DIST/docs/api/Bio.Align.MultipleSeqAlignment-class.html#__add__
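
Something along these lines, as a minimal sketch. The two tiny
alignments and their ids are invented, and they are read from in-memory
strings only so that the example is self-contained. With real data you
would pass your two FASTA filenames to AlignIO.read instead.

```python
from io import StringIO

from Bio import AlignIO

# Two tiny made-up alignments with the same ids in a different order.
gene1 = StringIO(">sp_b\nAC-T\n>sp_a\nACGT\n")
gene2 = StringIO(">sp_a\nGG\n>sp_b\nGA\n")

aln1 = AlignIO.read(gene1, "fasta")
aln2 = AlignIO.read(gene2, "fasta")

# Sort the rows by record id so that row i is the same organism in both.
aln1.sort()
aln2.sort()

# Adding two alignments joins them column-wise, row by row.
combined = aln1 + aln2
for record in combined:
    print(record.id, record.seq)
```

Note that the addition works row by row, so both alignments need the
same number of rows in the same order, which is why the sort comes
first.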

Peter


From vincent at vincentdavis.net  Thu Feb 14 17:38:43 2013
From: vincent at vincentdavis.net (Vincent Davis)
Date: Thu, 14 Feb 2013 10:38:43 -0700
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To: <CAKVJ-_5CmuVNAC=zuih_UXSwWpzitVj4tm=KOcSYaObVDKJhnA@mail.gmail.com>
References: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
	<CAKVJ-_5CmuVNAC=zuih_UXSwWpzitVj4tm=KOcSYaObVDKJhnA@mail.gmail.com>
Message-ID: <CALyJZZXioDLvHr9_Rr6t15aivvHv4Sj7j+PsrQFagDv099EkBQ@mail.gmail.com>

Thanks
Vincent

Vincent Davis
720-301-3003


From karolisr at gmail.com  Fri Feb 15 17:28:06 2013
From: karolisr at gmail.com (Karolis Ramanauskas)
Date: Fri, 15 Feb 2013 11:28:06 -0600
Subject: [Biopython] Concatenate to aligned sequences
In-Reply-To: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
References: <CALyJZZV18GrRjgd3vcA9k2exNuRD+sRcQxjxFdVkUkuFDQtW7A@mail.gmail.com>
Message-ID: <CACT_pJGUkGD-ERheWmKyBEBJA23Oms01d3WZsEFvKJMNE4bnYw@mail.gmail.com>

Good day,

I have written a function that takes a list of alignments and
concatenates them based on the sequence IDs. The advantage here is that
the alignments do not have to contain the same number of sequences, which
is helpful when you are trying to create one big alignment for
phylogenetic applications and some taxa are missing certain sequences.

The concatenate function is here:
https://github.com/karolisr/krpy/blob/master/kralign.py
The other functions in that file can be ignored; concatenate depends
only on Biopython.
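
To illustrate the idea only (this is not the code from kralign.py), here is a rough sketch that uses plain dictionaries mapping sequence ID to aligned string instead of Biopython objects. The function name concatenate_by_id and the taxon names are made up, and a taxon missing from one alignment is padded with gap characters:

```python
def concatenate_by_id(alignments, gap="-"):
    """Join alignments column-wise by sequence ID.

    Each alignment is a dict mapping sequence ID to an aligned string.
    IDs missing from an alignment are padded with gap characters.
    """
    all_ids = sorted(set().union(*alignments))
    joined = {seq_id: "" for seq_id in all_ids}
    for alignment in alignments:
        # All rows of one alignment share the same width.
        width = len(next(iter(alignment.values()), ""))
        for seq_id in all_ids:
            joined[seq_id] += alignment.get(seq_id, gap * width)
    return joined


# Made-up example: taxonB lacks gene2 and taxonC lacks gene1.
gene1 = {"taxonA": "ACGT", "taxonB": "AC-T"}
gene2 = {"taxonA": "GG", "taxonC": "GA"}
supermatrix = concatenate_by_id([gene1, gene2])
for seq_id, sequence in supermatrix.items():
    print(seq_id, sequence)
```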

Peace



From jordan.r.willis at Vanderbilt.Edu  Fri Feb 22 02:19:40 2013
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Fri, 22 Feb 2013 02:19:40 +0000
Subject: [Biopython] User Defined Scoring Matrix
Message-ID: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CAE9963@ITS-HCWNEM108.ds.vanderbilt.edu>

Hello,

Since I'm not sure exactly which tool to use, I will defer to the Biopython community, since odds are I will be using it. I'm trying to produce a multiple sequence alignment with a user-defined scoring matrix. When I look at ClustalW, there is an option to put in such a matrix, and the help indicates that this should be in "blast" format. When I search for the blast format, what I find indicates that this is hard coded into the software.

My end goal is to produce a phylogenetic tree using this PSSM, but I have no idea how to input it into ClustalW or any other multiple sequence alignment software. I don't really care which software or which wrappers I use, or how I have to do it. I have used Biopython to produce this matrix, so I thought it would be relatively easy to use it in any multiple sequence alignment software.

I'm not having very good luck and any help would be much appreciated.

Jordan


From p.j.a.cock at googlemail.com  Fri Feb 22 10:35:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 10:35:41 +0000
Subject: [Biopython] User Defined Scoring Matrix
In-Reply-To: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CAE9963@ITS-HCWNEM108.ds.vanderbilt.edu>
References: <AC7D5B64FC829E429B0C96F7E3EE5AAD1CAE9963@ITS-HCWNEM108.ds.vanderbilt.edu>
Message-ID: <CAKVJ-_6SYCDRo7Fbi32h9kcxpyHWmH1-TC6qWgpp6Qtvvk2Qvg@mail.gmail.com>

On Fri, Feb 22, 2013 at 2:19 AM, Willis, Jordan R
<jordan.r.willis at vanderbilt.edu> wrote:
> Hello,
>
> Since I'm not sure exactly which tool to use, I will defer to the
> biopython community since odds are I will be using it. I'm trying to produce
> a multiple sequence alignment with a user defined scoring matrix. When I
> look at Clustalw, there is an option to put in such a matrix, and the help
> indicates that this should be in "blast" format. When I search for blast
> format, they indicate that this is hard coded into the software.

I wouldn't start with ClustalW - it is old and still widely used, but even
the authors are trying to discourage this. They suggest their new tool
Clustal Omega, which has a Biopython wrapper and takes an optional
distance matrix as input via the --distmat-in argument.

from Bio.Align.Applications import ClustalOmegaCommandline
help(ClustalOmegaCommandline)

http://biopython.org/DIST/docs/api/Bio.Align.Applications._ClustalOmega.ClustalOmegaCommandline-class.html
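
For reference, a rough sketch of building the equivalent command line by hand, assuming the Clustal Omega binary is installed as clustalo and spells the options -i, -o and --distmat-in. The helper name clustalo_command and the file names are placeholders, and nothing is executed here:

```python
import shlex


def clustalo_command(infile, outfile, distmat=None):
    """Build the argument list for a Clustal Omega run (hypothetical helper)."""
    args = ["clustalo", "-i", infile, "-o", outfile]
    if distmat is not None:
        # Pairwise distance matrix supplied by the user.
        args.append("--distmat-in=" + distmat)
    return args


cmd = clustalo_command("seqs.fasta", "aligned.fasta", distmat="dist.mat")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the input files exist.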

> My end goal is to produce a phylogeny tree using this PSSM, but I have no
> idea how to input this into ClustalW or any multiple sequence alignment
> software. I don't really care which software to use, which wrappers, or how
> I have to do it. I have used biopython to produce this matrix, so I thought
> it would be relatively easy to implement it in any multiple sequence
> alignment software.
>
> I'm not having very good luck and any help would be much appreciated.
>
> Jordan

There are people far more qualified than me to comment on the
goals and if and when you should use a distance based tree (my
understanding is distance based trees are the worst kind, but as
they are computationally inexpensive they can make sense for large
datasets).

Regards,

Peter


From biocyberman at gmail.com  Fri Feb 22 15:18:58 2013
From: biocyberman at gmail.com (Biocyberman)
Date: Fri, 22 Feb 2013 16:18:58 +0100
Subject: [Biopython] read and write full ID line of EMBL SeqRecord?
Message-ID: <CADVdCiKUci1DqQ4aMTeMNgkxyRPYqJJBuTogHpoB3q1cgf=QuA@mail.gmail.com>

Hi there,
I am using Biopython version 1.61 (latest).
My original ID line is:
ID   ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP.

But after reading and writing out, I got this:

ID   ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP.

How do I get the same ID line ?

Attached is the python script and input file.

Thanks for taking a look.
Biocyberman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: input.embl
Type: application/octet-stream
Size: 1063 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20130222/e7dfedbf/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: checkconvert.py
Type: application/octet-stream
Size: 249 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20130222/e7dfedbf/attachment-0005.obj>

From p.j.a.cock at googlemail.com  Fri Feb 22 16:08:13 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 16:08:13 +0000
Subject: [Biopython] read and write full ID line of EMBL SeqRecord?
In-Reply-To: <CADVdCiKUci1DqQ4aMTeMNgkxyRPYqJJBuTogHpoB3q1cgf=QuA@mail.gmail.com>
References: <CADVdCiKUci1DqQ4aMTeMNgkxyRPYqJJBuTogHpoB3q1cgf=QuA@mail.gmail.com>
Message-ID: <CAKVJ-_5HOWmz=1--31GjPrH6qFLkFTjD3J4FNXP7ZFR0j45-rA@mail.gmail.com>

On Fri, Feb 22, 2013 at 3:18 PM, Biocyberman <biocyberman at gmail.com> wrote:
> Hi there,
> I am using Biopython version 1.61 (latest).
> My original ID line is:
> ID   ACCESSION1; SV 1; linear; genomic DNA; HTG; PRO; 26402 BP.
>
> But after reading and writing out, I got this:
>
> ID   ACCESSION1; SV 1; ; DNA; ; PRO; 26402 BP.
>
> How do I get the same ID line ?
>
> Attached is the python script and input file.
>
> Thanks for taking a look.
> Biocyberman

This is probably part of https://redmine.open-bio.org/issues/2578
(the GenBank and EMBL code overlaps a lot).

Peter


From ferreirafm at usp.br  Fri Feb 22 17:01:02 2013
From: ferreirafm at usp.br (Frederico Moraes Ferreira)
Date: Fri, 22 Feb 2013 14:01:02 -0300
Subject: [Biopython] blastdbcmd
Message-ID: <5127A44E.2030403@usp.br>

Hi there Biopythoneers,
As far as I know, there isn't a blastdbcmd submodule in Biopython.
So, I've been writing the IDs of the BLAST-matched sequences to a file,
fetching them all with a subprocess and reading them with SeqIO afterwards.
In some cases, however, I miss a blastdbcmd parser to make things easier.
How are you guys dealing with this?
Best,
Fred


From p.j.a.cock at googlemail.com  Fri Feb 22 17:23:44 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 22 Feb 2013 17:23:44 +0000
Subject: [Biopython] blastdbcmd
In-Reply-To: <5127A44E.2030403@usp.br>
References: <5127A44E.2030403@usp.br>
Message-ID: <CAKVJ-_4LosCsm4940My0Y5O6L45a-NqxUa6sziUK-wkKm51mJA@mail.gmail.com>

On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira
<ferreirafm at usp.br> wrote:
> Hi there Biopythoneers,
> As far as I know, there isn't a blastdbcmd submodule in Biopython. So,
> I've been writing the IDs of the BLAST-matched sequences to a file, fetching
> them all with a subprocess and reading them with SeqIO afterwards. In some
> cases, however, I miss a blastdbcmd parser to make things easier. How are
> you guys dealing with this?
> Best,
> Fred

Are you talking about a command line wrapper for blastdbcmd, to go in
Bio/Blast/Applications.py? That seems a good idea.

Personally I find the blastdbcmd tool quite handicapped due to the
introduction of generated sequence identifiers, and rarely use it:
http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html

Instead I would use Bio.SeqIO to index the FASTA file used for the
database, and get the sequences that way.
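
A minimal sketch of that, assuming the IDs reported by BLAST match the IDs in the FASTA file. The temporary file, the record IDs and the matched_ids list are made up for illustration:

```python
import os
import tempfile

from Bio import SeqIO

# Made-up stand-in for the FASTA file the BLAST database was built from.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
    handle.write(">seq1 first example\nACGTACGT\n>seq2 second example\nGGCCGGCC\n")
    fasta_path = handle.name

# SeqIO.index scans the file once and keeps only the offset of each record.
fasta_index = SeqIO.index(fasta_path, "fasta")

# Hypothetical list of IDs taken from the BLAST matches.
matched_ids = ["seq2"]
matched_records = [fasta_index[seq_id] for seq_id in matched_ids]

fasta_index.close()
os.remove(fasta_path)

for record in matched_records:
    print(record.id, record.seq)
```

For very large files, SeqIO.index_db() stores the offsets in an SQLite file so the index does not need to be rebuilt on every run.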

Peter


From jgibbons1 at mail.usf.edu  Tue Feb 26 16:45:03 2013
From: jgibbons1 at mail.usf.edu (Justin Gibbons)
Date: Tue, 26 Feb 2013 11:45:03 -0500
Subject: [Biopython] Filter Blast results
Message-ID: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>

I know that there is already a script in the Cookbook for filtering out
blast queries with no hits, but it involves holding all of the sequence
objects in memory, which isn't good if you have to work with a lot of
sequences. I came up with the following function, which works, but I would
appreciate any input for how to improve it. In particular I don't like that
I am appending the sequence objects to file and would like to know of any
alternatives.

The main function is:

from Bio import SeqIO
from Bio.Blast import NCBIXML

def filter_no_hits(blast_xml_results, source_fasta, file_format,
                   no_hit_file, hit_file):
    """Scans Blast XML results and if the query sequence has no hits prints
    the sequence record in the no_hit_file, otherwise in the hit_file. The
    source_fasta is the file that was used to perform the blast search and
    is used to retrieve the sequence record"""

    result_handle = open(blast_xml_results)  # open the xml file
    blast_records = NCBIXML.parse(result_handle)  # create the generator object
    # create the indexed file object
    indexed_fasta = create_indexed_fasta(source_fasta, file_format)

    for record in blast_records:
        # returns list of hit_def results
        hit_def_list = blast_xml_hit_def(record)
        # get the record ID to search the indexed file
        record_id = get_id_str_from_desc(record.query)
        # use the sequence ID to get the raw text of the sequence record
        record_object = indexed_fasta.get_raw(record_id)

        if is_list_null(hit_def_list):  # if no hits
            append_to_file(no_hit_file, record_object)
        else:  # if hits
            append_to_file(hit_file, record_object)
    result_handle.close()

Sub-functions:

def create_indexed_fasta(path, file_format):
    """Makes a fasta file searchable like a dictionary with the sequence Id
    as the key"""
    return SeqIO.index(path, file_format)

def blast_xml_hit_def(record):
    """Returns a list of hit_def for a record from a NCBI blast XML report"""
    hit_def_list=[]
    for alignment in record.alignments:
        hit_def_list.append(alignment.hit_def)
    return hit_def_list

def get_id_str_from_desc(desc):
    """Returns the Id from a fasta record description"""
    parts=desc.split(" ")
    return parts[0]

def is_list_null(lst):
    """Returns True if list is empty and False otherwise"""
    if len(lst)==0:
        return True
    else:
        return False

def append_to_file(path, string):
    with open(path, 'a') as f:
        f.write(string)

def record_counter(path, file_format):
    """Input a file path and the format of the file and it returns the
    number of records in the file"""
    counter=0
    for seq_record in SeqIO.parse(path, file_format):
        counter+=1
    print("%s contains %i records" % (path, counter))
    return counter

Thank you

Justin Gibbons


From p.j.a.cock at googlemail.com  Tue Feb 26 16:57:01 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 26 Feb 2013 16:57:01 +0000
Subject: [Biopython] Filter Blast results
In-Reply-To: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>
References: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>
Message-ID: <CAKVJ-_43zAgKRABh4O13mx6H9iiD_gpQW+z538BfXH+8yL7gww@mail.gmail.com>

On Tue, Feb 26, 2013 at 4:45 PM, Justin Gibbons <jgibbons1 at mail.usf.edu> wrote:
> I know that there is already a script in the Cookbook for filtering out
> blast queries with no hits, but it involves holding all of the sequence
> objects in memory, which isn't good if you have to work with a lot of
> sequences.

Hi Justin,

Which example are you referring to? It doesn't sound very efficient.

There are some wiki pages with user contributed cookbook recipes:
http://biopython.org/wiki/Category:Cookbook

There is also the "Biopython Tutorial and Cookbook", online here:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Thanks,

Peter


From w.arindrarto at gmail.com  Tue Feb 26 17:27:21 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 26 Feb 2013 18:27:21 +0100
Subject: [Biopython] Filter Blast results
In-Reply-To: <CAKVJ-_43zAgKRABh4O13mx6H9iiD_gpQW+z538BfXH+8yL7gww@mail.gmail.com>
References: <CALaGxMistSP1rG4m+ek5s8f0zXH+fmztiUzStA=P-T2q-+o=ZA@mail.gmail.com>
	<CAKVJ-_43zAgKRABh4O13mx6H9iiD_gpQW+z538BfXH+8yL7gww@mail.gmail.com>
Message-ID: <CADEGkF7n3o4jjEydxvJEy7tMgXZZz-s9NbENY9vrhCUbZJCQ8g@mail.gmail.com>

Hi Justin,

For your purpose, you can try using the SearchIO module
(http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc101), from
the latest Biopython (1.61).

Here's my attempt to have a similar working function:

from Bio import SearchIO, SeqIO

# get all FASTA IDs in a set
fasta_ids = set([x.id for x in SeqIO.parse('fasta', 'fasta')])

with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        hits = set([x.id for x in qresult])  # get all the hit IDs in a set
        present = fasta_ids.intersection(hits)  # IDs present in both sets

        if present:  # set is not empty
            hit.write(qresult.id + '\n')  # one ID per line
        else:
            no_hit.write(qresult.id + '\n')

On another note, if you are always checking against the same FASTA
file, you can try to create your own BLAST database consisting of only
those sequences and search against it, so any BLAST hit you get
will always be one of the sequences in your FASTA file.

This makes the function slightly simpler:

from Bio import SearchIO

with open('no_hit', 'w') as no_hit, open('hit', 'w') as hit:
    for qresult in SearchIO.parse('blast_results.xml', 'blast-xml'):
        # empty queries evaluate to False
        if qresult:
            hit.write(qresult.id + '\n')  # one ID per line
        else:
            no_hit.write(qresult.id + '\n')

The first version still requires you to store all the FASTA IDs in memory,
but that should be more reasonable than storing whole SeqRecord objects.
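
If you want the full sequence records in the output files rather than just the IDs, one option is to open both output handles once and write each record as the results stream past. Here is a rough sketch of that pattern in plain Python. The function name split_by_hits and the data are made up; the (query ID, has hits) pairs stand in for what you would derive from SearchIO.parse(), and the dictionary stands in for a SeqIO.index() lookup:

```python
import io


def split_by_hits(query_results, lookup, hit_handle, no_hit_handle):
    """Write each query's record text to one of two already-open handles.

    query_results yields (query_id, has_hits) pairs; lookup maps a query ID
    to the text of its sequence record.
    """
    counts = {"hit": 0, "no_hit": 0}
    for query_id, has_hits in query_results:
        if has_hits:
            hit_handle.write(lookup[query_id])
            counts["hit"] += 1
        else:
            no_hit_handle.write(lookup[query_id])
            counts["no_hit"] += 1
    return counts


# Made-up data standing in for the BLAST results and the indexed FASTA file.
results = [("q1", True), ("q2", False), ("q3", True)]
records = {"q1": ">q1\nACGT\n", "q2": ">q2\nGGCC\n", "q3": ">q3\nTTAA\n"}

hit_out, no_hit_out = io.StringIO(), io.StringIO()
counts = split_by_hits(results, records, hit_out, no_hit_out)
print(counts)
```

Only one record is held in memory at a time, and the output files are opened once instead of once per record.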

Hope that helps,
Bow



From robert.j.ahern at mycit.ie  Wed Feb 27 17:21:24 2013
From: robert.j.ahern at mycit.ie (Robert Ahern)
Date: Wed, 27 Feb 2013 17:21:24 +0000
Subject: [Biopython] (no subject)
Message-ID: <9042978694721632165@unknownmsgid>

robert.j.ahern at mycit.ie

Sent from Windows Mail


From p.j.a.cock at googlemail.com  Wed Feb 27 22:32:35 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 27 Feb 2013 22:32:35 +0000
Subject: [Biopython] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for
	abstracts
In-Reply-To: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
References: <CAOzk5QcVrMu+7chbRggKX8i+bWVAc4cxQ5Nrvd-32J55REXwMg@mail.gmail.com>
Message-ID: <CAKVJ-_5-a2FiZ_+rCX16MuDRe4-6B3Jk112XAtNvuTjsbaRKfA@mail.gmail.com>

The new bioinformatics mini-symposium this year makes SciPy 2013
especially interesting.

Peter

---------- Forwarded message ----------
From: *Jonathan Rocher*
Date: Wednesday, February 27, 2013
Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts
To: SciPy Users List <scipy-user at scipy.org>, numfocus at googlegroups.com,
Discussion of Numerical Python <numpy-discussion at scipy.org>


[Apologies for cross-posts]

Dear all,

The annual SciPy Conference (Scientific Computing with
Python) <http://conference.scipy.org/scipy2013/about.php> allows
participants from academic, commercial, and governmental organizations to
showcase their latest projects, learn from skilled users and developers,
and collaborate on code development. *The deadline for abstract submissions
is March 20th, 2013.*

Submissions are welcome that address general Scientific Computing with
Python, one of the two special themes for this year's conference (machine
learning & reproducible science), or the domain-specific
mini-symposia <http://conference.scipy.org/scipy2013/about.php> held
during the conference (Meteorology, climatology, and atmospheric and
oceanic science, Astronomy and astrophysics, Medical imaging,
Bio-informatics).

Please submit your abstract at the SciPy 2013 website abstract submission
form <http://conference.scipy.org/scipy2013/speaking_submission.php>.
Abstracts will be accepted for posters or presentations. Optional papers to
be published in the conference proceedings will be requested following
abstract submission. This year the proceedings will be made available prior
to the conference to help attendees navigate the conference.

We look forward to an exciting and interesting set of talks, posters, and
discussions and hope to see you at the conference.
The SciPy 2013 Program Committee Chairs

Matt McCormick, Kitware, Inc.
Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory