From hlapp at drycafe.net  Mon Aug  1 18:36:27 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 1 Aug 2011 18:36:27 -0400
Subject: [Biopython] Job opportunity: User Interface Design and Web
	Application Developer
Message-ID: <7F0AE58E-6052-469B-ACD0-207FAD060472@drycafe.net>

(Apologies if you have received this already or if this is considered  
spam - we're trying to reach out as broadly as possible and I know that
quite a few in the Bio* communities would be well qualified. Please  
feel free to pass on to anyone who might be interested, or might know  
someone who is.)

User Interface Design and Web Application Developer

The National Evolutionary Synthesis Center (NESCent) seeks a creative  
and enthusiastic individual to design user interfaces and web  
applications for scientific applications that manage, analyze,  
visualize and share data in support of evolutionary research. The  
incumbent will work as part of a small informatics team in close  
collaboration with domain scientists.

NESCent (http://nescent.org) is an NSF-funded center dedicated to  
cross-disciplinary research in evolutionary science. Our informatics  
team works closely with visiting and resident scientists to support  
their custom software and database development needs
(http://informatics.nescent.org), and collaborates broadly with other
biodiversity informatics
projects.  All NESCent software products are open-source, and the  
Center has a number of initiatives to actively promote collaborative  
development of community software resources. Above all, we are  
enthusiastic about our work, about the mission of the Center, and  
about the contribution of informatics to that mission.

Job description: The incumbent will design and develop user interfaces  
and web applications for databases and other software tools for  
sponsored scientists and staff. The job responsibilities include all  
stages of the software development process, including requirements  
gathering, design, implementation, release packaging and  
documentation, as part of a small team (typically 2-3 individuals).   
We expect the incumbent to present their work at conferences and  
contribute to publications with scientific collaborators; interact  
regularly with visiting and resident scientists, other members of the  
informatics team and Center staff; and generally serve as an expert  
resource for Center personnel. The position provides opportunities for  
professional development and encourages research into new  
technologies. Most informatics staff work at our Durham NC offices,  
located adjacent to Duke University, but we support a wide range of  
technologies for virtual communication with off-site staff and  
collaborators.

Salary range: $70,000 - $80,000, depending on education and experience

Required Qualifications:
* Demonstrated success collaborating with clients on custom software  
solutions
* Experience with various stages of the software development cycle
* Expertise in development and testing of user interface designs
* Excellent communication skills, both virtual and face-to-face

Preferred Qualifications:
* M.S. or Ph.D. in Computer Science, Bioinformatics or related field
* Demonstrated interest in science, particularly biology
* Expertise in dynamic and interactive web technologies (JavaScript,  
CGI)
* Expertise in rapid application development and respective
programming technologies and languages (e.g., modern scripting
languages and web-application frameworks such as Python/Django,
Ruby/Ruby-on-Rails, and Perl/Catalyst)
* Expertise in graphic design
* Expertise in data visualization and/or scientific data integration
* Expertise in software usability design and assessment
* Expertise in web service (SOAP, REST, XML, JSON) and semantic web  
technologies
* Fluency in Java programming
* Prior experience in relational database programming  (PostgreSQL or  
MySQL)
* Experience with open-source, and collaborative, software development

How to apply: Please send cover letter, resume and contact information
for three references to Dr. Karen Cranston, Training Coordinator and
Bioinformatics Project Manager (karen.cranston at nescent.org). Please
also complete the online application at the University of North
Carolina HR website: http://bit.ly/r9HQ8r. Informal inquiries or
requests for additional information may be directed to Dr. Cranston by
email or phone (+1-919-613-2275). Closing date is August 15, 2011.

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From lueck at ipk-gatersleben.de  Tue Aug  2 09:15:40 2011
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 2 Aug 2011 15:15:40 +0200
Subject: [Biopython] clustalW output format
Message-ID: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>

Hello!

 
I'm using ClustalW2 for my alignments and would like to have the aln1 output
format (http://www.ebi.ac.uk/Tools/msa/clustalw2/help/). This should show
the line numbers. Actually this should be the default but it's not.

 
I tried to add aln1 in \Bio\Align\Applications\_Clustalw.py in line 100+

 
Option(["-output", "-OUTPUT", "OUTPUT", "output"],
       ["input"],
       lambda x: x in ["GCG", "GDE", "PHYLIP", "PIR", "NEXUS", "ALN1"
                       "gcg", "gde", "phylip", "pir", "nexus", "aln1"]

 
but it doesn't work.

Any ideas?

 
Thanks in advance!

Stefanie

 
From p.j.a.cock at googlemail.com  Tue Aug  2 10:54:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 15:54:42 +0100
Subject: [Biopython] clustalW output format
In-Reply-To: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>
References: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>
Message-ID: <CAKVJ-_40xfuwtJ7yGCP1tBRwMCJWy2Y8rudrL432QtMi6EZwNA@mail.gmail.com>

On Tue, Aug 2, 2011 at 2:15 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
> Hello!
>
> I'm using ClustalW2 for my alignments and would like to have the aln1 output
> format (http://www.ebi.ac.uk/Tools/msa/clustalw2/help/). This should show
> the line numbers. Actually this should be the default but it's not.

I have version 2.1 installed and the default output format is traditional
Clustal output with no residue/base numbers (according to -help).

Which version of ClustalW2 are you using?

I'm expecting we'll need a new wrapper for Clustal Omega (I don't know
why they didn't call it Clustalw v3).

We'll probably also need to update the Clustalw parser to cope with
base/residue numbering in the output as well.

Peter
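
One possible contributing factor, assuming the list in the original message was typed exactly as shown (with no comma after the ALN1 entry): Python joins adjacent string literals, so the two neighbouring entries merge into one and the new value never matches. The sketch below uses a hypothetical, shortened list, not the actual contents of _Clustalw.py, and it says nothing about whether ClustalW 2.1 itself accepts aln1.

```python
# Hypothetical, shortened validator list; not the real Biopython source.
# Note the missing comma after "ALN1": Python concatenates adjacent
# string literals, so "ALN1" and "gcg" become the single entry "ALN1gcg".
broken = ["GCG", "NEXUS", "ALN1"
          "gcg", "nexus", "aln1"]

# Same list with the comma restored.
fixed = ["GCG", "NEXUS", "ALN1",
         "gcg", "nexus", "aln1"]


def is_valid(value, allowed):
    """Mimic a membership-based option check."""
    return value in allowed


print(is_valid("ALN1", broken))  # False
print(is_valid("ALN1", fixed))   # True
```

Printing the broken list makes the merged "ALN1gcg" entry easy to spot.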


From steven.irvin at monsanto.com  Tue Aug  2 11:47:41 2011
From: steven.irvin at monsanto.com (IRVIN, STEVEN (AG-Contractor/1000))
Date: Tue, 2 Aug 2011 15:47:41 +0000
Subject: [Biopython] Bio.Blast.Applications issue with outfmt="quoted string"
Message-ID: <8F46CBF672774F4C8A6B288A246B4468A532BF@stlwexmbxprd02.na.ds.monsanto.com>

Hello,

I am having an issue with the Biopython module making BLAST+ queries.

I am wondering if there is any support in Bio.Blast.Applications  for using the multiple arguments to -outfmt allowed by NCBI BLAST+ programs such as blastn.

I need to use this for example:

    blastn_cline = NcbiblastnCommandline(query='somefastafile.fas', db='tomato_cdna.db', evalue=1000, word_size=7, outfmt='10 qseqid sseqid length pident', out='outfile.txt')

The multiple arguments allowed to blastn -outfmt allow the choice of specific columns output to the csv or tab separated file such as subject_id, etc.

Biopython is returning non-zero exit status 1: USAGE when I run my program with above statement.


Here is an example command line for BLAST+:

prompt_: blastn -query seq_fasta.fas -db local_db.db -out csv_out.csv -dust no -num_alignments 20 -num_descriptions 20 -evalue 1000 -word_size 7 -task blastn -outfmt "10 qseqid sseqid length pident"

I do not yet know if something in Bio.Blast.Applications needs to be modified to support this.

Steve

Steven D Irvin, MS
Bioinformatics Analyst
CC214-A
Monsanto Research Center
Chesterfield Village, MO
Steven.Irvin at monsanto.com
(636) 737-1756



From p.j.a.cock at googlemail.com  Tue Aug  2 12:14:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 17:14:25 +0100
Subject: [Biopython] Bio.Blast.Applications issue with outfmt="quoted
	string"
In-Reply-To: <8F46CBF672774F4C8A6B288A246B4468A532BF@stlwexmbxprd02.na.ds.monsanto.com>
References: <8F46CBF672774F4C8A6B288A246B4468A532BF@stlwexmbxprd02.na.ds.monsanto.com>
Message-ID: <CAKVJ-_5Qh9jnfFhcasRrkcA0fp5ZMeThBPrFgfN8nzm2=hexSQ@mail.gmail.com>

On Tue, Aug 2, 2011 at 4:47 PM, IRVIN, STEVEN (AG-Contractor/1000)
<steven.irvin at monsanto.com> wrote:
> Hello,
>
> I am having an issue with the Biopython module making BLAST+ queries.
>
> I am wondering if there is any support in Bio.Blast.Applications for using the multiple arguments to -outfmt allowed by NCBI BLAST+ programs such as blastn.
>
> I need to use this for example:
>
>     blastn_cline = NcbiblastnCommandline(query='somefastafile.fas', db='tomato_cdna.db', evalue=1000, word_size=7, outfmt='10 qseqid sseqid length pident', out='outfile.txt')
>
> The multiple arguments allowed to blastn -outfmt allow the choice of specific columns output to the csv or tab separated file such as subject_id, etc.

Yes, and they are very useful. Try:

blastn_cline = NcbiblastnCommandline(query='somefastafile.fas',
    db='tomato_cdna.db', evalue=1000, word_size=7,
    outfmt='"10 qseqid sseqid length pident"', out='outfile.txt')

i.e. Include the extra quotes explicitly. That's single quote, double
quote, text, double quote, single quote.
(There are other ways to embed double quote characters in a Python
string but that works for me.)

Peter
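
The "other ways to embed double quote characters" mentioned above can be sketched in plain Python. This is only an illustration of string quoting and of how a POSIX-style shell would split the resulting command line; it does not call Biopython or BLAST+, and the variable names are made up for the example.

```python
import shlex

columns = "10 qseqid sseqid length pident"

# Three equivalent ways to put literal double quotes inside the value.
a = '"10 qseqid sseqid length pident"'    # single quotes around double quotes
b = "\"10 qseqid sseqid length pident\""  # backslash-escaped double quotes
c = '"%s"' % columns                      # built from a variable

assert a == b == c

# Without the inner quotes a shell splits the format into separate
# arguments; with them the whole format stays one argument.
unquoted = shlex.split("blastn -outfmt " + columns)
quoted = shlex.split("blastn -outfmt " + a)

print(len(unquoted))  # 7
print(len(quoted))    # 3
print(quoted[-1])     # 10 qseqid sseqid length pident
```

The unquoted form hands blastn four stray arguments after "-outfmt 10", which would be consistent with the USAGE error reported in this thread.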


From lueck at ipk-gatersleben.de  Wed Aug  3 07:19:50 2011
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Wed, 3 Aug 2011 13:19:50 +0200
Subject: [Biopython] clustalW output format
In-Reply-To: <CAKVJ-_40xfuwtJ7yGCP1tBRwMCJWy2Y8rudrL432QtMi6EZwNA@mail.gmail.com>
References: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>
	<CAKVJ-_40xfuwtJ7yGCP1tBRwMCJWy2Y8rudrL432QtMi6EZwNA@mail.gmail.com>
Message-ID: <000001cc51cf$42d16100$1022a8c0@ipkgatersleben.de>

Thanks Peter!

I'm also using version 2.1. I didn't check the -help, only the homepage
where they say "Default value is: Aln w/numbers [aln1]", which confused
me...
Thanks for mentioning Clustal Omega, I didn't know that they changed the
names.
Stefanie

-----Original Message-----
From: Peter Cock [mailto:p.j.a.cock at googlemail.com]
Sent: Tuesday, 2 August 2011 16:55
To: Stefanie Lück
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] clustalW output format

On Tue, Aug 2, 2011 at 2:15 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
> Hello!
>
> I'm using ClustalW2 for my alignments and would like to have the aln1 output
> format (http://www.ebi.ac.uk/Tools/msa/clustalw2/help/). This should show
> the line numbers. Actually this should be the default but it's not.

I have version 2.1 installed and the default output format is traditional
Clustal output with no residue/base numbers (according to -help).

Which version of ClustalW2 are you using?

I'm expecting we'll need a new wrapper for Clustal Omega (I don't know
why they didn't call it Clustalw v3).

We'll probably also need to update Clustalw parser to cope with base/residue
numbering in the output as well.

Peter


From jgrant at smith.edu  Mon Aug  8 14:08:25 2011
From: jgrant at smith.edu (Jessica Grant)
Date: Mon, 8 Aug 2011 14:08:25 -0400
Subject: [Biopython] deleting in-group paralogs from newick trees
Message-ID: <5D7AD333-66EC-4B23-950E-523E2FBD2A62@smith.edu>

Hello,

I am looking at large phylogenetic trees that have many paralogs.  I  
would like to simplify my trees so that all monophyletic paralog  
groups are collapsed--or all sequences except the shortest branch are  
deleted.  Is there a Biopython module that can help?  I started  
looking at Phylo, but couldn't see an obvious way.

Thanks,

Jessica


From eric.talevich at gmail.com  Mon Aug  8 15:33:52 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 8 Aug 2011 15:33:52 -0400
Subject: [Biopython] deleting in-group paralogs from newick trees
In-Reply-To: <5D7AD333-66EC-4B23-950E-523E2FBD2A62@smith.edu>
References: <5D7AD333-66EC-4B23-950E-523E2FBD2A62@smith.edu>
Message-ID: <CAMC681nx63571_3gxYf+_OLJCS_68JixB55ftwoxNcuiC1bdbQ@mail.gmail.com>

On Mon, Aug 8, 2011 at 2:08 PM, Jessica Grant <jgrant at smith.edu> wrote:

> Hello,
>
> I am looking at large phylogenetic trees that have many paralogs.  I would
> like to simplify my trees so that all monophyletic paralog groups are
> collapsed--or all sequences except the shortest branch are deleted.  Is
> there a Biopython module that can help?  I started looking at Phylo, but
> couldn't see an obvious way.
>

Hi Jessica,

Yes, Phylo is the right module to use. If I understand your problem
correctly, the tree methods you want are is_monophyletic() and
collapse_all(). Both operate on a clade within the tree. You'd traverse the
tree with get_nonterminals(), check if a paralog group under a clade is
monophyletic, and if so, collapse it.

Do you have a list of paralogs already? And, do you know which groups might
be monophyletic?

If you have groups/clades already, it's simple:

>>> tree = Phylo.read('mytree.nwk', 'newick')
>>> for clade in tree.get_nonterminals(order='postorder'):
...     mono_parent = clade.is_monophyletic([SOME_PARALOG_GROUP])
...     if mono_parent:
...         mono_parent.collapse_all()

If you don't know the groups yet, then the test inside the loop is a little
more elaborate. You can look for overlaps between a clade's tips and the
paralog list using sets:

>>> paralogs = set(PARALOG_LIST)
# Inside the loop:
>>> tips = set([str(t) for t in clade.get_terminals()])
>>> overlap = tips.intersection(paralogs)
>>> if len(overlap) >= 2:
# The rest of the loop...


Hope that helps,
Eric
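
The set test described above can be tried on its own, without a tree. The sketch below is written without Biopython so that it runs anywhere; the tip and paralog names are invented, and the two helper functions are hypothetical, not part of Bio.Phylo. In a real script the tip names would come from clade.get_terminals().

```python
def paralog_overlap(tip_names, paralogs):
    """Return the paralogs found among a clade's tip names."""
    return set(tip_names).intersection(paralogs)


def is_paralog_only(tip_names, paralogs, minimum=2):
    """True if the clade has at least `minimum` tips and all are paralogs."""
    tips = set(tip_names)
    return len(tips) >= minimum and tips.issubset(paralogs)


# Invented example data.
paralogs = {"geneA_1", "geneA_2", "geneA_3"}
clade_tips = ["geneA_1", "geneA_2", "geneB_1"]

overlap = paralog_overlap(clade_tips, paralogs)
print(sorted(overlap))                                    # ['geneA_1', 'geneA_2']
print(is_paralog_only(clade_tips, paralogs))              # False
print(is_paralog_only(["geneA_1", "geneA_2"], paralogs))  # True
```

A clade that passes the stricter is_paralog_only() check contains nothing but paralogs, which is the case where collapsing it loses no other sequences.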

From cjfields at illinois.edu  Tue Aug  9 16:09:05 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Aug 2011 15:09:05 -0500
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
Message-ID: <CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>

I'm reviving this thread to see what the current status is (if anything has changed).  The bioperl parser has the same problem; at the moment we're basically stuck until NCBI gives some indication as to whether this is a bug or not.  Any word back from them yet?

(and agreed, it would be nice to have an external bug tracker from NCBI).

chris

On May 4, 2011, at 5:36 AM, Peter Cock wrote:

> On Wed, May 4, 2011 at 10:59 AM, Michal <mictadlo at gmail.com> wrote:
>> Hi Peter,
>> Do you have the script which read
>> 
>> https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml
>> 
>> 
>> and what would be the correct output?
>> 
>> Thank you in advance.
>> 
>> Cheers,
>> Michal
> 
> Hi Michal,
> 
> I'm not quite sure what you're asking, but I'll try. First, the three
> data files:
> 
> $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml
> $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/four_human_proteins.fasta
> $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/rhodopsin_proteins.fasta
> 
> The query file has four sequences,
> 
> $ grep -c "^>" four_human_proteins.fasta
> 4
> 
> $ grep "^>" four_human_proteins.fasta
> >sp|Q9BS26|ERP44_HUMAN Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
> >sp|Q9NSY1|BMP2K_HUMAN BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
> >sp|P06213|INSR_HUMAN Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
> >sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
> 
> Based on past experience, I would expect 4 iteration blocks in the
> XML, but in this case I have 24:
> 
> $ grep "<Iteration>" -c blastp_four_human_vs_rhodopsin.xml
> 24
> 
> Notice we get 6 iterations for each query (4 times 6 is 24):
> 
> $ grep "<Iteration_query-ID>" blastp_four_human_vs_rhodopsin.xml
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
> 
> Now, using the two FASTA files directly and re-running blastp, what do I get?
> 
> $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta \
>     -subject rhodopsin_proteins.fasta -outfmt 5 | grep "<Iteration>" -c
> 24
> 
> Or again with -parse_deflines, which changes how the hit ID/def is presented:
> 
> $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta \
>     -subject rhodopsin_proteins.fasta -outfmt 5 -parse_deflines \
>     | grep "<Iteration>" -c
> 24
> 
> How about older versions?
> 
> $ ~/Downloads/ncbi-blast-2.2.24+/bin/blastp -query four_human_proteins.fasta \
>     -subject rhodopsin_proteins.fasta -outfmt 5
> BLAST engine error: XML formatting is only supported for a database search
> 
> I'll have to make a blast database first...
> 
> $ ~/Downloads/ncbi-blast-2.2.24+/bin/makeblastdb -in rhodopsin_proteins.fasta -dbtype prot
> 
> Building a new DB, current time: 05/04/2011 11:22:57
> New DB name:   rhodopsin_proteins.fasta
> New DB title:  rhodopsin_proteins.fasta
> Sequence type: Protein
> Keep Linkouts: T
> Keep MBits: T
> Maximum file size: 1073741824B
> Adding sequences from FASTA; added 6 sequences in 0.105655 seconds.
> 
> $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta \
>     -db rhodopsin_proteins.fasta -outfmt 5 | grep "<Iteration>" -c
> 4
> 
> Look - just four identifiers as I expect! This also works if the database
> is built with the -parse_seqids switch.
> 
> The same happens with older versions of BLAST+, one <Iteration>
> block per query, so four iteration blocks for this example. I tried all
> of 2.2.21+, 2.2.22+, 2.2.23+ and 2.2.24+ (running makeblastdb to
> give a fresh database, then blastp).
> 
> That seems to demonstrate that the bug is specific to the XML output
> from FASTA vs FASTA (not FASTA vs DB), which is a new feature
> in NCBI BLAST 2.2.25+
> 
> I will raise this with the NCBI, and report back.
> 
> However, even if the NCBI fix it in the next release, we (Bio*) may
> want to update our parsers to cope with this quirk, or at least put a
> warning in our BLAST XML parser documentation, as there will be
> lots of installations of NCBI BLAST 2.2.25+ in the wild.
> 
> Peter
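
The grep counts above can also be reproduced from Python, which makes it easy to flag queries that appear in more than one iteration block. This is a minimal sketch using only the standard library, not any Bio* parser; the element names Iteration and Iteration_query-ID are taken from the grep output quoted above, while the sample document is a made-up, heavily trimmed stand-in for real BLAST XML and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def iterations_per_query(xml_text):
    """Count <Iteration> blocks for each Iteration_query-ID value."""
    root = ET.fromstring(xml_text)
    return Counter(
        iteration.findtext("Iteration_query-ID")
        for iteration in root.iter("Iteration")
    )


# Made-up, trimmed sample: q1 appears in two iteration blocks, q2 in one.
sample = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration><Iteration_query-ID>q1</Iteration_query-ID></Iteration>
    <Iteration><Iteration_query-ID>q1</Iteration_query-ID></Iteration>
    <Iteration><Iteration_query-ID>q2</Iteration_query-ID></Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

counts = iterations_per_query(sample)
print(counts["q1"], counts["q2"])  # 2 1

# Queries split across several iteration blocks.
repeated = sorted(q for q, n in counts.items() if n > 1)
print(repeated)  # ['q1']
```

For a real file, the text would be read from disk first; any count above one for a query indicates the FASTA vs FASTA behaviour described in this message.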


From p.j.a.cock at googlemail.com  Wed Aug 10 05:15:18 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 10 Aug 2011 10:15:18 +0100
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
	<CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
Message-ID: <CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>

On Tue, Aug 9, 2011 at 9:09 PM, Chris Fields <cjfields at illinois.edu> wrote:
> I'm reviving this thread to see what the current status is (if anything has
> changed). The bioperl parser has the same problem; at the moment we're
> basically stuck until NCBI gives some indication as to whether this is a
> bug or not. Any word back from them yet?
>
> (and agreed, it would be nice to have an external bug tracker from NCBI).
>
> chris

Hi Chris,

My email to the NCBI on 17 May had a reply from Tao Tao (NCBI User
services) saying it would be brought to their developers' attention.

For reference, the email subject line was:
"Multiple iteration blocks per query in FASTA vs FASTA BLAST XML"

I have just emailed back to enquire if there is any news to report.

Peter


From cjfields at illinois.edu  Wed Aug 10 21:35:37 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Aug 2011 20:35:37 -0500
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
	<CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
	<CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>
Message-ID: <EA454327-8737-49D9-A9BC-F0A63F14CCB1@illinois.edu>

On Aug 10, 2011, at 4:15 AM, Peter Cock wrote:

> On Tue, Aug 9, 2011 at 9:09 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> I'm reviving this thread to see what the current status is (if anything has
>> changed).  The bioperl parser has the same problem; at the moment we're
>> basically stuck until NCBI gives some indication as to whether this is a
>> bug or not.  Any word back from them yet?
>> 
>> (and agreed, it would be nice to have an external bug tracker from NCBI).
>> 
>> chris
> 
> Hi Chris,
> 
> My email to the NCBI on 17 May had a reply from Tao Tao (NCBI User
> services) saying it would be brought to their developers' attention.
> 
> For reference, the email subject line was:
> "Multiple iteration blocks per query in FASTA vs FASTA BLAST XML"
> 
> I have just emailed back to enquire if there is any news to report.
> 
> Peter

Wonder if it's worth a second prod from someone else.  Sometimes that gets their attention.

chris

From p.j.a.cock at googlemail.com  Thu Aug 11 05:09:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 10:09:13 +0100
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <EA454327-8737-49D9-A9BC-F0A63F14CCB1@illinois.edu>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
	<CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
	<CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>
	<EA454327-8737-49D9-A9BC-F0A63F14CCB1@illinois.edu>
Message-ID: <CAKVJ-_7WU-N0aW6DXNObieW3kzQVGaM4oF6v9pH9SMOaiVQeEw@mail.gmail.com>

On Thu, Aug 11, 2011 at 2:35 AM, Chris Fields <cjfields at illinois.edu> wrote:
>> Hi Chris,
>>
>> My email to the NCBI on 17 May had a reply from Tao Tao (NCBI User
>> services) saying it would be brought to their developers' attention.
>>
>> For reference, the email subject line was:
>> "Multiple iteration blocks per query in FASTA vs FASTA BLAST XML"
>>
>> I have just emailed back to enquire if there is any news to report.
>>
>> Peter
>
> Wonder if it's worth a second prod from someone else. Sometimes
> that gets their attention.
>
> chris

Tao replied yesterday morning (US time) to confirm the test files
so he (she?) could try this on the latest code.

Peter


From mok at bioxray.dk  Thu Aug 11 08:40:57 2011
From: mok at bioxray.dk (Morten Kjeldgaard)
Date: Thu, 11 Aug 2011 14:40:57 +0200
Subject: [Biopython] Unable to convert alignment to nexus format
Message-ID: <1313066457.10034.11.camel@yeti.daimi.au.dk>

Hi,

I am getting an exception when trying to output an alignment in nexus
format:

  ValueError: Need a DNA, RNA or Protein alphabet

The alignment is read by AlignIO.read() in fasta format from an output
file written by Muscle, and so the alphabet specified in the sequences
is IUPACProtein. Apparently, NexusIO checks for ProteinAlphabet and thus
fails. I am using BioPython 1.56.

Here is a 4-line test program generating the exception:

from Bio import AlignIO
alignment = AlignIO.read(open("aln.muscle"), "fasta")
g = open("aln.nexus", "w")
g.write (alignment.format("nexus"))


Any (safe) workarounds here?
Cheers,

Morten


-- 
Morten Kjeldgaard, asc. professor, MSc, PhD
BiRC - Bioinformatics Research Center, Aarhus University
C.F. Møllers Alle, Building 1110, DK-8000 Aarhus C, Denmark.
Lab +45 8942 3130 * Mobile: +45 5186 0147 * Home +45 8618 8180


From chapmanb at 50mail.com  Thu Aug 11 09:52:38 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 11 Aug 2011 09:52:38 -0400
Subject: [Biopython] Unable to convert alignment to nexus format
In-Reply-To: <1313066457.10034.11.camel@yeti.daimi.au.dk>
References: <1313066457.10034.11.camel@yeti.daimi.au.dk>
Message-ID: <20110811135238.GF3143@kunkel>

Morten;

> I am getting an exception when trying to output an alignment in nexus
> format:
> 
>   ValueError: Need a DNA, RNA or Protein alphabet
> 
> The alignment is read by AlignIO.read() in fasta format from an output
> file written by Muscle, and so the alphabet specified in the sequences
> is IUPACProtein. Apparently, NexusIO checks for ProteinAlphabet and thus
> fails. I am using BioPython 1.56.

If you specify the alphabet to AlignIO, Nexus will be happy. Here's
your modified test program:

from Bio import AlignIO
from Bio.Alphabet import IUPAC, Gapped
alignment = AlignIO.read(open("aln.muscle"), "fasta", alphabet=Gapped(IUPAC.protein))
g = open("aln.nexus", "w")
g.write (alignment.format("nexus"))

Hope this helps,
Brad

From p.j.a.cock at googlemail.com  Thu Aug 11 10:04:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 15:04:50 +0100
Subject: [Biopython] Unable to convert alignment to nexus format
In-Reply-To: <1313066457.10034.11.camel@yeti.daimi.au.dk>
References: <1313066457.10034.11.camel@yeti.daimi.au.dk>
Message-ID: <CAKVJ-_5+AHr6bjOVYoYCBEsgCt5YEx+9jFm8F4h8aHR3bGK=tw@mail.gmail.com>

On Thu, Aug 11, 2011 at 1:40 PM, Morten Kjeldgaard <mok at bioxray.dk> wrote:
> Hi,
>
> I am getting an exception when trying to output an alignment in nexus
> format:
>
>   ValueError: Need a DNA, RNA or Protein alphabet
>
> The alignment is read by AlignIO.read() in fasta format from an output
> file written by Muscle, and so the alphabet specified in the sequences
> is IUPACProtein.

Yes, but it was written out as a FASTA file which does not record
the alphabet. Biopython does not try to guess this, you must be
explicit.

> Apparently, NexusIO checks for ProteinAlphabet and thus
> fails. I am using BioPython 1.56.

As Brad described, when you parse the FASTA alignment, tell
Biopython it is protein.

Peter


From aaronquinlan at gmail.com  Fri Aug 12 14:39:34 2011
From: aaronquinlan at gmail.com (Aaron Quinlan)
Date: Fri, 12 Aug 2011 14:39:34 -0400
Subject: [Biopython] Working with genomic intervals
Message-ID: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>

All,

I apologize in advance if this is a naive question. I am wondering if BioPython provides libraries for working with genomic intervals in BED, GFF, or any other like format?  I am looking for libraries that handle the parsing of files in these formats into Python objects, as well as libraries for manipulating (intersection, merging, counting, etc.) intervals.  I know this exists in Galaxy's bx-python, but am wondering if there are similar libraries in BioPython?

Gratefully,
Aaron


From p.j.a.cock at googlemail.com  Sun Aug 14 07:11:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 14 Aug 2011 12:11:37 +0100
Subject: [Biopython]  Working with genomic intervals
In-Reply-To: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>
References: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>
Message-ID: <CAKVJ-_6QM0mB=uXhExmG66REYhtTed3+ukaqAGPyB2ujv6Aiag@mail.gmail.com>

On Friday, August 12, 2011, Aaron Quinlan <aaronquinlan at gmail.com> wrote:
> All,
>
> I apologize in advance if this is a naive question.
> I am wondering if BioPython provides libraries for
> working with genomic intervals in BED, GFF, or
> any other like format?  I am looking for libraries
> that handle the parsing of files in these formats
> into Python objects, as well as libraries for
> manipulating (intersection, merging, counting,
> etc.) intervals.  I know this exists in Galaxy's
> bx-python, but am wondering if there are similar
> libraries in BioPython?
>
> Gratefully,
> Aaron

Hi Aaron,

Have a look at http://biopython.org/wiki/GFF_Parsing
where Brad is working on this. He's also spoken
highly of bx-python as I recall.

Peter

From sdavis2 at mail.nih.gov  Sun Aug 14 07:48:15 2011
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 14 Aug 2011 07:48:15 -0400
Subject: [Biopython] Working with genomic intervals
In-Reply-To: <CAKVJ-_6QM0mB=uXhExmG66REYhtTed3+ukaqAGPyB2ujv6Aiag@mail.gmail.com>
References: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>
	<CAKVJ-_6QM0mB=uXhExmG66REYhtTed3+ukaqAGPyB2ujv6Aiag@mail.gmail.com>
Message-ID: <CANeAVB=WHK1QzBAmHS68eH-aueubvknAcVKVLL+p=rBVXiR3JQ@mail.gmail.com>

On Sun, Aug 14, 2011 at 7:11 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Friday, August 12, 2011, Aaron Quinlan <aaronquinlan at gmail.com> wrote:
>> All,
>>
>> I apologize in advance if this is a naive question.
>> I am wondering if BioPython provides libraries for
>> working with genomic intervals in BED, GFF, or
>> any other like format?  I am looking for libraries
>> that handle the parsing of files in these formats
>> into Python objects, as well as libraries for
>> manipulating (intersection, merging, counting,
>> etc.) intervals.  I know this exists in Galaxy's
>> bx-python, but am wondering if there are similar
>> libraries in BioPython?
>>
>> Gratefully,
>> Aaron
>
> Hi Aaron,
>
> Have a look at http://biopython.org/wiki/GFF_Parsing
> where Brad is working on this. He's also spoken
> highly of bx-python as I recall.

I would second the bx-python vote.  Not only are the "normal" interval
classes covered, but there are also some variants (clustering is one
that comes to mind).

Sean


From lgautier at gmail.com  Mon Aug 15 02:17:28 2011
From: lgautier at gmail.com (Laurent Gautier)
Date: Mon, 15 Aug 2011 08:17:28 +0200
Subject: [Biopython] Working with genomic intervals
In-Reply-To: <mailman.1.1313337602.7670.biopython@lists.open-bio.org>
References: <mailman.1.1313337602.7670.biopython@lists.open-bio.org>
Message-ID: <4E48B9F8.4070605@gmail.com>

On 2011-08-14 18:00, biopython-request at lists.open-bio.org wrote:
> On Sun, Aug 14, 2011 at 7:11 AM, Peter Cock<p.j.a.cock at googlemail.com>  wrote:
>> >  On Friday, August 12, 2011, Aaron Quinlan<aaronquinlan at gmail.com>  wrote:
>>> >>  All,
>>> >>
>>> >>  I apologize in advance if this is a naive question.
>>> >>  I am wondering if BioPython provides libraries for
>>> >>  working with genomic intervals in BED, GFF, or
>>> >>  any other like format?  I am looking for libraries
>>> >>  that handle the parsing of files in these formats
>>> >>  into Python objects, as well as libraries for
>>> >>  manipulating (intersection, merging, counting,
>>> >>  etc.) intervals.  I know this exists in Galaxy's
>>> >>  bx-python, but am wondering if there are similar
>>> >>  libraries in BioPython?
>>> >>
>>> >>  Gratefully,
>>> >>  Aaron
>> >
>> >  Hi Aaron,
>> >
>> >  Have a look at http://biopython.org/wiki/GFF_Parsing
>> >  where Brad is working on this. He's also spoken
>> >  highly of bx-python as I recall.
> I would second the bx-python vote.  Not only are the "normal" interval
> classes covered, but there are also some variants (clustering is one
> that comes to mind).
>
> Sean

One can also access the range utilities available in Bioconductor from
Python, for example using the Bioconductor extension to rpy2 or rpy2
directly (possibly using the dynamic class mapping features, as shown below):

from rpy2.robjects.packages import importr
iranges = importr("IRanges")
# Python class IRanges as an API to Bioconductor's IRanges::IRanges
from rpy2.robjects.methods import RS4, RS4Auto_Type
class IRanges(RS4):
     __metaclass__ = RS4Auto_Type
     __rpackagename__ = "IRanges"
     __rname__ = "IRanges"

# now in action

 >>> from rpy2.robjects.vectors import IntVector
 >>> ir = IRanges(iranges.IRanges(start = IntVector(range(10)), width = 11))
 >>> print(ir)
IRanges of length 10
      start end width
[1]      0  10    11
[2]      1  11    11
[3]      2  12    11
[4]      3  13    11
[5]      4  14    11
[6]      5  15    11
[7]      6  16    11
[8]      7  17    11
[9]      8  18    11
[10]     9  19    11
 >>> print(IRanges(ir.reduce__IRanges(ir)))
IRanges of length 1
     start end width
[1]     0  19    20
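
For readers without R at hand, the effect of the reduce call above can be
sketched in plain Python. This is only a minimal sketch and not the IRanges
implementation: reduce_ranges is a hypothetical helper name, ranges are
taken to be closed integer intervals (start, end), and adjacent ranges are
merged as well as overlapping ones, which is an assumption about the
default behaviour being mimicked.

```python
def reduce_ranges(ranges):
    """Merge overlapping or adjacent closed integer ranges (start, end)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous range: extend that range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # A gap separates this range from the previous one.
            merged.append((start, end))
    return merged


# The same ten ranges as in the session above: starts 0..9, width 11.
ranges = [(s, s + 10) for s in range(10)]
print(reduce_ranges(ranges))  # [(0, 19)]
```

Sorting first means each range only has to be compared with the last merged
range, so the pass over the data is linear after the sort.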


From aaronquinlan at gmail.com  Mon Aug 15 19:54:31 2011
From: aaronquinlan at gmail.com (Aaron Quinlan)
Date: Mon, 15 Aug 2011 19:54:31 -0400
Subject: [Biopython] Working with genomic intervals
In-Reply-To: <4E48B9F8.4070605@gmail.com>
References: <mailman.1.1313337602.7670.biopython@lists.open-bio.org>
	<4E48B9F8.4070605@gmail.com>
Message-ID: <10D5E6D5-7114-406C-A2CF-8EB211CCE8D2@gmail.com>

Dear Peter, Sean, and Laurent,
   Thanks so much for the useful suggestions.
Best,
Aaron


From brandonjbreitling at gmail.com  Wed Aug 17 17:44:21 2011
From: brandonjbreitling at gmail.com (Brandon Breitling)
Date: Wed, 17 Aug 2011 21:44:21 +0000 (UTC)
Subject: [Biopython] Question on your Methods in Enzymology paper
References: <i2vb34be8bd1004071757lc24e582aweed55444a3abd8cf@mail.gmail.com>
Message-ID: <loom.20110817T233009-686@post.gmane.org>


Hi Mr. Lunt,

My name is Brandon Breitling and I'm a statistics
graduate student in the United States.  I was
wondering if you had the scripts or code available
from your "Inference of Direct Residue Contacts
in Two-Component Signaling" paper.  I'm trying
to see if I can do the same for a eukaryotic
protein pair that my lab studies.

I have created the concatenated strings dataset
for my protein as described in your paper and
have attempted to make scripts for the MI steps,
but it would really help if I could get
them for all the steps in the Direct Coupling
analysis.  If you could also email me the
accession number for your dataset so that I
can verify that I have the scripts working,
that would be most appreciated as well.

Regards,
Brandon Breitling


From p.j.a.cock at googlemail.com  Wed Aug 17 18:21:44 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 23:21:44 +0100
Subject: [Biopython] Question on your Methods in Enzymology paper
In-Reply-To: <loom.20110817T233009-686@post.gmane.org>
References: <i2vb34be8bd1004071757lc24e582aweed55444a3abd8cf@mail.gmail.com>
	<loom.20110817T233009-686@post.gmane.org>
Message-ID: <CAKVJ-_5jp_2X-zqFDOnL68TpeQBB5BJWMHQ5nH9Fi3fsep_daQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 10:44 PM, Brandon Breitling wrote:
>
> Hi Mr. Lunt,
>
> My name is Brandon Breitling and I'm a statistics
> graduate student in the United States.  I was
> wondering if you had the scripts or code available
> from your "Inference of Direct Residue Contacts
> in Two-Component Signaling" paper.  I'm trying
> to see if I can do the same for a eukaryotic
> protein pair that my lab studies.
>
> I have created the concatenated strings dataset
> for my protein as described in your paper and
> have attempted to make scripts for the MI steps
> but would really be benefited if I could get
> them for the all steps in the Direct Coupling
> analysis.  If you could also email me the
> accession number for your dataset so that I
> can verify that I have the scripts working,
> that would be most appreciated as well.
>
> Regards,
> Brandon Breitling

Hi Brandon,

It looks like you've mixed up your email addresses.

As it happens I did my PhD on TCS, and used
Biopython's Bio.PDB module to get crude distances
from a PDB complex (and also looked at MI). I'm
not sure if I've read this paper though...

Bryan Lunt, Hendrik Szurmant, Andrea Procaccini,
James A. Hoch, Terence Hwa and Martin Weigt
"Chapter Two -  Inference of Direct Residue Contacts
in Two-Component Signaling". Methods in Enzymology
Volume 471, 2010, Pages 17-41
http://dx.doi.org/10.1016/S0076-6879(10)71002-8

Peter


From lunt at ctbp.ucsd.edu  Thu Aug 18 12:46:05 2011
From: lunt at ctbp.ucsd.edu (Bryan Lunt)
Date: Thu, 18 Aug 2011 09:46:05 -0700
Subject: [Biopython] Biopython Digest, Vol 104, Issue 10
In-Reply-To: <mailman.1.1313683203.22074.biopython@lists.open-bio.org>
References: <mailman.1.1313683203.22074.biopython@lists.open-bio.org>
Message-ID: <CAEdCOMNaQTqfn8sTVVWgDvrxLimXPOv5HGrZVt392nL0ukMrSg@mail.gmail.com>

Oh!,
Yeah, we used BioPython extensively, but I thought I sent Brandon the
code already...
We have a decent module for getting distances from Bio.PDB, though
unfortunately it uses far too much disk space (it outputs a large
text file with every residue compared to every other residue, allowing
AWK or some other tool to filter the file).
And a large set of tools for creating putative pairings, mainly for
TCS, but of course generalized to pair any set of protein domains...
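
The disk space problem mentioned above can be avoided by filtering the
residue pairs while they are generated instead of writing every pair to a
file first. The following is a minimal sketch under stated assumptions, not
the module described in this message: close_pairs is a hypothetical helper,
the coordinates are made-up toy values, and extracting real coordinates
from a structure with Bio.PDB is left out.

```python
import math
from itertools import combinations


def close_pairs(coords, cutoff):
    """Yield (id_a, id_b, distance) for residue pairs closer than cutoff."""
    for (id_a, xyz_a), (id_b, xyz_b) in combinations(coords.items(), 2):
        d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
        if d < cutoff:
            yield id_a, id_b, d


# Hypothetical toy coordinates in Angstroms, keyed by (chain, residue number).
coords = {
    ("A", 1): (0.0, 0.0, 0.0),
    ("A", 2): (3.8, 0.0, 0.0),
    ("B", 1): (0.0, 6.0, 8.0),
}

for id_a, id_b, d in close_pairs(coords, cutoff=8.0):
    print(id_a, id_b, round(d, 2))
```

Because close_pairs is a generator, only the pairs under the cutoff are ever
held or written, so no all-against-all text file is needed.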

-Bryan


From p.j.a.cock at googlemail.com  Thu Aug 18 15:32:57 2011
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 18 Aug 2011 20:32:57 +0100
Subject: [Biopython] Biopython 1.58 released
Message-ID: <75327C54-CF88-43BC-BACF-87139456FE67@googlemail.com>

Dear All,

Biopython 1.58 is out:
http://news.open-bio.org/news/2011/08/biopython-1-58-released/

Thank you to everyone who has contributed.

Peter

P.S. We're on Twitter as @Biopython


From hlapp at drycafe.net  Mon Aug  1 22:36:27 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Mon, 1 Aug 2011 18:36:27 -0400
Subject: [Biopython] Job opportunity: User Interface Design and Web
	Application Developer
Message-ID: <7F0AE58E-6052-469B-ACD0-207FAD060472@drycafe.net>

(Apologies if you have received this already or if this is considered  
spam - we're trying to reach out as broad as possible and I know that  
quite a few in the Bio* communities would be well qualified. Please  
feel free to pass on to anyone who might be interested, or might know  
someone who is.)

User Interface Design and Web Application Developer

The National Evolutionary Synthesis Center (NESCent) seeks a creative  
and enthusiastic individual to design user interfaces and web  
applications for scientific applications that manage, analyze,  
visualize and share data in support of evolutionary research. The  
incumbent will work as part of a small informatics team in close  
collaboration with domain scientists.

NESCent (http://nescent.org) is an NSF-funded center dedicated to  
cross-disciplinary research in evolutionary science. Our informatics  
team works closely with visiting and resident scientists to support  
their custom software and database development needs
(http://informatics.nescent.org), and collaborates broadly with other
biodiversity informatics
projects.  All NESCent software products are open-source, and the  
Center has a number of initiatives to actively promote collaborative  
development of community software resources. Above all, we are  
enthusiastic about our work, about the mission of the Center, and  
about the contribution of informatics to that mission.

Job description: The incumbent will design and develop user interfaces  
and web applications for databases and other software tools for  
sponsored scientists and staff. The job responsibilities include all  
stages of the software development process, including requirements  
gathering, design, implementation, release packaging and  
documentation, as part of a small team (typically 2-3 individuals).   
We expect the incumbent to present their work at conferences and  
contribute to publications with scientific collaborators; interact  
regularly with visiting and resident scientists, other members of the  
informatics team and Center staff; and generally serve as an expert  
resource for Center personnel. The position provides opportunities for  
professional development and encourages research into new  
technologies. Most informatics staff work at our Durham NC offices,  
located adjacent to Duke University, but we support a wide range of  
technologies for virtual communication with off-site staff and  
collaborators.

Salary range: $70,000 - $80,000, depending on education and experience

Required Qualifications:
* Demonstrated success collaborating with clients on custom software  
solutions
* Experience with various stages of the software development cycle
* Expertise in development and testing of user interface designs
* Excellent communication skills, both virtual and face-to-face

Preferred Qualifications:
* M.S. or Ph.D. in Computer Science, Bioinformatics or related field
* Demonstrated interest in science, particularly biology
* Expertise in dynamic and interactive web technologies (JavaScript,  
CGI)
* Expertise in rapid application development and respective  
programming technologies and languages (e.g., modern scripting  
languages and web-application frameworks such as Python/Django,
Ruby/Ruby-on-Rails, and Perl/Catalyst).
* Expertise in graphic design
* Expertise in data visualization and/or scientific data integration
* Expertise in software usability design and assessment
* Expertise in web service (SOAP, REST, XML, JSON) and semantic web  
technologies
* Fluency in Java programming
* Prior experience in relational database programming  (PostgreSQL or  
MySQL)
* Experience with open-source, and collaborative, software development

How to apply: Please send cover letter, resume and contact information  
for three references to Dr. Karen Cranston, Training Coordinator and  
Bioinformatics Project Manager (karen.cranston at nescent.org). Please  
also complete the online application at the University of North  
Carolina HR website: http://bit.ly/r9HQ8r. Informal inquiries or  
requests for additional information may be directed to Dr. Cranston by  
email or phone (+1-919-613-2275). Closing date is August 15, 2011.

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From lueck at ipk-gatersleben.de  Tue Aug  2 13:15:40 2011
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Tue, 2 Aug 2011 15:15:40 +0200
Subject: [Biopython] clustalW output format
Message-ID: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>

Hello!

 
I'm using ClustalW2 for my alignments and would like to have the aln1 output
format (http://www.ebi.ac.uk/Tools/msa/clustalw2/help/). This should show
the line numbers. Actually this should be the default but it's not.

I tried to add aln1 in \Bio\Align\Applications\_Clustalw.py in line 100+

Option(["-output", "-OUTPUT", "OUTPUT", "output"],
       ["input"],
       lambda x: x in ["GCG", "GDE", "PHYLIP", "PIR", "NEXUS","ALN1"
                       "gcg", "gde", "phylip", "pir", "nexus", "aln1"]

but it doesn't work.

Any ideas?

 
Thanks in advance!

Stefanie

 
From p.j.a.cock at googlemail.com  Tue Aug  2 14:54:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 15:54:42 +0100
Subject: [Biopython] clustalW output format
In-Reply-To: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>
References: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>
Message-ID: <CAKVJ-_40xfuwtJ7yGCP1tBRwMCJWy2Y8rudrL432QtMi6EZwNA@mail.gmail.com>

On Tue, Aug 2, 2011 at 2:15 PM, Stefanie Lück <lueck at ipk-gatersleben.de> wrote:
> Hello!
>
> I'm using ClustalW2 for my alignments and would like to have the aln1 output
> format (http://www.ebi.ac.uk/Tools/msa/clustalw2/help/). This should show
> the line numbers. Actually this should be the default but it's not.

I have version 2.1 installed and the default output format is traditional
Clustal output with no residue/base numbers (according to -help).

Which version of ClustalW2 are you using?

I'm expecting we'll need a new wrapper for Clustal Omega (I don't know
why they didn't call it Clustalw v3).

We'll probably also need to update Clustalw parser to cope with base/residue
numbering in the output as well.

Peter


From steven.irvin at monsanto.com  Tue Aug  2 15:47:41 2011
From: steven.irvin at monsanto.com (IRVIN, STEVEN (AG-Contractor/1000))
Date: Tue, 2 Aug 2011 15:47:41 +0000
Subject: [Biopython] Bio.Blast.Applications issue with outfmt="quoted string"
Message-ID: <8F46CBF672774F4C8A6B288A246B4468A532BF@stlwexmbxprd02.na.ds.monsanto.com>

Hello,

I am having an issue with the Biopython module making BLAST+ queries.

I am wondering if there is any support in Bio.Blast.Applications  for using the multiple arguments to -outfmt allowed by NCBI BLAST+ programs such as blastn.

I need to use this for example:

    blastn_cline = NcbiblastnCommandline(query='somefastafile.fas', db='tomato_cdna.db', evalue=1000, word_size=7, outfmt='10 qseqid sseqid length pident', out='outfile.txt')

The multiple arguments allowed to blastn -outfmt allow the choice of specific columns output to the csv or tab separated file such as subject_id, etc.

Biopython is returning non-zero exit status 1: USAGE when I run my program with above statement.


Here is an example command line for BLAST+:

prompt_: blastn -query seq_fasta.fas -db local_db.db -out csv_out.csv -dust no -num_alignments 20 -num_descriptions 20 -evalue 1000 -word_size 7 -task blastn -outfmt "10 qseqid sseqid length pident"

I do not yet know if something in Bio.Blast.Applications needs to be modified to support this.

Steve

Steven D Irvin, MS
Bioinformatics Analyst
CC214-A
Monsanto Research Center
Chesterfield Village, MO
Steven.Irvin at monsanto.com
(636) 737-1756



From p.j.a.cock at googlemail.com  Tue Aug  2 16:14:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 17:14:25 +0100
Subject: [Biopython] Bio.Blast.Applications issue with outfmt="quoted
	string"
In-Reply-To: <8F46CBF672774F4C8A6B288A246B4468A532BF@stlwexmbxprd02.na.ds.monsanto.com>
References: <8F46CBF672774F4C8A6B288A246B4468A532BF@stlwexmbxprd02.na.ds.monsanto.com>
Message-ID: <CAKVJ-_5Qh9jnfFhcasRrkcA0fp5ZMeThBPrFgfN8nzm2=hexSQ@mail.gmail.com>

On Tue, Aug 2, 2011 at 4:47 PM, IRVIN, STEVEN (AG-Contractor/1000)
<steven.irvin at monsanto.com> wrote:
> Hello,
>
> I am having an issue with the Biopython module making BLAST+ queries.
>
> I am wondering if there is any support in Bio.Blast.Applications for using the multiple arguments to -outfmt allowed by NCBI BLAST+ programs such as blastn.
>
> I need to use this for example:
>
>    blastn_cline = NcbiblastnCommandline(query='somefastafile.fas', db='tomato_cdna.db', evalue=1000, word_size=7, outfmt='10 qseqid sseqid length pident', out='outfile.txt')
>
> The multiple arguments allowed to blastn -outfmt allow the choice of specific columns output to the csv or tab separated file such as subject_id, etc.

Yes, and they are very useful. Try:

blastn_cline = NcbiblastnCommandline(query='somefastafile.fas',
    db='tomato_cdna.db', evalue=1000, word_size=7,
    outfmt='"10 qseqid sseqid length pident"', out='outfile.txt')

i.e. Include the extra quotes explicitly. That's single quote, double
quote, text, double quote, single quote.
(There are other ways to embed double quote characters in a Python
string but that works for me.)
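
Why the extra quotes matter can be seen with the standard library alone.
The sketch below is not part of Biopython and does not run BLAST; it only
uses shlex to mimic how a POSIX shell would split the command string into
arguments, on the assumption that the command ends up being passed through
a shell as a single string.

```python
import shlex

outfmt = "10 qseqid sseqid length pident"

# Without quotes, a shell splits the -outfmt value into five separate words.
unquoted = "blastn -query seq_fasta.fas -outfmt " + outfmt
# shlex.quote() adds the quoting needed to keep the value as one argument.
quoted = "blastn -query seq_fasta.fas -outfmt " + shlex.quote(outfmt)

print(shlex.split(unquoted)[4:])  # ['10', 'qseqid', 'sseqid', 'length', 'pident']
print(shlex.split(quoted)[4:])    # ['10 qseqid sseqid length pident']
```

In the unquoted case blastn would see "10" as the format and four stray
arguments after it, which would be consistent with a usage error.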

Peter


From lueck at ipk-gatersleben.de  Wed Aug  3 11:19:50 2011
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Wed, 3 Aug 2011 13:19:50 +0200
Subject: [Biopython] clustalW output format
In-Reply-To: <CAKVJ-_40xfuwtJ7yGCP1tBRwMCJWy2Y8rudrL432QtMi6EZwNA@mail.gmail.com>
References: <000001cc5116$46def910$1022a8c0@ipkgatersleben.de>
	<CAKVJ-_40xfuwtJ7yGCP1tBRwMCJWy2Y8rudrL432QtMi6EZwNA@mail.gmail.com>
Message-ID: <000001cc51cf$42d16100$1022a8c0@ipkgatersleben.de>

Thanks Peter!

I'm also using version 2.1. I didn't check the -help, only the homepage
where they say "Default value is: Aln w/numbers [aln1]", which confused
me...
Thanks for mentioning Clustal Omega, I didn't know that they changed the
names.
Stefanie



From jgrant at smith.edu  Mon Aug  8 18:08:25 2011
From: jgrant at smith.edu (Jessica Grant)
Date: Mon, 8 Aug 2011 14:08:25 -0400
Subject: [Biopython] deleting in-group paralogs from newick trees
Message-ID: <5D7AD333-66EC-4B23-950E-523E2FBD2A62@smith.edu>

Hello,

I am looking at large phylogenetic trees that have many paralogs.  I  
would like to simplify my trees so that all monophyletic paralog  
groups are collapsed--or all sequences except the shortest branch are  
deleted.  Is there a Biopython module that can help?  I started  
looking at Phylo, but couldn't see an obvious way.

Thanks,

Jessica


From eric.talevich at gmail.com  Mon Aug  8 19:33:52 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 8 Aug 2011 15:33:52 -0400
Subject: [Biopython] deleting in-group paralogs from newick trees
In-Reply-To: <5D7AD333-66EC-4B23-950E-523E2FBD2A62@smith.edu>
References: <5D7AD333-66EC-4B23-950E-523E2FBD2A62@smith.edu>
Message-ID: <CAMC681nx63571_3gxYf+_OLJCS_68JixB55ftwoxNcuiC1bdbQ@mail.gmail.com>

On Mon, Aug 8, 2011 at 2:08 PM, Jessica Grant <jgrant at smith.edu> wrote:

> Hello,
>
> I am looking at large phylogenetic trees that have many paralogs.  I would
> like to simplify my trees so that all monophyletic paralog groups are
> collapsed--or all sequences except the shortest branch are deleted.  Is
> there a Biopython module that can help?  I started looking at Phylo, but
> couldn't see an obvious way.
>

Hi Jessica,

Yes, Phylo is the right module to use. If I understand your problem
correctly, the tree methods you want are is_monophyletic() and
collapse_all(). Both operate on a clade within the tree. You'd traverse the
tree with get_nonterminals(), check if a paralog group under a clade is
monophyletic, and if so, collapse it.

Do you have a list of paralogs already? And, do you know which groups might
be monophyletic?

If you have groups/clades already, it's simple:

>>> from Bio import Phylo
>>> tree = Phylo.read('mytree.nwk', 'newick')
>>> for clade in tree.get_nonterminals(order='postorder'):
...     mono_parent = clade.is_monophyletic([SOME_PARALOG_GROUP])
...     if mono_parent:
...         mono_parent.collapse_all()

If you don't know the groups yet, then the test inside the loop is a little
more elaborate. You can look for overlaps between a clade's tips and the
paralog list using sets:

>>> paralogs = set(PARALOG_LIST)
# Inside the loop:
>>> tips = set([str(t) for t in clade.get_terminals()])
>>> overlap = tips.intersection(paralogs)
>>> if len(overlap) >= 2:
# The rest of the loop...
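
The other option raised in the question, keeping only the shortest branch
of each monophyletic paralog group, can be sketched without any tree
library. This is a toy illustration and not Bio.Phylo code: the tree is a
nested list whose leaves are (name, branch_length) tuples, tips and
prune_paralogs are hypothetical helper names, and internal branch lengths
are ignored for simplicity.

```python
def tips(node):
    """Return all (name, branch_length) leaves under a node."""
    if isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in tips(child)]


def prune_paralogs(node, paralogs):
    """Replace each clade made up only of paralogs by its shortest-branch tip."""
    if isinstance(node, tuple):
        return node
    leaves = tips(node)
    if all(name in paralogs for name, _ in leaves):
        # Monophyletic paralog group: keep only the shortest branch.
        return min(leaves, key=lambda leaf: leaf[1])
    return [prune_paralogs(child, paralogs) for child in node]


# Hypothetical toy tree and paralog names.
tree = [[("geneA_1", 0.3), ("geneA_2", 0.1)], [("geneB", 0.2), ("geneA_3", 0.4)]]
print(prune_paralogs(tree, {"geneA_1", "geneA_2", "geneA_3"}))
# [('geneA_2', 0.1), [('geneB', 0.2), ('geneA_3', 0.4)]]
```

Checking a clade before descending into it means the largest all-paralog
clade is the one that gets replaced, rather than its sub-clades one by one.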


Hope that helps,
Eric


From cjfields at illinois.edu  Tue Aug  9 20:09:05 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Aug 2011 15:09:05 -0500
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
Message-ID: <CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>

I'm reviving this thread to see what the current status is (if anything has changed).  The bioperl parser has the same problem; at the moment we're basically stuck until NCBI gives some indication as to whether this is a bug or not.  Any word back from them yet?

(and agreed, it would be nice to have an external bug tracker from NCBI).

chris

On May 4, 2011, at 5:36 AM, Peter Cock wrote:

> On Wed, May 4, 2011 at 10:59 AM, Michal <mictadlo at gmail.com> wrote:
>> Hi Peter,
>> Do you have the script which read
>> 
>> https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml
>> 
>> 
>> and what would be the correct output?
>> 
>> Thank you in advance.
>> 
>> Cheers,
>> Michal
> 
> Hi Michal,
> 
> I'm not quite sure what you're asking, but I'll try. First, the three
> data files:
> 
> $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml
> $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/four_human_proteins.fasta
> $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/rhodopsin_proteins.fasta
> 
> The query file has four sequences,
> 
> $ grep -c "^>" four_human_proteins.fasta
> 4
> 
> $ grep "^>" four_human_proteins.fasta
> >sp|Q9BS26|ERP44_HUMAN Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
> >sp|Q9NSY1|BMP2K_HUMAN BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
> >sp|P06213|INSR_HUMAN Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
> >sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
> 
> Based on past experience, I would expect 4 iteration blocks in the
> XML, but in this case I have 24:
> 
> $ grep "<Iteration>" -c blastp_four_human_vs_rhodopsin.xml
> 24
> 
> Notice we get 6 iterations for each query (4 times 6 is 24):
> 
> $ grep "<Iteration_query-ID>" blastp_four_human_vs_rhodopsin.xml
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|Q9NSY1|BMP2K_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P06213|INSR_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
>      <Iteration_query-ID>sp|P08100|OPSD_HUMAN</Iteration_query-ID>
> 
> Now, using the two FASTA files directly and re-running blastp, what do I get?
> 
> $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query
> four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5
> | grep "<Iteration>" -c
> 24
> 
> Or again with -parse_deflines, which changes how the hit ID/def is presented:
> 
> $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query
> four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5
> -parse_deflines | grep "<Iteration>" -c
> 24
> 
> How about older versions?
> 
> $ ~/Downloads/ncbi-blast-2.2.24+/bin/blastp -query
> four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5
> BLAST engine error: XML formatting is only supported for a database search
> 
> I'll have to make a blast database first...
> 
> $ ~/Downloads/ncbi-blast-2.2.24+/bin/makeblastdb -in
> rhodopsin_proteins.fasta -dbtype prot
> 
> Building a new DB, current time: 05/04/2011 11:22:57
> New DB name:   rhodopsin_proteins.fasta
> New DB title:  rhodopsin_proteins.fasta
> Sequence type: Protein
> Keep Linkouts: T
> Keep MBits: T
> Maximum file size: 1073741824B
> Adding sequences from FASTA; added 6 sequences in 0.105655 seconds.
> 
> $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query
> four_human_proteins.fasta -db rhodopsin_proteins.fasta -outfmt 5 |
> grep "<Iteration>" -c
> 4
> 
> Look - just four identifiers as I expect! This also works if the database
> is built with the -parse_seqids switch.
> 
> The same happens with older versions of BLAST+, one <Iteration>
> block per query, so four iteration blocks for this example. I tried all
> of 2.2.21+, 2.2.22+, 2.2.23+ and 2.2.24+ (running makeblastdb to
> give a fresh database, then blastp).
> 
> That seems to demonstrate that the bug is specific to the XML output
> from FASTA vs FASTA (not FASTA vs DB), which is a new feature
> in NCBI BLAST 2.2.25+.
> 
> I will raise this with the NCBI, and report back.
> 
> However, even if the NCBI fix it in the next release, we (Bio*) may
> want to update our parsers to cope with this quirk, or at least put a
> warning in our BLAST XML parser documentation, as there will be
> lots of installations of NCBI BLAST 2.2.25+ in the wild.
> 
> Peter
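
As a hedged illustration of the check a parser (or a user) could run, the
sketch below counts <Iteration> blocks per Iteration_query-ID using only the
Python standard library. It is not the Biopython, BioPerl or BioRuby parser
code; iterations_per_query is a hypothetical helper name, and EXAMPLE is a
tiny hand-written stand-in for real BLAST XML that keeps only the elements
needed for the count.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict


def iterations_per_query(xml_text):
    """Map each Iteration_query-ID to its number of <Iteration> blocks."""
    counts = OrderedDict()
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query_id = iteration.findtext("Iteration_query-ID")
        counts[query_id] = counts.get(query_id, 0) + 1
    return counts


# Hand-written stand-in for BLAST XML output (assumption: real files carry
# many more elements, which this count simply ignores).
EXAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-ID>q1</Iteration_query-ID></Iteration>
<Iteration><Iteration_query-ID>q1</Iteration_query-ID></Iteration>
<Iteration><Iteration_query-ID>q2</Iteration_query-ID></Iteration>
</BlastOutput_iterations></BlastOutput>"""

for query_id, n in iterations_per_query(EXAMPLE).items():
    if n > 1:
        print("%s has %d iteration blocks" % (query_id, n))
```

Any query reported with more than one block is a candidate for the FASTA vs
FASTA quirk described above; how a parser should merge such blocks is left
open here.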


From p.j.a.cock at googlemail.com  Wed Aug 10 09:15:18 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 10 Aug 2011 10:15:18 +0100
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
	<CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
Message-ID: <CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>

On Tue, Aug 9, 2011 at 9:09 PM, Chris Fields <cjfields at illinois.edu> wrote:
> I'm reviving this thread to see what the current status is (if anything has
> changed).  The bioperl parser has the same problem; at the moment we're
> basically stuck until NCBI gives some indication as to whether this is a
> bug or not.  Any word back from them yet?
>
> (and agreed, it would be nice to have an external bug tracker from NCBI).
>
> chris

Hi Chris,

My email to the NCBI on 17 May had a reply from Tao Tao (NCBI User
services) saying it would be brought to their developers' attention.

For reference, the email subject line was:
"Multiple iteration blocks per query in FASTA vs FASTA BLAST XML"

I have just emailed back to enquire if there is any news to report.

Peter


From cjfields at illinois.edu  Thu Aug 11 01:35:37 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Aug 2011 20:35:37 -0500
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
	<CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
	<CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>
Message-ID: <EA454327-8737-49D9-A9BC-F0A63F14CCB1@illinois.edu>

On Aug 10, 2011, at 4:15 AM, Peter Cock wrote:

> On Tue, Aug 9, 2011 at 9:09 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> I'm reviving this thread to see what the current status is (if anything has
>> changed).  The bioperl parser has the same problem; at the moment we're
>> basically stuck until NCBI gives some indication as to whether this is a
>> bug or not.  Any word back from them yet?
>> 
>> (and agreed, it would be nice to have an external bug tracker from NCBI).
>> 
>> chris
> 
> Hi Chris,
> 
> My email to the NCBI on 17 May had a reply from Tao Tao (NCBI User
> services) saying it would be brought to their developers' attention.
> 
> For reference, the email subject line was:
> "Multiple iteration blocks per query in FASTA vs FASTA BLAST XML"
> 
> I have just emailed back to enquire if there is any news to report.
> 
> Peter

Wonder if it's worth a second prod from someone else.  Sometimes that gets their attention.

chris


From p.j.a.cock at googlemail.com  Thu Aug 11 09:09:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 10:09:13 +0100
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <EA454327-8737-49D9-A9BC-F0A63F14CCB1@illinois.edu>
References: <BANLkTi=6_2bFpGhOwxtdjy-DzxUotVWxEg@mail.gmail.com>
	<BANLkTinH0y4KQ7_AXt7Ly3TgN9fXxErUzA@mail.gmail.com>
	<398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
	<4DC12371.3040204@gmail.com>
	<BANLkTinV4uha74Y9jC_f=XLK5LufAn1xHw@mail.gmail.com>
	<CFCCB538-DDE5-4167-8442-1C296EDE787C@illinois.edu>
	<CAKVJ-_6eQz=ZTDZHPEGOM12QvXN15OZGuQVOQNyJbW9XK05mPg@mail.gmail.com>
	<EA454327-8737-49D9-A9BC-F0A63F14CCB1@illinois.edu>
Message-ID: <CAKVJ-_7WU-N0aW6DXNObieW3kzQVGaM4oF6v9pH9SMOaiVQeEw@mail.gmail.com>

On Thu, Aug 11, 2011 at 2:35 AM, Chris Fields <cjfields at illinois.edu> wrote:
>> Hi Chris,
>>
>> My email to the NCBI on 17 May had a reply from Tao Tao (NCBI User
>> services) saying it would be brought to their developers' attention.
>>
>> For reference, the email subject line was:
>> "Multiple iteration blocks per query in FASTA vs FASTA BLAST XML"
>>
>> I have just emailed back to enquire if there is any news to report.
>>
>> Peter
>
> Wonder if it's worth a second prod from someone else.  Sometimes
> that gets their attention.
>
> chris

Tao replied yesterday morning (US time) to confirm the test files
so he (she?) could try this on the latest code.

Peter


From mok at bioxray.dk  Thu Aug 11 12:40:57 2011
From: mok at bioxray.dk (Morten Kjeldgaard)
Date: Thu, 11 Aug 2011 14:40:57 +0200
Subject: [Biopython] Unable to convert alignment to nexus format
Message-ID: <1313066457.10034.11.camel@yeti.daimi.au.dk>

Hi,

I am getting an exception when trying to output an alignment in nexus
format:

  ValueError: Need a DNA, RNA or Protein alphabet

The alignment is read by AlignIO.read() in fasta format from an output
file written by Muscle, and so the alphabet specified in the sequences
is IUPACProtein. Apparently, NexusIO checks for ProteinAlphabet and thus
fails. I am using BioPython 1.56.

Here is a 4-line test program generating the exception:

from Bio import AlignIO
alignment = AlignIO.read(open("aln.muscle"), "fasta")
g = open("aln.nexus", "w")
g.write (alignment.format("nexus"))


Any (safe) workarounds here?
Cheers,

Morten


-- 
Morten Kjeldgaard, asc. professor, MSc, PhD
BiRC - Bioinformatics Research Center, Aarhus University
C.F. Møllers Alle, Building 1110, DK-8000 Aarhus C, Denmark.
Lab +45 8942 3130 * Mobile: +45 5186 0147 * Home +45 8618 8180


From chapmanb at 50mail.com  Thu Aug 11 13:52:38 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 11 Aug 2011 09:52:38 -0400
Subject: [Biopython] Unable to convert alignment to nexus format
In-Reply-To: <1313066457.10034.11.camel@yeti.daimi.au.dk>
References: <1313066457.10034.11.camel@yeti.daimi.au.dk>
Message-ID: <20110811135238.GF3143@kunkel>

Morten;

> I am getting an exception when trying to output an alignment in nexus
> format:
> 
>   ValueError: Need a DNA, RNA or Protein alphabet
> 
> The alignment is read by AlignIO.read() in fasta format from an output
> file written by Muscle, and so the alphabet specified in the sequences
> is IUPACProtein. Apparently, NexusIO checks for ProteinAlphabet and thus
> fails. I am using BioPython 1.56.

If you specify the alphabet to AlignIO, Nexus will be happy. Here's
your modified test program:

from Bio import AlignIO
from Bio.Alphabet import IUPAC, Gapped
alignment = AlignIO.read(open("aln.muscle"), "fasta", alphabet=Gapped(IUPAC.protein))
g = open("aln.nexus", "w")
g.write (alignment.format("nexus"))

Hope this helps,
Brad


From p.j.a.cock at googlemail.com  Thu Aug 11 14:04:50 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 15:04:50 +0100
Subject: [Biopython] Unable to convert alignment to nexus format
In-Reply-To: <1313066457.10034.11.camel@yeti.daimi.au.dk>
References: <1313066457.10034.11.camel@yeti.daimi.au.dk>
Message-ID: <CAKVJ-_5+AHr6bjOVYoYCBEsgCt5YEx+9jFm8F4h8aHR3bGK=tw@mail.gmail.com>

On Thu, Aug 11, 2011 at 1:40 PM, Morten Kjeldgaard <mok at bioxray.dk> wrote:
> Hi,
>
> I am getting an exception when trying to output an alignment in nexus
> format:
>
>   ValueError: Need a DNA, RNA or Protein alphabet
>
> The alignment is read by AlignIO.read() in fasta format from an output
> file written by Muscle, and so the alphabet specified in the sequences
> is IUPACProtein.

Yes, but it was written out as a FASTA file which does not record
the alphabet. Biopython does not try to guess this, you must be
explicit.

> Apparently, NexusIO checks for ProteinAlphabet and thus
> fails. I am using BioPython 1.56.

As Brad described, when you parse the FASTA alignment, tell
Biopython it is protein.

Peter


From aaronquinlan at gmail.com  Fri Aug 12 18:39:34 2011
From: aaronquinlan at gmail.com (Aaron Quinlan)
Date: Fri, 12 Aug 2011 14:39:34 -0400
Subject: [Biopython] Working with genomic intervals
Message-ID: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>

All,

I apologize in advance if this is a naive question. I am wondering if BioPython provides libraries for working with genomic intervals in BED, GFF, or any other like format?  I am looking for libraries that handle the parsing of files in these formats into Python objects, as well as libraries for manipulating (intersection, merging, counting, etc.) intervals.  I know this exists in Galaxy's bx-python, but am wondering if there are similar libraries in BioPython?

Gratefully,
Aaron


From p.j.a.cock at googlemail.com  Sun Aug 14 11:11:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 14 Aug 2011 12:11:37 +0100
Subject: [Biopython]  Working with genomic intervals
In-Reply-To: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>
References: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>
Message-ID: <CAKVJ-_6QM0mB=uXhExmG66REYhtTed3+ukaqAGPyB2ujv6Aiag@mail.gmail.com>

On Friday, August 12, 2011, Aaron Quinlan <aaronquinlan at gmail.com> wrote:
> All,
>
> I apologize in advance if this is a naive question.
> I am wondering if BioPython provides libraries for
> working with genomic intervals in BED, GFF, or
> any other like format?  I am looking for libraries
> that handle the parsing of files in these formats
> into Python objects, as well as libraries for
> manipulating (intersection, merging, counting,
> etc.) intervals.  I know this exists in Galaxy's
> bx-python, but am wondering if there are similar
> libraries in BioPython?
>
> Gratefully,
> Aaron

Hi Aaron,

Have a look at http://biopython.org/wiki/GFF_Parsing
where Brad is working on this. He's also spoken
highly of bx-python as I recall.

Peter


From sdavis2 at mail.nih.gov  Sun Aug 14 11:48:15 2011
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 14 Aug 2011 07:48:15 -0400
Subject: [Biopython] Working with genomic intervals
In-Reply-To: <CAKVJ-_6QM0mB=uXhExmG66REYhtTed3+ukaqAGPyB2ujv6Aiag@mail.gmail.com>
References: <B46749AA-EE67-4061-8B20-021C61501A3E@gmail.com>
	<CAKVJ-_6QM0mB=uXhExmG66REYhtTed3+ukaqAGPyB2ujv6Aiag@mail.gmail.com>
Message-ID: <CANeAVB=WHK1QzBAmHS68eH-aueubvknAcVKVLL+p=rBVXiR3JQ@mail.gmail.com>

On Sun, Aug 14, 2011 at 7:11 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Friday, August 12, 2011, Aaron Quinlan <aaronquinlan at gmail.com> wrote:
>> All,
>>
>> I apologize in advance if this is a naive question.
>> I am wondering if BioPython provides libraries for
>> working with genomic intervals in BED, GFF, or
>> any other like format?  I am looking for libraries
>> that handle the parsing of files in these formats
>> into Python objects, as well as libraries for
>> manipulating (intersection, merging, counting,
>> etc.) intervals.  I know this exists in Galaxy's
>> bx-python, but am wondering if there are similar
>> libraries in BioPython?
>>
>> Gratefully,
>> Aaron
>
> Hi Aaron,
>
> Have a look at http://biopython.org/wiki/GFF_Parsing
> where Brad is working on this. He's also spoken
> highly of bx-python as I recall.

I would second the bx-python vote.  Not only are the "normal" interval
classes covered, but there are also some variants (clustering is one
that comes to mind).

Sean


From lgautier at gmail.com  Mon Aug 15 06:17:28 2011
From: lgautier at gmail.com (Laurent Gautier)
Date: Mon, 15 Aug 2011 08:17:28 +0200
Subject: [Biopython] Working with genomic intervals
In-Reply-To: <mailman.1.1313337602.7670.biopython@lists.open-bio.org>
References: <mailman.1.1313337602.7670.biopython@lists.open-bio.org>
Message-ID: <4E48B9F8.4070605@gmail.com>

On 2011-08-14 18:00, biopython-request at lists.open-bio.org wrote:
> On Sun, Aug 14, 2011 at 7:11 AM, Peter Cock<p.j.a.cock at googlemail.com>  wrote:
>> >  On Friday, August 12, 2011, Aaron Quinlan<aaronquinlan at gmail.com>  wrote:
>>> >>  All,
>>> >>
>>> >>  I apologize in advance if this is a naive question.
>>> >>  I am wondering if BioPython provides libraries for
>>> >>  working with genomic intervals in BED, GFF, or
>>> >>  any other like format?  I am looking for libraries
>>> >>  that handle the parsing of files in these formats
>>> >>  into Python objects, as well as libraries for
>>> >>  manipulating (intersection, merging, counting,
>>> >>  etc.) intervals.  I know this exists in Galaxy's
>>> >>  bx-python, but am wondering if there are similar
>>> >>  libraries in BioPython?
>>> >>
>>> >>  Gratefully,
>>> >>  Aaron
>> >
>> >  Hi Aaron,
>> >
>> >  Have a look at http://biopython.org/wiki/GFF_Parsing
>> >  where Brad is working on this. He's also spoken
>> >  highly of bx-python as I recall.
> I would second the bx-python vote.  Not only are the "normal" interval
> classes covered, but there are also some variants (clustering is one
> that comes to mind).
>
> Sean

One can also access from Python the utilities for ranges available in
Bioconductor, for example using the Bioconductor extension to rpy2 or rpy2
directly (perhaps using the dynamic class mapping features, as shown below):

from rpy2.robjects.packages import importr
iranges = importr("IRanges")
# Python class IRanges as an API to Bioconductor's IRanges::IRanges
from rpy2.robjects.methods import RS4, RS4Auto_Type
class IRanges(RS4):
     __metaclass__ = RS4Auto_Type
     __rpackagename__ = "IRanges"
     __rname__ = "IRanges"

# now in action

 >>> from rpy2.robjects.vectors import IntVector
 >>> ir = IRanges(iranges.IRanges(start = IntVector(range(10)), width = 11))
 >>> print(ir)
IRanges of length 10
      start end width
[1]      0  10    11
[2]      1  11    11
[3]      2  12    11
[4]      3  13    11
[5]      4  14    11
[6]      5  15    11
[7]      6  16    11
[8]      7  17    11
[9]      8  18    11
[10]     9  19    11
 >>> print(IRanges(ir.reduce__IRanges(ir)))
IRanges of length 1
     start end width
[1]     0  19    20
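
For comparison, the merge step can be sketched in plain Python with no R
dependency. This is only an illustration of the idea, not how IRanges
implements reduce; merge_intervals is a hypothetical helper name, intervals
are (start, end) pairs with inclusive ends, and merging directly adjacent
intervals is an assumption chosen to match the output above.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent closed intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Same ten ranges as above: starts 0..9, each of width 11.
ranges = [(start, start + 10) for start in range(10)]
print(merge_intervals(ranges))
```

Sorting first means a single pass is enough, since each interval only ever
needs to be compared with the most recently merged one.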


From aaronquinlan at gmail.com  Mon Aug 15 23:54:31 2011
From: aaronquinlan at gmail.com (Aaron Quinlan)
Date: Mon, 15 Aug 2011 19:54:31 -0400
Subject: [Biopython] Working with genomic intervals
In-Reply-To: <4E48B9F8.4070605@gmail.com>
References: <mailman.1.1313337602.7670.biopython@lists.open-bio.org>
	<4E48B9F8.4070605@gmail.com>
Message-ID: <10D5E6D5-7114-406C-A2CF-8EB211CCE8D2@gmail.com>

Dear Peter, Sean, and Laurent,
   Thanks so much for the useful suggestions.
Best,
Aaron


On Aug 15, 2011, at 2:17 AM, Laurent Gautier wrote:

> On 2011-08-14 18:00, biopython-request at lists.open-bio.org wrote:
>> On Sun, Aug 14, 2011 at 7:11 AM, Peter Cock<p.j.a.cock at googlemail.com>  wrote:
>>> >  On Friday, August 12, 2011, Aaron Quinlan<aaronquinlan at gmail.com>  wrote:
>>>> >>  All,
>>>> >>
>>>> >>  I apologize in advance if this is a naive question.
>>>> >>  I am wondering if BioPython provides libraries for
>>>> >>  working with genomic intervals in BED, GFF, or
>>>> >>  any other like format?  I am looking for libraries
>>>> >>  that handle the parsing of files in these formats
>>>> >>  into Python objects, as well as libraries for
>>>> >>  manipulating (intersection, merging, counting,
>>>> >>  etc.) intervals.  I know this exists in Galaxy's
>>>> >>  bx-python, but am wondering if there are similar
>>>> >>  libraries in BioPython?
>>>> >>
>>>> >>  Gratefully,
>>>> >>  Aaron
>>> >
>>> >  Hi Aaron,
>>> >
>>> >  Have a look at http://biopython.org/wiki/GFF_Parsing
>>> >  where Brad is working on this. He's also spoken
>>> >  highly of bx-python as I recall.
>> I would second the bx-python vote.  Not only are the "normal" interval
>> classes covered, but there are also some variants (clustering is one
>> that comes to mind).
>> 
>> Sean
> 
> One can also access from Python the utilities for ranges available in
> Bioconductor, for example using the Bioconductor extension to rpy2 or rpy2
> directly (perhaps using the dynamic class mapping features, as shown below):
> 
> from rpy2.robjects.packages import importr
> iranges = importr("IRanges")
> # Python class IRanges as an API to Bioconductor's IRanges::IRanges
> from rpy2.robjects.methods import RS4, RS4Auto_Type
> class IRanges(RS4):
>    __metaclass__ = RS4Auto_Type
>    __rpackagename__ = "IRanges"
>    __rname__ = "IRanges"
> 
> # now in action
> 
> >>> from rpy2.robjects.vectors import IntVector
> >>> ir = IRanges(iranges.IRanges(start = IntVector(range(10)), width = 11))
> >>> print(ir)
> IRanges of length 10
>     start end width
> [1]      0  10    11
> [2]      1  11    11
> [3]      2  12    11
> [4]      3  13    11
> [5]      4  14    11
> [6]      5  15    11
> [7]      6  16    11
> [8]      7  17    11
> [9]      8  18    11
> [10]     9  19    11
> >>> print(IRanges(ir.reduce__IRanges(ir)))
> IRanges of length 1
>    start end width
> [1]     0  19    20
> 
> 


From brandonjbreitling at gmail.com  Wed Aug 17 21:44:21 2011
From: brandonjbreitling at gmail.com (Brandon Breitling)
Date: Wed, 17 Aug 2011 21:44:21 +0000 (UTC)
Subject: [Biopython] Question on your Methods in Enzymology paper
References: <i2vb34be8bd1004071757lc24e582aweed55444a3abd8cf@mail.gmail.com>
Message-ID: <loom.20110817T233009-686@post.gmane.org>


Hi Mr. Lunt,

My name is Brandon Breitling and I'm a statistics 
graduate student in the United States.  I was 
wondering if you had the scripts or code available
from your "Inference of Direct Residue Contacts
in Two-Component Signaling" paper.  I'm trying 
to see if I can do the same for a eukaryotic
protein pair that my lab studies.

I have created the concatenated strings dataset
for my protein as described in your paper and 
have attempted to make scripts for the MI steps
but would really benefit if I could get
them for all the steps in the Direct Coupling
analysis.  If you could also email me the 
accession number for your dataset so that I 
can verify that I have the scripts working, 
that would be most appreciated as well. 

Regards,
Brandon Breitling


From p.j.a.cock at googlemail.com  Wed Aug 17 22:21:44 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 23:21:44 +0100
Subject: [Biopython] Question on your Methods in Enzymology paper
In-Reply-To: <loom.20110817T233009-686@post.gmane.org>
References: <i2vb34be8bd1004071757lc24e582aweed55444a3abd8cf@mail.gmail.com>
	<loom.20110817T233009-686@post.gmane.org>
Message-ID: <CAKVJ-_5jp_2X-zqFDOnL68TpeQBB5BJWMHQ5nH9Fi3fsep_daQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 10:44 PM, Brandon Breitling wrote:
>
> Hi Mr. Lunt,
>
> My name is Brandon Breitling and I'm a statistics
> graduate student in the United States.  I was
> wondering if you had the scripts or code available
> from your "Inference of Direct Residue Contacts
> in Two-Component Signaling" paper.  I'm trying
> to see if I can do the same for a eukaryotic
> protein pair that my lab studies.
>
> I have created the concatenated strings dataset
> for my protein as described in your paper and
> have attempted to make scripts for the MI steps
> but would really benefit if I could get
> them for all the steps in the Direct Coupling
> analysis.  If you could also email me the
> accession number for your dataset so that I
> can verify that I have the scripts working,
> that would be most appreciated as well.
>
> Regards,
> Brandon Breitling

Hi Brandon,

It looks like you've mixed up your email addresses.

As it happens I did my PhD on TCS, and used
Biopython's Bio.PDB module to get crude distances
from a PDB complex (and also looked at MI). I'm
not sure if I've read this paper though...
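
For anyone unsure what the MI step involves, here is a minimal sketch of
mutual information between two alignment columns, using only the standard
library. It is not the code from the paper or from my thesis; the function
name and the use of natural logarithms (nats) are assumptions made for this
illustration, and no sequence weighting or pseudocounts are applied.

```python
import math
from collections import Counter


def mutual_information(column_a, column_b):
    """Mutual information (in nats) between two equal-length columns."""
    if len(column_a) != len(column_b):
        raise ValueError("columns must have the same length")
    n = float(len(column_a))
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions.
        mi += p_ab * math.log(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Two perfectly co-varying columns, and two unrelated ones.
print(mutual_information("AACC", "DDEE"))
print(mutual_information("AACC", "DEDE"))
```

Perfectly co-varying columns score highest and unrelated columns score
zero, which is why MI is a natural first pass before a direct coupling
analysis.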

Bryan Lunt, Hendrik Szurmant, Andrea Procaccini,
James A. Hoch, Terence Hwa and Martin Weigt
"Chapter Two -  Inference of Direct Residue Contacts
in Two-Component Signaling". Methods in Enzymology
Volume 471, 2010, Pages 17-41
http://dx.doi.org/10.1016/S0076-6879(10)71002-8

Peter


From lunt at ctbp.ucsd.edu  Thu Aug 18 16:46:05 2011
From: lunt at ctbp.ucsd.edu (Bryan Lunt)
Date: Thu, 18 Aug 2011 09:46:05 -0700
Subject: [Biopython] Biopython Digest, Vol 104, Issue 10
In-Reply-To: <mailman.1.1313683203.22074.biopython@lists.open-bio.org>
References: <mailman.1.1313683203.22074.biopython@lists.open-bio.org>
Message-ID: <CAEdCOMNaQTqfn8sTVVWgDvrxLimXPOv5HGrZVt392nL0ukMrSg@mail.gmail.com>

Oh!
Yeah, we used BioPython extensively, but I thought I sent Brandon the
code already...
We have a decent module for getting distances from Bio.PDB, though
unfortunately it uses far too much disk space (it outputs a large
text file with every residue compared to every other residue, allowing
AWK or some other tool to filter the file).
We also have a large set of tools for creating putative pairings, mainly for
TCS, but of course generalized to pair any set of protein domains...
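
To illustrate the disk space point, here is a rough sketch (not our actual
module) of filtering residue pairs by distance in memory, so that only the
close pairs are ever written out. The coordinates are made up, close_pairs
is a hypothetical helper name, and the dictionary of residue number to
(x, y, z) is an assumed input format; with Bio.PDB you would fill it from
atom coordinates yourself.

```python
import math
from itertools import combinations


def close_pairs(coords, cutoff):
    """Yield (id_a, id_b, distance) for residue pairs within the cutoff."""
    for (id_a, xyz_a), (id_b, xyz_b) in combinations(sorted(coords.items()), 2):
        distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(xyz_a, xyz_b)))
        if distance <= cutoff:
            yield id_a, id_b, distance


# Made-up coordinates (in Angstroms) keyed by hypothetical residue numbers.
example = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (30.0, 0.0, 0.0)}
for pair in close_pairs(example, cutoff=8.0):
    print(pair)
```

Because the function is a generator, the full all-against-all table never
has to exist on disk or in memory at once.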

-Bryan



From p.j.a.cock at googlemail.com  Thu Aug 18 19:32:57 2011
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 18 Aug 2011 20:32:57 +0100
Subject: [Biopython] Biopython 1.58 released
Message-ID: <75327C54-CF88-43BC-BACF-87139456FE67@googlemail.com>

Dear All,

Biopython 1.58 is out:
http://news.open-bio.org/news/2011/08/biopython-1-58-released/

Thank you to everyone who has contributed.

Peter

P.S. We're on Twitter as @Biopython