[Biopython] Follow-up about python/biopython code for submitting multiple jobs to clustalw

Michael Thon mike.thon at gmail.com
Thu Jan 23 16:31:50 UTC 2014


Hi Edson - It sounds like you have many alignments concatenated together in one file. You may want to keep each of your loci (i.e. each orthologous set of DNA or protein sequences) in its own file, one file per family. I think you will find it easier to do your alignment and tree-building operations on them that way. For each locus, make a protein file and a transcript file, both in FASTA format; each file would have four sequences in it. Then it's simple to loop through the contents of a directory and call a command-line program on each file. You may not even need Python for all the steps.
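
As a rough illustration of the "loop through a directory" step, here is a minimal sketch using only the Python standard library. The directory layout, the `*.fasta` pattern, the `{path}` placeholder convention and the function names are all assumptions made up for this example; substitute the real program and whatever arguments it expects.

```python
import subprocess
from pathlib import Path


def commands_for_directory(directory, argument_template, pattern="*.fasta"):
    """Build one command (a list of arguments) per matching file.

    Every "{path}" in argument_template is replaced by the file's path.
    The placeholder convention is an assumption of this sketch.
    """
    files = sorted(Path(directory).glob(pattern))
    return [
        [arg.replace("{path}", str(path)) for arg in argument_template]
        for path in files
    ]


def run_on_each_file(directory, argument_template, pattern="*.fasta"):
    """Run the program once per file and stop if any run fails."""
    for command in commands_for_directory(directory, argument_template, pattern):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(command, check=True)


# Hypothetical usage, with one FASTA file per locus under loci/ and a
# placeholder program name:
#     run_on_each_file("loci", ["my_aligner", "{path}"])
```

Building the command list separately from running it makes it easy to print the commands first and check them before anything is executed.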


On Jan 23, 2014, at 10:57 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Jan 23, 2014 at 7:39 AM, Edson Ishengoma
> <ishengomae at nm-aist.ac.tz> wrote:
>> Hi all,
>> 
>> I didn't get a response to the question I asked a few days ago. I presume
>> either the question was poorly posed or my approach to what I want to do
>> is totally out of touch with mainstream bioinformatics. I am a newbie to
>> both Python programming and bioinformatics, but I believe there are people
>> here who can help, so I will try again with more background.
>> 
>> My overall goal is to perform selection analyses on multiple species with
>> codeml in PAML. For this the inputs should be both the sequence alignments
>> and tree files. I already have the sequence file (produced by pal2nal),
>> but I still need a corresponding tree file.
>> 
>> My difficulty is that my nucleotide alignment file contains the CDS of
>> four species at many loci (it is more or less whole-genome data), so I
>> will have to submit a job to a tree-building program for each alignment.
>> I can use clustalw or Phylip.
> 
> If you haven't already, try to get some advice from a phylogenetics
> specialist about what to do. For example, clustalw is old and superseded.
> 
> You have 4 species, and (say) 50 genes/loci from each. One approach
> is to make 50 protein alignments (one for each set of four genes),
> turn these into 50 codon-aware nucleotide alignments (with pal2nal
> or similar, e.g. [1]), then you could use Biopython to combine these
> into a single large concatenated alignment (4 rows for the 4 species),
> and use that to build a tree.
> 
> This may not be the best plan, but one of our students here did
> something like this recently (using Biopython in part).
> 
> Peter
> [1] https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
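
The concatenation step Peter describes (many per-locus alignments joined into one alignment with 4 rows) can be sketched as follows. This is a dependency-free illustration, not the script his student used. It assumes each per-locus file is an aligned FASTA file and that every species has the same record id in every file; the function names are invented for the example. In Biopython the same idea can be expressed by reading each file with Bio.AlignIO and adding the alignment objects together.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict of {record id: sequence}."""
    records = {}
    current = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first word after ">"
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def concatenate_alignments(paths):
    """Join per-locus alignments column-wise, matching rows by record id."""
    combined = {}
    for path in paths:
        alignment = read_fasta(path)
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{path}: rows differ in length, not an alignment")
        if combined and set(alignment) != set(combined):
            raise ValueError(f"{path}: record ids do not match earlier loci")
        for name, seq in alignment.items():
            combined[name] = combined.get(name, "") + seq
    return combined
```

Matching rows by id rather than by position means the species may appear in a different order in each file, at the cost of requiring consistent ids across loci.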




