[Biopython] Follow-up about python/biopython code for submitting multiple jobs to clustalw

Peter Cock p.j.a.cock at googlemail.com
Thu Jan 23 09:57:50 UTC 2014


On Thu, Jan 23, 2014 at 7:39 AM, Edson Ishengoma
<ishengomae at nm-aist.ac.tz> wrote:
> Hi all,
>
> I didn't get a response to the question I asked a few days ago. I
> presume it was either poorly worded, or my approach is totally out of
> touch with mainstream bioinformatics. The thing is, I am a newbie to both
> Python programming and bioinformatics, but I believe there are people here
> who can help, so I will try again with more background.
>
> My overall goal is to perform selection analyses on multiple species
> with codeml in PAML. This needs two inputs: the sequence alignments and
> the tree files. I already have the sequence alignment file (produced by
> pal2nal), but I still need a corresponding tree file.
>
> The challenge is that my nucleotide alignment file contains the CDS of
> four species at many loci (it is essentially whole-genome data), so I
> would have to submit a separate job to a tree-building program for each
> alignment. I can use clustalw or Phylip.

If you haven't already, try to get some advice from a phylogenetics
specialist about what to do. For example, clustalw is old and superseded.

You have 4 species, and (say) 50 genes/loci from each. One approach
is to make 50 protein alignments (one for each set of four genes),
turn these into 50 codon-aware nucleotide alignments (with pal2nal
or similar, e.g. [1]), then you could use Biopython to combine these
into a single large concatenated alignment (4 rows for the 4 species),
and use that to build a tree.
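
A minimal sketch of the two alignment steps described above, written in
plain Python so it runs without extra dependencies. It assumes each locus
is held as a dict mapping a species name to its sequence (a hypothetical
layout; in practice you would load the files with Bio.AlignIO).
back_translate() threads an ungapped CDS onto its aligned protein, one
codon per residue and "---" per gap. That is the general idea behind
pal2nal-style tools, but it is not the code from [1]. concatenate() then
joins the codon alignments end to end, matching rows by species name, so
each species becomes one row of the combined alignment. Both function
names are illustrative.

```python
from typing import Dict, List


def back_translate(protein_aln: str, cds: str) -> str:
    """Thread an ungapped CDS onto an aligned protein sequence.

    Each residue consumes the next codon of the CDS; each gap becomes
    '---', so the nucleotide alignment stays in codon frame.
    """
    ungapped = protein_aln.replace("-", "")
    # Assumption: the CDS encodes exactly the protein, optionally followed
    # by one stop codon (which is not copied into the output).
    if len(cds) not in (3 * len(ungapped), 3 * len(ungapped) + 3):
        raise ValueError("CDS length does not match protein length")
    codons = []
    pos = 0
    for residue in protein_aln:
        if residue == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-locus alignments into one alignment, one row per species."""
    species = sorted(alignments[0])
    combined: Dict[str, List[str]] = {name: [] for name in species}
    for locus in alignments:
        # Every locus must have exactly the same set of species.
        if sorted(locus) != species:
            raise ValueError("every locus must contain the same species")
        # All rows within one locus must already be aligned (equal length).
        if len({len(seq) for seq in locus.values()}) != 1:
            raise ValueError("sequences within a locus must be aligned")
        for name in species:
            combined[name].append(locus[name])
    return {name: "".join(parts) for name, parts in combined.items()}


# Usage with two tiny made-up loci for two species.
prot = {"sp1": "MK-L", "sp2": "M-KL"}
cds = {"sp1": "ATGAAACTG", "sp2": "ATGAAGTTA"}
locus1 = {name: back_translate(prot[name], cds[name]) for name in prot}
locus2 = {"sp1": "ATGCCC", "sp2": "ATGCCA"}
supermatrix = concatenate([locus1, locus2])
for name, seq in supermatrix.items():
    print(name, seq)
```

In Biopython itself, MultipleSeqAlignment objects with the same number of
rows can also be joined column-wise with the + operator, which pairs rows
by position. You would therefore want to sort each alignment's records by
species ID before adding them.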

This may not be the best plan, but one of our students here did
something like this recently (using Biopython in part).

Peter
[1] https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py