[Biopython] Follow-up about python/biopython code for submitting multiple jobs to clustalw

Thu Jan 23 18:01:47 UTC 2014

Thanks Michael,
Yes, I have many orthologous alignments (about 20,000 genes, typical of
mammalian genomes anyway). I had initially thought of keeping them in
separate files, but I hesitated because of the computer memory cost of
writing the files. So thanks for reinforcing my thought that it can be a
viable option.

Regards,

Edson B. Ishengoma
PhD-Candidate
School of Life Sciences and Engineering
Nelson Mandela African Institute of Science and Technology
Nelson Mandela Road
P. O. Box 447, Arusha
Tanzania (255)

ishengomae at nm-aist.ac.tz, ebarongo82 at yahoo.co.uk
Mobile: +255 762 348 037, +255 714 789 360
Website: www.nm-aist.ac.tz
Skype: edson.ishengoma

On Thu, Jan 23, 2014 at 7:31 PM, Michael Thon <mike.thon at gmail.com> wrote:

> Hi Edson - It sounds like you have many alignments concatenated together
> in one file.  You may want to keep each of your loci (a.k.a. orthologous
> sets of DNA or protein sequences) in a separate file, one per family.  I
> think you will find it easier to do your alignment and tree building
> operations on them.  For each locus, make a protein file and a transcript
> file, both in FASTA format; each file would have four sequences in it.
> Then it's simple to loop through the contents of a directory and call a
> command line program on each file.  You may not even need Python for all
> the steps.
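
The directory loop described above could be sketched in Python roughly as
follows. This is only an illustration under assumptions: the function name
`run_on_each_file`, the `build_argv` callback and the `*.fasta` file naming
are hypothetical and do not come from the thread, and the program to call
and its options depend on what is installed.

```python
import subprocess
from pathlib import Path


def run_on_each_file(input_dir, build_argv, pattern="*.fasta"):
    """Call an external program once per matching file in input_dir.

    build_argv is a hypothetical callback that receives a Path and
    returns the argument list to execute for that file.
    Returns a dict mapping each file name to the program's exit code.
    """
    exit_codes = {}
    for path in sorted(Path(input_dir).glob(pattern)):
        completed = subprocess.run(
            build_argv(path),
            capture_output=True,  # keep the tool's output off the console
            text=True,
        )
        exit_codes[path.name] = completed.returncode
    return exit_codes
```

For ClustalW the callback might return something like
`["clustalw2", "-INFILE=" + str(path), "-TREE"]`, but the executable name
and options are assumptions that should be checked against the installed
version. A non-zero exit code in the returned dict marks a file that needs
a closer look.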
>
>
> On Jan 23, 2014, at 10:57 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>
> On Thu, Jan 23, 2014 at 7:39 AM, Edson Ishengoma
> <ishengomae at nm-aist.ac.tz> wrote:
>
> Hi all,
>
> I couldn't get a response to the question I asked a few days ago. I
> presume it was either a poorly submitted question or my approach to what
> I want to do is totally out of touch with mainstream bioinformatics. The
> thing is, I am a newbie to both Python programming and bioinformatics, but
> I believe there are people here who can help, so I will try again with
> more background.
>
> The overall goal is to perform selection analyses on multiple species
> with codeml in PAML. For this, the inputs should be both the sequence
> alignments and tree files. I already have the sequence file (produced by
> pal2nal), but I still need a corresponding tree file.
>
> So the challenge is that my nucleotide alignment file contains the CDS of
> four species at many loci (it is kind of whole genome data), so I will
> have to submit a job to a tree-producing program for each alignment. I
> can use clustalw or Phylip.
>
>
> If you haven't already, try to get some advice from a phylogenetics
> specialist about what to do. For example, clustalw is old and superseded.
>
> You have 4 species, and (say) 50 genes/loci from each. One approach
> is to make 50 protein alignments (one for each set of four genes),
> turn these into 50 codon-aware nucleotide alignments (with pal2nal
> or similar, e.g. [1]), then you could use Biopython to combine these
> into a single large concatenated alignment (4 rows for the 4 species),
> and use that to build a tree.
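
The concatenation step described above could look roughly like the sketch
below. It uses only the Python standard library rather than Biopython, so
it is a stand-in for the idea and not the approach the student used. It
assumes (this is not stated in the thread) that every per-locus file is an
aligned FASTA file in which each record id is the species name; the
function names `read_fasta` and `concatenate_alignments` are hypothetical.

```python
def read_fasta(path):
    """Return {record_id: sequence} for one FASTA file."""
    records = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the first word of the header is the species name.
                current = line[1:].split()[0]
                records[current] = []
            elif current is not None:
                records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def concatenate_alignments(paths):
    """Join per-locus alignments end to end, one row per species."""
    combined = {}
    for path in paths:
        locus = read_fasta(path)
        # Every row of an alignment must have the same number of columns.
        if len({len(seq) for seq in locus.values()}) != 1:
            raise ValueError(f"{path}: rows differ in length")
        # Every locus must contain exactly the same set of species.
        if combined and set(locus) != set(combined):
            raise ValueError(f"{path}: species ids do not match earlier loci")
        for species, seq in locus.items():
            combined[species] = combined.get(species, "") + seq
    return combined
```

With 4 species and 50 loci, the result would hold 4 entries, each as long
as the 50 alignments laid end to end, which can then be written out as the
single input for tree building.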
>
> This may not be the best plan, but one of our students here did
> something like this recently (using Biopython in part).
>
> Peter
> [1] https://github.com/peterjc/picobio/blob/master/align/align_back_trans.py
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython