[Biopython] clustalw align multiple sequences to reference.

Peter Cock p.j.a.cock at googlemail.com
Wed Jul 13 21:44:55 UTC 2011


On Wednesday, July 13, 2011, Daniel Jones <lawson.jones at gmail.com> wrote:
> Hi Biopython users,
> I have a file with many (~50,000) 200 bp sequences, each of which I would
> like to align to a fixed reference sequence. I *don't* care about aligning
> all 50,000 sequences with each other; I only care about aligning each one
> with the reference sequence. I can't figure out a way to do this without
> generating 50,000 files, which seems like ridiculous unnecessary overhead.
> It seems like ClustalW's interface is quite inflexible in demanding separate
> input and output files for each alignment, but I don't have much experience
> using it so maybe I'm completely missing something.
>
> Incidentally, I'm not wedded to the idea of using ClustalW, so if there's an
> alternate alignment program that would make this easier, I'd certainly be
> open to trying it.
>
> Thanks,
> Daniel Jones

You need a pairwise alignment tool. Perhaps needle (global) or water (local)
from the EMBOSS suite, or Biopython's pairwise2 module, would be suitable
(pairwise2 is not covered in the tutorial, so read the API docs).
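Each read only ever needs a one-against-one alignment with the reference, and nothing forces that to go through files. Here is a minimal pure-Python sketch of global (Needleman-Wunsch) alignment, the algorithm EMBOSS needle implements, applied to the reads one at a time in memory. The scoring values (match +1, mismatch -1, linear gap -2) and the helper names `global_align` and `align_all` are illustrative assumptions. They are not needle's or pairwise2's defaults or API. In pure Python this is far too slow for 50,000 reads. It only shows the shape of the loop. For real data you would use pairwise2 (or Biopython's newer `Bio.Align.PairwiseAligner`) or needle.

```python
from typing import Iterable, Iterator, List, Tuple

# Illustrative scoring scheme (an assumption, not any tool's defaults):
# +1 for a match, -1 for a mismatch, -2 per gap position (linear gap penalty).
MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def global_align(ref: str, read: str) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment of read against ref.

    Returns (score, aligned_ref, aligned_read), with '-' marking gaps.
    """
    n, m = len(ref), len(read)
    # score[i][j] = best score for aligning ref[:i] with read[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(ref[i - 1], read[j - 1]),
                score[i - 1][j] + GAP,  # gap in the read
                score[i][j - 1] + GAP,  # gap in the reference
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(
            ref[i - 1], read[j - 1]
        ):
            top.append(ref[i - 1])
            bottom.append(read[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            top.append(ref[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(read[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def align_all(
    ref: str, reads: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, int, str, str]]:
    """Align each (name, sequence) read to ref one at a time; no temporary files."""
    for name, seq in reads:
        s, aligned_ref, aligned_read = global_align(ref, seq)
        yield name, s, aligned_ref, aligned_read


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = [("read1", "ACGTACGT"), ("read2", "ACGACGT")]
    for name, s, aligned_ref, aligned_read in align_all(reference, reads):
        print(name, s)
        print(aligned_ref)
        print(aligned_read)
```

Because `align_all` takes any iterable of (name, sequence) pairs, you could feed it straight from a FASTA file with Biopython, e.g. `((rec.id, str(rec.seq)) for rec in SeqIO.parse("reads.fasta", "fasta"))` (the file name is hypothetical). Then only one record is in memory at a time and no per-read files are written.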

However, as Eric suggested, an NGS alignment tool might be more appropriate.

Peter


