[Biopython] clustalw align multiple sequences to reference.

Eric Talevich eric.talevich at gmail.com
Wed Jul 13 20:38:42 UTC 2011


On Wed, Jul 13, 2011 at 2:40 PM, Daniel Jones <lawson.jones at gmail.com> wrote:

> Hi Biopython users,
> I have a file with many (~50,000) 200 bp sequences, each of which I would
> like to align to a fixed reference sequence. I *don't* care about aligning
> all 50,000 sequences with each other; I only care about aligning each one
> with the reference sequence. I can't figure out a way to do this without
> generating 50,000 files, which seems like ridiculous unnecessary overhead.
> It seems like ClustalW's interface is quite inflexible in demanding
> separate input and output files for each alignment, but I don't have much
> experience using it so maybe I'm completely missing something.
>
> Incidentally, I'm not wedded to the idea of using ClustalW, so if there's
> an alternate alignment program that would make this easier, I'd certainly
> be open to trying it.
>
Are these reads from a sequencing run? If so, then a short-read aligner
such as BWA or Bowtie might be what you want:

http://bio-bwa.sourceforge.net/
http://bowtie-bio.sourceforge.net/index.shtml

If not, then you could try BLAST with your reference sequence as the query
and the short sequences as your database. The database is built from a
single FASTA file, so you would not need one file per sequence.
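
To illustrate the underlying idea of aligning each sequence to the reference
in memory, with no per-sequence files, here is a minimal sketch in plain
Python. It is a textbook Needleman-Wunsch global alignment, not what BWA,
Bowtie, BLAST, or ClustalW do internally. The function names (global_align,
align_all) and the scoring values (match=1, mismatch=-1, gap=-2) are
assumptions made up for this example.

```python
# Sketch only: pairwise global alignment of each read against one reference,
# entirely in memory. Scoring values are illustrative assumptions.

def global_align(ref, read, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with a linear gap penalty.

    Returns (score, aligned_ref, aligned_read).
    """
    n, m = len(ref), len(read)
    # score[i][j] = best score for aligning ref[:i] with read[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == read[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two bases
                score[i - 1][j] + gap,      # gap in the read
                score[i][j - 1] + gap,      # gap in the reference
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    a_ref, a_read = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == read[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a_ref.append(ref[i - 1])
            a_read.append(read[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a_ref.append(ref[i - 1])
            a_read.append("-")
            i -= 1
        else:
            a_ref.append("-")
            a_read.append(read[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a_ref)), "".join(reversed(a_read))


def align_all(ref, reads):
    """Yield (name, score, aligned_ref, aligned_read) for each (name, seq)."""
    for name, seq in reads:
        yield (name,) + global_align(ref, seq)


if __name__ == "__main__":
    reference = "ACGT"                      # toy data, not from the thread
    reads = [("read1", "ACGT"), ("read2", "AGT")]
    for name, score, a_ref, a_read in align_all(reference, reads):
        print(name, score, a_ref, a_read)
```

Because align_all accepts any iterable of (name, sequence) pairs, the reads
could come straight from a parser reading the one input file. Pure Python
will be slow for 50,000 sequences of 200 bp, so for real data the dedicated
tools above are the more practical choice.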

Cheers,
Eric


