[BioPython] How to use the Bio.Cluster module to assemble DNA sequences

Bruno Santos bsantos at biocant.pt
Thu Mar 27 17:33:55 UTC 2008


Hi,

This question is a little more generic, so I don't know whether anyone on
the mailing list can help me.

I have a FASTA file with thousands of reads obtained from a sequencing run.
I know this file contains several copies of the same sequences, but their
lengths and some of their nucleotides can differ. I therefore need to group
them by clustering so that I can then create a consensus sequence for each
group.

I am trying to achieve this by aligning all the sequences with clustalw-mpi
and then running dnadist from PHYLIP to obtain a matrix of distances between
the sequences. Now I need to cluster the sequences based on these distances,
and I am trying to use Bio.Cluster for that. Can anyone help me choose which
clustering method to use, and explain how to submit this kind of data to
that method?
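
To make the question concrete, here is a minimal sketch of how such a
distance matrix might be handed to Bio.Cluster. It rests on several
assumptions: the dnadist output is in PHYLIP's square format, sequence names
contain no whitespace, and Biopython with NumPy is installed. The helper
names read_phylip_square and cluster_reads, the sample matrix, and the choice
of average linkage (method="a") are illustrative only.

```python
# Hypothetical sample in PHYLIP square format (made-up distances).
EXAMPLE = """3
readA  0.0 0.1 0.9
readB  0.1 0.0 0.8
readC  0.9 0.8 0.0
"""


def read_phylip_square(text):
    """Parse a square PHYLIP distance matrix into (names, rows).

    Assumes names contain no whitespace; rows may wrap over several lines.
    """
    tokens = text.split()
    n = int(tokens[0])  # first token is the number of sequences
    names, rows = [], []
    pos = 1
    for _ in range(n):
        names.append(tokens[pos])
        rows.append([float(t) for t in tokens[pos + 1 : pos + 1 + n]])
        pos += 1 + n
    return names, rows


def cluster_reads(rows, nclusters):
    """Hierarchically cluster a precomputed distance matrix.

    Returns one cluster index per sequence. Average linkage is an
    assumption here, not a recommendation.
    """
    import numpy as np
    from Bio.Cluster import treecluster  # needs Biopython and NumPy

    # treecluster may rearrange the matrix it is given, so pass a fresh copy.
    matrix = np.array(rows, dtype=float)
    tree = treecluster(distancematrix=matrix, method="a")
    return tree.cut(nclusters)


names, rows = read_phylip_square(EXAMPLE)
```

With a real run, something like
read_phylip_square(open("outfile").read()) followed by
cluster_reads(rows, nclusters=10) would give one cluster index per read;
"outfile" is dnadist's default output name, and the number of clusters is a
guess that would need tuning.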

 

Sincerely,

Bruno Santos



