[BioPython] FW: How to Bio.cluster Module

Michiel de Hoon mjldehoon at yahoo.com
Sat Mar 29 23:01:33 EDT 2008


I agree with Bruno that Bio.Cluster is probably not the best tool for this kind of alignment. If you do want to use Bio.Cluster, you first have to decide how to define the distance or similarity between sequences. Then create a distance matrix that stores all these distances, and apply Cluster.treecluster to this distance matrix to get a hierarchical clustering of the sequences:

>>> from Bio import Cluster
>>> # Your distance matrix (lower-left triangle, one row per sequence):
>>> # distance between seq1 and seq2 is 2.0
>>> # distance between seq1 and seq3 is 3.0
>>> # distance between seq2 and seq3 is 4.0
>>> d = [[], [2.0], [3.0, 4.0]]
>>> print(Cluster.treecluster(distancematrix=d))
(1, 0): 2
(2, -1): 4
# First, join seq1 and seq2 at distance 2.0.
# Then, join seq3 with the node (seq1, seq2) at distance 4.0.
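
The step of filling in the distance matrix could look roughly like the sketch below. It is only an illustration: the helper names mismatch_fraction and lower_triangle are hypothetical (they are not part of Bio.Cluster), the three sequences are made up, and the distance used (fraction of mismatching columns between two sequences that are already aligned to the same length) is just one possible choice. Distances from another source, such as dnadist, would be arranged in the same lower-triangular layout.

```python
def mismatch_fraction(a, b):
    """Hypothetical distance: fraction of aligned columns that differ.

    Assumes both sequences are non-empty and come from the same
    alignment, so they have the same length.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / float(len(a))


def lower_triangle(seqs, distance=mismatch_fraction):
    """Build the lower-left triangle of the distance matrix.

    Row i holds the distances from sequence i to sequences 0..i-1,
    so row 0 is empty, as in d = [[], [2.0], [3.0, 4.0]].
    """
    return [[distance(seqs[i], seqs[j]) for j in range(i)]
            for i in range(len(seqs))]


# Made-up aligned sequences, for illustration only.
aligned = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
d = lower_triangle(aligned)
# d is [[], [0.125], [0.25, 0.125]]
```

A matrix built this way has the same shape as d in the session above, so it can be passed to Cluster.treecluster(distancematrix=d) in the same manner.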

--Michiel


> Hi,
> I don't know the details of your sequences, but for the assembly that you want 
> to do there could be better methods. I have done this kind of assembly with 
> EST sequences, and for that purpose I have used cap3 or tgicl.
> Best regards,

> Jose Blanca

Bruno Santos <bsantos at biocant.pt> wrote:

Hi,

This question is a little bit more generic, so I really don't know if anyone
on the mailing list can help me.

I have a FASTA file with thousands of reads obtained from a sequencing run. In
this file I know I have several copies of the same sequences, but their
length and some of their nucleotides can differ. So I need to group them
together using clustering so that I can create a consensus sequence for each
group.

I am trying to achieve this by aligning all the sequences using clustalw-mpi
and then running dnadist from PHYLIP to obtain a matrix of distances between
the sequences. Now I need to use clustering to group the sequences based on
these values, and for that I am trying to use Bio.Cluster.
Can anyone help me choose the clustering method I should use, and tell me how
to submit this kind of data to that method?

 

Sincerely,

Bruno Santos
