[BioPython] Bio.Cluster - Howto, Documentation, exporting results

Wed Mar 26 14:11:49 UTC 2008

Thanks again for all the clarifications.

This was very helpful.

Renato

Michiel de Hoon wrote:
> > Additionally, as of BioPython-1.44, there are a couple of things
> > mentioned in the documentation that are not available in Bio.Cluster.
> > One of those is the Bio.Cluster.read function. I don't know if this is
> > because it was not yet in BioPython-1.44 or if the documentation is
> > outdated.
>
> Some changes were made in Bio.Cluster in Biopython 1.45. These are 
> largely cosmetic to make Bio.Cluster more consistent with other 
> modules in Biopython. One of them is the read() function, which was 
> added in Biopython 1.45. I have now updated the documentation for 
> Bio.Cluster on the Biopython website; it corresponds to Biopython 1.45.
>
> > I don't read the data from files, so I don't understand whether the
> > DataFile class is what I need and, if it is, how to make use of it.
>
> > What I'm trying to do is to calculate the distances between some
> > multidimensional vectors and then cluster them. I managed to do that,
> > but then I don't know what to do with the Tree object I get. It's also
> > not obvious how to keep track of which values in the Tree object
> > correspond to which entries in the distance matrix or in the
> > original data.
>
> The values in the Tree object, if non-negative, simply correspond to 
> the row number in the distance matrix. If negative, they correspond to 
> a node number. So if the Tree object is
> [1, 2]   --> This is node # -1
> [-1, 0]  --> This is node # -2
> then first row 1 and row 2 in the distance matrix are joined, and then 
> row 0 in the distance matrix is joined to the node [1,2].
>
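To make the index convention above concrete, here is a minimal sketch in plain Python. It does not use Bio.Cluster at all: the tree is assumed to have been copied into a list of (left, right) pairs, for example from the left and right values of each node, and the helper names describe_merges and leaves are hypothetical, not part of Biopython.

```python
# Sketch only: a tree is represented as a plain list of (left, right) pairs.
# Non-negative values are row numbers; negative values refer to earlier nodes,
# where node -k is the k-th pair in the list (node -1 is pairs[0], and so on).

def describe_merges(pairs, labels):
    """Return one (node_id, left_name, right_name) tuple per merge step."""
    def name(index):
        if index >= 0:
            return labels[index]       # a row of the distance matrix
        return "node %d" % index       # a node created by an earlier merge
    steps = []
    for position, (left, right) in enumerate(pairs):
        node_id = -(position + 1)      # first pair is node -1, second is node -2
        steps.append((node_id, name(left), name(right)))
    return steps


def leaves(pairs, index, labels):
    """Return the labels of all rows gathered under a row or node index."""
    if index >= 0:
        return [labels[index]]
    left, right = pairs[-index - 1]    # node -k is stored at position k - 1
    return leaves(pairs, left, labels) + leaves(pairs, right, labels)


# The two-node example from the message above.
pairs = [(1, 2), (-1, 0)]
labels = ["Row0", "Row1", "Row2"]
merge_steps = describe_merges(pairs, labels)
rows_under_root = leaves(pairs, -2, labels)
```

Keeping a list of labels in the same order as the rows of the distance matrix is enough to translate every value in the tree back to the original data.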
> > Is it possible to pass text in the original data so that it is used as
> > some sort of identifying header in later operations?
>
> Instead of relying on the row numbers, you can also create an empty 
> Bio.Cluster.Record object and fill this object with the data you have. 
> Bio.Cluster.Record is essentially the same as Bio.Cluster.DataFile, 
> just the name was changed for consistency with other Biopython 
> modules. It may be a good idea to look at the documentation of Cluster 
> 3 at
> http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/manual/index.html
> to understand what all the fields in Bio.Cluster.Record are.
>
> Another way is to construct a file in memory and let Bio.Cluster.read 
> parse it.
> >>> lines = "Start\tCol0\tCol1\tCol2\nRow0\t2.0\t1.2\t3.4\nRow1\t5.0\t6.2\t7.1\nRow2\t2.3\t5.6\t1.2\n"
> >>> print lines
> Start   Col0    Col1    Col2
> Row0    2.0     1.2     3.4
> Row1    5.0     6.2     7.1
> Row2    2.3     5.6     1.2
> >>> import StringIO
> >>> handle = StringIO.StringIO(lines)
> >>> from Bio import Cluster
> >>> record = Cluster.read(handle)
> >>> tree = record.treecluster()
>
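When the vectors are already in memory, the tab-delimited text shown above can be generated instead of typed by hand. The following is only a sketch: the function name to_cluster_text and its parameters are hypothetical, and the "Start" corner label is simply copied from the example above.

```python
# Sketch only: build the tab-delimited text for an in-memory "file" from
# row names, column names and a list of numeric rows.

def to_cluster_text(row_ids, col_ids, rows, corner="Start"):
    """Return tab-delimited text with one header line and one line per row."""
    if len(row_ids) != len(rows):
        raise ValueError("need exactly one row id per data row")
    out = ["\t".join([corner] + list(col_ids))]
    for row_id, values in zip(row_ids, rows):
        if len(values) != len(col_ids):
            raise ValueError("row %s has the wrong number of values" % row_id)
        out.append("\t".join([row_id] + [str(v) for v in values]))
    return "\n".join(out) + "\n"


text = to_cluster_text(
    ["Row0", "Row1", "Row2"],
    ["Col0", "Col1", "Col2"],
    [[2.0, 1.2, 3.4], [5.0, 6.2, 7.1], [2.3, 5.6, 1.2]],
)
```

The resulting string can then be wrapped in a StringIO handle and parsed in the same way as the hand-written one.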
> > How can I export the Tree object to something like the treeview format
> > mentioned in the documentation?
>
> >>> record.save("myfilename", tree)
>
> > Is there any way to visualize the tree directly using ASCII or something
> > more graphical?
>
> Currently, there is no ASCII-art-like representation to visualize the
> tree. So the easiest solution is to save the clustering solution in
> the treeview format, and use Java TreeView to visualize it.
>
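For a quick look at a small tree without leaving the interpreter, a nested-parentheses string can be built by hand. This is only a sketch and not a feature of Bio.Cluster: it assumes the tree has been copied into a list of (left, right) pairs using the index convention described earlier in this message, and the function name to_nested is hypothetical.

```python
# Sketch only: render a tree, given as a list of (left, right) pairs, as a
# nested-parentheses string. Non-negative values are row numbers; negative
# values refer to earlier nodes (node -k is stored at position k - 1).

def to_nested(pairs, labels, index=None):
    """Return a string such as "((Row1, Row2), Row0)" for the given node."""
    if index is None:
        index = -len(pairs)            # the last node created is the root
    if index >= 0:
        return labels[index]
    left, right = pairs[-index - 1]
    return "(%s, %s)" % (to_nested(pairs, labels, left),
                         to_nested(pairs, labels, right))


nested = to_nested([(1, 2), (-1, 0)], ["Row0", "Row1", "Row2"])
```

This shows only the grouping, not the node distances, so Java TreeView remains the better choice for anything larger.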
> --Michiel.
>