[BioPython] Bio.Cluster - Howto, Documentation, exporting results
Renato Alves
rjalves at igc.gulbenkian.pt
Wed Mar 26 14:11:49 UTC 2008
Thanks again for all the clarifications.
This was very helpful.
Renato
Michiel de Hoon wrote:
> > Additionally, as of BioPython-1.44, there are a couple of things
> > mentioned in the documentation that are not available in Bio.Cluster.
> > One of those is the Bio.Cluster.read function. I don't know if this is
> > because it was not yet in BioPython-1.44 or if the documentation is
> > outdated.
>
> Some changes were made in Bio.Cluster in Biopython 1.45. These are
> largely cosmetic to make Bio.Cluster more consistent with other
> modules in Biopython. One of them is the read() function, which was
> added in Biopython 1.45. I have now updated the documentation for
> Bio.Cluster on the Biopython website; it corresponds to Biopython 1.45.
>
> > I don't read the data from files, so I don't know whether the DataFile
> > class is what I need or, if it is, how to make use of it.
>
> > What I'm trying to do is to calculate the distances between some
> > multidimensional vectors and then cluster them. I managed to do that,
> > but then I don't know what to do with the Tree object I get. It's also
> > not obvious how to keep track of which values in the Tree object
> > correspond to which entries in the distance matrix or in the
> > original data.
>
> The values in the Tree object, if non-negative, simply correspond to
> the row number in the distance matrix. If negative, they correspond to
> a node number. So if the Tree object is
> [1, 2] --> This is node # -1
> [-1, 0] --> This is node # -2
> then rows 1 and 2 of the distance matrix are joined first, and then
> row 0 of the distance matrix is joined to the node [1, 2].
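
The numbering convention described above can be sketched in plain Python. This is only an illustration, not part of Bio.Cluster: the tree is written as a list of (left, right) pairs, and the function name node_members and the labels list are made up for the example. With a real Tree object you would take the pair from each node instead.

```python
def node_members(tree, labels):
    """For each node, list the labels of all rows merged under it.

    tree   -- list of (left, right) pairs; a non-negative value is a row
              number, a negative value -k refers to the k-th node created
    labels -- one label per row of the distance matrix (hypothetical names)
    """
    members = []
    for left, right in tree:
        merged = []
        for item in (left, right):
            if item >= 0:
                # Non-negative: a row of the distance matrix.
                merged.append(labels[item])
            else:
                # Negative: node -1 is members[0], node -2 is members[1], ...
                merged.extend(members[-item - 1])
        members.append(merged)
    return members


# The two-node example from the message above.
tree = [(1, 2), (-1, 0)]
labels = ["Row0", "Row1", "Row2"]
for number, rows in enumerate(node_members(tree, labels), start=1):
    print("node -%d: %s" % (number, ", ".join(rows)))
```

Keeping a list of labels in the same order as the rows of the distance matrix is enough to translate every value in the tree back to the original data.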
>
> > Is it possible to pass text in the original data so that it is used as
> > some sort of identifying header in later operations?
>
> Instead of relying on the row numbers, you can also create an empty
> Bio.Cluster.Record object and fill this object with the data you have.
> Bio.Cluster.Record is essentially the same as Bio.Cluster.DataFile,
> just the name was changed for consistency with other Biopython
> modules. It may be a good idea to look at the documentation of Cluster
> 3 at
> http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/manual/index.html
> to understand what all the fields in Bio.Cluster.Record are.
>
> Another way is to construct a file in memory and let Bio.Cluster.read
> parse it.
> >>> from Bio import Cluster
> >>> lines = "Start\tCol0\tCol1\tCol2\nRow0\t2.0\t1.2\t3.4\nRow1\t5.0\t6.2\t7.1\nRow2\t2.3\t5.6\t1.2\n"
> >>> print lines
> Start Col0 Col1 Col2
> Row0 2.0 1.2 3.4
> Row1 5.0 6.2 7.1
> Row2 2.3 5.6 1.2
> >>> import StringIO
> >>> handle = StringIO.StringIO(lines)
> >>> record = Cluster.read(handle)
> >>> tree = record.treecluster()
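
If the vectors are already in memory, the tab-delimited text does not have to be typed by hand. The following is a rough sketch that builds the same string from lists. The helper name to_cluster_text and its arguments are invented for this example, and it uses io.StringIO, which is the Python 3 counterpart of the StringIO module used in the session above.

```python
import io


def to_cluster_text(row_labels, column_labels, rows, corner="Start"):
    """Build tab-delimited text: a header line, then one line per row."""
    lines = ["\t".join([corner] + list(column_labels))]
    for label, values in zip(row_labels, rows):
        lines.append("\t".join([label] + [str(value) for value in values]))
    return "\n".join(lines) + "\n"


text = to_cluster_text(
    ["Row0", "Row1", "Row2"],
    ["Col0", "Col1", "Col2"],
    [[2.0, 1.2, 3.4], [5.0, 6.2, 7.1], [2.3, 5.6, 1.2]],
)
handle = io.StringIO(text)

# With Biopython installed, the handle can then be parsed as in the
# session above:
#     from Bio import Cluster
#     record = Cluster.read(handle)
```

The row labels end up in the first column of the text, so they travel with the data instead of being tracked separately by row number.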
>
> > How can I export the Tree object to something like the treeview format
> > mentioned in the documentation?
>
> >>> record.save("myfilename", tree)
>
> > Is there any way to visualize the tree directly using ASCII or
> > something more graphical?
>
> Currently, there is no ASCII-art-like representation to visualize the
> tree. So the easiest solution is to save the clustering solution in
> the treeview format, and use Java TreeView to visualize it.
>
> --Michiel.
>