hosted by CEDAR HepForge
Data Mining in HEP, kmeans, cmeans (fuzzy) cluster algorithm

JMinHEP - data mining framework for high-energy physics

Version 1.0, April 2006 (c) copyright S.Chekanov (chekanov@mail.desy.de)


JMinHEP is a framework for clustering analysis, i.e. for unsupervised learning in which the classification process does not depend on a priori information. The program is a pure Java application and includes the following algorithms:
  1. K-means clustering analysis (single and multi pass)
  2. C-means (fuzzy) algorithm
  3. Agglomerative hierarchical clustering
  4. More will be included soon
More information can be found on en.wikipedia.org or in this tutorial. The algorithms can run in a fixed-cluster mode or in a best-estimate mode, i.e. when the number of clusters is not given a priori but is found after estimating the cluster compactness. The data points can be defined in a multidimensional space. At present, the distance measure is Euclidean.
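
As a rough illustration of the K-means idea with a Euclidean distance, here is a minimal, self-contained Java sketch. It is not JMinHEP code: the class and method names (KMeansSketch, cluster, compactness) are hypothetical, and the compactness used here, the mean squared distance of the points to their assigned centers, is an assumption, since this page does not define how JMinHEP computes it.

```java
import java.util.Arrays;

/** Minimal k-means sketch; not JMinHEP code. All names here are hypothetical. */
class KMeansSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Index of the center closest to point p. */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (distance(p, centers[k]) < distance(p, centers[best])) best = k;
        }
        return best;
    }

    /**
     * Runs k-means from the given seed centers (modified in place) and
     * returns the cluster index assigned to each point.
     */
    static int[] cluster(double[][] data, double[][] centers, int maxIter) {
        int[] labels = new int[data.length];
        Arrays.fill(labels, -1);
        int dim = data[0].length;
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach every point to its nearest center
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {
                int k = nearest(data[i], centers);
                if (k != labels[i]) { labels[i] = k; changed = true; }
            }
            if (!changed) break; // converged: no point switched cluster
            // Update step: move each center to the mean of its points
            double[][] sums = new double[centers.length][dim];
            int[] counts = new int[centers.length];
            for (int i = 0; i < data.length; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) sums[labels[i]][d] += data[i][d];
            }
            for (int k = 0; k < centers.length; k++) {
                if (counts[k] == 0) continue; // an empty cluster keeps its old center
                for (int d = 0; d < dim; d++) centers[k][d] = sums[k][d] / counts[k];
            }
        }
        return labels;
    }

    /** Assumed compactness: mean squared distance of points to their assigned center. */
    static double compactness(double[][] data, double[][] centers, int[] labels) {
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            double d = distance(data[i], centers[labels[i]]);
            sum += d * d;
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        // Toy data invented for this illustration: two well-separated groups
        double[][] data = { {1.0, 1.0}, {1.2, 0.8}, {0.9, 1.1}, {5.0, 5.0}, {5.1, 4.9}, {4.8, 5.2} };
        double[][] centers = { {0.0, 0.0}, {6.0, 6.0} }; // seed positions
        int[] labels = cluster(data, centers, 100);
        System.out.println("labels = " + Arrays.toString(labels));
        System.out.println("compactness = " + compactness(data, centers, labels));
    }
}
```

In a fixed-cluster mode the number of seeds is chosen by the user, as in the main method above. A best-estimate mode could repeat such a run for several cluster counts and compare a compactness measure, but the exact criterion JMinHEP applies is not described here.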

Download: JMinHEP-1.0.tar.gz
Then unzip and untar it (tar -zvxf JMinHEP-[version].tar.gz under Linux/Unix). This creates a directory, jminhep, containing the JMinHEP.jar file. You can run it as usual, i.e. as java -jar JMinHEP.jar


The program can be run in two ways:
  1. In GUI mode. In this mode one can set all clustering parameters via the user interface, and the output is displayed. The data can be loaded in the Attribute-Relation File Format (ARFF). The cluster centers and the seed positions can also be shown.
  2. In embedded mode, i.e. called from your own Java application.

JMinHEP GUI mode

Just run it as:

java -jar JMinHEP-[version].jar

and load any ARFF file. Some example files are provided: iris.arff and my.arff. You can also load the data from the command line:

java -jar JMinHEP.jar iris.arff
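
For readers unfamiliar with ARFF: a file consists of a header (@relation and @attribute declarations, case-insensitive) followed by an @data section with comma-separated rows, and lines starting with % are comments. The sketch below is a minimal, self-contained reader for that layout. It is not the loader used by JMinHEP; the class name ArffSketch and the toy data are hypothetical, and it assumes that every attribute is numeric, so files with nominal or string attributes would need extra handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal ARFF reader sketch; handles only numeric attributes. Not JMinHEP code. */
class ArffSketch {

    /** Parses the rows that follow @data, returning each row as an array of doubles. */
    static List<double[]> parse(String text) {
        List<double[]> rows = new ArrayList<>();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank line or comment
            if (!inData) {
                // Header section: @relation and @attribute lines are skipped in this sketch
                if (line.toLowerCase().startsWith("@data")) inData = true;
                continue;
            }
            String[] fields = line.split(",");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Toy data set, invented for this illustration
        String sample =
            "% two points in a two-dimensional space\n" +
            "@relation toy\n" +
            "@attribute x numeric\n" +
            "@attribute y numeric\n" +
            "@data\n" +
            "1.0,1.0\n" +
            "5.0,5.2\n";
        for (double[] row : parse(sample)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```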

Screenshot of JMinHEP


JMinHEP embedded

You can include JMinHEP.jar in your application. Look at the example application located in the "example" directory. There is also another example, for the C-means algorithm, which prints the membership matrix. You need to add JMinHEP.jar to the Java classpath to compile it.

In short, you need just this statement in your code:

import jminhep.clanalyse.*;

Then load the data into the dataHolder. The Partition class does the clustering. You can then run any cluster algorithm, depending on the input mode (the correct mode is shown in the status bar of the GUI). You can access all output information by calling the methods getName(), getCompactness(), getNclusters(), getCenters() and getClusterNumber(). The example program runs over all possible clustering modes and then prints the final result. Read the API documentation to learn more about the Partition, dataPoint and dataHolder classes.
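
The C-means example mentioned above prints a membership matrix. To give an idea of what such a matrix contains, the sketch below computes the textbook fuzzy C-means memberships for fixed cluster centers, u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), where d_ik is the Euclidean distance from point i to center k and m > 1 is the fuzziness parameter. This is only the membership step of the algorithm, written independently of JMinHEP; the class name FuzzyMembershipSketch, the method names and the value of m are assumptions, and the JMinHEP implementation may differ.

```java
/** Fuzzy c-means membership sketch; not JMinHEP code. All names are hypothetical. */
class FuzzyMembershipSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns u[i][k], the degree to which point i belongs to cluster k,
     * for fixed centers and fuzziness m > 1. Each row sums to 1.
     */
    static double[][] membership(double[][] data, double[][] centers, double m) {
        double[][] u = new double[data.length][centers.length];
        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < data.length; i++) {
            // A point lying exactly on a center belongs fully to that center
            int onCenter = -1;
            for (int k = 0; k < centers.length; k++) {
                if (distance(data[i], centers[k]) == 0.0) { onCenter = k; break; }
            }
            if (onCenter >= 0) {
                u[i][onCenter] = 1.0; // all other entries stay at 0
                continue;
            }
            for (int k = 0; k < centers.length; k++) {
                double dik = distance(data[i], centers[k]);
                double sum = 0.0;
                for (int j = 0; j < centers.length; j++) {
                    sum += Math.pow(dik / distance(data[i], centers[j]), exponent);
                }
                u[i][k] = 1.0 / sum;
            }
        }
        return u;
    }

    public static void main(String[] args) {
        // Toy data invented for this illustration
        double[][] data = { {1.0, 0.0}, {1.5, 0.0}, {2.0, 0.0}, {3.0, 0.0} };
        double[][] centers = { {1.0, 0.0}, {3.0, 0.0} };
        double[][] u = membership(data, centers, 2.0);
        for (double[] row : u) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Unlike K-means, where each point has a single cluster number, every point here receives a weight for each cluster; a point midway between two centers gets equal membership in both.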

Note: JMinHEP is not completely free software. Read the JMinHEP License. The package is based on the free JFreeChart package by Object Refinery Limited and Contributors. JFreeChart is licensed under the terms of the GNU Lesser General Public Licence (LGPL).

You can send your algorithms to me for inclusion, provided you follow the coding standard set by dataHolder and Partitioner.

This program is a standalone version of my package JHepWork, a Java data-analysis framework that I am developing for the International Linear Collider (ILC). You can also look at my other software packages.

S.Chekanov (chekanov@mail.desy.de)

Last updated: Thu Jan 10 21:03:51 2008