latest version v1.9 - last update 10 Apr 2010 |
A class providing the DBScan for clustering. More...
#include <ltiDBScan.h>
Classes | |
class | parameters |
the parameters for the class DBScan More... | |
Public Member Functions | |
DBScan () | |
DBScan (const DBScan &other) | |
virtual | ~DBScan () |
virtual const char * | getTypeName () const |
DBScan & | copy (const DBScan &other) |
DBScan & | operator= (const DBScan &other) |
virtual classifier * | clone () const |
const parameters & | getParameters () const |
virtual bool | train (const dmatrix &input, ivector &ids) |
virtual bool | train (const dmatrix &input) |
virtual bool | classify (const dvector &feature, outputVector &result) const |
Protected Member Functions | |
virtual bool | dbScan (dKdTree &setOfPoints) |
void | cluster (dNode *nPtr, dKdTree *setOfPoints) |
bool | expandCluster (dKdTree *setOfPoints, dElement *point, int clusterID) |
bool | allElementsClustered (dNode *nPtr) |
Protected Attributes | |
double | eps |
int | minPts |
int | clusterID |
dKdTree | setOfPoints |
int | buckedSize |
A class providing the DBScan for clustering.
DBScan is a density-based clustering algorithm, which is designed to find clusters of abitrary shape.
Using one point the algorithm searches for minPts in the eps-neighborhood of this point. If not enough points are found, they are classified as "NOISE" (Value = 0) else it tries to expand this found cluster by using each found point for another search and assigns the current cluster ID to each point. If then not enough points are found an new cluster is created using an not jet classified point. The clustering finishes if all points are clussified, that means either having a cluster ID or marked as "NOISE".
The original algorithm uses a R*-Tree as internal data struckture, this implementation uses the lti::kdTree. The trees provide an efficient method for searching points in an eps-neighborhood.
DBScan requires three parameters:
The eps is the critical parameter of this algorithm, because the number of clusters depends on it. A good value would be the density of the least dense cluster. But it is very hard to get this information on advance. Normally one does not know the distribution of the points in the space. If no cluster is found, all points are marked as noise!
The type T describes the distance measure used in the internal data structure. The Default value is l2SquareDistance
The original DBScan algorithm was introduced in:
Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96)
URL: http://www.dbs.informatik.uni-muenchen.de/Publikationen/Papers/ KDD-96.final.frame.ps
lti::DBScan< T >::DBScan | ( | ) |
default constructor reads and sets the parameters
lti::DBScan< T >::DBScan | ( | const DBScan< T > & | other | ) |
copy constructor
other | the object to be copied |
virtual lti::DBScan< T >::~DBScan | ( | ) | [virtual] |
destructor
bool lti::DBScan< T >::allElementsClustered | ( | dNode * | nPtr | ) | [protected] |
check if elements of node have been clustered, if the node is a leaf returns true if all elements are clusterd, else false
virtual bool lti::DBScan< T >::classify | ( | const dvector & | feature, | |
outputVector & | result | |||
) | const [virtual] |
classify
feature | vector to be classified | |
result | result as described above |
Implements lti::unsupervisedClassifier.
virtual classifier* lti::DBScan< T >::clone | ( | ) | const [virtual] |
returns a pointer to a clone of this clustering.
Implements lti::classifier.
void lti::DBScan< T >::cluster | ( | dNode * | nPtr, | |
dKdTree * | setOfPoints | |||
) | [protected] |
The dbScan algorithm.
walks rekursively through the tree, if one point ist found, the clustering is started
DBScan& lti::DBScan< T >::copy | ( | const DBScan< T > & | other | ) |
copy data of "other" clustering.
other | the clustering to be copied |
Reimplemented from lti::clustering.
virtual bool lti::DBScan< T >::dbScan | ( | dKdTree & | setOfPoints | ) | [protected, virtual] |
bool lti::DBScan< T >::expandCluster | ( | dKdTree * | setOfPoints, | |
dElement * | point, | |||
int | clusterID | |||
) | [protected] |
expandCluster tries to expand the cluster found, with respect of eps, minPts, density-reachability and desity-connectivity
setOfPoints | pointer to the used kd-Tree | |
point | one point/element of the data stored in the kd-Tree | |
clusterID | the ID of the current cluster returns true if the cluster could be expanded |
const parameters& lti::DBScan< T >::getParameters | ( | ) | const |
returns used parameters
Reimplemented from lti::clustering.
virtual const char* lti::DBScan< T >::getTypeName | ( | ) | const [virtual] |
returns the name of this type ("DBScan")
Reimplemented from lti::clustering.
DBScan& lti::DBScan< T >::operator= | ( | const DBScan< T > & | other | ) |
alias for copy member
other | the clustering to be copied |
Reimplemented from lti::classifier.
virtual bool lti::DBScan< T >::train | ( | const dmatrix & | input | ) | [virtual] |
Unsupervised training.
The vectors in the input
matrix will be clustered using each specific method. Train will return false, if no cluster is found.
Implements lti::clustering.
virtual bool lti::DBScan< T >::train | ( | const dmatrix & | input, | |
ivector & | ids | |||
) | [virtual] |
Unsupervised training.
The vectors in the input
matrix will be put into groups according to the training algorithm. Additionally, an integer indicating the class each point belongs to is returned.
By default this method uses the other train method train(const dmatrix&) and then calls classify(const dvector&,outputVector&) to get the ids for each train vector. These ids are then returned. Train will return false, if no cluster is found and all points are marked as noise!
input | the matrix with the input vectors (each row is a training vector) | |
ids | vector of class ids for each input point |
Reimplemented from lti::clustering.
int lti::DBScan< T >::buckedSize [protected] |
bucked-size of the kd-tree number of elements stored in one leaf default value: 2
int lti::DBScan< T >::clusterID [protected] |
clusterID
double lti::DBScan< T >::eps [protected] |
eps neighbourhood
int lti::DBScan< T >::minPts [protected] |
minPts
dKdTree lti::DBScan< T >::setOfPoints [protected] |
kd-tree used for effective search on elements internal used data strukture