lti::DBScan< T > Class Template Reference

A class providing the DBScan for clustering. More...

#include <ltiDBScan.h>

Inheritance diagram for lti::DBScan< T >:

[legend]

Collaboration diagram for lti::DBScan< T >:

[legend]

List of all members.

Classes
class	parameters
	the parameters for the class DBScan More...
Public Member Functions
	DBScan ()
	DBScan (const DBScan &other)
virtual	~DBScan ()
virtual const char *	getTypeName () const
DBScan &	copy (const DBScan &other)
DBScan &	operator= (const DBScan &other)
virtual classifier *	clone () const
const parameters &	getParameters () const
virtual bool	train (const dmatrix &input, ivector &ids)
virtual bool	train (const dmatrix &input)
virtual bool	classify (const dvector &feature, outputVector &result) const
Protected Member Functions
virtual bool	dbScan (dKdTree &setOfPoints)
void	cluster (dNode nPtr, dKdTree setOfPoints)
bool	expandCluster (dKdTree setOfPoints, dElement point, int clusterID)
bool	allElementsClustered (dNode *nPtr)
Protected Attributes
double	eps
int	minPts
int	clusterID
dKdTree	setOfPoints
int	buckedSize

Detailed Description

template<class T = l2SquareDistantor<vector<double> >>
class lti::DBScan< T >

A class providing the DBScan for clustering.

DBScan is a density-based clustering algorithm, which is designed to find clusters of abitrary shape.

Using one point the algorithm searches for minPts in the eps-neighborhood of this point. If not enough points are found, they are classified as "NOISE" (Value = 0) else it tries to expand this found cluster by using each found point for another search and assigns the current cluster ID to each point. If then not enough points are found an new cluster is created using an not jet classified point. The clustering finishes if all points are clussified, that means either having a cluster ID or marked as "NOISE".

The original algorithm uses a R*-Tree as internal data struckture, this implementation uses the lti::kdTree. The trees provide an efficient method for searching points in an eps-neighborhood.

DBScan requires three parameters:

buckedSize the number of elements stored in each leaf of the kd-Tree. The default value is 2.
minPts the minimum number of points which have to be found in the eps neighborhood. The default value is 4.
eps IMPORTANT: The size of the eps-neighborhood used for searching. This parameter MUST be set by the user. The default value is 0 and if it isn't change, the algorithms returns false.

The eps is the critical parameter of this algorithm, because the number of clusters depends on it. A good value would be the density of the least dense cluster. But it is very hard to get this information on advance. Normally one does not know the distribution of the points in the space. If no cluster is found, all points are marked as noise!

The type T describes the distance measure used in the internal data structure. The Default value is l2SquareDistance

The original DBScan algorithm was introduced in:

Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96)

URL: http://www.dbs.informatik.uni-muenchen.de/Publikationen/Papers/ KDD-96.final.frame.ps

Constructor & Destructor Documentation

template<class T = l2SquareDistantor<vector<double> >>

lti::DBScan< T >::DBScan ( )

default constructor reads and sets the parameters

template<class T = l2SquareDistantor<vector<double> >>

lti::DBScan< T >::DBScan ( const DBScan< T > & other )

copy constructor

Parameters:

other

the object to be copied

template<class T = l2SquareDistantor<vector<double> >>

virtual lti::DBScan< T >::~DBScan ( ) [virtual]

destructor

Member Function Documentation

template<class T = l2SquareDistantor<vector<double> >>

bool lti::DBScan< T >::allElementsClustered ( dNode * nPtr ) [protected]

check if elements of node have been clustered, if the node is a leaf returns true if all elements are clusterd, else false

template<class T = l2SquareDistantor<vector<double> >>

virtual bool lti::DBScan< T >::classify	(	const dvector &	feature,
		outputVector &	result
	)			const `[virtual]`

classify

Parameters:

	feature	vector to be classified
	result	result as described above

Returns:: true if successful, false otherwise

Implements lti::unsupervisedClassifier.

template<class T = l2SquareDistantor<vector<double> >>

virtual classifier* lti::DBScan< T >::clone ( ) const [virtual]

returns a pointer to a clone of this clustering.

Implements lti::classifier.

template<class T = l2SquareDistantor<vector<double> >>

void lti::DBScan< T >::cluster	(	dNode *	nPtr,
		dKdTree *	setOfPoints
	)			`[protected]`

The dbScan algorithm.

walks rekursively through the tree, if one point ist found, the clustering is started

template<class T = l2SquareDistantor<vector<double> >>

DBScan& lti::DBScan< T >::copy ( const DBScan< T > & other )

copy data of "other" clustering.

Parameters:

other

the clustering to be copied

Returns:: a reference to this clustering object

Reimplemented from lti::clustering.

template<class T = l2SquareDistantor<vector<double> >>

virtual bool lti::DBScan< T >::dbScan ( dKdTree & setOfPoints ) [protected, virtual]

DBScan algorithm working with the kdTree as data struckture.

Parameters:

setOfPoints

a kd-Tree containing the data to be clustered

template<class T = l2SquareDistantor<vector<double> >>

bool lti::DBScan< T >::expandCluster	(	dKdTree *	setOfPoints,
		dElement *	point,
		int	clusterID
	)			`[protected]`

expandCluster tries to expand the cluster found, with respect of eps, minPts, density-reachability and desity-connectivity

Parameters:

	setOfPoints	pointer to the used kd-Tree
	point	one point/element of the data stored in the kd-Tree
	clusterID	the ID of the current cluster returns true if the cluster could be expanded

template<class T = l2SquareDistantor<vector<double> >>

const parameters& lti::DBScan< T >::getParameters ( ) const

returns used parameters

Reimplemented from lti::clustering.

template<class T = l2SquareDistantor<vector<double> >>

virtual const char* lti::DBScan< T >::getTypeName ( ) const [virtual]

returns the name of this type ("DBScan")

Reimplemented from lti::clustering.

template<class T = l2SquareDistantor<vector<double> >>

DBScan& lti::DBScan< T >::operator= ( const DBScan< T > & other )

alias for copy member

Parameters:

other

the clustering to be copied

Returns:: a reference to this clustering object

Reimplemented from lti::classifier.

template<class T = l2SquareDistantor<vector<double> >>

virtual bool lti::DBScan< T >::train ( const dmatrix & input ) [virtual]

Unsupervised training.

The vectors in the input matrix will be clustered using each specific method. Train will return false, if no cluster is found.

Parameters:

input

the matrix with the input vectors (each row is a training vector)

Returns:: true if successful, false otherwise. (if false you can check the error message with getStatusString())

Implements lti::clustering.

template<class T = l2SquareDistantor<vector<double> >>

virtual bool lti::DBScan< T >::train	(	const dmatrix &	input,
		ivector &	ids
	)			`[virtual]`

Unsupervised training.

The vectors in the input matrix will be put into groups according to the training algorithm. Additionally, an integer indicating the class each point belongs to is returned.

By default this method uses the other train method train(const dmatrix&) and then calls classify(const dvector&,outputVector&) to get the ids for each train vector. These ids are then returned. Train will return false, if no cluster is found and all points are marked as noise!

Parameters:

	input	the matrix with the input vectors (each row is a training vector)
	ids	vector of class ids for each input point

Returns:: true if successful, false otherwise. (if false you can check the error message with getStatusString())

Reimplemented from lti::clustering.

Member Data Documentation

template<class T = l2SquareDistantor<vector<double> >>

int lti::DBScan< T >::buckedSize [protected]

bucked-size of the kd-tree number of elements stored in one leaf default value: 2

template<class T = l2SquareDistantor<vector<double> >>

int lti::DBScan< T >::clusterID [protected]

clusterID

template<class T = l2SquareDistantor<vector<double> >>

double lti::DBScan< T >::eps [protected]

eps neighbourhood

template<class T = l2SquareDistantor<vector<double> >>

int lti::DBScan< T >::minPts [protected]

minPts

template<class T = l2SquareDistantor<vector<double> >>

dKdTree lti::DBScan< T >::setOfPoints [protected]

kd-tree used for effective search on elements internal used data strukture

The documentation for this class was generated from the following file:

ltiDBScan.h

lti::DBScan< T > Class Template Reference

Classes

Public Member Functions

Protected Member Functions

Protected Attributes

Detailed Description

template<class T = l2SquareDistantor<vector<double> >> class lti::DBScan< T >

Constructor & Destructor Documentation

Member Function Documentation

Member Data Documentation

template<class T = l2SquareDistantor<vector<double> >>
class lti::DBScan< T >