lti::kNNClassifier Class Reference

Implements a k nearest neighbors search based classifier.

#include <ltiKNNClassifier.h>

Classes
class	parameters
	the parameters for the class kNNClassifier
struct	pointInfo
	Information about a feature point.
Public Member Functions
	kNNClassifier ()
	kNNClassifier (const kNNClassifier &other)
virtual	~kNNClassifier ()
virtual const char *	getTypeName () const
kNNClassifier &	copy (const kNNClassifier &other)
kNNClassifier &	operator= (const kNNClassifier &other)
virtual classifier *	clone () const
const kNNClassifier::parameters &	getParameters () const
virtual bool	train (const dmatrix &input, const ivector &ids)
virtual bool	train (const dmatrix &input, const ivector &ids, const ivector &pointIds)
virtual bool	trainObject (const dmatrix &input, int &id)
virtual bool	trainObject (const dmatrix &input, int &id, const ivector &pointIds)
virtual bool	trainObjectId (const dmatrix &input, const int id)
virtual bool	trainObjectId (const dmatrix &input, const int id, const ivector &pointIds)
virtual bool	classify (const dvector &feature, outputVector &result) const
virtual bool	classify (const dmatrix &features, outputVector &result) const
virtual bool	classify (const dmatrix &features, dmatrix &result) const
int	getColumnId (const int columnId) const
virtual bool	classify (const dvector &feature, outputVector &result, std::vector< pointInfo > &points) const
virtual bool	nearest (const dvector &feature, pointInfo &nearestPoint) const
virtual bool	write (ioHandler &handler, const bool complete=true) const
virtual bool	read (ioHandler &handler, const bool complete=true)
void	build ()
void	clear ()
Protected Types
typedef std::map< int, int >	idMap_type
typedef kdTree< dvector, std::pair< int, int > >	treeType
Protected Member Functions
void	buildIdMaps (const ivector &ids)
void	defineOutputTemplate ()
bool	classify (const dvector &feature, outputVector &output, std::multimap< double, treeType::element * > &resList) const
Static Protected Member Functions
Reliability weighting functions

static double	linear (const double &ratio, const double &threshold)
static double	exponential (const double &ratio, const double &threshold)
Protected Attributes
idMap_type	idMap
idMap_type	rIdMap
int	nClasses
treeType	databaseTree
std::vector< double >	classWeight
int	minPointsPerClass
int	maxPointsPerClass

Detailed Description

Implements a k nearest neighbors search based classifier.

The simplest case of a k-nearest neighbor classifier is k=1, also known as a nearest neighbor classifier, which assigns to a test point x the class of the nearest sample point.

For k>1, a k nearest neighbor classifier assigns to a point x the class most represented among the k nearest neighbors. In the simplest case, each of the k nearest sample points votes with the same weight for its respective class. In more sophisticated cases, each point votes with a weight depending on the total number of sample points of its class and/or the ratio between the distance of the test point to the winner sample and the distance of the test point to the first sample point belonging to another class.
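
The equal-weight voting described above can be sketched in plain C++ as follows. This is a brute-force illustration that uses neither LTI-Lib nor a kd-Tree. The names Point, squaredDistance and classifyKnn are hypothetical, and the tie-breaking rule (the smallest class id wins) is an assumption of this sketch, not documented behavior of kNNClassifier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance; the square root is not needed for ranking.
double squaredDistance(const Point& a, const Point& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Returns the class id most represented among the k nearest samples.
// Ties are resolved in favour of the smallest class id (sketch assumption).
int classifyKnn(const std::vector<Point>& samples,
                const std::vector<int>& classIds,
                const Point& query,
                std::size_t k) {
  assert(!samples.empty() && samples.size() == classIds.size());
  k = std::min(k, samples.size());
  assert(k > 0);

  // Pair the distance of each sample with its class id.
  std::vector<std::pair<double, int>> ranked;
  ranked.reserve(samples.size());
  for (std::size_t i = 0; i < samples.size(); ++i) {
    ranked.emplace_back(squaredDistance(samples[i], query), classIds[i]);
  }
  // Move the k closest samples to the front, sorted by distance.
  std::partial_sort(ranked.begin(),
                    ranked.begin() + static_cast<std::ptrdiff_t>(k),
                    ranked.end());

  // Each of the k neighbors casts one vote of equal weight.
  std::map<int, int> votes;
  for (std::size_t i = 0; i < k; ++i) {
    ++votes[ranked[i].second];
  }
  int winner = ranked[0].second;
  int best = 0;
  for (const auto& entry : votes) {
    if (entry.second > best) {  // strict '>' keeps the smallest id on ties
      best = entry.second;
      winner = entry.first;
    }
  }
  return winner;
}
```

With k=1 the function reduces to the nearest neighbor rule. A larger k lets a dense class outvote a single close sample of another class.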

Currently, only the Euclidean distance is supported.

This classifier uses a kd-Tree to perform the search efficiently, and therefore shares the drawbacks of a normal kd-Tree: it is not suitable for high-dimensional spaces. If you work in a high-dimensional space, try increasing the bucket size or activating the best-bin-first mode, a suggestion of David Lowe for obtaining an approximate (but accurate enough) solution in much less time.

This classifier differs slightly from the other LTI-Lib classifiers. Since the whole training set is stored as sample points, it is useful in many applications to obtain, besides the winner class, the exact winner samples. Therefore, this class manages two sets of ID numbers: one set for the class IDs, which is used in the same way as in all other LTI-Lib classifiers, and a second set that holds an ID for each sample point. This second set can be given explicitly or generated automatically. You can then, for example, use tables containing additional information for each winner point, accessed through the point ID.
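
A minimal sketch of the two ID sets, written in plain C++ without LTI-Lib: every stored sample carries a class id and a point id, and the point id of a winner sample is used as the key into an application table. The types Sample and Neighbor, the functions nearestSamples and lookupPointInfo, and the brute-force search are all assumptions made for illustration. They are not the members of lti::kNNClassifier::pointInfo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Sample {
  std::vector<double> position;
  int classId;  // id of the class (object) the sample belongs to
  int pointId;  // id of this particular sample point
};

struct Neighbor {
  int classId;
  int pointId;
  double squaredDistance;
};

// Returns the k samples closest to `query`, sorted by increasing distance.
std::vector<Neighbor> nearestSamples(const std::vector<Sample>& samples,
                                     const std::vector<double>& query,
                                     std::size_t k) {
  std::vector<Neighbor> result;
  result.reserve(samples.size());
  for (const Sample& s : samples) {
    assert(s.position.size() == query.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < query.size(); ++i) {
      const double d = s.position[i] - query[i];
      sum += d * d;
    }
    result.push_back(Neighbor{s.classId, s.pointId, sum});
  }
  std::sort(result.begin(), result.end(),
            [](const Neighbor& a, const Neighbor& b) {
              return a.squaredDistance < b.squaredDistance;
            });
  if (result.size() > k) {
    result.erase(result.begin() + static_cast<std::ptrdiff_t>(k), result.end());
  }
  return result;
}

// Looks up application data for a winner sample through its point id.
// Returns an empty string if the table has no entry for that id.
std::string lookupPointInfo(const std::map<int, std::string>& table,
                            int pointId) {
  const auto it = table.find(pointId);
  return it == table.end() ? std::string() : it->second;
}
```

The class id of the first neighbor answers "which object is this?", while lookupPointInfo(table, neighbors.front().pointId) answers "which stored sample matched?".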

Example:

 #include <iostream>
 #include <fstream>

 #include "ltiKNNClassifier.h"
 #include "ltiLispStreamHandler.h"

 // ...

 double inData[] = {-1,-1,
                    -1, 0,
                    -1,+1,
                    +0,+1,
                    +1,+1,
                    +1,+0,
                    +1,-1,
                    +0,-1,
                    +0,+0};

 lti::dmatrix inputs(9,2,inData);     // training vectors

 int idsData[] = {1,0,2,1,0,2,1,0,1}; // and the respective ids
 lti::ivector ids(9,idsData);

 lti::kNNClassifier knn;  // our nearest neighbor classifier

 lti::kNNClassifier::parameters param;
 param.useReliabilityMeasure = false;
 knn.setParameters(param);

 // we want to see some info while training
 lti::streamProgressInfo prog(std::cout);
 knn.setProgressObject(prog);

 // train the classifier
 knn.train(inputs,ids);

 // let us save our classifier for future use
 // in the file called knn.dat
 std::ofstream out("knn.dat");
 lti::lispStreamHandler lsh(out);

 // save the classifier
 knn.write(lsh);
 // close the file
 out.close();

 // show some results with the same training set:

 lti::kNNClassifier::outputVector outv; // here we will get some
                                        // classification results
 std::cout << std::endl << "Results: " << std::endl;

 int id;
 lti::dvector sample(2,0.0);
 // generate some points and check which would be the winner class
 for (sample.at(0) = -1; sample.at(0) <= 1; sample.at(0)+=0.25) {
   for (sample.at(1) = -1; sample.at(1) <= 1; sample.at(1)+=0.25) {
 
     knn.classify(sample,outv);
     std::cout << "Input " << sample << " \tOutput: ";
     outv.getId(outv.getWinnerUnit(),id);
     std::cout << id;
     std::cout << std::endl;
   }
 }

Member Typedef Documentation

typedef std::map<int,int> lti::kNNClassifier::idMap_type [protected]

Type for maps mapping ids from internal to external and vice versa.
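
As an illustration of such a pair of maps, the following standalone sketch builds both directions from a vector of class ids. The documentation does not state which of idMap and rIdMap holds which direction, nor how internal ids are assigned. Numbering the classes 0..n-1 in order of first appearance is an assumption of this sketch, and the names IdMap and buildIdMaps are used here only for illustration.

```cpp
#include <cassert>
#include <map>
#include <vector>

using IdMap = std::map<int, int>;

// Fills both directions of the mapping between external class ids (as
// supplied by the user) and dense internal indices 0..n-1.
// Returns the number of distinct classes found.
int buildIdMaps(const std::vector<int>& ids,
                IdMap& externalToInternal,
                IdMap& internalToExternal) {
  externalToInternal.clear();
  internalToExternal.clear();
  for (int externalId : ids) {
    const int next = static_cast<int>(externalToInternal.size());
    // insert() leaves the map untouched if the id was already seen.
    if (externalToInternal.insert({externalId, next}).second) {
      internalToExternal[next] = externalId;
    }
  }
  assert(externalToInternal.size() == internalToExternal.size());
  return static_cast<int>(externalToInternal.size());
}
```

A dense internal index can be used directly as a position in per-class arrays, while the reverse map recovers the id the user supplied.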

typedef kdTree< dvector, std::pair<int,int> > lti::kNNClassifier::treeType [protected]

Exact kd-Tree type used for the database.

The data in the tree is composed of two id numbers:

The first component contains the object or class id.
The second component contains the point id, since with kNN it can be necessary to know exactly which points in the training set were the winners.

Constructor & Destructor Documentation

lti::kNNClassifier::kNNClassifier ( )

default constructor

lti::kNNClassifier::kNNClassifier ( const kNNClassifier & other )

copy constructor

Parameters:

	other	the object to be copied

virtual lti::kNNClassifier::~kNNClassifier ( ) [virtual]

destructor

Member Function Documentation

void lti::kNNClassifier::build ( )

Finish a training process.

If you used the methods trainObject() or trainObjectId() you must call this method in order to complete the training process.

If you used one of the train() methods, you must not call this method.

Remember that the "incremental" training mode with trainObject() or trainObjectId() cannot be combined with an "at-once" training step using the method train().

See also: trainObject(), trainObjectId(), reset()

void lti::kNNClassifier::buildIdMaps ( const ivector & ids ) [protected]

mapping of the ids

bool lti::kNNClassifier::classify ( const dvector & feature, outputVector & output, std::multimap< double, treeType::element * > & resList ) const [protected]

Helper for classification.

virtual bool lti::kNNClassifier::classify ( const dvector & feature, outputVector & result, std::vector< pointInfo > & points ) const [virtual]

Classification.

Classifies the feature and returns the outputVector object with the classification result.

NOTE: This method is NOT really const. Although the main members of the kNNClassifier are not changed, some state variables used for efficiency are. Thus, it is not safe to use the same instance of the kNNClassifier in two different threads.

Parameters:

	feature	pattern to be classified
	result	of the classification as a classifier::outputVector
	points	vector sorted in increasing order of the distances to the feature point and containing two ID numbers. The first one corresponds to the class id, the second one to the point id. The pointInfo structure also contains a const pointer to the feature point of the training set and the distance to that point.

Returns: true if the classification has been successful

virtual bool lti::kNNClassifier::classify ( const dmatrix & features, dmatrix & result ) const [virtual]

Classification.

Classifies all features (the rows of the matrix) and returns for each of them a vector of unnormalized probabilities, coded in the rows of the result matrix.

Since no classifier::outputVector is constructed, only the classification "raw data" is produced.

This method is used in recognition tasks based on many local hints, for which the individual classification of each feature vector would cost too much time.

Each column of the output matrix represents one object. To get the id represented by a column you can use the outputTemplate of the classifier:

 lti::kNNClassifier knn;

 knn.train(data,ids);

 int columnId = knn.getOutputTemplate().getIds().at(columnNumber);

or the shortcut method of this class getColumnId()
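
The layout of the result matrix can be pictured with the following standalone sketch, which does not use LTI-Lib. It assumes that the class ids of the k nearest samples are already known for every feature row, and it uses plain vote counts as a stand-in for the unnormalized probabilities. The names Matrix, voteMatrix and columnToClassId are hypothetical, and returning -1 for an invalid column is only one possible choice of negative value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Builds the raw result matrix: result[r][c] is the number of votes that
// feature row r gives to the class listed at columnIds[c].
// neighborClasses[r] holds the class ids of the k nearest samples of row r.
Matrix voteMatrix(const std::vector<std::vector<int>>& neighborClasses,
                  const std::vector<int>& columnIds) {
  Matrix result(neighborClasses.size(),
                std::vector<double>(columnIds.size(), 0.0));
  for (std::size_t r = 0; r < neighborClasses.size(); ++r) {
    for (int classId : neighborClasses[r]) {
      const auto col = std::find(columnIds.begin(), columnIds.end(), classId);
      assert(col != columnIds.end());  // every neighbor class needs a column
      result[r][static_cast<std::size_t>(col - columnIds.begin())] += 1.0;
    }
  }
  return result;
}

// Maps a column index back to its class id; a negative value signals an
// invalid column index.
int columnToClassId(const std::vector<int>& columnIds, int column) {
  if (column < 0 || column >= static_cast<int>(columnIds.size())) {
    return -1;
  }
  return columnIds[static_cast<std::size_t>(column)];
}
```

Keeping the matrix in column-index form avoids building one output object per feature. The column-to-id lookup is only needed once the caller has decided which columns matter.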

NOTE: This method is NOT really const. Although the main members of the kNNClassifier are not changed, some state variables used for efficiency are. Thus, it is not safe to use the same instance of the kNNClassifier in two different threads.

Parameters:

	features	patterns to be classified; each row is one feature
	result	matrix with the classification results; each row contains the unnormalized probabilities for the corresponding feature

Returns: true if the classification has been successful

virtual bool lti::kNNClassifier::classify ( const dmatrix & features, outputVector & result ) const [virtual]

Classification.

Classifies all features (the rows of the matrix) and returns the outputVector object with the classification result.

The classification will be the accumulation of the voting for all given points, assuming that they all belong to the same class.
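
The accumulation step can be sketched as follows, in plain C++ and independent of LTI-Lib. The sketch assumes that the votes of each feature are already available as one row of per-class values. The name winnerColumn and the tie rule (the first column with the largest total wins) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the per-feature votes column by column and returns the index of the
// column with the largest total (the first such column on ties).
std::size_t winnerColumn(const std::vector<std::vector<double>>& votes) {
  assert(!votes.empty() && !votes.front().empty());
  std::vector<double> total(votes.front().size(), 0.0);
  for (const auto& row : votes) {
    assert(row.size() == total.size());
    for (std::size_t c = 0; c < row.size(); ++c) {
      total[c] += row[c];
    }
  }
  std::size_t best = 0;
  for (std::size_t c = 1; c < total.size(); ++c) {
    if (total[c] > total[best]) {
      best = c;
    }
  }
  return best;
}
```

Because all rows are assumed to describe the same object, a few features that vote for a wrong class are outweighed by the majority.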

NOTE: This method is NOT really const. Although the main members of the kNNClassifier are not changed, some state variables used for efficiency are. Thus, it is not safe to use the same instance of the kNNClassifier in two different threads.

Parameters:

	features	patterns to be classified; each row is one feature
	result	of the classification as a classifier::outputVector

Returns: true if the classification has been successful

virtual bool lti::kNNClassifier::classify ( const dvector & feature, outputVector & result ) const [virtual]

Classification.

Classifies the feature and returns the outputVector object with the classification result.

NOTE: This method is NOT really const. Although the main members of the kNNClassifier are not changed, some state variables used for efficiency are. Thus, it is not safe to use the same instance of the kNNClassifier in two different threads.

Parameters:

	feature	pattern to be classified
	result	of the classification as a classifier::outputVector

Returns: true if the classification has been successful

Implements lti::supervisedInstanceClassifier.

void lti::kNNClassifier::clear ( )

Resets all values and deletes the content.

If you want to forget the sample points and start giving new points with trainObject(), you need to call this method first.

virtual classifier* lti::kNNClassifier::clone ( ) const [virtual]

returns a pointer to a clone of this classifier.

Implements lti::classifier.

kNNClassifier& lti::kNNClassifier::copy ( const kNNClassifier & other )

copy data of "other" classifier.

Parameters:

	other	the classifier to be copied

Returns: a reference to this classifier object

Reimplemented from lti::supervisedInstanceClassifier.

void lti::kNNClassifier::defineOutputTemplate ( ) [protected]

Define the output template.

int lti::kNNClassifier::getColumnId ( const int columnId ) const [inline]

Shortcut method to comfortably access the object id for the column of the result matrix of the classify(const dmatrix&,dmatrix&) method.

It returns a negative value if the input column index is invalid.

References lti::genericVector< T >::at(), lti::classifier::outputTemplate::getIds(), lti::classifier::getOutputTemplate(), and nClasses.

const kNNClassifier::parameters& lti::kNNClassifier::getParameters ( ) const

returns used parameters

Reimplemented from lti::supervisedInstanceClassifier.

virtual const char* lti::kNNClassifier::getTypeName ( ) const [virtual]

returns the name of this type ("kNNClassifier")

Reimplemented from lti::supervisedInstanceClassifier.

virtual bool lti::kNNClassifier::nearest ( const dvector & feature, pointInfo & nearestPoint ) const [virtual]

Get only the nearest point to the given vector.

Sometimes it is not necessary to have the probability distribution for the objects computed with the classify() methods. Only the nearest point can be of interest. This method provides an efficient way to just search for the nearest point and obtain its data.

Parameters:

	feature	reference multidimensional point
	nearestPoint	nearest point in the training set to the presented point.

Returns: true if search was successful, false otherwise.

kNNClassifier& lti::kNNClassifier::operator= ( const kNNClassifier & other )

alias for copy member

Parameters:

	other	the classifier to be copied

Returns: a reference to this classifier object

Reimplemented from lti::supervisedInstanceClassifier.

virtual bool lti::kNNClassifier::read ( ioHandler & handler, const bool complete = true ) [virtual]

read the classifier from the given ioHandler

Parameters:

	handler	the ioHandler to be used
	complete	if true (the default) the enclosing begin/end will also be read, otherwise only the data block will be read.

Returns: true if read was successful

Reimplemented from lti::classifier.

virtual bool lti::kNNClassifier::train ( const dmatrix & input, const ivector & ids, const ivector & pointIds ) [virtual]

Supervised training.

The vectors in the input matrix are arranged row-wise, i.e. each row contains one data vector. The ids vector contains the class label for each row.

This is an alternative method to trainObject(). You cannot add further objects after you have called train(), nor can you call train() after calling trainObject(), since all data provided with trainObject() would be removed. In other words, you must decide if you want to supply all objects separately or if you want to give them all simultaneously, but you cannot combine both methods.

The ids for each feature point are taken from the corresponding entries of pointIds.

Parameters:

	input	the matrix with the input vectors (each row is a training vector)
	ids	vector of class ids for each input point
	pointIds	vector containing the ids for each single feature point of the training set.

Returns: true if successful, false otherwise (if false, you can check the error message with getStatusString()).

virtual bool lti::kNNClassifier::train ( const dmatrix & input, const ivector & ids ) [virtual]

Supervised training.

The vectors in the input matrix are arranged row-wise, i.e. each row contains one data vector. The ids vector contains the class label for each row.

This is an alternative method to trainObject(). You cannot add further objects after you have called train(), nor can you call train() after calling trainObject(), since all data provided with trainObject() would be removed. In other words, you must decide if you want to supply all objects separately or if you want to give them all simultaneously, but you cannot combine both methods.

The index of the corresponding matrix row is used as the id of each feature point.

Parameters:

	input	the matrix with the input vectors (each row is a training vector)
	ids	vector of class ids for each input point

Returns: true if successful, false otherwise (if false, you can check the error message with getStatusString()).

Implements lti::supervisedInstanceClassifier.

virtual bool lti::kNNClassifier::trainObject ( const dmatrix & input, int & id, const ivector & pointIds ) [virtual]

Adds an object to this classifier.

The id is assigned automatically and returned in the id parameter.

After you have trained several objects, you must call the build() method to finish the training process. If you don't do it, the classifier will ignore everything you have provided.

This is an alternative method to train(). You cannot add further objects after you have called train, nor can you call train() after calling this method, since all data provided with trainObject would be removed. In other words, you must decide if you want to supply all objects separately or if you want to give them all simultaneously, but you cannot combine both methods.

Note that the difference with the method trainObjectId() is that here you will receive as a returned argument the id assigned to the object, while in the method trainObjectId() you decide which id should be used for the given object.

Parameters:

	input	each row of this matrix represents a point in the feature space belonging to one single class.
	id	output parameter; receives the id assigned to the class represented by the points in the input matrix.
	pointIds	each point in the input matrix will have its own id given by the entries of this vector, which must have a size equal to the number of rows of input.

Returns: true if successful, false otherwise.

virtual bool lti::kNNClassifier::trainObject ( const dmatrix & input, int & id ) [virtual]

Adds an object to this classifier.

The id is assigned automatically and returned in the id parameter.

After you have trained several objects, you must call the build() method to finish the training process. If you don't do it, the classifier will ignore everything you have provided.

This is an alternative method to train(). You cannot add further objects after you have called train, nor can you call train() after calling this method, since all data provided with trainObject would be removed. In other words, you must decide if you want to supply all objects separately or if you want to give them all simultaneously, but you cannot combine both methods.

Note that the difference with the method trainObjectId() is that here you receive as a returned argument the id assigned to the object, while in the method trainObjectId() you decide which id should be used for the given object.

The id of each point in the given matrix is its row index plus the number of points trained so far, i.e. the successive numbering of the sample points is simply continued.

Parameters:

	input	each row of this matrix represents a point in the feature space belonging to one single class.
	id	output parameter; receives the id assigned to the class represented by the points in the input matrix.

Returns: true if successful, false otherwise.
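
The bookkeeping of the incremental training mode can be modelled with the following standalone sketch, which does not use LTI-Lib. It shows the continued numbering of point ids and the need for a final build step. The types StoredPoint and IncrementalTrainingSet are hypothetical. Taking the automatic class id from a counter that starts at 0, and marking the set as "not built" whenever new data arrives, are assumptions of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct StoredPoint {
  std::vector<double> position;
  int classId;
  int pointId;
};

class IncrementalTrainingSet {
 public:
  // Adds all rows as one new object and reports the id chosen for it.
  // Point ids continue the running numbering: row index plus the number of
  // points added so far.
  bool addObject(const std::vector<std::vector<double>>& rows, int& id) {
    if (rows.empty()) {
      return false;
    }
    // All points must live in the same feature space.
    assert(points_.empty() ||
           points_.front().position.size() == rows.front().size());
    id = nextClassId_++;
    const int offset = static_cast<int>(points_.size());
    for (std::size_t r = 0; r < rows.size(); ++r) {
      points_.push_back(StoredPoint{rows[r], id, offset + static_cast<int>(r)});
    }
    built_ = false;  // sketch choice: new data requires another build()
    return true;
  }

  // Finishes training; a real implementation would construct its search
  // structure here.
  void build() { built_ = true; }

  bool ready() const { return built_; }
  const std::vector<StoredPoint>& points() const { return points_; }

 private:
  std::vector<StoredPoint> points_;
  int nextClassId_ = 0;
  bool built_ = false;
};
```

A caller would add every object first, call build() once, and only then start classifying.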

virtual bool lti::kNNClassifier::trainObjectId ( const dmatrix & input, const int id, const ivector & pointIds ) [virtual]

Adds an object to this classifier.

The object ID is given by the user.

After you have trained several objects, you must call the build() method to finish the training process. If you don't do it, the classifier will ignore everything you have provided.

This is an alternative method to train(). You cannot add further objects after you have called train, nor can you call train() after calling this method, since all data provided with trainObject would be removed. In other words, you must decide if you want to supply all objects separately or if you want to give them all simultaneously, but you cannot combine both methods.

Note that the difference with the method trainObject() is that here you directly provide the id to be used for the object, while the trainObject() method returns one id that is computed automatically.

Parameters:

	input	each row of this matrix represents a point in the feature space belonging to one single class.
	id	this id will be used for the class represented by the points in the input matrix.
	pointIds	each point in the input matrix will have its own ID, given by the entries in this vector, which must have a size equal to the number of rows of input.

Returns: true if successful, false otherwise.

virtual bool lti::kNNClassifier::trainObjectId ( const dmatrix & input, const int id ) [virtual]

Adds an object to this classifier.

The object ID is given by the user.

After you have trained several objects, you must call the build() method to finish the training process. If you don't do it, the classifier will ignore everything you have provided.

This is an alternative method to train(). You cannot add further objects after you have called train, nor can you call train() after calling this method, since all data provided with trainObject would be removed. In other words, you must decide if you want to supply all objects separately or if you want to give them all simultaneously, but you cannot combine both methods.

Note that the difference with the method trainObject() is that here you directly provide the id to be used for the object, while the trainObject() method returns one id that is computed automatically.

The id of each point in the given matrix is its row index plus the number of points trained so far, i.e. the successive numbering of the sample points is simply continued.

Parameters:

	input	each row of this matrix represents a point in the feature space belonging to one single class.
	id	this id will be used for the class represented by the points in the input matrix.

Returns: true if successful, false otherwise.

virtual bool lti::kNNClassifier::write ( ioHandler & handler, const bool complete = true ) const [virtual]

write the classifier in the given ioHandler

Parameters:

	handler	the ioHandler to be used
	complete	if true (the default) the enclosing begin/end will be also written, otherwise only the data block will be written.