lti::classifier::outputTemplate Class Reference

The outputTemplate stores the relation between the different positions (sometimes called internal ids) of a classification result and the ids.

#include <ltiClassifier.h>

Public Types
enum	eMultipleMode { Ignore = 0, Max, Uniform, ObjProb }
Public Member Functions
	outputTemplate ()
	outputTemplate (const outputTemplate &other)
	outputTemplate (const ivector &theIds)
	outputTemplate (const int &size)
outputTemplate &	copy (const outputTemplate &other)
outputTemplate &	operator= (const outputTemplate &other)
outputTemplate *	clone () const
void	setMultipleMode (const eMultipleMode &mode)
eMultipleMode	getMultipleMode () const
bool	setIds (const ivector &theIds)
const ivector &	getIds () const
bool	setProbs (const int &pos, const ivector &theIds, const dvector &theValues)
bool	setProbs (const int &pos, const outputVector &outV)
const outputVector &	getProbs (const int &pos) const
bool	setData (const int &pos, const int &realId, const outputVector &outV)
int	size () const
bool	apply (const dvector &data, outputVector &result) const
bool	write (ioHandler &handler, const bool complete=true) const
bool	read (ioHandler &handler, const bool complete=true)
Protected Attributes
eMultipleMode	multipleMode
std::vector< outputVector >	probList
ivector	defaultIds

Detailed Description

The outputTemplate stores the relation between the different positions (sometimes called internal ids) of a classification result and the ids.

Applying the outputTemplate to such a vector results in an outputVector which is not to be confused with the classification result.

There are two data structures within the outputTemplate storing the relevant data:

A simple list of ids, one for each element of the classification result. These are used when the parameter multipleMode is set to Ignore. If Ignore is set but this data is not available, multipleMode is temporarily set to Max.
For each element of the classification result, a list of ids and their respective probabilities. They state that, when that element is activated, there is a certain probability that an input belonging to the class of the id was presented. These probabilities are usually generated by classifying a data set and generating a probability distribution of the ids for the element of the classification result with the highest value. This data is used for all values of multipleMode except Ignore. If the data is not available, multipleMode is temporarily set to Ignore.
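As an illustration of how the probability lists in the second structure might be gathered, here is a minimal sketch. It is not LTI-Lib code: it assumes a labelled data set is available as plain vectors, ProbList and buildProbLists are hypothetical names, and a std::map stands in for the library's outputVector.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for one probability list: id -> probability.
using ProbList = std::map<int, double>;

// For every sample, find the output element with the highest value (the
// "winner") and count the true id of that sample for the winner. Normalising
// the counts per element gives an estimate of P(id | element).
std::vector<ProbList> buildProbLists(
    const std::vector<std::vector<double>>& results,  // one result per sample
    const std::vector<int>& trueIds,                  // one id per sample
    std::size_t numElements) {
  assert(numElements > 0);
  assert(results.size() == trueIds.size());

  std::vector<std::map<int, int>> counts(numElements);
  std::vector<int> totals(numElements, 0);

  for (std::size_t s = 0; s < results.size(); ++s) {
    assert(results[s].size() == numElements);
    std::size_t winner = 0;
    for (std::size_t x = 1; x < numElements; ++x) {
      if (results[s][x] > results[s][winner]) winner = x;
    }
    ++counts[winner][trueIds[s]];
    ++totals[winner];
  }

  // Elements that never won keep an empty list.
  std::vector<ProbList> lists(numElements);
  for (std::size_t x = 0; x < numElements; ++x) {
    for (const auto& entry : counts[x]) {
      lists[x][entry.first] = static_cast<double>(entry.second) / totals[x];
    }
  }
  return lists;
}
```

With four samples whose winners are elements 0, 0, 0 and 1 and whose ids are 1, 1, 5 and 5, element 0 receives the list {1: 2/3, 5: 1/3} and element 1 receives {5: 1}.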

The calculation of the outputVector using the apply method depends on the value of the parameter multipleMode, which is of type eMultipleMode. The following settings are available:

Ignore: If default ids have been stored in the outputTemplate via the constructor that receives an ivector, the method setIds or the method setData, these ids are simply copied to the outputVector, i.e. no statistics about the actual classification performance of the classifier are used. If the data is not set, the option Max is used and false is returned by the apply method.
Max: The probability lists are used. For each element of the classification result, the id with the highest probability is found and set to one while all other probabilities for that element are set to zero. This leads to an outputVector which is equal or similar to the one generated by Ignore. There will be differences, however, if a certain element of the classification result was trained for one class, but when building the probability distributions another class caused this element to have the highest value more frequently. This case can be seen for the second element in the example below.
Uniform: The probability lists are used. For each element of the classification result, the number of ids in the list is found and their probabilities are set to be uniformly distributed. This method puts very little trust in the data used for generating the probabilities, i.e. it does not assume that the data represents the true distribution. On the other hand, it is very susceptible to noise in the data: one misclassified example can completely change the outcome of future classifications.
ObjProb: The probability lists are used with their complete information. This has a functionality similar to a rule set: if element A is activated, then there is a probability of 0.3 for class 1 and 0.7 for class 5. This method works quite well when the training data represents the actual distributions quite well but the classifier is not able to build the correct models. A typical effect of using this approach rather than Ignore is that misclassified unknown data will have greater probability and thus a higher ranking. On the downside, data that is correctly classified with Ignore can sometimes be misclassified.
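The effect of Max, Uniform and ObjProb on a single probability list can be sketched as follows. This is an illustration under stated assumptions, not the library implementation: Mode, ProbList and transformList are hypothetical names, and the tie rule for Max (the lowest id wins) is an assumption, since the description above does not specify one.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for one probability list: id -> probability.
using ProbList = std::map<int, double>;

// Hypothetical mirror of the three list-based settings of eMultipleMode.
enum class Mode { Max, Uniform, ObjProb };

// Returns the list that would be used for one element of the
// classification result under the given mode.
ProbList transformList(const ProbList& list, Mode mode) {
  assert(!list.empty());
  ProbList out;
  switch (mode) {
    case Mode::Max: {
      // Find the id with the highest probability (assumed: lowest id on ties).
      auto best = list.begin();
      for (auto it = list.begin(); it != list.end(); ++it) {
        if (it->second > best->second) best = it;
      }
      // The best id gets probability one, all others zero.
      for (const auto& e : list) {
        out[e.first] = (e.first == best->first) ? 1.0 : 0.0;
      }
      break;
    }
    case Mode::Uniform: {
      // Every id in the list gets the same share, 1 / number of ids.
      const double p = 1.0 / static_cast<double>(list.size());
      for (const auto& e : list) out[e.first] = p;
      break;
    }
    case Mode::ObjProb:
      // The stored probabilities are used unchanged.
      out = list;
      break;
  }
  return out;
}
```

For the rule-set example above (0.3 for class 1 and 0.7 for class 5), Max yields {1: 0, 5: 1}, Uniform yields {1: 0.5, 5: 0.5}, and ObjProb keeps {1: 0.3, 5: 0.7}.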

As mentioned above, for all cases but Ignore the outputTemplate contains a list of class probabilities for each element of the classification result. These are interpreted as conditional probabilities P(o|x), where o stands for the id and x for the position in the classification result. Each element of the classification result is also taken as a probability p(x). Thus the value for each id is calculated as $P(o)=\sum_x p(x)\cdot P(o|x)$.
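The sum $P(o)=\sum_x p(x)\cdot P(o|x)$ can be sketched in a few lines. The names ProbList and combine are hypothetical, a std::map stands in for the library's outputVector, and the numbers in the usage note are made up for illustration; this is not the code behind apply.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for one probability list: id -> P(id | position).
using ProbList = std::map<int, double>;

// Computes P(o) = sum over x of p(x) * P(o|x), where data[x] is p(x) and
// lists[x] holds P(o|x) for every id o known at position x.
std::map<int, double> combine(const std::vector<double>& data,
                              const std::vector<ProbList>& lists) {
  assert(data.size() == lists.size());
  std::map<int, double> result;  // missing ids start at 0.0
  for (std::size_t x = 0; x < data.size(); ++x) {
    for (const auto& e : lists[x]) {
      result[e.first] += data[x] * e.second;
    }
  }
  return result;
}
```

For a classification result (0.6, 0.4) with the lists {1: 0.3, 5: 0.7} and {5: 1.0}, this gives P(1) = 0.6 * 0.3 = 0.18 and P(5) = 0.6 * 0.7 + 0.4 * 1.0 = 0.82.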

Here is a short example of the behavior of an outputTemplate when applied to a classification result. The figure shows the classification result on the left-hand side, the default ids which are used with the option Ignore in the middle, and the probability lists which are used for Max, Uniform and ObjProb on the right-hand side.

Depending on the value of multipleMode the following outputVector is generated by calling apply:

	1	3	5	6	17	22	41
Ignore	0.15	0.50	---	---	0.03	0.30	0.02
Max	0.15	---	---	0.50	0.03	0.30	0.02
Uniform	0.15	0.35	0.10	0.25	0.04	0.10	0.01
ObjProb	0.15	0.33	0.05	0.27	0.04	0.15	0.01

If the use of all four options is desired, the constructor outputTemplate(int) receiving an int value must be used. All data can be set using methods setIds, setProbs and/or setData. If the other constructors are used, no space is reserved for the lists of probabilities, since these take much space and some, especially unsupervised, classifiers do not need or have no means to gather this information.
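Which mode is actually used when some of the data is missing follows from the fallback rules in the description above. The sketch below restates those rules. Mode, Resolved and resolveMode are hypothetical names, and the case where neither default ids nor probability lists are present is not covered by the description, so the sketch simply rules it out.

```cpp
#include <cassert>

// Hypothetical mirror of eMultipleMode.
enum class Mode { Ignore, Max, Uniform, ObjProb };

struct Resolved {
  Mode mode;      // mode that is used for this call
  bool fellBack;  // true if the requested mode could not be used
};

// Ignore needs the default ids; Max, Uniform and ObjProb need the
// probability lists. If the needed data is missing, the other kind is used.
Resolved resolveMode(Mode requested, bool hasDefaultIds, bool hasProbLists) {
  // Assumption: at least one of the two data structures has been filled.
  assert(hasDefaultIds || hasProbLists);
  if (requested == Mode::Ignore) {
    if (hasDefaultIds) return {Mode::Ignore, false};
    return {Mode::Max, true};  // documented: apply also returns false here
  }
  if (hasProbLists) return {requested, false};
  return {Mode::Ignore, true};
}
```

A template built from an ivector only has default ids, so a request for ObjProb resolves to Ignore in this sketch; a template that only has probability lists resolves a request for Ignore to Max.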

Member Enumeration Documentation

enum lti::classifier::outputTemplate::eMultipleMode

This type specifies how the output element probability and the probabilities in the list should be combined.

See description of outputTemplate.

Enumerator:

Ignore	ignore the object probability
Max	set the prob of the id with max prob to 1, others to zero.
Uniform	assume that all objects in the list of one output element have the same probability (1/number of elements).
ObjProb	consider the given object probabilities

Constructor & Destructor Documentation

lti::classifier::outputTemplate::outputTemplate ( )

Default constructor.

multipleMode is ObjProb.

lti::classifier::outputTemplate::outputTemplate ( const outputTemplate & other )

Copy constructor.

lti::classifier::outputTemplate::outputTemplate ( const ivector & theIds )

Constructor.

Since a vector of ids is given, multipleMode is Ignore. The probability lists are not initialized and thus cannot be set later.

lti::classifier::outputTemplate::outputTemplate ( const int & size )

Constructor.

The number of output units is given. multipleMode is ObjProb. Default ids as well as lists of probabilities can be set.

Member Function Documentation

bool lti::classifier::outputTemplate::apply	(	const dvector &	data,
		outputVector &	result
	)			const

Uses the information stored in the outputTemplate to generate an outputVector from a dvector.

See description of outputTemplate for details. The classification result should contain only positive values which are greater for better fit. The best interpretability is obtained if data is a probability distribution.

Parameters:

	data	the classification result
	result	outputVector calculated using the outputTemplate.

Returns: false on error (check getStatusString())

outputTemplate* lti::classifier::outputTemplate::clone ( ) const

Returns a pointer to a clone of this outputTemplate.

outputTemplate& lti::classifier::outputTemplate::copy ( const outputTemplate & other )

Copies the contents of other into this outputTemplate.

Reimplemented from lti::ioObject.

Referenced by operator=().

const ivector& lti::classifier::outputTemplate::getIds ( ) const

Returns a const reference to the id vector.

Referenced by lti::kNNClassifier::getColumnId().

eMultipleMode lti::classifier::outputTemplate::getMultipleMode ( ) const

Get the setting of multipleMode.

const outputVector& lti::classifier::outputTemplate::getProbs ( const int & pos ) const

Returns a const reference to the probability distribution at the given position of the template.

outputTemplate& lti::classifier::outputTemplate::operator= ( const outputTemplate & other ) [inline]

Assignment operator (alias for copy(other)).

Parameters:

	other	the outputTemplate to be copied

Returns: a reference to this outputTemplate

Reimplemented from lti::ioObject.

References copy().

bool lti::classifier::outputTemplate::read	(	ioHandler &	handler,
		const bool	complete = true
	)			[virtual]

read the outputTemplate from the given ioHandler

Parameters:

	handler	the ioHandler to be used
	complete	if true (the default) the enclosing begin/end will also be read, otherwise only the data block will be read.

Returns: true if read was successful

Reimplemented from lti::ioObject.

bool lti::classifier::outputTemplate::setData	(	const int &	pos,
		const int &	realId,
		const outputVector &	outV
	)

Set the probabilities and the default id of one unit.

This information must be set for all elements of the classification result. Then it can be used by the apply method for any value of multipleMode.

Parameters:

	pos	the position in the classification result this distribution is for.
	realId	the expected or desired id of this position of the classification result.
	outV	list of ids of the classes that may be correct when this position has high probability, with their corresponding probabilities.

Returns: false on error, e.g. illegal pos

bool lti::classifier::outputTemplate::setIds ( const ivector & theIds )

Set the default id vector.

These are used when multipleMode is set to Ignore.

void lti::classifier::outputTemplate::setMultipleMode ( const eMultipleMode & mode )

Change the setting of how the object probabilities of each unit are taken into account when calculating the outputVector.

See description of outputTemplate.

bool lti::classifier::outputTemplate::setProbs	(	const int &	pos,
		const outputVector &	outV
	)

Set the probabilities of one unit.

This information must be set for all elements of the classification result. Then it can be used by the apply method when multipleMode is set to one of Max, Uniform or ObjProb.

Parameters:

	pos	the position in the classification result this distribution is for.
	outV	list of ids of the classes that may be correct when this position has high probability, with their corresponding probabilities.

Returns: false on error, e.g. illegal pos

bool lti::classifier::outputTemplate::setProbs	(	const int &	pos,
		const ivector &	theIds,
		const dvector &	theValues
	)

Set the probabilities of one unit.

This information must be set for all elements of the classification result. Then it can be used by the apply method when multipleMode is set to one of Max, Uniform or ObjProb.

Parameters:

	pos	the position in the classification result this distribution is for.
	theIds	list of ids of the classes that may be correct when this position has high probability.
	theValues	probabilities of each of these ids.

Returns: false on error, e.g. illegal pos

int lti::classifier::outputTemplate::size ( ) const

returns the number of output units handled by this outputTemplate

bool lti::classifier::outputTemplate::write	(	ioHandler &	handler,
		const bool	complete = true
	)			const [virtual]

write the outputTemplate to the given ioHandler

Parameters:

	handler	the ioHandler to be used
	complete	if true (the default) the enclosing begin/end will also be written, otherwise only the data block will be written.