P. Long and

A *consistent* loss function for multiclass classification is one such
that for any source of labeled examples, any tuple
of scoring functions that
minimizes the expected loss will have classification accuracy close to that
of the Bayes optimal classifier. While consistency has been proposed as a
desirable property for multiclass loss functions, we
give experimental and theoretical results exhibiting a
sequence of linearly separable data sources with the following property:
a multiclass classification algorithm which optimizes a loss function
due to Crammer and Singer (which is known *not* to be consistent) produces
classifiers whose expected error goes to 0, while the expected error
of an algorithm which optimizes a generalization of the loss
function used by LogitBoost (a loss function which is known to be consistent)
is bounded below by a positive constant.

We identify a property of a loss function, *realizable
consistency with respect to a restricted class of scoring functions*,
that accounts for this difference. As our main technical results we show
that the Crammer--Singer loss function is
realizable consistent for the class of linear scoring functions, while
the generalization of LogitBoost is not. Our result for LogitBoost is
a special case of a more general theorem that applies to several other
loss functions that have been proposed for multiclass classification.