“In machine learning, multiclass or multinomial classification is the problem of classifying instances into one of three or more classes (classifying instances into one of two classes is called binary classification).” (Wikipedia)
As an evaluation scheme for multiclass classification, you can compute Micro- and Macro-averaged metric scores. For example, precision scores can be computed as follows.
Suppose the result of a multiclass classification is the following:
| Class | True Positive (TP) | False Positive (FP) |
|-------|--------------------|---------------------|
| A     | 1                  | 1                   |
| B     | 10                 | 90                  |
| C     | 1                  | 1                   |
| D     | 1                  | 1                   |
The Macro-precision is simply the unweighted average of the per-class precisions, where each class's precision is TP / (TP + FP). That is, (0.5 + 0.1 + 0.5 + 0.5) / 4 = 0.4.
The Micro-precision measures the overall performance by pooling the counts across classes, i.e. sum(TP) / (sum(TP) + sum(FP)). That is, 13 / 106 ≈ 0.123.
In this example, Macro-precision > Micro-precision. This is because the classifier performed well on the minority classes (A, C, D) but poorly on the majority class (B). Macro-precision does not consider the number of instances in each class, so even though the classifier performed poorly on most of the instances, the Macro-level score (0.4) does not look so bad. Micro-precision, on the other hand, does take the number of instances into account: since the classifier performed poorly on the class holding the majority of the instances, the Micro-precision score is low (0.123).
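As a sanity check, here is a minimal Python sketch that reproduces these numbers directly from the TP/FP counts in the table above (the dictionary names are just for illustration):

```python
# Per-class counts taken from the first example's table.
tp = {"A": 1, "B": 10, "C": 1, "D": 1}
fp = {"A": 1, "B": 90, "C": 1, "D": 1}

# Macro-precision: unweighted mean of per-class precisions.
macro = sum(tp[c] / (tp[c] + fp[c]) for c in tp) / len(tp)

# Micro-precision: pool the counts across all classes first.
micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))

print(f"Macro-precision: {macro:.3f}")  # 0.400
print(f"Micro-precision: {micro:.3f}")  # 0.123
```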
Now take a look at another example. Suppose the result of a multiclass classification is the following:
| Class | True Positive (TP) | False Positive (FP) |
|-------|--------------------|---------------------|
| A     | 1                  | 1                   |
| B     | 90                 | 10                  |
| C     | 1                  | 1                   |
| D     | 1                  | 1                   |
As before, the Macro-precision is the average of the per-class precisions. That is, (0.5 + 0.9 + 0.5 + 0.5) / 4 = 0.6.
The Micro-precision again pools the counts: sum(TP) / (sum(TP) + sum(FP)) = 93 / 106 ≈ 0.877.
In this example, Macro-precision < Micro-precision. This is because the classifier performed well on the majority class (B): Micro-precision weights every instance equally, so good performance on the class holding most of the instances dominates the score.
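If you prefer to verify this with scikit-learn, here is a small sketch that builds (true, predicted) label pairs matching the second example's counts and checks both averages with `precision_score`. Note that the true labels assigned to the false positives below are a hypothetical choice: precision only depends on the TP/FP counts per predicted class.

```python
from sklearn.metrics import precision_score

# Build (true, predicted) pairs that reproduce the second example's
# TP/FP counts per class; the true labels chosen for the false
# positives are arbitrary, since precision ignores them.
pairs = (
    [("A", "A")] * 1 + [("B", "A")] * 1 +    # class A: TP=1,  FP=1
    [("B", "B")] * 90 + [("A", "B")] * 10 +  # class B: TP=90, FP=10
    [("C", "C")] * 1 + [("B", "C")] * 1 +    # class C: TP=1,  FP=1
    [("D", "D")] * 1 + [("B", "D")] * 1      # class D: TP=1,  FP=1
)
y_true, y_pred = zip(*pairs)

print(precision_score(y_true, y_pred, average="macro"))  # 0.6
print(precision_score(y_true, y_pred, average="micro"))  # ~0.877
```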
Reference
https://datascience.stackexchange.com/a/24051