Threshold Moving

Introduction

Balanced class

In a classification task, the model predicts a class label. For a binary classifier that outputs a probability, a threshold of 0.5 is used most of the time: a probability larger than 0.5 is assigned to one class (1), and a probability smaller than 0.5 to the other class (0).

Unbalanced class

However, some classification problems have a severe class imbalance. In that case the default threshold of 0.5 can result in terrible performance. Therefore, we need to tune the threshold to improve the model's performance.

Reasons the default threshold may be a poor fit:

The predicted probabilities are not calibrated
The metric used to train the model is different from the metric used to evaluate it
The class distribution is severely skewed
The cost of one type of misclassification is higher than the cost of the other type
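
To make the skewed-distribution case concrete, here is a minimal sketch in plain Python. The labels and probabilities are invented for illustration, and `recall_at` is a hypothetical helper, not part of any library. It assumes a model that is under-confident on the rare class, so none of its scores reach 0.5.

```python
# Toy data, invented for illustration: 2 positives among 10 samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predicted probabilities of class 1. The model is
# under-confident on the rare class, so no score reaches 0.5.
y_prob = [0.05, 0.10, 0.08, 0.20, 0.15, 0.02, 0.12, 0.18, 0.30, 0.40]


def recall_at(threshold, y_true, y_prob):
    """Fraction of actual positives predicted as positive at this threshold."""
    predicted = [1 if p >= threshold else 0 for p in y_prob]
    true_pos = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos


recall_default = recall_at(0.5, y_true, y_prob)   # no positive is found
recall_lowered = recall_at(0.25, y_true, y_prob)  # both positives are found
print(recall_default, recall_lowered)
```

With the default threshold every sample is labelled 0, so the rare class is never detected. Lowering the threshold recovers both positives on this toy data.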

Probability vs classes

Machine learning and deep learning models can output probabilities. Most of the time, however, we care about the actual class label. Therefore, a decision boundary (a threshold on the predicted probability) is required to turn each probability into a label.
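
As a rough sketch of that conversion, the hypothetical helper `to_labels` below maps probabilities to labels for a given threshold. The input values are made up, and assigning a probability exactly equal to the threshold to class 1 is an assumption of this sketch.

```python
def to_labels(probabilities, threshold=0.5):
    """Map each probability of class 1 to a label.

    Assumption of this sketch: a probability equal to the threshold
    is assigned to class 1.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


probs = [0.10, 0.35, 0.50, 0.80]  # made-up model outputs

labels_default = to_labels(probs)               # threshold 0.5
labels_moved = to_labels(probs, threshold=0.3)  # threshold moved down
print(labels_default)
print(labels_moved)
```

Moving the threshold does not change the model or its probabilities. It only changes where the cut between the two classes falls.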

Accuracy maximization

There are different ways to find the best threshold. One simple option is to optimize for accuracy. For each candidate threshold, count the true positives and false positives, compute the resulting accuracy, and collect the results in a list. Finally, pick the threshold whose pair of true positives and false positives gives the highest accuracy.
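
The search can be sketched in plain Python as follows. This is one possible reading of the procedure, not a definitive implementation. The function name `best_threshold_by_accuracy`, the toy data, the choice of candidate thresholds (every distinct predicted probability), and the tie-breaking rule are all assumptions made for illustration.

```python
def best_threshold_by_accuracy(y_true, y_prob):
    """Return (threshold, accuracy) for the most accurate candidate threshold.

    Candidates are the distinct predicted probabilities (an assumption).
    On ties, the lowest threshold is kept.
    """
    n_neg = sum(1 for t in y_true if t == 0)
    total = len(y_true)
    best_t, best_acc = None, -1.0
    for threshold in sorted(set(y_prob)):
        # Count true positives and false positives at this threshold.
        tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
        fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
        tn = n_neg - fp  # negatives not flagged are true negatives
        acc = (tp + tn) / total
        if acc > best_acc:
            best_t, best_acc = threshold, acc
    return best_t, best_acc


# Toy data, invented for illustration: 2 positives among 10 samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
y_prob = [0.05, 0.10, 0.08, 0.20, 0.15, 0.02, 0.12, 0.30, 0.35, 0.40]

best_t, best_acc = best_threshold_by_accuracy(y_true, y_prob)
print(best_t, best_acc)
```

On this toy data the default threshold of 0.5 labels every sample 0 and reaches an accuracy of 0.8, while the search finds a lower threshold with a higher accuracy. In practice the search would be run on held-out validation data rather than on the data used to train the model.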

Reference

A Gentle Introduction to Threshold-Moving for Imbalanced Classification
