Classification is a common machine learning task. This is where we have a data set of labelled examples with which we build a model that can then be used to (hopefully accurately!) assign a class to new unlabelled examples. There are various points at which we might want to test the performance of the model. Initially we might tune parameters or hyperparameters using cross validation, then check the best performing models on the test set. If putting the model into production we may also want to test it on live data, we might even use different evaluation measures at different stages of this process. This article discusses some frequently used measures for evaluating the performance of classification models.