
A quick introduction to k-NN (Part 1)

One of the oldest and most popular classification algorithms is the nearest neighbors algorithm. It’s also one of the easiest to understand, which makes it a good place to start when learning about data mining algorithms. Part one of this article provides a brief introduction to, and overview of, k-NN. Part two will demonstrate an implementation of it in R.

Essentially, the nearest neighbors algorithm is based on the premise that the more features objects have in common, the more likely they are to belong to the same class. Nearest neighbors is a non-parametric method, so it does not rely on assumptions about the underlying distribution of the data set. It is called a lazy learning method because, unlike most classification algorithms, it does not build a model of the data set in advance. Instead, each test case is compared to the labeled cases already in the data set, and its class is determined from the k cases it most closely resembles.
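To make the idea concrete, here is a minimal sketch of that comparison step. It is written in Python purely for illustration (part two covers R), and several things in it are assumptions rather than part of the description above: the function name `knn_predict`, the use of Euclidean distance as the measure of similarity, a simple majority vote among the neighbors, and the made-up toy data.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases.

    `train` is a list of (features, label) pairs. Euclidean distance is an
    assumption made for this sketch; other distance measures are possible.
    """
    # No model is fitted up front (lazy learning): all the work happens here,
    # by ranking the stored cases by their distance to the query.
    neighbors = sorted(train, key=lambda case: math.dist(case[0], query))[:k]

    # Count the class labels of the k closest cases and return the most common.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up toy data: two well-separated groups labeled "a" and "b".
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 4.8)))
```

Because nothing is learned ahead of time, every prediction scans the whole data set, which is the price a lazy method pays for skipping the modeling step.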