Sometimes people assume that data science and related areas are all about consumer-facing businesses trying to learn ways to sell more product. I’m not sure whether this viewpoint is more naive or more cynical. Data science is about translating raw data into knowledge and insight to enable better decision making, so it has wide and varied applications. Data science and related fields like data mining, machine learning and big data have huge potential to drive innovation in social enterprise and business for social good.
Education too is being affected by the data revolution. This isn’t something that may happen in the future; it has already begun. Schools in America are already using systems that combine data points like attendance and grades to predict, years in advance, which students are at risk of dropping out.
One of the oldest and most popular classification algorithms is the nearest neighbors algorithm. It’s also one of the easiest algorithms to understand, so it is a good place to start when learning about data mining algorithms. Part one of this article provides a brief introduction to, and overview of, k-nearest neighbors (k-NN). Part two will demonstrate an implementation of it in R.
Essentially, the nearest neighbors algorithm is based on the premise that the more features objects have in common, the more likely they are to belong to the same class. Nearest neighbors is a non-parametric method, so it does not rely on assumptions about the underlying distribution of the data set. It is called a lazy learning method because, unlike most classification algorithms, it does not build a model of the data set in advance. Instead, each test case is compared to the other cases in the data set to determine its class.
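To make the idea concrete, here is a minimal sketch of nearest neighbors classification. It is written in Python rather than the R used in part two, and it makes several assumptions that are not from the article: Euclidean distance as the similarity measure, a simple majority vote among the k closest cases, and a made-up two-class toy data set. The names `knn_predict` and `train` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases.

    `train` is a list of (features, label) pairs; `query` is a feature tuple.
    Euclidean distance is an assumption here; other metrics are possible.
    """
    # No model is built in advance (the "lazy" part): every prediction
    # compares the query against the stored cases directly.
    nearest = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data set (invented for illustration): two well-separated classes.
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
    ((4.8, 5.1), "B"),
]

print(knn_predict(train, (1.1, 1.0), k=3))  # prints "A"
print(knn_predict(train, (5.0, 5.1), k=3))  # prints "B"
```

Choosing an odd k for a two-class problem, as above, avoids tied votes. In practice the features are usually scaled first so that no single feature dominates the distance.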