This is part 2 of a clustering demo in R. You can read Part 1 here, which deals with assessing the clustering tendency of the data and deciding on the number of clusters. This part looks at performing clustering using the Partitioning Around Medoids algorithm and validating the results.
The last article provided a brief introduction to clustering. This one demonstrates how to conduct a basic clustering analysis in the statistical computing environment R (I have actually split it into 2 parts as it got rather long!). For demos like this it is easiest to use a small data set, ideally with few features relative to instances. The one used in this example is the Acidosis Patients data set, available from this collection of clustering data sets. This data set has 40 instances, each corresponding to a patient, and 6 features, each corresponding to a measurement of blood or cerebrospinal fluid.
Cluster analysis, more usually referred to as clustering, is a common data mining task. In clustering the goal is to divide the data set into groups so that objects in the same group are similar to one another while objects in different groups are dissimilar. In other words, the goal is to minimize the intra-cluster distance while maximizing the inter-cluster distance.
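The intra-cluster versus inter-cluster idea can be sketched in a few lines. This is only an illustration under stated assumptions: the toy points, the cluster labels `a` and `b`, and the helper names `mean_intra_distance` and `mean_inter_distance` are hypothetical, Euclidean distance is one choice among many, and Python is used purely for the sketch.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical toy data: two hand-labelled groups of 2-D points.
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)],
}


def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    pairs = [
        pair
        for points in clusters.values()
        for pair in combinations(points, 2)
    ]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        pair
        for first, second in combinations(clusters.values(), 2)
        for pair in product(first, second)
    ]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(f"mean intra-cluster distance: {intra:.3f}")
print(f"mean inter-cluster distance: {inter:.3f}")
```

A grouping that matches the goal described above should give a small intra-cluster value relative to the inter-cluster value, as this toy grouping does.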
The General Data Protection Regulation (GDPR) is a new data protection regulation that will be effective across the EU from 25th May 2018. The GDPR applies to all companies that process data of EU citizens, regardless of where the companies are based. It replaces Directive 95/46/EC, normally referred to as the Data Protection Directive, which dates back to the 1990s.
Data Science, Data Mining, Machine Learning, Artificial Intelligence, Big Data … the list goes on. These are all terms that eager cheerleaders of the data revolution say organisations need to embrace. With all the hype and attention, it’s not surprising that businesses feel they need to become more data-driven or risk losing competitive advantage. And there is definitely substance to the hype, otherwise companies like IBM wouldn’t be pouring literally billions of dollars into their big data capabilities.
(Note: This article discusses Bayesian and Frequentist statistics and follows from this previous one). Parapsychology has played an important role in ensuring that psychology retains at least some focus on anomalous human experiences. These experiences are very common and if psychology is truly to be the science of behavior and mental processes then it needs to take account of them. In addition to posing legitimate questions to materialist reductionist orthodoxy, parapsychology has also made contributions to scientific methodology in areas like study design, statistical inference and meta-analysis.
I have been interested in parapsychology ever since picking up a copy of the excellent Eysenck and Sargent book – Explaining the Unexplained many years ago. It’s a bit dated now but still a great introduction for anyone interested in learning more about the topic.
Just as with data science, there is sometimes confusion about what parapsychology is. Perhaps it’s easier to start with what it’s not. It’s not astrology, ghost busting, monster hunting, fortune telling or investigating UFO sightings, though these often come to mind when one thinks of the paranormal, mainly because of the influence of television shows about the ‘paranormal’.
There is huge hype at the moment about Data Science and it seems like everybody is trying to get in on the game. While hype might help bring the topic to popular attention, it can also serve to obscure and confuse. What do people mean when they talk about data science? Is it all just hype?
Data mining is the process of finding interesting patterns in large quantities of data. This process of knowledge discovery involves various steps, the most obvious being the application of algorithms to the data set to discover patterns, as in clustering. The goal of clustering is to find natural groups in the data so that objects in a group are similar to other objects in the same group and dissimilar to objects in other groups. When implementing clustering (and also some classification algorithms such as k-nearest neighbours), it is important to be able to quantify the proximity of objects to one another.
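As a rough illustration of what quantifying proximity can look like, the sketch below computes two common distance measures between objects described by numeric features. The function names, the feature vectors and the choice of Euclidean and Manhattan distance are assumptions made for the example, not measures taken from the article.

```python
from math import sqrt


def euclidean(x, y):
    """Straight-line distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute feature differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical objects, each described by two numeric features.
p = (1.0, 2.0)
q = (4.0, 6.0)

d_euclidean = euclidean(p, q)
d_manhattan = manhattan(p, q)
print(d_euclidean)  # 5.0
print(d_manhattan)  # 7.0
```

The two measures disagree on how far apart the same pair of objects is, which is why the choice of proximity measure can change which objects end up grouped together.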
It’s well known that data preparation is often the most time-consuming part of any data science project. Before you can start doing much with the data, though, it first needs to be imported into your analysis software, something which can sometimes prove trickier than anticipated.