# Statistical Paradigms – Bayesian and Frequentist

(Note: This article discusses Bayesian and Frequentist statistics and follows from this previous one). Parapsychology has played an important role in ensuring that psychology retains at least some focus on anomalous human experiences. These experiences are very common and if psychology is truly to be the science of behavior and mental processes then it needs to take account of them. In addition to posing legitimate questions to materialist reductionist orthodoxy, parapsychology has also made contributions to scientific methodology in areas like study design, statistical inference and meta-analysis.

Some statisticians have argued that positive results in parapsychology experiments are symptomatic of a wider problem in how psychologists do their research and analyse their data. They may have a point in some respects. Undergraduate psychology students in nearly all universities are taught statistical inference through a classical, also called frequentist, paradigm. There are other statistical paradigms, chief among them being Bayesian Statistics. These two approaches differ in their philosophical assumptions and methods.

## Frequentist Approach
Frequentist statistics is so called because its approach to probability defines the probability of an outcome as its long-run relative frequency over many repetitions of a random event. On this view, probability is objectively determined by properties of the physical world.

Frequentism has been the dominant statistical approach since at least the early 20th century and is still taught to psychology undergraduates as ‘the’ way to do hypothesis testing in many universities. This approach to statistical inference proceeds as follows:

• Two hypotheses are specified: the experimental hypothesis $H_1$, which is the hypothesis under investigation, and the null hypothesis $H_0$, which states that no effect, relationship or association exists.
• A test statistic is calculated from the observed data, and the known or estimated distribution of that statistic is used to compute a p value. The p value is the probability of observing a test statistic at least as extreme (in the direction of $H_1$) as the one observed, assuming that $H_0$ is true.
• The p value is compared to a prespecified alpha ($\alpha$) level. Usually $\alpha$ is set to 0.05, though this is an arbitrary value. If the p value is smaller than $\alpha$, then the probability of observing these results, or results at least as extreme, when $H_0$ is true is small, and $H_0$ is therefore rejected, i.e. the results are unlikely to have arisen by chance. If the p value is above $\alpha$, the probability of the results given $H_0$ is not considered small enough and $H_0$ is not rejected.
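The procedure above can be sketched in a few lines of Python. The numbers here are hypothetical, and a z-test with a known population standard deviation is used to keep the arithmetic simple:

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical data: n = 25 scores with mean 103; H0: mu = 100,
# population standard deviation assumed known (sd = 10).
n, xbar, mu0, sd = 25, 103.0, 100.0, 10.0
z = (xbar - mu0) / (sd / math.sqrt(n))   # test statistic
p = two_tailed_p(z)
alpha = 0.05

print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {p < alpha}")
# z = 1.50, p = 0.1336 -> H0 is not rejected at alpha = 0.05
```

Note that the p value is computed from the tails of the sampling distribution, i.e. from outcomes at least as extreme as the one actually observed.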

So in frequentist inference, parameters are treated as fixed but unknown and the data as random. Parameters are estimated from samples drawn from a population, and the probability of the data given the hypothesis is calculated.

*Figure: for a two-tailed test, the null hypothesis is rejected if the test statistic falls in the shaded rejection region at either tail of the distribution.*

### Some Points to Note

• Because p values are calculated from the probability of observing values at least as extreme as the test statistic given that $H_0$ is true, they depend on data that were never actually observed, only hypothesised.
• Secondly, p values depend on the experimenter’s intentions and sampling plan. If an experimenter doesn’t fix the sample size before carrying out a study, one sure way to reject the null hypothesis is to test participants one at a time, recalculating the p value after each test and continuing until a p value < $\alpha$ is achieved. It can be shown that for any $\alpha > 0$ this approach (optional stopping) will eventually result in rejection of $H_0$.
• Thirdly, p values do not measure the weight of evidence for a phenomenon. Because p values depend strongly on sample size, two studies reporting the same p value do not necessarily convey the same evidence; effect size must be taken into account as well.
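The optional-stopping problem is easy to demonstrate by simulation. In this sketch (a hypothetical setup: observations are drawn from a standard normal, so $H_0$: mu = 0 is true by construction) the p value is peeked at after every new observation and sampling stops as soon as p < 0.05:

```python
import math
import random

def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def optional_stopping_rejects(max_n, alpha=0.05, min_n=2):
    """Sample under a true H0, testing after every observation; return
    True if H0 is ever (falsely) rejected before max_n observations."""
    total, n = 0.0, 0
    while n < max_n:
        total += random.gauss(0, 1)   # H0 is true: the mean really is 0
        n += 1
        if n >= min_n:
            z = total / math.sqrt(n)  # z-test with known sd = 1
            if two_tailed_p(z) < alpha:
                return True
    return False

random.seed(1)
runs = 1000
rate = sum(optional_stopping_rejects(200) for _ in range(runs)) / runs
print(f"false positive rate with optional stopping: {rate:.2f}")
# Far above the nominal 5%; with unlimited peeking it approaches 100%.
```

Even capped at 200 observations per run, the false positive rate is several times the nominal 5%, which is exactly the sampling-plan dependence described above.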

## Bayesian Approach

The Bayesian approach to statistics treats probability as a degree of belief and specifies how prior beliefs should change in response to new data. From a Bayesian perspective, then, there is a subjective element to probability.

Bayesian statistics has been around since the second half of the 18th century, when Bayes’ work on conditional probability was posthumously published. It was the French mathematician Laplace who stated Bayes’ theorem in the form in which it is known today: $P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$
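A standard worked example makes the theorem concrete (the numbers here are illustrative, not from any study discussed in this article): let A be “has a condition” and B be “tests positive” for a diagnostic test.

```python
# Illustrative numbers for a hypothetical diagnostic test:
p_d = 0.01                 # P(A): base rate of the condition
p_pos_given_d = 0.9        # P(B|A): sensitivity of the test
p_pos_given_not_d = 0.05   # P(B|not A): false positive rate

# P(B) via the law of total probability
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.154
```

Despite the test being 90% sensitive, the low base rate means a positive result only raises the probability of the condition to about 15% — exactly the kind of prior-dependent reasoning that Bayes’ theorem formalises.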

Bayesian statistics subsequently fell out of favour for a few reasons. A major one was the computational cost the approach entails; another is the subjectivity associated with specifying a prior probability. Computational complexity is much less of a concern nowadays, but the perceived subjectivity inherent in Bayesianism still prompts debate. In recent years Bayesian statistics has been regaining popularity.

• Bayesian hypothesis testing attempts to determine which of two competing hypotheses, $H_0$ and $H_1$, is more likely, and starts with the specification of the prior probability. The prior probability is an attempt to quantify the degree of belief in the hypothesis before the experiment.
• The likelihood function, the probability of the data given the hypothesis, is calculated from the data observed in the experiment.
• Given the prior probability and the likelihood, the posterior probability can be calculated using Bayes’ theorem. Substituting H (hypothesis) for A and D (data) for B in the formula above gives $P(H \mid D) = \frac{P(D \mid H) \, P(H)}{P(D)}$: the probability of the hypothesis given the data (the posterior probability) equals the probability of the data given the hypothesis (the likelihood) multiplied by the probability of the hypothesis (the prior probability), divided by the probability of the data. The probability of the data, $P(D)$, is the marginal probability of observing that outcome, averaged over the competing hypotheses.
• The posterior probability is calculated for each hypothesis. The ratio of the posterior probabilities (the posterior odds) gives a direct measure of how much more probable $H_1$ is than $H_0$.
• P values aren’t used in Bayesian hypothesis testing. Instead the Bayes factor (the ratio of marginal likelihoods) is used, and unlike the p value it does provide a quantitative measure of the evidence for a hypothesis: it is the ratio of the likelihoods of the experimental outcome under $H_1$ and under $H_0$.
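For simple cases the Bayes factor has a closed form. The sketch below (with illustrative numbers, not the analysis of any particular study) compares a point null $H_0$: theta = 0.5 against $H_1$: theta ~ Uniform(0, 1) for binomial data; with a uniform prior the marginal likelihood under $H_1$ integrates to 1/(n+1):

```python
import math

def bayes_factor_10(k, n):
    """BF_10 for k successes in n Bernoulli trials.
    H0: theta = 0.5 (point null); H1: theta ~ Uniform(0, 1).
    Under H1 the marginal likelihood integrates to
    C(n, k) * Beta(k + 1, n - k + 1) = 1 / (n + 1)."""
    m1 = 1 / (n + 1)                  # marginal likelihood under H1
    m0 = math.comb(n, k) * 0.5 ** n   # likelihood under the point null
    return m1 / m0

# 60/100 successes looks suggestive, yet BF_10 is below 1:
print(round(bayes_factor_10(60, 100), 2))  # the evidence slightly favours H0
# 70/100 successes: BF_10 in the hundreds, strong evidence for H1
print(round(bayes_factor_10(70, 100), 1))
```

This also illustrates the earlier point that p values and evidence can come apart: data that a frequentist test would flag as borderline significant can yield a Bayes factor that does not favour $H_1$ at all.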

In contrast to the frequentist approach, the Bayesian approach treats the parameters of interest as random variables and conditions on the data actually observed. It calculates the probability of the hypothesis given the data.

In Bayesian hypothesis testing the prior probability is updated with data from the experiment to produce the posterior probability.
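That prior-to-posterior update has a closed form when a conjugate prior is used. A minimal sketch with hypothetical numbers: a Beta prior on a hit rate, updated with binomial data.

```python
# Conjugate Beta-Binomial update (illustrative numbers).
# Prior: Beta(a, b); data: k hits in n trials;
# posterior: Beta(a + k, b + n - k) -- Bayes' theorem in closed form.
a, b = 1, 1        # uniform prior over the hit rate
k, n = 30, 100     # hypothetical experimental data
a_post, b_post = a + k, b + n - k

prior_mean = a / (a + b)
posterior_mean = a_post / (a_post + b_post)
print(prior_mean, round(posterior_mean, 3))  # 0.5 0.304
```

The posterior mean (about 0.30) sits between the prior mean (0.5) and the observed hit rate (0.30), with the data dominating because the uniform prior is weak: this is the belief-updating described above, made explicit.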

### Some Points to Note

• Quantifying the prior probability of a hypothesis is usually the problematic part of Bayesian hypothesis testing. Degrees of belief are subjective, so different people may come up with different prior distributions. One way round this is so-called objective Bayesian analysis, in which non-informative prior distributions are used. Various non-informative priors are available, but this approach has not been entirely successful.
• Hoijtink et al (2016) point out that unless the prior distribution is chosen carefully, the Bayes factor will not be well calibrated, i.e. it will not be unbiased with respect to the experimental hypotheses. Frequentist null hypothesis testing has been criticised for being biased towards rejecting $H_0$; it has been shown that Bayesian methods can suffer from the opposite fault (a bias towards not rejecting $H_0$), particularly when the effect size is small (as it is in Ganzfeld research).
• The current rules for interpreting the size of the Bayes factor are arbitrary. Bayes factors of 1–3 are typically treated as anecdotal evidence, 3–20 as positive evidence and 20–150 as strong evidence, but as Hoijtink et al (2016) point out, these labels can be misleading. They advocate using frequency calculations to determine the probability of making a correct decision given the computed Bayes factor, and using that as the indicator of evidential strength.

## Do psychologists need to change the way they analyse their data?

While there have been calls for psychologists to start using Bayesian approaches to analyse their data (for example Wagenmakers et al 2011), I don’t think any statistical approach (Bayesian, frequentist or anything else) is going to be a panacea for flawed research design. Nor should it be a case of frequentist-versus-Bayesian wars: both have advantages and disadvantages, and the best approach may be to combine useful elements of each. In any case, more important than the statistical paradigm used is that psychologists be familiar with the nuances of the statistical techniques they are using. Just because some may incorrectly apply a particular approach to statistical inference does not mean that approach should be abandoned.

Rather than changing how they analyse their data, it may be that psychologists (and researchers in other disciplines) need to change how they plan their research and interpret their data analyses. One way to do this is by implementing a study registry, which allows researchers to register studies in advance, specifying in detail how they will be carried out, whether they are confirmatory or exploratory, and so on. This is a common procedure in regulated medical research and removes the possibility of many undetected methodological flaws.

One such registry for parapsychology has been maintained by the Koestler Parapsychology Unit in Edinburgh since 2012. It will be interesting to see what trends emerge from these pre-registered studies and, if they favour the psi hypothesis, what post-hoc explanations motivated skeptics will resort to in trying to nullify the results!