Learning Analytics – Predicting Academic Performance

A recent longitudinal study by the HEA, which tracked the progress of more than 34,000 students enrolled in third level education in Ireland in 2007/2008, found that 76% graduated over the following ten years. Completion rates varied somewhat by type of college, subject and gender, and overall 58% of students graduated on time. Although these figures apparently compare well internationally, more than 40% of third level students in this cohort didn't graduate on time, and nearly a quarter still hadn't graduated ten years later.

Non-completion rates in further education and training are likely lower than those at third level (see this report, for example, which estimates national PLC drop-out rates of around 16%), but reducing non-completion should, in my opinion, be a concern at all levels of education. Course drop-out is not always negative: a person may leave to take up a job they enjoy, to move to a course better suited to their needs and interests, or for personal development. In many cases, however, dropping out carries a significant cost for students, for colleges and schools, and for society in general.

Analytics can form a useful part of an organisational strategy to tackle drop-out rates. Learning Analytics has been defined as the measurement, collection, analysis and reporting of data about learners and contexts for the purposes of understanding and optimising learning and the environments in which it occurs. One of the most common uses of learning analytics has been in the prediction of student academic performance. Nearly all of the work in this area to date has been done with cohorts of students in tertiary education.

In a recent study, Dr Geraldine Gray and I looked at predicting academic performance for cohorts of students on further education and training (FET) courses, using only data from a learning management system (LMS), Moodle, to create the predictor variables. Learning management systems have become almost ubiquitous in education. From an analyst's viewpoint, such systems have the big advantage of quantifying students' learning behaviours and interactions with the learning space in a way that is much harder to do in face-to-face learning.
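As a concrete illustration, the sketch below shows one way per-student predictor variables might be derived from a Moodle activity log export using pandas. The column names ('userid', 'timecreated', 'component') and the file name are assumptions about a typical standard-log export, not the exact schema used in our study.

```python
# Minimal sketch: aggregate a Moodle activity log into per-student predictors.
# Column names and the CSV file are assumptions, not our study's exact schema.
import pandas as pd

log = pd.read_csv("moodle_log.csv", parse_dates=["timecreated"])

features = (
    log.groupby("userid")
       .agg(total_events=("component", "size"),                              # overall activity volume
            active_days=("timecreated", lambda s: s.dt.date.nunique()),      # distinct days with any activity
            forum_events=("component", lambda s: (s == "mod_forum").sum()),  # discussion board engagement
            quiz_events=("component", lambda s: (s == "mod_quiz").sum()))    # quiz engagement
       .reset_index()
)
print(features.head())
```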

Of course there are aspects of learning that can't be captured in an LMS, but given the low cost of collecting LMS data and the potentially high cost of student drop-out, it is at least worth investigating whether that data can help to identify students at risk of early exit or failure. To be of much use, these predictions need to be available early enough in the course for some type of intervention to change the predicted outcome.
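One simple way to respect this constraint is to truncate the activity log at an early-course cutoff before any features are built, so the model only ever sees data that would actually have been available at prediction time. The file and column names below ('course_start_dates.csv', 'courseid', 'start_date') are purely illustrative.

```python
# Sketch: keep only events from the first 10 weeks of each course, so features
# reflect what would have been known at prediction time. Names are illustrative.
import pandas as pd

WEEKS = 10
log = pd.read_csv("moodle_log.csv", parse_dates=["timecreated"])
starts = pd.read_csv("course_start_dates.csv", parse_dates=["start_date"])

log = log.merge(starts, on="courseid")                      # attach each course's start date
cutoff = log["start_date"] + pd.Timedelta(weeks=WEEKS)      # per-row cutoff date
early_log = log[log["timecreated"] <= cutoff]               # discard later events
```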

Findings from research in this area have been mixed to date: predictive accuracy has varied widely from study to study. This shouldn't be surprising, given that predictor variables, type of LMS, course design, instruction style and so on often differ between studies. However, predictive accuracy can also vary considerably between courses within the same institution.

In our study with FET cohorts we achieved reasonably good accuracy at predicting student grade, and very good accuracy at predicting whether a student would pass or fail, but only when data for the full course duration was used. Results using early data were not as good: at 10 weeks (about a third of the way through most courses) the algorithms predicted significantly better than chance, but still identified only a minority of the failing students.
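For readers who want a feel for what this looks like in practice, the sketch below evaluates a pass/fail classifier against a simple chance baseline with scikit-learn. The random forest is purely illustrative (not necessarily the algorithms used in our study), and 'features' and 'passed' are assumed to come from earlier preprocessing steps.

```python
# Sketch: compare an illustrative pass/fail classifier against a chance baseline.
# 'features' (per-student predictors) and 'passed' (1 = pass, 0 = fail/early exit)
# are assumed to exist already; the random forest is only one plausible choice.
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

X = features.drop(columns=["userid"])
y = passed

model = RandomForestClassifier(n_estimators=200, random_state=0)
baseline = DummyClassifier(strategy="most_frequent")

# With imbalanced classes, balanced accuracy (or recall on the failing class)
# is more informative than plain accuracy.
for name, clf in [("model", model), ("baseline", baseline)]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
    print(name, round(scores.mean(), 3))
```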

I think that our results could be improved with better feature engineering and dataset enrichment.

  • A report by Civitas, which looked at commonalities in predictive variables across many colleges and students, found four broad areas under which useful predictors could be grouped: measures of attendance, course material engagement, course discussion board engagement, and LMS grades.
  • Civitas also suggest that, rather than using raw frequency counts, it is better to use derived variables, such as the regularity-of-login measure in our study (see the sketch after this list).
  • In a similar vein, researchers in Ireland have emphasised the need to use fine-grained data for performance prediction.
  • Best results might well come from combining LMS data with data from other sources such as student information systems (SIS); for example, the HEA study above found that previous academic performance was a good indicator of likelihood to complete.
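To illustrate the derived-variable point above, here is one possible regularity-of-login style feature: the proportion of course weeks in which a student showed any activity. This is only a sketch under assumed column names; it is not necessarily the exact definition used in our study.

```python
# Sketch: a derived 'regularity of login' style variable, the proportion of
# course weeks with at least one event per student. Illustrative only.
import pandas as pd

log = pd.read_csv("moodle_log.csv", parse_dates=["timecreated"])
log["week"] = log["timecreated"].dt.isocalendar().week

n_course_weeks = log["week"].nunique()                      # weeks the course was active
regularity = (
    log.groupby("userid")["week"].nunique()                 # weeks with any student activity
       .div(n_course_weeks)
       .rename("login_regularity")
)
print(regularity.describe())
```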

Building a model that can predict academic performance is one thing; operationalising it is another. A number of factors then need to be considered.

  • In our study, because of the small sample size, failing students and early-exiting students were combined into a single category. Ideally these would be differentiated, since a different approach may be required for each.
  • Once a student has been identified as at risk of failure, what form will intervention take? Identifying students at risk of drop-out or failure is of little use if interventions don't succeed in preventing those outcomes for at least some of them.
  • Cost-Benefit Analysis. Collecting data and training a model to predict student performance is a relatively low-cost endeavour. Even if only a few extra students in any particular cohort graduate, the benefits may outweigh the costs.
  • Before such a model could actually be deployed in a ‘live’ environment, model interpretability would need to be considered in the context of GDPR (see the sketch below).
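On the interpretability point, one low-tech option is to favour models whose predictions can be explained directly, for example a logistic regression whose standardised coefficients indicate the direction and relative strength of each predictor. The sketch below is illustrative only and reuses the assumed 'X' and 'y' from the earlier sketches.

```python
# Sketch: an interpretable alternative, logistic regression on standardised
# features, whose coefficients can be inspected and explained to students/staff.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

coefs = pd.Series(pipe.named_steps["logisticregression"].coef_[0],
                  index=X.columns).sort_values()
print(coefs)   # which behaviours push a prediction towards pass or fail
```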
