I had been meaning to read this book for a while. It features on many recommended reading lists for data science, and its author, Cathy O’Neil, was a proponent of data science who co-authored “Doing Data Science”, an excellent practical introduction to the subject. So I was interested to read what might be the antidote to some of the current big data hubris. I had started it a while back but put it aside, and a recent holiday in Poland gave me the chance to revisit it.
O’Neil lays her cards on the table from the beginning, stating in the introduction that she is no longer a big data evangelist. She is a former maths professor who left academia to work in quantitative finance at a hedge fund. Disillusioned with this work following the crash of the late noughties, she rebranded herself as a data scientist and moved into ecommerce. Her latest venture is a company that audits algorithms for fairness.
As you might have guessed from the title, in this book the author equates big data algorithms to weapons of mass destruction that can wreak devastating consequences on a huge scale. She calls them weapons of math destruction (WMDs) and outlines their bomb parts as opacity, scale and damage. Typically such algorithms work as black boxes, their decision-making processes invisible but inviolable at the same time. Results from these algorithms form a self-fulfilling feedback loop, where the model’s outputs help to perpetuate the environment that justifies its assumptions. And big data technologies mean that these algorithms, or models, can scale massively and therefore have the potential to unfairly impact the lives of many people.
Plenty of examples of WMDs are presented throughout the book in areas like finance, education, justice, employment, insurance, real estate and politics. We read how algorithms measuring traders’ performance incentivised underestimation of risk, how algorithms used to rank US colleges led to massive increases in college tuition fees, how predictive policing models can result in the criminalisation of poorer neighbourhoods, how algorithms used by finance companies for assessing creditworthiness can serve to maintain unequal access to credit, and so on. Many other examples are given, and for more details you’ll have to read the book!
One thing to consider: if a model is built to optimise profits for its designers, and it does that extremely well but at the same time results in certain sectors of society being treated unfairly, is this the fault of the algorithm or of those who designed it? To my mind the problem isn’t with the algorithm. The author acknowledges this, mentioning a number of times that the problem lies in the purpose or objective for which the model is designed. However, it could be made clearer that the problem isn’t algorithms per se but rather that they are designed with a particular goal in mind and, if optimised ruthlessly to achieve it, will do so, often at the expense of fairness.
While I definitely recommend this book, I have some other slight quibbles. It’s very “American-centric”: nearly all the examples are from America. In fact one could read the book as a withering critique of elements of American society rather than a treatise on the dangers of algorithms that scale. Some recommendations are made, such as a Hippocratic Oath for data scientists and the need to incorporate considerations of equality and fairness into algorithmic evaluation measures, but the book is short on solutions. There is no mention at all of the recent GDPR legislation, which came into force in the EU earlier this year and which in Articles 13-15 provides for the right to an explanation for those affected by automated decision-making by machine learning models. This is a bit surprising given that model opacity is one of the criteria for a WMD. Also, there is no mention of the huge potential of big data for good. The idea of social enterprise as an alternative to traditional business models has been very much promoted within the EU in recent years, and big data offers huge promise for organisations with a people, profit, place focus.
Notwithstanding the above, this is an interesting and thought-provoking book that deserves to be read by anybody with an interest in the broader implications of big data and data science for society.