Data Science, Data Mining, Machine Learning, Artificial Intelligence, Big Data … the list goes on. These are all terms that eager cheerleaders of the data revolution insist organisations need to embrace. With all the hype and attention it’s not surprising that businesses feel they need to become more data-driven or risk losing competitive advantage. And there is definitely substance to the hype, otherwise companies like IBM wouldn’t be pouring literally billions of dollars of investment into their big data capabilities.
Data science done properly can yield big rewards, but it’s also possible to throw lots of money at setting up a data science unit and still see little tangible benefit. Simply saying you’re going to do data science, hiring some ‘data scientists’ and buying or licensing expensive technology is not the way to go. So how should one go about building an effective data science function?
Take a Team Approach
At the moment there is a global shortage of data scientists, so hiring good ones is not easy. In addition, data science is a multidisciplinary field requiring skills and knowledge in areas like computer science, statistics, data visualisation, programming, communications and business (subject matter expertise). People with strong skills in all these areas are difficult to find. Rather than hunting for these ‘unicorns’, as they are known, one can build a rounded data science team by hiring for diverse skill sets that complement each other.
Centralised or Dispersed?
One key decision that needs to be made is where to situate the data science team within the overall organisation. Options range from a single data science unit reporting to one manager or chief data scientist, to a scenario where data scientists are embedded in each of the business departments, e.g. logistics, marketing and product development. Which structure is best will depend on various factors. Smaller organisations may do better with a centralised structure. This structure also facilitates sharing of knowledge and good practice among the data science team, but may make it harder for data science to collaborate closely with other organisational units. On the other hand, a more dispersed structure enables closer communication between data scientists and business units, which can be vital in ensuring that data scientists are focusing on the right type of questions: those whose answers will generate actionable insight for the business.
It’s about the process!
Data science is about a process rather than any particular piece of technology. The data science process is an iterative one that starts with defining a question or objective and identifying the data needed. The data then undergoes preparation and exploratory analysis. Data preparation is commonly the most time-consuming part of a data analysis. Then comes the most interesting part of the process (at least for most data scientists!), which involves deploying algorithms on the data and building models. The final step, and a crucial one, is communicating the results.
Standard cross-industry approaches to analytics and data mining, such as the Cross Industry Standard Process for Data Mining (CRISP-DM), can be used as the basis for a process model. Taking an agile approach, with rapid development of prototypes, many of which will eventually be discarded, seems to be becoming more popular. This requires a tolerance for failure, but also a recognition that over time success will ensue if the right process is followed.
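To make the process steps described in this section a little more concrete, here is a minimal sketch in Python of one pass through prepare, explore, model and communicate. Everything in it is an assumption made for illustration: the toy spend and sales figures, the function names and the choice of a simple straight-line fit are hypothetical, and a real project would use its own data and whatever modelling approach suits the question.

```python
from statistics import mean

# Question (assumed for illustration): how are sales related to advertising spend?
# Hypothetical toy data as (spend, sales) pairs; None marks a missing value.
raw = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, 8.1)]


def prepare(rows):
    """Data preparation: drop records with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def explore(rows):
    """Exploratory analysis: basic summary statistics."""
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_spend": mean(xs), "mean_sales": mean(ys)}


def fit_line(rows):
    """Modelling: ordinary least squares fit of sales = a + b * spend."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def communicate(summary, model):
    """Communication: turn the result into a plain-language statement."""
    _, b = model
    return (
        f"Based on {summary['n']} complete records, each extra unit of spend "
        f"is associated with about {b:.2f} extra units of sales."
    )


clean = prepare(raw)
summary = explore(clean)
model = fit_line(clean)
report = communicate(summary, model)
print(report)
```

In practice the steps would be revisited several times: an odd result in the exploratory step might send you back to data preparation, and feedback on the report might change the original question.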
Develop data skills within and across the organisation
As the data science function becomes established it can play an important role in communicating what data science is about and in empowering other employees to become involved in the data science process. Some organisations have developed this further into internal data education programmes for all staff; a good example is Airbnb’s data science university. Along similar lines, but in the public sector, is the Data Science Accelerator programme in the UK.
Becoming data-driven is a goal many companies aspire to, though the number claiming to be data-driven is likely well in excess of the number that truly are. Changing organisational culture is never easy and requires the involvement of all stakeholders. It necessitates operationalising key performance indicators (something which may be trivial or difficult depending on the organisation’s area of business) and collecting data on all important processes and outputs. This data needs to be analysed and the insights communicated across the organisation in a timely manner. It is especially important to foster a sense of ownership of the analytics process right across the organisation. Effectively integrating analytic output into both everyday and strategic decision making is what truly defines a data-driven organisation.