There is huge hype about data science at the moment, and it seems like everybody is trying to get in on the game. While hype might help bring the topic to popular attention, it can also serve to obscure and confuse. What do people mean when they talk about data science? Is it all just hype?
Some, for example here and here, have argued that data science is just statistics rebranded. That argument may have some merit; however, there seems to be a broader consensus that there is more to data science than just statistics.
The Data Science Association define it as the scientific study of the creation, validation and transformation of data to create meaning. Here are some more attempts, by those working in the field, to define data science and what a data scientist does.
Data Science is the art of turning data into actions. This is accomplished through the creation of data products, which provide actionable information without exposing decision makers to the underlying data or analytics (Booz Allen – Field Guide to Data Science)
Data in the cyberspace already show features of an independent world, like the natural world, so all data in cyberspace are here referred to as datanature… Data science is the theory, method, and technology of studying datanature. (Y. Zhu & Y. Xiong)
A data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning as well as being human. (R. Schutt & C. O’Neil)
By definition all scientists are data scientists. In my opinion they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meets Columbo – starry eyed explorers and skeptical detectives. (M. Rogati)
A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician (J. Wills).
Data science as it is practised is a blend of Red Bull-fueled hacking and espresso-inspired statistics (M. Driscoll)
Being a data scientist is not only about data crunching. It’s about understanding the business challenge, creating some valuable actionable insights to the data, and communicating their findings to the business (J.P. Isson).
All of the above are valid to some extent, but it is difficult to define an entire field in just a few words. My own view is that data science essentially means taking a scientific approach to the process of extracting knowledge from data. While some of data science is meta-science (in effect, science about science), most of the work is applied and focused on solving real-world problems. It is a multidisciplinary field requiring skills and knowledge in programming, computer science, statistics, data visualization, communications and domain (subject matter) expertise. As with any science, reproducibility is important and will become more so as sharing data, code and pipelines becomes easier.