What is Data Science, what are its stages, and how did it originate?

Over the last decade or so, Data Science has become extremely popular among academics and corporations, and as a result the demand for Data Scientists has rocketed. According to LinkedIn, Data Scientist roles have been among the most promising roles of the last couple of years. Data Scientist also topped Glassdoor's list of best jobs in 2017 and 2018, and Harvard Business Review (HBR) called it the sexiest job of the 21st century.

So, naturally, anyone exploring this thing called Data Science will have some questions: what Data Science is, how it works and when it all began. We are going to discuss all of those questions in this blog. Some of you may also be interested in finding out why Data Science is so important and what a Data Scientist does. Don't worry, we have covered that in our second blog on this topic: please see Why Data Science is important.

To start with, Data Science is not just an industry buzzword, at least not anymore. The simplest answer to the question of what Data Science is would be 'the science of data management'. But that probably still does not make a lot of sense, does it? Let's break it down in a bit more detail.

It is about gathering large datasets and processing them to make them more useful or informative, particularly for understanding the world around us. For example, if you are a training provider for candidates in the finance industry, you will need to gather data on the most sought-after skills in the industry, how your existing training programmes are performing, what is working and what is not, and so on.

However, it is not quite as straightforward as that. To use the collected data for informed decision making, we need to draw on ideas from Statistics and Computer Science, as well as domain knowledge that tells us what the data really represent. To illustrate domain knowledge further: you cannot analyse finance data without understanding something about finance, and that understanding is what we mean by domain knowledge.

So, in short, Data Science is drawing useful conclusions from data using computation.

Now let's answer our next question: how does it work? Like many computational tasks, it goes through a series of logical stages, with the aim of making sense of data that were previously left untouched or unused. Data Science involves three stages: exploration, inference and prediction. I am going to outline these three stages very briefly below.

Raw data on its own is unlikely to be of much use until it has been worked into something meaningful. Hence, data workers carry out the first stage, a process called exploration. Through this process, they organise the data so that patterns in the information can be identified and viewed using visualisation techniques.
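
To make the exploration step a little more concrete, here is a minimal sketch in Python using only the standard library. It assumes some hypothetical feedback records for the training provider example above (the programmes and scores are invented purely for illustration) and organises them by programme so that a pattern, namely which programme learners rate highest, becomes visible in a crude text bar chart. The helper names average_by_programme and text_bar_chart are illustrative, not part of any library; in practice you would more likely reach for a tool such as pandas and a proper plotting library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (programme, skill taught, learner satisfaction out of 5).
# These values are made up purely for illustration.
records = [
    ("Excel for Finance", "spreadsheets", 4.5),
    ("Python for Finance", "programming", 3.0),
    ("Excel for Finance", "spreadsheets", 4.0),
    ("Risk Modelling", "statistics", 3.5),
    ("Python for Finance", "programming", 3.5),
    ("Risk Modelling", "statistics", 4.5),
]


def average_by_programme(rows):
    """Group the raw rows by programme and return the mean score for each."""
    groups = defaultdict(list)
    for programme, _skill, score in rows:
        groups[programme].append(score)
    return {programme: mean(scores) for programme, scores in groups.items()}


def text_bar_chart(summary):
    """A crude text 'visualisation': one '#' per half point, highest average first."""
    lines = []
    for programme, avg in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{programme:<20} {'#' * int(avg * 2)} {avg:.2f}")
    return "\n".join(lines)


summary = average_by_programme(records)
print(text_bar_chart(summary))
```

Even in this toy form, the raw rows say little until they are grouped and summarised; the sorted chart then makes the ranking of programmes obvious at a glance.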

Once those patterns have been identified, the next stage, inference, is to quantify them to see whether they are reliable or just a one-off event. This can be done using a technique called randomisation. If those patterns are not reliable, any decision based on them is unlikely to be reliable either. Hence, finding a pattern is not sufficient for decision making unless it has been shown to be reliable.
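
One common way to apply randomisation is a permutation test: if a pattern (say, a difference between two groups) is just a one-off, then randomly shuffling the group labels should produce differences as large as the observed one fairly often. The minimal Python sketch below assumes two hypothetical groups of exam scores (made-up numbers) and estimates how often a shuffle matches or beats the observed difference in means. The function name permutation_test and the choice of 10,000 shuffles are illustrative assumptions rather than a fixed recipe.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value: the share of random relabellings whose
    difference in means is at least as large as the observed difference."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly reassign every value to one of the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_shuffles


# Hypothetical exam scores for learners with and without a new training module.
with_module = [72, 75, 78, 80, 83, 85, 88]
without_module = [65, 68, 70, 71, 74, 76, 77]

observed, p_value = permutation_test(with_module, without_module)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value (for example, below the conventional 0.05 threshold) suggests the difference is unlikely to be a product of chance alone; a large one suggests the pattern may not be reliable enough to base decisions on.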

The third stage, prediction, is where things get pretty interesting, since this is where Data Science differs from Business Intelligence or spreadsheets. Unlike traditional tools, it can use Machine Learning on large amounts of data to make a best-informed guess about the future.
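
Machine Learning covers a huge range of models, so as a deliberately simple stand-in, the sketch below fits a straight line by ordinary least squares to hypothetical past enrolment figures and extrapolates it one quarter ahead. Linear regression is one of the simplest predictive models, not the only one; the data and the function names fit_line and predict are invented for illustration, and a real project would typically use a dedicated library and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input to make a best-informed guess."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical quarterly enrolments for a training course (quarters 1 to 6).
quarters = [1, 2, 3, 4, 5, 6]
enrolments = [120, 135, 149, 166, 180, 194]

model = fit_line(quarters, enrolments)
print(f"forecast for quarter 7: {predict(model, 7):.1f}")
```

The same idea carries over to more sophisticated models: learn parameters from past data, then apply them to new inputs.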

Now, if you are like me, you might be thinking: this sounds awesome, but when did it all begin, and how did I miss out on it? Let's dive straight in. Even though the term Data Science has been popularised only quite recently, its history can be traced back more than fifty years: Peter Naur used it as a substitute for Computer Science in 1960. In 1974, Naur published Concise Survey of Computer Methods, in which he used the term Data Science.

About twenty years later, the term Data Science was used again when the members of the International Federation of Classification Societies got together in Kobe for their biennial conference in 1996. The conference was titled Data Science, Classification, and Related Methods. The following year, C.F. Jeff Wu gave an inaugural lecture on the same topic, in which he discussed how data science came from Statistics. Then, in 2001, William S. Cleveland introduced Data Science as an independent field. In an article in the International Statistical Review, Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics, he argued for incorporating advances in computing with data into statistics. In it, he outlined six areas that he believed form the foundation of data science: multidisciplinary investigations, models and methods for data, pedagogy, computing with data, theory, and tool evaluation.

The next year, the International Council for Science's Committee on Data for Science and Technology (CODATA) started publishing the Data Science Journal, which focused mainly on data-related issues such as the description of data systems, their publication on the internet, their applications and legal concerns. Soon after, in 2003, Columbia University followed in the footsteps of the International Council for Science by launching its own publication, the Journal of Data Science, which acted as a platform for data workers to exchange ideas and share their opinions about the significance and use of Data Science.

In 2005, the National Science Board published Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. In that publication, they defined Data Scientists as the information and computer scientists, database and software programmers, disciplinary experts, expert annotators and curators whose primary job is to conduct creative inquiry and analysis so that data can be used effectively by companies and organisations in all sectors. Since then, Data Science as a field has never looked back, and it has continued to thrive in the academic and business world.