Data Science Life Cycle

The data science life cycle is iterative. After completing all the phases, the data scientist can go back to the first phase and start again. The life cycle resembles the Cross-Industry Standard Process for Data Mining (CRISP-DM), because data science is an interdisciplinary field that spans data collection, data analysis, feature engineering, prediction, and data visualization, and it works with both structured and unstructured data.

The phases of the data science life cycle are:

  • Business Understanding
  • Data Mining
  • Data Cleaning
  • Exploration
  • Feature Engineering
  • Prediction Modeling
  • Data Visualization

Business Understanding

First, the data scientist identifies the problem. A group of people then analyzes the problem and discusses possible solutions. They also study previous records to find out whether a similar problem has occurred before. Every decision has to be in favor of the organization.

Data Mining

Data mining is the process of answering a set of questions: What type of data is available? Is the data sufficient for the requirement? Is there any need to buy data from a third party and, if so, would that data be secure and private? This process is time-consuming because the data is gathered from different sources. The main purpose of data mining is to gather all the necessary data.

Data Cleaning

The data collected during data mining may contain a lot of unnecessary data or may be inconsistent. Pieces of the data may also sit in different sources, and fields such as dates may be incomplete or use different formats. So, the next task of the data scientist is to remove all the unwanted data and consolidate the rest. This process may be time-consuming, and how long it takes depends on the quality of the gathered data. At the end of the process, the data scientist has cleaned and consolidated data.
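The cleaning steps described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the record fields (`customer`, `date`, `amount`), the list of date formats, and the sample rows are all assumptions made up for the example, not part of any particular data set.

```python
from datetime import datetime

# Hypothetical date layouts that different sources might use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Normalize dates, drop incomplete rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        customer = rec.get("customer")
        date = parse_date(rec.get("date") or "")
        amount = rec.get("amount")
        if customer is None or date is None or amount is None:
            continue  # incomplete or unparseable record
        key = (customer, date, float(amount))
        if key in seen:
            continue  # same record gathered from another source
        seen.add(key)
        cleaned.append({"customer": customer, "date": date, "amount": float(amount)})
    return cleaned


# Made-up sample rows, as if merged from several sources.
raw = [
    {"customer": "A", "date": "2023-01-05", "amount": "10.5"},
    {"customer": "A", "date": "05/01/2023", "amount": 10.5},  # duplicate in another format
    {"customer": "B", "date": "Jan 07, 2023", "amount": 20},
    {"customer": "C", "date": "2023-13-40", "amount": 5},     # invalid date
    {"customer": "D", "date": "2023-01-09", "amount": None},  # missing amount
]
cleaned = clean_records(raw)
for row in cleaned:
    print(row)
```

Dropping incomplete rows is only one possible policy. Depending on the project, it may be better to fill in missing values or to flag such rows for review.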


Exploration

Data exploration is actually the starting stage of data analysis. In this process, the data scientist summarizes the main characteristics of the data and then analyzes and explores each data set very carefully. They can use different graphical representation techniques such as histograms, scatter plots and so on.
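As a rough illustration of this phase, the sketch below summarizes a small numeric sample and prints a text histogram using only the Python standard library. The `ages` list and the helper names `summarize` and `text_histogram` are made up for the example. A real project would more likely use a plotting library.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return the main descriptive statistics of a numeric sample."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def text_histogram(values, bin_width):
    """Count integer values per bin and render one line of '#' marks per bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:<8}{'#' * bins[start]}")
    return lines


# Made-up sample data.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 47, 52]

summary = summarize(ages)
for name, value in summary.items():
    print(name, value)

for line in text_histogram(ages, bin_width=10):
    print(line)
```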

Feature Engineering

This process is basically applied machine learning. Domain knowledge and a deep understanding of the data are required to make a machine learning algorithm work. It is difficult and expensive, and it requires brainstorming to improve the features. The features in your data are important for prediction.
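To make the idea concrete, here is a small sketch of turning a raw record into features that a model can use. The order fields, the feature names, and the bulk-order threshold of 10 items are hypothetical choices for illustration. In practice they come from domain knowledge about the problem.

```python
from datetime import date


def engineer_features(order):
    """Derive model-friendly features from one raw order record."""
    placed = date.fromisoformat(order["order_date"])
    return {
        # Weekday as an integer: Monday is 0, Sunday is 6.
        "day_of_week": placed.weekday(),
        "is_weekend": int(placed.weekday() >= 5),
        # Ratio feature: average price per item in the order.
        "price_per_item": order["total_price"] / order["item_count"],
        # Binary flag from a domain rule (the threshold is an assumption).
        "is_bulk_order": int(order["item_count"] >= 10),
    }


# Made-up raw record.
order = {"order_date": "2023-01-07", "total_price": 120.0, "item_count": 12}
features = engineer_features(order)
print(features)
```

The raw date string is of little use to most algorithms, while the derived weekday and weekend flag may carry a usable signal.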

Prediction Modeling

Here, the data scientist builds a model that answers the predictive questions posed by the project. A finished data science project faces many such predictive analytics questions, and the model is also used to predict future events and actions.
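One of the simplest predictive models is a straight line fitted by ordinary least squares. The sketch below is written in plain Python and is not the method of any specific project. The monthly sales figures are made up, and the names `fit_line` and `predict` are chosen only for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized, the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: sales observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(months, sales)
forecast = predict(model, 6)  # predicted sales for the next month
print(model, forecast)
```

Real projects usually compare several model types and evaluate them on data held out from training before trusting any forecast.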

Data Visualization

Data visualization presents information in a pictorial or graphical form. It enables decision makers to see analysis presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it is processed.
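The drill-down idea can be sketched without any charting library: first show totals per region, then narrow the view to one region and show totals per city. Everything here is an assumption for illustration, including the row layout, the `sales` field, and the text bars that stand in for a real interactive chart.

```python
from collections import defaultdict


def totals_by(rows, key):
    """Sum the 'sales' field of the rows, grouped by the given key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["sales"]
    return dict(totals)


def bar_chart(totals, scale=10):
    """Render one text bar per category; each '#' stands for `scale` units."""
    lines = []
    for name, value in sorted(totals.items()):
        bar = "#" * round(value / scale)
        lines.append(f"{name:<8}{bar} {value:g}")
    return lines


# Made-up sales rows.
rows = [
    {"region": "North", "city": "City A", "sales": 40},
    {"region": "North", "city": "City B", "sales": 20},
    {"region": "South", "city": "City C", "sales": 30},
]

# Overview chart: one bar per region.
overview = totals_by(rows, "region")
print("\n".join(bar_chart(overview)))

# Drill down: keep only one region and regroup by city.
detail = totals_by([r for r in rows if r["region"] == "North"], "city")
print("\n".join(bar_chart(detail)))
```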