Learn About the Data Science Life Cycle

Posted by Rahul Sharma on March 19th, 2020

Data Science Life Cycle

For a better understanding of ‘What is Data Science?’, let’s explore its life cycle. Suppose Mr. X owns a retail store and wants to improve its sales by identifying what drives them. To accomplish this goal, he needs to answer the following questions:

  • Which products in the store are the most profitable?
  • How well are the in-store promotions working?
  • Is product placement effective?
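To make the first question concrete: a product's gross profit is its margin (price minus cost) multiplied by the units sold, summed over all of its sales. The sketch below assumes a hypothetical list of sales records with prices stored in integer cents to avoid floating-point rounding; the product names and figures are invented for illustration, and overheads such as rent or staff costs are ignored.

```python
from collections import defaultdict

# Hypothetical sales records: (product, units_sold, unit_price_cents, unit_cost_cents).
# All names and numbers are invented for illustration.
SALES = [
    ("rice", 120, 150, 110),
    ("coffee", 40, 600, 350),
    ("rice", 80, 150, 110),
    ("soap", 60, 200, 120),
    ("coffee", 25, 600, 350),
]


def profit_by_product(records):
    """Sum gross profit, (price - cost) * units, for each product, in cents."""
    totals = defaultdict(int)
    for product, units, price, cost in records:
        totals[product] += (price - cost) * units
    return dict(totals)


def rank_products(records):
    """Return (product, profit) pairs ordered from most to least profitable."""
    return sorted(profit_by_product(records).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for product, profit in rank_products(SALES):
        print(f"{product}: {profit / 100:.2f}")
```

With these sample records, coffee ranks first (16,250 cents of gross profit), ahead of rice (8,000) and soap (4,800).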

Answering these questions will directly shape the outcome of the project, so he hires you as a Data Scientist. Let’s solve this problem using the Data Science life cycle.

Data Discovery

Whatever the problem, the first phase of the Data Science life cycle is data discovery. It involves finding data in various sources: unstructured data such as videos or images, structured data such as tabular text files (for example, CSV), or tables in relational database systems. Organizations also look at customers’ social media data, and similar sources, to better understand how customers think.
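As a minimal sketch of pulling structured data from two of these source types, the Python below reads a CSV text file and queries a relational table. It assumes a hypothetical CSV export of weekly sales and a hypothetical `promotions` table; an in-memory SQLite database stands in for the store's real database system, and `io.StringIO` stands in for an actual file, so the example runs on its own.

```python
import csv
import io
import sqlite3

# Hypothetical structured source 1: a CSV export of weekly sales.
CSV_TEXT = """store_id,week,sales
1,2020-01-06,5400
1,2020-01-13,6100
"""


def load_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row.

    csv.DictReader yields every value as a string; type conversion
    is left for the data preparation stage.
    """
    return list(csv.DictReader(io.StringIO(text)))


def demo_database():
    """Build an in-memory SQLite database standing in for the store's RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE promotions (store_id INTEGER, week TEXT, promo_type TEXT)")
    conn.executemany(
        "INSERT INTO promotions VALUES (?, ?, ?)",
        [(1, "2020-01-06", "discount"), (1, "2020-01-13", "none")],
    )
    return conn


def load_promotions(conn):
    """Read promotion records from the (hypothetical) promotions table."""
    cur = conn.execute("SELECT store_id, week, promo_type FROM promotions ORDER BY week")
    return [dict(zip(("store_id", "week", "promo_type"), row)) for row in cur]


if __name__ == "__main__":
    print(load_csv_rows(CSV_TEXT))
    print(load_promotions(demo_database()))
```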

For Mr. X’s store, our objective in this stage is to boost sales. Factors that could affect sales include:

  • Store location
  • Staff
  • Working hours
  • Promotions
  • Product placement
  • Product pricing
  • Competitors’ location and promotions, and so on

With these factors in mind, we get a clear picture of the data we need and procure it for our analysis. By the end of this stage, we should have collected all the data that pertain to the factors listed above.
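One simple way to check whether this stage is finished is to compare the factors above against the columns actually collected so far. The column names here are hypothetical labels for the listed factors, not a standard schema.

```python
# Hypothetical column names standing in for the factors listed above.
FACTORS = (
    "store_location",
    "staff",
    "working_hours",
    "promotions",
    "product_placement",
    "product_pricing",
    "competitor_activity",
)


def missing_factors(collected_columns):
    """Return the factors, in list order, that no collected column covers yet."""
    available = set(collected_columns)
    return [factor for factor in FACTORS if factor not in available]


if __name__ == "__main__":
    collected = ["store_id", "week", "sales", "store_location", "promotions", "product_pricing"]
    print(missing_factors(collected))
```

An empty result means every listed factor has at least one data source; anything remaining still needs to be procured.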

Data Preparation

Once data discovery is complete, the next stage is data preparation. This involves converting data from disparate sources into a common format so that it can be worked with seamlessly. Cleaning can be as simple as selecting clean subsets of the data and inserting suitable default values, or it can involve more complex methods, such as estimating missing values with a model.

Once the data is cleaned, the next step is to integrate it into a consolidated dataset for analysis. Integration includes merging two or more tables that describe the same objects but store different information, and summarizing fields in a table through aggregation. At this point, we also explore the datasets to understand what patterns and values they contain.
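The preparation steps described above can be sketched in plain Python on two small, invented tables: weekly sales stored as text (with one week missing) and store attributes. The sketch converts the sales rows to a common format, fills the missing value with that store's average observed sales (a very simple model, falling back to a default when a store has no history), merges the tables on a hypothetical `store_id` key, and aggregates sales by location. Dataframe libraries provide these operations directly; this version sticks to the standard library so it runs as-is.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical source A: weekly sales exported as text, with one value missing.
SALES_ROWS = [
    {"store_id": "1", "week": "2020-01-06", "sales": "5400"},
    {"store_id": "1", "week": "2020-01-13", "sales": None},
    {"store_id": "2", "week": "2020-01-06", "sales": "4100"},
    {"store_id": "2", "week": "2020-01-13", "sales": "4500"},
]
# Hypothetical source B: different information about the same stores.
STORE_ROWS = [
    {"store_id": 1, "location": "mall"},
    {"store_id": 2, "location": "high_street"},
]


def to_common_format(rows):
    """Convert text fields to numeric types so both sources share one format."""
    return [
        {
            "store_id": int(r["store_id"]),
            "week": r["week"],
            "sales": float(r["sales"]) if r["sales"] is not None else None,
        }
        for r in rows
    ]


def impute_sales(rows, default=0.0):
    """Fill missing sales with the store's mean observed sales, else `default`."""
    observed = defaultdict(list)
    for r in rows:
        if r["sales"] is not None:
            observed[r["store_id"]].append(r["sales"])
    filled = []
    for r in rows:
        if r["sales"] is None:
            history = observed[r["store_id"]]
            r = {**r, "sales": mean(history) if history else default}
        filled.append(r)
    return filled


def merge_on_store(sales_rows, store_rows):
    """Inner join: attach each store's attributes to its sales rows."""
    by_store = {s["store_id"]: s for s in store_rows}
    return [{**r, **by_store[r["store_id"]]} for r in sales_rows if r["store_id"] in by_store]


def total_sales_by_location(rows):
    """Aggregate: sum sales for each location."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["location"]] += r["sales"]
    return dict(totals)


if __name__ == "__main__":
    clean = impute_sales(to_common_format(SALES_ROWS))
    merged = merge_on_store(clean, STORE_ROWS)
    print(total_sales_by_location(merged))
```

On this sample, the missing week for store 1 is filled with 5400.0, and the aggregated totals are 10800.0 for the mall location and 8600.0 for the high-street location.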

Mathematical Models

All Data Science projects are driven by mathematical models. Data Scientists plan and build these models to suit the specific needs of the business. Building them can draw on various areas of mathematics, including statistics (for example, linear and logistic regression) and differential and integral calculus.
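As an illustration of one of the models named above, the sketch below fits a simple linear regression of weekly sales on promotion spend using ordinary least squares, where slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The data and the choice of promotion spend as the predictor are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical weekly observations, invented for illustration:
# promotion spend (x) and sales (y) for the same five weeks.
PROMO_SPEND = [0.0, 100.0, 200.0, 300.0, 400.0]
WEEKLY_SALES = [5000.0, 5300.0, 5550.0, 5900.0, 6200.0]


def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted sales for a given promotion spend."""
    return intercept + slope * x_new


if __name__ == "__main__":
    intercept, slope = fit_simple_linear_regression(PROMO_SPEND, WEEKLY_SALES)
    print(f"intercept={intercept:.1f}, slope={slope:.2f}")
    print(f"predicted sales at spend 250: {predict(intercept, slope, 250.0):.1f}")
```

On the invented data the fit gives an intercept of 4990 and a slope of 3, so the model predicts about 5740 in sales at a spend of 250. With real data, a positive slope shows an association between promotions and sales, not proof that the promotions caused the increase.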
