Holistic view for Data-Driven Decision Making Projects using the Exploded View to manage the Data Science Lifecycle
In today's data-driven world, businesses and organisations rely on data science to make critical decisions. However, with the large number of new aspects within a data science project, it can be challenging to know where to start and how to ensure that all aspects important to a successful project are covered. In this article, we explore how to approach projects along the data science lifecycle with the Exploded View model, from defining the problem to deploying the solution, while taking a holistic organisational view.
The Exploded View model ensures that the deeper contexts and challenges of digitalisation are made visible and comprehensible for all stakeholders in the company. It allows us to discuss, address and agree on specific tasks within our data science project. Other tools are used to create a more detailed view, e.g. the Data Value Proposition Canvas, the ROI Assessment or the Standard Process for the Data Science Lifecycle. The Exploded View helps us to model a holistic view of our data-driven use cases across all aspects of the company in six layers:
If you are interested in the Exploded View, you will find more information in our white paper.
While the first three layers are unique for each use case and organisation, there are general aspects (entities) of a data science project for the other three layers that provide guidance for approaching such use cases. We will explore these aspects below. The figure below shows an overview; the individual entities are explained in more detail afterwards.
Holistic view for data-driven decision making projects
As described, the performance layer gives an overview of all relevant processes that we need to consider for a data science use case. In the following, we will look at the three most important ones (Data Value Story, ROI Assessment, Data Science Lifecycle standard process), knowing that the list is not limited to those.
Elevator Pitch Data Value Story
When approaching any given use case from a business perspective, the most important question to answer is what the expected value is. We can use Data Value Story tools like the Data Value Proposition Canvas to clarify the following questions:
What are the gains and pains?
What is the target, what are the results?
What is the desired data product/service?
How does the data product/service relieve the pains or create the gains?
Besides the general value of a use case, assessing the Return on Investment (ROI) of a use case is crucial from an economic perspective. We need to consider three main areas: desirability, feasibility and viability. Desirability involves determining the potential impact of the idea, including cost savings, revenue generation, new business opportunities or improving customer experience, as well as the strategic importance and urgency of the use case. Feasibility entails evaluating the scale of the use case, along with the level of effort required for implementation (including monetary costs and time investment), system readiness, required expertise, data availability and legal considerations. Finally, viability involves examining the impact value in terms of EUR and the effort required to implement and maintain the product. By considering these three areas, businesses can better understand the potential value and feasibility of a use case and make informed decisions about whether or not to pursue it.
Return on Investment (ROI) Assessment
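To make the viability part of the assessment more tangible, the following is a minimal sketch of how an impact value in EUR and the effort to implement and maintain a product could be combined into a single figure. It is not part of the Exploded View or of any published ROI Assessment template. The class `UseCaseAssessment`, its fields, the 1 to 5 scoring scale, the three-year horizon and all example numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCaseAssessment:
    """Hypothetical container for one use case; all fields are illustrative."""
    name: str
    desirability: int              # assumed scale 1 (low) to 5 (high)
    feasibility: int               # assumed scale 1 (low) to 5 (high)
    annual_impact_eur: float       # estimated yearly value of the data product
    implementation_eur: float      # one-off cost to build the solution
    annual_maintenance_eur: float  # yearly cost to operate and maintain it

    def roi(self, years: int = 3) -> float:
        """Simple ROI over a horizon: (total gain - total cost) / total cost."""
        gain = self.annual_impact_eur * years
        cost = self.implementation_eur + self.annual_maintenance_eur * years
        return (gain - cost) / cost

    def is_viable(self, years: int = 3) -> bool:
        """Assumed rule of thumb: viable if the gain exceeds the cost."""
        return self.roi(years) > 0


def rank_use_cases(cases, years=3):
    """Sort by ROI, breaking ties by combined desirability and feasibility."""
    return sorted(
        cases,
        key=lambda c: (c.roi(years), c.desirability + c.feasibility),
        reverse=True,
    )


# Invented example figures, not taken from a real project.
forecast = UseCaseAssessment("demand forecast", 5, 4, 120_000, 90_000, 20_000)
chatbot = UseCaseAssessment("support chatbot", 3, 2, 30_000, 80_000, 10_000)

for case in rank_use_cases([forecast, chatbot]):
    print(f"{case.name}: ROI {case.roi():.2f}, viable: {case.is_viable()}")
```

In practice, the desirability and feasibility scores would come out of a structured discussion with the stakeholders rather than a single number, and the horizon should match the expected lifetime of the data product.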
The standard process for the Data Science Lifecycle is a method-based and goal-oriented approach for logical reasoning, traceability, and reproducibility. The process defines a step-by-step guide on how to approach any given problem with a data-driven decision making (DDDM) methodology. While the process provides a comprehensive collection of aspects, it is not limited to the ones listed. It covers general aspects such as framing the problem from a business perspective as well as statistical aspects.
Data science use cases are distinct from other software solutions. One reason is that it is difficult to assess upfront whether the data-driven solution will meet the desired goals. This is because the capability of the solution depends on the given data and the ability to extract patterns using statistical methods. To address this, a proof of concept (PoC) as a minimum viable product (MVP) is recommended to assess the solution's capabilities with minimal effort and time. The PoC scope is typically limited to evaluating whether the required patterns can be extracted from the given data to arrive at a robust decision. For some use cases, additional topics like legal aspects or infrastructure should be included. With a positive outcome of the PoC, the use case can be deployed and set for productionisation to deliver value.
Standard Process for the Data Science Lifecycle
The first six steps of the Data Science Lifecycle can be divided into two parts (see violet-coloured boxes in Figure 5) that build upon each other:
A) Understanding the problem from a business perspective and understanding the problem from a statistical (data analytical) perspective.
Based on the insights gained in A), a method-based transition from understanding to solving the problem can be achieved in part B).
B) Deriving and evaluating use case specific statistical methods to solve the problem.
The following eight steps provide a full picture of how to realise a data science project:
Business understanding: Define the project's goals and objectives from a business perspective, capture ROI potential, identify risks and formulate a concrete question that can be answered with data. We already learned about some tools like the Data Value Story and the Data Value Proposition Canvas that support this process.
Data collection: Obtain the right data by considering aspects such as data availability, accessibility, legal obligations and technical considerations. This determines if we can even tackle the project with a data-driven approach.
Data understanding: Gain a better understanding of the data to determine whether it fits the business requirements. Explore the data through exploratory data analysis (EDA). Based on those findings, we can achieve a method-based transition from understanding to solving the problem by identifying suitable statistical methods.
Data pre-processing: Pre-process the data to support data understanding, clean the data or prepare it for applying analytical methods or statistical models. Proper data structures are crucial to improve efficiency and quality.
Modeling: Select the appropriate modeling techniques, build and validate models and assess their effectiveness. Short-list the best models among multiple models with individual strengths.
Evaluation & benchmarking: Evaluate the model's performance, select a suitable evaluation metric and analyse how robust the results are.
Deployment: Develop a plan to integrate the solution into the organisation's operations, define what productionisation means for the use case, establish risk tiers, validate code quality and security standards and apply general software engineering concepts such as DevOps, MLOps and software testing to the data science project.
Reporting & monitoring: Monitor the deployed system's performance to ensure it continues to meet the project's goals and objectives. Adjust the model if necessary to maintain its effectiveness over time.
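The modeling and evaluation steps above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the standard process itself: the data is synthetic, the candidate model is a one-feature least-squares line, the benchmark is a naive mean prediction, and the PoC criterion (the model must at least halve the benchmark error) is an invented threshold. All names such as `fit_line` and `poc_passed` are hypothetical.

```python
import random
from statistics import fmean

# Synthetic data standing in for the data collection step; the linear
# relationship and the noise level are invented for illustration only.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 1.0) for x in xs]

# Hold out the last 25% of the rows for evaluation.
split = int(len(xs) * 0.75)
x_train, y_train = xs[:split], ys[:split]
x_test, y_test = xs[split:], ys[split:]


def mean_absolute_error(actual, predicted):
    """Evaluation metric: average absolute deviation between truth and prediction."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mx, my = fmean(x), fmean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    variance = sum((a - mx) ** 2 for a in x)
    slope = covariance / variance
    return slope, my - slope * mx


# Benchmark: always predict the mean of the training targets.
baseline_value = fmean(y_train)
baseline_mae = mean_absolute_error(y_test, [baseline_value] * len(y_test))

# Candidate model: fitted on the training rows, scored on the held-out rows.
slope, intercept = fit_line(x_train, y_train)
model_mae = mean_absolute_error(y_test, [slope * x + intercept for x in x_test])

# Hypothetical PoC criterion: the model must cut the benchmark error at least in half.
poc_passed = model_mae <= 0.5 * baseline_mae

print(f"baseline MAE: {baseline_mae:.2f}")
print(f"model MAE:    {model_mae:.2f}")
print(f"PoC passed:   {poc_passed}")
```

Comparing against a simple benchmark on held-out data is one common way to judge whether the required patterns can be extracted from the given data. A real project would choose the metric, the benchmark and the acceptance threshold together with the business stakeholders.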
Asset Layer & Data Layer
The asset layer describes all resources that need to be considered for a use case. For our data product, we are mainly talking about IT infrastructure, systems and architectures. In contrast, the data layer covers the sum of all data and data structures. Below we look at those two layers combined from a data science perspective.
There are multiple ways to approach data science use cases, grouped here into three types, each with a different purpose and level of maturity. The figure below gives an overview of the main aspects of those types.
The Data Science Lab is great for PoCs or single use cases. It is mostly used for fast experimentation or productionisation of single use cases with high flexibility.
A Data Science Hub provides standards and processes to enable teams to work together efficiently and to integrate the solutions into the existing IT landscape.
A Data Science Platform provides capabilities for multiple use cases, especially across multiple departments. It focuses on building a cross-application platform with a core service provider approach. Best practices, standardised tools, services, and templates enable high productivity.
Data & Data Science Architecture
When talking about the different data and data science architectures for those three groups, we mean the system's components, their properties and how they interact with each other. Data science components are software tools and frameworks like Docker, scikit-learn or Spark. Data components refer to infrastructure like a data warehouse or lakehouse to store and provision data. For more information about Data Productivity, feel free to reach out to my colleague Tobias Bartsch.
In this post we discussed the importance of the data science lifecycle for successfully approaching data-driven decision making projects in businesses and organisations. We discovered the Exploded View methodology as a holistic framework that guides data practitioners through the entire process, from defining the problem to deploying the solution.
We support your data-driven journey
This is the second of a series of articles on data-driven decision making, data science, AI and machine learning. In the previous one, we learned about data-driven decision making in general and how it enables business value. In the next article, we will take a closer look at how to apply the described approach for demand forecasting to optimise supply chain management in logistics.
We would be pleased to discuss your interest in these topics with you in person. Contact the Munich office or write an email to firstname.lastname@example.org. foryouandyourcustomers is happy to support your company in becoming more data-driven.