How Agile Scrum techniques should be adapted to fit data science projects

Industry | Published June 21, 2019

Tech Talk: Volume 1

I’m a big fan of Agile practices. However, I think the traditional and rigid Agile Scrum rules may not be a good fit for data science projects.
Tweaking the Scrum methodology could make it more adaptable to data science projects. Here’s how.

Task definition

Unlike in software engineering projects, it’s not easy to define the scope and tasks for a data science project, because there is often little or no clarity on the problem at the outset. A major chunk of a data scientist’s time is spent translating the business problem into the right kind of statistical problem. If the problem definition is wrong, we may end up with spurious findings, such as correlating a slowdown in South African GDP growth with a slowdown in ice cream sales in the United States.
Starting from the end business result helps in defining the problem clearly. For example, if the goal is sales forecasting, break it down into the smaller factors that influence sales. Gather the necessary data around those factors and identify the influence of one variable on another. If required, define multiple smaller statistical problems.
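As a rough illustration of checking how strongly each candidate factor moves with sales, here is a minimal sketch using the Pearson correlation coefficient. The factor names (`ad_spend`, `competitor_price`) and all the numbers are hypothetical, made up purely for illustration, and correlation is only one of several ways to gauge influence.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures, invented for this example.
sales = [200, 220, 260, 300, 310, 350]
factors = {
    "ad_spend": [10, 12, 15, 18, 19, 22],
    "competitor_price": [50, 48, 51, 49, 50, 52],
}

# Rank the candidate factors by the strength of their linear relationship with sales.
ranked = sorted(
    ((name, pearson(values, sales)) for name, values in factors.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

A high coefficient only flags a factor as worth a closer look. As the GDP and ice cream example shows, a strong correlation on its own does not establish that one variable drives the other.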

Timeline-based story points

In generic terms, we can say that data science projects will have data needs, exploration, cleaning, preparation, feature engineering, model selection, and validation phases. However, the time and effort required in each phase are not the same across projects. They vary to a great extent because the nature of the problem is not the same.
Instead of using time as the major factor, it would be more appropriate to use complexity points. This raises another question: how do we define complexity points and relate them to the actual work being done? Relative grading is one approach, where we compare one story to another and see how different they are in terms of complexity. The time taken to complete a story can then be estimated from the average time taken for stories of the same complexity across multiple sprints. At least three sprints are needed to arrive at a reliable estimate of how many story points can be completed in one sprint.
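The averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the story history, the hours, and the function names are all hypothetical, and the three-sprint minimum is taken from the rule of thumb in the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (sprint number, story points, hours actually spent).
completed_stories = [
    (1, 2, 6.0),
    (1, 5, 20.0),
    (2, 2, 8.0),
    (2, 3, 12.0),
    (3, 5, 16.0),
    (3, 3, 9.0),
    (3, 2, 7.0),
]


def average_hours_per_complexity(stories):
    """Average the hours spent on stories of each complexity level."""
    hours_by_points = defaultdict(list)
    for _sprint, points, hours in stories:
        hours_by_points[points].append(hours)
    return {points: mean(hours) for points, hours in sorted(hours_by_points.items())}


def sprint_capacity(stories, min_sprints=3):
    """Average points completed per sprint, or None if the history is too short."""
    totals = defaultdict(int)
    for sprint, points, _hours in stories:
        totals[sprint] += points
    if len(totals) < min_sprints:
        return None  # not enough sprints yet to trust the average
    return mean(list(totals.values()))


avg_hours = average_hours_per_complexity(completed_stories)
capacity = sprint_capacity(completed_stories)
print(avg_hours)  # average hours for each story-point value
print(capacity)   # average story points completed per sprint
```

With this made-up history, a 2-point story averages 7 hours, a 3-point story 10.5 hours, and a 5-point story 18 hours. Returning `None` before three sprints have been recorded is a design choice that keeps an unreliable early average out of sprint planning.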

Change of scope and requirements

While addressing high-level problems, we start with an initial assumption based on either past experience or the views of industry experts. The initial experiments are defined based on those assumptions. However, as we unearth new findings from the data, the focus might change or, in some cases, the scope of work might take a tangential path. This proves disruptive for Scrum-based approaches.
Intermediate checkpoints with the project owner are necessary to alter the course of action within a sprint, so the project owner and the data scientist should work hand in hand. Allowing the story description and story points to change during a sprint can help. Story spillovers are not a crime and should be treated as acceptable.

Deliverables

Unlike in software engineering projects, not every task yields a tangible deliverable. Defining the acceptance criteria becomes a challenge while formulating the scope. As data science is partly research-oriented work, project managers concerned about results and timelines may feel that these tasks lack clear deadlines.
Even when there are no visible outcomes, progress is being made beneath the surface, and it might take some time to see results. Being flexible about timelines can help the team be more productive.