If you’ve heard of data science (if you haven’t, where have you been and how did you find this blog?), you’ve probably heard of “fail fast”. The fail fast mentality is based on the notion that if an activity isn’t going to work, you should find out as quickly as possible, and stop doing it.
As the size, complexity and number of new data sources continues to increase, there is a corresponding increase in the value of discovery analytics. Discovery analytics is the method by which we uncover patterns in data and develop new use cases that lead to business value.
It is easy to see how discovery activities lead to a fail fast method. However, how can we learn from these failures, and how can we proceed without experiencing the same failures time and again?
Good failure, bad failure
There are two different types of failure possible in a data science project: good failures and bad failures. Good failures are a necessary part of the discovery process, and an important step in finding value in data. On the other hand, bad failures occur when they could have been avoided, and are basically of waste of everybody’s time. Examples of the cause of bad failures include:
Poor specification – this is not specific to data science and applies to any project that isn’t specified properly in terms of expected results and appropriate timelines.
Inappropriate projects for a data science methodology – it has become increasingly common to call all analytics data science. If a project can be solved using a standard data warehouse and business intelligence method, then you should probably just do that.
Poor expectation management – many data science projects suffer from this. It is important to ensure stakeholders are aware what can and cannot be expected from the results.
Data supply – a vital first step in any analytics project is to ensure that the necessary data feeds are available and accessible.
When it comes to #datascience, there's good failure and bad failure. Tweet This
Let’s talk about publication bias. This phenomenon occurs in the publication of scientific papers, where it is usual to only publish studies that produce positive results. What is far less common is to publish a paper that highlights the amount of work you did in order to fail to produce anything of any worth! The problem is that this leads to teams making the same mistakes, or proceeding down the same creative cul-de-sacs as so many before them. Because of publication bias, we do not learn from each other’s mistakes.
Exactly that situation can occur in a data science team. Unless a true collaborative environment exists for discovery and predictive model development, the same failures will be made over and over again by different members of the team.
Move out of the cul-de-sac
In order to benefit from the fail fast approach, data science teams need to adopt a best practice method of sharing results, methodologies and discovery work – especially when their work is considered a failure. This can be done in many ways, but some of the more effective include regular discussion – similar to agile methodology’s stand-up meetings – and using appropriate software to aid the process.
Software tools exist to facilitate collaboration, issue tracking, continuous documentation, source control and versioning of programme code, as well as task tracking. These tools create a lineage of activities that is permanent and searchable.
If you want to hear more on this subject, why not come to see my presentation ‘My data scientists are failures’ at the Teradata PARTNERS conference in Anaheim this October.
Chris Hillman is the Director of Data Science in the International region and has been responsible for developing and articulating the Teradata Analytics 1-2-3 strategy and supporting the direction and development of ClearScape Analytics. Prior to this current role, Chris led the International Data Science Practice and has worked on a large number of AI projects in the International Region focusing on the generation of measurable ROI from Analytics in production at scale using Teradata, open source and other vendor technologies. Chris has spoken regularly at leading conferences including Strata, Gartner Analytics, O’Reilly AI and Hadoop World. Chris also worked to establish the Art of Analytics practice, promoting the value of producing striking visualisations that draw people into Data Science projects, while retaining a solid business-outcome foundation.