Written by Alastair Barlow
If technology is the engine behind your business processes, customer experience or even designs, then data is most certainly the new oil. Oil keeps your engine running smoothly; data, likewise, is a rich resource and powerful commodity that can drive your business performance.
Data is power. Data can help you make better decisions and put you ahead of your competitors. But if data is so important, how do we harness its power and make the most of it?
The better our information is (more relevant, accurate and timely), the better informed we are to draw conclusions and, hopefully, make better decisions from it. A word of warning, though: the more information we have available, the harder it can be to distil it down to what’s relevant and see the wood for the trees. Equally, strategic decision-making still (for the time being) requires human intervention, so the quality of our decisions also depends on how we interpret the information. Generally, though, the richer the data, the less room for error.
“Data is the new oil”
Clive Humby, UK mathematician and architect of Tesco’s Clubcard
So, just like driving a car, we want the oil that goes in to be as good as possible for a smooth journey. If we are to make the most of data, how do we make sure it’s as good as possible?
Wherever you are pulling business intelligence from, the effectiveness of data is fundamental to informed decision-making. Most organisations struggle with data siloed across their business (or worse, in unconnected spreadsheets) and competing “versions of the truth”, ultimately leading to more questions than answers.
At flinder, we believe there are 4 key areas to focus on to make data work for you as well as possible. We refer to these as our 4 Pillars of Effective Data™:
- Single version of the truth
- High-quality
- Appropriately structured
- Controlled
Not all businesses need leading-practice data; as with most things, it’s a balance of resources and effectiveness. However, being aware of how each of these pillars affects data can help your business harness it, improve its power and, ultimately, make more effective decisions.
The Holy Grail of data is to have one version of the truth in the business: no ambiguity in the data, where all stakeholders are clear and agree on what the data shows.
I’ve seen it many times: finance and sales have two different numbers for revenue. Most of the time, the two functions are working from two applications, a CRM and a finance system, that aren’t integrated. This lack of integration leads to confusion and multiple versions of the truth. It creates noise, erodes confidence, and time is spent working out which number is correct rather than making decisions on the back of the information.
A single version of the truth (SVOT) essentially means data is consistent across the business, with no alternatives that call into question which version is correct. To avoid ambiguity, data should be entered once, in one system, with that system feeding all the others; there should certainly be no manual rekeying of data, which leads to dirty data. Data should be entered once and, where possible, stored once.
When we talk about data, we want it to be of high quality. We often hear the phrase “garbage in, garbage out”, which, in our case, means management information is only as good as the data you put in. We believe high-quality data is defined by the following 4 key principles:
- Clean and accurate
- Complete
- Consistent
- Timely
Clean and accurate
This is probably the most sought-after attribute of data, and the one we most readily identify high-quality data with. The data that is captured needs to be correct.
Complete
This is where data is captured in all fields and nothing is missing. While clean and accurate covers correct data, we also need to make sure we have a complete data set. Without it being complete, we could draw incorrect conclusions from the gaps in the data being presented.
Consistent
In order to identify trends and analyse data, it needs to be consistent: consistent definitions, consistent timing and consistent units. Say, for example, you’re analysing monthly gross margin: if costs are classified inconsistently between cost of sales and operating costs, this will have a huge impact on the analysis and render it meaningless.
Timely
Data needs to be timely, but timely is relative to purpose. For example, Formula 1 requires data second by second in order to monitor car, driver and tyre performance at over 200mph, where losing a second could be the difference between winning and losing. Compare this to a supermarket loyalty card scheme, which groups purchase data over time and sends coupons for frequently bought items only once a month.
Organisations, and data within organisations, can vary in how they are structured. There are broadly 3 ways to consider data in your business:
- Structured data
- Semi-structured data
- Unstructured data
It’s estimated that about 5 – 10% of data is structured. The easiest way to think of this is data that sits in a database or spreadsheet. In other words, it’s contained in rows and columns and can be easily mapped into pre-designed fields.
Semi-structured data also represents about 5 – 10% of all data in most businesses, although with the rise of best-of-breed cloud applications, and more businesses being savvy about the power of data, the figure is probably much higher. Semi-structured data does not follow the true relational form associated with databases, but does contain tags or markers that enforce a hierarchy of records within the data. Examples of semi-structured data are JSON and XML files, which are much more powerful to work with when integrating applications or, in our case, extracting data to provide information.
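As a brief sketch of what this looks like in practice, here is a hypothetical customer record in JSON (the field names and values are illustrative, not taken from any real system). The nested keys act as the tags that enforce a hierarchy, and a few lines of code can flatten the record into the rows and columns a structured analysis expects:

```python
import json

# A hypothetical semi-structured customer record: nested keys act as
# tags that enforce a hierarchy, without a fixed relational schema.
raw = """
{
  "customer": {
    "name": "Acme Ltd",
    "contacts": [
      {"email": "finance@acme.example", "role": "billing"},
      {"email": "ops@acme.example", "role": "operations"}
    ]
  }
}
"""

record = json.loads(raw)

# Flatten the hierarchy into rows and columns, i.e. turn the
# semi-structured record into structured data ready for analysis.
rows = [
    (record["customer"]["name"], c["email"], c["role"])
    for c in record["customer"]["contacts"]
]
for row in rows:
    print(row)
```

The same flattening step is what an integration between, say, a CRM and a finance system is doing under the hood when it moves records between applications.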
The remaining 80% or so of data is unstructured. The easiest way to think of unstructured data is text-based or multimedia content data. Examples of unstructured data include blogs, emails, customer feedback, videos, images, audio files etc.
While there have been advances in leveraging unstructured data, such as natural language processing (NLP), it’s significantly harder to make use of unstructured data in business.
It’s important to understand the data types in order to capture data appropriately and leverage its power. Data should be appropriately structured for what you’re trying to do with it.
If data is a commodity there should absolutely be control and a governance strategy wrapped around it.
Hierarchy of structure
We’ve just heard about structured data. All structured, and some semi-structured, data should have a clear hierarchy in order to organise it in the best possible way. The following illustrates a clear hierarchy for a customer name:
- Database – Financial application
- File – Customer master data
- Record – Specific customer details
- Field – Customer name
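The four levels above can be sketched with a small in-memory database (a minimal illustration; the table and column names are assumptions, not from any real system):

```python
import sqlite3

# Database level: the financial application's database (in-memory here).
db = sqlite3.connect(":memory:")

# File level: the customer master data table.
db.execute("CREATE TABLE customer_master (id INTEGER PRIMARY KEY, name TEXT)")

# Record level: one specific customer's details.
db.execute("INSERT INTO customer_master (id, name) VALUES (1, 'Acme Ltd')")

# Field level: the customer name within that record.
name = db.execute("SELECT name FROM customer_master WHERE id = 1").fetchone()[0]
print(name)  # Acme Ltd
```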
Like everything, data should have clear owners. Why? Clear ownership leads to better quality and understanding of data. Owners are responsible for ensuring the data is clean, up to date and appropriately defined; overall, they are the go-to person for a specific data set.
A data dictionary defines critical information about each attribute in a business-focused way. It makes it clear to the business what the data relates to and helps a developer understand the different attributes. It also provides detailed information such as standard definitions of data elements, their meanings and allowable values.
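A minimal data dictionary can be sketched as plain data, with a definition for the business and type and allowable-value details for the developer. The attribute names and values below are hypothetical:

```python
# A minimal, hypothetical data dictionary: each attribute carries a
# business-facing definition plus the technical detail a developer needs.
data_dictionary = {
    "customer_name": {
        "definition": "Legal trading name of the customer",
        "type": "string",
        "required": True,
    },
    "lead_source": {
        "definition": "Channel through which the lead was first captured",
        "type": "string",
        "required": True,
        "allowable_values": ["referral", "website", "event", "outbound"],
    },
}

def validate(record: dict) -> list:
    """Return a list of data-quality issues for one record."""
    issues = []
    for field, spec in data_dictionary.items():
        value = record.get(field)
        if spec["required"] and not value:
            issues.append(f"missing: {field}")
        elif "allowable_values" in spec and value not in spec["allowable_values"]:
            issues.append(f"invalid value for {field}: {value}")
    return issues

print(validate({"customer_name": "Acme Ltd", "lead_source": "fax"}))
```

A dictionary held this way can double as the enforcement point at data capture, rather than sitting as a static document nobody reads.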
Managing data involves a broad range of tasks, policies, procedures, and practices, such as:
- Creating, providing access, and updating data
- Storing data across multiple applications
- Disaster recovery procedures
- Ensuring data privacy and security
- Archiving, destroying and granting access to data in accordance with GDPR and other regulatory or compliance requirements
What is dirty data?
Dirty data is the term used for data that contains errors or omissions, with the main categories being:
- Incomplete data
- Duplicate data
- Inaccurate data
Incomplete data
This is where data fields are simply blank and data is missing. I see this quite often in CRM systems, where fields like source of contact or location are empty. Without this information you can’t segment your leads or understand your most valuable sources of new leads.
Duplicate data
This is where the same data is duplicated across the system landscape. Again, this is common where applications are simply not integrated and no SVOT is applied.
Inaccurate data
This can follow on from duplicate data, where only one copy is updated and the others are left inaccurate, or arise where data is rekeyed multiple times across the business. Overall, it’s data that is technically the right type but simply incorrect. Customer or supplier master data is a typical example, as is invoice creation, where a sales order has one number and the final invoice has another.
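The first two checks are straightforward to automate. The sketch below runs them over a few hypothetical CRM records (field names are illustrative); inaccurate data is harder to catch mechanically and usually needs reconciliation against a reference source:

```python
# Hypothetical CRM records used to illustrate dirty-data checks.
records = [
    {"id": 1, "name": "Acme Ltd", "source": "referral"},
    {"id": 2, "name": "Beta plc", "source": ""},          # incomplete
    {"id": 3, "name": "Acme Ltd", "source": "referral"},  # duplicate of id 1
]

# Incomplete: required fields that are blank or missing.
incomplete = [r["id"] for r in records
              if not r.get("name") or not r.get("source")]

# Duplicate: the same data captured more than once.
seen, duplicates = set(), []
for r in records:
    key = (r["name"], r["source"])
    if key in seen:
        duplicates.append(r["id"])
    seen.add(key)

print(incomplete, duplicates)
```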
While all of the above can be treated through data-cleansing activities, it’s much easier and more effective to make sure processes are in place for well-controlled data capture in the first place.
How good is your data?