What Is Data?
We talk a lot about data. But do we know what it is?
Last week I explained how effective thinking starts with clarification. In this post we’ll clarify the subject of this blog.
What Is Data?
Perhaps you’ve thought about data before. But did you ever think about what it is?
Data is a ubiquitous word. We use it across various domains.
We read about it in the news:
“There’s no data to support this”
“We need more data”
“The data is biased”
“There’s no evidence in the data”
We see it in various job titles:
Data Analyst
Data Scientist
Data Engineer
Data Professional
We see it in the names of our products:
Database
Data Warehouse
Unlimited Data Plan
Offline Data
A word this widely used ought to have a clear definition.
How We Currently Define Data
Let’s look at how the internet defines data.
The top definition from Webster’s Dictionary
Factual information (such as measurement or statistics) used as a basis for reasoning, discussion, or calculation.
The second definition from Webster’s Dictionary
Information in digital form that can be transmitted or processed.
From Vocabulary.com
Data is information such as facts and numbers used to analyze something or make decisions.
From Wikipedia (Yikes)
In the pursuit of knowledge, data is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted.
Which One Is Best?
Definitions 1 and 3 cover what most people probably think of when they read about data in the news. But these definitions have flaws. Does data always need to be factual? Does it always need to be used for analysis?
What about definition 2? I would bet that most data professionals would prefer this one. However, it is also limited. Not all data is in digital form, nor does it always need to be processed.
The Problem
The problem with all four definitions above is that none of them explains what data is. They only describe what we do with it. It’s no wonder many data professionals struggle to communicate findings from something they can’t simply define.
I have a solution to this problem. I will provide a simple definition for data that everyone will understand.
The Solution
I present to my readers Occam’s Data’s definition of data:
Data is information.
Data is information. That is all.
Information is the only meaningful word that all four definitions from the internet share. That’s because this is what data is. We get caught up in the details of what we do with this information.
But these details are not what data is. And the more we talk about data without clarifying what it is, the harder it is to understand what we are talking about.
This definition is one that everyone can understand. Clarifying data this way makes it easier to talk about what we do with it.
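If you want to check the shared-word claim yourself, here is a minimal Python sketch. It splits each of the four definitions quoted earlier into lowercase words and keeps only the words that appear in every one. The `words` helper and the `STOP_WORDS` list are my own illustrative choices, not part of any standard tool, so treat this as a rough check rather than a rigorous text analysis.

```python
import re

# The four definitions quoted earlier in this post.
definitions = [
    "Factual information (such as measurement or statistics) used as a "
    "basis for reasoning, discussion, or calculation.",
    "Information in digital form that can be transmitted or processed.",
    "Data is information such as facts and numbers used to analyze "
    "something or make decisions.",
    "In the pursuit of knowledge, data is a collection of discrete values "
    "that convey information, describing quantity, quality, fact, "
    "statistics, other basic units of meaning, or simply sequences of "
    "symbols that may be further interpreted.",
]

# Assumption: a small, hand-picked set of filler words to ignore.
STOP_WORDS = {"a", "and", "as", "in", "is", "of", "or", "that", "the", "to"}


def words(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


# Words that appear in every one of the four definitions.
shared = set.intersection(*(words(d) for d in definitions))

# The same words with the filler words removed.
meaningful = shared - STOP_WORDS

print(sorted(shared))      # ['information', 'or']
print(sorted(meaningful))  # ['information']
```

The only words common to all four definitions are "information" and the filler word "or". Once the filler is set aside, "information" is all that remains.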
Exercise
I want you to put this new definition to the test. In your next encounter with data, swap the word “data” for the word “information.”
Tell me how it went.
If you are new to data, did this help you understand something better?
If you are a data professional, did this swap help you communicate what you’re working on?