Data is everywhere around us. It exists in a myriad of different forms and is used in many different ways in our everyday lives.
What kind of pets do you have? How many kids do you have? What is your favorite type of chocolate? How tall are you?
All of these questions can be answered with some kind of information that can be defined as processed data.
Types of data
To give a scientific definition: data is a collection of raw, unorganized facts that need to be processed. Once the data has been processed, we can decide whether it can be used to prove or disprove a hypothesis.
You have probably noticed that the questions above have different kinds of answers. This shows that there are different types of data, so it is essential to know what type of data you are working with. Once you understand the kind of data you have, you will be able to interpret and analyze it effectively.
There are two main data types: numerical and categorical or, in other words, quantitative and qualitative.
Now we will have a closer look at them!
Numerical data
Numerical, or quantitative, data is a type of data that represents numbers rather than natural language descriptions, so it can only be collected in a numeric form.
Quantitative data supports arithmetic operations (addition, subtraction, multiplication, and division). Examples include a person's weight and height.
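The point that quantitative values support arithmetic can be shown with a minimal Python sketch. The heights below are made-up illustrative values, and the variable names are our own, not taken from any real dataset:

```python
# Hypothetical heights of five people, in centimeters (illustrative values only).
heights_cm = [162.0, 175.5, 180.0, 158.5, 171.0]

# Arithmetic on numerical data is meaningful: sums, differences, averages.
total = sum(heights_cm)
mean_height = total / len(heights_cm)
tallest_minus_shortest = max(heights_cm) - min(heights_cm)

print(f"Mean height: {mean_height:.1f} cm")        # Mean height: 169.4 cm
print(f"Range: {tallest_minus_shortest:.1f} cm")   # Range: 21.5 cm
```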
It is also divided into two subsets: discrete data and continuous data:
Discrete data:
The main feature of this data type is that it is countable: it can only take distinct, separate values, such as the whole numbers 0, 1, 2, 3, and so on. A discrete dataset can be either finite or infinite.
Examples of this type of data are age in whole years, the number of children you want to have (a non-negative integer, because you can't have a negative or fractional number of kids), and the number of sugar cubes in a jar. All of these examples are finite: they can be counted from beginning to end. If you tried to count all the sugar cubes in the world, you would never realistically finish, although that number is still finite. A truly countably infinite dataset, such as the set of all non-negative integers, can be listed one element at a time, but the list never ends.
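A minimal Python sketch of the ideas in this subsection, under our own assumptions: `is_valid_count` is a hypothetical helper that accepts only non-negative integers (the kind of value a number of children can take), and the standard library's `itertools.count` stands in for a countably infinite set whose elements can be listed one by one but never exhausted:

```python
from itertools import count, islice


def is_valid_count(value) -> bool:
    """Hypothetical check: can `value` be a count, e.g. a number of children?

    Counts are whole numbers that are never negative, so values such as
    -2 or 1.5 are rejected.
    """
    # bool is a subclass of int in Python, so exclude True/False explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


print(is_valid_count(3))    # True
print(is_valid_count(-2))   # False
print(is_valid_count(1.5))  # False

# A finite discrete dataset: a hypothetical jar of 24 sugar cubes can be
# counted from the first cube to the last.
jar = ["sugar cube"] * 24
print(len(jar))  # 24

# A countably infinite set: count(0) yields 0, 1, 2, ... forever, so we can
# only ever take a finite prefix of it.
print(list(islice(count(0), 5)))  # [0, 1, 2, 3, 4]
```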
Continuous data:
Continuous data is a type of data with uncountably many possible values. It is represented as a set of intervals on a number line, and between any two values there is always another one. Just like discrete data, continuous data can be either finite or infinite, in the sense that its range can be bounded or unbounded.
Examples of continuous data are measurements of weight, height, area, distance, time, etc. This type of data can be further divided into interval data and ratio data.
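To illustrate why continuous values cannot be listed one by one, the sketch below keeps halving the gap between two hypothetical weights and always finds a new value in between. It uses Python's `fractions.Fraction` so that the midpoints are exact; with ordinary floats, and with real measuring instruments, precision eventually runs out:

```python
from fractions import Fraction


def midpoint(a: Fraction, b: Fraction) -> Fraction:
    """Return the value exactly halfway between a and b."""
    return (a + b) / 2


# Two hypothetical weights, in kilograms.
low, high = Fraction(70), Fraction(71)

# However close two values get, there is always another one strictly between
# them, so the possible values cannot be listed one by one like counts.
for _ in range(5):
    mid = midpoint(low, high)
    assert low < mid < high
    high = mid
    print(float(high))  # 70.5, 70.25, 70.125, 70.0625, 70.03125
```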
Interval data:
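Interval data is measured along a scale on which neighboring points are an equal distance, or interval, apart, but there is no true zero point. Temperature in degrees Celsius is a classic example: the difference between 10 °C and 20 °C is the same as between 20 °C and 30 °C, but 0 °C does not mean "no temperature".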
Ratio data:
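Ratio data is almost the same as interval data, the main difference being that it has a true zero point, meaning a complete absence of the measured quantity. Temperature in kelvins is an example: 0 K is absolute zero, which equals -273.15 degrees Celsius, or -459.67 degrees Fahrenheit. Because of the true zero, ratios make sense: a weight of 10 kg is twice a weight of 5 kg.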
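The interval/ratio distinction can be checked with temperatures. This is a minimal sketch assuming the standard conversion K = °C + 273.15; `celsius_to_kelvin` is a hypothetical helper name. Differences agree on both scales, but ratios only make sense in kelvins, which have a true zero:

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvins."""
    return celsius + 273.15


cool, warm = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on both scales: a 10-degree step is also 10 K
# (up to floating-point rounding).
print(warm - cool)
print(celsius_to_kelvin(warm) - celsius_to_kelvin(cool))

# Ratios need a true zero. Dividing Celsius values gives 2.0, but 20 °C is
# not "twice as hot" as 10 °C; in kelvins the ratio is only about 1.035.
print(warm / cool)
print(round(celsius_to_kelvin(warm) / celsius_to_kelvin(cool), 3))  # 1.035

# Absolute zero, the true zero point of the Kelvin scale.
print(celsius_to_kelvin(-273.15))  # 0.0
```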
Categorical data
Categorical, or qualitative, data is information divided into groups or categories using labels or names. In such a dataset, each item is placed in a single category depending on its qualities, and all categories are mutually exclusive.
Even if categories are coded with numbers, those numbers have no mathematical meaning, so arithmetic operations on them make no sense.
A good example of categorical data comes up when you fill out a job application form. You may be asked to specify your level of education: for instance, you choose MSc from the list of options because you fall under that particular category.
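A short Python sketch of why numeric codes for categories carry no mathematical meaning. The eye-color answers and both codings are hypothetical; the point is that two equally valid codings give different "averages", while category counts stay the same:

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers for eye color, stored as labels.
eye_colors = ["brown", "blue", "brown", "green", "blue", "brown"]

# Two equally valid ways to code the same labels as numbers.
coding_a = {"brown": 1, "blue": 2, "green": 3}
coding_b = {"green": 1, "brown": 2, "blue": 3}

# The "average eye color" changes with the arbitrary coding, so it means nothing.
print(mean(coding_a[color] for color in eye_colors))  # 10/6, about 1.67
print(mean(coding_b[color] for color in eye_colors))  # 13/6, about 2.17

# Counting how often each category occurs is meaningful under any coding.
print(Counter(eye_colors).most_common())  # [('brown', 3), ('blue', 2), ('green', 1)]
```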
Categorical data is further divided into nominal data and ordinal data.
Nominal data:
Nominal data, also known as naming data, is descriptive and serves to label or name variables. Its elements have no order or numerical value and cannot be measured. Nominal data is usually collected via questionnaires or surveys.
Examples: a person's name, eye color, clothing brand.
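One way to model nominal data in code is Python's standard `enum.Enum`, whose members support equality checks but not ordering. `EyeColor` is a hypothetical example class, not part of any library:

```python
from enum import Enum


class EyeColor(Enum):
    """Hypothetical nominal variable: labels with no inherent order."""
    BROWN = "brown"
    BLUE = "blue"
    GREEN = "green"


first, second = EyeColor.BROWN, EyeColor.BLUE

# Nominal values can be checked for equality (same category or not)...
print(first == second)  # False

# ...but ordering them is meaningless, and plain Enum members refuse it.
try:
    first < second
except TypeError:
    print("Nominal categories have no order")
```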
Ordinal data:
This type of data represents elements that are ordered, ranked, or placed on a rating scale. Generally speaking, these are categories with an implied order, although the gaps between neighboring categories are not necessarily equal. Like nominal data, ordinal data can be counted but not measured.
Examples of ordinal data include customer satisfaction rating, Likert scale, and income level.
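For ordinal data, the order of categories matters, but the distances between them are not guaranteed to be equal. The sketch below, assuming a hypothetical five-point Likert scale and made-up responses, sorts answers by rank and reports a median category using `statistics.median_low`, which always returns one of the observed ranks:

```python
from statistics import median_low

# A hypothetical five-point Likert scale, from lowest to highest.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: position for position, label in enumerate(LIKERT)}

responses = ["agree", "neutral", "strongly agree", "agree", "disagree"]

# The order is meaningful, so answers can be sorted from lowest to highest.
ordered = sorted(responses, key=RANK.__getitem__)
print(ordered)  # ['disagree', 'neutral', 'agree', 'agree', 'strongly agree']

# The gaps between labels are not guaranteed to be equal, so we report a
# median category instead of averaging ranks. median_low always returns an
# observed rank, which maps straight back to a label.
middle = LIKERT[median_low(RANK[answer] for answer in responses)]
print(middle)  # agree
```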
Key differences and similarities
After exploring these two data types, let's take a look at how similar and different they are.
Similarities:
Ordinal data is categorical, but because its categories are ordered, it is sometimes treated as numerical data, for example when Likert-scale answers are coded from 1 to 5.
Both are usually collected using surveys and questionnaires.
Differences:
Numerical data is expressed as numbers, while categorical data is descriptive.
Numerical data can be measured, while categorical data can only be counted, for example by tallying how many items fall into each category.
Numerical data answers questions such as 'how many?', 'how much?', or 'how often?', while categorical data helps answer 'why?' or 'how?' something happened.
Numerical data is analyzed using statistical methods, while categorical data is processed by grouping its elements into categories and themes.
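The last two differences can be sketched as a single hypothetical `summarize` helper that picks a summary suited to the data type: statistics for numerical values and tallies per category for categorical ones. The function name and the `kind` strings are assumptions made for this illustration:

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values, kind: str) -> dict:
    """Return a simple summary suited to the data type.

    `kind` is either "numerical" or "categorical" (a convention made up
    for this sketch).
    """
    if kind == "numerical":
        # Numbers can be measured, so statistics such as mean and spread apply.
        return {"mean": mean(values), "stdev": stdev(values)}
    if kind == "categorical":
        # Categories can only be counted: group the items and tally each group.
        return dict(Counter(values))
    raise ValueError(f"unknown kind: {kind!r}")


print(summarize([2, 4, 4, 5], "numerical"))
print(summarize(["BSc", "MSc", "MSc", "PhD"], "categorical"))  # {'BSc': 1, 'MSc': 2, 'PhD': 1}
```

Keeping both branches in one function makes the contrast explicit; in a larger project the type of each column would usually be recorded once and looked up rather than passed in by hand.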
Which data type to use?
Numerical data is often easier to analyze because it is more concrete. However, to perform a complete analysis, using both data types together usually leads to the best results.
For example, you may be asked, "How many times did you visit a doctor this month?" To better understand your behavior, you may also be asked to explain why. Your answer may include different reasons for your doctor's appointments, which adds a different perspective on the numbers and makes the research results more accurate and complete.
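The doctor-visit example can be sketched in Python with made-up survey answers. The field names `visits` and `reason` are hypothetical; the sketch shows how the categorical answer breaks down the numerical one:

```python
from collections import defaultdict

# Hypothetical survey: each respondent reports how many times they visited
# a doctor this month (numerical) and the main reason (categorical).
answers = [
    {"visits": 2, "reason": "check-up"},
    {"visits": 1, "reason": "illness"},
    {"visits": 4, "reason": "illness"},
    {"visits": 1, "reason": "check-up"},
    {"visits": 3, "reason": "injury"},
]

# The numbers alone give an overall average...
overall = sum(answer["visits"] for answer in answers) / len(answers)
print(f"Average visits: {overall:.1f}")  # Average visits: 2.2

# ...while grouping by the categorical answer shows where the visits come from.
visits_by_reason = defaultdict(list)
for answer in answers:
    visits_by_reason[answer["reason"]].append(answer["visits"])

for reason, visits in sorted(visits_by_reason.items()):
    print(f"{reason}: {len(visits)} respondent(s), {sum(visits)} visit(s)")
```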
The following diagram will help you better understand the classification of the data types we have discussed:
Conclusion
Let's go over the main takeaways regarding data types:
Numerical data and categorical data are the two main types of data.
Numerical, or quantitative, data is all about numbers and is easier to analyze due to its numeric format.
Categorical, or qualitative, data assigns items to labeled categories and captures descriptive information such as personal characteristics, opinions, and experiences.
Numerical and categorical data work better together than separately, as combining the two gives a fuller picture in analysis.