Data is everywhere. Nowadays we have more data than ever: banks collect various data about their customers, scientists collect data for experiments, and even when you just use the Internet, various services also collect information about you. However, why do all these people need so much data? In most cases, data are collected for further analysis. In this topic, we will talk about data analysis, discuss its stages and find out why it is necessary.
What is data analysis?
Data analysis is the process of collecting and analyzing data using various statistical and mathematical methods to reveal useful information. It is already clear from the definition that data analysis consists of several stages. Here are the five most important ones:
- Problem identification
- Data collection
- Data cleaning
- Analysis
- Results interpretation
We will discuss these steps in more detail in the next section, but for now, let's focus on why data analysis is so important and why many companies are now looking for good data analysts.
Competent data analysis can help a company build a profitable business strategy, learn more about customer preferences, weigh the risks of implementing new features in an application, and solve problems faster and at lower cost.
Data analysis steps
In this section, we will discuss the five steps we outlined in the previous section.
1. Problem identification. The first thing we need to do is to understand why we are analyzing data, what we want to find out, and what problem we want to solve. This step will help us understand what data we need and what methods of analysis we will use in the future.
2. Data collection. Once we have identified a problem and understood what data we will analyze to solve it, it is time to collect the data we need. Data can contain both quantitative and qualitative information and can be collected at different intervals and in different ways; it all depends on the purpose for which the data is used. Data can be collected, for example, from a company's financial statements, from the results of a sociological survey, or from a website's traffic statistics.
3. Data cleaning. Unfortunately, the data we collect is not always immediately ready for analysis. In most cases, you will have to clean it first. "Garbage" in the data can be missing values, values outside a strictly defined range, extra spaces and characters in text data, duplicate entries, and other problems that can reduce the quality of the analysis and lead to erroneous results.
4. Analysis. Only now, when we have a clear problem statement and clean data, can we proceed to the analysis itself. Most of the analysis consists of applying statistical methods, measuring metrics, and building charts, the interpretation of which will eventually lead to some results. For example, we can plot the age distribution of a website's users to get a better idea of the company's target audience. We can also calculate correlation coefficients for some numerical values to reveal relationships between them, and so on. As you can see, comprehensive data analysis requires a confident knowledge of statistical methods in order to understand which specific metrics to measure in a given situation and what conclusions to draw from them.
5. Results interpretation. After the analysis, we will get some results, which we must be able to interpret correctly and draw the appropriate conclusions from. The conclusions may either confirm or refute some hypothesis, or simply state some facts that we wanted to learn when we defined the goals of our analysis. If you have to present the results of the analysis to management, you should also use visualization and support the conclusions with clear charts.
Types of data analysis
Depending on the task and the conclusions we need to reach, we can distinguish four basic types of data analysis, which we will discuss in this section.
1. Descriptive analysis. As you can guess, this type of analysis is used when we need to describe some information and identify some facts. For example, if we want to know how Apple's stock price has changed over the past 10 years, we will use descriptive analysis.
2. Diagnostic analysis. This type of analysis is used to understand why we observe certain values. For example, if we want to understand why Apple's stock price changed the way it did, we use diagnostic analysis. Diagnostic analysis is usually carried out after descriptive analysis: descriptive analysis shows us what happened, and diagnostic analysis tells us why.
3. Predictive analysis. Again, the name speaks for itself: predictive analysis is used to estimate how a situation may develop in the future. For this type of analysis, it is usually necessary to build a model that uses the relationships between some variables to predict a target. For instance, we can perform a regression analysis to forecast Apple's stock price for the next year.
4. Prescriptive analysis. Prescriptive analysis is a more advanced version of predictive analysis. It helps not just to extrapolate data, but to choose the best course of action under the given circumstances. For example, when you use a navigation app to find a route to a destination, the algorithms inside the app use prescriptive analysis of weather, traffic jams, and other factors to find the simplest and fastest route. This type of analysis is the most difficult to implement and requires not only knowledge of statistics but also the ability to use machine learning algorithms and computer modeling.
Conclusion
In this topic, we have talked about data analysis. We found out what stages it includes and what its types are. Let's remember a few important points:
- Data analysis is used for many tasks in industry: it helps companies develop strategies and establish facts.
- There are 5 main stages in data analysis: problem identification, data collection, data cleaning, analysis, and results interpretation.
- There are 4 main types of data analysis: descriptive, diagnostic, predictive, and prescriptive.
Now let's dive into some practice to consolidate the knowledge you have gained.