Histograms and bar charts

In this topic, you will learn about two of the most popular chart types for quickly seeing the shape and distribution of a dataset: bar charts and histograms. Both are important statistical tools that help us understand data. They are useful for visually comparing quantities and, when the categories or intervals are time periods, for showing how data changes over time. For example, you can compare monthly income and expenses.

Histogram

Histograms help you understand the distribution of continuous data. Remember that continuous data can take on an infinite number of possible values within a range, like weight or the time required to run a marathon. Judging by the shape of the histogram, an analyst can estimate which statistical distribution the random variable follows. For example, if all the columns of the histogram are approximately the same height, the distribution is likely uniform; if they form a bell shape, it is likely normal; and so on.
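To make the link between shape and distribution concrete, here is a minimal sketch that bins two simulated samples. The helper `bin_counts`, the sample sizes, and the distribution parameters are all assumptions chosen for illustration; they do not come from this topic.

```python
import random

def bin_counts(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v < high:
            # Clamp the index to guard against floating-point rounding at the top edge.
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts

rng = random.Random(0)  # fixed seed so that reruns produce the same samples
uniform_sample = [rng.uniform(0, 10) for _ in range(1000)]
normal_sample = [rng.gauss(5, 1.5) for _ in range(1000)]

uniform_counts = bin_counts(uniform_sample, 0, 10, bins=5)
normal_counts = bin_counts(normal_sample, 0, 10, bins=5)

print("uniform:", uniform_counts)  # roughly equal counts in every bin
print("normal: ", normal_counts)   # counts peak in the middle bin
```

The uniform sample should give columns of similar height, while the normal sample should peak in the middle and fall off on both sides.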

But how do we get one?

Imagine we have some random numerical data. There are higher and lower values but they are not organized. Let's line up all the values in increasing order from left to right to make it easier for us. Obviously, one long line of values is not very efficient, so let's find values that are close to each other, stack them together, and put those "stacks" next to each other. That's how we get our histogram!
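The "sort, group, and stack" idea above can be sketched in a few lines. This is only an illustration: the function `build_histogram`, the bin width, and the sample values are hypothetical and not part of this topic's data.

```python
from collections import Counter

def build_histogram(values, bin_width):
    """Group values into equal-width bins and count how many fall in each."""
    start = min(values)
    counts = Counter()
    for v in sorted(values):
        # Bin i covers [start + i * bin_width, start + (i + 1) * bin_width).
        index = int((v - start) // bin_width)
        counts[index] += 1
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, counts[i])
        for i in range(max(counts) + 1)
    ]

# Hypothetical sample: made-up measurements used only for this sketch.
sample = [2.1, 3.7, 1.2, 2.8, 3.1, 4.9, 2.5, 3.3, 0.4, 2.9]
for low, high, count in build_histogram(sample, bin_width=1.0):
    # Each '#' is one value "stacked" in the bin.
    print(f"[{low:.1f}, {high:.1f}): {'#' * count}")
```

Starting the first bin at the smallest value is one design choice among several; in practice, bin edges are often rounded to convenient numbers.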

Figure: composition of a histogram

A histogram is a graph that shows the distribution of continuous numerical data using rectangles (bars), each drawn over an interval of values called a bin.

The height of a rectangle represents the frequency: how many observations fall into that interval. The width of the rectangle represents the interval of values that the bin covers. We use graphs such as histograms together with summary statistics to get a solid understanding of sample data.

Basically, a histogram visualizes the distribution of data over a continuous interval or a limited period of time. Each bar usually represents the frequency of values in a certain interval. When all bins have the same width, the total area of the bars is proportional to the number of observations. The difference between the maximum and the minimum value is called the range. Histograms help you identify where values are concentrated, what the extreme values are, and whether there are gaps or outliers. In addition, they are useful for getting a rough overview of the probability distribution.

For example, suppose we have the results of a mathematics test for 7th-grade students. The table below shows the percentage of students who correctly completed the tasks and the corresponding number of tasks.

Percentage of students, %    Number of solved problems
25-35                        1
35-45                        1
45-55                        5
55-65                        7
65-75                        7
75-85                        3
85-95                        1

Based on the table, we can build a histogram! Note two important points for such graphs:

  • The horizontal axis shows the range of observed values of a quantity, divided into a certain number of intervals (in our situation, percentages of students).

  • The vertical axis shows the probability or frequency of its occurrence in each interval (in our situation, solved problems).

Figure: an example of a histogram
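As a rough stand-in for the picture, the table can be drawn as a text histogram. The sketch below uses only the intervals and counts from the table; the function name `render_histogram` is an assumption, and the bars are drawn sideways, so the intervals run down the page instead of along the horizontal axis.

```python
# Intervals and counts copied from the table above.
intervals = [(25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85), (85, 95)]
solved = [1, 1, 5, 7, 7, 3, 1]

def render_histogram(intervals, counts):
    """Return one text row per interval; the bar length is the frequency."""
    return [
        f"{low}-{high} | {'#' * count} {count}"
        for (low, high), count in zip(intervals, counts)
    ]

for row in render_histogram(intervals, solved):
    print(row)
```

The longest bars sit in the 55-65 and 65-75 rows, which is where the data is concentrated.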

The percentage of students runs from 25 to 95, so the range is 95 - 25 = 70.

Histograms can also be used to estimate the center of a data sample, the mean. In the histogram above, most of the data is concentrated between 55 and 75, so the center is roughly (55 + 75) / 2 = 65. In other words, a histogram gives approximate information about the mean of the data.
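The range and the center can also be computed directly from the table. The sketch below assumes that every observation in a bin sits at the bin's midpoint, which is a common approximation for grouped data and not something this topic prescribes; the function names are hypothetical.

```python
# Intervals and counts copied from the table above.
intervals = [(25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85), (85, 95)]
solved = [1, 1, 5, 7, 7, 3, 1]

def grouped_range(intervals):
    """Range of the data: upper edge of the last bin minus lower edge of the first."""
    return intervals[-1][1] - intervals[0][0]

def grouped_mean(intervals, counts):
    """Approximate the mean by treating every observation as its bin midpoint."""
    midpoints = [(low + high) / 2 for low, high in intervals]
    weighted_sum = sum(m * c for m, c in zip(midpoints, counts))
    return weighted_sum / sum(counts)

print(grouped_range(intervals))         # 70
print(grouped_mean(intervals, solved))  # 62.4
```

The midpoint estimate of 62.4 is close to the visual estimate of 65, which shows that reading the center off a histogram gives a reasonable first approximation.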

Bar chart

Another, seemingly similar, way of representing data is the bar chart. The classic bar chart uses horizontal or vertical bars to show discrete, numerical comparisons between different categories. Remember that discrete data can only take on a countable number of distinct values, such as 0, 1, 2, 3, 4, 5…100, or 1 million.

One axis of the chart presents the categories being compared, and the other shows a scale of discrete values. Bar charts differ from histograms because they do not show continuous development within a certain interval. Bar chart data is categorical data that answers the question "How much?" for each category. The main disadvantage of bar charts is that they become hard to read when there are a large number of bars.

A bar graph (or a bar chart) is a graph that displays data using bars of different heights. The bars have uniform width, and each bar stands for one distinct value or category.

The heights or the lengths of the bars represent the value of the variable, and these graphs are also used to compare certain quantities. We can use bar graphs to show the relative sizes of many things.

Let's imagine a firm with about 400 employees. The percentage of monthly salary saved by each employee is given in the following table.

Savings, %    Number of employees
20            105
30            199
40            29
50            79

It would be much easier to process this data in visual form, but the data is not continuous, so we can't use a histogram. However, even though our data has only four categories (20, 30, 40, and 50 percent), we can still show it on a graph. This kind of graph is called a bar chart.

Figure: an example of a bar chart
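The savings table can be turned into a rough text bar chart in the same spirit. In this sketch, the function `render_bar_chart` and the choice of one `#` per 10 employees (rounded down) are assumptions made to keep the rows short.

```python
# Categories and counts copied from the savings table above.
savings = {"20%": 105, "30%": 199, "40%": 29, "50%": 79}

def render_bar_chart(data, scale=10):
    """Return one text row per category; one '#' stands for `scale` employees."""
    return [
        # Integer division rounds the bar length down to whole symbols.
        f"{label:>4} | {'#' * (value // scale)} {value}"
        for label, value in data.items()
    ]

for row in render_bar_chart(savings):
    print(row)
```

Unlike the histogram, each row here is a separate category, so the rows could be reordered (for example, by bar length) without losing meaning.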

Bar charts can be horizontal or vertical, depending on the orientation of the bars.

  • Horizontal bars are easy to read since the layout mimics how we process information, where we read from left to right, starting at the top. A horizontal bar chart is a great option for long category names because there is more space on the left-hand side of the chart.

  • Conversely, a vertical bar chart can be a better choice if the data is ordinal, meaning the categories have a natural sequence, and ordering them left to right is more logical. Examples of such variables include age ranges, income brackets, and other groupings of quantities (e.g., 1-10, 11-20, etc.).

Figure: different types of bar charts

Histograms VS bar charts

Histograms and bar charts are often confused with each other, so let's summarize the key differences between them.

Bar chart                                 Histogram
Requires discrete data                    Requires continuous data
Bars are equal in width                   Bars can be unequal in width
Bars are flexible and can be reordered    Bars are fixed and can't be reordered
Gaps between bars are allowed             No gaps between bars
Compares discrete data                    Shows the distribution of non-discrete data
X axis: different categories of data      X axis: number ranges
Y axis: numeric values                    Y axis: frequency count

Figure: difference between bar charts and histograms

Conclusion

In this topic, we covered and compared two ways of visualizing data: bar charts and histograms. Both are widely used in statistics because they are clear and easy to build. Bar charts work well when your data falls into categories; when you have continuous data, use a histogram.

When determining the type of data, use this rule of thumb:

  • If you can count the number of results, then you are working with discrete data, like counting how many times a coin comes up heads.

  • If you can measure the result, you are working with continuous data, like measuring height, weight, or time.
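The rule of thumb also shows up in how the data is prepared for plotting. In the sketch below, the coin flips and heights are hypothetical values: counted outcomes go straight into per-category totals for a bar chart, while measured values must first be grouped into intervals for a histogram.

```python
from collections import Counter

# Discrete data: outcomes are counted directly, one bar per category.
flips = ["heads", "tails", "heads", "heads", "tails"]  # hypothetical outcomes
bar_chart_data = Counter(flips)

# Continuous data: measurements are first grouped into intervals (bins).
heights_cm = [158.2, 171.5, 165.0, 180.3, 169.9, 174.1]  # hypothetical measurements
bin_width = 10
# Map each height to the lower edge of its 10 cm bin, then count per bin.
histogram_data = Counter(int(h // bin_width) * bin_width for h in heights_cm)

print(dict(bar_chart_data))
print(dict(sorted(histogram_data.items())))
```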
