Bubble charts are a popular way to present quantitative data. Often the area of each circle in a bubble chart is used to encode the value of a parameter. The larger the value, the greater the area, the bigger the circle. This makes intuitive sense and can be quite effective for rough comparisons. However, it is important to understand that these representations distort the data because we are not very good at estimating area.
Take a look at the following four circles:
Figure 1. (drag to move shapes)
It is relatively easy to see that the middle two circles are the same size, that the one on the left is the smallest, and that the one on the right is somewhere in between.
It is more difficult to use this chart to infer the absolute differences between the shapes. Before continuing, drag the circles around a bit and make a guess as to how much smaller the two on the ends are than the ones in the middle.
Here is the answer: The area of the circle on the left is half that of the two in the middle, and the one on the right has an area two thirds of the middle two.
How did you do? Odds are you overestimated the area of the two smaller circles. Common guesses are two thirds for the smallest one and three quarters for the larger one. This tendency to underestimate the relative differences in area stems from the fact that we are very good at assessing relative differences in length and height but not area. The height of a circle changes more slowly than its area: a circle with half the area of another is still about 71% as tall. Since our brains focus on height, we overestimate the area of the smaller circles.
Squares aren’t much better. The following figure shows the same data encoded as the area of four different squares:
Figure 2. (drag to move shapes)
The problem is the same. The side length of a square changes more slowly than its area. Since the last square is more than two thirds as tall as the two in the middle and the first one is more than half as tall as the larger ones, the tendency is still to underestimate the relative difference between the large and small squares.
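To put rough numbers on this, here is a minimal Python sketch. It relies only on the fact that the diameter of a circle and the side of a square both scale with the square root of the area. The helper name `height_ratio` is made up for this illustration and is not part of the figures on this page.

```python
import math


def height_ratio(area_ratio: float) -> float:
    """Height ratio of two circles (or two squares) whose areas differ by area_ratio.

    Diameter and side length both scale with the square root of the area.
    """
    return math.sqrt(area_ratio)


# The two area ratios used in the figures: one half and two thirds.
for label, area_ratio in [("one half", 1 / 2), ("two thirds", 2 / 3)]:
    print(f"area ratio {label}: height ratio {height_ratio(area_ratio):.3f}")

# area ratio one half: height ratio 0.707
# area ratio two thirds: height ratio 0.816
```

A shape with half the area is still about 71% as tall, and one with two thirds of the area is about 82% as tall. The common guesses of two thirds and three quarters fall between the true area ratio and the height ratio.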
This is why bar charts, no matter how boring, are still the default choice for effective communication of absolute differences between items in a dataset. Here is the same data a third time with the differences encoded as bar height.
Figure 3. (drag to move shapes)
Even without axes or guidelines, the 1/2 and 2/3 values are very easy to see in this representation.
The challenges of using area become even more obvious as the complexity and interactivity of a visualization increase. The final figure is designed to give an idea of how challenging it can get.
The graph shows a square and three circles. The two larger circles are the same size and each has an area just under half of the total area of the square. The sum of the areas of the three circles is equal to the total area of the square.
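Figure 4. (drag to move shapes)
As with all of the figures in this post, the circles in this final figure can be dragged around. In the initial arrangement, with two circles fully inside the square and one circle crossing the square's lower edge, the area of the uncovered portion of the square (red, not purple) is equal to the area of the circle outside of the square (blue, not purple). This follows from the totals: the three circles together have the same area as the square, so as long as the circles do not overlap each other, any circle area that falls outside the square leaves an equal area of the square uncovered.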
This might seem like a contrived example, but this arrangement is based on a layout from a current project. And a figure similar to this was the inspiration for this post.
Bubble charts are great for creating an engaging visual experience, but encoding data into the area of a circle does have its drawbacks.
The math behind the final figure:
The square is 100 units on a side. This gives it an area of 10,000 square units (L x H).
The two larger circles each have a radius of 38.7 units and the smaller circle has a radius of 13.7 units.
The area of a circle is PI * r^2:
Large circle area = PI * 38.7 * 38.7 = 4705.1 square units
Small circle area = PI * 13.7 * 13.7 = 589.6 square units
4705.1 + 4705.1 + 589.6 = 9,999.8 ~ 10,000 square units
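For anyone who wants to check the arithmetic, here is a short Python sketch of the same calculation. The constant and function names are my own for illustration. The dimensions are the ones listed above.

```python
import math

# Dimensions from the final figure, in arbitrary units.
SQUARE_SIDE = 100.0
LARGE_RADIUS = 38.7
SMALL_RADIUS = 13.7


def circle_area(radius: float) -> float:
    """Area of a circle: pi * r^2."""
    return math.pi * radius ** 2


square_area = SQUARE_SIDE ** 2
large_area = circle_area(LARGE_RADIUS)
small_area = circle_area(SMALL_RADIUS)
total_circle_area = 2 * large_area + small_area

print(f"Square area:       {square_area:.1f}")
print(f"Large circle area: {large_area:.1f}")   # 4705.1
print(f"Small circle area: {small_area:.1f}")   # 589.6
print(f"Sum of circles:    {total_circle_area:.1f}")
```

Summing the unrounded areas gives roughly 9,999.9 rather than the 9,999.8 obtained from the rounded figures. Either way the three circles come within about 0.1 square units of the square's 10,000.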