In my previous post I talked about taking the Tableau Desktop Specialist exam because I felt I was missing a deeper understanding of some of the mechanics of Tableau, and preparing for the exam would be a great way to fill those gaps.
I believe that if you can’t explain a concept to someone else who doesn’t know anything (I always tell folks to explain things to me like I’m 10 because knowledge doesn’t have to be complicated), you probably don’t understand it as well as you think you do. So I decided to do my homework and share my lessons with the broader community. If you’re getting ready for the exam as well, I hope that by going through this post you’ll have gained a better understanding of some of the skills required as listed in the Exam Guide. This post covers the section ‘Understanding Tableau Concepts’ (see screenshot below).
At the bottom of this page you’ll find a list of additional ‘Exam Prep Resources’ that will help you get started and also the links to some of the ‘Source Materials’ I used.
Dimensions & Measures
When connecting a data source to Tableau you will have noticed there are 2 main categories on the ‘Data’ pane: Dimensions & Measures. Tableau assigns these ‘categories’ or ‘types’ automatically based on the type of data each column in your data source contains, and the assignment is usually accurate.
So what do they actually mean and what type of data goes where?
Typically, dimensions are used to describe something while the measures will contain information that can be measured and counted.
Let’s take a look at something simple before we get back into Tableau.
A real-life example
I made up this small table representing my immediate family. Can you make out what the ‘Dimension’ and ‘Measure’ is in this case?
- ‘# People’ is a measure, it counts the number of family members in each group.
- ‘Family Relationship’ is a dimension, it describes what the numbers represent and provides the overall context. You could even say that without it the numbers themselves wouldn’t mean much to anyone.
And if I now connect this small data source to Tableau, this is what it would look like.
For most of the content in this post I’ll be working with a dataset provided by learningtableau.com. I recommend you download it if you’re interested in replicating some of the examples in this blog. Small side note: I’ll mostly be working with the ‘Flights’ dataset.
We covered the basics and now understand the 2 different data roles in Tableau so it’s time to connect to ‘Flights’ and explore further. In case you are curious what the different icons in the images below mean, I recommend you check out ‘Visual Cues & Icons in Tableau Desktop’; it’s a great reference guide.
To better understand and explain the difference between measures & dimensions, I read several blogs/pages/posts available online (they are all listed at the end of this post). From there I was able to come up with this short recap, which is when things started to make sense for me (better late than never, right?).
- Measures are data fields that in most cases contain numeric values.
  - They can be counted and aggregated, they are quantitative.
  - Examples from ‘Flights’: Number of Flights, % of Delayed Flights
- Dimensions are data fields used to slice/group your measures.
  - They provide context and will determine the level of detail (granularity) of your view.
  - Dimensions are qualitative data fields, they describe something and can be numeric or text.
  - Examples from ‘Flights’: Date, City, Airport Name
A simple example in Tableau
I put a Measure & Dimension on the view and it created the image below.
The dimension on Rows is determining the level of detail (read: granularity) in our view by providing context and slicing the measure on Columns across the different airports.
If I were to take out the dimension, the view would look something like this. I think you’ll agree there’s not much context there unless you’re only interested in the total count.
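The same idea can be sketched outside Tableau. The snippet below is a minimal Python illustration and not how Tableau works internally; the airport names and flight counts are made up for the example and do not come from the ‘Flights’ dataset. With the dimension in play the measure is split per airport, and without it everything collapses into a single total.

```python
from collections import defaultdict

# Hypothetical rows: (airport name, number of flights).
# These values are invented for illustration only.
rows = [
    ("Airport A", 120),
    ("Airport B", 80),
    ("Airport A", 60),
    ("Airport C", 40),
]


def flights_by_airport(rows):
    """Slice the measure by the dimension: one total per airport."""
    totals = defaultdict(int)
    for airport, flights in rows:
        totals[airport] += flights
    return dict(totals)


def total_flights(rows):
    """No dimension in the view: the measure collapses to one number."""
    return sum(flights for _, flights in rows)


print(flights_by_airport(rows))  # {'Airport A': 180, 'Airport B': 80, 'Airport C': 40}
print(total_flights(rows))       # 300
```

The dictionary keys play the role of the dimension (they set the level of detail), while the summed numbers play the role of the measure.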
So that was it. You now understand what measures and dimensions are, so it’s time to move on to what those green and blue pills represent, because you’ll soon find out that green and blue are not synonymous with measures and dimensions.
Discrete (Blue) & Continuous (Green) Fields
Without going into the specifics, I hope the following example using different measures helps put the above statement into perspective (‘green and blue are not synonymous with measures and dimensions’).
A real-life example
You already know I have 2 brothers and in this example we’ll call them Charles & John. While John is 195cm tall, Charles is 185cm (about the same height as myself).
Which is which? And keep in mind we’re looking at different types of measures only.
- The number of brothers I have is a discrete measure (1, 2, 3, …): I can’t have 1.25 brothers.
- The height of my brothers is a continuous measure, as a person’s height can fall anywhere within a range, for example between 0 and 195 cm.
You could say that while continuous data fields form an unbroken chain (read: axis), discrete data fields represent distinct, separate values.
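One rough way to picture the difference in code: a discrete field gives you a list of separate labels, while a continuous field gives you a range to draw an axis along. This is only an analogy sketched in Python, not a description of Tableau’s internals. The function names and the brother counts are hypothetical; the heights of 185 and 195 cm come from the example above.

```python
def discrete_headers(values):
    """Discrete: each distinct value becomes its own header (label)."""
    return sorted(set(values))


def continuous_axis(values):
    """Continuous: values sit somewhere along an unbroken range (axis)."""
    return (min(values), max(values))


brothers_per_family = [2, 1, 2, 3]  # hypothetical counts, whole numbers only
heights_cm = [185.0, 195.0]         # heights from the example above

print(discrete_headers(brothers_per_family))  # [1, 2, 3]
print(continuous_axis(heights_cm))            # (185.0, 195.0)
```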
Understanding the difference between discrete (blue) and continuous (green) pills is critical: the colour of the pills and where you put them in your view (filters, columns & rows, the marks card) will determine how your data is visualized.
Now that we’re done talking green vs blue and dimensions vs measures, the below really brought it home for me. Thanks Timothy Manning & Andy Kriebel for putting this out there, I’m totally getting T-shirts made once I’ve figured out a clever design 😊
- Blue things group your data
- Green things count your data
- Dimensions split up the view
- Measures fill up the view
Now that you understand the overall logic behind those green and blue pills, there is one more concept you need to get your head around. And I would say this is probably the most important one of all, because once you get it you’ll be able to better digest everything else that’s coming your way.
Aggregation
What is it?
In simple terms, aggregation means combining multiple rows of data into a single value, for example by adding them up.
Why would you use it?
When you require less granularity.
Instead of seeing the individual records, you want to go up a few levels to have a better view of what’s going on. The image below will help explain this in a more visual way.
What does aggregation look like in Tableau?
Remember when we connected to our ‘Flights’ dataset? Did you take a look at how the individual records were structured? Because that’s really where it all starts.
For this example let’s say that instead of looking at the individual records showing the number of flights (and their status) per airport on a given day …
… you’re more interested in seeing the total number of flights per airport and how many of those were on time, as that will make it easier to identify trends.
By adding the dimension ‘Airport Name’ to Rows, the measure to Columns and the dimension ‘Ontime Category’ to Colour on the Marks card, Tableau will automatically total (SUM) (read: aggregate) the number of flights per ‘Airport Name’ & ‘On Time Category’.
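To make the roll-up concrete, here is a minimal Python sketch of what a SUM per airport and on-time category amounts to. The records are invented for the example (they are not taken from ‘Flights’), and the function name is hypothetical; the point is only that six detailed rows become four aggregated ones.

```python
from collections import defaultdict

# Hypothetical records: (airport name, on-time category, number of flights).
records = [
    ("Airport A", "On Time", 30),
    ("Airport A", "Delayed", 5),
    ("Airport A", "On Time", 25),
    ("Airport B", "On Time", 10),
    ("Airport B", "Delayed", 8),
    ("Airport B", "Delayed", 2),
]


def sum_flights(records):
    """SUM the measure for every (airport, on-time category) combination."""
    totals = defaultdict(int)
    for airport, category, flights in records:
        totals[(airport, category)] += flights
    return dict(totals)


# Six input rows collapse into four aggregated rows.
for key, flights in sum_flights(records).items():
    print(key, flights)
```

The two dimensions together decide how many rows are left after aggregating: one per unique combination of airport and category.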
For more detail on the different types of aggregation I recommend you review the list of predefined Aggregations.
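SUM is only one of the options. As a rough illustration, the Python sketch below applies a handful of common aggregations to the same made-up list of values. The labels mirror Tableau function names (SUM, AVG, MEDIAN, MIN, MAX, COUNT, COUNTD), but the code is plain Python and the numbers are hypothetical.

```python
import statistics

# Hypothetical number of flights on six individual records.
flights = [30, 5, 25, 10, 5, 5]

# Each entry reduces the same six rows to one value in a different way.
aggregations = {
    "SUM": sum(flights),
    "AVG": statistics.mean(flights),
    "MEDIAN": statistics.median(flights),
    "MIN": min(flights),
    "MAX": max(flights),
    "COUNT": len(flights),        # number of rows
    "COUNTD": len(set(flights)),  # number of distinct values
}

for name, value in aggregations.items():
    print(name, value)
```

Same rows, different questions: COUNT tells you how many records there are, COUNTD how many different values they hold, and SUM what they add up to.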
So why are Aggregations difficult to relate to sometimes?
For the longest time I heard everyone talk about aggregation but couldn’t get my head around the overall concept, let alone explain it to others. After having done my homework and reading this awesome blog by Mina Ozgen I started to see the light.
What does it all come down to?
Before doing anything else (and I can’t stress this enough), always make sure…
- You have a proper understanding of how your data(source) is structured before you dive in.
- You are clear on what questions you are looking to answer. Asking different questions can lead to a different order of operations (do I first aggregate and then apply the calculation or the other way around?) and perhaps even structuring your data differently.
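The order of operations is easier to see with numbers. The sketch below uses two hypothetical airports (the figures are invented, and the function names are my own) to show that aggregating first and calculating afterwards can give a very different result from calculating per row and aggregating afterwards.

```python
# Hypothetical rows: (airport name, number of flights, delayed flights).
rows = [
    ("Airport A", 100, 10),
    ("Airport B", 10, 5),
]


def aggregate_then_calculate(rows):
    """SUM both measures first, then divide."""
    total_flights = sum(flights for _, flights, _ in rows)
    total_delayed = sum(delayed for _, _, delayed in rows)
    return total_delayed / total_flights


def calculate_then_aggregate(rows):
    """Divide per row first, then average the row-level shares."""
    shares = [delayed / flights for _, flights, delayed in rows]
    return sum(shares) / len(shares)


print(round(aggregate_then_calculate(rows), 3))  # 0.136
print(round(calculate_then_aggregate(rows), 3))  # 0.3
```

The first number answers ‘what share of all flights was delayed?’, the second ‘what is the average delay share per airport?’. Neither is wrong in itself, but they answer different questions, which is why being clear on the question comes first.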
The question Mina was looking to answer was the school-wide male-to-female ratio, knowing that each record/row represented a specific class. I took the liberty of making a mock-up in Excel of her example. Do you know which one is correct and can you explain why? I recommend you read her post and find out 😊
I hope my post will help you get your head around what for me used to be words on a page. Below you’ll find a list of blogs, posts and pages I went back to when trying to figure it all out. I’ve also included a list of ‘Exam Prep’ resources that will be helpful if you’re studying to take the Desktop Specialist Exam.
If you’re taking the exam, I want to hear from you. And if this post helped you get ready, I would love to know as well, so don’t be shy and drop me a note on Twitter.
Thank you for reading and remember to always stay curious!