FacetGrid


Welcome to another lecture on Seaborn! Our journey began with assigning style and color to our plots as required. Then we moved on to visualizing the distribution of a dataset and linear relationships, and further dived into plots for categorical data. Every now and then, we’ve also touched on customization using the underlying Matplotlib code. That brings us to the end of the plot types offered by Seaborn, and leaves us with widening the scope of usage of all the plots we have learnt so far.

Our discussion in the upcoming lectures will focus on the core of Seaborn, the machinery on which Seaborn builds the figures we have been detailing so far. This of course isn’t a brand new topic, because I have used these objects now and then in previous lectures, but from here on we’re going to deal with each of them specifically.

To introduce our new topic, i.e. Grids, let us first list the options available. There are two main aspects to our discussion on Grids:

  • FacetGrid

  • PairGrid

Additionally, PairGrid has a companion function that makes the most common PairGrid plots quicker to draw, i.e.

  • Pairplot

Our discourse shall detail each of these topics at length for better understanding. As we have already covered the statistical interpretation of each type of plot, our emphasis shall mostly be on scaling known plots onto these grids and on the variety of parameters they accept. So let us commence our journey with FacetGrid in this lecture.

FacetGrid

The term Facet here refers to a dimension, or say, an aspect or a feature of a multi-dimensional dataset. Faceted analysis is extremely useful when working with a multivariate dataset that has a varied blend of datatypes, especially in the Data Science & Machine Learning domain, where you would generally be dealing with huge datasets. If you’re a working professional, you know what I am talking about. And if you’re a fresher or a student, just to give you an idea: in this era of Big Data, a CSV file (generally the most common form) or an RDBMS may hold anywhere from gigabytes to terabytes of data. If you are dealing with Image/Video/Audio datasets, you may easily expect those to run into hundreds of gigabytes.

On the other hand, the term Grid refers to any framework of spaced bars that run parallel to or cross each other to form a series of squares or rectangles. Statistically, such Grids are also used to represent and understand an entire population, or just a sample drawn from it. In general, they are a pretty powerful tool for presentation: they let us describe our dataset and study the interrelationship, or correlation, between the facets of any environment.

Subplot grid for plotting conditional relationships.

The FacetGrid is an object that links a Pandas DataFrame to a matplotlib figure with a particular structure.

In particular, FacetGrid is used to draw plots with multiple Axes where each Axes shows the same relationship conditioned on different levels of some variable. It’s possible to condition on up to three variables by assigning variables to the rows and columns of the grid and using different colors for the plot elements.

The general approach to plotting here is called “small multiples”, where the same kind of plot is repeated multiple times, and the specific use of small multiples to display the same relationship conditioned on one or more other variables is often called a “trellis plot”.

The basic workflow is to initialize the FacetGrid object with the dataset and the variables that are used to structure the grid. Then one or more plotting functions can be applied to each subset by calling FacetGrid.map() or FacetGrid.map_dataframe(). Finally, the plot can be tweaked with other methods to do things like change the axis labels, use different ticks, or add a legend. See the detailed code examples below for more information.

To satisfy our curiosity, let us plot a simple FacetGrid before continuing with our discussion. To do that, we shall once again quickly import our package dependencies and set the aesthetics for future use with built-in datasets.

# Importing intrinsic libraries:
import numpy as np
import pandas as pd
np.random.seed(101)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", palette="rocket")
import warnings
warnings.filterwarnings("ignore")

# Let us also get tableau colors we defined earlier:
tableau_20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
         (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
         (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
         (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
         (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]

# Scaling above RGB values to [0, 1] range, which is Matplotlib acceptable format:
for i in range(len(tableau_20)):
    r, g, b = tableau_20[i]
    tableau_20[i] = (r / 255., g / 255., b / 255.)
# Loading built-in Tips dataset:
tips = sns.load_dataset("tips")
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
# Initialize a 2x2 grid of facets using the tips dataset:
sns.set(style="ticks", color_codes=True)
sns.FacetGrid(tips, row='time', col='smoker')
[Figure: empty 2x2 grid of facets, with time on the rows and smoker on the columns]
# Draw a univariate plot on each facet:
x = sns.FacetGrid(tips, col='time',row='smoker')
x = x.map(plt.hist,"total_bill")
[Figure: histograms of total_bill, with smoker on the rows and time on the columns]
# Pass extra keyword arguments (here bins and color) through to plt.hist:
bins = np.arange(0, 65, 5)
x = sns.FacetGrid(tips, col="time", row="smoker")
x = x.map(plt.hist, "total_bill", bins=bins, color="g")
[Figure: green histograms of total_bill with 5-unit bins, smoker on the rows and time on the columns]
# Plot a bivariate function on each facet:

x = sns.FacetGrid(tips, col="time",  row="smoker")
x = x.map(plt.scatter, "total_bill", "tip", edgecolor="w")
[Figure: scatter plots of tip against total_bill, with smoker on the rows and time on the columns]
# Assign one of the variables to the color of the plot elements:

x = sns.FacetGrid(tips, col="time",  hue="smoker")
x = x.map(plt.scatter,"total_bill","tip",edgecolor = "w")
x = x.add_legend()
[Figure: scatter plots of tip against total_bill, one facet per time, colored by smoker]
# Plotting a basic FacetGrid with Scatterplot representation:
# Note: the facet height is set with `height` (this parameter was named `size` before Seaborn 0.9)
ax = sns.FacetGrid(tips, col="sex", hue="smoker", height=5)
ax.map(plt.scatter, "total_bill", "tip", alpha=.6)
ax.add_legend()
[Figure: scatter plots of tip against total_bill, one facet per sex, colored by smoker]

This is a combined scatter representation of the Tips dataset that we have seen earlier as well: the tip is plotted against the total bill amount, with one facet per gender and one color per smoking habit. With this we can see how FacetGrid helps us visualize the distribution of a variable, or the relationship between multiple variables, separately within subsets of our dataset. It is important to note that a Seaborn FacetGrid can condition on up to three variables only, using the row, col and hue dimensions of the grid for categorical and discrete variables within our dataset.

Let us now have a look at the parameters supported by Seaborn for a FacetGrid (note that height was named size before Seaborn 0.9): seaborn.FacetGrid(data, row=None, col=None, hue=None, col_wrap=None, sharex=True, sharey=True, height=3, aspect=1, palette=None, row_order=None, col_order=None, hue_order=None, hue_kws=None, dropna=True, legend_out=True, despine=True, margin_titles=False, xlim=None, ylim=None, subplot_kws=None, gridspec_kws=None)

There seem to be a few new parameters here, so let us understand their scope one by one before we start experimenting with them on our plots:

  • We are well acquainted with the mandatory data parameter and the optional row, col and hue parameters.

  • Next is col_wrap, which wraps the variable assigned to col at the given number of columns, so that the column facets can span multiple rows. It cannot be combined with row.

  • sharex, if declared False, gives each sub-plot its own dedicated X-axis instead of a shared one. The same concept holds for sharey and the Y-axis.

  • height (named size before Seaborn 0.9) sets the height of each facet in inches; multiplying it by aspect gives the width of each facet.

  • We may also declare the hue_kws parameter, a dictionary that maps a plotting keyword (for example marker) to a list of values, one per hue level, which lets us control other aesthetics of our plot.

  • dropna drops missing (NULL) values from the selected features; and legend_out places the Legend outside the plot area when True, or inside it when False.

  • margin_titles, if True, draws the titles for the row variable on the right margin of the last column instead of above every facet; and xlim & ylim additionally set Matplotlib-style limits on each of our axes on the grid.

That pretty much seems to cover intrinsic parameters so let us now try to use them one-by-one with slight modifications:

Let us begin by pulling the Legend inside our FacetGrid and creating a Header for our grid:

# `height` was named `size` before Seaborn 0.9
ax = sns.FacetGrid(tips, col="sex", hue="smoker", height=5, legend_out=False)
ax.map(plt.scatter, "total_bill", "tip", alpha=.6)
ax.add_legend()

plt.suptitle('Tip Collection based on Gender and Smoking', fontsize=11)
[Figure: scatter plots of tip against total_bill, one facet per sex, colored by smoker, with the legend inside the plot and the super title "Tip Collection based on Gender and Smoking"]

So declaring legend_out as False and creating a super title using Matplotlib works well on our Grid. Being able to customize the title size is a handy add-on as well. Right now we are using the default palette for marker colors, which can be customized by setting a different one. Let us try other parameters as well:

Actually, before we jump further into other parameters, let me quickly take you behind the curtain of this plot. As visible, we assigned our FacetGrid to the variable ax, then plotted a Scatterplot on top of it, before decorating it further with a Legend and a Super Title. When we initialize ax, the grid gets created using a backend Matplotlib figure and axes, but nothing is plotted on it yet. The drawing happens when we call FacetGrid.map() with plt.scatter: the function is then invoked once per facet, on that facet’s subset of our sample data. We intended to draw the relationship between two variables, and thus passed two column names, i.e. Total Bill and the associated Tip, to be plotted within the facets of our grid.

# Change the size and aspect ratio of each facet:

# `height` was named `size` before Seaborn 0.9; width of each facet = height * aspect
x = sns.FacetGrid(tips, col="day", height=5, aspect=.5)
x = x.map(plt.hist, "total_bill", bins=bins)
[Figure: tall, narrow histograms of total_bill, one facet per day]
# Specify the order for plot elements:

g = sns.FacetGrid(tips, col="smoker", col_order=["Yes", "No"])
g = g.map(plt.hist, "total_bill", bins=bins, color="m")
[Figure: magenta histograms of total_bill, one facet per smoker level, ordered Yes then No]
# Use a different color palette:

kws = dict(s=50, linewidth=.5, edgecolor="w")
g = sns.FacetGrid(tips, col="sex", hue="time", palette="Set1",
                  hue_order=["Dinner", "Lunch"])

g = g.map(plt.scatter, "total_bill", "tip", **kws)
g.add_legend()
[Figure: scatter plots of tip against total_bill, one facet per sex, colored by time using the Set1 palette]
# Use a dictionary mapping hue levels to colors:

pal = dict(Lunch="seagreen", Dinner="gray")
g = sns.FacetGrid(tips, col="sex", hue="time", palette=pal,
                  hue_order=["Dinner", "Lunch"])

g = g.map(plt.scatter, "total_bill", "tip", **kws)
g.add_legend()
[Figure: scatter plots of tip against total_bill, one facet per sex, with Lunch in seagreen and Dinner in gray]
# FacetGrid with boxplot
x = sns.FacetGrid(tips, col='day')
x = x.map(sns.boxplot, "total_bill", "time")
[Figure: horizontal box plots of total_bill by time, one facet per day]

Also important to note is the use of the matplotlib.pyplot.gca() function, if required, to get hold of the current axes on our Grid. It returns the current Axes instance on the current figure and, if none exists, creates one.

# Let us create a dummy DataFrame:
football = pd.DataFrame({
        "Wins": [76, 64, 38, 78, 63, 45, 32, 46, 13, 40, 59, 80],
        "Loss": [55, 67, 70, 56, 59, 69, 72, 24, 45, 21, 58, 22],
        "Team": ["Arsenal"] * 4 + ["Liverpool"] * 4 + ["Chelsea"] * 4,
        "Year": [2015, 2016, 2017, 2018] * 3})

Before I begin the illustration using this DataFrame, on a lighter note, I would add a disclaimer that this is a dummy dataset and bears no resemblance whatsoever to the actual records of the respective Soccer clubs. So if you’re one among those die-hard fans of any of these clubs, kindly excuse me if the numbers don’t tally, as they are all fabricated.

Here, football is a kind of time-series Pandas DataFrame with 4 features, where the Wins and Loss variables represent the yearly scorecard of three soccer Teams for four Years, from 2015 to 2018. Let us check what this DataFrame looks like:

football
    Wins  Loss       Team  Year
0     76    55    Arsenal  2015
1     64    67    Arsenal  2016
2     38    70    Arsenal  2017
3     78    56    Arsenal  2018
4     63    59  Liverpool  2015
5     45    69  Liverpool  2016
6     32    72  Liverpool  2017
7     46    24  Liverpool  2018
8     13    45    Chelsea  2015
9     40    21    Chelsea  2016
10    59    58    Chelsea  2017
11    80    22    Chelsea  2018

This looks pretty good for our purpose, so now let us initialize our FacetGrid on top of it and try to obtain a per-team, time-indexed view with further plotting. In a production environment, to keep our solution scalable, this is generally done by defining a function for the data manipulation, so we shall try that in this example:

# Defining a custom plotting function; its rules are discussed a little later.
# It uses a new type of plot (heatmap) that I shall discuss in detail later on.
def football_plot(data, color=None, **kwargs):
    # `data` is the subset of the DataFrame that belongs to the current facet
    sns.heatmap(data[["Wins", "Loss"]])

# 'margin_titles' won't necessarily guarantee desired results so better to be cautious:
ax = sns.FacetGrid(football, col="Team", height=5, margin_titles=True)
ax.map_dataframe(football_plot)
[Figure: heatmaps of the Wins and Loss columns, one facet per Team]
# Bivariate KDE of Wins against Year, one facet per Team.
# kdeplot has no `hist` option, so it is dropped; contour line width is set with `linewidths`.
ax = sns.FacetGrid(football, col="Team", height=5)
ax.map(sns.kdeplot, "Wins", "Year", linewidths=2)
[Figure: bivariate density contours of Wins against Year, one facet per Team]

As visible, a Heatmap draws rectangular boxes for the data points as a color-encoded matrix. This is a topic we shall be discussing in detail in another lecture, but for now I just wanted you to have a preview of it, and hence used it on top of our FacetGrid. Another good thing to know with FacetGrid is the gridspec_kws parameter, which passes Matplotlib gridspec options through to the figure, for example to draw attention to a particular facet by increasing its size. To better understand it, let us try to use this parameter now:

# Loading built-in Titanic Dataset:
titanic = sns.load_dataset("titanic")

# Assigning reformed `deck` column:
titanic = titanic.assign(deck=titanic.deck.astype(object)).sort_values("deck")

# Creating Grid and Plot:
# `height` was named `size` before Seaborn 0.9
ax = sns.FacetGrid(titanic, col="class", sharex=False, height=7,
                   gridspec_kws={"width_ratios": [3.5, 2, 2]})
ax.map(sns.boxplot, "deck", "age")

ax.set_titles(fontweight='bold', size=17)
[Figure: box plots of age by deck, one facet per class, with the first facet wider than the other two]

Breaking it down: at first we load the built-in Titanic dataset, and then re-assign its deck column using the Pandas .assign() function. Here the pre-existing deck column is converted to the object datatype, and the DataFrame is sorted by it. Then we create our FacetGrid, mentioning the DataFrame and the column on which the Grid gets segregated (class), with the Y-axis shared across facets but a separate X-axis for each (sharex=False). Next in action is our grid keyword specification, where we decide the width ratios of the facets on the grid. Finally, we have our Box Plot representing the values of the Age feature across the respective decks.

Now let us try to use different axes with same size for multivariate plotting on Tips dataset:

# Loading built-in Tips dataset:
tips = sns.load_dataset("tips")

# Mapping a Scatterplot to our FacetGrid:
# `height` was named `size` before Seaborn 0.9
ax = sns.FacetGrid(tips, col="smoker", row="sex", height=3.5)
ax = (ax.map(plt.scatter, "total_bill", "tip", color=tableau_20[6])
        .set_axis_labels("Total Bill Generated (USD)", "Tip Amount"))

# Increasing size for subplot Titles & making it appear Bolder:
ax.set_titles(fontweight='bold', size=11)
[Figure: scatter plots of Tip Amount against Total Bill Generated (USD), with sex on the rows and smoker on the columns, bold facet titles]

A Scatterplot dealing with data that has multiple variables is no new science for us, so instead let me highlight what .map() does for us. This function applies our plotting function to every facet’s axes, so that our Scatterplot spreads the datapoints across the grid according to the segregators. Here we have sex and smoker as our segregators (when I use the general term “segregator”, it just refers to the columns that determine the layout). This comes in really handy, as we can also pass Matplotlib parameters through .map() for further customization of our plot. At the end, adding .set_axis_labels() makes it easy to label our axes, but please note that this method belongs to the grid objects and so works only when you’re dealing with grids; hence you didn’t observe me using it while detailing various other plots.

  • Let us now talk about the football_plot function we defined earlier with the football DataFrame. The only reason I didn’t speak of it then was that I wanted you to go through a few more parameter implementations before getting into this. There are 3 important rules for defining such functions that are supported by FacetGrid.map:

    - They must take array-like inputs as positional arguments, with the first argument corresponding to the X-axis and the second argument corresponding to the Y-axis.

    - They must also accept two keyword arguments: color and label. If you want to use a hue variable, then these should get passed on to the underlying plotting function. (As a side note: you may just catch **kwargs and not do anything with them, if it’s not relevant to the specific plot you’re making.)

    - Lastly, when called, they must draw a plot on the “currently active” matplotlib Axes.

  • Important to note is that there may be cases where your function draws a plot that looks correct without taking x, y positional inputs. Then it is better to just draw the plot and call ax.set_axis_labels("Column_1", "Column_2") after you use .map(), which should label your axes properly. Alternatively, you may also want to do something like ax.set(xticklabels=) to get more meaningful ticks.

  • I am also quite stoked to mention another important function (though not that commonly used), that is FacetGrid.map_dataframe(). The rules here are similar to FacetGrid.map, but the function you pass must accept a DataFrame input in a parameter called data, and instead of taking array-like positional inputs it takes strings that correspond to variables in that dataframe. Then, on each iteration through the facets, the function will be called with the input dataframe, masked to just the values for that combination of row, col, and hue levels.

Another important thing to note with both the above-mentioned functions is that the return value is ignored, so you don’t really have to worry about it. Just for illustration purposes, let us consider drafting a function that draws a horizontal line in each facet at y=2 and ignores all the input data:

# That is all you require in your function:
def plot_func(x, y, color=None, label=None):
    # Ignore the input data and draw on the currently active Axes
    plt.axhline(y=2)

I know this function concept might look a little hazy at the moment, but once you have covered more on dates and Matplotlib syntax in particular, the picture shall get much clearer for you.

Let us look at one more example of FacetGrid() and this time let us again create a synthetic DataFrame for this demonstration:

# Creating synthetic Data (Don't focus on how it's getting created):
units = np.linspace(0, 50)
A = [1., 18., 40., 100.]

df = []
for i in A:
    V1 = np.sin(i * units)
    V2 = np.cos(i * units)
    df.append(pd.DataFrame({"units": units, "V_1": V1, "V_2": V2, "A": i}))

sample = pd.concat(df, axis=0)
# Previewing DataFrame:
sample.head(10)
sample.describe()
            units         V_1         V_2           A
count  200.000000  200.000000  200.000000  200.000000
mean    25.000000   -0.005356   -0.001868   39.750000
std     14.762329    0.682091    0.734673   37.526373
min      0.000000   -0.997677   -0.999703    1.000000
25%     12.244898   -0.670396   -0.746011   13.750000
50%     25.000000    0.007234    0.063577   29.000000
75%     37.755102    0.651253    0.744281   55.000000
max     50.000000    0.999926    1.000000  100.000000
# Melting our sample DataFrame: 
sample_melt = sample.melt(id_vars=['A', 'units'], value_vars=['V_1', 'V_2'])

# Creating plot:
ax = sns.FacetGrid(sample_melt, col='A', hue='A', palette="icefire", row='variable', sharey='row', margin_titles=True)
ax.map(plt.plot, 'units', 'value')
ax.add_legend()
[Figure: line plots of value against units, with variable (V_1, V_2) on the rows and A on the columns, colored by A]

This process shall come in handy if you ever wish to vertically stack rows of subplots on top of one another. You do not really have to focus on the process of creating the dataset, as generally you will have your dataset provided with a problem statement. For our plot, you may just consider these visual variations as sinusoidal waves. I shall attach a link in our notebook, if you wish to dig deeper into what these are and how they are actually computed.
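The step that makes the row stacking possible is the melt, which turns the two value columns into one long column plus a label column that FacetGrid can use for row. Here is a minimal sketch on a tiny hand-written table; the `wide` DataFrame and its rounded values are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one column per measured variable
wide = pd.DataFrame({
    "A": [1.0, 1.0, 18.0, 18.0],
    "units": [0.0, 1.0, 0.0, 1.0],
    "V_1": [0.0, 0.84, 0.0, -0.75],
    "V_2": [1.0, 0.54, 1.0, 0.66],
})

# Long format: V_1 and V_2 are stacked into `value`, labeled by `variable`
long = wide.melt(id_vars=["A", "units"], value_vars=["V_1", "V_2"])

print(long.columns.tolist())               # ['A', 'units', 'variable', 'value']
print(long.shape)                          # (8, 4)
print(long["variable"].unique().tolist())  # ['V_1', 'V_2']
```

The new variable column is what we hand to row, so each original column gets its own row of facets.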

Our next lecture will be pretty much a small follow-up to this one, where we will try to bring more Categorical data to our FacetGrid(). Meanwhile, I would again suggest that you play around with analyzing and plotting datasets as much as you can, because visualization is a very important facet of Data Science & Research. And I shall see you in our next lecture with Heat Map.