Seaborn: Strip Plot#

1. Strip Plot#

Welcome back to another lecture on visualizing categorical data with Seaborn! In the last lecture, we discussed in detail the importance of categorical data and how it is generally represented, and much of that discussion revolved around beeswarm visualization using Seaborn’s swarmplot(). In this lecture, we continue from where we left off with another important type of plot, the Strip Plot, which is quite similar to what we have already seen.

  • A Strip Plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

So, let us begin by importing the requisites that we have gathered over time. Then we shall slowly start exploring the parameters and the scenarios where this plot could come in handy for us:

# Importing the libraries we need:
import numpy as np
import pandas as pd
np.random.seed(0)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", palette="rocket")
import warnings
warnings.filterwarnings("ignore")

# Let us also get tableau colors we defined earlier:
tableau_20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
         (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
         (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
         (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
         (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]

# Scaling above RGB values to [0, 1] range, which is Matplotlib acceptable format:
for i in range(len(tableau_20)):
    r, g, b = tableau_20[i]
    tableau_20[i] = (r / 255., g / 255., b / 255.)

We have already observed Strip Plot representation earlier as well so let us plot it at a basic level once again to begin our discussion with:

# load tips dataset

tips = sns.load_dataset("tips")
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
# Draw a single horizontal strip plot:

sns.stripplot(x=tips["total_bill"], size=4, color="green")
<AxesSubplot:xlabel='total_bill'>
../../../_images/9be974b61d2c6aa2c5d4a6911e45b06e80484dab748c7bfc161e77b361c878c5.png

Group the strips by a categorical variable#

sns.stripplot(x="day", y="total_bill", data=tips)
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/4c64e81a4b194ec6dbd66cabec6f6dd64f21cc4f4a668071fee568d7a5ef78c0.png
sns.stripplot(x='day', y='total_bill', data=tips, size=7, order=['Fri','Sat','Sun','Thur'])
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/a3d7ccbcd2396ecd50ab9b455822380690bd6fa2c19c1d360921bacc1965e2c2.png
# Plotting basic Strip Plot by adding little noise with 'jitter' parameter:

sns.stripplot(x="day", y="total_bill", data=tips, jitter=True)
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/fca7f2cf46301e7619a61a0cf033c6a669e26a29cb1bd56e2ae6beebf72fd973.png
# use of jitter

sns.stripplot(x='day', y='total_bill', data=tips,size=7, order=['Fri','Sat','Sun','Thur'], jitter=True)
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/86adad728bb696809d0298888c7db9b042c7577486fa3fd161d8628f5598d79d.png
# Draw horizontal strips:
sns.stripplot(x="total_bill", y="day", data=tips, jitter=True)
<AxesSubplot:xlabel='total_bill', ylabel='day'>
../../../_images/787a77eb44cb254c1aed750fe37a86a29484d87636bd2e6f34c140940ca144f4.png
# Draw outlines around the points:
sns.stripplot(x="total_bill", y="day", data=tips,jitter=True, linewidth=1)
<AxesSubplot:xlabel='total_bill', ylabel='day'>
../../../_images/a02acb0975f79251f02b1420b4aac9b44704faea689af908a883590437e0625f.png
# Control strip order by passing an explicit order:

sns.stripplot(x="time", y="tip", data=tips, order=["Dinner","Lunch"], size=8)
<AxesSubplot:xlabel='time', ylabel='tip'>
../../../_images/d99cc3a25bc5dafa5250dd5cae381985c3fc935844c909d058fad651f9438b52.png

Looks quite familiar, right? Indeed it is! This is again a scatterplot in which one of the variables is categorical. From the Tips dataset, we have chosen the day of the week as our categorical variable and plotted it against the total bill generated in the restaurant. Just like a Swarm plot, a Strip Plot can be drawn on its own, but it is generally coupled with other plots such as a Violin plot, as discussed earlier, and we shall go through that kind of coupling later in this lecture.

It is worth noting that you will sometimes find professionals referring to a Strip plot as a Dot Plot. That shouldn’t confuse you: in this context both names refer to the type of plot that we’re looking at right now.

For now, let us quickly go through the parameters to find if there is something that requires extra attention, or better to say, something that we haven’t covered as of now:

seaborn.stripplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, jitter=False, dodge=False, orient=None, color=None, palette=None, size=5, edgecolor='gray', linewidth=0, ax=None)

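You might guess that x, y and data are mandatory, but strictly speaking none of them is: every one defaults to None, and we have already drawn a strip plot from x alone. In practice, though, you will almost always pass all three together. Also note that this signature comes from an older release; in recent seaborn versions jitter defaults to True. A quick note here: you will often find me, as well as other people in this domain, abbreviating parameters as params, so I thought of letting you know so that it doesn’t ever confuse you. Moving on, the rest of the optional parameters (apart from jitter) are the same as for our Swarm plot and in the same order, so there is nothing new for us to explore here.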
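To make the point about x, y and data concrete, here is a minimal sketch of the two most common calling styles. The toy DataFrame is made up for illustration (it is not the Tips dataset), and the Agg backend is only there so that the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data for illustration only: 20 bills for each of three days.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat"], 20),
    "total_bill": rng.normal(20, 5, size=60).round(2),
})

# Style 1: pass a single vector as x; neither y nor data is needed.
fig1, ax1 = plt.subplots()
sns.stripplot(x=toy["total_bill"], ax=ax1)

# Style 2: pass column names for x and y, plus the DataFrame as data.
fig2, ax2 = plt.subplots()
sns.stripplot(x="day", y="total_bill", data=toy, ax=ax2)
```

In both styles the axis labels are taken from the column names, which is why the second style is usually the more convenient one.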

Let us then add a few more optional params to our previous code to check its flexibility:

# Nest the strips within a second categorical variable:

sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, hue='sex', palette='Set1')
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/220e67d365bea6678f773d353f47def99552604ab6578f8288c1dc65a8ab8e84.png
# Separate the hue levels along the categorical axis with dodge
# (this parameter was called split in older seaborn releases):

sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, hue='sex', palette='Set1', dodge=True)
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/4dda3a9c2e82a1cdbecd314dda7bed036d115ec8b54fc13f9a2da64f4d676247.png
# Nest the strips within a second categorical variable: with legend

sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, hue="sex")
plt.legend()
<matplotlib.legend.Legend at 0x1a11b1a6e80>
../../../_images/3c3197340172b31b06e7fafe978643f18375b238aa8ec62e17da6829060a2953.png
# Draw each level of the hue variable at different locations on the major categorical axis:

sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, hue="sex", dodge=True)
plt.legend()
<matplotlib.legend.Legend at 0x1a11b2e7a00>
../../../_images/02e71326998cb57a1aead147552638277953cac894c793681df95beebd15a44e.png
sns.stripplot(x="day", y="total_bill", hue="smoker", data=tips, jitter=True, palette="icefire", size=7, dodge=True, linewidth=0.5)
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/003ba59a371c770d486813c6116f96bba6b46b3a4dcabeab5fb94ad98ec704df.png

Swarm Plot#

The Swarm Plot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).

sns.swarmplot(x="day", y="total_bill", data=tips)
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/e0a525075a6c38e15aef3162e899f1914acdb0a7995a6ccfba162d929fabb3b7.png
sns.swarmplot(x="day", y="total_bill", hue='sex', data=tips, palette="Set1", dodge=True)
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/5f6aa735b90823404b7d4d3b9b6faf04c1439cc22a1808e4dddc41a661e11e3a.png

Draw swarm observations on top of a violin plot#

sns.violinplot(x="tip", y="day", data=tips,palette='rainbow')
sns.swarmplot(x="tip", y="day", data=tips,color='black',size=3)
<AxesSubplot:xlabel='tip', ylabel='day'>
../../../_images/4e0cf00c68f26829a662613f83029bfdf673e6d7e5a5f0182c6163ea5072e44d.png

There isn’t much to explain here that we are not already aware of in terms of inference, though remember that if you need to plot these scattered points horizontally, all you need to do is interchange the x and y variables. Let us slightly alter the aesthetics of our plot now:

# Loading Iris dataset for this experiment:
iris = sns.load_dataset("iris")

# Melt dataset to 'long-form' or 'tidy' representation:
iris = pd.melt(iris, "species", var_name="measurement")
sns.stripplot(x="measurement", y="value", hue="species", order=['sepal_width','petal_width','sepal_length','petal_length'], 
              data=iris, palette="hsv_r", size=15, marker="D", alpha=.30)
<AxesSubplot:xlabel='measurement', ylabel='value'>
../../../_images/a85275b5cb803915761b2b7a158b9449d94021a7bf120595edb0e66c514cf944.png

Personally, I am not a great fan of this kind of strip, but I have often seen it used in FinTech, so if you work for a bank or any closely related domain, you can play around with it by changing the size, or the transparency with alpha, and so on. You even have the option to change the marker. Let me quickly redraw this plot with a plus-sign marker:

sns.stripplot(x="measurement", y="value", hue="species", order=['sepal_width','petal_width','sepal_length','petal_length'], 
              data=iris, palette="icefire", size=15, marker="P", alpha=.30)
<AxesSubplot:xlabel='measurement', ylabel='value'>
../../../_images/47f0e761c50336ffc635dac8adde65f29f5a81cb6423b1c367e5abf0ea9cd9b5.png

This graphical data analysis technique for summarizing a univariate data set suits the requirement to plot the response values along one axis pretty well. So now let us try to enhance it even more by coupling it with other plots.

Meanwhile, I would also like you to know that in the real world a Strip Plot often acts as a replacement for a Histogram or Density Plot. Strip Plots are most convenient with long-form data, where one column holds the category and another holds the value (which is why we melted the Iris dataset above). For a wide-form dataset, the histograms and density plots that we discussed in our previous lecture are often going to be a better choice. Also remember that there is no rule of thumb for choosing a type of plot for a dataset because, as mentioned earlier, it largely depends upon the dataset in hand and the associated requirements.
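Since the difference between wide-form and long-form data matters here, a tiny sketch of what pd.melt does may help. The two-row table below is made up for illustration; only the column names mimic the Iris dataset:

```python
import pandas as pd

# Hypothetical wide-form table: one column per measurement.
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "sepal_width": [3.5, 3.3],
})

# Long-form ("tidy") table: one row per (species, measurement) pair,
# with the measured number stored in a single "value" column.
long = pd.melt(wide, id_vars="species", var_name="measurement")
print(long)
```

The long-form result has the columns species, measurement and value, which map directly onto the hue, x and y arguments used in the Iris strip plot above.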

Draw strips of observations on top of a box plot#

Just like we explored pairing of plots with the Swarm plot, let us now try this with the Strip plot on our Tips dataset:

sns.boxplot(x="day", y="tip", data=tips, whis=np.inf, palette="cividis")
sns.stripplot(x="day", y="tip", data=tips, jitter=True, color=tableau_20[2])
<AxesSubplot:xlabel='day', ylabel='tip'>
../../../_images/f8c32facaa01d967bd3552f8d2f11d5aca470bd8b64299c33a418cee22df6b42.png
sns.boxplot(x="tip", y="day", data=tips, whis=np.inf)
sns.stripplot(x="tip", y="day", data=tips,jitter=True, color=".3")
<AxesSubplot:xlabel='tip', ylabel='day'>
../../../_images/f46c713b09110ff68da802476eae6741e7882891764c82bfc8aed61baa3da9d0.png

We shall discuss the Box plot in detail later, but for now: the lines extending from each box are known as whiskers, and the short bar capping each whisker marks how far it reaches. Because we passed whis=np.inf, the whiskers extend all the way to the minimum and maximum values, so here the upper cap marks the top tip amount given on each day. np.inf is simply NumPy’s floating-point representation of positive infinity. As always, jitter spreads the points out so that they overlap less. Let us get a mix with the Violin plot now. As stated earlier, such mixes have a broader visual impact when extracting insights.
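As a side note on the whiskers, here is a rough sketch of the usual interquartile-range rule, to show why whis=np.inf makes them reach the extreme values. The helper whisker_ends and the sample tips are hypothetical, written only for this illustration; this is not how seaborn or Matplotlib implement it internally:

```python
import numpy as np

def whisker_ends(values, whis=1.5):
    """Return (low, high) whisker ends using the usual IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Whiskers reach the furthest observations inside these limits.
    low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_limit) & (values <= high_limit)]
    return inside.min(), inside.max()

tips_sample = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0]  # made-up tips; 10.0 is extreme

default_ends = whisker_ends(tips_sample)        # 10.0 falls outside the whisker
full_ends = whisker_ends(tips_sample, np.inf)   # whiskers span minimum to maximum

print(default_ends, full_ends)
```

With the default factor of 1.5, the extreme tip is left outside the whisker and would be drawn as an outlier; with an infinite factor, nothing can fall outside.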

Draw strips of observations on top of a violin plot#

sns.violinplot(x="day", y="total_bill", data=tips, inner=None)
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, color=tableau_20[7])
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/ead715f086792c0a3df703f2918bb83255fd6394ee9d5e9fb326393e8dd6a684.png
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True)
sns.violinplot(x="day", y="total_bill", data=tips,inner=None, color=".8")
<AxesSubplot:xlabel='day', ylabel='total_bill'>
../../../_images/61b96dcdde9ed8352817cc74901cfdbe1de6d1fb65abbb79b7ebacd1e098c97f.png

The Violin plot has a few different optional params that we shall discuss in its own lecture, but for now I would just like you to focus on the strip plot that gets beautifully enclosed within the Violin plot.

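You must have noticed by now that I am passing jitter every time, although I had not necessarily been doing this earlier, so let me explain why I add this parameter. The reason is that Strip plots in general struggle to display multiple data points with the same value, because such points land exactly on top of each other. jitter compensates by adding a little random noise to each point’s position along the categorical axis (the horizontal axis in a vertical strip plot), while leaving the plotted values themselves untouched.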
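The idea behind jitter can be sketched in a few lines of NumPy. This is only an illustration of the concept: the helper jitter_positions and its half-width of 0.1 are assumptions made for this example, not seaborn's actual code:

```python
import numpy as np

def jitter_positions(category_codes, half_width=0.1, seed=0):
    """Shift each point sideways by a small uniform random offset."""
    rng = np.random.default_rng(seed)
    codes = np.asarray(category_codes, dtype=float)
    return codes + rng.uniform(-half_width, half_width, size=codes.shape)

# Three identical bills in category 0 and two identical bills in category 1.
codes = np.array([0, 0, 0, 1, 1])
bills = np.array([20.0, 20.0, 20.0, 15.5, 15.5])

# Only the position on the categorical axis changes; the bills stay as they are.
x_positions = jitter_positions(codes)
print(np.column_stack([x_positions, bills]))
```

Without the offsets, the three points in category 0 would be drawn on exactly the same spot and look like a single observation.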

OKAY! That pretty much covers everything that the Seaborn Strip plot has to offer, so now it is time for me to share a few tips and tricks from real-world business with you.

For illustration purposes, I shall build a simple Pandas DataFrame with just 3 variables: City, Gender and Age. Then we shall draw a Strip plot with just 2 data points per city, one for each Gender. Let’s do this:

sample = [["Brooklyn", "Female", 80], ["Brooklyn", "Male", 66], ["Manhattan", "Female", 90], 
          ["Manhattan", "Male", 53], ["Queens", "Female", 75], ["Queens", "Male", 63]]

sample = pd.DataFrame(sample, columns=["City", "Gender", "Age"])

sns.stripplot(x="City", y="Age", hue="Gender", data=sample, palette="hsv", size=12, linewidth=1.5)
<AxesSubplot:xlabel='City', ylabel='Age'>
../../../_images/2f36c044f070731b41aba7eb555e15883bcae3aa5a6642d866069ed997ae313e.png

For this sample DataFrame, we get our data points plotted, but the plot doesn’t tell us much, because obviously we do not have that many data points. This is a scenario you may come across, and then it is a better idea to modify the plot a little so that it reads well. For that, let us add a line connecting both data points in each city:

# Customization help from underlying Matplotlib:
from matplotlib import collections

# Copy-paste previous code:
sample = [["Brooklyn", "Female", 80], ["Brooklyn", "Male", 66], ["Manhattan", "Female", 90], 
          ["Manhattan", "Male", 53], ["Queens", "Female", 75], ["Queens", "Male", 63]]

sample = pd.DataFrame(sample, columns=["City", "Gender", "Age"])

ax = sns.stripplot(x="City", y="Age", hue="Gender", data=sample, palette="hsv", size=12, linewidth=1.5)

# Modifications - Creating a Line connecting both Data points:
# Each city sits at integer position x on the categorical axis; build one segment per city.
lines = [[(x, age) for age in ages]
         for x, (_, ages) in enumerate(sample.groupby("City", sort=False)["Age"])]
mlc = collections.LineCollection(lines, colors='red', linewidths=1.5)    
ax.add_collection(mlc)
<matplotlib.collections.LineCollection at 0x1a11b349eb0>
../../../_images/3583a6723888da972d32e3f9d8b620adf10c47ed85c930fdabc8fa9ce0e3889b.png

I believe that looks much better and shows the relationship to an extent. Well, this can be customized further by adding markers, etc., but I shall leave that as homework for you.

Moving on, let us try to plot a Strip plot with Tips dataset once again. There is something I want to show you regarding Legends on the plot here:

sns.stripplot(x="sex", y="total_bill", hue="day", data=tips, jitter=True)
<AxesSubplot:xlabel='sex', ylabel='total_bill'>
../../../_images/3abd4cedce219edb6b811344fdd0826ecf9c90d61fff83d6e7c39b938ecdeff4.png

What if we want to remove the legend from our plot? Let’s try that:

sns.stripplot(x="sex", y="total_bill", hue="day", data=tips, jitter=True, legend=False)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-f2db54e8947c> in <module>
----> 1 sns.stripplot(x="sex", y="total_bill", hue="day", data=tips, jitter=True, legend=False)

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\_decorators.py in inner_f(*args, **kwargs)
     44             )
     45         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46         return f(**kwargs)
     47     return inner_f
     48 

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py in stripplot(x, y, hue, data, order, hue_order, jitter, dodge, orient, color, palette, size, edgecolor, linewidth, ax, **kwargs)
   2817                        linewidth=linewidth))
   2818 
-> 2819     plotter.plot(ax, kwargs)
   2820     return ax
   2821 

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py in plot(self, ax, kws)
   1158     def plot(self, ax, kws):
   1159         """Make the plot."""
-> 1160         self.draw_stripplot(ax, kws)
   1161         self.add_legend_data(ax)
   1162         self.annotate_axes(ax)

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py in draw_stripplot(self, ax, kws)
   1134                 kws.update(c=palette[point_colors])
   1135                 if self.orient == "v":
-> 1136                     ax.scatter(cat_pos, strip_data, **kws)
   1137                 else:
   1138                     ax.scatter(strip_data, cat_pos, **kws)

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1445     def inner(ax, *args, data=None, **kwargs):
   1446         if data is None:
-> 1447             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1448 
   1449         bound = new_sig.bind(ax, *args, **kwargs)

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\cbook\deprecation.py in wrapper(*inner_args, **inner_kwargs)
    409                          else deprecation_addendum,
    410                 **kwargs)
--> 411         return func(*inner_args, **inner_kwargs)
    412 
    413     return wrapper

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
   4496                 )
   4497         collection.set_transform(mtransforms.IdentityTransform())
-> 4498         collection.update(kwargs)
   4499 
   4500         if colors is None:

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\artist.py in update(self, props)
    994                     func = getattr(self, f"set_{k}", None)
    995                     if not callable(func):
--> 996                         raise AttributeError(f"{type(self).__name__!r} object "
    997                                              f"has no property {k!r}")
    998                     ret.append(func(v))

AttributeError: 'PathCollection' object has no property 'legend'
../../../_images/4f1d382a781c34815c6195af9ccbed6ad4377d82f5deaffa23f2323740a995df.png

Explanation:

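Do you notice the “AttributeError” at the end? Generally a simple addition of legend=False should rid you of legends, but in the seaborn version used in this lecture, stripplot() does not know that argument: it forwards the unknown keyword to Matplotlib’s scatter(), which rejects it. Recent seaborn releases do accept legend=False here, but people on older versions often struggle with this. So let me show you how simple it actually is to remove that legend in a way that works on either version: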

ax = sns.stripplot(x="sex", y="total_bill", hue="day", data=tips, jitter=True)

ax.legend_.remove()
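One caveat: ax.legend_ is None when the Axes has no legend, so calling remove() on it would fail in that case. A small guard avoids this. The helper name remove_legend is hypothetical, and the sketch uses plain Matplotlib with made-up points so that it runs without the Tips dataset:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

def remove_legend(ax):
    """Remove the legend from ax if there is one; otherwise do nothing."""
    legend = ax.get_legend()
    if legend is not None:
        legend.remove()
    return ax

fig, ax = plt.subplots()
ax.scatter([0, 1], [1, 2], label="Thur")  # made-up points with a legend entry
ax.legend()

remove_legend(ax)  # removes the legend
remove_legend(ax)  # calling it again is harmless
```

The same helper can be applied to the Axes returned by sns.stripplot().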

And we have successfully removed the legend from our plot. Okay! Now let me show you something that might be useful if you are in a research domain or trying to get into one. What if we need the mean, median or mode on our plot? Let me show you how to draw a median line for a small sample dataset:

Every time we customize a Seaborn plot, we shall be referencing the underlying Matplotlib, just as we’re going to do here to draw horizontal lines at the median of each group in our dataset. Let us try to get this done:

# Creating sample DataFrame:
sample = pd.DataFrame({"City": ["X", "X", "X", "Y", "Y", "Y"], "Moisture": [0.2, 0.275, 0.35, 0.7, 0.8, 0.9]})

# Calculating the median Moisture for each city
# (select the numeric column first, so the text column is never aggregated):
x_med = sample.loc[sample["City"] == "X", "Moisture"].median()
y_med = sample.loc[sample["City"] == "Y", "Moisture"].median()

sns.stripplot(x="City", y="Moisture", data=sample, hue="City", palette="icefire")

x = plt.gca().axes.get_xlim()

# Draw one horizontal median line per city across the full x-range:
plt.plot(x, len(x) * [x_med], color=sns.xkcd_rgb["denim blue"])
plt.plot(x, len(x) * [y_med], color=sns.xkcd_rgb["pale red"])
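If there are more than two groups, computing each median by hand gets tedious. Here is one possible generalization, sketched with plain Matplotlib: groupby computes all medians at once, and each line is drawn only over its own category. It assumes categories sit at integer positions 0, 1, 2, ... on the axis (as they do in seaborn's categorical plots by default), and the segment half-width of 0.3 is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

sample = pd.DataFrame({
    "City": ["X", "X", "X", "Y", "Y", "Y"],
    "Moisture": [0.2, 0.275, 0.35, 0.7, 0.8, 0.9],
})

# One median per city, in order of first appearance.
medians = sample.groupby("City", sort=False)["Moisture"].median()

fig, ax = plt.subplots()
for position, (city, median) in enumerate(medians.items()):
    values = sample.loc[sample["City"] == city, "Moisture"]
    ax.scatter([position] * len(values), values)
    # Short horizontal segment centred on this category only.
    ax.hlines(median, position - 0.3, position + 0.3, colors="black")

ax.set_xticks(list(range(len(medians))))
ax.set_xticklabels(list(medians.index))
ax.set_xlabel("City")
ax.set_ylabel("Moisture")
```

Drawing short segments instead of full-width lines makes it clearer which median belongs to which group.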

With all these variations that we’ve learnt, we now have a good idea of how to deal with a Strip Plot, as and when required. In the next lecture, we shall be dealing with a plot that we’ve already seen many times, but this time it will be a detailed discussion of its scope, its variations and a few more real-world scenarios.

Till then, I would highly recommend that you play around with these plots as much as you can, and if you have any doubts, feel free to post in the forum. Also, it would be nice if you could take a minute of your time to leave a review, or at least rate this course using the Course Dashboard, because that will help other students gauge whether this course is worth their time and money.

And I shall meet you in the next lecture, where we will discuss the Box Plot.