Understanding 3-way interactions between continuous variables

A recurrent problem I’ve found when analysing my data is that of trying to interpret 3-way interactions in multiple regression models. As always, the mantra of PLOT YOUR DATA* holds true: ggplot2 is particularly helpful for this type of visualisation, especially using facets (I will cover this in a later post). Easier said than done, though, when all three predictor variables are continuous. How do interactions between continuous variables work, and how can we plot them?

First, let’s think of a simple example: consider a response such as the time a person takes to run 100m. We might expect that height would have a significant effect, with taller people covering the ground more quickly. A continuous by continuous interaction (two-way) would mean that the effect of height on 100m time depends on another continuous variable – for example, weight. A significant height x weight interaction would mean that the slope of height on 100m time changes as weight changes: the effect of height might be mitigated at heavier weights. Already, this is getting a little tricky to visualise in your head, so what about when we add in another predictor (e.g. age, resting heart rate, volume of beer** consumed the night before, that kind of thing)?


Some quick googling brings up a nice resource from the Institute for Digital Research and Education at UCLA, showing how to visualise three-way interactions between continuous variables using the statistical software Stata. However, I don’t want to use Stata. I want to use R. Just like you probably do. And if you don’t probably do, you should definitely do, and if you don’t now definitely do, you should probably leave. I think. Anyway, I decided that I would take their example and convert it to R so that we can follow along. Hooray for me!


Ok, so we’re going to use a simple approach to explain a three-way interaction: computing the slopes of the dependent variable (e.g. 100m time) on the independent variable (e.g. height) when moderator variables (e.g. weight and last night’s beer volume) are held constant at different combinations of high and low values. Basically, this would show the effect of height on 100m time at these combinations:

  • Heavy weight, much beer
  • Heavy weight, little beer
  • Light weight, much beer
  • Light weight, little beer

Simple, huh? Let’s take a look at the model formula, where Y is the response variable (100m time), X the predictor (height), and Z and W being the moderators (weight, last night’s beer volume):

Y = b0 + b1X + b2Z + b3W + b4XZ + b5XW + b6ZW + b7XZW

This can be reordered into two groups: the first defines the intercept (terms that do not contain X), and the second defines the simple slope (terms that do contain X):

Y = (b0 + b2Z + b3W + b6ZW) + (b1 + b4Z + b5W + b7ZW)X

We can define high values of Z and W as being one standard deviation above their respective means and will denote them as zH and wH; the low values are one standard deviation below their means (zL and wL). As in the example I gave above, this gives 4 possible combinations: zHwH, zHwL, zLwH, zLwL. As an example, here is the formula when both Z and W are at high values:

simple slope 1 = b1 + b4zH + b5wH + b7zHwH

intercept 1 = b0 + b2zH + b3wH + b6zHwH

I’m going to use the same data set given in the example that I’m converting, and you can find the Stata file here. Don’t worry, we can convert this to R data format easily enough with the R package ‘foreign’:

hsb2 <- read.dta("hsb2.dta")

As we’re not really concerned with the innards of this data set (yes, I’m afraid it’s not really the effect on running time of height, weight and beer consumption), let’s rename the variables that we’ll be using to Y, X, Z and W:

colnames(hsb2)[c(8,7,9,10)] <- c("y","x","z","w")


lm.hsb2 <- lm(y ~ x*z*w, data = hsb2)

…giving the following output:

     Min       1Q   Median       3Q      Max 
-20.0432  -4.8576   0.6129   4.0728  17.6322 
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  1.665e+02  9.313e+01   1.788   0.0753 .
x           -3.014e+00  1.803e+00  -1.672   0.0962 .
z           -3.228e+00  1.926e+00  -1.676   0.0954 .
w           -2.435e+00  1.757e+00  -1.386   0.1674  
x:z          7.375e-02  3.590e-02   2.054   0.0413 *
x:w          5.531e-02  3.252e-02   1.701   0.0906 .
z:w          6.111e-02  3.503e-02   1.745   0.0827 .
x:z:w       -1.256e-03  6.277e-04  -2.001   0.0468 *
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.896 on 192 degrees of freedom
Multiple R-squared:  0.4893,	Adjusted R-squared:  0.4707 
F-statistic: 26.28 on 7 and 192 DF,  p-value: < 2.2e-16

You can see that the three-way interaction between the continuous variables (i.e., X:Z:W) is ‘significant’, just about. Let’s save any grousing about p-values for another day, shall we? Also, I would normally centre and standardise these types of continuous predictor variables before regression (see Gelman 2007, Gelman & Hill 2007, or Schielzeth 2010 for more details on why), but in this case I’m just converting code from a given example so we’ll take things at face value…

The next thing I want to do is to create the slope and intercept values for our 4 combinations of values (high Z + high W, high Z + low W, etc). I use the original data to create high and low values for Z and W (mean plus/minus one standard deviation), then use coefficients from the linear regression to compute the intercepts and slopes:

## High / low for w and z
zH <- mean(hsb2$z) + sd(hsb2$z)
zL <- mean(hsb2$z) - sd(hsb2$z)
wH <- mean(hsb2$w) + sd(hsb2$w)
wL <- mean(hsb2$w) - sd(hsb2$w)
## Get coefficients from regression
coefs.hsb2 <- coef(lm.hsb2)
coef.int <- coefs.hsb2["(Intercept)"]
coef.x <- coefs.hsb2["x"]
coef.z <- coefs.hsb2["z"]
coef.w <- coefs.hsb2["w"]
coef.xz <- coefs.hsb2["x:z"]
coef.xw <- coefs.hsb2["x:w"]
coef.zw <- coefs.hsb2["z:w"]
coef.xzw <- coefs.hsb2["x:z:w"]
## Create slopes
zHwH <- coef.x + zH*coef.xz + wH*coef.xw + zH*wH*coef.xzw
zHwL <- coef.x + zH*coef.xz + wL*coef.xw + zH*wL*coef.xzw
zLwH <- coef.x + zL*coef.xz + wH*coef.xw + zL*wH*coef.xzw
zLwL <- coef.x + zL*coef.xz + wL*coef.xw + zL*wL*coef.xzw
## Create intercepts
i.zHwH <- coef.int + zH*coef.z + wH*coef.w + zH*wH*coef.zw
i.zHwL <- coef.int + zH*coef.z + wL*coef.w + zH*wL*coef.zw
i.zLwH <- coef.int + zL*coef.z + wH*coef.w + zL*wH*coef.zw
i.zLwL <- coef.int + zL*coef.z + wL*coef.w + zL*wL*coef.zw

Now, I want to create a data frame that uses these intercepts and slopes to ‘predict’ lines: because I know that the lines are going to be straight, I simply compute the value of Y for the minimum and maximum values of X for each Z:W combination. I’m pretty sure it can be done with far less code than I’ve used here, but there you go:

## a set of values of x
x0 <- seq(min(hsb2$x), max(hsb2$x), length.out = 2)
df.HH <- data.frame(x0 = x0)
df.HH$y0 <- i.zHwH + df.HH$x0*zHwH
df.HH$type <- rep("zHwH", nrow(df.HH))
df.HL <- data.frame(x0 = x0)
df.HL$y0 <- i.zHwL + df.HL$x0*zHwL
df.HL$type <- rep("zHwL", nrow(df.HL))
df.LH <- data.frame(x0 = x0)
df.LH$y0 <- i.zLwH + df.LH$x0*zLwH
df.LH$type <- rep("zLwH", nrow(df.LH))
df.LL <- data.frame(x0 = x0)
df.LL$y0 <- i.zLwL + df.LL$x0*zLwL
df.LL$type <- rep("zLwL", nrow(df.LL))
## Create final data frame
df.pred <- rbind(df.HH, df.HL, df.LH, df.LL)
## Remove unnecessary data frames
rm(df.HH, df.HL, df.LH, df.LL)

Now, it’s time to plot the data! I’ve used ggplot2, and specified different colours and line types for each of our 4 combinations: red denotes high Z-value, and solid line denotes high W-value. I’ve also plotted the original data over the top:

## Call library
## Convert 'type' to factor
df.pred$type <- factor(df.pred$type)
## Draw plot
ggplot(df.pred, aes(x = x0,
                    y = y0)) +
  geom_line(aes(colour = type,
                linetype = type)) +
  geom_jitter(data = hsb2, 
            aes(x = x,
                y = y),
             size = 3,
             alpha = 0.7) + 
  theme_bw() +
  theme(panel.background = element_rect(fill="ivory")) +
  theme(legend.key = element_blank()) +
  theme(text = element_text(size = 15)) +
  scale_colour_manual(name = "Partial effect",
                      labels = c("High z High w", 
                                 "High z Low w", 
                                 "Low z High w", 
                                 "Low z Low w"),
                      values = c("#E41A1C", 
                                 "#377EB8")) +   
  scale_linetype_manual(name = "Partial effect",
                     labels = c("High z High w", 
                                "High z Low w", 
                                "Low z High w", 
                                "Low z Low w"),
                     values = c("solid", 

The finished article:


Clearly, three of the slopes are very similar (differing mostly in terms of intercept), while one is very different to the rest. The relationship between Y and X is changed dramatically when there are high values of Z in combination with low values of W. It can still be hard to grasp these in abstract terms, so let’s refer back to my initial example (just for ‘fun’): this would mean that when weight is high [Z] and beer consumption is low [W], there is a steep relationship between height [X] and running time [Y]: short, heavy teetotallers are slow, but tall, heavy teetotallers are really fast.

Remember, these lines are theoretical, and drawn from the regression coefficients rather than the data itself; it’s up to you to make sure that short, heavy teetotallers are represented within your data set. I, for one, drink quite a lot of beer.


Want to know more about understanding and visualising interactions in multiple linear regression? Check out my follow-up posts:

Using ‘small multiples’ to visualise 3-way interactions between 1 continuous and 2 categorical variables

Three-way interactions between 2 continuous and 1 categorical variable


* My supervisor calls this the ‘ADF method’, as in ‘ANY DAMN FOOL (can see what’s happening if you just actually plot your data)’

** I originally had beer count, but my esteemed colleague Lilly Herridge (who knows what she is talking about much more than I do) pointed out that counts are discrete, not continuous, hence the change to volume. SMARTS.

Code in this post was adapted from the UCLA Statistical Consulting Group’s page on three-way continuous interactions, and highlighted using inside-R’s ‘Pretty-R‘ tool.