Understanding 3-way interactions between continuous variables

A recurrent problem I’ve found when analysing my data is that of trying to interpret 3-way interactions in multiple regression models. As always, the mantra of PLOT YOUR DATA* holds true: ggplot2 is particularly helpful for this type of visualisation, especially using facets (I will cover this in a later post). Easier said than done, though, when all three predictor variables are continuous. How do interactions between continuous variables work, and how can we plot them?

First, let’s think of a simple example: consider a response such as the time a person takes to run 100m. We might expect that height would have a significant effect, with taller people covering the ground more quickly. A continuous by continuous interaction (two-way) would mean that the effect of height on 100m time depends on another continuous variable – for example, weight. A significant height x weight interaction would mean that the slope of height on 100m time changes as weight changes: the effect of height might be mitigated at heavier weights. Already, this is getting a little tricky to visualise in your head, so what about when we add in another predictor (e.g. age, resting heart rate, volume of beer** consumed the night before, that kind of thing)?

[Image: var_runners]

Some quick googling brings up a nice resource from the Institute for Digital Research and Education at UCLA, showing how to visualise three-way interactions between continuous variables using the statistical software Stata. However, I don’t want to use Stata. I want to use R. Just like you probably do. And if you don’t probably do, you should definitely do, and if you don’t now definitely do, you should probably leave. I think. Anyway, I decided that I would take their example and convert it to R so that we can follow along. Hooray for me!

I said, HOORAY FOR ME.

Ok, so we’re going to use a simple approach to explain a three-way interaction: computing the simple slopes of the dependent variable (e.g. 100m time) on the independent variable (e.g. height) when the moderator variables (e.g. weight and last night’s beer volume) are held constant at different combinations of high and low values. Basically, this would show the effect of height on 100m time at these combinations:

  • Heavy weight, much beer
  • Heavy weight, little beer
  • Light weight, much beer
  • Light weight, little beer

Simple, huh? Let’s take a look at the model formula, where Y is the response variable (100m time), X the predictor (height), and Z and W the moderators (weight and last night’s beer volume):

Y = b0 + b1X + b2Z + b3W + b4XZ + b5XW + b6ZW + b7XZW
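
(Incidentally, this is exactly what R’s formula shorthand will expand to when we fit the model later: writing x*z*w in a model formula generates all of these terms.)

y ~ x + z + w + x:z + x:w + z:w + x:z:w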

This can be reordered into two groups: the first defines the intercept (terms that do not contain X), and the second defines the simple slope (terms that do contain X):

Y = (b0 + b2Z + b3W + b6ZW) + (b1 + b4Z + b5W + b7ZW)X

We can define high values of Z and W as being one standard deviation above their respective means and will denote them as zH and wH; the low values are one standard deviation below their means (zL and wL). As in the example I gave above, this gives 4 possible combinations: zHwH, zHwL, zLwH, zLwL. As an example, here is the formula when both Z and W are at high values:

simple slope 1 = b1 + b4zH + b5wH + b7zHwH

intercept 1 = b0 + b2zH + b3wH + b6zHwH
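
To make that arithmetic concrete, here is a small helper function (just a sketch of the algebra above; pass it the named coefficient vector that coef() returns for a y ~ x*z*w model):

## Sketch: intercept and simple slope of Y on X at given values of the
## moderators z and w; 'b' is a named vector from coef(lm(y ~ x*z*w, ...))
simple_line <- function(b, z, w) {
  intercept <- b["(Intercept)"] + b["z"]*z + b["w"]*w + b["z:w"]*z*w
  slope     <- b["x"] + b["x:z"]*z + b["x:w"]*w + b["x:z:w"]*z*w
  c(intercept = unname(intercept), slope = unname(slope))
}

We’ll do exactly this long-hand with real coefficients below.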

I’m going to use the same data set given in the example that I’m converting, and you can find the Stata file here. Don’t worry, we can convert this to R data format easily enough with the R package ‘foreign’:

library(foreign)              # provides read.dta() for Stata files
hsb2 <- read.dta("hsb2.dta")  # the UCLA example data set
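
(If read.dta() complains about the file – it only understands older Stata formats – the ‘haven’ package is an alternative, assuming you have it installed:)

library(haven)                                # alternative Stata reader
hsb2 <- as.data.frame(read_dta("hsb2.dta"))   # convert tibble to data frame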

As we’re not really concerned with the innards of this data set (yes, I’m afraid it’s not really the effect on running time of height, weight and beer consumption), let’s rename the variables that we’ll be using to Y, X, Z and W:

## Rename columns 8, 7, 9 and 10 (write, read, math and science in the
## standard hsb2 layout) to match the notation above
colnames(hsb2)[c(8,7,9,10)] <- c("y","x","z","w")

IT’S TIME TO REGRESS!

## Fit the full three-way interaction model (x*z*w expands as shown above)
lm.hsb2 <- lm(y ~ x*z*w, data = hsb2)
summary(lm.hsb2)

…giving the following output:

Residuals:
     Min       1Q   Median       3Q      Max 
-20.0432  -4.8576   0.6129   4.0728  17.6322 
 
Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  1.665e+02  9.313e+01   1.788   0.0753 .
x           -3.014e+00  1.803e+00  -1.672   0.0962 .
z           -3.228e+00  1.926e+00  -1.676   0.0954 .
w           -2.435e+00  1.757e+00  -1.386   0.1674  
x:z          7.375e-02  3.590e-02   2.054   0.0413 *
x:w          5.531e-02  3.252e-02   1.701   0.0906 .
z:w          6.111e-02  3.503e-02   1.745   0.0827 .
x:z:w       -1.256e-03  6.277e-04  -2.001   0.0468 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 6.896 on 192 degrees of freedom
Multiple R-squared:  0.4893,	Adjusted R-squared:  0.4707 
F-statistic: 26.28 on 7 and 192 DF,  p-value: < 2.2e-16

You can see that the three-way interaction between the continuous variables (i.e., X:Z:W) is ‘significant’, just about. Let’s save any grousing about p-values for another day, shall we? Also, I would normally centre and standardise these types of continuous predictor variables before regression (see Gelman 2007, Gelman & Hill 2007, or Schielzeth 2010 for more details on why), but in this case I’m just converting code from a given example so we’ll take things at face value…
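
(For the record, centring and standardising the predictors is a one-liner – a sketch of what I would normally do before fitting, not run here since we’re following the original example:)

## Centre each predictor to mean 0 and scale it to SD 1 (not run)
hsb2[c("x", "z", "w")] <- lapply(hsb2[c("x", "z", "w")],
                                 function(v) as.numeric(scale(v)))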

The next thing I want to do is to create the slope and intercept values for our 4 combinations of values (high Z + high W, high Z + low W, etc). I use the original data to create high and low values for Z and W (mean plus/minus one standard deviation), then use coefficients from the linear regression to compute the intercepts and slopes:

## High / low for w and z
zH <- mean(hsb2$z) + sd(hsb2$z)
zL <- mean(hsb2$z) - sd(hsb2$z)
wH <- mean(hsb2$w) + sd(hsb2$w)
wL <- mean(hsb2$w) - sd(hsb2$w)
 
## Get coefficients from regression
coefs.hsb2 <- coef(lm.hsb2)
coef.int <- coefs.hsb2["(Intercept)"]
coef.x <- coefs.hsb2["x"]
coef.z <- coefs.hsb2["z"]
coef.w <- coefs.hsb2["w"]
coef.xz <- coefs.hsb2["x:z"]
coef.xw <- coefs.hsb2["x:w"]
coef.zw <- coefs.hsb2["z:w"]
coef.xzw <- coefs.hsb2["x:z:w"]
 
## Create slopes
zHwH <- coef.x + zH*coef.xz + wH*coef.xw + zH*wH*coef.xzw
zHwL <- coef.x + zH*coef.xz + wL*coef.xw + zH*wL*coef.xzw
zLwH <- coef.x + zL*coef.xz + wH*coef.xw + zL*wH*coef.xzw
zLwL <- coef.x + zL*coef.xz + wL*coef.xw + zL*wL*coef.xzw
 
## Create intercepts
i.zHwH <- coef.int + zH*coef.z + wH*coef.w + zH*wH*coef.zw
i.zHwL <- coef.int + zH*coef.z + wL*coef.w + zH*wL*coef.zw
i.zLwH <- coef.int + zL*coef.z + wH*coef.w + zL*wH*coef.zw
i.zLwL <- coef.int + zL*coef.z + wL*coef.w + zL*wL*coef.zw
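
A quick sanity check before we build anything for plotting – collect the four intercept/slope pairs into one small table for eyeballing:

## Eyeball the four intercept/slope pairs
data.frame(combination = c("zHwH", "zHwL", "zLwH", "zLwL"),
           intercept   = c(i.zHwH, i.zHwL, i.zLwH, i.zLwL),
           slope       = c(zHwH, zHwL, zLwH, zLwL))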

Now, I want to create a data frame that uses these intercepts and slopes to ‘predict’ lines: because I know that the lines are going to be straight, I simply compute the value of Y for the minimum and maximum values of X for each Z:W combination. I’m pretty sure it can be done with far less code than I’ve used here (there’s a more compact predict()-based version after this block), but there you go:

## Two values of x spanning its range (two points suffice for a straight line)
x0 <- seq(min(hsb2$x), max(hsb2$x), length.out = 2)
 
df.HH <- data.frame(x0 = x0)
df.HH$y0 <- i.zHwH + df.HH$x0*zHwH
df.HH$type <- rep("zHwH", nrow(df.HH))
 
df.HL <- data.frame(x0 = x0)
df.HL$y0 <- i.zHwL + df.HL$x0*zHwL
df.HL$type <- rep("zHwL", nrow(df.HL))
 
df.LH <- data.frame(x0 = x0)
df.LH$y0 <- i.zLwH + df.LH$x0*zLwH
df.LH$type <- rep("zLwH", nrow(df.LH))
 
df.LL <- data.frame(x0 = x0)
df.LL$y0 <- i.zLwL + df.LL$x0*zLwL
df.LL$type <- rep("zLwL", nrow(df.LL))
 
## Create final data frame
df.pred <- rbind(df.HH, df.HL, df.LH, df.LL)
 
## Remove unnecessary data frames
rm(df.HH, df.HL, df.LH, df.LL)
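
As promised, here is a more compact way of generating the same lines, letting predict() apply the fitted coefficients for us:

## Compact alternative: all four z/w combinations at the two extremes
## of x, with predict() doing the coefficient arithmetic
newdat <- expand.grid(x = range(hsb2$x), z = c(zL, zH), w = c(wL, wH))
newdat$y0 <- predict(lm.hsb2, newdata = newdat)
newdat$x0 <- newdat$x
newdat$type <- paste0(ifelse(newdat$z == zH, "zH", "zL"),
                      ifelse(newdat$w == wH, "wH", "wL"))

The resulting newdat carries the same x0, y0 and type columns as df.pred, so it drops straight into the plotting code below.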

Now, it’s time to plot the data! I’ve used ggplot2, and specified different colours and line types for each of our 4 combinations: red denotes high Z-value, and solid line denotes high W-value. I’ve also plotted the original data over the top:

## Call library
library(ggplot2)
 
## Convert 'type' to factor
df.pred$type <- factor(df.pred$type)
 
## Draw plot
ggplot(df.pred, aes(x = x0,
                    y = y0)) +
  geom_line(aes(colour = type,
                linetype = type)) +
  geom_jitter(data = hsb2, 
            aes(x = x,
                y = y),
             size = 3,
             alpha = 0.7) + 
  theme_bw() +
  theme(panel.background = element_rect(fill="ivory")) +
  theme(legend.key = element_blank()) +
  theme(text = element_text(size = 15)) +
  scale_colour_manual(name = "Partial effect",
                      labels = c("High z High w", 
                                 "High z Low w", 
                                 "Low z High w", 
                                 "Low z Low w"),
                      values = c("#E41A1C", 
                                 "#E41A1C", 
                                 "#377EB8", 
                                 "#377EB8")) +   
  scale_linetype_manual(name = "Partial effect",
                     labels = c("High z High w", 
                                "High z Low w", 
                                "Low z High w", 
                                "Low z Low w"),
                     values = c("solid", 
                                "longdash", 
                                "solid", 
                                "longdash"))

The finished article:

[Image: gg_ints – the four fitted lines plotted over the raw data]
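
One caveat here: the labels in the two manual scales are matched to the factor levels by position, so it’s worth confirming that the levels sort in the order you expect before trusting the legend:

## Factor levels sort alphabetically, so this should print
## "zHwH" "zHwL" "zLwH" "zLwL" – the same order as the labels above
levels(df.pred$type)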

Clearly, three of the slopes are very similar (differing mostly in terms of intercept), while one is very different to the rest. The relationship between Y and X is changed dramatically when there are high values of Z in combination with low values of W. It can still be hard to grasp these in abstract terms, so let’s refer back to my initial example (just for ‘fun’): this would mean that when weight is high [Z] and beer consumption is low [W], there is a steep relationship between height [X] and running time [Y]: short, heavy teetotallers are slow, but tall, heavy teetotallers are really fast.

Remember, these lines are theoretical, and drawn from the regression coefficients rather than the data itself; it’s up to you to make sure that short, heavy teetotallers are represented within your data set. I, for one, drink quite a lot of beer.

—-

Want to know more about understanding and visualising interactions in multiple linear regression? Check out my follow-up posts:

Using ‘small multiples’ to visualise 3-way interactions between 1 continuous and 2 categorical variables

Three-way interactions between 2 continuous and 1 categorical variable

—-

* My supervisor calls this the ‘ADF method’, as in ‘ANY DAMN FOOL (can see what’s happening if you just actually plot your data)’

** I originally had beer count, but my esteemed colleague Lilly Herridge (who knows what she is talking about much more than I do) pointed out that counts are discrete, not continuous, hence the change to volume. SMARTS.

Code in this post was adapted from the UCLA Statistical Consulting Group’s page on three-way continuous interactions, and highlighted using inside-R’s ‘Pretty-R‘ tool.

17 thoughts on “Understanding 3-way interactions between continuous variables”

  1. Thanks so much for your post, this has made things much clearer. However I am having trouble understanding how to compute the coefficients (and therefore how to plot the slopes) as my output of the regression looks a little different to yours.

    I too have a model with Y (response variable), X (predictor; continuous), Z (moderator condition, with 3 levels – say high, medium and low conditions), and W (moderator, with 3 levels, say high, medium, low conditions). So essentially I have 9 conditions + a continuous independent variable.

    I am using random effects with participant so my model is : Result <- lmer(Y ~ X*Z*W+(1|PPX),data=Matrix).

Here I obtain the output (with ZLow and WLow as the reference levels) – I’ve provided just the headings and the estimates for brevity:

    Fixed effects: Estimate
    (Intercept) -0.0293
    X 0.0007
    WHigh -0.0357
    WMedium 0.0092
    ZHigh -0.0491
    ZMedium -0.0314
    X:WHigh 0.0007
    X:WMedium 0.0002
    X:ZHigh 0.0009
    X:ZMedium 0.0007
    WHigh:ZHigh 0.0004
    WMedium:ZHigh -0.0021
    WHigh:ZMedium -0.0955
    WMedium:ZMedium 0.0143
    X:PrimeAction:ZHigh 0.0002
    X:WMedium:ZHigh -0.0003
    X:PrimeAction:ZMedium -0.0013
    X:WMedium:ZMedium -0.0017

    Are you able to explain how I can obtain the coefficients for each of my 9 conditions so I can compute the slopes and intercepts as you have demonstrated? Maybe I am completely missing something here (or am just confused), but this doesn’t seem as straightforward as your example, as the output provides interaction estimates for specific levels of Z and W rather than X:Z:W.

    Any help you can give is greatly appreciated!

    1. Hi Rebekah, great question! I’ll write this up as another post later with more explanation, but for now here is some code – can I ask you to check a couple of things for me first though? Firstly, I’m assuming ‘PrimeAction’ is supposed to read ‘WHigh’; secondly – and more importantly – can you double check the X:WMedium:ZMedium coefficient for me? It doesn’t make sense to me that it’s so negative…

      x <- seq(0, 100)

      int_global <- -0.0293
      X_coef <- 0.0007
      WHigh <- -0.0357
      WMedium <- 0.0092
      ZHigh <- -0.0491
      ZMedium <- -0.0314
      X_WHigh <- 0.0007
      X_WMedium <- 0.0002
      X_ZHigh <- 0.0009
      X_ZMedium <- 0.0007
      WHigh_ZHigh <- 0.0004
      WMedium_ZHigh <- -0.0021
      WHigh_ZMedium <- -0.0955
      WMedium_ZMedium <- 0.0143
      X_WHigh_ZHigh <- 0.0002
      X_WMedium_ZHigh <- -0.0003
      X_WHigh_ZMedium <- -0.0013
      X_WMedium_ZMedium <- -0.0017

      y.WL_ZL <- int_global + (x * X_coef)

      df.WL_ZL <- data.frame(x = x,
      W = "W:Low",
      Z = "Z:Low",
      y = y.WL_ZL)

      # Change Z

      y.WL_ZM <- int_global + ZMedium + (x * X_coef) + (x * X_ZMedium)

      df.WL_ZM <- data.frame(x = x,
      W = "W:Low",
      Z = "Z:Medium",
      y = y.WL_ZM)

      y.WL_ZH <- int_global + ZHigh + (x * X_coef) + (x * X_ZHigh)

      df.WL_ZH <- data.frame(x = x,
      W = "W:Low",
      Z = "Z:High",
      y = y.WL_ZH)

      # Change W

      y.WM_ZL <- int_global + WMedium + (x * X_coef) + (x * X_WMedium)

      df.WM_ZL <- data.frame(x = x,
      W = "W:Medium",
      Z = "Z:Low",
      y = y.WM_ZL)

      y.WH_ZL <- int_global + WHigh + (x * X_coef) + (x * X_WHigh)

      df.WH_ZL <- data.frame(x = x,
      W = "W:High",
      Z = "Z:Low",
      y = y.WH_ZL)

      # Change both

      y.WM_ZM <- int_global + WMedium + ZMedium + WMedium_ZMedium +
      (x * X_coef) +
      (x * X_ZMedium) +
      (x * X_WMedium) +
      (x * X_WMedium_ZMedium)

      df.WM_ZM <- data.frame(x = x,
      W = "W:Medium",
      Z = "Z:Medium",
      y = y.WM_ZM)

      y.WM_ZH <- int_global + WMedium + ZHigh + WMedium_ZHigh +
      (x * X_coef) +
      (x * X_ZHigh) +
      (x * X_WMedium) +
      (x * X_WMedium_ZHigh)

      df.WM_ZH <- data.frame(x = x,
      W = "W:Medium",
      Z = "Z:High",
      y = y.WM_ZH)

      y.WH_ZM <- int_global + WHigh + ZMedium + WHigh_ZMedium +
      (x * X_coef) +
      (x * X_ZMedium) +
      (x * X_WHigh) +
      (x * X_WHigh_ZMedium)

      df.WH_ZM <- data.frame(x = x,
      W = "W:High",
      Z = "Z:Medium",
      y = y.WH_ZM)

      y.WH_ZH <- int_global + WHigh + ZHigh + WHigh_ZHigh +
      (x * X_coef) +
      (x * X_ZHigh) +
      (x * X_WHigh) +
      (x * X_WHigh_ZHigh)

      df.WH_ZH <- data.frame(x = x,
      W = "W:High",
      Z = "Z:High",
      y = y.WH_ZH)

      # Concatenate data frames
      df.XWZ <- rbind(df.WL_ZL,
      df.WM_ZL,
      df.WH_ZL,
      df.WL_ZM,
      df.WL_ZH,
      df.WM_ZM,
      df.WM_ZH,
      df.WH_ZM,
      df.WH_ZH)

      # Remove individual frames
      rm(df.WL_ZL,
      df.WM_ZL,
      df.WH_ZL,
      df.WL_ZM,
      df.WL_ZH,
      df.WM_ZM,
      df.WM_ZH,
      df.WH_ZM,
      df.WH_ZH)

      # Plot
      library(ggplot2)

      ggplot(df.XWZ, aes(x,y)) + geom_point() +
      facet_grid(W~Z, as.table = FALSE) +
      theme_bw()
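
      (By the way, geom_point() plots the predictions as dots; for continuous lines, just swap in geom_line():)

      # Same plot, but with the predictions joined up as lines
      ggplot(df.XWZ, aes(x, y)) + geom_line() +
      facet_grid(W ~ Z, as.table = FALSE) +
      theme_bw()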

  2. Thanks for your quick reply. Yes you are correct on both counts, somehow I had mixed up the matrix a little (while trying to keep variables names consistent with your post)… Here it is again. Cheers!

    Fixed effects: Estimate
    (Intercept) -0.0293
    X 0.0007
    WHigh -0.0357
    WMedium 0.0092
    ZHigh -0.0491
    ZMedium -0.0314
    X:WHigh 0.0007
    X:WMedium 0.0002
    X:ZHigh 0.0009
    X:ZMedium 0.0007
    WHigh:ZHigh 0.0004
    WMedium:ZHigh -0.0021
    WHigh:ZMedium -0.0955
    WMedium:ZMedium 0.0143
    X:WHigh:ZHigh -0.0002
    X:WMedium:ZHigh -0.0004
    X:WHigh:ZMedium 0.0013
    X:WMedium:ZMedium -0.0004

    1. Excellent – the code above should work, just with those values switched at the start… the full variable list is now:

      int_global <- -0.0293
      X_coef <- 0.0007
      WHigh <- -0.0357
      WMedium <- 0.0092
      ZHigh <- -0.0491
      ZMedium <- -0.0314
      X_WHigh <- 0.0007
      X_WMedium <- 0.0002
      X_ZHigh <- 0.0009
      X_ZMedium <- 0.0007
      WHigh_ZHigh <- 0.0004
      WMedium_ZHigh <- -0.0021
      WHigh_ZMedium <- -0.0955
      WMedium_ZMedium <- 0.0143
      X_WHigh_ZHigh <- -0.0002
      X_WMedium_ZHigh <- -0.0004
      X_WHigh_ZMedium <- 0.0013
      X_WMedium_ZMedium <- -0.0004

      Hopefully the plot makes sense! As I said, I'll write up another post later describing this in more detail (maybe using these coefficients if that's ok with you?)…

  3. Dear Tom, your post and the additional comments on Rebekah’s questions have been very useful to me so far. Yet I have an additional question and was wondering if you could help. Do you have any idea how to calculate p-values for the different regression slopes, to decide which of them is really different from zero?

    1. Hi Jan, I’m glad the posts have been helpful to you! As to your question, in the example given above then these are simply illustrations of what the model is telling us about patterns in the data. It’s a bit trickier to explain with the example above, because of multiple continuous variables… These visualisations enable us to better connect the effects shown in the model with what is happening in the data (and to make inferences about what is happening biologically).

      For a simpler example, let’s say we had data for the fastest times for the 100m in every Olympics, for both men and women. Our model would be driven by inspecting the data, but we would likely come up with questions and a model as follows:

      time ~ olympic_year * sex

      …asking three things:

      – is there a trend in how the fastest time changes over olympic years (for the reference level)?
      – does one sex have a faster average time than the other (second sex against reference level sex)?
      – are the sex-specific trends different?

      We tend to include interactions because we are interested in these higher-order terms – basically, is the effect of one predictor dependent on another predictor? Here, is the effect of ‘olympic year’ different depending on sex?

      If we did find a significant interaction between the sex-specific slopes, only one of them is tested against zero (the other is then tested against being different from the first slope). You could change which is the reference level, or subset the data to test the slope against zero for each sex separately, but these are less interesting questions than the main one.
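
      For example, switching the reference level is one line (a sketch with a hypothetical ‘olympics’ data frame holding time, olympic_year and a sex factor):

      # relevel() makes "women" the reference level, so the summary now
      # tests the women's slope (rather than the men's) against zero
      # ('olympics' is a hypothetical data frame, just for illustration)
      olympics$sex <- relevel(olympics$sex, ref = "women")
      summary(lm(time ~ olympic_year * sex, data = olympics))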

      I hope this helps – please let me know if it is not clear…

  4. Hi Tom, thank you very much for your step-by-step illustration. I followed your procedure to plot the three-way interactions but I couldn’t get the regression lines on the plot. I am not sure what is wrong. Could you tell me what this is in your ggplot code:
    values = c("#E41A1C",
               "#E41A1C",
               "#377EB8",
               "#377EB8")
    Thank you!

    1. Hi Wei, those are just codes for different colours for each line. I’ve been meaning to rewrite these posts about interactions for a long time to tidy them up but just haven’t had a chance yet! Let me know if you’re still having problems and I may be able to help…

      1. Thank you so much Tom for your reply!
        I used exactly the same code as you showed but I could not get the lines on my plot. Should I use geom_smooth in ggplot2 to get the regression lines to display? Could you please help me with the code?
        Many thanks,
        Wei

      2. Great to hear! Sorry for lack of response – I got a bit bogged down with other stuff. Please do let me know if there is an error in my code here (I will try to use this as impetus to update this set of posts…!)

  5. Hello Tom – just wanted to say thanks very much for this – first time I’ve been made to laugh whilst reading an R-related post – thanks for that! The code ran perfectly and has helped loads in understanding what’s what with the three-way interaction I’m dealing with. However, I’ve met a snag! I’ve used the sim_slope function to get an idea of which of the slopes are significant, but I’m worried that I’m slipping into interpreting two-way interactions rather than the 3-way interaction. So, for example, my slope for x is negative at zHwL, but positive at zHwH (I’ve set my low value to be 0 as this is more meaningful – subtracting 1 SD gives me a negative value on a non-negative variable), and the sim_slope output tells me that the negative slope at zHwL does not include 0, i.e. is significant, while the positive slope at zHwH includes 0. In my case I interpret this as meaning that when z is high and w is zero, the response to x is significantly negative. But is this not then essentially saying that the interaction between x and z is negative, given that w is set to 0, and not necessarily saying anything about the 3-way interaction? I’m worried that I’m not interpreting the 3-way interaction correctly… I’m not sure if this makes sense but if you have any thoughts (or jokes) I’d be happy to hear them. Thanks so much. Yours gratefully, Jess
