A recurrent problem I’ve found when analysing my data is that of trying to interpret 3-way interactions in multiple regression models. As always, the mantra of PLOT YOUR DATA* holds true: ggplot2 is particularly helpful for this type of visualisation, especially using facets (I will cover this in a later post). Easier said than done, though, when all three predictor variables are continuous. How do interactions between continuous variables work, and how can we plot them?

First, let’s think of a simple example: consider a response such as the time a person takes to run 100m. We might expect that height would have a significant effect, with taller people covering the ground more quickly. A continuous by continuous interaction (two-way) would mean that the effect of height on 100m time depends on another continuous variable – for example, weight. A significant height x weight interaction would mean that the slope of height on 100m time changes as weight changes: the effect of height might be mitigated at heavier weights. Already, this is getting a little tricky to visualise in your head, so what about when we add in another predictor (e.g. age, resting heart rate, volume of beer** consumed the night before, that kind of thing)?

Some quick googling brings up a nice resource from the Institute for Digital Research and Education at UCLA, showing how to visualise three-way interactions between continuous variables using the statistical software Stata. However, I don’t want to use Stata. I want to use R. Just like you probably do. And if you don’t probably do, you should definitely do, and if you don’t now definitely do, you should probably leave. I think. Anyway, I decided that I would take their example and convert it to R so that we can follow along. Hooray for me!

I said, HOORAY FOR ME.

Ok, so we’re going to use a simple approach to explain a three-way interaction: computing the slopes of the dependent variable (e.g. 100m time) on the independent variable (e.g. height) when moderator variables (e.g. weight and last night’s beer volume) are held constant at different combinations of high and low values. Basically, this would show the effect of height on 100m time at these combinations:

- Heavy weight, much beer
- Heavy weight, little beer
- Light weight, much beer
- Light weight, little beer

Simple, huh? Let’s take a look at the model formula, where Y is the response variable (100m time), X the predictor (height), and Z and W being the moderators (weight, last night’s beer volume):

`Y = b0 + b1X + b2Z + b3W + b4XZ + b5XW + b6ZW + b7XZW`

This can be reordered into two groups: the first defines the intercept (terms that do not contain X), and the second defines the simple slope (terms that do contain X):

`Y = (b0 + b2Z + b3W + b6ZW) + (b1 + b4Z + b5W + b7ZW)X`

We can define high values of Z and W as being one standard deviation above their respective means and will denote them as zH and wH; the low values are one standard deviation below their means (zL and wL). As in the example I gave above, this gives 4 possible combinations: zHwH, zHwL, zLwH, zLwL. As an example, here is the formula when both Z and W are at high values:

`simple slope 1 = b1 + b4zH + b5wH + b7zHwH`

intercept 1 = b0 + b2zH + b3wH + b6zHwH

I’m going to use the same data set given in the example that I’m converting, and you can find the Stata file here. Don’t worry, we can convert this to R data format easily enough with the R package ‘foreign’:

As we’re not really concerned with the innards of this data set (yes, I’m afraid it’s not really the effect on running time of height, weight and beer consumption), let’s rename the variables that we’ll be using to Y, X, Z and W:

IT’S TIME TO REGRESS!

…giving the following output:

Residuals: Min 1Q Median 3Q Max -20.0432 -4.8576 0.6129 4.0728 17.6322 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.665e+02 9.313e+01 1.788 0.0753 . x -3.014e+00 1.803e+00 -1.672 0.0962 . z -3.228e+00 1.926e+00 -1.676 0.0954 . w -2.435e+00 1.757e+00 -1.386 0.1674 x:z 7.375e-02 3.590e-02 2.054 0.0413 * x:w 5.531e-02 3.252e-02 1.701 0.0906 . z:w 6.111e-02 3.503e-02 1.745 0.0827 . x:z:w -1.256e-03 6.277e-04 -2.001 0.0468 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 6.896 on 192 degrees of freedom Multiple R-squared: 0.4893, Adjusted R-squared: 0.4707 F-statistic: 26.28 on 7 and 192 DF, p-value: < 2.2e-16

You can see that the three-way interaction between the continuous variables (i.e., X:Z:W) is ‘significant’, just about. Let’s save any grousing about p-values for another day, shall we? Also, I would normally centre and standardise these types of continuous predictor variables before regression (see Gelman 2007, Gelman & Hill 2007, or Schielzeth 2010 for more details on why), but in this case I’m just converting code from a given example so we’ll take things at face value…

The next thing I want to do is to create the slope and intercept values for our 4 combinations of values (high Z + high W, high Z + low W, etc). I use the original data to create high and low values for Z and W (mean plus/minus one standard deviation), then use coefficients from the linear regression to compute the intercepts and slopes:

## High / low for w and z zH <- mean(hsb2$z) + sd(hsb2$z) zL <- mean(hsb2$z) - sd(hsb2$z) wH <- mean(hsb2$w) + sd(hsb2$w) wL <- mean(hsb2$w) - sd(hsb2$w) ## Get coefficients from regression coefs.hsb2 <- coef(lm.hsb2) coef.int <- coefs.hsb2["(Intercept)"] coef.x <- coefs.hsb2["x"] coef.z <- coefs.hsb2["z"] coef.w <- coefs.hsb2["w"] coef.xz <- coefs.hsb2["x:z"] coef.xw <- coefs.hsb2["x:w"] coef.zw <- coefs.hsb2["z:w"] coef.xzw <- coefs.hsb2["x:z:w"] ## Create slopes zHwH <- coef.x + zH*coef.xz + wH*coef.xw + zH*wH*coef.xzw zHwL <- coef.x + zH*coef.xz + wL*coef.xw + zH*wL*coef.xzw zLwH <- coef.x + zL*coef.xz + wH*coef.xw + zL*wH*coef.xzw zLwL <- coef.x + zL*coef.xz + wL*coef.xw + zL*wL*coef.xzw ## Create intercepts i.zHwH <- coef.int + zH*coef.z + wH*coef.w + zH*wH*coef.zw i.zHwL <- coef.int + zH*coef.z + wL*coef.w + zH*wL*coef.zw i.zLwH <- coef.int + zL*coef.z + wH*coef.w + zL*wH*coef.zw i.zLwL <- coef.int + zL*coef.z + wL*coef.w + zL*wL*coef.zw

Now, I want to create a data frame that uses these intercepts and slopes to ‘predict’ lines: because I know that the lines are going to be straight, I simply compute the value of Y for the minimum and maximum values of X for each Z:W combination. I’m pretty sure it can be done with far less code than I’ve used here, but there you go:

## a set of values of x x0 <- seq(min(hsb2$x), max(hsb2$x), length.out = 2) df.HH <- data.frame(x0 = x0) df.HH$y0 <- i.zHwH + df.HH$x0*zHwH df.HH$type <- rep("zHwH", nrow(df.HH)) df.HL <- data.frame(x0 = x0) df.HL$y0 <- i.zHwL + df.HL$x0*zHwL df.HL$type <- rep("zHwL", nrow(df.HL)) df.LH <- data.frame(x0 = x0) df.LH$y0 <- i.zLwH + df.LH$x0*zLwH df.LH$type <- rep("zLwH", nrow(df.LH)) df.LL <- data.frame(x0 = x0) df.LL$y0 <- i.zLwL + df.LL$x0*zLwL df.LL$type <- rep("zLwL", nrow(df.LL)) ## Create final data frame df.pred <- rbind(df.HH, df.HL, df.LH, df.LL) ## Remove unnecessary data frames rm(df.HH, df.HL, df.LH, df.LL)

Now, it’s time to plot the data! I’ve used ggplot2, and specified different colours and line types for each of our 4 combinations: red denotes high Z-value, and solid line denotes high W-value. I’ve also plotted the original data over the top:

## Call library library(ggplot2) ## Convert 'type' to factor df.pred$type <- factor(df.pred$type) ## Draw plot ggplot(df.pred, aes(x = x0, y = y0)) + geom_line(aes(colour = type, linetype = type)) + geom_jitter(data = hsb2, aes(x = x, y = y), size = 3, alpha = 0.7) + theme_bw() + theme(panel.background = element_rect(fill="ivory")) + theme(legend.key = element_blank()) + theme(text = element_text(size = 15)) + scale_colour_manual(name = "Partial effect", labels = c("High z High w", "High z Low w", "Low z High w", "Low z Low w"), values = c("#E41A1C", "#E41A1C", "#377EB8", "#377EB8")) + scale_linetype_manual(name = "Partial effect", labels = c("High z High w", "High z Low w", "Low z High w", "Low z Low w"), values = c("solid", "longdash", "solid", "longdash"))

The finished article:

Clearly, three of the slopes are very similar (differing mostly in terms of intercept), while one is very different to the rest. The relationship between Y and X is changed dramatically when there are high values of Z in combination with low values of W. It can still be hard to grasp these in abstract terms, so let’s refer back to my initial example (just for ‘fun’): this would mean that when weight is high [Z] and beer consumption is low [W], there is a steep relationship between height [X] and running time [Y]: short, heavy teetotallers are slow, but tall, heavy teetotallers are really fast.

Remember, these lines are theoretical, and drawn from the regression coefficients rather than the data itself; it’s up to you to make sure that short, heavy teetotallers are represented within your data set. I, for one, drink quite a lot of beer.

—-

*Want to know more about understanding and visualising interactions in multiple linear regression? Check out my follow-up posts:*

Three-way interactions between 2 continuous and 1 categorical variable

—-

* *My supervisor calls this the ‘ADF method’, as in ‘ANY DAMN FOOL (can see what’s happening if you just actually plot your data)’*

** *I originally had beer count, but my esteemed colleague Lilly Herridge (who knows what she is talking about much more than I do) pointed out that counts are discrete, not continuous, hence the change to volume. SMARTS.*

Code in this post was adapted from the UCLA Statistical Consulting Group’s page on three-way continuous interactions, and highlighted using inside-R’s ‘Pretty-R‘ tool.

Thanks so much for your post, this has made things much clearer. However I am having trouble understanding how to compute the coefficients (and therefore how to plot the slopes) as my output of the regression looks a little different to yours.

I too have a model with Y (response variable), X (predictor; continuous), Z (moderator condition, with 3 levels – say high, medium and low conditions), and W (moderator, with 3 levels, say high, medium, low conditions). So essentially I have 9 conditions + a continuous independent variable.

I am using random effects with participant so my model is : Result <- lmer(Y ~ X*Z*W+(1|PPX),data=Matrix).

Here I obtain the output (with ZLow and WLow as the reference levels): – I've provide just the headings and the estimate for brevity

Fixed effects: Estimate

(Intercept) -0.0293

X 0.0007

WHigh -0.0357

WMedium 0.0092

ZHigh -0.0491

ZMedium -0.0314

X:WHigh 0.0007

X:WMedium 0.0002

X:ZHigh 0.0009

X:ZMedium 0.0007

WHigh:ZHigh 0.0004

WMedium:ZHigh -0.0021

WHigh:ZMedium -0.0955

WMedium:ZMedium 0.0143

X:PrimeAction:ZHigh 0.0002

X:WMedium:ZHigh -0.0003

X:PrimeAction:ZMedium -0.0013

X:WMedium:ZMedium -0.0017

Are you able to explain how I can obtain the coefficients for each of my 9 conditions so I can compute the slopes and intercepts as you have demonstrated? Maybe I am completely missing something here/or just confused, but this doesn’t seem as straight forward as your example, as the output provides interaction estimates for specific levels of Z and W rather than X:Z:W.

Any help you can give is greatly appreciated!

Hi Rebekah, great question! I’ll write this up as another post later with more explanation, but for now here is some code – can I ask you to check a couple of things for me first though? Firstly, I’m assuming ‘PrimeAction’ is supposed to read ‘WHigh’; secondly – and more importantly – can you double check the X:WMedium:ZMedium coefficient for me? It doesn’t make sense to me that it’s so negative…

x <- seq(0, 100)

int_global <- -0.0293

X_coef <- 0.0007

WHigh <- -0.0357

WMedium <- 0.0092

ZHigh <- -0.0491

ZMedium <- -0.0314

X_WHigh <- 0.0007

X_WMedium <- 0.0002

X_ZHigh <- 0.0009

X_ZMedium <- 0.0007

WHigh_ZHigh <- 0.0004

WMedium_ZHigh <- -0.0021

WHigh_ZMedium <- -0.0955

WMedium_ZMedium <- 0.0143

X_WHigh_ZHigh <- 0.0002

X_WMedium_ZHigh <- -0.0003

X_WHigh_ZMedium <- -0.0013

X_WMedium_ZMedium <- -0.0017

y.WL_ZL <- int_global + (x * X_coef)

df.WL_ZL <- data.frame(x = x,

W = "W:Low",

Z = "Z:Low",

y = y.WL_ZL)

# Change Z

y.WL_ZM <- int_global + ZMedium + (x * X_coef) + (x * X_ZMedium)

df.WL_ZM <- data.frame(x = x,

W = "W:Low",

Z = "Z:Medium",

y = y.WL_ZM)

y.WL_ZH <- int_global + ZHigh + (x * X_coef) + (x * X_ZHigh)

df.WL_ZH <- data.frame(x = x,

W = "W:Low",

Z = "Z:High",

y = y.WL_ZM)

# Change W

y.WM_ZL <- int_global + WMedium + (x * X_coef) + (x * X_WMedium)

df.WM_ZL <- data.frame(x = x,

W = "W:Medium",

Z = "Z:Low",

y = y.WM_ZL)

y.WH_ZL <- int_global + WHigh + (x * X_coef) + (x * X_WHigh)

df.WH_ZL <- data.frame(x = x,

W = "W:High",

Z = "Z:Low",

y = y.WM_ZL)

# Change both

y.WM_ZM <- int_global + WMedium + ZMedium + WMedium_ZMedium +

(x * X_coef) +

(x * X_ZMedium) +

(x * X_WMedium) +

(x * X_WMedium_ZMedium)

df.WM_ZM <- data.frame(x = x,

W = "W:Medium",

Z = "Z:Medium",

y = y.WM_ZM)

y.WM_ZH <- int_global + WMedium + ZHigh + WMedium_ZHigh +

(x * X_coef) +

(x * X_ZHigh) +

(x * X_WMedium) +

(x * X_WMedium_ZHigh)

df.WM_ZH <- data.frame(x = x,

W = "W:Medium",

Z = "Z:High",

y = y.WM_ZH)

y.WH_ZM <- int_global + WHigh + ZMedium + WHigh_ZMedium +

(x * X_coef) +

(x * X_ZMedium) +

(x * X_WHigh) +

(x * X_WHigh_ZMedium)

df.WH_ZM <- data.frame(x = x,

W = "W:High",

Z = "Z:Medium",

y = y.WM_ZH)

y.WH_ZH <- int_global + WHigh + ZHigh + WHigh_ZHigh +

(x * X_coef) +

(x * X_ZHigh) +

(x * X_WHigh) +

(x * X_WHigh_ZHigh)

df.WH_ZH <- data.frame(x = x,

W = "W:High",

Z = "Z:High",

y = y.WH_ZH)

# Concatenate data frames

df.XWZ <- rbind(df.WL_ZL,

df.WM_ZL,

df.WH_ZL,

df.WL_ZM,

df.WL_ZH,

df.WM_ZM,

df.WM_ZH,

df.WH_ZM,

df.WH_ZH)

# Remove individual frames

rm(df.WL_ZL,

df.WM_ZL,

df.WH_ZL,

df.WL_ZM,

df.WL_ZH,

df.WM_ZM,

df.WM_ZH,

df.WH_ZM,

df.WH_ZH)

# Plot

library(ggplot2)

ggplot(df.XWZ, aes(x,y)) + geom_point() +

facet_grid(W~Z, as.table = FALSE) +

theme_bw()

Thanks for your quick reply. Yes you are correct on both counts, somehow I had mixed up the matrix a little (while trying to keep variables names consistent with your post)… Here it is again. Cheers!

Fixed effects: Estimate

(Intercept) -0.0293

X 0.0007

WHigh -0.0357

WMedium 0.0092

ZHigh -0.0491

ZMedium -0.0314

X:WHigh 0.0007

X:WMedium 0.0002

X:ZHigh 0.0009

X:ZMedium 0.0007

WHigh:ZHigh 0.0004

WMedium:ZHigh -0.0021

WHigh:ZMedium -0.0955

WMedium:ZMedium 0.0143

X:WHigh:ZHigh -0.0002

X:WMedium:ZHigh -0.0004

X:WHigh:ZMedium 0.0013

X:WMedium:ZMedium -0.0004

Excellent – the code as above should work, but just with switching those variables at the start… full variable list is now:

int_global <- -0.0293

X_coef <- 0.0007

WHigh <- -0.0357

WMedium <- 0.0092

ZHigh <- -0.0491

ZMedium <- -0.0314

X_WHigh <- 0.0007

X_WMedium <- 0.0002

X_ZHigh <- 0.0009

X_ZMedium <- 0.0007

WHigh_ZHigh <- 0.0004

WMedium_ZHigh <- -0.0021

WHigh_ZMedium <- -0.0955

WMedium_ZMedium <- 0.0143

X_WHigh_ZHigh <- -0.0002

X_WMedium_ZHigh <- -0.0004

X_WHigh_ZMedium <- 0.0013

X_WMedium_ZMedium <- -0.0004

Hopefully the plot makes sense! As I said, I'll write up another post later describing this in more detail (maybe using these coefficients if that's ok with you?)…

Perfect, thanks so much. And yes that would be no problem.

Dear Tom, your post and the additional comments on Rebekah’s questions have been very useful to me so far. Yet I have an additional question and was wandering if you could help. Do you have any idea on how to calculate p values for the different regression slopes to decide which of them is really different from zero?

Hi Jan, I’m glad the posts have been helpful to you! As to your question, in the example given above then these are simply illustrations of what the model is telling us about patterns in the data. It’s a bit trickier to explain with the example above, because of multiple continuous variables… These visualisations enable us to better connect the effects shown in the model with what is happening in the data (and to make inferences about what is happening biologically).

For a simpler example, let’s say we had data for the fastest times for the 100m in every Olympics, for both men and women. Our model would be driven by inspecting a data, but we would likely come up with questions and a model as follows:

time ~ olympic_year * sex

…asking three things:

– is there a trend in how the fastest time changes over olympic years (for the reference level)?

– does one sex have a faster average time than the other (second sex against reference level sex)?

– are the sex-specific trends different?

We tend to include interactions because we are interested in these higher-order terms – basically, is the effect of one predictor dependent on another predictor? Here, is the effect of ‘olympic year’ different depending on sex?

If we did find a significant interaction between the sex-specific slopes, only one of them is tested against zero (the other is then tested against being different from the first slope). You could change which is the reference level, or subset the data to test the slope against zero for each sex separately, but these are less interesting questions than the main one.

I hope this helps – please let me know if it is not clear…