R packages we use in this section

library(tidyverse)
library(stargazer)
library(margins)
library(interplot)
library(msm)
library(patchwork)
library(jtools)
library(broom)     # provides tidy(); library(tidyverse) does not attach it

1. What are interaction effects?

Interaction effects occur when the effect of one variable depends on the value of another variable.
Interaction effects are commonly used in regression analysis and designed experiments.
In this section, I explain interaction effects and how to interpret them in statistical designs.

What is the difference between a dummy variable and an interaction term?

When we include a party dummy variable (ldp), we can see the difference in the outcome variable (voteshare) between LDP candidates (ldp = 1) and non-LDP candidates (ldp = 0), as shown in the graph on the left.

An interaction effect means that a third variable (in this example, ldp) changes the relationship between an explanatory variable and the outcome variable.
When we include an interaction term (ldp:exppv), we can see the difference in slope (exppv → voteshare) between LDP candidates (ldp = 1) and non-LDP candidates (ldp = 0), as shown in the graph on the right.
An interaction term makes the model more complex, but if the real world actually behaves this way, it is critical to incorporate it in your model.
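The contrast between the two specifications can be sketched with simulated data. Everything in the block below (the variable names `x`, `group`, `y` and the numbers used to generate them) is made up for illustration and is not the election data used in this section.

```r
# Simulated data only: the numbers below are invented for illustration.
set.seed(1)
n     <- 200
group <- rbinom(n, 1, 0.5)   # hypothetical 0/1 dummy (plays the role of ldp)
x     <- runif(n, 0, 10)     # hypothetical explanatory variable (plays the role of exppv)
y     <- 1 + 2 * x + 5 * group - 1.5 * group * x + rnorm(n)

# Dummy only: both groups share one slope; only the intercept differs
fit_dummy <- lm(y ~ x + group)

# Dummy plus interaction: intercept AND slope may differ by group
fit_inter <- lm(y ~ x * group)

b <- coef(fit_inter)
slope_group0 <- unname(b["x"])                  # slope when group = 0
slope_group1 <- unname(b["x"] + b["x:group"])   # slope when group = 1
c(group0 = slope_group0, group1 = slope_group1)
```

In `fit_dummy` there is a single coefficient on `x`, so the two fitted lines are parallel. In `fit_inter` the extra `x:group` coefficient is the difference between the two slopes.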

2. Model_1

The following is the model we use in this section:

nocand is a control variable

Note
・A control variable is a variable that is held constant or otherwise accounted for in a research study.
・It is not of interest to our research aims, but we control for it because it could influence the outcome.
・In actual research, we need to add more control variables, because it is reasonable to assume that other variables (such as the candidate’s age, campaign expenditure, or gender) could also influence vote share.
・Here I add only nocand as a control variable.

2.1 Marginal Effects

To see interaction effects, we make an interaction term (ldp:exppv) by multiplying the following two independent variables:

main explanatory variable (exppv)
categorical moderator variable (ldp)

Note
・A moderator variable can be one of two types: categorical or continuous.
・In this section, I deal with a categorical moderator variable.
・In 21. Multiple Regression 4 (Interaction Effects 2), I deal with a continuous moderator variable.

We include this interaction term in Model_1.
This way, we can see how a third variable (ldp) influences the relationship between an explanatory and outcome variable.

Assumption of model_1:

The slope (exppv → voteshare) differs between LDP and non-LDP candidates.
This slope is equivalent to the marginal effect.
We estimate the following model:
\[\mathrm{voteshare} = \alpha_0 + \alpha_1 \mathrm{exppv} + \alpha_2 \mathrm{ldp} + \alpha_3 \mathrm{ldp} \cdot \mathrm{exppv} + \alpha_4 \mathrm{nocand} + \varepsilon\]
We can rewrite the equation as follows:

\[\mathrm{voteshare} = \alpha_0 + (\alpha_1 + \alpha_3 \mathrm{ldp}) \mathrm{exppv} + \alpha_2 \mathrm{ldp} + \alpha_4 \mathrm{nocand} + \varepsilon\]

\(\alpha_0\)	vote share (%) of a non-LDP candidate (`ldp = 0`) when `exppv = 0` and `nocand = 0`
\((\alpha_1 + \alpha_3 \mathrm{ldp})\)	slope (`exppv` → `voteshare`) = marginal effect
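
Setting ldp to 0 and to 1 in the rewritten equation gives one line per group, which may make the roles of \(\alpha_2\) and \(\alpha_3\) easier to see. This is only a worked restatement of the equation above, using the same symbols, not an additional assumption:

\[\frac{\partial\, \mathrm{voteshare}}{\partial\, \mathrm{exppv}} = \alpha_1 + \alpha_3 \mathrm{ldp}\]

\[\mathrm{ldp} = 0: \quad \mathrm{voteshare} = \alpha_0 + \alpha_1 \mathrm{exppv} + \alpha_4 \mathrm{nocand} + \varepsilon\]

\[\mathrm{ldp} = 1: \quad \mathrm{voteshare} = (\alpha_0 + \alpha_2) + (\alpha_1 + \alpha_3) \mathrm{exppv} + \alpha_4 \mathrm{nocand} + \varepsilon\]

So \(\alpha_2\) shifts the intercept for LDP candidates, and \(\alpha_3\) is the difference in slopes between LDP and non-LDP candidates.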

WHAT WE WANT TO KNOW:

Does the impact of campaign expenditure on vote share differ between LDP candidates and non-LDP candidates?

When we estimate a model with an interaction term, we need to check the marginal effects.

Marginal Effects
・Marginal effects are often calculated when interpreting regression results.
・In a linear regression, marginal effects are equivalent to slopes.
・A marginal effect tells us how the dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes by one unit.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold nocand constant.
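
As a rough illustration of "holding nocand constant", the sketch below computes the change in predicted vote share for a one-unit increase in exppv. The coefficient values in `a` and the helper functions `predict_share()` and `marginal_effect()` are hypothetical, chosen only to show the mechanics. They are not estimates from the data.

```r
# Hypothetical coefficients (a0, ..., a4), chosen for illustration only
a <- c(a0 = 20, a1 = 0.8, a2 = 40, a3 = -0.7, a4 = -4)

# Predicted vote share under the model with an exppv:ldp interaction
predict_share <- function(exppv, ldp, nocand) {
  a[["a0"]] + a[["a1"]] * exppv + a[["a2"]] * ldp +
    a[["a3"]] * ldp * exppv + a[["a4"]] * nocand
}

# Marginal effect of exppv: change in the prediction when exppv rises by 1,
# with ldp and nocand held fixed
marginal_effect <- function(ldp, exppv = 10, nocand = 3) {
  predict_share(exppv + 1, ldp, nocand) - predict_share(exppv, ldp, nocand)
}

marginal_effect(ldp = 0)   # equals a1
marginal_effect(ldp = 1)   # equals a1 + a3
```

Because the model is linear in exppv, the result does not depend on the starting value of exppv or on the value at which nocand is held. It depends only on ldp.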

2.2 Data preparation

Download hr96_17.csv
Japanese lower house election results (1996-2017)
Put the file (hr96_17.csv) into the data folder in your RProject folder
Load the data and name it hr

# File name matches the download step above (hr96_17.csv); "." marks missing values
hr <- read_csv("data/hr96_17.csv", 
               na = ".")

Check the variable names in hr

names(hr)

 [1] "year"          "pref"          "ku"            "kun"          
 [5] "mag"           "rank"          "wl"            "nocand"       
 [9] "seito"         "j_name"        "name"          "term"         
[13] "gender"        "age"           "exp"           "status"       
[17] "vote"          "voteshare"     "eligible"      "turnout"      
[21] "castvotes"     "seshu_dummy"   "jiban_seshu"   "nojiban_seshu"

Check which lower house election years the data cover

unique(hr$year)

[1] 1996 2000 2003 2005 2009 2012 2014 2017

Select the variables we need:
We use only the 2005 lower house election out of the 8 elections

df1 <- hr %>% 
  dplyr::filter(year == 2005) %>% 
  dplyr::select(year, voteshare, exp, eligible,  seito, nocand)

Check df1

DT::datatable(df1)

df1 contains the following 6 variables

variable	detail
year	Election Year
voteshare	Voteshare (%)
exp	Election expenditure (yen) spent by each candidate
eligible	Eligible voters in each district
seito	Candidate’s affiliated party
nocand	Number of candidates in each district

We need an LDP dummy variable (ldp)
ldp = 1 if a candidate belongs to the LDP, 0 otherwise

# "自民" is the label for the LDP in seito
df1 <- df1 %>% 
  mutate(ldp = ifelse(seito == "自民", 1, 0))

We need a campaign expenditure variable (exppv)
exppv shows the campaign expenditure (yen) spent by each candidate per eligible voter in their electoral district.

df1 <- mutate(df1, exppv = exp / eligible)

DT::datatable(df1)

Show the descriptive statistics of df1

stargazer(as.data.frame(df1), type = "html")


Statistic	N	Mean	St. Dev.	Min	Pctl(25)	Pctl(75)	Max

year	989	2,005.000	0.000	2,005	2,005	2,005	2,005
voteshare	989	30.333	19.230	1	8.8	46.6	74
exp	985	8,142,244.000	5,569,641.000	62,710.000	2,917,435.000	11,822,797.000	24,649,710.000
eligible	989	344,654.300	63,898.230	214,235	297,385	397,210	465,181
nocand	989	3.435	0.740	2	3	4	6
ldp	989	0.293	0.455	0	0	1	1
exppv	985	24.627	17.907	0.148	8.352	35.269	89.332

2.3 Results (Model_1)

model_1 <- lm(voteshare ~ exppv*ldp + nocand,
              data = df1)
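
In an R formula, `exppv*ldp` is shorthand for the two main effects plus their interaction. The sketch below checks this by expanding both formulas with `terms()`. It needs no data, and the object names `f_star` and `f_long` are just labels for this illustration.

```r
# Two ways of writing the same model formula
f_star <- voteshare ~ exppv * ldp + nocand
f_long <- voteshare ~ exppv + ldp + exppv:ldp + nocand

# terms() expands a formula; "term.labels" lists the right-hand-side terms
labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

labels_star
identical(labels_star, labels_long)
```

The expanded terms list the main effects first and the interaction last, which is why `exppv:ldp` appears as the fifth coefficient (after the intercept) in the output of model_1.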

Visually show the results: `jtools::plot_summs()`

jtools::plot_summs(model_1)

Show the results: `tidy()`

tidy(model_1)

# A tibble: 5 x 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   23.5      1.70        13.8 1.27e- 39
2 exppv          0.770    0.0250      30.7 7.15e-146
3 ldp           39.8      1.69        23.6 1.08e- 97
4 nocand        -4.45     0.444      -10.0 1.41e- 22
5 exppv:ldp     -0.749    0.0453     -16.5 2.72e- 54

Show the results: `stargazer()`

stargazer(model_1,  
          digits = 3,        
          style = "ajps", 
          title = "Results of model_1 (2005HR election)", 
          type ="html")

**Results of model_1 (2005HR election)**

	voteshare

exppv	0.770^***
	(0.025)
ldp	39.830^***
	(1.690)
nocand	-4.453^***
	(0.444)
exppv:ldp	-0.749^***
	(0.045)
Constant	23.459^***
	(1.702)
N	985
R-squared	0.723
Adj. R-squared	0.722
Residual Std. Error	10.130 (df = 980)
F Statistic	639.244^*** (df = 4; 980)

^***p < .01; ^**p < .05; ^*p < .1

We see that all coefficients are statistically significant.
Next, we need to interpret the results.

2.4 Interpretation (model_1)

To interpret model_1, let’s fit another model without the interaction term (model_2) and compare the two for a better understanding of the results.

Let’s make model_2

model_2 <- lm(voteshare ~ exppv + ldp + nocand,
              data = df1)

Show the results: `stargazer()`

stargazer(model_1, model_2,
          digits = 3,        
          style = "ajps", 
          title = "Results of model_1 and model_2(2005HR election)", 
          type ="html")

**Results of model_1 and model_2 (2005HR election)**

	voteshare
	Model 1	Model 2

exppv	0.770^***	0.543^***
	(0.025)	(0.024)
ldp	39.830^***	15.423^***
	(1.690)	(0.927)
nocand	-4.453^***	-4.403^***
	(0.444)	(0.502)
exppv:ldp	-0.749^***
	(0.045)
Constant	23.459^***	27.558^***
	(1.702)	(1.903)
N	985	985
R-squared	0.723	0.646
Adj. R-squared	0.722	0.645
Residual Std. Error	10.130 (df = 980)	11.448 (df = 981)
F Statistic	639.244^*** (df = 4; 980)	596.035^*** (df = 3; 981)

^***p < .01; ^**p < .05; ^*p < .1

Important point: in model_1, the coefficient of exppv is the slope when ldp = 0

The coefficient of exppv (0.770) in model_1
→ when exppv increases by one yen, voteshare increases by 0.77 percentage points [when a candidate is not an LDP candidate]
To confirm this, let’s check the sample regression function for model_1.

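\[\widehat{\mathrm{voteshare}} = 23.459 + 0.770\,\mathrm{exppv} + 39.830\,\mathrm{ldp} - 0.749\,\mathrm{ldp} \cdot \mathrm{exppv} - 4.453\,\mathrm{nocand}\]
\[= 23.459 + (0.770 - 0.749\,\mathrm{ldp})\,\mathrm{exppv} + 39.830\,\mathrm{ldp} - 4.453\,\mathrm{nocand}\]

The impact of exppv on voteshare \((\alpha_1 + \alpha_3 \mathrm{ldp})\) is:

\[0.770 - 0.749\,\mathrm{ldp}\]

0.770 - 0.749ldp is the marginal effect (exppv → voteshare): the change in vote share (in percentage points) when a candidate increases his/her campaign expenditure by 1 yen per voter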
We see that marginal effects change depending on the value of ldp
Thus, we get the following two equations depending on the value of ldp:

Model_1

When ldp = 0

\[\widehat{\textrm{voteshare}} = 23.46 + 0.77 \cdot \textrm{exppv} - 4.45 \cdot \textrm{nocand}\]

When ldp = 1

\[\widehat{\textrm{voteshare}} = 63.29 + 0.02 \cdot \textrm{exppv} - 4.45 \cdot \textrm{nocand}\]
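
The intercepts and slopes of these two equations follow from the model_1 estimates by simple arithmetic. The sketch below reproduces them from the three-decimal values in the results table. The object names `b` and `lines_by_party` are only labels for this illustration.

```r
# model_1 estimates, copied from the results table (three decimals)
b <- c(intercept = 23.459, exppv = 0.770, ldp = 39.830,
       nocand = -4.453, exppv_ldp = -0.749)

ldp <- c(0, 1)

lines_by_party <- data.frame(
  ldp       = ldp,
  intercept = b[["intercept"]] + b[["ldp"]] * ldp,    # alpha_0 + alpha_2 * ldp
  slope     = b[["exppv"]] + b[["exppv_ldp"]] * ldp,  # alpha_1 + alpha_3 * ldp
  nocand    = b[["nocand"]]                           # same for both groups
)

round(lines_by_party, 2)
```

The nocand coefficient is identical in both rows because nocand is not interacted with ldp in model_1.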

Let’s visualize Model_1

# The two lines use the intercepts at nocand = 0, matching the equations above
plot1 <- ggplot(df1, aes(x = exppv, y = voteshare)) +
  geom_point(aes(color = as.factor(ldp))) +
  geom_abline(intercept = 23.46, slope = 0.77, linetype = "dashed", color = "red") +
  geom_abline(intercept = 63.29, slope = 0.02, color = "blue") +
  ylim(0, 100) +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)",
       color = "ldp") +
  # annotate() draws each label once; geom_text() would redraw it for every row
  annotate("text", label = "voteshare = 23.46 + 0.77exppv - 4.45nocand\n(non-LDP candidates)",
           x = 60, y = 95, color = "red") +
  annotate("text", label = "voteshare = 63.29 + 0.02exppv - 4.45nocand\n(LDP candidates)",
           x = 30, y = 80, color = "blue")
plot1
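
The lines in plot1 are drawn with nocand set to 0, although the observed values of nocand range from 2 to 6. One possible alternative, sketched below, is to hold nocand at a reference value inside that range. The choice of the sample mean (3.435, taken from the descriptive statistics) is an assumption made for this illustration, and `nocand_ref` and `ref_lines` are hypothetical names.

```r
# model_1 estimates, copied from the results table (three decimals)
b <- c(intercept = 23.459, exppv = 0.770, ldp = 39.830,
       nocand = -4.453, exppv_ldp = -0.749)

nocand_ref <- 3.435   # assumed reference value: mean of nocand in df1
ldp        <- c(0, 1)

# Intercept and slope of each group's line when nocand is held at nocand_ref
ref_lines <- data.frame(
  ldp       = ldp,
  intercept = b[["intercept"]] + b[["ldp"]] * ldp + b[["nocand"]] * nocand_ref,
  slope     = b[["exppv"]] + b[["exppv_ldp"]] * ldp
)

round(ref_lines, 2)
```

The slopes are unchanged. Only the intercepts move down, by the nocand coefficient times the reference value. The resulting `intercept` and `slope` columns could be passed to `geom_abline()` in place of the values used in plot1.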

Model_2

\[\widehat{\mathrm{voteshare}} = 27.558 + 0.543\,\mathrm{exppv} + 15.423\,\mathrm{ldp} - 4.403\,\mathrm{nocand}\]

Let’s visualize Model_2

# The line uses the intercept at ldp = 0 and nocand = 0
plot2 <- ggplot(df1, aes(x = exppv, y = voteshare)) +
  geom_point() +
  geom_abline(intercept = 27.558, slope = 0.543, color = "black") +
  ylim(0, 100) +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)") +
  # annotate() draws the label once; geom_text() would redraw it for every row
  annotate("text", label = "voteshare = 27.558 + 0.543exppv + 15.423ldp - 4.403nocand\n(Model_2)",
           x = 60, y = 80, color = "black")
plot2

Visualize Marginal Effects (model_1)

`msm package`

We need to check whether the impact of campaign expenditure on vote share differs between LDP candidates (marginal effect = 0.02) and non-LDP candidates (marginal effect = 0.77).
Check the intercept and the four coefficients in model_1

model_1$coef

(Intercept)       exppv         ldp      nocand   exppv:ldp 
 23.4586397   0.7701405  39.8297455  -4.4525774  -0.7493132

We need to calculate the marginal effects (= slopes) of exppv on voteshare when ldp = 0 and ldp = 1.
To calculate these two marginal effects, we use the 2nd value (exppv) and the 5th value (exppv:ldp)

at.ldp <- c(0, 1)
slopes <- model_1$coef[2] + model_1$coef[5] * at.ldp

slopes

[1] 0.77014053 0.02082736

Using the delta method, we calculate the standard errors of these two marginal effects and their 95% confidence intervals

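library(msm)

estmean <- coef(model_1)   # coefficient estimates (x1, ..., x5 in the formula below)
var <- vcov(model_1)       # variance-covariance matrix of the estimates

SEs <- rep(NA, length(at.ldp))

for (i in seq_along(at.ldp)){
    j <- at.ldp[i]
    # standard error of (exppv coefficient) + (exppv:ldp coefficient) * j
    SEs[i] <- deltamethod(~ (x2) + (x5)*j, estmean, var)
}

upper <- slopes + 1.96*SEs   # 95% confidence bounds (normal approximation)
lower <- slopes - 1.96*SEs

cbind(at.ldp, slopes, upper, lower)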

     at.ldp     slopes      upper       lower
[1,]      0 0.77014053 0.81923536  0.72104569
[2,]      1 0.02082736 0.09518719 -0.05353247

Let me explain what this means.
at.ldp shows whether a candidate belongs to the LDP (= 1) or not (= 0)
slopes are the marginal effects of exppv on voteshare
The value (0.77014053) in row [1,] of the slopes column
→ The marginal effect of exppv on voteshare for a non-LDP candidate
The value (0.02082736) in row [2,] of the slopes column
→ The marginal effect of exppv on voteshare for an LDP candidate
upper and lower show the upper and lower bounds of the 95% confidence intervals
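
Because each marginal effect here is a linear combination of two coefficients, the delta-method standard error reduces to the usual variance formula Var(a1) + ldp^2 Var(a3) + 2 ldp Cov(a1, a3). The sketch below checks this on simulated data using base R only. The data-generating numbers are invented, and `fit`, `fit_flip`, and `nonldp` are hypothetical names used for this illustration.

```r
# Simulated data only: invented numbers with the same structure as model_1
set.seed(2)
n         <- 500
ldp       <- rbinom(n, 1, 0.3)
exppv     <- runif(n, 0, 80)
nocand    <- sample(2:6, n, replace = TRUE)
voteshare <- 20 + 0.8 * exppv + 40 * ldp - 0.75 * exppv * ldp -
  4 * nocand + rnorm(n, sd = 10)

fit <- lm(voteshare ~ exppv * ldp + nocand)
b   <- coef(fit)
V   <- vcov(fit)

at_ldp <- c(0, 1)
slopes <- unname(b["exppv"] + b["exppv:ldp"] * at_ldp)

# Variance of (b_exppv + ldp * b_interaction), written out by hand
se <- sqrt(V["exppv", "exppv"] +
             at_ldp^2 * V["exppv:ldp", "exppv:ldp"] +
             2 * at_ldp * V["exppv", "exppv:ldp"])

# Cross-check: flip the dummy so that the exppv coefficient is the LDP slope
nonldp   <- 1 - ldp
fit_flip <- lm(voteshare ~ exppv * nonldp + nocand)
se_check <- summary(fit_flip)$coefficients["exppv", "Std. Error"]

cbind(at_ldp, slopes, se)
se_check
```

The second element of `se` should agree with `se_check`, since refitting with the reversed dummy makes the coefficient of exppv the slope for the ldp = 1 group.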
To visualize this result, we convert it into a data frame and name it msm_1

msm_1 <- cbind(at.ldp, slopes, upper, lower) %>% 
  as.data.frame()

msm_1

  at.ldp     slopes      upper       lower
1      0 0.77014053 0.81923536  0.72104569
2      1 0.02082736 0.09518719 -0.05353247

Draw a graph showing the marginal effects of exppv on voteshare for a Non-LDP and an LDP candidate.

# Store the plot under a new name so the data frame msm_1 is not overwritten
msm_1_plot <- msm_1 %>% 
  ggplot(aes(at.ldp, slopes, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = 2, col = "red") +
  geom_pointrange(size = 1) +
  geom_errorbar(aes(x = at.ldp, ymin = lower, ymax = upper),
                width = 0.1) +
  labs(x = "Candidate's affiliated party", y = "Marginal Effects") +
  scale_x_continuous(breaks = c(1,0),
                     labels = c("LDP", "Non-LDP")) +
  ggtitle("Marginal Effects of exppv on voteshare (Model_1)") +
  theme(axis.text.x  = element_text(size = 14),
        axis.text.y  = element_text(size = 14),
        axis.title.y = element_text(size = 14),
        plot.title   = element_text(size = 18)) 
msm_1_plot

2.5 How to show your results

**Results of model_1 and model_2 (2005HR election)**

	voteshare
	Model 1	Model 2

exppv	0.770^***	0.543^***
	(0.025)	(0.024)
ldp	39.830^***	15.423^***
	(1.690)	(0.927)
nocand	-4.453^***	-4.403^***
	(0.444)	(0.502)
exppv:ldp	-0.749^***
	(0.045)
Constant	23.459^***	27.558^***
	(1.702)	(1.903)
N	985	985
R-squared	0.723	0.646
Adj. R-squared	0.722	0.645
Residual Std. Error	10.130 (df = 980)	11.448 (df = 981)
F Statistic	639.244^*** (df = 4; 980)	596.035^*** (df = 3; 981)

^***p < .01; ^**p < .05; ^*p < .1

Results on model_1
When ldp = 0
・When a non-LDP candidate increases campaign expenditure by 1 yen per voter, his/her vote share increases by 0.77 percentage points.
・This is statistically significant at the 1% significance level (p-value = 7.15e-146).

When ldp = 1
・When an LDP candidate increases campaign expenditure by 1 yen per voter, his/her vote share increases by 0.02 percentage points.
・This is not statistically significant: the 95% confidence interval (-0.054 to 0.095) includes 0.

The coefficient of exppv:ldp, -0.749
・The impact of campaign expenditure on vote share differs between LDP candidates and non-LDP candidates.
・The two marginal effects differ by 0.749.
・A non-LDP candidate’s marginal effect (slope) is larger than an LDP candidate’s by 0.749 percentage points.
・This is statistically significant at the 1% significance level (p-value = 2.72e-54).
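
The statement about the LDP slope can be checked with a little arithmetic on the confidence bounds reported in section 2.4. The sketch below recovers the standard error from the bounds and computes an approximate z statistic. It uses the same normal approximation (1.96) as the interval itself, and the object names are only labels for this illustration.

```r
# Marginal effect and 95% bounds for ldp = 1, copied from the msm_1 output
slope_ldp <- 0.02082736
upper_ldp <- 0.09518719
lower_ldp <- -0.05353247

# The interval is slope +/- 1.96 * SE, so its width is 2 * 1.96 * SE
se_ldp <- (upper_ldp - lower_ldp) / (2 * 1.96)

z_ldp <- slope_ldp / se_ldp        # approximate z statistic
p_ldp <- 2 * pnorm(-abs(z_ldp))    # two-sided p-value, normal approximation

c(se = se_ldp, z = z_ldp, p = p_ldp)
```

A z statistic well inside ±1.96 is consistent with the interval containing 0, so the LDP slope cannot be distinguished from zero at the 5% level.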

If you want to check the p-values, use either tidy() or summary()

tidy(model_1)

# A tibble: 5 x 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   23.5      1.70        13.8 1.27e- 39
2 exppv          0.770    0.0250      30.7 7.15e-146
3 ldp           39.8      1.69        23.6 1.08e- 97
4 nocand        -4.45     0.444      -10.0 1.41e- 22
5 exppv:ldp     -0.749    0.0453     -16.5 2.72e- 54

3. Exercise

We want to know whether the impact of campaign expenditure on vote share differs between DPJ candidates and non-DPJ candidates in the 2009 lower house election in Japan.
The 2009 lower house election was a very important national election because the DPJ won a landslide victory over the long-dominant LDP for the first time.
Data you use: hr96_17.csv
Japanese lower house election results (1996-2017)
Following the steps in 2. Model_1 in this section, answer the following 7 questions.

Q1:

Select the following 6 variables from hr96_17.csv, name the dataframe df2, and show its descriptive statistics using stargazer package.

variable	detail
year	Election Year
voteshare	Voteshare (%)
exp	Election expenditure (yen) spent by each candidate
eligible	Eligible voters in each district
seito	Candidate’s affiliated party
nocand	Number of candidates in each district

Q2:

Using seito, make a party dummy variable, dpj, where dpj = 1 if a candidate belongs to the DPJ, 0 otherwise.
The DPJ is shown as “民主” in hr96_17.csv
Using exp and eligible, make an expenditure variable, exppv, which shows the campaign expenditure spent by each candidate per voter in their electoral district.
Select the following 5 variables from df2, name the dataframe df3, and show its descriptive statistics using the stargazer package.

variables	detail
year	Election Year
voteshare	Voteshare (%)
exppv	Campaign expenditure spent by each candidate per voter in their electoral district (yen)
dpj	= 1 if a candidate belongs to the DPJ, 0 otherwise
nocand	Number of candidates in each district

Q3:

Run the following two models and show their results using stargazer package.

model_3 <- lm(voteshare ~ exppv*dpj + nocand,
              data = df3)

model_4 <- lm(voteshare ~ exppv + dpj + nocand,
              data = df3)

Q4:

Show the two regression equations for model_3 and model_4

Q5:

Draw a scatter plot for model_3 with two regression lines (dpj = 0 and dpj = 1)
x-axis is exppv, y-axis is voteshare

Q6:

Draw a scatter plot on model_4
x-axis is exppv, y-axis is voteshare

Q7:

Using the msm package, visualize the marginal effects in model_3 and explain the results.
Can you conclude that the impact of campaign expenditure on vote share differs between DPJ candidates (民主) and non-DPJ candidates in the 2009 lower house election in Japan? Show your evidence and explain why.


20. Linear Regression 3 (Interaction 1)

Masahiko Asano

2021-10-19
