R packages we use in this section
library(tidyverse)
library(stargazer)
library(margins)
library(interplot)
library(msm)
library(patchwork)
library(jtools)
When we include a dummy variable (ldp), then we can see the difference in the outcome variable (voteshare) between LDP candidates (ldp = 1) and non-LDP candidates (ldp = 0), as shown in the graph on the left.
Interaction effects indicate that a third variable (in this example, ldp) influences the relationship between an explanatory and an outcome variable.
When we include an interaction term (ldp:exppv), then we can see the difference in slope (exppv → voteshare) between LDP candidates (ldp = 1) and non-LDP candidates (ldp = 0), as shown in the graph on the right.
This type of effect makes the model more complex, but if the real world actually behaves this way, it is critical to incorporate it in your model.
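The difference between the two specifications can be sketched with simulated data. Everything below is made up for illustration (the variables x, d, y and the true coefficients are assumptions, not the election data): d plays the role of ldp and x plays the role of exppv.

```r
# Simulated data (hypothetical values, not the election data)
set.seed(1)
n <- 200
d <- rep(c(0, 1), each = n / 2)          # dummy (plays the role of ldp)
x <- runif(n, 0, 50)                     # explanatory variable (plays the role of exppv)
y <- 10 + 0.8 * x + 30 * d - 0.7 * d * x + rnorm(n, sd = 2)

# Dummy only: the two groups get different intercepts but one common slope
fit_dummy <- lm(y ~ x + d)

# Dummy plus interaction: the two groups get different intercepts AND slopes
fit_inter <- lm(y ~ x + d + x:d)

b <- coef(fit_inter)
slope_d0 <- unname(b["x"])               # slope when d = 0
slope_d1 <- unname(b["x"] + b["x:d"])    # slope when d = 1
round(c(d0 = slope_d0, d1 = slope_d1), 2)
```

In the dummy-only fit there is a single coefficient on x, so both groups share one slope. In the interaction fit the slope for d = 1 is the sum of the x coefficient and the x:d coefficient.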
nocand is a control variable
Note
・A control variable is anything that is held constant or limited in a research study.
・It’s a variable that is not of interest to our research aims, but is controlled because it could influence the outcomes.
・In actual research, we need to add more control variables because it is reasonable to assume that other variables (such as the candidate’s age, gender, etc.) could also influence the vote share.
・Here I only add nocand as a control variable.
To see interaction effects, we make an interaction term (ldp:exppv) by multiplying the following two independent variables:
・the main independent variable (in this case, exppv)
・a categorical moderator variable (in this case, ldp)
Note
・A moderator variable can be one of two types: categorical or continuous.
・In this section, I deal with a categorical moderator variable.
・In 21. Multiple Regression 4 (Interaction Effects 2), I deal with a continuous moderator variable.
A third variable (ldp) influences the relationship between an explanatory and an outcome variable.
The slope (exppv → voteshare) differs between LDP and non-LDP candidates.
This slope is equivalent to the marginal effect.
We estimate the following model:
\[\mathrm{voteshare} = \alpha_0 + \alpha_1 \mathrm{exppv} + \alpha_2 \mathrm{ldp} + \alpha_3 (\mathrm{ldp} \cdot \mathrm{exppv}) + \alpha_4 \mathrm{nocand} + \varepsilon\]
We can rewrite the equation as follows:
\[\mathrm{voteshare} = \alpha_0 + (\alpha_1 + \alpha_3 \mathrm{ldp})\, \mathrm{exppv} + \alpha_2 \mathrm{ldp} + \alpha_4 \mathrm{nocand} + \varepsilon\]
| \(\alpha_0\) | vote share (%) of a non-LDP candidate (ldp = 0) when exppv = 0 and nocand = 0 |
| \((\alpha_1 + \alpha_3 \textrm{ldp})\) | slope (exppv → voteshare) = marginal effect |
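One way to see why the bracketed term is the marginal effect is to differentiate the rewritten equation with respect to exppv, holding ldp and nocand fixed:

\[
\frac{\partial\, \mathrm{voteshare}}{\partial\, \mathrm{exppv}} = \alpha_1 + \alpha_3\, \mathrm{ldp} =
\begin{cases}
\alpha_1 & \text{if } \mathrm{ldp} = 0 \\
\alpha_1 + \alpha_3 & \text{if } \mathrm{ldp} = 1
\end{cases}
\]

So \(\alpha_1\) alone is the slope for non-LDP candidates, and \(\alpha_3\) is the difference in slope between LDP and non-LDP candidates.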
Does the impact of campaign expenditure on vote share differ between LDP candidates and non-LDP candidates?
Marginal Effects
・Marginal effects are often calculated when interpreting regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・A marginal effect tells us how the dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes by one unit.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold nocand constant.
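A minimal sketch of what "holding a control constant" means, using simulated data (the variables x, z, y and all numbers are assumptions for illustration; z plays the role of a control such as nocand). The change in the predicted outcome for a one-unit increase in x, with z fixed, equals the slope of x.

```r
# Simulated data (hypothetical; z plays the role of a control such as nocand)
set.seed(2)
n <- 300
x <- runif(n, 0, 50)
z <- sample(2:6, n, replace = TRUE)
y <- 20 + 0.5 * x - 4 * z + rnorm(n, sd = 3)
fit <- lm(y ~ x + z)

# Predict at x = 10 and x = 11 while holding the control z fixed at 3
new <- data.frame(x = c(10, 11), z = c(3, 3))
pred <- predict(fit, newdata = new)

# The change in the prediction equals the slope (marginal effect) of x
me_from_pred <- unname(diff(pred))
me_from_coef <- unname(coef(fit)["x"])
c(me_from_pred, me_from_coef)
```

The two numbers agree whatever value z is fixed at, because z enters this model additively.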
Put hr96-17.csv in the data folder in your RProject folder, then read it and name the data frame hr
hr <- read_csv("data/hr96-17.csv",
               na = ".")
Check the variable names in hr
names(hr)
[1] "year" "pref" "ku" "kun"
[5] "mag" "rank" "wl" "nocand"
[9] "seito" "j_name" "name" "term"
[13] "gender" "age" "exp" "status"
[17] "vote" "voteshare" "eligible" "turnout"
[21] "castvotes" "seshu_dummy" "jiban_seshu" "nojiban_seshu"
unique(hr$year)
[1] 1996 2000 2003 2005 2009 2012 2014 2017
df1 <- hr %>%
  dplyr::filter(year == 2005) %>%
  dplyr::select(year, voteshare, exp, eligible, seito, nocand)
DT::datatable(df1)
df1 contains the following 6 variables
| variable | detail |
|---|---|
| year | Election Year |
| voteshare | Voteshare (%) |
| exp | Election expenditure (yen) spent by each candidate |
| eligible | Eligible voters in each district |
| seito | Candidate’s affiliated party |
| nocand | Number of candidates in each district |
Party dummy variable (ldp)
ldp = 1 if a candidate belongs to the LDP, 0 otherwise
df1 <- df1 %>%
  mutate(ldp = ifelse(seito == "自民", 1, 0))
Expenditure variable (exppv)
exppv shows campaign expenditure spent by each candidate per voter in their electoral district.
df1 <- mutate(df1, exppv = exp / eligible)
DT::datatable(df1)
Descriptive statistics of df1
stargazer(as.data.frame(df1), type = "html")
| Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
|---|---|---|---|---|---|---|---|
| year | 989 | 2,005.000 | 0.000 | 2,005 | 2,005 | 2,005 | 2,005 |
| voteshare | 989 | 30.333 | 19.230 | 1 | 8.8 | 46.6 | 74 |
| exp | 985 | 8,142,244.000 | 5,569,641.000 | 62,710.000 | 2,917,435.000 | 11,822,797.000 | 24,649,710.000 |
| eligible | 989 | 344,654.300 | 63,898.230 | 214,235 | 297,385 | 397,210 | 465,181 |
| nocand | 989 | 3.435 | 0.740 | 2 | 3 | 4 | 6 |
| ldp | 989 | 0.293 | 0.455 | 0 | 0 | 1 | 1 |
| exppv | 985 | 24.627 | 17.907 | 0.148 | 8.352 | 35.269 | 89.332 |
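The two derived variables can be checked by hand on a toy data frame. The values below are hypothetical (only the column names follow the text), and base R is used instead of mutate() so that the sketch runs without extra packages. One exp value is set to NA to show how a missing expenditure carries over to exppv, which is consistent with exp and exppv both having N = 985 in the table above while the other variables have N = 989.

```r
# Toy data (hypothetical values; only the column names follow the text)
toy <- data.frame(
  seito    = c("自民", "民主", "自民", "共産"),
  exp      = c(9000000, 6000000, NA, 1500000),
  eligible = c(300000, 300000, 400000, 250000)
)

# Party dummy: 1 for the LDP ("自民"), 0 for every other party
toy$ldp <- ifelse(toy$seito == "自民", 1, 0)

# Expenditure per eligible voter (yen); NA in exp stays NA in exppv
toy$exppv <- toy$exp / toy$eligible

toy[, c("seito", "ldp", "exppv")]
```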
model_1 <- lm(voteshare ~ exppv*ldp + nocand,
              data = df1)
Plot the coefficients with jtools::plot_summs()
jtools::plot_summs(model_1)
Show the results with tidy()
tidy(model_1)
# A tibble: 5 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 23.5 1.70 13.8 1.27e- 39
2 exppv 0.770 0.0250 30.7 7.15e-146
3 ldp 39.8 1.69 23.6 1.08e- 97
4 nocand -4.45 0.444 -10.0 1.41e- 22
5 exppv:ldp -0.749 0.0453 -16.5 2.72e- 54
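The formula `exppv*ldp` is shorthand for both main effects plus their interaction, and R lists the interaction term after all main effects (which is why exppv:ldp is the fifth coefficient above). A small sketch with simulated data (x, d, z, y are hypothetical stand-ins, not the election data) shows that the two ways of writing the formula give the same fit:

```r
# Simulated data (hypothetical values)
set.seed(3)
n <- 100
x <- runif(n)
d <- rbinom(n, 1, 0.5)
z <- rnorm(n)
y <- 1 + 2 * x + 3 * d - 1.5 * x * d + 0.5 * z + rnorm(n)

fit_star <- lm(y ~ x * d + z)            # shorthand
fit_long <- lm(y ~ x + d + x:d + z)      # written out

# Coefficient order: main effects first, interaction last
names(coef(fit_star))

# Same estimates either way (matched by name)
all.equal(coef(fit_star), coef(fit_long)[names(coef(fit_star))])
```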
Show the results with stargazer()
stargazer(model_1,
          digits = 3,
          style = "ajps",
          title = "Results of model_1 (2005HR election)",
          type = "html")
| | voteshare |
|---|---|
| exppv | 0.770*** |
| | (0.025) |
| ldp | 39.830*** |
| | (1.690) |
| nocand | -4.453*** |
| | (0.444) |
| exppv:ldp | -0.749*** |
| | (0.045) |
| Constant | 23.459*** |
| | (1.702) |
| N | 985 |
| R-squared | 0.723 |
| Adj. R-squared | 0.722 |
| Residual Std. Error | 10.130 (df = 980) |
| F Statistic | 639.244*** (df = 4; 980) |
| ***p < .01; **p < .05; *p < .1 | |
In addition to model_1, let’s make another model which does not include the interaction term (model_2) and compare them to have a better understanding of the results.
model_2 <- lm(voteshare ~ exppv + ldp + nocand,
              data = df1)
Show the results with stargazer()
stargazer(model_1, model_2,
          digits = 3,
          style = "ajps",
          title = "Results of model_1 and model_2 (2005HR election)",
          type = "html")
Dependent variable: voteshare
| | Model 1 | Model 2 |
|---|---|---|
| exppv | 0.770*** | 0.543*** |
| | (0.025) | (0.024) |
| ldp | 39.830*** | 15.423*** |
| | (1.690) | (0.927) |
| nocand | -4.453*** | -4.403*** |
| | (0.444) | (0.502) |
| exppv:ldp | -0.749*** | |
| | (0.045) | |
| Constant | 23.459*** | 27.558*** |
| | (1.702) | (1.903) |
| N | 985 | 985 |
| R-squared | 0.723 | 0.646 |
| Adj. R-squared | 0.722 | 0.645 |
| Residual Std. Error | 10.130 (df = 980) | 11.448 (df = 981) |
| F Statistic | 639.244*** (df = 4; 980) | 596.035*** (df = 3; 981) |
| ***p < .01; **p < .05; *p < .1 | | |
Important point
The coefficient of exppv in model_1 is the effect of exppv when ldp = 0.
The coefficient of exppv (0.770) in model_1
→ when exppv increases by one yen, voteshare increases by 0.77 percentage points [when a candidate is not an LDP candidate]
To confirm this, let’s check the Sample Regression Function equations for model_1.
\[\widehat{\mathrm{voteshare}} = 23.459 + 0.770\, \mathrm{exppv} + 39.830\, \mathrm{ldp} - 0.749\, (\mathrm{ldp} \cdot \mathrm{exppv}) - 4.453\, \mathrm{nocand}\]
\[= (23.459 + 39.830\, \mathrm{ldp}) + (0.770 - 0.749\, \mathrm{ldp})\, \mathrm{exppv} - 4.453\, \mathrm{nocand}\]
The marginal effect of exppv on voteshare \((\alpha_1 + \alpha_3 \mathrm{ldp})\) is:
\[0.770 - 0.749\, \mathrm{ldp}\]
This is the change in vote share (exppv → voteshare) when a candidate increases his/her campaign expenditure by 1 yen per voter, and it depends on the value of ldp.
When ldp = 0:
\[\widehat{\mathrm{voteshare}} = 23.46 + 0.77 \cdot \mathrm{exppv} - 4.45 \cdot \mathrm{nocand}\]
When ldp = 1:
\[\widehat{\mathrm{voteshare}} = 63.29 + 0.02 \cdot \mathrm{exppv} - 4.45 \cdot \mathrm{nocand}\]
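The intercept and slope of each group's line follow directly from the reported coefficients. The sketch below simply redoes that arithmetic with the rounded estimates from the results table (the object names b and lines_by_group are made up for this illustration):

```r
# Coefficients of model_1 as reported in the text (rounded to 3 decimals)
b <- c(intercept = 23.459, exppv = 0.770, ldp = 39.830,
       nocand = -4.453, exppv_ldp = -0.749)

# Intercept and slope of each group's regression line
lines_by_group <- data.frame(
  ldp       = c(0, 1),
  intercept = b["intercept"] + b["ldp"] * c(0, 1),
  slope     = b["exppv"] + b["exppv_ldp"] * c(0, 1)
)
lines_by_group
```

For ldp = 1 this gives an intercept of 23.459 + 39.830 = 63.289 and a slope of 0.770 - 0.749 = 0.021. The coefficient of nocand (-4.453) is the same for both groups because nocand is not interacted with ldp.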
plot1 <- ggplot(df1, aes(x = exppv, y = voteshare)) +
  geom_point(aes(color = as.factor(ldp))) +
  # Regression lines of model_1; the intercepts are the values at nocand = 0
  geom_abline(intercept = 23.46, slope = 0.77, linetype = "dashed", color = "red") +
  geom_abline(intercept = 63.29, slope = 0.02, color = "blue") +
  ylim(0, 100) +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)") +
  # annotate() draws each label once (geom_text() would redraw it for every row)
  annotate("text", label = "voteshare = 23.46 + 0.77exppv - 4.45nocand\n(non-LDP candidates)",
           x = 60, y = 95, color = "red") +
  annotate("text", label = "voteshare = 63.29 + 0.02exppv - 4.45nocand\n(LDP candidates)",
           x = 30, y = 80, color = "blue")
plot1
The Sample Regression Function for model_2 is:
\[\widehat{\mathrm{voteshare}} = 27.558 + 0.543\, \mathrm{exppv} + 15.423\, \mathrm{ldp} - 4.403\, \mathrm{nocand}\]
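plot2 <- ggplot(df1, aes(x = exppv, y = voteshare)) +
  geom_point() +
  # Regression line of model_2; the intercept is the value at ldp = 0 and nocand = 0
  geom_abline(intercept = 27.558, slope = 0.543, color = "black") +
  ylim(0, 100) +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)") +
  # annotate() draws the label once (geom_text() would redraw it for every row)
  annotate("text", label = "voteshare = 27.558 + 0.543exppv + 15.423ldp - 4.403nocand\n(Model_2)",
           x = 60, y = 80, color = "black")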
plot2
Calculate the marginal effects using the msm package
Intercept and the four coefficients in model_1
model_1$coef
(Intercept)       exppv         ldp      nocand   exppv:ldp
 23.4586397   0.7701405  39.8297455  -4.4525774  -0.7493132
We need to calculate the marginal effects (= slopes) of exppv on voteshare when ldp = 0 and ldp = 1.
To calculate these two marginal effects, we use the 2nd value (exppv) and the 5th value (exppv:ldp)
at.ldp <- c(0, 1)
slopes <- model_1$coef[2] + model_1$coef[5]*at.ldp
slopes
[1] 0.77014053 0.02082736
Using the delta method, we calculate the standard errors of these two marginal effects and their 95% confidence intervals
library(msm)
estmean <- coef(model_1)   # point estimates
var <- vcov(model_1)       # variance-covariance matrix of the estimates
SEs <- rep(NA, length(at.ldp))
for (i in seq_along(at.ldp)){
  j <- at.ldp[i]
  # x2 = coefficient of exppv, x5 = coefficient of exppv:ldp
  SEs[i] <- deltamethod(~ (x2) + (x5)*j, estmean, var) # standard error
}
upper <- slopes + 1.96*SEs
lower <- slopes - 1.96*SEs
cbind(at.ldp, slopes, upper, lower)
     at.ldp     slopes      upper       lower
[1,]      0 0.77014053 0.81923536  0.72104569
[2,]      1 0.02082736 0.09518719 -0.05353247
Let me explain what this means.
at.ldp shows whether a candidate belongs to the LDP (= 1) or not (= 0)
slopes shows the marginal effects of exppv on voteshare
The value (0.77014053) in row [1, ] of the slopes column
→ The marginal effect of exppv on voteshare for a non-LDP candidate
The value (0.02082736) in row [2, ] of the slopes column
→ The marginal effect of exppv on voteshare for an LDP candidate
upper and lower show the upper and lower bounds of the 95% confidence intervals
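Because the slope for ldp = 1 is a linear combination of two coefficients, the delta method here reduces to the usual variance formula, Var(a + b) = Var(a) + Var(b) + 2Cov(a, b). The sketch below illustrates this by hand on simulated data (x, d, y and all numbers are assumptions, not the election data) and cross-checks it by refitting the model with the dummy reversed. It uses base R only and is not a replacement for the msm-based code above.

```r
# Simulated data (hypothetical values)
set.seed(4)
n <- 200
x <- runif(n, 0, 50)
d <- rbinom(n, 1, 0.3)
y <- 20 + 0.8 * x + 40 * d - 0.75 * x * d + rnorm(n, sd = 5)
fit <- lm(y ~ x * d)

b <- coef(fit)
V <- vcov(fit)

# Slope when d = 1 is b[x] + b[x:d]; its variance is
# Var(b[x]) + Var(b[x:d]) + 2 * Cov(b[x], b[x:d])
slope_d1 <- unname(b["x"] + b["x:d"])
se_d1 <- sqrt(V["x", "x"] + V["x:d", "x:d"] + 2 * V["x", "x:d"])

# Cross-check: recode the dummy so that the d = 1 group is the baseline;
# the coefficient on x is then the d = 1 slope with its standard error
d_rev <- 1 - d
fit_rev <- lm(y ~ x * d_rev)
check <- summary(fit_rev)$coefficients["x", c("Estimate", "Std. Error")]

rbind(by_hand = c(slope_d1, se_d1), refit = unname(check))
```

The two rows agree, which is a quick way to sanity-check a standard error computed for a conditional slope.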
To visualize this result, we convert the matrix into a data frame and name it msm_1
msm_1 <- cbind(at.ldp, slopes, upper, lower) %>%
  as.data.frame()
msm_1
  at.ldp     slopes      upper       lower
1      0 0.77014053 0.81923536  0.72104569
2      1 0.02082736 0.09518719 -0.05353247
Visualize the marginal effects of exppv on voteshare for a non-LDP and an LDP candidate.
msm_1 <- msm_1 %>%
  ggplot(aes(at.ldp, slopes, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = 2, col = "red") +   # reference line at zero effect
  geom_pointrange(size = 1) +
  geom_errorbar(aes(x = at.ldp, ymin = lower, ymax = upper),
                width = 0.1) +
  labs(x = "Candidate's affiliated party", y = "Marginal Effects") +
  scale_x_continuous(breaks = c(1,0),
                     labels = c("LDP", "Non-LDP")) +
  ggtitle("Marginal Effects of exppv on voteshare (Model_1)") +
  theme(axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        axis.title.y = element_text(size = 14),
        plot.title = element_text(size = 18))
msm_1
Dependent variable: voteshare
| | Model 1 | Model 2 |
|---|---|---|
| exppv | 0.770*** | 0.543*** |
| | (0.025) | (0.024) |
| ldp | 39.830*** | 15.423*** |
| | (1.690) | (0.927) |
| nocand | -4.453*** | -4.403*** |
| | (0.444) | (0.502) |
| exppv:ldp | -0.749*** | |
| | (0.045) | |
| Constant | 23.459*** | 27.558*** |
| | (1.702) | (1.903) |
| N | 985 | 985 |
| R-squared | 0.723 | 0.646 |
| Adj. R-squared | 0.722 | 0.645 |
| Residual Std. Error | 10.130 (df = 980) | 11.448 (df = 981) |
| F Statistic | 639.244*** (df = 4; 980) | 596.035*** (df = 3; 981) |
| ***p < .01; **p < .05; *p < .1 | | |
Results of model_1
When ldp = 0
・When a non-LDP candidate increases campaign expenditure by 1 yen per voter, his/her vote share increases by 0.77 percentage points.
・This is statistically significant at the 1% significance level (p-value = 7.15e-146).
When ldp = 1
・When an LDP candidate increases campaign expenditure by 1 yen per voter, his/her vote share increases by 0.02 percentage points.
・This is not statistically significant (the 95% confidence interval, -0.05 to 0.10, includes 0).
The coefficient of exppv:ldp, -0.749
・The impact of campaign expenditure on vote share differs between LDP candidates and non-LDP candidates.
・The marginal effect (slope) of exppv on voteshare is 0.749 percentage points smaller for an LDP candidate than for a non-LDP candidate.
・This difference is statistically significant at the 1% significance level (p-value = 2.72e-54).
The p-values can be checked with tidy() or summary()
tidy(model_1)
# A tibble: 5 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 23.5 1.70 13.8 1.27e- 39
2 exppv 0.770 0.0250 30.7 7.15e-146
3 ldp 39.8 1.69 23.6 1.08e- 97
4 nocand -4.45 0.444 -10.0 1.41e- 22
5 exppv:ldp -0.749 0.0453 -16.5 2.72e- 54
2. Referring to Model_1 in this section, answer the following 7 questions.
Q1:
Select the following 6 variables from hr96_17.csv, name the dataframe df2, and show its descriptive statistics using the stargazer package.
| variable | detail |
|---|---|
| year | Election Year |
| voteshare | Voteshare (%) |
| exp | Election expenditure (yen) spent by each candidate |
| eligible | Eligible voters in each district |
| seito | Candidate’s affiliated party |
| nocand | Number of candidates in each district |
Q2:
Using seito, make a party dummy variable, dpj where dpj = 1 if a candidate belongs to the DPJ, 0 otherwise.
The DPJ is shown as “民主” in hr96_17.csv
Using exp and eligible, make an expenditure variable, exppv which shows campaign expenditure spent by each candidate per voter in their electoral district.
Select the following 5 variables from hr96_17.csv, name the dataframe df3, and show its descriptive statistics using stargazer package.
| variables | detail |
|---|---|
| year | Election Year |
| voteshare | Voteshare (%) |
| exppv | Campaign expenditure spent by each candidate per voter in their electoral district (yen) |
| dpj | = 1 if a candidate belongs to the DPJ, 0 otherwise |
| nocand | Number of candidates in each district |
Q3:
Run the following two models and show the results using the stargazer package.
model_3 <- lm(voteshare ~ exppv*dpj + nocand,
              data = df3)
model_4 <- lm(voteshare ~ exppv + dpj + nocand,
              data = df3)
Q4:
model_3 and model_4
Q5:
model_3 with two regression lines (dpj = 0 & dpj = 1)
x-axis is exppv, y-axis is voteshare
Q6:
model_4
x-axis is exppv, y-axis is voteshare
Q7:
Using the msm package, visualize the marginal effects in model_3 and explain the results.