R packages we use in this section
library(broom)
library(ggstance)
library(interplot)
library(margins)
library(msm)
library(patchwork)
library(stargazer)
library(tidyverse)
Moderator variable: categorical or continuous?
Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters?
Marginal Effects
・Marginal effects are often calculated when interpreting regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・A marginal effect tells us how the dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold age, nocand, and term constant.
Put hr96-17.csv in the data folder in your RProject folder.
Read the data and name it hr:
hr <- read_csv("data/hr96-17.csv",
               na = ".")
Check the variable names in hr:
names(hr)
 [1] "year" "pref" "ku" "kun"
[5] "mag" "rank" "wl" "nocand"
[9] "seito" "j_name" "name" "term"
[13] "gender" "age" "exp" "status"
[17] "vote" "voteshare" "eligible" "turnout"
[21] "castvotes" "seshu_dummy" "jiban_seshu" "nojiban_seshu"
unique(hr$year)
[1] 1996 2000 2003 2005 2009 2012 2014 2017
df1 <- hr %>%
  dplyr::filter(year == 2005) %>%
  dplyr::select(year, age, voteshare, exp, eligible, nocand, term)
DT::datatable(df1)
df1 contains the following 7 variables:
| variable | detail |
|---|---|
| year | Election Year |
| age | Candidate’s age |
| voteshare | Voteshare (%) |
| exp | Election expenditure (yen) spent by each candidate |
| eligible | Eligible voters in each district |
| nocand | Number of candidates in each district |
| term | Number of terms served as a lower house member |
Campaign expenditure per voter (exppv)
exppv shows the campaign expenditure spent by each candidate per eligible voter in their electoral district.
df1 <- mutate(df1, exppv = exp / eligible)
Eligible voters in thousands (eligible.t)
eligible.t shows the number of eligible voters in thousands.
df1 <- mutate(df1, eligible.t = eligible / 1000)
DT::datatable(df1)
Descriptive statistics of df1
stargazer(as.data.frame(df1), type = "html")
| Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
|---|---|---|---|---|---|---|---|
| year | 989 | 2,005.000 | 0.000 | 2,005 | 2,005 | 2,005 | 2,005 |
| age | 989 | 50.292 | 10.871 | 25 | 42 | 58 | 81 |
| voteshare | 989 | 30.333 | 19.230 | 1 | 8.8 | 46.6 | 74 |
| exp | 985 | 8,142,244.000 | 5,569,641.000 | 62,710.000 | 2,917,435.000 | 11,822,797.000 | 24,649,710.000 |
| eligible | 989 | 344,654.300 | 63,898.230 | 214,235 | 297,385 | 397,210 | 465,181 |
| nocand | 989 | 3.435 | 0.740 | 2 | 3 | 4 | 6 |
| term | 989 | 1.975 | 2.721 | 0 | 0 | 3 | 16 |
| exppv | 985 | 24.627 | 17.907 | 0.148 | 8.352 | 35.269 | 89.332 |
| eligible.t | 989 | 344.654 | 63.898 | 214.235 | 297.385 | 397.210 | 465.181 |
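As a quick sanity check on the two derived variables, the sketch below applies the same arithmetic to one hypothetical candidate whose exp and eligible are set to the sample means in the table above. The object names are illustrative assumptions and are not part of the original analysis.

```r
# Hypothetical candidate: values set to the sample means reported above
exp_yen  <- 8142244    # campaign expenditure (yen)
eligible <- 344654.3   # eligible voters in the district

exppv_example      <- exp_yen / eligible   # yen spent per eligible voter
eligible_t_example <- eligible / 1000      # eligible voters in thousands

round(exppv_example, 2)
round(eligible_t_example, 1)
```

The result (about 23.6 yen per voter) differs from the sample mean of exppv (24.627) because a ratio of means is not the same as a mean of ratios.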
F1 <- ggplot(df1, aes(exppv, voteshare)) +
  geom_point() +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)",
       title = "Campaign expenditure and vote share") +
  stat_smooth(method = lm, se = FALSE)
F2 <- ggplot(df1, aes(eligible.t, voteshare)) +
  geom_point() +
  labs(x = "Eligible voters (thousands)", y = "Vote share (%)",
       title = "Eligible voters and vote share") +
  stat_smooth(method = lm, se = FALSE)
F1 + F2 + plot_layout(ncol = 2)
・Left panel: relationship between exppv and voteshare
・Right panel: relationship between eligible.t and voteshare
・age, nocand, and term are control variables
Note
・A control variable is anything that is held constant or limited in a research study.
・It is a variable that is not of interest to our research aims, but that we control for because it could influence the outcome.
・In actual research, we need to add more control variables, because it is reasonable to assume that other variables (such as the age of candidates, campaign expenditure, candidate’s gender, etc.) could influence vote share.
・Here I add three control variables: age, nocand, and term.
To see interaction effects, we make an interaction term by multiplying the major independent variable (in this case, exppv) and a numerical moderator variable (in this case, eligible.t): eligible.t:exppv
We make the interaction term by multiplying the following two independent variables:
・Major independent variable (exppv)
・Moderator variable (eligible.t)
Note
・A moderator variable can be one of two types: categorical or continuous.
・In this section, I deal with a continuous (numeric) moderator variable.
・In 20. Multiple Regression 3 (Interaction Effects 1), I deal with a categorical moderator variable.
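To make the idea of "multiplying two variables" concrete, here is a minimal sketch on simulated data. The variables x, z, and y are made-up stand-ins for exppv, eligible.t, and voteshare, and the coefficient values used to simulate them are arbitrary. It shows that R's formula shorthand x * z fits the same model as writing out both main effects plus the product term.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 90)      # stand-in for exppv (simulated)
z <- runif(n, 200, 470)   # stand-in for eligible.t (simulated)
y <- 40 + 0.07 * x + 0.005 * z + 0.0015 * x * z + rnorm(n, sd = 10)
toy <- data.frame(y, x, z)

m_star  <- lm(y ~ x * z, data = toy)             # shorthand
m_colon <- lm(y ~ x + z + x:z, data = toy)       # main effects + interaction
m_prod  <- lm(y ~ x + z + I(x * z), data = toy)  # explicit product column

# All three parameterizations give the same estimates
all.equal(unname(coef(m_star)), unname(coef(m_colon)))
all.equal(unname(coef(m_star)), unname(coef(m_prod)))
```

Whichever form is used, both main effects stay in the model alongside the product term.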
Model_1
・A moderator variable (eligible.t) influences the relationship between an explanatory variable and an outcome variable.
・The slope (exppv → voteshare) differs depending on the number of eligible voters (eligible.t).
・This slope is the marginal effect.
We estimate the following model:
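\[\begin{aligned}\text{voteshare} &= \alpha_1 + \alpha_2\,\text{exppv} + \alpha_3\,\text{eligible.t} + \alpha_4\,(\text{eligible.t} \times \text{exppv}) \\ &\quad + \alpha_5\,\text{age} + \alpha_6\,\text{nocand} + \alpha_7\,\text{term} + \varepsilon\end{aligned}\]
\[\begin{aligned}\text{voteshare} &= \alpha_1 + (\alpha_2 + \alpha_4\,\text{eligible.t})\,\text{exppv} + \alpha_3\,\text{eligible.t} \\ &\quad + \alpha_5\,\text{age} + \alpha_6\,\text{nocand} + \alpha_7\,\text{term} + \varepsilon\end{aligned}\]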
| Term | Meaning |
|---|---|
| \(\alpha_2\) | slope (exppv → voteshare) when eligible.t = 0 |
| \((\alpha_2 + \alpha_4 \textrm{eligible.t})\) | slope (exppv → voteshare) = Marginal Effect |
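One way to see why the bracketed term is the marginal effect is to differentiate the model with respect to exppv while holding the other variables fixed. This is a short worked step using only the symbols defined in the equations above:

\[\frac{\partial\,\text{voteshare}}{\partial\,\text{exppv}} = \alpha_2 + \alpha_4\,\text{eligible.t}\]

Setting eligible.t = 0 leaves only \(\alpha_2\), which is why \(\alpha_2\) alone describes the slope only for a district with zero eligible voters. By the same logic, each one-unit increase in eligible.t (one thousand voters) shifts the slope by \(\alpha_4\).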
Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters (eligible.t)?
Marginal Effects
・Marginal effects are often calculated when interpreting regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・A marginal effect tells us how the dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold age, nocand, and term constant.
model_1 <- lm(voteshare ~ exppv*eligible.t + age + nocand + term,
              data = df1)
Show the results with jtools::plot_summs()
jtools::plot_summs(model_1)
Show the results with tidy()
tidy(model_1)
# A tibble: 7 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 40.2 3.78 10.6 5.29e-25
2 exppv 0.0723 0.0931 0.777 4.37e- 1
3 eligible.t 0.00488 0.00876 0.557 5.78e- 1
4 age -0.315 0.0340 -9.27 1.19e-19
5 nocand -4.75 0.450 -10.5 1.19e-24
6 term 3.10 0.161 19.2 3.32e-70
7 exppv:eligible.t 0.00156 0.000290 5.39 8.82e- 8
Show the results with stargazer()
stargazer(model_1,
          digits = 3,
          style = "ajps",
          title = "Results of model_1 (2005HR election)",
          type = "html")
| | voteshare |
|---|---|
| exppv | 0.072 (0.093) |
| eligible.t | 0.005 (0.009) |
| age | -0.315*** (0.034) |
| nocand | -4.745*** (0.450) |
| term | 3.102*** (0.161) |
| exppv:eligible.t | 0.002*** (0.0003) |
| Constant | 40.165*** (3.783) |
| N | 985 |
| R-squared | 0.716 |
| Adj. R-squared | 0.715 |
| Residual Std. Error | 10.259 (df = 978) |
| F Statistic | 411.786*** (df = 6; 978) |
Standard errors in parentheses. ***p < .01; **p < .05; *p < .1
To better understand the results of model_1, let’s fit another model that does not include the interaction term (model_2) and compare the two.
model_2 <- lm(voteshare ~ exppv + eligible.t + age + nocand + term,
              data = df1)
Show the results with stargazer()
stargazer(model_1, model_2,
          digits = 3,
          style = "ajps",
          title = "Results of model_1 and model_2 (2005HR election)",
          type = "html")
| voteshare | Model 1 | Model 2 |
|---|---|---|
| exppv | 0.072 (0.093) | 0.559*** (0.023) |
| eligible.t | 0.005 (0.009) | 0.042*** (0.006) |
| age | -0.315*** (0.034) | -0.318*** (0.035) |
| nocand | -4.745*** (0.450) | -4.764*** (0.457) |
| term | 3.102*** (0.161) | 3.252*** (0.161) |
| exppv:eligible.t | 0.002*** (0.0003) | |
| Constant | 40.165*** (3.783) | 28.161*** (3.101) |
| N | 985 | 985 |
| R-squared | 0.716 | 0.708 |
| Adj. R-squared | 0.715 | 0.707 |
| Residual Std. Error | 10.259 (df = 978) | 10.405 (df = 979) |
| F Statistic | 411.786*** (df = 6; 978) | 474.729*** (df = 5; 979) |
Standard errors in parentheses. ***p < .01; **p < .05; *p < .1
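Beyond comparing the two columns by eye, nested models such as these can be compared formally with an F test via anova(). The sketch below uses simulated data, not the election data: x, z, and y are made-up stand-ins and the simulation coefficients are arbitrary. It also illustrates that when only one term is added, the F statistic equals the square of that term's t statistic.

```r
set.seed(2)
n <- 500
x <- runif(n, 0, 90)      # stand-in for exppv (simulated)
z <- runif(n, 200, 470)   # stand-in for eligible.t (simulated)
y <- 40 + 0.07 * x + 0.005 * z + 0.0015 * x * z + rnorm(n, sd = 10)
toy <- data.frame(y, x, z)

with_int    <- lm(y ~ x * z, data = toy)   # analogue of model_1
without_int <- lm(y ~ x + z, data = toy)   # analogue of model_2

# F test of the restricted model against the model with the interaction
cmp <- anova(without_int, with_int)
cmp

# With one added term, F equals the squared t statistic of that term
f_stat <- cmp$F[2]
t_stat <- summary(with_int)$coefficients["x:z", "t value"]
all.equal(f_stat, t_stat^2)
```

Applied to the models in this section, the same call would be anova(model_2, model_1). Since the t statistic on exppv:eligible.t is 5.39, the F statistic should be roughly 29.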
tidy(model_1)
# A tibble: 7 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 40.2 3.78 10.6 5.29e-25
2 exppv 0.0723 0.0931 0.777 4.37e- 1
3 eligible.t 0.00488 0.00876 0.557 5.78e- 1
4 age -0.315 0.0340 -9.27 1.19e-19
5 nocand -4.75 0.450 -10.5 1.19e-24
6 term 3.10 0.161 19.2 3.32e-70
7 exppv:eligible.t 0.00156 0.000290 5.39 8.82e- 8
Important point
・The coefficient on exppv in model_1 is the effect of exppv when eligible.t = 0.
→ Since eligible.t = 0 is unrealistic, we need to check the marginal effects as the number of eligible voters varies.
・The coefficient of exppv (0.072) in model_1
→ When exppv increases by one yen, voteshare increases by 0.072 percentage points when eligible.t = 0.
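To confirm this, let’s check the Sample Regression Function for model_1.
\[\begin{aligned}\widehat{\text{voteshare}} &= 40.2 + 0.072\,\text{exppv} + 0.002\,(\text{eligible.t} \times \text{exppv}) + 0.005\,\text{eligible.t} \\ &\quad - 0.315\,\text{age} - 4.75\,\text{nocand} + 3.1\,\text{term} \\ &= 40.2 + (0.072 + 0.002\,\text{eligible.t})\,\text{exppv} + 0.005\,\text{eligible.t} \\ &\quad - 0.315\,\text{age} - 4.75\,\text{nocand} + 3.1\,\text{term}\end{aligned}\]
The marginal effect of exppv on voteshare \((\alpha_2 + \alpha_4\,\text{eligible.t})\) is:
\[0.072 + 0.002\,\text{eligible.t}\]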
・The slope (exppv → voteshare): when eligible.t = 1 (that is, one thousand eligible voters), a one-yen increase in exppv raises vote share by 0.074 percentage points (0.072 + 0.002*1 = 0.074). Each additional unit of eligible.t raises this slope by a further 0.002.
Calculating marginal effects over the range of eligible.t
Show the intercept and coefficients of model_1
model_1$coef
(Intercept) exppv eligible.t age
40.165022051 0.072328288 0.004875959 -0.315259777
nocand term exppv:eligible.t
-4.745116609 3.102144872 0.001563733
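With the coefficients printed above, the slope at any value of eligible.t is plain arithmetic. The following sketch copies the two relevant coefficients as literals and evaluates the slope at 0, at 1, and at the sample mean of eligible.t (344.7). The helper marginal_effect() is a hypothetical name introduced only for this illustration.

```r
# Coefficients copied from the model_1$coef output above
b_exppv <- 0.072328288   # coefficient on exppv
b_inter <- 0.001563733   # coefficient on exppv:eligible.t

# Slope of voteshare with respect to exppv at a given eligible.t
marginal_effect <- function(eligible_t) b_exppv + b_inter * eligible_t

marginal_effect(0)       # slope in a (hypothetical) district with no voters
marginal_effect(1)       # slope with one thousand eligible voters
marginal_effect(344.7)   # slope at the sample mean of eligible.t
```

At the sample mean the slope is about 0.61, far larger than the 0.072 reported for eligible.t = 0. This is why the exppv coefficient in model_1 should not be read as a typical effect.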
Descriptive statistics of the moderator variable (eligible.t)
summary(df1$eligible.t)
Min. 1st Qu. Median Mean 3rd Qu. Max.
214.2 297.4 347.9 344.7 397.2 465.2
We calculate the marginal effects at values of eligible.t from its minimum (214.2) toward its maximum (465.2) in steps of 50. Because the steps start at 214.2, the last value is 464.2.
We use the 2nd coefficient (exppv) and the 7th coefficient (exppv:eligible.t) to calculate these marginal effects.
at.eligible.t <- seq(214.2, 465.2, 50)
slopes <- model_1$coef[2] + model_1$coef[7]*at.eligible.t
slopes
[1] 0.4072798 0.4854664 0.5636531 0.6418397 0.7200263 0.7982130
Using the delta method, we calculate the standard error and the 95% confidence interval for each marginal effect.
library(msm)
estmean <- coef(model_1)   # point estimates
var <- vcov(model_1)       # variance-covariance matrix of the estimates
SEs <- rep(NA, length(at.eligible.t))
for (i in seq_along(at.eligible.t)){
  j <- at.eligible.t[i]
  # x2 = coefficient on exppv, x7 = coefficient on exppv:eligible.t
  SEs[i] <- deltamethod(~ (x2) + (x7)*j, estmean, var)
}
upper <- slopes + 1.96*SEs # upper bound of the 95% confidence interval
lower <- slopes - 1.96*SEs # lower bound of the 95% confidence interval
cbind(at.eligible.t, slopes, upper, lower)
at.eligible.t slopes upper lower
[1,] 214.2 0.4072798 0.4783484 0.3362112
[2,] 264.2 0.4854664 0.5377139 0.4332189
[3,] 314.2 0.5636531 0.6086580 0.5186481
[4,] 364.2 0.6418397 0.6960400 0.5876394
[5,] 414.2 0.7200263 0.7939621 0.6460905
[6,] 464.2 0.7982130 0.8962533 0.7001726
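Because the marginal effect here is a linear combination of two coefficients, its standard error can also be written in closed form: the variance of b2 + e·b7 is Var(b2) + e²·Var(b7) + 2e·Cov(b2, b7). The sketch below checks this on simulated data using only base R. The variables x, z, and y are made-up stand-ins, and this is an illustrative alternative, not the method used in this section.

```r
set.seed(3)
n <- 300
x <- runif(n, 0, 90)      # stand-in for exppv (simulated)
z <- runif(n, 200, 470)   # stand-in for eligible.t (simulated)
y <- 40 + 0.07 * x + 0.005 * z + 0.0015 * x * z + rnorm(n, sd = 10)
fit <- lm(y ~ x * z)

V  <- vcov(fit)           # variance-covariance matrix of the estimates
z0 <- c(250, 350, 450)    # moderator values at which to evaluate the slope

# Closed form: Var(b_x + z0 * b_xz)
se_formula <- sqrt(V["x", "x"] + z0^2 * V["x:z", "x:z"] + 2 * z0 * V["x", "x:z"])

# Same quantity in matrix form, g' V g, with g the gradient of the slope
se_matrix <- sapply(z0, function(v) {
  g <- c(0, 1, 0, v)      # order: (Intercept), x, z, x:z
  sqrt(drop(t(g) %*% V %*% g))
})

all.equal(se_formula, se_matrix)
```

For a linear function of the coefficients like this one, the delta method involves no approximation, so it should reproduce these closed-form standard errors.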
・at.eligible.t shows the 6 values from the minimum (214.2) up to 464.2 in steps of 50
・slopes is the marginal effect of exppv on voteshare
・The value 0.4854664, located at row [2, ] of the slopes column
→ The marginal effect of exppv on voteshare when eligible.t = 264.2
・The value 0.5636531, located at row [3, ] of the slopes column
→ The marginal effect of exppv on voteshare when eligible.t = 314.2
・upper and lower show the upper and lower bounds of the 95% confidence intervals.
→ This is important information when you check statistical significance.
Save the results as a data frame named msm_1
msm_1 <- cbind(at.eligible.t, slopes, upper, lower) %>%
  as.data.frame()
msm_1
at.eligible.t slopes upper lower
1 214.2 0.4072798 0.4783484 0.3362112
2 264.2 0.4854664 0.5377139 0.4332189
3 314.2 0.5636531 0.6086580 0.5186481
4 364.2 0.6418397 0.6960400 0.5876394
5 414.2 0.7200263 0.7939621 0.6460905
6 464.2 0.7982130 0.8962533 0.7001726
Visualize the marginal effects of exppv on voteshare by the number of eligible voters
msm_1 <- msm_1 %>%
ggplot(aes(x = at.eligible.t,
y = slopes,
ymin = lower,
ymax = upper)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_ribbon(alpha = 0.5, fill = "gray") +
geom_line() +
labs(x = "Eligible voters (thousands)",
y = "Marginal Effects of exppv on voteshare (Model_1)") +
ggtitle('Marginal Effects') +
ylim(0, 1)
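msm_1
Conclusion
・As the number of eligible voters increases, the impact of campaign expenditure on vote share increases.
・The marginal effect is statistically significant at the 5% level over the whole observed range of eligible.t, because the 95% confidence interval never includes zero.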
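The significance claim can be checked directly from the numbers rather than read off the plot. The sketch below re-enters the printed table of slopes and confidence bounds as literals (the data frame name ci is an assumption for this illustration) and tests whether any interval contains zero.

```r
# Values copied from the printed table of slopes and 95% bounds
ci <- data.frame(
  at.eligible.t = c(214.2, 264.2, 314.2, 364.2, 414.2, 464.2),
  slopes = c(0.4072798, 0.4854664, 0.5636531, 0.6418397, 0.7200263, 0.7982130),
  upper  = c(0.4783484, 0.5377139, 0.6086580, 0.6960400, 0.7939621, 0.8962533),
  lower  = c(0.3362112, 0.4332189, 0.5186481, 0.5876394, 0.6460905, 0.7001726)
)

# An effect is significant at the 5% level when its 95% CI excludes zero
ci$significant <- ci$lower > 0 | ci$upper < 0

all(ci$significant)        # is every interval away from zero?
all(diff(ci$slopes) > 0)   # does the slope rise with eligible.t?
```

Both checks return TRUE for these values, which matches the two bullet points of the conclusion.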
・The msm package is one way to visualize marginal effects. The interplot and margins packages are alternatives.
Using interplot
library("interplot")
interplot_1 <- interplot(m = model_1,
var1 = "exppv", # Major independent variable
var2 = "eligible.t") + # Moderator variable
labs(x = "Eligible voters (thousands)",
y = "Marginal Effects of exppv on voteshare (Model_1)") +
ggtitle('Marginal Effects') +
geom_hline(yintercept = 0, linetype = 2) +
ylim(0, 1)
interplot_1
Using margins
library("margins")
margins_1 <- cplot(model_1,
                   x = "eligible.t",   # Moderator variable
                   dx = "exppv",       # Major independent variable
                   what = "effect",    # Marginal Effects
                   n = 6,              # Number of values of x to evaluate
                   draw = FALSE)
margins_1
xvals yvals upper lower factor
214.2350 0.4073 0.4784 0.3363 exppv
264.4242 0.4858 0.5380 0.4336 exppv
314.6134 0.5643 0.6093 0.5193 exppv
364.8026 0.6428 0.6972 0.5884 exppv
414.9918 0.7213 0.7956 0.6470 exppv
465.1810 0.7997 0.8983 0.7012 exppv
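The xvals above differ slightly from the at.eligible.t values used with msm. Judging from the printed output, cplot() appears to place n equally spaced points across the full range of eligible.t, whereas the manual grid steps by exactly 50 from the rounded minimum. That is an inference from the numbers, not a statement about the package internals. The sketch below reproduces both grids in base R.

```r
# Range of eligible.t taken from the descriptive statistics table
rng <- c(214.235, 465.181)

# Six equally spaced points spanning the whole range (matches xvals above)
grid_cplot <- seq(rng[1], rng[2], length.out = 6)

# Fixed steps of 50 starting at the rounded minimum (the msm example)
grid_manual <- seq(214.2, 465.2, by = 50)

round(grid_cplot, 4)
grid_manual
```

The first grid ends exactly at the maximum (465.181), while the second stops at 464.2, which explains the small differences between the two sets of marginal effects.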
margins_1 <- margins_1 %>%
ggplot(aes(x = xvals, y = yvals, ymin = lower, ymax = upper)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_ribbon(alpha = 0.5, fill = "gray") +
geom_line() +
labs(x = "Eligible voters (thousands)",
y = "Marginal Effects of exppv on voteshare (Model_1)") +
ggtitle('Marginal Effects') +
ylim(0, 1)
margins_1
Exercise
1. Does the impact of campaign expenditure (exppv) on vote share (voteshare) differ as the number of wins (term) changes for candidates in the 2012 lower house election in Japan?
2. Referring to Model_1 in this section, answer the following 5 questions.
Q1: Using exp and eligible, make an expenditure variable, exppv, which shows the campaign expenditure spent by each candidate per voter in their electoral district.
・Make a data frame containing the following variables, name it df3, and show its descriptive statistics using the stargazer package.
| variable | detail |
|---|---|
| voteshare | Voteshare (%) |
| exppv | Campaign expenditure spent by each candidate per voter in their electoral district (yen) |
| age | Candidate’s age |
| nocand | Number of candidates in each district |
| term | Number of terms served as a lower house member |
Q2: Run the following two models and show their results using stargazer package.
model_5 <- lm(voteshare ~ exppv*term + age + nocand,
data = df3)
model_6 <- lm(voteshare ~ exppv + term + age + nocand,
              data = df3)
Q3: Show the two regression equations for model_5 and model_6.
Q4: Draw a scatter plot for model_6.
・The x-axis is exppv and the y-axis is voteshare.
Q5: Using the msm package, visualize the marginal effects for model_5 and explain the results.
・Does the impact of campaign expenditure (exppv) on vote share (voteshare) differ as the number of wins (term) changes in the 2012 lower house election in Japan? Show your evidence and explain why.