R packages we use in this section
library(broom)
library(ggstance)
library(interplot)
library(margins)
library(msm)
library(patchwork)
library(stargazer)
library(tidyverse)
A moderator variable can be categorical or continuous.
Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters?
Marginal Effects
・Marginal effects are often calculated when interpreting regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・A marginal effect tells us how the dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold nocand constant.
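Put hr96-17.csv into the data folder in your RProject folder.
# Read the election data; "." marks missing values
hr <- read_csv("data/hr96-17.csv",
               na = ".")
Check the variable names in hr: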
names(hr)
[1] "year" "pref" "ku" "kun"
[5] "mag" "rank" "wl" "nocand"
[9] "seito" "j_name" "name" "term"
[13] "gender" "age" "exp" "status"
[17] "vote" "voteshare" "eligible" "turnout"
[21] "castvotes" "seshu_dummy" "jiban_seshu" "nojiban_seshu"
unique(hr$year)
[1] 1996 2000 2003 2005 2009 2012 2014 2017
# Keep the 2005 election and the variables used in this section
df1 <- hr %>%
  dplyr::filter(year == 2005) %>%
  dplyr::select(year, age, voteshare, exp, eligible, nocand, term)
DT::datatable(df1)
df1 contains the following 7 variables
variable | detail |
---|---|
year | Election Year |
age | Candidate’s age |
voteshare | Voteshare (%) |
exp | Election expenditure (yen) spent by each candidate |
eligible | Eligible voters in each district |
nocand | Number of candidates in each district |
term | Number of terms served as a lower house member |
Create a new variable (exppv): exppv shows the campaign expenditure spent by each candidate per voter in their electoral district.
df1 <- mutate(df1, exppv = exp / eligible)
Create a new variable (eligible.t): eligible.t shows the number of eligible voters in thousands.
df1 <- mutate(df1, eligible.t = eligible / 1000)
DT::datatable(df1)
Descriptive statistics of df1:
stargazer(as.data.frame(df1), type = "html")
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
year | 989 | 2,005.000 | 0.000 | 2,005 | 2,005 | 2,005 | 2,005 |
age | 989 | 50.292 | 10.871 | 25 | 42 | 58 | 81 |
voteshare | 989 | 30.333 | 19.230 | 1 | 8.8 | 46.6 | 74 |
exp | 985 | 8,142,244.000 | 5,569,641.000 | 62,710.000 | 2,917,435.000 | 11,822,797.000 | 24,649,710.000 |
eligible | 989 | 344,654.300 | 63,898.230 | 214,235 | 297,385 | 397,210 | 465,181 |
nocand | 989 | 3.435 | 0.740 | 2 | 3 | 4 | 6 |
term | 989 | 1.975 | 2.721 | 0 | 0 | 3 | 16 |
exppv | 985 | 24.627 | 17.907 | 0.148 | 8.352 | 35.269 | 89.332 |
eligible.t | 989 | 344.654 | 63.898 | 214.235 | 297.385 | 397.210 | 465.181 |
# F1: campaign expenditure per voter vs. vote share
F1 <- ggplot(df1, aes(exppv, voteshare)) +
  geom_point() +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)",
       title = "Campaign expenditure and vote share") +
  stat_smooth(method = lm, se = FALSE)
# F2: eligible voters (thousands) vs. vote share
F2 <- ggplot(df1, aes(eligible.t, voteshare)) +
  geom_point() +
  labs(x = "Eligible voters (thousands)", y = "Vote share (%)",
       title = "Eligible voters and vote share") +
  stat_smooth(method = lm, se = FALSE)
# Show the two plots side by side (patchwork)
F1 + F2 + plot_layout(ncol = 2)
Left panel (F1): exppv and voteshare
Right panel (F2): eligible.t and voteshare
age, nocand, and term are control variables.
Note
・A control variable is anything that is held constant or limited in a research study.
・It is a variable that is not of interest to our research aims, but is controlled because it could influence the outcome.
・In actual research, we need to add more control variables, because it is reasonable to assume that other variables (such as candidates' age, campaign expenditure, and gender) could influence vote share.
・Here I add three control variables: age, nocand, and term.
To see interaction effects, we make an interaction term by multiplying the main independent variable (in this case, exppv) by a numeric moderator variable (in this case, eligible.t): eligible.t:exppv
The interaction term is the product of the following two independent variables:
・campaign expenditure per voter (exppv)
・number of eligible voters in thousands (eligible.t)
Note
・A moderator variable can be of two types: categorical or continuous.
・In this section, I deal with a numeric moderator variable.
・In 20. Multiple Regression 3 (Interaction Effects 1), I deal with a categorical moderator variable.
Model_1
A moderator variable (eligible.t) influences the relationship between an explanatory variable and the outcome variable.
The slope (exppv → voteshare) differs depending on the number of eligible voters (eligible.t).
This slope is equivalent to the marginal effect.
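We estimate the following model:
\[\textrm{voteshare} = \alpha_1 + \alpha_2 \textrm{exppv} + \alpha_3 \textrm{eligible.t} + \alpha_4 \textrm{eligible.t} \times \textrm{exppv} \\ + \alpha_5 \textrm{age} + \alpha_6 \textrm{nocand} + \alpha_7 \textrm{term} + \varepsilon\]
Collecting the terms that contain exppv gives:
\[\textrm{voteshare} = \alpha_1 + (\alpha_2 + \alpha_4 \textrm{eligible.t}) \textrm{exppv} + \alpha_3 \textrm{eligible.t} \\ + \alpha_5 \textrm{age} + \alpha_6 \textrm{nocand} + \alpha_7 \textrm{term} + \varepsilon\]
\(\alpha_2\) | slope (exppv → voteshare) when eligible.t = 0 |
\((\alpha_2 + \alpha_4 \textrm{eligible.t})\) | slope (exppv → voteshare) = Marginal Effect |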
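One way to see why this slope is the marginal effect: take the partial derivative of the model with respect to exppv, holding eligible.t and the control variables constant.
\[\frac{\partial\, \textrm{voteshare}}{\partial\, \textrm{exppv}} = \alpha_2 + \alpha_4 \textrm{eligible.t}\]
At eligible.t = 0 the slope is \(\alpha_2\), and each additional unit of eligible.t (one thousand voters) changes the slope by \(\alpha_4\).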
Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters?
Marginal Effects
・Marginal effects are often calculated when interpreting regression results.
・Marginal effects are equivalent to slopes in regression analysis.
・A marginal effect tells us how the dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold age, nocand, and term constant.
# exppv*eligible.t expands to exppv + eligible.t + exppv:eligible.t
model_1 <- lm(voteshare ~ exppv*eligible.t + age + nocand + term,
              data = df1)
jtools::plot_summs()
jtools::plot_summs(model_1)
tidy()
tidy(model_1)
# A tibble: 7 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 40.2 3.78 10.6 5.29e-25
2 exppv 0.0723 0.0931 0.777 4.37e- 1
3 eligible.t 0.00488 0.00876 0.557 5.78e- 1
4 age -0.315 0.0340 -9.27 1.19e-19
5 nocand -4.75 0.450 -10.5 1.19e-24
6 term 3.10 0.161 19.2 3.32e-70
7 exppv:eligible.t 0.00156 0.000290 5.39 8.82e- 8
stargazer()
stargazer(model_1,
digits = 3,
style = "ajps",
title = "Results of model_1 (2005HR election)",
type ="html")
 | voteshare |
---|---|
exppv | 0.072 |
 | (0.093) |
eligible.t | 0.005 |
 | (0.009) |
age | -0.315*** |
 | (0.034) |
nocand | -4.745*** |
 | (0.450) |
term | 3.102*** |
 | (0.161) |
exppv:eligible.t | 0.002*** |
 | (0.0003) |
Constant | 40.165*** |
 | (3.783) |
N | 985 |
R-squared | 0.716 |
Adj. R-squared | 0.715 |
Residual Std. Error | 10.259 (df = 978) |
F Statistic | 411.786*** (df = 6; 978) |
***p < .01; **p < .05; *p < .1
In addition to model_1, let’s make another model that does not include the interaction term (model_2) and compare the two to better understand the results.
model_2 <- lm(voteshare ~ exppv + eligible.t + age + nocand + term,
              data = df1)
stargazer()
stargazer(model_1, model_2,
digits = 3,
style = "ajps",
title = "Results of model_1 and model_2(2005HR election)",
type ="html")
Dependent variable: voteshare
 | Model 1 | Model 2 |
---|---|---|
exppv | 0.072 | 0.559*** |
 | (0.093) | (0.023) |
eligible.t | 0.005 | 0.042*** |
 | (0.009) | (0.006) |
age | -0.315*** | -0.318*** |
 | (0.034) | (0.035) |
nocand | -4.745*** | -4.764*** |
 | (0.450) | (0.457) |
term | 3.102*** | 3.252*** |
 | (0.161) | (0.161) |
exppv:eligible.t | 0.002*** | |
 | (0.0003) | |
Constant | 40.165*** | 28.161*** |
 | (3.783) | (3.101) |
N | 985 | 985 |
R-squared | 0.716 | 0.708 |
Adj. R-squared | 0.715 | 0.707 |
Residual Std. Error | 10.259 (df = 978) | 10.405 (df = 979) |
F Statistic | 411.786*** (df = 6; 978) | 474.729*** (df = 5; 979) |
***p < .01; **p < .05; *p < .1
tidy(model_1)
# A tibble: 7 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 40.2 3.78 10.6 5.29e-25
2 exppv 0.0723 0.0931 0.777 4.37e- 1
3 eligible.t 0.00488 0.00876 0.557 5.78e- 1
4 age -0.315 0.0340 -9.27 1.19e-19
5 nocand -4.75 0.450 -10.5 1.19e-24
6 term 3.10 0.161 19.2 3.32e-70
7 exppv:eligible.t 0.00156 0.000290 5.39 8.82e- 8
Important point
The coefficient of exppv in model_1 is the slope when eligible.t = 0.
→ Since eligible.t = 0 is unrealistic, we need to check the marginal effects as the number of eligible voters varies.
The coefficient of exppv (0.072) in model_1
→ When exppv increases by one yen, voteshare increases by 0.072 percentage points when eligible.t = 0.
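To confirm this, let’s check the sample regression function for model_1.
\[\widehat{\textrm{voteshare}} = 40.2 + 0.072\,\textrm{exppv} + 0.002\,\textrm{eligible.t} \times \textrm{exppv} + 0.005\,\textrm{eligible.t} \\ - 0.315\,\textrm{age} - 4.75\,\textrm{nocand} + 3.1\,\textrm{term}\]
\[= 40.2 + (0.072 + 0.002\,\textrm{eligible.t})\,\textrm{exppv} + 0.005\,\textrm{eligible.t} - 0.315\,\textrm{age} \\ - 4.75\,\textrm{nocand} + 3.1\,\textrm{term}\]
The marginal effect of exppv on voteshare \((\alpha_2 + \alpha_4 \textrm{eligible.t})\) is:
\[0.072 + 0.002\,\textrm{eligible.t}\]
Here 0.002 is the interaction coefficient (0.00156) rounded to three decimal places.
The slope (exppv → voteshare) depends on eligible.t: when eligible.t = 1 (one thousand eligible voters), a one-yen increase in exppv is associated with a 0.074 percentage point increase in vote share (0.072 + 0.002*1 = 0.074).
Show the intercept and coefficients of model_1:
model_1$coef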
(Intercept) exppv eligible.t age
40.165022051 0.072328288 0.004875959 -0.315259777
nocand term exppv:eligible.t
-4.745116609 3.102144872 0.001563733
Check the descriptive statistics of the moderator variable (eligible.t):
summary(df1$eligible.t)
Min. 1st Qu. Median Mean 3rd Qu. Max.
214.2 297.4 347.9 344.7 397.2 465.2
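We calculate the marginal effects at values of eligible.t between its min (214.2) and max (465.2) at intervals of 50.
We use the 2nd coefficient (exppv) and the 7th coefficient (exppv:eligible.t) to calculate these marginal effects.
# Values of the moderator at which to evaluate the slope
at.eligible.t <- seq(214.2, 465.2, 50)
# Marginal effect of exppv = coef of exppv + coef of interaction * eligible.t
slopes <- model_1$coef[2] + model_1$coef[7]*at.eligible.t
slopes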
[1] 0.4072798 0.4854664 0.5636531 0.6418397 0.7200263 0.7982130
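As a quick check, these slopes can be reproduced by hand from the printed coefficients. The following is a minimal sketch that hard-codes the two coefficients shown in the model_1$coef output (the names b_exppv and b_inter are introduced here for illustration only, and no model is fitted). It also makes visible that the grid stops at 464.2, the last step of 50 that does not exceed the max of 465.2.

```r
# Coefficients copied from the printed model_1$coef output
# (hard-coded for illustration; b_exppv and b_inter are hypothetical names)
b_exppv <- 0.072328288   # 2nd coefficient: exppv
b_inter <- 0.001563733   # 7th coefficient: exppv:eligible.t

# Same grid as in the text: start at 214.2, step by 50, do not exceed 465.2
at.eligible.t <- seq(214.2, 465.2, 50)

# Marginal effect of exppv at each value of eligible.t
slopes <- b_exppv + b_inter * at.eligible.t

print(at.eligible.t)
print(round(slopes, 4))
```

Because the coefficients are typed in with nine decimal places, the results should agree with the slopes computed from model_1 only up to small rounding differences.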
Using the delta method, we calculate the standard error and the 95% confidence interval for each marginal effect.
library(msm)
estmean <- coef(model_1)   # estimated coefficients
var <- vcov(model_1)       # variance-covariance matrix of the coefficients

SEs <- rep(NA, length(at.eligible.t))

# x2 = 2nd coefficient (exppv), x7 = 7th coefficient (exppv:eligible.t)
for (i in 1:length(at.eligible.t)){
  j <- at.eligible.t[i]
  SEs[i] <- deltamethod (~ (x2) + (x7)*j, estmean, var)
}
upper <- slopes + 1.96*SEs # 5% significance level
lower <- slopes - 1.96*SEs # 5% significance level
cbind(at.eligible.t, slopes, upper, lower)
at.eligible.t slopes upper lower
[1,] 214.2 0.4072798 0.4783484 0.3362112
[2,] 264.2 0.4854664 0.5377139 0.4332189
[3,] 314.2 0.5636531 0.6086580 0.5186481
[4,] 364.2 0.6418397 0.6960400 0.5876394
[5,] 414.2 0.7200263 0.7939621 0.6460905
[6,] 464.2 0.7982130 0.8962533 0.7001726
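Because each marginal effect here is a linear combination of two coefficients, its delta-method variance has a closed form: Var(b2 + j*b7) = Var(b2) + j^2 * Var(b7) + 2 * j * Cov(b2, b7). The sketch below illustrates that formula in base R. It runs on simulated data, not on hr96-17.csv, and the names x, z, y, fit, and at_z are hypothetical stand-ins for exppv, eligible.t, voteshare, model_1, and at.eligible.t.

```r
# Simulated data (hypothetical; NOT the election data used in this section)
set.seed(1)
n <- 500
x <- runif(n, 0, 90)       # stands in for exppv
z <- runif(n, 214, 465)    # stands in for eligible.t
y <- 10 + 0.1 * x + 0.01 * z + 0.002 * x * z + rnorm(n, sd = 10)

fit <- lm(y ~ x * z)       # coefficients: (Intercept), x, z, x:z
b <- coef(fit)
V <- vcov(fit)

# Values of the moderator at which to evaluate the slope of x
at_z <- seq(214.2, 465.2, 50)

# Marginal effect of x at each value of z
me <- unname(b["x"]) + unname(b["x:z"]) * at_z

# Standard error of the linear combination b_x + z * b_xz
se <- sqrt(V["x", "x"] + at_z^2 * V["x:z", "x:z"] + 2 * at_z * V["x", "x:z"])

# 95% confidence interval
upper <- me + 1.96 * se
lower <- me - 1.96 * se

result <- data.frame(at_z, me, se, lower, upper)
print(result)
```

Writing the variance out this way shows where the width of the confidence interval comes from: it depends on the moderator value through the j^2 term and the covariance term, which is why the interval is not equally wide at every value of the moderator.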
at.eligible.t shows the 6 values of eligible.t, starting at the min (214.2) and increasing in steps of 50 toward the max (465.2)
slopes shows the marginal effects of exppv on voteshare
The value (0.4854664) at the intersection of row [2, ] and column slopes
→ The marginal effect of exppv on voteshare when eligible.t = 264.2
The value (0.5636531) at the intersection of row [3, ] and column slopes
→ The marginal effect of exppv on voteshare when eligible.t = 314.2
upper and lower show the upper and lower bounds of the 95% confidence intervals.
→ This is important information when you check statistical significance.
Convert the results into a data frame (msm_1):
msm_1 <- cbind(at.eligible.t, slopes, upper, lower) %>%
  as.data.frame()
msm_1
at.eligible.t slopes upper lower
1 214.2 0.4072798 0.4783484 0.3362112
2 264.2 0.4854664 0.5377139 0.4332189
3 314.2 0.5636531 0.6086580 0.5186481
4 364.2 0.6418397 0.6960400 0.5876394
5 414.2 0.7200263 0.7939621 0.6460905
6 464.2 0.7982130 0.8962533 0.7001726
Visualize the marginal effects of exppv on voteshare by the number of eligible voters:
# Line = marginal effect; gray ribbon = 95% confidence interval
msm_1 <- msm_1 %>%
  ggplot(aes(x = at.eligible.t,
             y = slopes,
             ymin = lower,
             ymax = upper)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_ribbon(alpha = 0.5, fill = "gray") +
  geom_line() +
  labs(x = "Eligible voters (thousands)",
       y = "Marginal Effects of exppv on voteshare (Model_1)") +
  ggtitle('Marginal Effects') +
  ylim(0, 1)
msm_1
Conclusion
・As the number of eligible voters increases, the impact of campaign expenditure on vote share increases.
・The marginal effect is statistically significant at the 5% significance level across the plotted range of eligible.t: the 95% confidence interval never includes 0.
The msm package is one way to visualize marginal effects.
interplot
library("interplot")
interplot_1 <- interplot(m = model_1,
                         var1 = "exppv",          # Major independent variable
                         var2 = "eligible.t") +   # Moderator variable
  labs(x = "Eligible voters (thousands)",
       y = "Marginal Effects of exppv on voteshare (Model_1)") +
  ggtitle('Marginal Effects') +
  geom_hline(yintercept = 0, linetype = 2) +
  ylim(0, 1)
interplot_1
margins
library("margins")
margins_1 <- cplot(model_1,
                   x = "eligible.t",   # Moderator variable
                   dx = "exppv",       # Major independent variable
                   what = "effect",    # Marginal Effects
                   n = 6,              # Number of values of x at which to evaluate
                   draw = FALSE)       # Return the data instead of drawing a plot
margins_1
xvals yvals upper lower factor
214.2350 0.4073 0.4784 0.3363 exppv
264.4242 0.4858 0.5380 0.4336 exppv
314.6134 0.5643 0.6093 0.5193 exppv
364.8026 0.6428 0.6972 0.5884 exppv
414.9918 0.7213 0.7956 0.6470 exppv
465.1810 0.7997 0.8983 0.7012 exppv
margins_1 <- margins_1 %>%
  ggplot(aes(x = xvals, y = yvals, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_ribbon(alpha = 0.5, fill = "gray") +
  geom_line() +
  labs(x = "Eligible voters (thousands)",
       y = "Marginal Effects of exppv on voteshare (Model_1)") +
  ggtitle('Marginal Effects') +
  ylim(0, 1)
margins_1
1. Does the impact of campaign expenditure (exppv) on vote share (voteshare) differ as the number of terms served (term) changes for candidates in the 2012 lower house election in Japan?
2. Referring to Model_1 in this section, answer the following 5 questions.
Q1: Using exp and eligible, make an expenditure variable, exppv, which shows the campaign expenditure spent by each candidate per voter in their electoral district.
Make a data frame, df3, that contains the following variables, and show its descriptive statistics using the stargazer package.
variable | detail |
---|---|
voteshare | Voteshare (%) |
exppv | Campaign expenditure spent by each candidate per voter in their electoral district (yen) |
age | Candidate’s age |
nocand | Number of candidates in each district |
term | Number of terms served as a lower house member |
Q2: Run the following two models and show their results using the stargazer package.
model_5 <- lm(voteshare ~ exppv*term + age + nocand,
              data = df3)
model_6 <- lm(voteshare ~ exppv + term + age + nocand,
              data = df3)
Q3: Show the two regression equations for model_5 and model_6.
Q4: Draw a scatter plot for model_6: the x-axis is exppv and the y-axis is voteshare.
Q5: Using the msm package, visualize the marginal effects for model_5 and explain the results.
Does the impact of campaign expenditure (exppv) on vote share (voteshare) differ as the number of terms served (term) changes in the 2012 lower house election in Japan? Show the evidence and explain why.