R packages we use in this section

library(broom)
library(ggstance)
library(interplot)
library(margins)
library(msm)
library(patchwork)
library(stargazer)
library(tidyverse)

1. Two ways of showing marginal effects

Is the moderator variable `categorical` or `continuous`?

How you present your results depends on the type of moderator variable.

Example:

WHAT WE WANT TO KNOW:

Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters?

When we estimate a model with an interaction term, we need to check the marginal effects.

Marginal Effects
・Marginal effects are often calculated when interpreting regression results.
・A marginal effect is equivalent to a slope in regression analysis.
・A marginal effect tells us how the dependent variable (outcome) changes when a specific independent variable (explanatory variable) changes by one unit.
・Other covariates (= control variables) are assumed to be held constant.
→ In this case, we hold the control variables (age, nocand, and term) constant.
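To make the idea of a marginal effect concrete, here is a minimal sketch using simulated data (not the election data; the names x, z, y and the true coefficients are made up for illustration). In a linear model without an interaction term, raising x by one unit while holding z fixed changes the prediction by exactly the estimated coefficient of x.

```r
# Simulated data: x is the variable of interest, z is a control (both hypothetical)
set.seed(1)
n <- 200
x <- runif(n, 0, 50)
z <- rnorm(n)
y <- 10 + 0.5 * x - 2 * z + rnorm(n)

fit <- lm(y ~ x + z)

# Predicted outcome at x = 20 and x = 21, holding z constant at 1
p0 <- predict(fit, newdata = data.frame(x = 20, z = 1))
p1 <- predict(fit, newdata = data.frame(x = 21, z = 1))

marginal_effect <- unname(p1 - p0)   # change in prediction for a one-unit change in x
slope_x <- unname(coef(fit)["x"])    # estimated slope of x

c(marginal_effect = marginal_effect, slope = slope_x)
```

The two numbers coincide. Once an interaction term is added, the slope of x is no longer a single number, which is the topic of this section.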

2. Data preparation

Download hr96-17.csv
Japanese lower house election results (1996-2017)
Put the file (hr96-17.csv) into the data folder in your RProject folder
Load the data and name it hr

hr <- read_csv("data/hr96-17.csv", 
               na = ".")

Check the variable names in hr

names(hr)

 [1] "year"          "pref"          "ku"            "kun"          
 [5] "mag"           "rank"          "wl"            "nocand"       
 [9] "seito"         "j_name"        "name"          "term"         
[13] "gender"        "age"           "exp"           "status"       
[17] "vote"          "voteshare"     "eligible"      "turnout"      
[21] "castvotes"     "seshu_dummy"   "jiban_seshu"   "nojiban_seshu"

Check how many lower house elections were held from 1996 to 2017

unique(hr$year)

[1] 1996 2000 2003 2005 2009 2012 2014 2017

Select the variables we need:
Of the 8 elections, we use only the 2005 lower house election

df1 <- hr %>% 
  dplyr::filter(year == 2005) %>% 
  dplyr::select(year, age, voteshare, exp, eligible, nocand, term)

Check df1

DT::datatable(df1)

df1 contains the following 7 variables

variable	detail
year	Election Year
age	Candidate’s age
voteshare	Voteshare (%)
exp	Election expenditure (yen) spent by each candidate
eligible	Eligible voters in each district
nocand	Number of candidates in each district
term	Number of terms served as a lower house member

We need a campaign expenditure variable (exppv)
exppv shows the campaign expenditure of each candidate per eligible voter in their electoral district (yen).

df1 <- mutate(df1, exppv = exp / eligible)

We need a rescaled eligible voter variable (eligible.t)
eligible.t shows the number of eligible voters in thousands

df1 <- mutate(df1, eligible.t = eligible / 1000)
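As a quick check of what these two transformations do, here is a small sketch with a single hypothetical candidate (the data frame toy and its numbers are made up, not taken from hr). It uses base R only, so it runs without the data file.

```r
# Hypothetical candidate: spent 6,000,000 yen in a district with 300,000 eligible voters
toy <- data.frame(exp = 6000000, eligible = 300000)

toy$exppv <- toy$exp / toy$eligible       # yen spent per eligible voter
toy$eligible.t <- toy$eligible / 1000     # eligible voters in thousands

toy
```

This candidate spent 20 yen per eligible voter in a district of 300 thousand voters. Measuring voters in thousands is a presentational choice: it keeps the coefficients of eligible.t and of the interaction term from becoming very small numbers.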

DT::datatable(df1)

Show the descriptive statistics of df1

stargazer(as.data.frame(df1), type = "html")


Statistic	N	Mean	St. Dev.	Min	Pctl(25)	Pctl(75)	Max

year	989	2,005.000	0.000	2,005	2,005	2,005	2,005
age	989	50.292	10.871	25	42	58	81
voteshare	989	30.333	19.230	1	8.8	46.6	74
exp	985	8,142,244.000	5,569,641.000	62,710.000	2,917,435.000	11,822,797.000	24,649,710.000
eligible	989	344,654.300	63,898.230	214,235	297,385	397,210	465,181
nocand	989	3.435	0.740	2	3	4	6
term	989	1.975	2.721	0	0	3	16
exppv	985	24.627	17.907	0.148	8.352	35.269	89.332
eligible.t	989	344.654	63.898	214.235	297.385	397.210	465.181

3. Draw two scatter plots

# Scatter plot: campaign expenditure per voter vs. vote share
F1 <- ggplot(df1, aes(exppv, voteshare)) +
  geom_point() +
  labs(x = "Campaign expenditure per voter (yen)", y = "Vote share (%)",
       title = "Campaign expenditure and vote share") +
  stat_smooth(method = "lm", se = FALSE)   # add a fitted regression line

# Scatter plot: eligible voters vs. vote share
F2 <- ggplot(df1, aes(eligible.t, voteshare)) +
  geom_point() +
  labs(x = "Eligible voters (thousands)", y = "Vote share (%)",
       title = "Eligible voters and vote share") +
  stat_smooth(method = "lm", se = FALSE)

# Place the two plots side by side (patchwork)
F1 + F2 + plot_layout(ncol = 2)

We can see a positive relationship between exppv and voteshare.
We can see a very weak negative relationship between eligible.t and voteshare.
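One way to put a number on these visual impressions is the Pearson correlation coefficient. The sketch below uses a made-up data frame (toy) rather than df1. It includes a missing value on purpose, because exp (and therefore exppv) has N = 985 rather than 989 in the descriptive statistics above, so cor() on df1 would need the use = "complete.obs" argument.

```r
# Toy data with one missing expenditure value (all values are made up)
toy <- data.frame(
  exppv     = c(5, 12, NA, 30, 41, 55),
  voteshare = c(8, 15, 20, 33, 40, 52)
)

cor(toy$exppv, toy$voteshare)   # NA, because one pair is incomplete

# Drop incomplete pairs before computing the correlation
r <- cor(toy$exppv, toy$voteshare, use = "complete.obs")
r
```

Applied to the real data, the analogous call would be cor(df1$exppv, df1$voteshare, use = "complete.obs"); its value is not reported here because it has not been run on the election data.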

4. Model_1

age, nocand, and term are control variables

Note
・A control variable is a variable that is held constant in a research study.
・It is not of interest to our research aims, but we control for it because it could influence the outcome.
・In actual research, we would need to add more control variables, because it is reasonable to assume that other variables (such as the candidate’s gender) could also influence vote share.
・Here I add three control variables: age, nocand, and term.

4.1 Interaction term & marginal effects

To see interaction effects, we make an interaction term (exppv:eligible.t) by multiplying the following two independent variables:

major explanatory variable (exppv)
numerical moderator variable (eligible.t)

Note
・A moderator variable can be one of two types: categorical or continuous.
・In this section, I deal with a continuous (numeric) moderator variable.
・In 20. Multiple Regression 3 (Interaction Effects 1), I deal with a categorical moderator variable.

We include this interaction term in Model_1
This way, we can see how a third variable (eligible.t) influences the relationship between an explanatory and outcome variable.

Assumption of model_1:

The slope (exppv → voteshare) differs depending on the number of eligible voters (eligible.t).
This slope is equivalent to the marginal effect.
We estimate the following model:

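\[\begin{aligned}\textrm{voteshare} = {} & \alpha_1 + \alpha_2 \textrm{exppv} + \alpha_3 \textrm{eligible.t} + \alpha_4 \textrm{eligible.t} \times \textrm{exppv} \\ & + \alpha_5 \textrm{age} + \alpha_6 \textrm{nocand} + \alpha_7 \textrm{term} + \varepsilon\end{aligned}\]

We can rewrite the equation as follows:

\[\begin{aligned}\textrm{voteshare} = {} & \alpha_1 + (\alpha_2 + \alpha_4 \textrm{eligible.t}) \textrm{exppv} + \alpha_3 \textrm{eligible.t} \\ & + \alpha_5 \textrm{age} + \alpha_6 \textrm{nocand} + \alpha_7 \textrm{term} + \varepsilon\end{aligned}\]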

\(\alpha_2\)	slope (`exppv` → `voteshare`) when `eligible.t = 0`
\((\alpha_2 + \alpha_4 \textrm{eligible.t})\)	slope (`exppv` → `voteshare`) at a given value of `eligible.t` = Marginal Effect
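Another way to see why the term in parentheses is the marginal effect is to differentiate the model with respect to exppv, treating every other variable as fixed. This is a standard calculus step added here for illustration, using the same symbols as the model above:

\[\frac{\partial\, \textrm{voteshare}}{\partial\, \textrm{exppv}} = \alpha_2 + \alpha_4 \textrm{eligible.t}\]

The derivative is a function of eligible.t rather than a single number, so the slope has to be evaluated at specific values of the moderator. Setting eligible.t = 0 leaves only \(\alpha_2\).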

WHAT WE WANT TO KNOW:

Does the impact of campaign expenditure on vote share differ depending on the number of eligible voters (eligible.t)?

When we estimate a model with an interaction term, we need to check the marginal effects.

4.2 Results (Model_1)

model_1 <- lm(voteshare ~ exppv*eligible.t + age + nocand + term,
              data = df1)
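In an R formula, a*b is shorthand for a + b + a:b, so model_1 contains both constitutive terms and their product. The sketch below checks this on simulated data (the names x, z, w, and y are placeholders, not variables in df1). It also shows that R lists the interaction coefficient after all main effects, which is why exppv:eligible.t appears last in the output of model_1.

```r
set.seed(2)
toy <- data.frame(x = runif(100), z = runif(100), w = rnorm(100))
toy$y <- 1 + 2 * toy$x + 3 * toy$z + 4 * toy$x * toy$z + 0.5 * toy$w + rnorm(100)

fit_star <- lm(y ~ x * z + w, data = toy)         # shorthand with *
fit_long <- lm(y ~ x + z + w + x:z, data = toy)   # the same model written out

names(coef(fit_star))                       # interaction comes after the main effects
all.equal(coef(fit_star), coef(fit_long))   # identical estimates
```

Because coefficient positions depend on this ordering, indexing by name (for example coef(fit_star)["x:z"]) is less error-prone than indexing by position.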

Visually show the results: `jtools::plot_summs()`

jtools::plot_summs(model_1)

Show the results: `tidy()`

tidy(model_1)

# A tibble: 7 x 5
  term             estimate std.error statistic  p.value
  <chr>               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)      40.2      3.78        10.6   5.29e-25
2 exppv             0.0723   0.0931       0.777 4.37e- 1
3 eligible.t        0.00488  0.00876      0.557 5.78e- 1
4 age              -0.315    0.0340      -9.27  1.19e-19
5 nocand           -4.75     0.450      -10.5   1.19e-24
6 term              3.10     0.161       19.2   3.32e-70
7 exppv:eligible.t  0.00156  0.000290     5.39  8.82e- 8

Show the results: `stargazer()`

stargazer(model_1,  
          digits = 3,        
          style = "ajps", 
          title = "Results of model_1 (2005HR election)", 
          type ="html")

**Results of model_1 (2005HR election)**

	voteshare

exppv	0.072
	(0.093)
eligible.t	0.005
	(0.009)
age	-0.315^***
	(0.034)
nocand	-4.745^***
	(0.450)
term	3.102^***
	(0.161)
exppv:eligible.t	0.002^***
	(0.0003)
Constant	40.165^***
	(3.783)
N	985
R-squared	0.716
Adj. R-squared	0.715
Residual Std. Error	10.259 (df = 978)
F Statistic	411.786^*** (df = 6; 978)

^***p < .01; ^**p < .05; ^*p < .1

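We see that age, nocand, term, and the interaction term (exppv:eligible.t) are statistically significant, while the coefficients of exppv and eligible.t on their own are not.
Next, we need to interpret the results.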

4.3 Interpretation (model_1)

To interpret model_1, let’s estimate another model that does not include the interaction term (model_2) and compare the two to get a better understanding of the results.

Let’s make model_2

model_2 <- lm(voteshare ~ exppv + eligible.t + age + nocand + term,
              data = df1)

Show the results: `stargazer()`

stargazer(model_1, model_2,
          digits = 3,        
          style = "ajps", 
          title = "Results of model_1 and model_2(2005HR election)", 
          type ="html")

**Results of model_1 and model_2 (2005HR election)**

	voteshare
	Model 1	Model 2

exppv	0.072	0.559^***
	(0.093)	(0.023)
eligible.t	0.005	0.042^***
	(0.009)	(0.006)
age	-0.315^***	-0.318^***
	(0.034)	(0.035)
nocand	-4.745^***	-4.764^***
	(0.450)	(0.457)
term	3.102^***	3.252^***
	(0.161)	(0.161)
exppv:eligible.t	0.002^***
	(0.0003)
Constant	40.165^***	28.161^***
	(3.783)	(3.101)
N	985	985
R-squared	0.716	0.708
Adj. R-squared	0.715	0.707
Residual Std. Error	10.259 (df = 978)	10.405 (df = 979)
F Statistic	411.786^*** (df = 6; 978)	474.729^*** (df = 5; 979)

^***p < .01; ^**p < .05; ^*p < .1

tidy(model_1)

# A tibble: 7 x 5
  term             estimate std.error statistic  p.value
  <chr>               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)      40.2      3.78        10.6   5.29e-25
2 exppv             0.0723   0.0931       0.777 4.37e- 1
3 eligible.t        0.00488  0.00876      0.557 5.78e- 1
4 age              -0.315    0.0340      -9.27  1.19e-19
5 nocand           -4.75     0.450      -10.5   1.19e-24
6 term              3.10     0.161       19.2   3.32e-70
7 exppv:eligible.t  0.00156  0.000290     5.39  8.82e- 8

Important point: In model_1, the coefficient of exppv shows the effect of exppv only when eligible.t = 0.
→ Since eligible.t = 0 is unrealistic, we need to check the marginal effects as the number of eligible voters varies.

The coefficient of exppv (0.072) in model_1
→ When exppv increases by one yen, voteshare increases by 0.072 percentage points when eligible.t = 0.
To confirm this, let’s check the Sample Regression Function for model_1.

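\[\begin{aligned}\widehat{\textrm{voteshare}} &= 40.2 + 0.072 \textrm{exppv} + 0.002 \textrm{eligible.t} \times \textrm{exppv} + 0.005 \textrm{eligible.t} \\ &\quad - 0.315 \textrm{age} - 4.75 \textrm{nocand} + 3.1 \textrm{term} \\ &= 40.2 + (0.072 + 0.002 \textrm{eligible.t}) \textrm{exppv} + 0.005 \textrm{eligible.t} \\ &\quad - 0.315 \textrm{age} - 4.75 \textrm{nocand} + 3.1 \textrm{term}\end{aligned}\]

The impact of exppv on voteshare \((\alpha_2 + \alpha_4 \textrm{eligible.t})\) is:

\[0.072 + 0.002 \textrm{eligible.t}\]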

0.072 + 0.002 eligible.t is the marginal effect (exppv → voteshare) at a given value of eligible.t. For example, when eligible.t = 1 (one thousand voters), a one-yen increase in exppv raises vote share by 0.074 percentage points (0.072 + 0.002*1 = 0.074), and each additional thousand eligible voters raises this marginal effect by about 0.002.
We see that the marginal effect changes depending on the value of eligible.t.
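A worked example at realistic district sizes may help. The arithmetic below uses the less heavily rounded coefficients from the tidy() output (0.0723 and 0.00156) and the first and third quartiles of eligible.t from the descriptive statistics (about 297.4 and 397.2); the choice of these two evaluation points is only for illustration.

\[\begin{aligned} \textrm{eligible.t} = 297.4: &\quad 0.0723 + 0.00156 \times 297.4 \approx 0.54 \\ \textrm{eligible.t} = 397.2: &\quad 0.0723 + 0.00156 \times 397.2 \approx 0.69 \end{aligned}\]

So in a district at the third quartile, one additional yen per voter is associated with about 0.15 percentage points more vote share than in a district at the first quartile. Using the coefficient rounded to 0.002 instead would noticeably overstate both values, so it is safer to compute marginal effects from the unrounded estimates.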

5. Visualize Marginal Effects (Model_1)

Show the intercept and coefficients of model_1

model_1$coef

     (Intercept)            exppv       eligible.t              age 
    40.165022051      0.072328288      0.004875959     -0.315259777 
          nocand             term exppv:eligible.t 
    -4.745116609      3.102144872      0.001563733

Check the descriptive statistics of moderator (in this case, eligible.t)

summary(df1$eligible.t)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  214.2   297.4   347.9   344.7   397.2   465.2

Make 6 values of eligible.t starting from its min (214.2) in steps of 50 (the last value, 464.2, lies just below the max of 465.2).
Calculate the marginal effects (= slopes) for the 6 values
We need the 2nd coefficient (exppv) and the 7th coefficient (exppv:eligible.t) to calculate these marginal effects.

at.eligible.t <- seq(214.2, 465.2, 50)
slopes <- model_1$coef[2] + model_1$coef[7]*at.eligible.t
slopes

[1] 0.4072798 0.4854664 0.5636531 0.6418397 0.7200263 0.7982130

Using the delta method, we calculate the standard error and the 95% confidence interval for each marginal effect.

library(msm)

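estmean <- coef(model_1)   # estimated coefficients
var <- vcov(model_1)       # variance-covariance matrix of the coefficients

SEs <- rep(NA, length(at.eligible.t))

for (i in seq_along(at.eligible.t)){
    j <- at.eligible.t[i]
    # SE of (2nd coefficient) + (7th coefficient) * j
    SEs[i] <- deltamethod(~ (x2) + (x7)*j, estmean, var)
}

upper <- slopes + 1.96*SEs  # upper bound of the 95% confidence interval
lower <- slopes - 1.96*SEs  # lower bound of the 95% confidence interval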

cbind(at.eligible.t, slopes, upper, lower)

     at.eligible.t    slopes     upper     lower
[1,]         214.2 0.4072798 0.4783484 0.3362112
[2,]         264.2 0.4854664 0.5377139 0.4332189
[3,]         314.2 0.5636531 0.6086580 0.5186481
[4,]         364.2 0.6418397 0.6960400 0.5876394
[5,]         414.2 0.7200263 0.7939621 0.6460905
[6,]         464.2 0.7982130 0.8962533 0.7001726
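Because the marginal effect is a linear combination of two coefficients, the delta method here reduces to the usual variance formula for a sum: Var(a2) + z^2 Var(a4) + 2 z Cov(a2, a4), where z is the value of the moderator. The sketch below illustrates this with simulated data and base R only (x, z, y and the true coefficients are hypothetical stand-ins, not the election data). Applied to model_1, the same lines should give the same SEs as deltamethod(), although that has not been run here.

```r
set.seed(3)
n <- 500
x <- runif(n, 0, 90)      # stand-in for the main explanatory variable
z <- runif(n, 200, 470)   # stand-in for the moderator
y <- 30 + 0.07 * x + 0.005 * z + 0.0016 * x * z + rnorm(n, sd = 10)

fit <- lm(y ~ x * z)
b <- coef(fit)
V <- vcov(fit)

at_z <- seq(200, 450, 50)            # moderator values to evaluate
slopes <- b["x"] + b["x:z"] * at_z   # marginal effect of x at each value of z

# Analytic standard error of a linear combination of two coefficients
se <- sqrt(V["x", "x"] + at_z^2 * V["x:z", "x:z"] + 2 * at_z * V["x", "x:z"])

# The same quantity via the gradient g = (1, z), i.e. the delta method written out
keep <- c("x", "x:z")
se_grad <- sapply(at_z, function(zv) {
  g <- c(1, zv)
  sqrt(drop(t(g) %*% V[keep, keep] %*% g))
})

data.frame(at_z, slopes, lower = slopes - 1.96 * se, upper = slopes + 1.96 * se)
```

The covariance term matters: the two coefficients are usually strongly negatively correlated, so ignoring it would typically make the intervals far too wide.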

at.eligible.t shows the 6 values starting from the min (214.2) in steps of 50
slopes is the marginal effect of exppv on voteshare
The value (0.4854664) in row [2, ] of the slopes column
→ The marginal effect of exppv on voteshare when eligible.t = 264.2
The value (0.5636531) in row [3, ] of the slopes column
→ The marginal effect of exppv on voteshare when eligible.t = 314.2

★ `upper` and `lower` show the upper and lower bounds of the 95% confidence intervals.

→ This is important information when you check statistical significance: a marginal effect is statistically significant at the 5% level when its interval does not include 0.
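A simple way to read significance off these bounds is to check whether each interval lies entirely above (or entirely below) zero. The sketch below copies the bounds from the table above into a data frame; the object name ci and the column name significant are chosen here for illustration only.

```r
# Values copied from the cbind() output above
ci <- data.frame(
  at.eligible.t = c(214.2, 264.2, 314.2, 364.2, 414.2, 464.2),
  slopes = c(0.4072798, 0.4854664, 0.5636531, 0.6418397, 0.7200263, 0.7982130),
  upper  = c(0.4783484, 0.5377139, 0.6086580, 0.6960400, 0.7939621, 0.8962533),
  lower  = c(0.3362112, 0.4332189, 0.5186481, 0.5876394, 0.6460905, 0.7001726)
)

# TRUE when the 95% interval excludes zero
ci$significant <- ci$lower > 0 | ci$upper < 0
ci
```

Every row is TRUE, so the marginal effect of exppv is distinguishable from zero at each of the six district sizes.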

To visualize this result, we convert it into a data frame and name it msm_1

msm_1 <- cbind(at.eligible.t, slopes, upper, lower) %>% 
  as.data.frame()

msm_1

  at.eligible.t    slopes     upper     lower
1         214.2 0.4072798 0.4783484 0.3362112
2         264.2 0.4854664 0.5377139 0.4332189
3         314.2 0.5636531 0.6086580 0.5186481
4         364.2 0.6418397 0.6960400 0.5876394
5         414.2 0.7200263 0.7939621 0.6460905
6         464.2 0.7982130 0.8962533 0.7001726

Draw a graph showing how the marginal effect of exppv on voteshare changes with the number of eligible voters

msm_1 <- msm_1 %>% 
  ggplot(aes(x = at.eligible.t, 
             y = slopes,
             ymin = lower,
             ymax = upper)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_ribbon(alpha = 0.5, fill = "gray") +
  geom_line() +
  labs(x = "Eligible voters (thousands)", 
     y = "Marginal Effects of exppv on voteshare (Model_1)") +
  ggtitle('Marginal Effects') +
  ylim(0, 1) 

msm_1

Conclusion
・As the number of eligible voters increases, the impact of campaign expenditure on vote share increases.
・The marginal effect is statistically significant at the 5% level over the whole plotted range of eligible.t, because the 95% confidence interval never includes 0.

6. Statistical Significance (Marginal Effects)

7. How to show Marginal Effects

Using the msm package is one way to visualize marginal effects.
Let me introduce two other packages for visualizing the results.

7.1 `interplot`

library("interplot")

interplot_1 <- interplot(m = model_1, 
          var1 = "exppv",           # Major independent variable
          var2 = "eligible.t") +    # Moderator variable  
labs(x = "Eligible voters (thousands)", 
     y = "Marginal Effects of exppv on voteshare (Model_1)") +
  ggtitle('Marginal Effects') +
  geom_hline(yintercept = 0, linetype = 2) +
  ylim(0, 1) 

interplot_1

7.2 `margins`

library("margins")

margins_1 <- cplot(model_1,
                   x = "eligible.t",  # Moderator variable
                   dx = "exppv",      # Major independent variable
                   what = "effect",   # Marginal Effects
                   n = 6,             # Number of values of x to evaluate
                   draw = FALSE)      # Return the values instead of drawing a plot
margins_1

    xvals  yvals  upper  lower factor
 214.2350 0.4073 0.4784 0.3363  exppv
 264.4242 0.4858 0.5380 0.4336  exppv
 314.6134 0.5643 0.6093 0.5193  exppv
 364.8026 0.6428 0.6972 0.5884  exppv
 414.9918 0.7213 0.7956 0.6470  exppv
 465.1810 0.7997 0.8983 0.7012  exppv
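The xvals in this output differ slightly from the at.eligible.t values used with msm. They are consistent with an evenly spaced grid of n = 6 points running from the min to the max of eligible.t, which can be checked in base R using the min and max reported in the descriptive statistics (214.235 and 465.181). This is an inference from the printed numbers, not a statement about the internals of cplot().

```r
# Evenly spaced grid of 6 points between the observed min and max of eligible.t
grid <- seq(214.235, 465.181, length.out = 6)
round(grid, 4)
```

The msm calculation instead stepped from 214.2 in increments of 50, which is why its last evaluation point was 464.2 rather than the max.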

margins_1 <- margins_1  %>% 
  ggplot(aes(x = xvals, y = yvals, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_ribbon(alpha = 0.5, fill = "gray") +
  geom_line() +
  labs(x = "Eligible voters (thousands)", 
     y = "Marginal Effects of exppv on voteshare (Model_1)") +
  ggtitle('Marginal Effects') +
  ylim(0, 1) 

margins_1

8. Exercise

We want to know whether the impact of campaign expenditure (exppv) on vote share (voteshare) differs depending on the number of terms a candidate has served (term) in the 2012 lower house election in Japan.
The 2012 lower house election is known as the election that brought Prime Minister Abe back to power.
Data you use: hr96-17.csv
Japanese lower house election results (1996-2017)
Following the steps in 4. Model_1 in this section, answer the following 5 questions.

Q1: Using exp and eligible, make an expenditure variable, exppv, which shows the campaign expenditure of each candidate per eligible voter in their electoral district.

Select the following 5 variables, name the data frame df3, and show its descriptive statistics using stargazer package.

variable	detail
voteshare	Voteshare (%)
exppv	Campaign expenditure spent by each candidate per voter in their electoral district (yen)
age	Candidate’s age
nocand	Number of candidates in each district
term	Number of terms served as a lower house member

Q2: Run the following two models and show their results using stargazer package.

model_5 <- lm(voteshare ~ exppv*term + age + nocand,
              data = df3)

model_6 <- lm(voteshare ~ exppv + term + age + nocand,
              data = df3)

Q3: Show the two regression equations for model_5 and model_6

Q4: Draw a scatter plot for model_6

x-axis is exppv, y-axis is voteshare

Q5: Using the msm package, visualize the marginal effects in model_5 and explain the results.

Can you conclude that the impact of campaign expenditure (exppv) on vote share (voteshare) differs depending on the number of terms served (term) in the 2012 lower house election in Japan? Show your evidence and explain why.


21. Linear Regression 4 (Interaction 2)

Masahiko Asano

2021-10-19
