• R packages we use in this section
library(tidyverse)
library(stargazer)

1. What can we do with a dummy?

1.1 What is a dummy variable?

  • A dummy variable is one that takes only the value 0 or 1 to indicate the absence (or presence) of some categorical effect that may be expected to shift the outcome.
  • They can be thought of as numeric substitutes for qualitative facts in a regression model, sorting data into mutually exclusive categories (such as winner = 1, loser = 0 in an election).
  • Other examples of a dummy variable:
  1. male = 1, female = 0
  2. war = 1, no war = 0
  3. north = 1, south = 0
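
As a minimal sketch of how such a variable can be built in R, the snippet below turns a character vector into a 0/1 dummy. The vector `region_side` and its values are hypothetical illustration data, not part of the course data set.

```r
# Hypothetical illustration data: which half of the country each unit is in
region_side <- c("north", "south", "south", "north")

# A logical comparison gives TRUE/FALSE; as.numeric() maps TRUE -> 1, FALSE -> 0
north_dummy <- as.numeric(region_side == "north")

north_dummy
```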

1.2 What we can do with a dummy variable

Research Question

  • Does the location of a local government (north or south) matter in predicting local government performance in Italy?

When you do not control for the location dummy (left)

  • You see that the better the economic situation, the higher the government performance in Italy.

When you control for the location dummy (right)

  • You see that the location of a local government (north or south) matters in predicting local government performance in Italy.
  • Local government performance is higher in the north than in the south.
  • However, once location is controlled for, the economic situation no longer predicts government performance in Italy.

2. Economic Situation and Location

Theory: Social capital enhances local government’s performance.

Source: Robert Putnam (1994), Making Democracy Work: Civic Traditions in Modern Italy, Princeton, NJ: Princeton University Press.

Theory

  • The differences in local government performance can be explained by the degree of social capital in each region.

  • Social capital can be defined as “the networks of relationships among people who live and work in a particular society, enabling that society to function effectively”.

  • It involves the effective functioning of social groups through interpersonal relationships, a shared sense of identity, a shared understanding, shared norms, shared values, trust, cooperation, and reciprocity.

  • Social capital helps people cooperate with one another.
    In areas with more social capital, people trust and cooperate with one another more, which leads to higher-quality government performance.

3. Testing Goldberg’s Argument

✔ Goldberg’s argument (1996)

  • Italy’s north and south have very different histories, traditions, and cultures.
  • These differences between the north and the south explain differences in society, such as in politics and the economy.
  • So you need to take the difference between the north and the south into account when analyzing what explains government performance.
  • North → more social capital → higher government performance
  • South → less social capital → lower government performance
  • Let’s check whether Goldberg’s argument holds.

3.1 Does gov_p differ by location?

  • Download (putnam.csv)
  • Put the file (putnam.csv) into the data folder in your RProject folder
  • Load the data and name it putnam
putnam <- read_csv("data/putnam.csv") 
  • Show the list of variables the data contains
names(putnam)
[1] "region"   "gov_p"    "cc"       "econ"     "location"

Data

Type of variable   Variable   Details
Outcome            gov_p      Performance of Italian local governments
Predictor          region     Abbreviation of Italian local governments
Predictor          cc         Civic Community Index
Predictor          econ       Economy Index (the larger, the better)
Predictor          location   Area dummy (north, south)

  • Check putnam
DT::datatable(putnam)

Let’s check if the government performance differs by location

  • Draw a box plot
putnam %>% 
  ggplot(aes(x = location, y = gov_p, fill = location)) +
  geom_boxplot() +  # one box per location; no regression line, since x is categorical
  labs(x = "Location Dummy", y = "gov_p",
       title = "Government Performance in Italy by Location")

  • It looks like there is a clear difference between north and south

  • Conduct a t-test (unpaired)

t.test(putnam$gov_p[putnam$location == "north"],
       putnam$gov_p[putnam$location == "south"])

    Welch Two Sample t-test

data:  putnam$gov_p[putnam$location == "north"] and putnam$gov_p[putnam$location == "south"]
t = 6.8253, df = 14.552, p-value = 6.737e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.607777 8.808890
sample estimates:
mean of x mean of y 
 11.83333   5.12500 

Result

・Average gov_p (North) = 11.833
・Average gov_p (South) = 5.125
・The difference (North − South = 6.708) is statistically significant at the 1% significance level (p-value = 6.737e-06)
→ As Goldberg (1996) argues, there is a clear difference in government performance between north and south.
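
The same comparison can also be expressed as a regression on a dummy variable: the slope on the dummy equals the difference in group means. The sketch below uses a small hypothetical data frame (`toy`), not putnam.csv, so the numbers are purely illustrative.

```r
# Hypothetical toy data (not the Putnam data)
toy <- data.frame(
  gov_p    = c(10, 12, 14, 4, 5, 6),
  location = c("north", "north", "north", "south", "south", "south")
)

# north = 1, south = 0
toy$north <- as.numeric(toy$location == "north")

# Difference in group means (north minus south)
mean_diff <- mean(toy$gov_p[toy$north == 1]) - mean(toy$gov_p[toy$north == 0])

# Regressing the outcome on the dummy: intercept = mean of the south group,
# slope on the dummy = difference in means
fit <- lm(gov_p ~ north, data = toy)
dummy_slope <- unname(coef(fit)["north"])

all.equal(mean_diff, dummy_slope)
```

Note that t.test() defaults to Welch's unequal-variance test, so its standard error can differ from the one lm() reports even though the estimated difference is the same.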

  • The next question is whether the economic situation is related to government performance in both the north and the south.

3.2 Does econ explain gov_p?

  • It seems that the economy (econ) is related to government performance (gov_p) in Italy.

  • However, it is not yet clear whether this holds in both the north and the south.

  • Draw a scatter plot between econ and gov_p

putnam %>% 
  ggplot(aes(econ, gov_p)) +
  geom_point() +
  theme_bw() +
  labs(x = "econ", y = "gov_p",
         title = "Economic situation and Government Performance in Italy") + 
  stat_smooth(method = lm, se = FALSE)

  • We see a positive correlation between econ and gov_p.
    → The better the economic situation, the higher the local government performance.
  • Let’s get the Sample Regression Function (SRF) for model_1.
model_1 <- lm(gov_p ~ econ, data = putnam)

summary(model_1)

Call:
lm(formula = gov_p ~ econ, data = putnam)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.3386 -1.7733  0.0086  0.8336  5.5114 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.0108     1.3847   2.174 0.043264 *  
econ          0.5889     0.1200   4.909 0.000113 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.659 on 18 degrees of freedom
Multiple R-squared:  0.5724,    Adjusted R-squared:  0.5487 
F-statistic:  24.1 on 1 and 18 DF,  p-value: 0.0001131

\[\widehat{gov\_p} = 3.01 + 0.589\,econ\]
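
To make the fitted line concrete, here is a small sketch that plugs a value of econ into the SRF, using the coefficients printed by summary(model_1) above. The helper name `predict_model_1` is made up for illustration.

```r
# Coefficients copied from the summary(model_1) output above
b0 <- 3.0108  # intercept
b1 <- 0.5889  # slope on econ

# Hypothetical helper: predicted gov_p for a given econ score
predict_model_1 <- function(econ) b0 + b1 * econ

# Predicted performance at econ = 10, and the change for a one-unit increase
pred_10 <- predict_model_1(10)
step    <- predict_model_1(11) - predict_model_1(10)
```

With the model object available, predict(model_1, newdata = data.frame(econ = 10)) gives the same kind of prediction from the unrounded coefficients.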

  • Check the class of variables contained in putnam
str(putnam)
spec_tbl_df [20 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ region  : chr [1:20] "Ab" "Ba" "Cl" "Cm" ...
 $ gov_p   : num [1:20] 7.5 7.5 1.5 2.5 16 12 10 11 11 9 ...
 $ cc      : num [1:20] 8 4 1 2 18 17 13 16 17 15.5 ...
 $ econ    : num [1:20] 7 3 3 6.5 13 14.5 12.5 15.5 19 10.5 ...
 $ location: chr [1:20] "south" "south" "south" "south" ...
 - attr(*, "spec")=
  .. cols(
  ..   region = col_character(),
  ..   gov_p = col_double(),
  ..   cc = col_double(),
  ..   econ = col_double(),
  ..   location = col_character()
  .. )

→ Change the class of location from character to numeric
→ Save the result as a new data frame named df2

df2 <- mutate(putnam, 
              location = as.numeric(location == "north" )) # north = 1, south = 0
DT::datatable(df2)
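
Converting by hand is not strictly required: lm() treats a character or factor predictor as categorical and builds the dummy itself, using the alphabetically first level as the reference category. The sketch below illustrates this on a hypothetical toy data frame (not the Putnam data). With the levels north and south, the reference is north, so the reported coefficient is for south and has the opposite sign of a north = 1 dummy.

```r
# Hypothetical toy data (not the Putnam data)
toy <- data.frame(
  gov_p    = c(10, 12, 14, 4, 5, 6),
  location = c("north", "north", "north", "south", "south", "south")
)

# Character predictor: lm() creates a dummy named "locationsouth"
fit_chr <- lm(gov_p ~ location, data = toy)

# Manual dummy with north = 1, as in df2
toy$north <- as.numeric(toy$location == "north")
fit_num <- lm(gov_p ~ north, data = toy)

names(coef(fit_chr))  # "(Intercept)" "locationsouth"

# The two slopes are equal in size and opposite in sign, so they sum to zero
slope_sum <- coef(fit_chr)[["locationsouth"]] + coef(fit_num)[["north"]]
```

Coding the dummy explicitly, as the tutorial does, keeps the direction of the comparison (north relative to south) visible in the output.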
  • To see whether the economic situation (econ) is related to government performance (gov_p) in both the north and the south, we need to include econ and location in the regression model simultaneously.
model_2 <- lm(gov_p ~ econ + location, data = df2)
  • Show the results
  • Note: replace {r} with {r, results = "asis"} as the chunk option
stargazer(model_2, type = "html")
                     Dependent variable:
                     gov_p
econ                 -0.019
                     (0.220)
location             6.884***
                     (2.229)
Constant             5.222***
                     (1.347)
Observations         20
R2                   0.726
Adjusted R2          0.694
Residual Std. Error  2.190 (df = 17)
F Statistic          22.531*** (df = 2; 17)
Note:                *p<0.1; **p<0.05; ***p<0.01
  • We get the following SRF for model_2:

\[\widehat{gov\_p} = 5.222 - 0.019\,econ + 6.884\,location\]

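  • We see that, once location is controlled for, econ is not related to gov_p (the coefficient, -0.019, is not statistically significant).

  • We see that location is related to gov_p.
    → When location = 1 (that is, when the local government is located in the north), government performance is higher by 6.884 points, holding econ constant.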
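
The significance pattern reported for model_2 can be checked by hand: each t statistic is the estimate divided by its standard error, compared against a t distribution with the residual degrees of freedom (17 here). A rough sketch using the rounded values from the model_2 output, so results may differ slightly from what summary(model_2) would print:

```r
# Estimates and standard errors copied (rounded) from the model_2 output
est <- c(econ = -0.019, location = 6.884)
se  <- c(econ =  0.220, location = 2.229)
df_resid <- 17

# t statistic = estimate / standard error
t_stat <- est / se

# Two-sided p-values from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

round(t_stat, 2)
p_value < 0.01
```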

  • By substituting location = 0 and 1, we get the following two regression functions:
    Note: The two slopes are identical!

location = 0

\[\widehat{gov\_p} = 5.22 - 0.019\,econ\]

location = 1

\[\widehat{gov\_p} = 12.11 - 0.019\,econ\]
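
For reference, the substitution can be written out as follows. This is a short derivation using the rounded estimates from the model_2 output; the \(\hat\beta\) notation is introduced here only for illustration.

\[
\begin{aligned}
\widehat{gov\_p} &= \hat\beta_0 + \hat\beta_1\,econ + \hat\beta_2\,location \\
location = 0:\quad \widehat{gov\_p} &= \hat\beta_0 + \hat\beta_1\,econ = 5.222 - 0.019\,econ \\
location = 1:\quad \widehat{gov\_p} &= (\hat\beta_0 + \hat\beta_2) + \hat\beta_1\,econ = (5.222 + 6.884) - 0.019\,econ = 12.106 - 0.019\,econ
\end{aligned}
\]

The dummy shifts only the intercept (by \(\hat\beta_2 = 6.884\)), which is why the two lines share the same slope.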

  • Let’s visualize these two results by drawing scatter plots
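
One possible way to draw the controlled version of the plot is sketched below. It is not necessarily how the original figure was produced. It uses a hypothetical toy data frame in place of df2 and draws the fitted values of the additive model, so the two group lines are parallel.

```r
library(ggplot2)

# Hypothetical toy data standing in for df2 (location: 1 = north, 0 = south)
toy <- data.frame(
  econ     = c(3, 5, 7, 6, 12, 14, 16, 18),
  gov_p    = c(4, 6, 5, 5.5, 11, 13, 12, 12.5),
  location = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Additive model: one common slope for econ, separate intercepts by location
fit <- lm(gov_p ~ econ + location, data = toy)
toy$fitted <- predict(fit)

# Points plus the fitted line for each group; the two lines are parallel
p <- ggplot(toy, aes(x = econ, y = gov_p, color = factor(location))) +
  geom_point() +
  geom_line(aes(y = fitted)) +
  theme_bw() +
  labs(x = "econ", y = "gov_p", color = "location (1 = north)")

p
```

Using stat_smooth(method = lm) with a color aesthetic instead would fit a separate slope for each group, which is a different model from gov_p ~ econ + location.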

When you do not control for the location dummy (left)

  • You see that the better the economic situation, the higher the government performance in Italy.

When you control for the location dummy (right)

  • You see that the location of a local government (north or south) matters in predicting local government performance in Italy.
  • Local government performance is higher in the north than in the south.
  • Once location is controlled for, the economic situation no longer predicts government performance in Italy.

Result

・The relationship between the economic situation (econ) and government performance (gov_p) is a spurious correlation: it disappears once location is controlled for.

4. Testing Putnam’s Argument

4.1 Does cc explain gov_p?

  • It seems that civic community index (cc) is related to government performance (gov_p) in Italy.

  • However, it is not yet clear whether this holds in both the north and the south.

  • Draw a scatter plot between cc and gov_p

putnam %>% 
  ggplot(aes(cc, gov_p)) +
  geom_point() +
  theme_bw() +
  labs(x = "cc", y = "gov_p",
         title = "Civic Community Index and Government Performance in Italy") + 
  stat_smooth(method = lm, se = FALSE)

  • We see a positive correlation between cc and gov_p.
    → The higher the civic community index, the higher the local government performance.
  • Let’s get the Sample Regression Function (SRF) for model_3.
model_3 <- lm(gov_p ~ cc, data = putnam)

summary(model_3)

Call:
lm(formula = gov_p ~ cc, data = putnam)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5043 -1.3481 -0.2087  0.9764  3.4957 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.71115    0.84443   3.211  0.00485 ** 
cc           0.56730    0.06552   8.658 7.81e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.789 on 18 degrees of freedom
Multiple R-squared:  0.8064,    Adjusted R-squared:  0.7956 
F-statistic: 74.97 on 1 and 18 DF,  p-value: 7.806e-08

\[\widehat{gov\_p} = 2.711 + 0.567\,cc\]

  • Check the class of variables contained in putnam
str(putnam)
spec_tbl_df [20 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ region  : chr [1:20] "Ab" "Ba" "Cl" "Cm" ...
 $ gov_p   : num [1:20] 7.5 7.5 1.5 2.5 16 12 10 11 11 9 ...
 $ cc      : num [1:20] 8 4 1 2 18 17 13 16 17 15.5 ...
 $ econ    : num [1:20] 7 3 3 6.5 13 14.5 12.5 15.5 19 10.5 ...
 $ location: chr [1:20] "south" "south" "south" "south" ...
 - attr(*, "spec")=
  .. cols(
  ..   region = col_character(),
  ..   gov_p = col_double(),
  ..   cc = col_double(),
  ..   econ = col_double(),
  ..   location = col_character()
  .. )

→ As in Section 3.2, change the class of location from character to numeric
→ Save the result as a data frame named df2

df2 <- mutate(putnam, 
              location = as.numeric(location == "north" )) # north = 1, south = 0
DT::datatable(df2)
  • To see whether the civic community index (cc) is related to government performance (gov_p) in both the north and the south, we need to include cc and location in the regression model simultaneously.
# Note: this reassigns model_3, replacing the bivariate model (gov_p ~ cc) fitted above
model_3 <- lm(gov_p ~ cc + location, data = df2)
  • Show the results
  • Note: replace {r} with {r, results = "asis"} as the chunk option
stargazer(model_3, type = "html")
                     Dependent variable:
                     gov_p
cc                   0.571**
                     (0.215)
location             -0.048
                     (2.678)
Constant             2.698**
                     (1.121)
Observations         20
R2                   0.806
Adjusted R2          0.784
Residual Std. Error  1.841 (df = 17)
F Statistic          35.402*** (df = 2; 17)
Note:                *p<0.1; **p<0.05; ***p<0.01
  • We get the following SRF for model_3:

\[\widehat{gov\_p} = 2.698 + 0.571\,cc - 0.048\,location\]

  • We see that cc is related to gov_p, even after controlling for location.

  • We see that location is not related to gov_p once cc is controlled for.
    → Regardless of the value of location (that is, whether the local government is in the north or in the south), government performance does not differ significantly for a given value of cc.

  • By substituting location = 0 and 1, we get the following two regression functions:
    Note: The two slopes are identical!

location = 0

\[\widehat{gov\_p} = 2.698 + 0.571\,cc\]

location = 1

\[\widehat{gov\_p} = 2.650 + 0.571\,cc\]
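
As a quick arithmetic check, the two intercepts can be computed from the rounded coefficients in the model_3 output (cc 0.571, location -0.048, constant 2.698): the intercept for location = 0 is the constant, and the intercept for location = 1 is the constant plus the coefficient on location. The vector `b` below is just a hand-typed copy of those values.

```r
# Rounded coefficients copied from the model_3 (gov_p ~ cc + location) output
b <- c(constant = 2.698, cc = 0.571, location = -0.048)

intercept_south <- b[["constant"]]                    # location = 0
intercept_north <- b[["constant"]] + b[["location"]]  # location = 1

# Predicted gov_p at the same cc value differs only by the location coefficient
cc_value   <- 10
pred_south <- intercept_south + b[["cc"]] * cc_value
pred_north <- intercept_north + b[["cc"]] * cc_value
pred_gap   <- pred_north - pred_south
```

The gap between the two predictions equals the location coefficient whatever value of cc is chosen, which is what identical slopes mean in practice.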

  • Let’s visualize these two results by drawing scatter plots

When you do not control for the location dummy (left)

  • You see that the higher the Civic Community Index, the higher the government performance in Italy.

When you control for the location dummy (right)

  • You see that the location of a local government (north or south) is not related to local government performance (gov_p) in Italy.

Result

・The Civic Community Index (cc) matters in predicting government performance in both the north and the south of Italy.

4.2 What explains gov_p?

model_4 <- lm(gov_p ~ cc + econ + location, data = df2)
stargazer(model_1, model_2, model_3, model_4,
          type = "html")
                      Dependent variable: gov_p
                      (1)                      (2)                      (3)                      (4)
econ                  0.589***                 -0.019                                            -0.269
                      (0.120)                  (0.220)                                           (0.199)
cc                                                                      0.571**                  0.700***
                                                                        (0.215)                  (0.230)
location                                       6.884***                 -0.048                   0.858
                                               (2.229)                  (2.678)                  (2.698)
Constant              3.011**                  5.222***                 2.698**                  3.495**
                      (1.385)                  (1.347)                  (1.121)                  (1.243)
Observations          20                       20                       20                       20
R2                    0.572                    0.726                    0.806                    0.826
Adjusted R2           0.549                    0.694                    0.784                    0.794
Residual Std. Error   2.659 (df = 18)          2.190 (df = 17)          1.841 (df = 17)          1.797 (df = 16)
F Statistic           24.097*** (df = 1; 18)   22.531*** (df = 2; 17)   35.402*** (df = 2; 17)   25.370*** (df = 3; 16)
Note:                 *p<0.1; **p<0.05; ***p<0.01
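
When comparing several specifications, it can help to pull the fit statistics out of the model objects rather than reading them off the table. A minimal sketch on simulated toy data follows; the variable names mirror the tutorial, but the numbers are made up and do not reproduce the Putnam results.

```r
# Simulated toy data (not the Putnam data)
set.seed(1)
toy <- data.frame(cc = 1:20, econ = (1:20) + rnorm(20))
toy$gov_p <- 2 + 0.5 * toy$cc + rnorm(20)

# Three competing specifications stored in a named list
models <- list(
  econ_only = lm(gov_p ~ econ, data = toy),
  cc_only   = lm(gov_p ~ cc, data = toy),
  both      = lm(gov_p ~ cc + econ, data = toy)
)

# Adjusted R-squared for each model, collected into one named vector
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
round(adj_r2, 3)
```

Applied to the tutorial's objects, the same sapply() call over list(model_1, model_2, model_3, model_4) would return the values shown in the Adjusted R2 row.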

Conclusions

・The Civic Community Index (cc) matters in predicting government performance.

・The economic situation (econ) does not matter in predicting government performance once cc and location are controlled for.

・Location (location) does not matter in predicting government performance once cc is controlled for.

References
  • 飯田健『計量政治分析』共立出版, 2013.
  • Ellis Goldberg, "Thinking about How Democracy Works," Politics & Society, Vol. 24 (1996), pp. 7-18.
  • 宋財泫 (Jaehyun Song) and 矢内勇生 (Yuki Yanai), 「私たちのR: ベストプラクティスの探究」.
  • 土井翔平 (Graduate School of Public Policy, Hokkaido University), 「Rで計量政治学入門」.
  • 矢内勇生 (Kochi University of Technology), list of courses.
  • 浅野正彦, 矢内勇生『Rによる計量政治学』オーム社, 2018.
  • 浅野正彦, 中村公亮『初めてのRStudio』オーム社, 2018.
  • Winston Chang, R Graphics Cookbook, O’Reilly Media, 2012.
  • Kieran Healy, Data Visualization, Princeton University Press, 2019.
  • Kosuke Imai, Quantitative Social Science: An Introduction, Princeton University Press, 2017.