R packages we use in this section
library(tidyverse)
library(stargazer)
Research Question
Theory: Social capital enhances local government’s performance.
Source: Robert Putnam (1994), Making Democracy Work: Civic Traditions in Modern Italy, Princeton, NJ: Princeton University Press.
Theory
Differences in local government performance can be explained by the degree of social capital in each local government's area.
Social capital can be defined as “the networks of relationships among people who live and work in a particular society, enabling that society to function effectively”.
It involves the effective functioning of social groups through interpersonal relationships, a shared sense of identity, a shared understanding, shared norms, shared values, trust, cooperation, and reciprocity.
Social capital helps people cooperate with one another.
→ In areas with more social capital, people trust and cooperate with one another more, which leads to higher-quality government performance.
✔ Goldberg’s argument (1996)
Does gov_p differ by location?
Read putnam.csv from the data folder in your RProject folder and name the data frame putnam.
putnam <- read_csv("data/putnam.csv")
names(putnam)
[1] "region" "gov_p" "cc" "econ" "location"
Data:
Types of variables | Variables | Details |
---|---|---|
Outcome | gov_p | Performance of Italian local governments |
Predictor | region | Abbreviation of Italian local governments |
Predictor | cc | Civic Community Index |
Predictor | econ | Economy Index (the larger, the better) |
Predictor | location | Area dummy (north, south) |
Display putnam
DT::datatable(putnam)
# Box plot of government performance by location
putnam %>%
  ggplot(aes(x = location, y = gov_p, fill = location)) +
  geom_boxplot() +
  labs(x = "Location", y = "gov_p",
       title = "Government Performance in Italy by Location")
It looks like there is a clear difference between north and south.
Conduct a t-test (unpaired)
t.test(putnam$gov_p[putnam$location == "north"],
       putnam$gov_p[putnam$location == "south"])
Welch Two Sample t-test
data: putnam$gov_p[putnam$location == "north"] and putnam$gov_p[putnam$location == "south"]
t = 6.8253, df = 14.552, p-value = 6.737e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.607777 8.808890
sample estimates:
mean of x mean of y
11.83333 5.12500
Result
・Average gov_p (North) = 11.833
・Average gov_p (South) = 5.125
・The difference (North - South = 6.708) is statistically significant at the 1% significance level (p-value = 6.737e-06)
→ As Goldberg (1996) argues, there is a clear difference in government performance between north and south.
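As a quick sanity check, the difference in means can be recomputed from the t-test output above. This is a minimal sketch that hard-codes the reported values; the object names (mean_north, mean_south, ci, diff_means) are chosen here for illustration only.

```r
# Values copied from the t-test output above
mean_north <- 11.83333
mean_south <- 5.12500
ci <- c(4.607777, 8.808890)   # 95% confidence interval for the difference

# Difference in means (north minus south)
diff_means <- mean_north - mean_south
round(diff_means, 3)

# The interval contains the point estimate and excludes zero
diff_means > ci[1] & diff_means < ci[2]
ci[1] > 0
```

Because the whole interval lies above zero, the sign of the difference is positive: the North scores higher.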
Does econ explain gov_p?
It seems that the economy (econ) is related to government performance (gov_p) in Italy.
However, it is not yet clear whether this holds in both the northern and the southern areas.
Draw a scatter plot of econ and gov_p
putnam %>%
  ggplot(aes(econ, gov_p)) +
  geom_point() +
  theme_bw() +
  labs(x = "econ", y = "gov_p",
       title = "Economic situation and Government Performance in Italy") +
  stat_smooth(method = lm, se = FALSE)   # add the fitted regression line
Run a simple regression of gov_p on econ.
model_1 <- lm(gov_p ~ econ, data = putnam)
summary(model_1)
Call:
lm(formula = gov_p ~ econ, data = putnam)
Residuals:
Min 1Q Median 3Q Max
-4.3386 -1.7733 0.0086 0.8336 5.5114
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.0108 1.3847 2.174 0.043264 *
econ 0.5889 0.1200 4.909 0.000113 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.659 on 18 degrees of freedom
Multiple R-squared: 0.5724, Adjusted R-squared: 0.5487
F-statistic: 24.1 on 1 and 18 DF, p-value: 0.0001131
\[\widehat{gov_p}\ = 3.01 + 0.589econ\]
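To see how the fitted equation is used, the sketch below plugs values of econ into it. The coefficients are copied from summary(model_1); the helper predict_m1() and the econ values 5, 10 and 11 are hypothetical, chosen only for illustration.

```r
# Coefficients copied from summary(model_1) above
b0 <- 3.0108   # intercept
b1 <- 0.5889   # slope on econ

# Hypothetical helper: predicted gov_p for a given econ value
predict_m1 <- function(econ) b0 + b1 * econ

# Predicted values for two illustrative econ values
predict_m1(c(5, 10))

# A one-unit increase in econ changes the prediction by the slope
predict_m1(11) - predict_m1(10)
```

With the fitted model object available, predict(model_1, newdata = data.frame(econ = c(5, 10))) should give the same predictions up to rounding.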
Check the structure of putnam
str(putnam)
spec_tbl_df [20 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ region : chr [1:20] "Ab" "Ba" "Cl" "Cm" ...
$ gov_p : num [1:20] 7.5 7.5 1.5 2.5 16 12 10 11 11 9 ...
$ cc : num [1:20] 8 4 1 2 18 17 13 16 17 15.5 ...
$ econ : num [1:20] 7 3 3 6.5 13 14.5 12.5 15.5 19 10.5 ...
$ location: chr [1:20] "south" "south" "south" "south" ...
- attr(*, "spec")=
.. cols(
.. region = col_character(),
.. gov_p = col_double(),
.. cc = col_double(),
.. econ = col_double(),
.. location = col_character()
.. )
→ Change the class of location from character to numeric
→ Name the new data frame df2
df2 <- mutate(putnam,
              location = as.numeric(location == "north"))  # north = 1, south = 0
DT::datatable(df2)
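The recoding step relies on a logical comparison being converted to 0/1. A minimal sketch on a toy vector (not the actual putnam data; the names location_chr and location_num are made up here) shows what happens:

```r
# Toy vector for illustration only (not the actual putnam data)
location_chr <- c("south", "north", "north", "south")

# The comparison returns TRUE/FALSE; as.numeric() maps TRUE to 1 and FALSE to 0
location_num <- as.numeric(location_chr == "north")
location_num

# Cross-tabulate to confirm the coding: north = 1, south = 0
table(location_chr, location_num)
```

Running the same kind of cross-tabulation on putnam$location and df2$location is one way to confirm that the dummy was coded as intended.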
To check whether the economy (econ) is related to government performance (gov_p) in both the northern and the southern areas, we need to include econ and location in our regression model simultaneously.
model_2 <- lm(gov_p ~ econ + location, data = df2)
To show the stargazer table in R Markdown, replace {r} with {r, results = "asis"} as the chunk option.
stargazer(model_2, type = "html")
Dependent variable: | gov_p |
---|---|
econ | -0.019 (0.220) |
location | 6.884*** (2.229) |
Constant | 5.222*** (1.347) |
Observations | 20 |
R² | 0.726 |
Adjusted R² | 0.694 |
Residual Std. Error | 2.190 (df = 17) |
F Statistic | 22.531*** (df = 2; 17) |
Note: | * p<0.1; ** p<0.05; *** p<0.01 (standard errors in parentheses) |
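SRF (sample regression function) for model_2:
\[\widehat{gov_p}\ = 5.222 - 0.019econ + 6.884location\]
We see that econ is not related to gov_p once location is controlled for (the coefficient is not statistically significant).
We see that location is related to gov_p (the coefficient is statistically significant at the 1% level).
→ When location = 1 (that is, when the local government is located in the North), government performance is higher by 6.884 points, holding econ constant.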
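The significance statements can be checked by hand from the reported estimates and standard errors. The sketch below hard-codes the values from the model_2 table and compares each t statistic with the two-sided 5% critical value; the object names (est, se, t_stat, crit) are chosen here for illustration.

```r
# Estimates and standard errors copied from the stargazer table for model_2
est <- c(econ = -0.019, location = 6.884)
se  <- c(econ = 0.220,  location = 2.229)

# t statistic = estimate / standard error
t_stat <- est / se
round(t_stat, 2)

# Two-sided 5% critical value with 17 residual degrees of freedom
crit <- qt(0.975, df = 17)

# TRUE means the coefficient is significant at the 5% level
abs(t_stat) > crit
```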
By substituting location = 0 and location = 1, we get the following two regression functions:
Note: The two slopes are identical!
location = 0 (South)
\[\widehat{gov_p}\ = 5.222 - 0.019econ\]
location = 1 (North)
\[\widehat{gov_p}\ = 12.106 - 0.019econ\]
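Why the slopes coincide can be seen by writing the model with generic coefficients. The symbols \(\hat{\beta}_0\), \(\hat{\beta}_1\) and \(\hat{\beta}_2\) are notation introduced here for illustration; they stand for the constant, the econ coefficient and the location coefficient.
\[\widehat{gov_p}\ = \hat{\beta}_0 + \hat{\beta}_1 econ + \hat{\beta}_2 location\]
\[location = 0:\quad \widehat{gov_p}\ = \hat{\beta}_0 + \hat{\beta}_1 econ\]
\[location = 1:\quad \widehat{gov_p}\ = (\hat{\beta}_0 + \hat{\beta}_2) + \hat{\beta}_1 econ\]
The dummy only shifts the intercept by \(\hat{\beta}_2\); the slope on econ stays \(\hat{\beta}_1\) in both cases. Letting the slopes differ would require an interaction term between econ and location, which this model does not include.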
Result ・The relationship between the economic situation (econ) and government performance (gov_p) is a spurious correlation: it disappears once location is controlled for.
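The logic of a spurious correlation can be illustrated with simulated data. The sketch below is not based on the putnam data: all numbers, the seed and the data frame sim are made up. In it, location drives both econ and gov_p, while econ has no direct effect on gov_p.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated data for illustration only (not the putnam data)
n <- 200
location <- rbinom(n, size = 1, prob = 0.5)       # 1 = north, 0 = south
econ     <- 6 + 8 * location + rnorm(n, sd = 3)   # the north is richer
gov_p    <- 5 + 7 * location + rnorm(n, sd = 2)   # econ has no direct effect
sim <- data.frame(gov_p, econ, location)

# Without the control, econ picks up the effect of location
slope_simple <- coef(lm(gov_p ~ econ, data = sim))["econ"]

# With the control, the econ slope shrinks toward zero
slope_controlled <- coef(lm(gov_p ~ econ + location, data = sim))["econ"]

c(simple = unname(slope_simple), controlled = unname(slope_controlled))
```

The same pattern appears when comparing model_1 and model_2: the econ coefficient is clearly positive on its own and close to zero once location enters the model.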
Does cc explain gov_p?
It seems that the civic community index (cc) is related to government performance (gov_p) in Italy.
However, it is not yet clear whether this holds in both the northern and the southern areas.
Draw a scatter plot of cc and gov_p
putnam %>%
  ggplot(aes(cc, gov_p)) +
  geom_point() +
  theme_bw() +
  labs(x = "cc", y = "gov_p",
       title = "Civic Community Index and Government Performance in Italy") +
  stat_smooth(method = lm, se = FALSE)   # add the fitted regression line
Run a simple regression of gov_p on cc.
model_3 <- lm(gov_p ~ cc, data = putnam)
summary(model_3)
Call:
lm(formula = gov_p ~ cc, data = putnam)
Residuals:
Min 1Q Median 3Q Max
-2.5043 -1.3481 -0.2087 0.9764 3.4957
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.71115 0.84443 3.211 0.00485 **
cc 0.56730 0.06552 8.658 7.81e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.789 on 18 degrees of freedom
Multiple R-squared: 0.8064, Adjusted R-squared: 0.7956
F-statistic: 74.97 on 1 and 18 DF, p-value: 7.806e-08
\[\widehat{gov_p}\ = 2.711 + 0.567cc\]
To check whether the civic community index (cc) is related to government performance (gov_p) in both the northern and the southern areas, we need to include cc and location in our regression model simultaneously.
model_3 <- lm(gov_p ~ cc + location, data = df2)
To show the stargazer table in R Markdown, replace {r} with {r, results = "asis"} as the chunk option.
stargazer(model_3, type = "html")
Dependent variable: | gov_p |
---|---|
cc | 0.571** (0.215) |
location | -0.048 (2.678) |
Constant | 2.698** (1.121) |
Observations | 20 |
R² | 0.806 |
Adjusted R² | 0.784 |
Residual Std. Error | 1.841 (df = 17) |
F Statistic | 35.402*** (df = 2; 17) |
Note: | * p<0.1; ** p<0.05; *** p<0.01 (standard errors in parentheses) |
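SRF (sample regression function) for model_3:
\[\widehat{gov_p}\ = 2.698 + 0.571cc - 0.048location\]
We see that cc is related to gov_p (the coefficient is statistically significant at the 5% level).
We see that location is not related to gov_p once cc is controlled for (the coefficient is not statistically significant).
→ Regardless of the value of location (that is, whether the local government is in the North or in the South), government performance does not differ, holding cc constant.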
By substituting location = 0 and location = 1, we get the following two regression functions:
Note: The two slopes are identical!
location = 0 (South)
\[\widehat{gov_p}\ = 2.698 + 0.571cc\]
location = 1 (North)
\[\widehat{gov_p}\ = 2.650 + 0.571cc\]
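The two lines for model_3 can be checked numerically. This sketch hard-codes the coefficients from the model_3 table; the helper predict_m3() and the value cc = 10 are hypothetical, used only for illustration.

```r
# Coefficients copied from the stargazer table for model_3
b0    <- 2.698    # constant
b_cc  <- 0.571    # slope on cc
b_loc <- -0.048   # coefficient on the location dummy

# Hypothetical helper: predicted gov_p for given cc and location (0 = south, 1 = north)
predict_m3 <- function(cc, location) b0 + b_cc * cc + b_loc * location

# Intercepts of the two lines (cc = 0)
c(south = predict_m3(0, 0), north = predict_m3(0, 1))

# At any value of cc, the north-south gap equals the location coefficient
predict_m3(10, 1) - predict_m3(10, 0)
```

The gap of -0.048 points is tiny relative to its standard error (2.678), which is why the two lines are practically indistinguishable.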
Result ・The Civic Community Index (cc) matters in predicting government performance (gov_p) both in the north and in the south of Italy.
What explains gov_p?
model_4 <- lm(gov_p ~ cc + econ + location, data = df2)
stargazer(model_1, model_2, model_3, model_4,
          type = "html")
Dependent variable: gov_p | (1) | (2) | (3) | (4) |
---|---|---|---|---|
econ | 0.589*** (0.120) | -0.019 (0.220) | | -0.269 (0.199) |
cc | | | 0.571** (0.215) | 0.700*** (0.230) |
location | | 6.884*** (2.229) | -0.048 (2.678) | 0.858 (2.698) |
Constant | 3.011** (1.385) | 5.222*** (1.347) | 2.698** (1.121) | 3.495** (1.243) |
Observations | 20 | 20 | 20 | 20 |
R² | 0.572 | 0.726 | 0.806 | 0.826 |
Adjusted R² | 0.549 | 0.694 | 0.784 | 0.794 |
Residual Std. Error | 2.659 (df = 18) | 2.190 (df = 17) | 1.841 (df = 17) | 1.797 (df = 16) |
F Statistic | 24.097*** (df = 1; 18) | 22.531*** (df = 2; 17) | 35.402*** (df = 2; 17) | 25.370*** (df = 3; 16) |
Note: * p<0.1; ** p<0.05; *** p<0.01 (standard errors in parentheses)
Conclusions
・The Civic Community Index (cc) matters in predicting government performance
・The economic situation (econ) does not matter in predicting government performance once cc is controlled for
・The location (location) does not matter in predicting government performance once cc is controlled for