• R packages used in this section
library(DT)
library(gapminder)
library(gghighlight)
library(ggrepel)
library(stargazer)
library(tidyverse)

1. Visualization by types of variables

Variable type   Type of visualization   Variables needed
Discrete        Bar chart               1 or more
Continuous      Histogram               1 or more
Continuous      Box plot                1 or more
Continuous      Lollipop chart          1 or more
Continuous      Scatterplot             2 or more
Continuous      Line graph              2 or more

2. Tips on installing packages

2.1 Install packages via Github

  • Some packages are not available on CRAN; jpndistrict is one example
  • If you try to install such a package with install.packages(), you will see a message like this:
  • package ‘jpndistrict’ is not available for this version of R
    → In that case, install jpndistrict from GitHub by typing the following commands in the Console:
install.packages("remotes")
remotes::install_github("uribo/jpndistrict")

2.2 Install multiple packages at the same time

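  • You can install a package together with the packages it depends on by setting dependencies = TRUE, as in the following command typed in the Console (to install several packages in one call, pass a character vector such as c("DT", "stargazer") to install.packages()):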
install.packages("rnaturalearth", dependencies = TRUE)
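If you want to install several packages at once but skip the ones you already have, a small helper like the one below can be used. It is a minimal sketch: install_if_missing() is a hypothetical name (not a function from any package), and it relies only on base R's installed.packages() and install.packages().

```r
# install_if_missing() is a hypothetical helper, not part of any package
install_if_missing <- function(pkgs) {
  # Names of all packages already installed in your library
  installed <- rownames(installed.packages())
  missing <- setdiff(pkgs, installed)
  if (length(missing) > 0) {
    # A character vector installs several packages in one call
    install.packages(missing, dependencies = TRUE)
  }
  invisible(missing)  # names of the packages that had to be installed
}

# Usage (commented out so that nothing is downloaded when you run this block):
# install_if_missing(c("DT", "gapminder", "gghighlight", "ggrepel", "stargazer", "tidyverse"))
```

Because already-installed packages are skipped, you can keep such a call at the top of a script and rerun it safely.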

2.3 Avoid conflict among packages

  • For instance, when you load the tidyverse package with library(), you will see a message like the following:
library(tidyverse)

What this message means:

  • Loading tidyverse attaches its core packages at once (8 in the version used here): ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, forcats
  • The message also reports two conflicts: dplyr::filter() masks stats::filter(), and dplyr::lag() masks stats::lag()
    → You can avoid this conflict by writing the package name in front of the function with ::, as follows:

filter() → dplyr::filter()
lag() → dplyr::lag()
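To see why the :: prefix matters, the sketch below calls both versions of filter() on tiny hypothetical data: dplyr::filter() keeps the rows that satisfy a condition, while stats::filter() computes a moving average of a numeric series. Writing the package name makes it explicit which one you mean, regardless of the order in which packages were loaded.

```r
library(dplyr)  # attaching dplyr masks stats::filter() and stats::lag()

df <- data.frame(x = 1:5)  # hypothetical toy data

# dplyr::filter(): keep the rows that satisfy a condition (x > 2)
rows_kept <- dplyr::filter(df, x > 2)
rows_kept

# stats::filter(): a 3-point moving average of the series 1, 2, ..., 5
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg
```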

2.4 How to avoid Garbled characters in ggplot

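  • If Japanese characters are garbled in graphs made with ggplot, specify a Japanese font in either of the following two ways (the Hiragino fonts below come with macOS)
  • Add theme_bw(base_family = ...) to an individual plot with +, or use theme_set() to apply the font to every plot you draw afterwards: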
theme_bw(base_family = "HiraKakuProN-W3")
theme_set(theme_classic(base_size = 10,
                        base_family = "HiraginoSans-W3"))

2.5 How to load a package

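  • There are two ways of using a function from a package in RStudio
  • For instance, if you want to use the datatable() function in the DT package, you can do it in either of the following two ways:
library(DT)          # Way 1: attach the package with library(), then call the function
datatable(df1)
DT::datatable(df1)   # Way 2: call package::function() without attaching the package

In this section, I use them interchangeably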

2.6 How to make a R Project

  • It is very useful and efficient to make an R Project when you work in RStudio
  • Making an R Project = making an R Project folder
  • A folder = a directory
  • Making an R Project folder dramatically increases the efficiency of your analysis workflow
  • The working directory is the folder that R reads files from and saves files to; when you open an R Project, RStudio sets the working directory to the project folder, so relative paths such as "data/hr96-17.csv" work
  • Working in RStudio lets you keep a precise record ("take notes") of what you are doing and what you have done (people easily forget what they have done)

3. Data Preparation

3.1 Data Cleaning on Election Data (1996-2017)

  • Download hr96-17.csv
  • Make a folder within your R Project folder and name it data
  • Put the hr96-17.csv in the data folder
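  • We read hr96-17.csv with base R's read.csv() function; readr's read_csv(), which is loaded with the tidyverse package, is an alternative
  • Missing values in this file are written as ".", so we tell R to treat "." as NA
library(tidyverse)
df <- read.csv("data/hr96-17.csv", 
               na.strings = ".")   # treat "." as a missing value (NA)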
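The effect of na.strings can be checked on a tiny inline CSV. The values below are hypothetical and only mimic the layout of hr96-17.csv; read.csv(text = ...) reads the CSV from a string instead of a file.

```r
# Hypothetical CSV text that mimics hr96-17.csv: "." marks a missing value
csv_text <- "name,exp,eligible
A,9828097,346774
B,.,338310
C,11245219,."

# na.strings = "." turns every "." into NA, so exp and eligible stay numeric
toy <- read.csv(text = csv_text, na.strings = ".")
str(toy)
```

Without na.strings = ".", the "." entries cannot be parsed as numbers, so exp and eligible would be read as character columns, which is the situation handled later in this section when making exppv.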
  • Show the list of variables in df
names(df)
 [1] "year"          "pref"          "ku"            "kun"          
 [5] "mag"           "rank"          "wl"            "nocand"       
 [9] "seito"         "j_name"        "name"          "term"         
[13] "gender"        "age"           "exp"           "status"       
[17] "vote"          "voteshare"     "eligible"      "turnout"      
[21] "castvotes"     "seshu_dummy"   "jiban_seshu"   "nojiban_seshu"

Make new variables and modify existing ones

Make a dummy variable (wlsmd)
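  • We make a dummy variable (wlsmd) from the variable wl
  • A zombie winner is a candidate who lost in the single-member district but won a seat through the proportional representation list
variable   detail
wl         0 = loser / 1 = single-member district (SMD) winner / 2 = zombie winner
wlsmd      0 = loser or zombie winner / 1 = SMD winner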
table(df$wl)

   0    1    2 
5563 2387  853 
df1 <- mutate(df, wlsmd = as.numeric(wl == 1)) 
table(df1$wlsmd)

   0    1 
6416 2387 
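The recoding can be checked on a tiny hypothetical data frame (not the election data): wl == 1 is TRUE only for SMD winners, and as.numeric() turns TRUE/FALSE into 1/0, so zombie winners end up with wlsmd = 0 together with the losers. This is why the 0 row above (6416) equals 5563 losers plus 853 zombie winners.

```r
library(dplyr)

# Hypothetical toy data: wl = 0 (loser), 1 (SMD winner), 2 (zombie winner)
toy <- data.frame(wl = c(0, 1, 2, 1, 0, 2))

# wl == 1 is TRUE only for SMD winners; as.numeric() converts TRUE/FALSE to 1/0
toy <- mutate(toy, wlsmd = as.numeric(wl == 1))

toy$wlsmd                 # zombie winners (wl == 2) are coded 0
table(toy$wl, toy$wlsmd)
```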
Make a variable (exppv)
  • exp is the election expenditure (yen) spent by each candidate
  • eligible is the number of eligible voters in each single-member district
  • We make exppv, the election expenditure (yen) per eligible voter spent by each candidate
df1 <- mutate(df1, exppv = exp / eligible) 
summary(df1$exppv)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
  0.0013   8.1762  18.7646  23.0907  33.3863 120.8519     1974 
  • If you get an error when you make exppv, a likely cause is that exp or eligible is not stored as a number
  • In that case, delete the command df1 <- mutate(df1, exppv = exp / eligible), follow the steps below, and then type the command again
Step  Command                                    Detail
1     str(df1$exp)                               Check the class of exp
2                                                If the class is num or int, go to Step 4; otherwise, go to Step 3
3     df1$exp <- as.numeric(df1$exp)             Change the class of exp to num
4     str(df1$eligible)                          Check the class of eligible
5                                                If the class is num or int, go to Step 7; otherwise, go to Step 6
6     df1$eligible <- as.numeric(df1$eligible)   Change the class of eligible to num
7     str(df1$exp)                               Check the class of exp again
8                                                If the class is num or int, it is OK
9     str(df1$eligible)                          Check the class of eligible again
10                                               If the class is num or int, it is OK
  • If R reads exp and eligible (which are supposed to be numeric) as character rather than numeric, change the class of each variable to numeric with the as.numeric() function
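A minimal sketch of the check-and-convert steps on hypothetical toy data. It also shows one pitfall worth knowing: if a column was read as a factor rather than as character, as.numeric() returns the internal level codes instead of the values, so the column has to go through as.character() first.

```r
# Hypothetical toy data: numbers stored as text (exp) and as a factor (eligible)
toy <- data.frame(exp      = c("9828097", "12940178", "."),
                  eligible = factor(c("346774", "338310", "331808")),
                  stringsAsFactors = FALSE)
str(toy)

# Character -> numeric: entries that are not numbers (here ".") become NA
toy$exp <- suppressWarnings(as.numeric(toy$exp))

# Factor -> numeric: as.numeric() alone returns the level codes, not the values
wrong <- as.numeric(toy$eligible)
wrong

# Convert to character first to keep the actual numbers
toy$eligible <- as.numeric(as.character(toy$eligible))

toy$exppv <- toy$exp / toy$eligible
str(toy)
```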
Make a dummy variable (inc)
  • We make a dummy variable (inc) from the variable status
variable   detail
status     0 = challenger / 1 = incumbent / 2 = former incumbent
inc        0 = non-incumbent (challenger or former incumbent) / 1 = incumbent
table(df1$status)

   0    1    2 
5106 3129  568 
df1 <- mutate(df1, inc = as.numeric(status == 1 )) 
table(df1$inc)

   0    1 
5674 3129 
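After recoding, cross-tabulating the original and the new variable is a quick way to confirm that every category went where you intended. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: status = 0 (challenger), 1 (incumbent), 2 (former incumbent)
toy <- data.frame(status = c(0, 1, 2, 1, 0))
toy <- mutate(toy, inc = as.numeric(status == 1))

# Rows: original status; columns: new inc.
# Only status 1 should appear in the inc = 1 column
tab <- table(status = toy$status, inc = toy$inc)
tab
```

With the real data, table(df1$status, df1$inc) should likewise put all 3129 incumbents in the inc = 1 column.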
names(df1)
 [1] "year"          "pref"          "ku"            "kun"          
 [5] "mag"           "rank"          "wl"            "nocand"       
 [9] "seito"         "j_name"        "name"          "term"         
[13] "gender"        "age"           "exp"           "status"       
[17] "vote"          "voteshare"     "eligible"      "turnout"      
[21] "castvotes"     "seshu_dummy"   "jiban_seshu"   "nojiban_seshu"
[25] "wlsmd"         "exppv"         "inc"          
Make a dummy variable (ldp)
  • seito is each candidate's affiliated party, recorded in Japanese
table(df1$seito)

          アイヌ民族党 さわやか神戸・市民の会     ニューディールの会 
                     1                      2                      1 
                みんな               安楽死党                   維新 
                    79                      1                     77 
        沖縄社会大衆党                   改革             改革クラブ 
                     1                      1                      4 
              希望の党                   共産                   公明 
                   198                   2123                     70 
                  幸福               国民新党                 国民党 
                   312                     21                     11 
      市民新党にいがた                   次世                   自民 
                     1                     39                   2266 
                自由党               自由連合                   社民 
                    61                    212                    307 
                  緒派                   諸派               新社会党 
                    44                      9                     38 
                新進党           新党さきがけ               新党尊命 
                   235                     13                      1 
              新党大地               新党日本       世界経済共同体党 
                     8                      9                      2 
        政事公団太平会           政治団体代表                   生活 
                     1                      2                     13 
            青年自由党             当たり前党           日本維新の会 
                     1                      1                    198 
            日本新進党           日本未来の党         文化フォーラム 
                     1                    111                     10 
              保守新党                 保守党                   民主 
                    11                     16                   1654 
          民主改革連合                 無所属             無所属の会 
                     2                    562                      9 
              立憲民主                 緑の党 
                    63                      1 
  • Using seito, we make a dummy variable, ldp
  • ldp = 1: LDP candidates; ldp = 0: non-LDP candidates
  • 「自民」 means LDP in Japanese
df1 <- mutate(df1, ldp = as.numeric(seito == "自民" )) 
table(df1$ldp)

   0    1 
6537 2266 
names(df1)
 [1] "year"          "pref"          "ku"            "kun"          
 [5] "mag"           "rank"          "wl"            "nocand"       
 [9] "seito"         "j_name"        "name"          "term"         
[13] "gender"        "age"           "exp"           "status"       
[17] "vote"          "voteshare"     "eligible"      "turnout"      
[21] "castvotes"     "seshu_dummy"   "jiban_seshu"   "nojiban_seshu"
[25] "wlsmd"         "exppv"         "inc"           "ldp"          
  • df1 contains the following 28 variables
variable       detail
year           Election year (1996-2017)
pref           Prefecture
ku             Electoral district name
kun            District number
mag            District magnitude (number of candidates elected)
rank           Rank by votes received (1 = most votes)
nocand         Number of candidates in each district
seito          Candidate's affiliated party
j_name         Candidate's name (Japanese)
name           Candidate's name (English)
term           Number of previous wins
gender         Candidate's gender: "male", "female"
age            Candidate's age
wl             0 = loser / 1 = single-member district (SMD) winner / 2 = zombie winner
wlsmd          0 = loser or zombie winner / 1 = SMD winner
exp            Election expenditure (yen) spent by each candidate
status         0 = challenger / 1 = incumbent / 2 = former incumbent
vote           Votes each candidate garnered
voteshare      Vote share (%)
eligible       Eligible voters in each district
turnout        Turnout in each district (%)
castvotes      Total votes cast in each district
seshu_dummy    0 = non-hereditary candidate / 1 = hereditary candidate
jiban_seshu    Relationship between the candidate and his or her predecessor
nojiban_seshu  Relationship between the candidate and his or her predecessor
exppv          Election expenditure (yen) per eligible voter spent by each candidate
inc            0 = non-incumbent / 1 = incumbent
ldp            0 = non-LDP candidate / 1 = LDP candidate

3.2 Descriptive Statistics of Japan’s National Elections

  • Show descriptive statistics of Japan’s national election data (df1)
library(stargazer)
  • Set the chunk option results = "asis" (i.e., start the chunk with {r, results = "asis"}) so that the HTML output is displayed as a table
stargazer(as.data.frame(df1), 
          type ="html",
          digits = 2)
Statistic    N      Mean          St. Dev.      Min         Pctl(25)      Pctl(75)       Max
year         8,803  2,006.60      6.81          1,996       2,000         2,012          2,017
kun          8,803  5.74          5.06          1           2             8              25
mag          8,803  1.00          0.00          1           1             1              1
rank         8,803  2.70          21.36         1           1             3              2,003
wl           8,803  0.46          0.67          0           0             1              2
nocand       8,803  3.96          1.08          2           3             5              9
term         8,803  1.86          2.68          0           0             3              20
age          8,799  50.90         11.08         25.00       43.00         59.00          94.00
exp          6,829  7,551,393.00  5,482,684.00  535.00      2,803,567.00  11,044,412.00  27,462,362.00
status       8,803  0.48          0.62          0           0             1              2
vote         8,803  54,911.15     40,467.97     177         18,239.5      86,494.5       201,461
voteshare    8,803  27.08         19.19         0           8.9           42.9           95
eligible     7,928  326,092.00    79,708.01     115,013.00  269,945.80    390,965.00     495,212.00
turnout      6,992  62.84         6.39          44.71       57.74         67.50          83.80
castvotes    6,992  210,416.40    41,101.89     104,398.00  181,016.20    237,484.00     339,780.00
seshu_dummy  8,803  0.12          0.32          0           0             0              1
wlsmd        8,803  0.27          0.44          0           0             1              1
exppv        6,829  23.09         18.13         0.001       8.18          33.39          120.85
inc          8,803  0.36          0.48          0           0             1              1
ldp          8,803  0.26          0.44          0           0             1              1
  • df1 contains 28 variables
  • But we see only 20 variables here

Reason:

  • stargazer() summarizes only numeric variables, i.e., variables whose class is integer or double

  • Character and factor variables are not numeric, so the 8 character variables in df1 (pref, ku, seito, j_name, name, gender, jiban_seshu, nojiban_seshu) are left out

It is very important to check the class of variable in data visualization

  • Check the class of variables in df1 using str() function
str(df1)
'data.frame':   8803 obs. of  28 variables:
 $ year         : int  1996 1996 1996 1996 1996 1996 1996 1996 1996 1996 ...
 $ pref         : chr  "愛知" "愛知" "愛知" "愛知" ...
 $ ku           : chr  "aichi" "aichi" "aichi" "aichi" ...
 $ kun          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ mag          : int  1 1 1 1 1 1 1 1 1 1 ...
 $ rank         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ wl           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ nocand       : int  7 8 7 6 7 8 7 5 7 7 ...
 $ seito        : chr  "新進党" "新進党" "新進党" "新進党" ...
 $ j_name       : chr  "河村たかし" "青木宏之" "吉田幸弘" "三沢淳" ...
 $ name         : chr  "KAWAMURA, TAKASHI" "AOKI, HIROYUKI" "YOSHIDA, YUKIHIRO" "MISAWA, JUN" ...
 $ term         : int  2 2 1 1 3 8 7 3 13 2 ...
 $ gender       : chr  "male" "male" "male" "male" ...
 $ age          : int  47 51 35 44 48 68 55 59 65 53 ...
 $ exp          : int  9828097 12940178 11245219 12134215 11894801 11252336 13493050 6368857 19731389 18863794 ...
 $ status       : int  1 1 0 0 1 1 1 1 1 1 ...
 $ vote         : int  66876 56101 52478 57361 48648 90812 91439 93053 111578 110820 ...
 $ voteshare    : num  40 32.9 32.3 35.7 30.9 39.7 47.5 44.4 47.7 46.4 ...
 $ eligible     : int  346774 338310 331808 315704 319846 433930 357984 377152 393953 437148 ...
 $ turnout      : num  49.2 51.8 50.4 52 50.3 54.2 55.5 57.1 60.6 56 ...
 $ castvotes    : int  167051 170317 162679 160548 157404 228631 192362 209450 234001 238646 ...
 $ seshu_dummy  : int  0 0 0 0 1 0 0 1 0 1 ...
 $ jiban_seshu  : chr  NA NA NA NA ...
 $ nojiban_seshu: chr  NA NA NA NA ...
 $ wlsmd        : num  1 1 1 1 1 1 1 1 1 1 ...
 $ exppv        : num  28.3 38.2 33.9 38.4 37.2 ...
 $ inc          : num  1 1 0 0 1 1 1 1 1 1 ...
 $ ldp          : num  0 0 0 0 0 0 0 1 0 0 ...
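To list only the columns that are numeric (integer or double), and that would therefore show up in a numeric summary like the stargazer() table above, you can apply is.numeric() to every column. The toy data frame below is hypothetical; with the real data, names(df1)[sapply(df1, is.numeric)] should return the 20 summarized variables.

```r
# Hypothetical toy data mixing integer, double, character, and factor columns
toy <- data.frame(year      = c(1996L, 2000L),
                  voteshare = c(40, 32.9),
                  seito     = c("LDP", "DPJ"),
                  gender    = factor(c("male", "female")),
                  stringsAsFactors = FALSE)

# is.numeric() is TRUE for integer and double columns, FALSE for character and factor
is_num <- sapply(toy, is.numeric)
num_cols <- names(toy)[is_num]
num_cols
sum(!is_num)   # number of columns a numeric summary would leave out
```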

4. Bar Chart

4.1 The Number of Candidates in HR Elections (1996-2017)

  • Draw a bar chart representing the number of candidates in each Lower House election between 1996 and 2017

  • Check the number of candidates with the table() function

table(df1$year)

1996 2000 2003 2005 2009 2012 2014 2017 
1261 1199 1026  989 1139 1294  959  936 
  • The x-axis shows the election year
  • The y-axis shows the number of candidates running for election
  • We use the election data df1

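Caution: Windows users should run either of the following two commands to avoid garbled characters, and then use the name registered there (base_family = "YuGothic" or base_family = "Noto") instead of "HiraKakuProN-W3"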

  1. windowsFonts(YuGothic = windowsFont("Yu Gothic"))
  2. windowsFonts(Noto = windowsFont("Noto Sans CJK JP"))
df1 %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", y = "The number of candidates") + 
  theme_bw(base_family = "HiraKakuProN-W3")   

  • Something is wrong with the x-axis: instead of one label per election year, it shows evenly spaced tick marks
str(df1$year)
 int [1:8803] 1996 1996 1996 1996 1996 1996 1996 1996 1996 1996 ...
  • Reason => The class of year is integer (a numeric type), so ggplot treats it as a continuous variable

Solution

  • Change the class of year from numeric to factor
df1$year <- factor(df1$year) 
str(df1$year)
 Factor w/ 8 levels "1996","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
df1 %>% 
  ggplot() +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", y = "The number of candidates") + 
  theme_bw(base_family = "HiraKakuProN-W3") 

What we can see from the bar chart:
  • The number of candidates in Japan's lower house elections has been decreasing since 1996 (except in 2009 and 2012)
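The effect of making year a factor can be checked on hypothetical toy data. As a factor, year is discrete, so geom_bar() draws one bar per election year, and ggplot_build() lets you inspect the count it computed for each bar.

```r
library(ggplot2)

# Hypothetical toy data: one row per candidate
toy <- data.frame(year = c(1996L, 1996L, 1996L, 2000L, 2000L, 2003L))

# As a factor, year is discrete: one bar and one axis label per election year
toy$year <- factor(toy$year)
levels(toy$year)

p <- ggplot(toy) +
  geom_bar(aes(x = year)) +
  labs(x = "Election Year", y = "The number of candidates")

# ggplot_build() exposes the data computed for each layer;
# `count` holds the height of each bar
counts <- ggplot_build(p)$data[[1]]$count
counts
```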

4.2 The Number of Candidates (LDP and Non-LDP)

  • Visualize the numbers of LDP and non-LDP candidates in each election
  • Calculate the numbers of LDP and non-LDP candidates
df2 <- df1 %>%
  group_by(year, ldp) %>%
  summarise(N = n(),
            .groups   = "drop")

df2
# A tibble: 16 x 3
   year    ldp     N
   <fct> <dbl> <int>
 1 1996      0   973
 2 1996      1   288
 3 2000      0   928
 4 2000      1   271
 5 2003      0   749
 6 2003      1   277
 7 2005      0   699
 8 2005      1   290
 9 2009      0   848
10 2009      1   291
11 2012      0  1005
12 2012      1   289
13 2014      0   676
14 2014      1   283
15 2017      0   659
16 2017      1   277
  • Using the group_by() and summarize() functions, calculate the mean number of candidates per election for non-LDP (ldp = 0) and LDP (ldp = 1) candidates
df2 %>%
  group_by(ldp) %>% 
  summarize(N = mean(N, na.rm = TRUE),
            .groups = "drop")
# A tibble: 2 x 2
    ldp     N
  <dbl> <dbl>
1     0  817.
2     1  283.
  • The average number of non-LDP candidates per election: 817
  • The average number of LDP candidates per election: 283
  • Check the class of the variable ldp
class(df2$ldp)
[1] "numeric"
  • Change the class of ldp from dbl to factor
df2$ldp <- factor(df2$ldp) 
class(df2$ldp)
[1] "factor"
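As a side note, dplyr's count() gives the same per-group counts as group_by() followed by summarise(n()), in one step. A minimal sketch on hypothetical toy data; name = "N" only makes the column name match df2.

```r
library(dplyr)

# Hypothetical toy data: one row per candidate
toy <- data.frame(year = c(1996, 1996, 1996, 2000, 2000),
                  ldp  = c(0, 0, 1, 0, 1))

# The approach used for df2
by_summarise <- toy %>%
  group_by(year, ldp) %>%
  summarise(N = n(), .groups = "drop")

# The same counts with count(); name = "N" names the count column
by_count <- toy %>%
  count(year, ldp, name = "N")

by_count
```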

Draw a stacked bar chart

df2 %>%
  ggplot() +
  geom_bar(aes(x = year, y = N, fill = ldp), 
           stat = "identity", position = "stack") +
  labs(x = "Election Year", y = "The Number of Candidates") +
  theme_minimal(base_family = "HiraKakuProN-W3")

What we can see from the bar chart:
  • The number of candidates in Japan's lower house elections has been decreasing since 1996 (except in 2009 and 2012)
  • The number of LDP candidates does not change much over time

Draw a parallel bar chart

  • Setting position = "dodge" places the bars side by side (a parallel bar chart)
df2 %>%
  ggplot() +
  geom_bar(aes(x = year, y = N, fill = ldp), 
           stat = "identity", position = "dodge") + 
  labs(x = "Election Year", y = "The Number of Candidates") +
  theme_minimal(base_family = "HiraKakuProN-W3")

The difference between a parallel bar chart and a stacked bar chart

Type of bar chart   What you can do
Stacked             Compare the total number of candidates across election years
Parallel            Compare the numbers of LDP and non-LDP candidates within each election year

Which one you should use depends on what you want to compare

Change colors in bar chart

  • You can assign any color you like by using scale_fill_manual() function
df2 %>%
  ggplot() +
  geom_bar(aes(x = year, y = N, fill = ldp), 
           stat = "identity", position = "dodge") +
  labs(x = "Election Year", y = "The Number of Candidates") +
  theme_minimal(base_family = "HiraKakuProN-W3") +
  scale_fill_manual(values = c("springgreen2", "deeppink2"))
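Two small variations, sketched on hypothetical toy data shaped like df2: geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), and giving scale_fill_manual() named values ties each colour to a specific level of ldp (rather than to the order of the levels), while breaks and labels replace the 0/1 codes in the legend with readable names.

```r
library(ggplot2)

# Hypothetical toy counts with the same columns as df2
toy <- data.frame(year = factor(c(1996, 1996, 2000, 2000)),
                  ldp  = factor(c(0, 1, 0, 1)),
                  N    = c(973, 288, 928, 271))

p <- ggplot(toy) +
  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_col(aes(x = year, y = N, fill = ldp), position = "dodge") +
  # Named values: level "0" -> springgreen2, level "1" -> deeppink2
  scale_fill_manual(values = c("0" = "springgreen2", "1" = "deeppink2"),
                    breaks = c("0", "1"),
                    labels = c("Non-LDP", "LDP")) +
  labs(x = "Election Year", y = "The Number of Candidates", fill = "") +
  theme_minimal()
```

Type p in the Console to draw the chart.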

4.3 SMD Winners' Averaged Age in Lower House Elections

  • Visualize the averaged age of SMD winners
  • Using dplyr functions, calculate the average and save it as df3
df3 <- df1 %>%
  dplyr::filter(wlsmd == 1) %>% # choose only smd winners
  group_by(year) %>%         # calculate by election year
  summarize(age = mean(age, na.rm = TRUE), # calculate the mean of age 
            .groups = "drop")
df3
# A tibble: 8 x 2
  year    age
  <fct> <dbl>
1 1996   54.2
2 2000   53.8
3 2003   52.9
4 2005   53.1
5 2009   51.0
6 2012   52.5
7 2014   54.7
8 2017   55.9
  • Draw a graph with df3
  • x-axis: election year
  • y-axis: the SMD winners' averaged age
  • Plot the averages themselves => stat = "identity"
df3 %>%
  ggplot() +
  geom_bar(aes(x = year, y = age), stat = "identity") + 
  labs(x = "Election Year", y = "SMD Winners' Averaged Age") +
  theme_minimal(base_family = "HiraKakuProN-W3")

summary(df3$age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  51.03   52.81   53.47   53.52   54.30   55.90 

Summary: the SMD winners' averaged age is about 53.5 years and does not vary much (from 51.0 in 2009 to 55.9 in 2017)

4.4 Candidates' Averaged Age (LDP & Non-LDP)

  • Visualize the averaged age of LDP and non-LDP candidates
  • Note that the code below averages over all candidates (losers and zombie winners included); unlike df3 in 4.3, it does not filter to SMD winners
library(tidyverse)
df4 <- df1 %>% 
  group_by(year, ldp) %>% 
  summarize(age = mean(age, na.rm = TRUE),
            .groups = "drop")
df4
# A tibble: 16 x 3
   year    ldp   age
   <fct> <dbl> <dbl>
 1 1996      0  49.4
 2 1996      1  54.3
 3 2000      0  49.5
 4 2000      1  55.7
 5 2003      0  49.5
 6 2003      1  54.3
 7 2005      0  49.3
 8 2005      1  52.6
 9 2009      0  48.3
10 2009      1  55.5
11 2012      0  49.8
12 2012      1  51.9
13 2014      0  51.5
14 2014      1  53.3
15 2017      0  51.7
16 2017      1  55.3
  • Calculate the averaged age of LDP and non-LDP candidates across all elections
df4 %>%
  group_by(ldp) %>% 
  summarize(age = mean(age, na.rm = TRUE),
            .groups = "drop")
# A tibble: 2 x 2
    ldp   age
  <dbl> <dbl>
1     0  49.9
2     1  54.1
  • Non-LDP candidates' averaged age: 49.9
  • LDP candidates' averaged age: 54.1
  • class of ldp is <dbl> (numeric)
    => we need to change its class to factor
df4$ldp <- factor(df4$ldp) 
  • Check if the class is changed
class(df4$ldp)
[1] "factor"
df4 %>%
  ggplot() +
  geom_bar(aes(x = year, y = age, fill = ldp), 
           stat = "identity", position = "dodge") + 
  labs(x = "Election Year", y = "Candidates' Averaged Age") 

Summary: LDP candidates' averaged age (54.1) is higher than that of non-LDP candidates (49.9).
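The df4 code above averages over all candidates in each year and party group. To restrict the comparison to SMD winners, in the same way as df3 in 4.3, a filter on wlsmd == 1 can be placed before group_by(). A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: wlsmd = 1 marks SMD winners
toy <- data.frame(year  = c(1996, 1996, 1996, 1996),
                  ldp   = c(1, 1, 0, 0),
                  wlsmd = c(1, 0, 1, 0),
                  age   = c(60, 40, 50, 30))

# All candidates (what the df4 code computes)
all_cands <- toy %>%
  group_by(year, ldp) %>%
  summarize(age = mean(age, na.rm = TRUE), .groups = "drop")

# SMD winners only: filter first, then group and average
winners <- toy %>%
  dplyr::filter(wlsmd == 1) %>%
  group_by(year, ldp) %>%
  summarize(age = mean(age, na.rm = TRUE), .groups = "drop")

all_cands
winners
```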

5. Histogram

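  • A histogram is the most commonly used graph for showing a frequency distribution
  • It looks very much like a bar chart, but there are important differences between them
  • A histogram is a visual representation of a frequency table for a continuous variable: the range of values is divided into bins, and the height of each bar shows how many observations fall into each bin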

Differences between a bar chart and a histogram

Type of graph   x-axis                Geometric object   Gap between bars
Bar chart       Discrete variable     geom_bar()         Yes
Histogram       Continuous variable   geom_histogram()   No

5.1 Histogram 1 (vote share in HR elections)

Candidate’s Vote Share
  • Candidates' vote share in the lower house elections (1996-2017)

summary(df1$voteshare)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.10    8.90   25.76   27.08   42.90   95.30 
  • minimum = 0.1%, maximum = 95.3%, average = 27.08%
  • Draw a histogram
df1 %>% 
  ggplot() +
  geom_histogram(aes(x = voteshare)) +
  labs(x = "Candidate's Vote Share (1996-2017)", y = "Frequency") +
  geom_vline(xintercept = mean(df1$voteshare),  # Draw a line at its mean
             col = "magenta3")
  • Check wl
table(df1$wl)

   0    1    2 
5563 2387  853 
  • We want to make a data frame (df5) for SMD winners (wl == 1)
df5 <- df1 %>%
  dplyr::filter(wl == 1)
  • Show the descriptive statistics of the SMD winner’s vote share between 1996 and 2017 in the lower house elections
summary(df5$voteshare)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  21.50   44.20   50.10   50.65   56.00   95.30 
  • minimum = 21.50%, maximum = 95.30%, average = 50.65%
  • Draw a histogram
df5 %>% 
  ggplot() +
  geom_histogram(aes(x = voteshare)) +
  labs(x = "SMD Winners' Vote Share (1996-2017)", y = "Frequency") +
  geom_vline(xintercept = mean(df5$voteshare),   # Draw a line at its mean
             col = "magenta3")

5.2 Histogram 2 (vote share of SMD winners)

Draw a white line between bins

df5 %>%
  dplyr::filter(!is.na(voteshare)) %>%
  ggplot() +
  geom_histogram(aes(x = voteshare), color = "white") +
  labs(x = "SMD Winner's vote share (1996-2017)", y = "Number of Candidates") 

Change the number of Bins

  • Change the number of bins to 10
df5 %>%
  ggplot() +
  geom_histogram(aes(x = voteshare), 
                 color = "white", 
                 bins = 10) +
  labs(x = "SMD Winner's vote share (1996-2017)", y = "Number of Candidates") 
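Instead of the number of bins, you can set the width of each bin with binwidth. Together with boundary = 0, this makes every bar cover a fixed 5-point range with edges at multiples of 5, which is often easier to read for percentages. A minimal sketch; the vote shares below are hypothetical.

```r
library(ggplot2)

# Hypothetical vote shares (%), not the actual election data
toy <- data.frame(voteshare = c(21.5, 35.2, 44.2, 48.0, 50.1,
                                52.3, 56.0, 61.7, 70.4, 95.3))

p <- ggplot(toy) +
  # binwidth = 5: each bar covers 5 percentage points;
  # boundary = 0 puts the bin edges at multiples of 5 (..., 20, 25, 30, ...)
  geom_histogram(aes(x = voteshare), binwidth = 5, boundary = 0,
                 color = "white") +
  labs(x = "SMD Winner's vote share (%)", y = "Number of Candidates")

# Inspect the computed bins: their edges and how many candidates fall in each
bins <- ggplot_build(p)$data[[1]]
bins[, c("xmin", "xmax", "count")]
```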

Add another dimension

  • We want to draw a histogram with a new dimension: whether or not a candidate is an LDP politician (自民 means LDP in Japanese)

Stacked Histogram

df5 %>%
  mutate(ldp2 = ifelse(ldp == 1, "LDP", "Non_LDP")) %>%
  ggplot() +
  geom_histogram(aes(x = voteshare, 
                     fill = ldp2), 
                 color = "white",
                 bins = 30) +
  labs(x = "SMD Winner's vote share (1996-2017)", y = "Candidate's number", fill = "Candidate's party") 

Overlapping Histogram

df5 %>%
  mutate(ldp2 = ifelse(ldp == 1, "LDP", "Non_LDP")) %>%
  dplyr::filter(!is.na(voteshare)) %>%                
  ggplot() +
  geom_histogram(aes(x = voteshare,       
                     fill = ldp2),           
                 color = "white",              
                 alpha = 0.5,                  
                 position = "identity",         
                 boundary = 0) +               
  labs(x = "SMD Winner's vote share (1996-2017)", 
       y = "Candidate's number", fill = "Candidate's party")

6. Box plot

6.1 Show Vote Share using Box Plot (2017HR)

df8 <- df1 %>% 
  dplyr::filter(year == 2017)
df8 %>%
  filter(!is.na(voteshare)) %>%
  ggplot() +
  geom_boxplot(aes(y = voteshare)) +
  labs(y = "Vote Share (2017 HR Election)") +
  coord_flip()  # flip the plot   

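How to interpret the box plot:
  • The box spans the first quartile (25th percentile) to the third quartile (75th percentile); its length is the interquartile range (IQR)
  • The line inside the box is the median
  • The whiskers extend to the most extreme values within 1.5 × IQR of the box
  • Points beyond the whiskers are drawn individually as outliers

Source: 浅野・矢内『Rによる計量政治学』(Asano and Yanai, Quantitative Political Science with R), p. 107.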
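The ingredients of a box plot can be computed by hand. This is a minimal sketch on a hypothetical vector, using quantile() with R's default method and the 1.5 × IQR rule that ggplot2's box plots apply by default, so the numbers should match what geom_boxplot() draws for the same data.

```r
# Hypothetical vote shares (%) with one unusually high value
x <- c(1:10, 50)

# Quartiles with quantile()'s default method (type 7)
q <- quantile(x, c(0.25, 0.5, 0.75))
q1  <- q[[1]]
med <- q[[2]]
q3  <- q[[3]]

iqr <- q3 - q1                      # interquartile range = length of the box
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

# Observations beyond the fences are drawn as separate dots (outliers)
outliers <- x[x < lower_fence | x > upper_fence]

c(q1 = q1, median = med, q3 = q3)
whiskers
outliers
```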

6.2 Show Vote Share by Party (2017HR)

df8 %>%
  dplyr::filter(!is.na(voteshare)) %>%
  ggplot() +
  geom_boxplot(aes(x = seito, 
                   y = voteshare)) +
  labs(x = "Party", 
       y = "Vote Share (%) ") +
  theme_gray(base_family = "HiraKakuProN-W3")  +
  coord_flip() 

  • You can easily compare vote shares across parties
  • Judging by the medians, 公明党 (CGP) candidates received the highest vote shares
  • 共産党 (JCP) candidates received the lowest vote shares
  • The spread of vote shares is largest for 無所属 (independent) candidates
  • The spread for 公明党 (CGP) and 自民党 (LDP) candidates is relatively small
  • A dot 「●」 marks an outlier
  • 自民党 (LDP) has a few outliers on the high side
  • They are LDP winners who received exceptionally high vote shares
  • Check who they are by listing the candidates with a vote share above 75%
df8 %>% 
  dplyr::filter(voteshare > 75) %>% 
  select(pref, kun, seito, age, voteshare, vote, j_name) %>% 
  head()
    pref kun seito age voteshare   vote     j_name
1   岐阜   2  自民  54     75.40 117278   棚橋泰文
2   広島   1  自民  60     77.96 113239   岸田文雄
3 神奈川  11  自民  36     78.02 154761 小泉進次郎
4   宮城   6  自民  57     85.72 123871 小野寺五典
5   鳥取   1  自民  60     83.63 106425     石破茂

  • You can color each box by party by adding fill = seito inside aes()
df8  %>%
  dplyr::filter(!is.na(voteshare)) %>%
  ggplot() +
  geom_boxplot(aes(x = seito, 
                   y = voteshare,
                   fill = seito),    
               show.legend = FALSE) + 
   labs(x = "Party", y = "Vote Share (2017 HR Election)") +
  coord_flip()  

6.3 Exercise

Q: Referring to 6.2 Show Vote Share by Party (2017HR), generate box plots showing the vote shares by party for the 2009 HR election.

6.4 Present dots on box plot

  • Using geom_point(), you can present dot “●” on the box plot
df8 %>%
  dplyr::filter(!is.na(voteshare)) %>%
  ggplot(aes(x = seito, y = voteshare)) +
  geom_point(aes(color = seito), alpha = 0.5,
             show.legend = FALSE) +
  geom_boxplot(aes(fill = seito),
               alpha = 0.5, show.legend = FALSE) +
  labs(x = "Party", y = "Vote Share (2017 HR Election)")

  • Using geom_jitter() instead of geom_point() spreads the dots out slightly, so that overlapping dots become visible

df8 %>%
  filter(!is.na(voteshare)) %>%
  ggplot(aes(x = seito, 
             y = voteshare)) +
  geom_jitter(aes(color = seito),
              show.legend = FALSE) +
  geom_boxplot(aes(fill = seito),
               alpha = 0.5, 
               show.legend = FALSE) +
   labs(x = "Party", y = "Vote Share (2017 HR Election)")

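  • You can adjust the amount of jitter with width = 0.15, height = 0
  • width sets the horizontal spread and height the vertical spread; the larger the number, the more the dots are spread out, and height = 0 keeps each dot at its actual vote share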
df8 %>%
  dplyr::filter(!is.na(voteshare)) %>%
  ggplot(aes(x = seito, 
             y = voteshare)) +
  geom_jitter(aes(color = seito),
              width = 0.15, height = 0, # Adjust the dispersion  
              show.legend = FALSE) +
  geom_boxplot(aes(fill = seito),
               alpha = 0.5, 
               show.legend = FALSE) +
   labs(x = "Party", y = "Vote Share (2017 HR Election)")
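Because jittering moves points at random, the picture changes slightly every time you run the code. A minimal sketch on hypothetical toy data: position_jitter(seed = ...) fixes the random jitter so the figure is reproducible, and outlier.shape = NA stops geom_boxplot() from drawing its outliers a second time on top of the jittered points.

```r
library(ggplot2)

# Hypothetical toy data: vote shares (%) for two parties
toy <- data.frame(seito     = rep(c("A", "B"), each = 5),
                  voteshare = c(30, 35, 40, 45, 80, 10, 12, 15, 18, 20))

p <- ggplot(toy, aes(x = seito, y = voteshare)) +
  # outlier.shape = NA: the box plot does not draw its own outlier dots,
  # so no candidate appears twice
  geom_boxplot(aes(fill = seito), alpha = 0.5, outlier.shape = NA,
               show.legend = FALSE) +
  # seed makes the random jitter the same every time the code is run;
  # height = 0 keeps each dot at its actual vote share
  geom_point(aes(color = seito),
             position = position_jitter(width = 0.15, height = 0, seed = 123),
             show.legend = FALSE) +
  labs(x = "Party", y = "Vote Share (%)")

# Layer 2 holds the jittered points
pts <- ggplot_build(p)$data[[2]]
```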

6.5 Add Another Dimension to Box Plot

facet_wrap()

df8 %>%
  dplyr::filter(!is.na(voteshare)) %>%
  ggplot(aes(x = seito, y = voteshare)) +
  geom_jitter(aes(color = seito), alpha = 0.5,
              width = 0.15, height = 0,
              show.legend = FALSE) +
  geom_boxplot(aes(fill = seito),
               alpha = 0.5, show.legend = FALSE) +
  labs(x = "Party", 
       y = "Vote Share (2017 HR Election)",
       caption = "Male and Female Candidates") +
  facet_wrap(~ gender) +
  theme_bw(base_family = "HiraKakuProN-W3") +
  coord_flip()  

  • You can compare vote share between male and female candidates by party
df8 %>%
  dplyr::filter(!is.na(voteshare)) %>%
  ggplot(aes(x = seito, 
             y = voteshare)) +
  geom_jitter(aes(color = gender), 
              alpha = 0.5,
              position = position_jitterdodge(jitter.width = 0.2,
                                              jitter.height = 0),
              show.legend = FALSE) +
  geom_boxplot(aes(fill = gender),
               alpha = 0.5) +
  labs(x = "Party", 
       y = "Vote Share (2017 HR Election)", 
       fill = "",
       caption = "Male and Female Candidates") 

  • You can also limit the plot to the parties you want to look into
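One way to do this is to keep only the parties you want with %in% before plotting. A minimal sketch; the data and the party labels ("LDP", "CDP", ...) are hypothetical stand-ins, since in df8 the party names in seito are in Japanese (e.g., "自民").

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for df8 (party labels are made up)
toy <- data.frame(seito     = c("LDP", "LDP", "CDP", "CDP", "JCP", "Independent"),
                  voteshare = c(55, 48, 40, 38, 12, 25))

parties <- c("LDP", "CDP")  # the parties you want to look into

toy_sub <- toy %>%
  dplyr::filter(seito %in% parties)  # keep only rows whose party is in `parties`

p <- ggplot(toy_sub, aes(x = seito, y = voteshare)) +
  geom_boxplot(aes(fill = seito), alpha = 0.5, show.legend = FALSE) +
  labs(x = "Party", y = "Vote Share (%)") +
  coord_flip()
```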

7. Lollipop Chart

  • Download hr09_14_ldp_seatshare.csv
  • Japan's lower house election results for the LDP in 2009, 2012, and 2014
  • Make a folder within your R Project folder and name it data (if you have not made it yet)
  • Put the hr09_14_ldp_seatshare.csv in the data folder
  • To read hr09_14_ldp_seatshare.csv, we use the read_csv() function from the readr package, which is loaded with the tidyverse package
library(tidyverse)
df_seat <- read_csv("data/hr09_14_ldp_seatshare.csv")  
  • Using the datatable() function, check the data frame
DT::datatable(df_seat)
  • df_seat contains the following 7 variables
Variable    Details
year        Election year
pref        Prefecture (in Japanese)
id          Prefecture ID (1-47)
nosmd       Total number of single-member districts (SMDs) in each prefecture (1-25)
ldp         Number of SMDs won by the LDP in each prefecture
ldp_ratio   Share of SMDs won by the LDP in each prefecture (%)
dpj         Number of SMDs won by the DPJ in each prefecture
  • Check the class of the variable year
str(df_seat$year)
 num [1:141] 2012 2014 2009 2012 2014 ...
  • The class of the variable year is numeric
  • Convert it into factor
df_seat$year <- factor(df_seat$year) 
df_seat %>% 
  arrange(year, ldp_ratio) %>%
  # Give the 2009 rows the order 1-47 (by LDP ratio) and the other years 0,
  # so that reorder() sorts prefectures by their 2009 ratio
  mutate(order_seq = c(1:47, rep(0, 47*2))) %>% 
  ggplot(aes(x = ldp_ratio, 
             y = reorder(pref, order_seq))) + 
  geom_segment(aes(yend = pref),
               xend = 0, colour = "grey50") +
  geom_point(size = 2,
             aes(colour = year)) +
  scale_colour_brewer(palette = "Set1", 
                      limits = c("2009", "2012", "2014"),
                      guide = "none") +
  facet_grid(~ year,
             scales = "free_y", space = "free_y") + 
  theme_bw(base_family = "HiraKakuProN-W3") +    # Show Japanese in chart  
  theme(panel.grid.major.y = element_blank()) +  # must come after theme_bw()
  labs(x = "Share of SMDs Won by the LDP (%)", 
       y = "Prefecture")

The share of SMDs won by the LDP, by prefecture (2009-2014)
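The lollipop chart itself is just geom_segment() (the stick) plus geom_point() (the candy). A minimal sketch on hypothetical toy data, which also shows an ordering rule for themes: a complete theme such as theme_bw() replaces any theme() settings added before it, so theme() adjustments go after it.

```r
library(ggplot2)

# Hypothetical toy data: share of SMDs won by the LDP (%) in three prefectures
toy <- data.frame(pref      = c("A", "B", "C"),
                  ldp_ratio = c(20, 75, 50))

# Order prefectures by ldp_ratio so the lollipops are sorted
toy$pref <- reorder(toy$pref, toy$ldp_ratio)

p <- ggplot(toy, aes(x = ldp_ratio, y = pref)) +
  # Stick: a horizontal segment from 0 to each value
  geom_segment(aes(xend = 0, yend = pref), colour = "grey50") +
  # Candy: a point at the value itself
  geom_point(size = 2) +
  theme_bw() +
  # theme() comes after theme_bw(); placed before it, this setting would be overwritten
  theme(panel.grid.major.y = element_blank()) +
  labs(x = "Share of SMDs won by the LDP (%)", y = "Prefecture")

seg <- ggplot_build(p)$data[[1]]
```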
  • You can also show it as a bar chart as follows:
df_seat %>% 
  arrange(year, ldp_ratio) %>%
  mutate(order_seq = c(1:47, rep(0, 47*2))) %>% 
  ggplot(aes(x = reorder(pref, order_seq), 
             y = ldp_ratio,  
             fill = year)) +
  geom_bar(stat = "identity") +
  facet_grid(~ year, scales = "free_x") +
  coord_flip() + 
  theme_bw(base_family = "HiraKakuProN-W3") +
  theme(legend.position = "none") +   # must come after theme_bw()
  labs(x = "Prefecture", 
       y = "Share of SMDs Won by the LDP (%)") 

The share of SMDs won by the LDP, by prefecture (2009-2014)
Reference