✔ What we do here
・Introduce how to make a variable
・Introduce how to merge variables and make a dataframe
・Introduce how to merge multiple dataframes
・Introduce how to read data with various file name extensions
・Introduce how to clean the data you read into RStudio
・Explain technical terms we need in analyzing data
Technical terms explained here: text data, binary data, file extension, path, file, folder, R project, R project folder, working directory, missing value, class, data cleaning, converting data between wide and long format
Load the packages we use here (haven, readxl, and tidyverse)
library(haven)
library(readxl)
library(tidyverse)
─ Attaching packages ──────────────────── tidyverse 1.3.1 ─
✓ ggplot2 3.3.3 ✓ purrr 0.3.4
✓ tibble 3.1.2 ✓ dplyr 1.0.6
✓ tidyr 1.1.3 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.1
─ Conflicts ───────────────────── tidyverse_conflicts() ─
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
・tidyverse contains 8 useful packages
・For example, readr is the package used to read data
・Make a variable, id
id <- c(1,2,3,4,5,6,7,8)
・Make a variable, name
name <- c("Thies", "Cox", "McCubbins", "Schwartz", "DeNardo", "Bawn", "Patterson", "Geddes")
・Make a variable, score
score <- c(43, 74, 80, 37, 20, 83, 64, 35)
・Load the tidyverse package to use the tibble() function
library(tidyverse)
・Merge the three variables into a data frame, df1
df1 <- tibble(id, name, score)
df1
# A tibble: 8 x 3
id name score
<dbl> <chr> <dbl>
1 1 Thies 43
2 2 Cox 74
3 3 McCubbins 80
4 4 Schwartz 37
5 5 DeNardo 20
6 6 Bawn 83
7 7 Patterson 64
8 8 Geddes 35
- You can also use data.frame() instead of using tibble()
df1 <- data.frame(id, name, score)
df1
○ tibble() shows the size of the data frame (such as 8 x 3) and the class of each variable (such as <dbl> and <chr>)
→ You should use tibble()
○ If you load the tidyverse package, then you can use tibble()
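The difference between the two functions can be checked directly. The following is a minimal sketch, assuming the tibble package is installed; the toy variables and the names df_base and df_tbl are hypothetical and are not part of the lecture data.

```r
library(tibble)

# Hypothetical toy variables (not the lecture data)
id   <- c(1, 2, 3)
name <- c("A", "B", "C")

df_base <- data.frame(id, name)  # base R data frame
df_tbl  <- tibble(id, name)      # tibble

class(df_base)  # "data.frame"
class(df_tbl)   # "tbl_df" "tbl" "data.frame"

# A tibble is still a data frame, so functions written for data frames accept it
is.data.frame(df_tbl)

# You can convert in either direction when you need to
df_from_base <- as_tibble(df_base)
df_from_tbl  <- as.data.frame(df_tbl)
```

Because a tibble keeps the data.frame class, switching from data.frame() to tibble() usually does not break later steps such as merge().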
・Add a new variable, department, to df1
・Put the name of the data frame (here, df1), then $, then put the name of the new variable (department)
df1$department <- c("poli-sci", "econ", "poli-sci", "econ", "art", "music", "communication", "history")
df1
# A tibble: 8 x 4
id name score department
<dbl> <chr> <dbl> <chr>
1 1 Thies 43 poli-sci
2 2 Cox 74 econ
3 3 McCubbins 80 poli-sci
4 4 Schwartz 37 econ
5 5 DeNardo 20 art
6 6 Bawn 83 music
7 7 Patterson 64 communication
8 8 Geddes 35 history
・Add a new variable, gender, to df1
df1$gender <- c("male", "male", "male", "male", "male", "female", "male", "female")
df1
# A tibble: 8 x 5
id name score department gender
<dbl> <chr> <dbl> <chr> <chr>
1 1 Thies 43 poli-sci male
2 2 Cox 74 econ male
3 3 McCubbins 80 poli-sci male
4 4 Schwartz 37 econ male
5 5 DeNardo 20 art male
6 6 Bawn 83 music female
7 7 Patterson 64 communication male
8 8 Geddes 35 history female
・Make another data frame, df2
・df2 includes the following two variables:
① id
② state
・Make a variable, id
id <- c(1,2,3,4,5,6,7,8)
・Make a variable, state, standing for where they come from
state <- c("California", "Oregon", "NY", "Washington", "Florida", "Wisconsin", "Alabama", "South Carolina")
・Merge the two variables into a data frame, df2
df2 <- tibble(id, state)
df2
# A tibble: 8 x 2
id state
<dbl> <chr>
1 1 California
2 2 Oregon
3 3 NY
4 4 Washington
5 5 Florida
6 6 Wisconsin
7 7 Alabama
8 8 South Carolina
・Merge the two data frames (df1 and df2) with the same variable name (id) and name the new data frame M
M <- merge(df1, df2, by = "id")
M
id name score department gender state
1 1 Thies 43 poli-sci male California
2 2 Cox 74 econ male Oregon
3 3 McCubbins 80 poli-sci male NY
4 4 Schwartz 37 econ male Washington
5 5 DeNardo 20 art male Florida
6 6 Bawn 83 music female Wisconsin
7 7 Patterson 64 communication male Alabama
8 8 Geddes 35 history female South Carolina
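In the example above every id appears in both data frames, so no row is lost. When some ids appear in only one data frame, the all, all.x, and all.y arguments of merge() decide which rows are kept. The following is a small sketch with hypothetical toy data (left and right are made-up names, not the lecture data).

```r
# Hypothetical toy data: id 3 exists only in left, id 4 exists only in right
left  <- data.frame(id = c(1, 2, 3), name  = c("A", "B", "C"))
right <- data.frame(id = c(1, 2, 4), state = c("Oregon", "NY", "Florida"))

# Default: keep only the ids found in both data frames
inner <- merge(left, right, by = "id")

# Keep every row of left; state is NA where right has no match
keep_left <- merge(left, right, by = "id", all.x = TRUE)

# Keep every row of both data frames
full <- merge(left, right, by = "id", all = TRUE)

nrow(inner)      # 2
nrow(keep_left)  # 3
nrow(full)       # 4
```

Comparing nrow() before and after a merge is a quick way to notice that rows were dropped or added.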
Question 1: Make the list of your family or friends (df1) containing the following variables:
① id: (1…..5)
② name
③ age
④ relationship
Question 2: Make the list of your family or friends (df2) containing the following variables:
① id: (1…..5)
② gender
③ height
Question 3: Merge the two data frames you made (df1 and df2) with the shared variable (id) and name it M1
・File extension: the suffix that shows the type of a file (e.g., .html .Rmd .csv .doc .png .jpg)
・Path, R Project folder, working directory
A path is a string of characters used to uniquely identify a location in a directory structure.
It is composed by following the directory tree hierarchy, in which components, separated by a delimiting character, represent each directory.
The delimiting character is most commonly the slash ("/").
getwd() = get working directory
→ You can see which directory you are working in
For example, let me type getwd() on my computer and hit the return key
getwd()
"/Users/asanomasahiko/Dropbox/statistics/class_materials/R"
・If you are a Mac user, then you will see something like this
・If you are a Windows user, then you will see the C drive instead of Users
・R at the end of the path is the name of the R Project folder (= working directory) where you are currently working
・You should work inside your R Project folder, R.
Reasons:
- If you are in your R Project folder, you don’t have to specify a full path whenever you need data.
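The reason above can be illustrated with a short sketch in base R. The file name data/hr96-17.csv is the one used later in these notes; whether it exists depends on your own folder, so the sketch only builds the paths and checks for the file.

```r
# Where am I working now?
wd <- getwd()

# A relative path: it is interpreted starting from the working directory
rel_path <- file.path("data", "hr96-17.csv")

# The same location written as a full (absolute) path
abs_path <- file.path(wd, rel_path)

rel_path  # "data/hr96-17.csv"
abs_path

# TRUE only if the file really is in the data folder of the working directory
file.exists(rel_path)
```

Inside an R Project the short relative path is enough, and the same code keeps working when the project folder is moved or shared with someone else.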
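・Rules for naming files and folders
×「2021_grades」=> ○「grades_2021」(do not begin a name with a number)
×「2021 grades」=> ○「grades_2021」(do not put a space in a name)
・R comes with built-in data sets. If you type data(), then you can see the list of these embedded data (part of the list is shown here).
data()
・Show the first 6 rows of state.x77
head(state.x77)
Population Income Illiteracy Life Exp Murder HS Grad Frost Area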
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
・Show the last 6 rows of state.x77
tail(state.x77)
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Vermont 472 3907 0.6 71.64 5.5 57.1 168 9267
Virginia 4981 4701 1.4 70.08 9.5 47.8 85 39780
Washington 3559 4864 0.6 71.72 4.3 63.5 32 66570
West Virginia 1799 3617 1.4 69.48 6.7 41.6 100 24070
Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149 54464
Wyoming 376 4566 0.6 70.29 6.9 62.9 173 97203
・Show Titanic
head(Titanic)
, , Age = Child, Survived = No
Sex
Class Male Female
1st 0 0
2nd 0 0
3rd 35 17
Crew 0 0
, , Age = Adult, Survived = No
Sex
Class Male Female
1st 118 4
2nd 154 13
3rd 387 89
Crew 670 3
, , Age = Child, Survived = Yes
Sex
Class Male Female
1st 5 1
2nd 11 13
3rd 13 14
Crew 0 0
, , Age = Adult, Survived = Yes
Sex
Class Male Female
1st 57 140
2nd 14 80
3rd 75 76
Crew 192 20
Read the data by data format
・There are two kinds of data: text data and binary data
Text data: data we can read and understand
.txt file: you use this when you do text analysis
.html file: you use this when you do web scraping
.csv file: comma-separated values
Binary data: data we cannot read and understand, but the computer can
.xls file
.xlsx file: newer than the .xls file
.dta file: you can use this in Stata
.rds file: you can use this only in R
We recommend LibreOffice rather than MS Office Excel
→ Free software
→ You can specify the character encoding
→ You can avoid unnecessary errors
How to read a .csv file
・Make a folder named data within your R Project folder
・Put hr96-17.csv into data
・Use the tidyverse package to read the csv file
・Load tidyverse
library(tidyverse)
・Read the data and name it hr
hr <- read_csv("data/hr96-17.csv",
               na = ".")   # treat "." as a missing value
・If the text is garbled, save the file again in the csv UTF-8 (.csv) form, or choose Unicode (UTF-8) and save it
・You can also specify the character encoding when you read the file
hr <- read_csv("data/hr96-17.csv",
               na = ".",
               locale = locale(encoding = "cp932"))
How to read an .xls[x] file
・Use the readxl package to read an .xls[x] file
library(readxl)
fh <- read_excel("data/FH_Country.xls")
How to read a .dta file
・A .dta file is binary data
・Use the haven package to read a .dta file
library(haven)
triangle <- read_dta("data/TRIANGLE.DTA")
head(triangle)
# A tibble: 6 x 19
statea stateb year dependa dependb demauta demautb allies dispute1 logdstab
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 20 1920 0.0157 0.280 10 9 0 0 5.82
2 2 20 1921 0.0115 0.224 10 10 0 0 5.82
3 2 20 1922 0.0113 0.201 10 10 0 0 5.82
4 2 20 1923 0.0112 0.213 10 10 0 0 5.82
5 2 20 1924 0.0110 0.213 10 10 0 0 5.82
6 2 20 1925 0.0108 0.191 10 10 0 0 5.82
# … with 9 more variables: lcaprat2 <dbl>, smigoabi <dbl>, opena <dbl>,
# openb <dbl>, minrpwrs <dbl>, noncontg <dbl>, smldmat <dbl>, smldep <dbl>,
# dyadid <dbl>
・Check the variables triangle contains
names(triangle)
 [1] "statea" "stateb" "year" "dependa" "dependb" "demauta"
[7] "demautb" "allies" "dispute1" "logdstab" "lcaprat2" "smigoabi"
[13] "opena" "openb" "minrpwrs" "noncontg" "smldmat" "smldep"
[19] "dyadid"
・You can save TRIANGLE.DTA in csv form, which is more widely used, with the following command
write_excel_csv(triangle, "data/triangle.csv")
GDP data
・File: wb_gdp_pc.csv
・Make a data folder and put wb_gdp_pc.csv into the folder
wb_gdp <- read_csv("data/wb_gdp_pc.csv")
Warning: Missing column names filled in: 'X3' [3]
─ Column specification ────────────────────────────
cols(
`Data Source` = col_character(),
`World Development Indicators` = col_character(),
X3 = col_character()
)
Warning: 265 parsing failures.
row col expected actual file
2 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
3 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
4 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
5 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
6 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
... ... ......... .......... ....................
See problems(...) for more details.
Pay attention to the Warning: Missing column names filled in: 'X3' [3]
A Warning is not as serious as an Error, but we need to be cautious about it
How to deal with the Warning
head(wb_gdp)
# A tibble: 6 x 3
`Data Source` `World Development Indicators` X3
<chr> <chr> <chr>
1 Last Updated Date 2019-03-21 <NA>
2 Country Name Country Code Indicator Name
3 Aruba ABW GDP per capita (current US$)
4 Afghanistan AFG GDP per capita (current US$)
5 Angola AGO GDP per capita (current US$)
6 Albania ALB GDP per capita (current US$)
・Open the data using LibreOffice (or Excel) and check the data
・I highlighted the first 4 lines in yellow so that you can easily recognize them
・If you use read_csv(), then R automatically recognizes the first row of the csv file as variable names
・The first 4 rows are not part of the data, so we skip them with skip = 4
wb_gdp <- read_csv("data/wb_gdp_pc.csv", skip = 4)
Warning: Missing column names filled in: 'X64' [64]
─ Column specification ────────────────────────────
cols(
.default = col_double(),
`Country Name` = col_character(),
`Country Code` = col_character(),
`Indicator Name` = col_character(),
`Indicator Code` = col_character(),
`2018` = col_logical(),
X64 = col_logical()
)
ℹ Use `spec()` for the full column specifications.
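What skip does can be reproduced on a tiny file. The following is a sketch, assuming the readr package is installed; the file content is a made-up imitation of the World Bank layout (two metadata lines instead of four) and the numbers are hypothetical.

```r
library(readr)

# Write a small, hypothetical csv file whose first two lines are not data
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Data Source,World Development Indicators",
  "Last Updated Date,2019-03-21",
  "country,code,gdp",
  "Aruba,ABW,100",
  "Japan,JPN,200"
), path)

# skip = 2 throws away the two metadata lines,
# so the third line is used as the variable names
toy <- suppressMessages(read_csv(path, skip = 2))

names(toy)  # "country" "code" "gdp"
nrow(toy)   # 2
```

The number given to skip is the number of lines before the line that holds the variable names, which is why opening the file first in LibreOffice (or Excel) is useful.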
str() enables us to check the class of each variable
str(wb_gdp)
spec_tbl_df [264 × 64] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Country Name : chr [1:264] "Aruba" "Afghanistan" "Angola" "Albania" ...
$ Country Code : chr [1:264] "ABW" "AFG" "AGO" "ALB" ...
$ Indicator Name: chr [1:264] "GDP per capita (current US$)" "GDP per capita (current US$)" "GDP per capita (current US$)" "GDP per capita (current US$)" ...
$ Indicator Code: chr [1:264] "NY.GDP.PCAP.CD" "NY.GDP.PCAP.CD" "NY.GDP.PCAP.CD" "NY.GDP.PCAP.CD" ...
$ 1960 : num [1:264] NA 59.8 NA NA NA ...
$ 1961 : num [1:264] NA 59.9 NA NA NA ...
$ 1962 : num [1:264] NA 58.5 NA NA NA ...
$ 1963 : num [1:264] NA 78.8 NA NA NA ...
$ 1964 : num [1:264] NA 82.2 NA NA NA ...
$ 1965 : num [1:264] NA 101 NA NA NA ...
$ 1966 : num [1:264] NA 138 NA NA NA ...
$ 1967 : num [1:264] NA 161 NA NA NA ...
$ 1968 : num [1:264] NA 130 NA NA NA ...
$ 1969 : num [1:264] NA 130 NA NA NA ...
$ 1970 : num [1:264] NA 157 NA NA 3239 ...
$ 1971 : num [1:264] NA 160 NA NA 3498 ...
$ 1972 : num [1:264] NA 136 NA NA 4217 ...
$ 1973 : num [1:264] NA 144 NA NA 5342 ...
$ 1974 : num [1:264] NA 175 NA NA 6320 ...
$ 1975 : num [1:264] NA 188 NA NA 7169 ...
$ 1976 : num [1:264] NA 199 NA NA 7152 ...
$ 1977 : num [1:264] NA 226 NA NA 7751 ...
$ 1978 : num [1:264] NA 249 NA NA 9130 ...
$ 1979 : num [1:264] NA 278 NA NA 11821 ...
$ 1980 : num [1:264] NA 275 664 NA 12377 ...
$ 1981 : num [1:264] NA 266 600 NA 10372 ...
$ 1982 : num [1:264] NA NA 579 NA 9610 ...
$ 1983 : num [1:264] NA NA 582 NA 8023 ...
$ 1984 : num [1:264] NA NA 597 639 7729 ...
$ 1985 : num [1:264] NA NA 712 640 7774 ...
$ 1986 : num [1:264] 6473 NA 648 694 10362 ...
$ 1987 : num [1:264] 7886 NA 721 675 12616 ...
$ 1988 : num [1:264] 9765 NA 762 653 14304 ...
$ 1989 : num [1:264] 11392 NA 863 698 15166 ...
$ 1990 : num [1:264] 12307 NA 923 617 18879 ...
$ 1991 : num [1:264] 13496 NA 845 337 19533 ...
$ 1992 : num [1:264] 14047 NA 641 201 20548 ...
$ 1993 : num [1:264] 14937 NA 430 367 16516 ...
$ 1994 : num [1:264] 16241 NA 321 586 16235 ...
$ 1995 : num [1:264] 16439 NA 388 751 18461 ...
$ 1996 : num [1:264] 16586 NA 513 1010 19017 ...
$ 1997 : num [1:264] 17928 NA 507 717 18353 ...
$ 1998 : num [1:264] 19078 NA 420 814 18895 ...
$ 1999 : num [1:264] 19356 NA 386 1033 19262 ...
$ 2000 : num [1:264] 20621 NA 555 1127 21937 ...
$ 2001 : num [1:264] 20669 NA 526 1282 22229 ...
$ 2002 : num [1:264] 20437 184 870 1425 24741 ...
$ 2003 : num [1:264] 20834 196 979 1846 32776 ...
$ 2004 : num [1:264] 22570 217 1248 2374 38503 ...
$ 2005 : num [1:264] 23300 248 1891 2674 41282 ...
$ 2006 : num [1:264] 24046 269 2585 2973 43749 ...
$ 2007 : num [1:264] 25836 366 3108 3595 48583 ...
$ 2008 : num [1:264] 27086 370 4069 4371 47786 ...
$ 2009 : num [1:264] 24631 444 3118 4114 43339 ...
$ 2010 : num [1:264] 23513 551 3586 4094 39736 ...
$ 2011 : num [1:264] 24984 599 4616 4437 41099 ...
$ 2012 : num [1:264] 24710 649 5102 4248 38391 ...
$ 2013 : num [1:264] 25018 648 5258 4413 40620 ...
$ 2014 : num [1:264] 25528 625 5413 4579 42295 ...
$ 2015 : num [1:264] 25796 590 4171 3953 36038 ...
$ 2016 : num [1:264] 25252 550 3510 4132 37232 ...
$ 2017 : num [1:264] 25655 550 4100 4538 39147 ...
$ 2018 : logi [1:264] NA NA NA NA NA NA ...
$ X64 : logi [1:264] NA NA NA NA NA NA ...
- attr(*, "spec")=
.. cols(
.. `Country Name` = col_character(),
.. `Country Code` = col_character(),
.. `Indicator Name` = col_character(),
.. `Indicator Code` = col_character(),
.. `1960` = col_double(),
.. `1961` = col_double(),
.. `1962` = col_double(),
.. `1963` = col_double(),
.. `1964` = col_double(),
.. `1965` = col_double(),
.. `1966` = col_double(),
.. `1967` = col_double(),
.. `1968` = col_double(),
.. `1969` = col_double(),
.. `1970` = col_double(),
.. `1971` = col_double(),
.. `1972` = col_double(),
.. `1973` = col_double(),
.. `1974` = col_double(),
.. `1975` = col_double(),
.. `1976` = col_double(),
.. `1977` = col_double(),
.. `1978` = col_double(),
.. `1979` = col_double(),
.. `1980` = col_double(),
.. `1981` = col_double(),
.. `1982` = col_double(),
.. `1983` = col_double(),
.. `1984` = col_double(),
.. `1985` = col_double(),
.. `1986` = col_double(),
.. `1987` = col_double(),
.. `1988` = col_double(),
.. `1989` = col_double(),
.. `1990` = col_double(),
.. `1991` = col_double(),
.. `1992` = col_double(),
.. `1993` = col_double(),
.. `1994` = col_double(),
.. `1995` = col_double(),
.. `1996` = col_double(),
.. `1997` = col_double(),
.. `1998` = col_double(),
.. `1999` = col_double(),
.. `2000` = col_double(),
.. `2001` = col_double(),
.. `2002` = col_double(),
.. `2003` = col_double(),
.. `2004` = col_double(),
.. `2005` = col_double(),
.. `2006` = col_double(),
.. `2007` = col_double(),
.. `2008` = col_double(),
.. `2009` = col_double(),
.. `2010` = col_double(),
.. `2011` = col_double(),
.. `2012` = col_double(),
.. `2013` = col_double(),
.. `2014` = col_double(),
.. `2015` = col_double(),
.. `2016` = col_double(),
.. `2017` = col_double(),
.. `2018` = col_logical(),
.. X64 = col_logical()
.. )
・R recognizes Country Name and Country Code as character <chr>
・R recognizes 1960 as double <dbl>
・NA means a missing value
names(wb_gdp)
 [1] "Country Name" "Country Code" "Indicator Name" "Indicator Code"
[5] "1960" "1961" "1962" "1963"
[9] "1964" "1965" "1966" "1967"
[13] "1968" "1969" "1970" "1971"
[17] "1972" "1973" "1974" "1975"
[21] "1976" "1977" "1978" "1979"
[25] "1980" "1981" "1982" "1983"
[29] "1984" "1985" "1986" "1987"
[33] "1988" "1989" "1990" "1991"
[37] "1992" "1993" "1994" "1995"
[41] "1996" "1997" "1998" "1999"
[45] "2000" "2001" "2002" "2003"
[49] "2004" "2005" "2006" "2007"
[53] "2008" "2009" "2010" "2011"
[57] "2012" "2013" "2014" "2015"
[61] "2016" "2017" "2018" "X64"
We select the variables we use
- We need to solve, one by one, the problems that prevent us from conducting quantitative analysis
- Since we do not know what X64 is, check it
wb_gdp$X64
  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[101] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[151] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[176] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[201] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[226] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[251] NA NA NA NA NA NA NA NA NA NA NA NA NA NA
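・Every value of X64 is a missing value (= NA), so we do not need it
→ The following are what we need: Country Name and the years from 1960 to 2018
gdp <- wb_gdp %>%
  select("Country Name",
         "1960":"2018")
names(gdp)
 [1] "Country Name" "1960" "1961" "1962" "1963"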
[6] "1964" "1965" "1966" "1967" "1968"
[11] "1969" "1970" "1971" "1972" "1973"
[16] "1974" "1975" "1976" "1977" "1978"
[21] "1979" "1980" "1981" "1982" "1983"
[26] "1984" "1985" "1986" "1987" "1988"
[31] "1989" "1990" "1991" "1992" "1993"
[36] "1994" "1995" "1996" "1997" "1998"
[41] "1999" "2000" "2001" "2002" "2003"
[46] "2004" "2005" "2006" "2007" "2008"
[51] "2009" "2010" "2011" "2012" "2013"
[56] "2014" "2015" "2016" "2017" "2018"
Fix the names of variables
・Country Name => country
gdp <- gdp %>%
  rename(country = "Country Name")
names(gdp)
 [1] "country" "1960" "1961" "1962" "1963" "1964" "1965"
[8] "1966" "1967" "1968" "1969" "1970" "1971" "1972"
[15] "1973" "1974" "1975" "1976" "1977" "1978" "1979"
[22] "1980" "1981" "1982" "1983" "1984" "1985" "1986"
[29] "1987" "1988" "1989" "1990" "1991" "1992" "1993"
[36] "1994" "1995" "1996" "1997" "1998" "1999" "2000"
[43] "2001" "2002" "2003" "2004" "2005" "2006" "2007"
[50] "2008" "2009" "2010" "2011" "2012" "2013" "2014"
[57] "2015" "2016" "2017" "2018"
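Names such as Country Name (with a space) or 1960 (starting with a number) are not regular R names, so they must be wrapped in quotation marks or backticks. The following is a sketch of the same select() and rename() steps on hypothetical toy data, assuming the dplyr package is installed; the values are made up.

```r
library(dplyr)

# Hypothetical toy data with the same kind of column names as wb_gdp
toy <- tibble(
  `Country Name` = c("Aruba", "Japan"),
  `Country Code` = c("ABW", "JPN"),
  `1960`         = c(NA, 10),
  `1961`         = c(NA, 20),
  X5             = c(NA, NA)
)

out <- toy %>%
  select(`Country Name`, `1960`:`1961`) %>%  # keep the name and the range of years
  rename(country = `Country Name`)           # new name = old name

names(out)  # "country" "1960" "1961"
```

Renaming to a regular name such as country early on means later code can refer to the variable without backticks.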
・Check the size of gdp
dim(gdp)
[1] 264 60
The sample size (N) of gdp is 264
The number of variables is 60
Using the DT::datatable() function, we can see what the entire data set looks like
DT::datatable(gdp)
・gdp is in wide format
・Using the tidyr::pivot_longer() function, we convert it from wide to long format and name the result gdp_long
gdp_long <- gdp %>%
  tidyr::pivot_longer("1960":"2018",          # Range of variables you want to convert
                      names_to = "year",      # The variable names of the wide format (the years) go into year
                      values_to = "GDP") %>%  # The values of the wide format go into GDP
  drop_na()                                   # Drop missing values
・Show gdp_long
DT::datatable(gdp_long)
・Check the class of each variable in gdp_long
str(gdp_long)
tibble [11,824 × 3] (S3: tbl_df/tbl/data.frame)
$ country: chr [1:11824] "Aruba" "Aruba" "Aruba" "Aruba" ...
$ year : chr [1:11824] "1986" "1987" "1988" "1989" ...
$ GDP : num [1:11824] 6473 7886 9765 11392 12307 ...
・Change the class of year from character to numeric
gdp_long$year <- as.numeric(gdp_long$year)
str(gdp_long)
tibble [11,824 × 3] (S3: tbl_df/tbl/data.frame)
$ country: chr [1:11824] "Aruba" "Aruba" "Aruba" "Aruba" ...
$ year : num [1:11824] 1986 1987 1988 1989 1990 ...
$ GDP : num [1:11824] 6473 7886 9765 11392 12307 ...
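The wide-to-long conversion is easier to follow on a very small table. The following is a sketch with hypothetical toy data, assuming the dplyr and tidyr packages are installed. It also uses the names_transform argument of pivot_longer() (available in tidyr 1.1.0 and later) to convert year while reshaping, as one possible alternative to converting it afterwards.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
wide <- tibble(
  country = c("A", "B"),
  `2000`  = c(1, NA),
  `2001`  = c(2, 3)
)

long <- wide %>%
  pivot_longer(`2000`:`2001`,
               names_to  = "year",   # the column names go into year
               values_to = "GDP",    # the cell values go into GDP
               names_transform = list(year = as.integer)) %>%
  drop_na()                          # the row for B in 2000 is dropped

long
nrow(long)  # 3
```

Each row of the long data is one country in one year, which is the form ggplot() expects when year is mapped to the x-axis.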
GDP per capita of Japan and China
・Using the filter() function, extract the data needed and name it jpn.chi
jpn.chi <- gdp_long %>%
  filter(country == "Japan" | country == "China")
・You should add the following command to avoid garbled text when you use Japanese in figures drawn with the ggplot() function
theme_set(theme_classic(base_size = 10,
                        base_family = "HiraginoSans-W3"))
jpn.chi %>%
  ggplot(aes(x = year, y = GDP,
             color = country,
             linetype = country,
             shape = country)) +
  geom_point() +
  geom_line() +
  ggtitle("Transition of GDP Per Capita (1980-2017) between Japan and China") +
  labs(x = "Year", y = "GDP per capita (US$)") +
  theme(legend.position = c(0.1, 0.8)) +
  xlim(1980, 2017)   # Show 1980-2017 only (data outside this range are dropped)
Freedom House
・Freedom House data (1972-2016)
・PR: political rights
・CL: civil liberties
・Status

| Variables | Variable Class | Details |
|---|---|---|
| PR | numeric | political rights (Best = 1, Worst = 7) |
| CL | numeric | civil liberties (Best = 1, Worst = 7) |
| status | categorical | F: free, PF: partly free, NF: not free |
| year | categorical | 1972-2016 |
PR and CL are measured on a one-to-seven scale, with one representing the highest degree of freedom and seven the lowest.
Load the readxl package to read the Excel file
library(readxl)
Download the Freedom House data and read it
Before reading the data, open the original data file (FH_Country.xls) in either LibreOffice or Excel
・The data we need are in the second sheet (Country Ratings, Statuses), so we specify sheet = 2
・Specify the sheet number and the number of rows to skip
・The first two rows are not data, so we specify skip = 2
fh <- read_excel("data/FH_Country.xls",
                 sheet = 2,
                 skip = 2)
Check the class of each variable
str(fh)
tibble [205 × 133] (S3: tbl_df/tbl/data.frame)
$ ...1 : chr [1:205] "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ PR...2 : chr [1:205] "4" "7" "6" "4" ...
$ CL...3 : chr [1:205] "5" "7" "6" "3" ...
$ Status...4 : chr [1:205] "PF" "NF" "NF" "PF" ...
$ PR...5 : chr [1:205] "7" "7" "6" "4" ...
$ CL...6 : chr [1:205] "6" "7" "6" "4" ...
$ Status...7 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...8 : chr [1:205] "7" "7" "6" "4" ...
$ CL...9 : chr [1:205] "6" "7" "6" "4" ...
$ Status...10 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...11 : chr [1:205] "7" "7" "7" "4" ...
$ CL...12 : chr [1:205] "6" "7" "6" "4" ...
$ Status...13 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...14 : chr [1:205] "7" "7" "6" "4" ...
$ CL...15 : chr [1:205] "6" "7" "6" "4" ...
$ Status...16 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...17 : chr [1:205] "6" "7" "6" "-" ...
$ CL...18 : chr [1:205] "6" "7" "6" "-" ...
$ Status...19 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...20 : chr [1:205] "7" "7" "6" "-" ...
$ CL...21 : chr [1:205] "7" "7" "6" "-" ...
$ Status...22 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...23 : chr [1:205] "7" "7" "6" "-" ...
$ CL...24 : chr [1:205] "7" "7" "6" "-" ...
$ Status...25 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...26 : chr [1:205] "7" "7" "6" "-" ...
$ CL...27 : chr [1:205] "7" "7" "6" "-" ...
$ Status...28 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...29 : chr [1:205] "7" "7" "6" "-" ...
$ CL...30 : chr [1:205] "7" "7" "6" "-" ...
$ Status...31 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...32 : chr [1:205] "7" "7" "6" "-" ...
$ CL...33 : chr [1:205] "7" "7" "6" "-" ...
$ Status...34 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...35 : chr [1:205] "7" "7" "6" "-" ...
$ CL...36 : chr [1:205] "7" "7" "6" "-" ...
$ Status...37 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...38 : chr [1:205] "7" "7" "6" "-" ...
$ CL...39 : chr [1:205] "7" "7" "6" "-" ...
$ Status...40 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...41 : chr [1:205] "7" "7" "6" "-" ...
$ CL...42 : chr [1:205] "7" "7" "6" "-" ...
$ Status...43 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...44 : chr [1:205] "7" "7" "6" "-" ...
$ CL...45 : chr [1:205] "7" "7" "6" "-" ...
$ Status...46 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...47 : chr [1:205] "6" "7" "5" "-" ...
$ CL...48 : chr [1:205] "6" "7" "6" "-" ...
$ Status...49 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...50 : chr [1:205] "7" "7" "6" "-" ...
$ CL...51 : chr [1:205] "7" "7" "4" "-" ...
$ Status...52 : chr [1:205] "NF" "NF" "PF" "-" ...
$ PR...53 : chr [1:205] "7" "7" "4" "-" ...
$ CL...54 : chr [1:205] "7" "6" "4" "-" ...
$ Status...55 : chr [1:205] "NF" "NF" "PF" "-" ...
$ PR...56 : chr [1:205] "7" "4" "4" "-" ...
$ CL...57 : chr [1:205] "7" "4" "4" "-" ...
$ Status...58 : chr [1:205] "NF" "PF" "PF" "-" ...
$ PR...59 : chr [1:205] "6" "4" "7" "-" ...
$ CL...60 : chr [1:205] "6" "3" "6" "-" ...
$ Status...61 : chr [1:205] "NF" "PF" "NF" "-" ...
$ PR...62 : chr [1:205] "7" "2" "7" "2" ...
$ CL...63 : chr [1:205] "7" "4" "6" "1" ...
$ Status...64 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...65 : chr [1:205] "7" "3" "7" "1" ...
$ CL...66 : chr [1:205] "7" "4" "7" "1" ...
$ Status...67 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...68 : chr [1:205] "7" "3" "6" "1" ...
$ CL...69 : chr [1:205] "7" "4" "6" "1" ...
$ Status...70 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...71 : chr [1:205] "7" "4" "6" "1" ...
$ CL...72 : chr [1:205] "7" "4" "6" "1" ...
$ Status...73 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...74 : chr [1:205] "7" "4" "6" "1" ...
$ CL...75 : chr [1:205] "7" "4" "6" "1" ...
$ Status...76 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...77 : chr [1:205] "7" "4" "6" "1" ...
$ CL...78 : chr [1:205] "7" "5" "5" "1" ...
$ Status...79 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...80 : chr [1:205] "7" "4" "6" "1" ...
$ CL...81 : chr [1:205] "7" "5" "5" "1" ...
$ Status...82 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...83 : chr [1:205] "7" "4" "6" "1" ...
$ CL...84 : chr [1:205] "7" "5" "5" "1" ...
$ Status...85 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...86 : chr [1:205] "7" "3" "6" "1" ...
$ CL...87 : chr [1:205] "7" "4" "5" "1" ...
$ Status...88 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...89 : chr [1:205] "6" "3" "6" "1" ...
$ CL...90 : chr [1:205] "6" "3" "5" "1" ...
$ Status...91 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...92 : chr [1:205] "6" "3" "6" "1" ...
$ CL...93 : chr [1:205] "6" "3" "5" "1" ...
$ Status...94 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...95 : chr [1:205] "5" "3" "6" "1" ...
$ CL...96 : chr [1:205] "6" "3" "5" "1" ...
$ Status...97 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...98 : chr [1:205] "5" "3" "6" "1" ...
$ CL...99 : chr [1:205] "5" "3" "5" "1" ...
[list output truncated]
・All variables are recognized as chr (= character)
・We do not want PR (political rights) and CL (civil liberties) to be chr (= character) but numeric → This should be fixed
Solution:
- You can see - in the spreadsheet
- This means a missing value in the Freedom House data set
- R recognizes a blank cell as a missing value, but it does not recognize - as one
→ We need to let R recognize that "-" means a missing value
→ Add the following argument: na = "-"
fh <- read_excel("data/FH_Country.xls",
                 sheet = 2,
                 skip = 2,
                 na = "-")   # treat "-" as a missing value
str(fh)
tibble [205 × 133] (S3: tbl_df/tbl/data.frame)
$ ...1 : chr [1:205] "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ PR...2 : chr [1:205] "4" "7" "6" "4" ...
$ CL...3 : chr [1:205] "5" "7" "6" "3" ...
$ Status...4 : chr [1:205] "PF" "NF" "NF" "PF" ...
$ PR...5 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...6 : num [1:205] 6 7 6 4 NA NA 2 NA 1 1 ...
$ Status...7 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...8 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...9 : num [1:205] 6 7 6 4 NA NA 4 NA 1 1 ...
$ Status...10 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...11 : num [1:205] 7 7 7 4 6 NA 2 NA 1 1 ...
$ CL...12 : num [1:205] 6 7 6 4 6 NA 4 NA 1 1 ...
$ Status...13 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...14 : num [1:205] 7 7 6 4 6 NA 6 NA 1 1 ...
$ CL...15 : num [1:205] 6 7 6 4 6 NA 5 NA 1 1 ...
$ Status...16 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...17 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...18 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ Status...19 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...20 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...21 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...22 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...23 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...24 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...25 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...26 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...27 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...28 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...29 : num [1:205] 7 7 6 NA 7 2 6 NA 1 1 ...
$ CL...30 : num [1:205] 7 7 6 NA 7 2 5 NA 1 1 ...
$ Status...31 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...32 : num [1:205] 7 7 6 NA 7 2 3 NA 1 1 ...
$ CL...33 : num [1:205] 7 7 6 NA 7 3 3 NA 1 1 ...
$ Status...34 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...35 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...36 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...37 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...38 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...39 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...40 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...41 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...42 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...43 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...44 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...45 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...46 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...47 : num [1:205] 6 7 5 NA 7 2 2 NA 1 1 ...
$ CL...48 : num [1:205] 6 7 6 NA 7 3 1 NA 1 1 ...
$ Status...49 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...50 : num [1:205] 7 7 6 NA 7 2 1 NA 1 1 ...
$ CL...51 : num [1:205] 7 7 4 NA 7 3 2 NA 1 1 ...
$ Status...52 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...53 : num [1:205] 7 7 4 NA 7 3 1 NA 1 1 ...
$ CL...54 : num [1:205] 7 6 4 NA 7 2 3 NA 1 1 ...
$ Status...55 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...56 : num [1:205] 7 4 4 NA 6 3 1 5 1 1 ...
$ CL...57 : num [1:205] 7 4 4 NA 4 3 3 5 1 1 ...
$ Status...58 : chr [1:205] "NF" "PF" "PF" NA ...
$ PR...59 : num [1:205] 6 4 7 NA 6 3 2 4 1 1 ...
$ CL...60 : num [1:205] 6 3 6 NA 6 3 3 3 1 1 ...
$ Status...61 : chr [1:205] "NF" "PF" "NF" NA ...
$ PR...62 : num [1:205] 7 2 7 2 7 4 2 3 1 1 ...
$ CL...63 : num [1:205] 7 4 6 1 7 3 3 4 1 1 ...
$ Status...64 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...65 : num [1:205] 7 3 7 1 7 4 2 3 1 1 ...
$ CL...66 : num [1:205] 7 4 7 1 7 3 3 4 1 1 ...
$ Status...67 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...68 : num [1:205] 7 3 6 1 6 4 2 4 1 1 ...
$ CL...69 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...70 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...71 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...72 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...73 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...74 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...75 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...76 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...77 : num [1:205] 7 4 6 1 6 4 3 4 1 1 ...
$ CL...78 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...79 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...80 : num [1:205] 7 4 6 1 6 4 2 4 1 1 ...
$ CL...81 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...82 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...83 : num [1:205] 7 4 6 1 6 4 1 4 1 1 ...
$ CL...84 : num [1:205] 7 5 5 1 6 2 2 4 1 1 ...
$ Status...85 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...86 : num [1:205] 7 3 6 1 6 4 3 4 1 1 ...
$ CL...87 : num [1:205] 7 4 5 1 6 2 3 4 1 1 ...
$ Status...88 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...89 : num [1:205] 6 3 6 1 6 4 3 4 1 1 ...
$ CL...90 : num [1:205] 6 3 5 1 5 2 3 4 1 1 ...
$ Status...91 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...92 : num [1:205] 6 3 6 1 6 4 2 4 1 1 ...
$ CL...93 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...94 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...95 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...96 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...97 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...98 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...99 : num [1:205] 5 3 5 1 5 2 2 4 1 1 ...
[list output truncated]
All PR... and CL... variables except PR...2 and CL...3 are recognized as numeric
PR...2 and CL...3 are still recognized as character
→ This should be fixed
We need to know why these two variables (PR...2 and CL...3) were not changed to numeric
→ We need to change the class of these two variables from character to numeric
Using the unique() function, check the values of PR...2
unique(fh$PR...2)
[1] "4" "7" "6" NA "1" "2" "5" "3" "2(5)"
2(5) is included!
2(5) is not numeric but character
The values of a character variable are shown with ""
Since 2(5) is not numeric, the values were shown with ""
→ NA is an exception in R
→ NA is not recognized as a character
Because PR...2 contains 2(5), PR...2 is recognized as a character variable
→ This is the reason!
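The same behaviour can be reproduced in base R with a short vector. The following is a small sketch; the vector x and the helper name is_bad are hypothetical and only imitate the values found in PR...2.

```r
# Hypothetical values imitating PR...2: one entry is not a plain number
x <- c("4", "7", "2(5)", NA)

class(x)  # "character": a single non-numeric entry makes the whole vector character

# Find the entries that are not missing but cannot be converted to numeric
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
is_bad <- !is.na(x) & is.na(suppressWarnings(as.numeric(x)))

x[is_bad]  # "2(5)"
```

This check is a quick alternative to reading the output of unique() by eye when a variable has many different values.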
Using the unique() function, check the values of CL...3
unique(fh$CL...3)
[1] "5" "7" "6" "3" NA "1" "4" "2" "3(6)"
3(6) is included!
3(6) is not numeric but character
The values of a character variable are shown with ""
Since 3(6) is not numeric, the values were shown with ""
→ NA is an exception in R
→ NA is not recognized as a character
Because CL...3 contains 3(6), CL...3 is recognized as a character variable
Solution:
・Using the if_else() function, replace 2(5) and 3(6) with NA and name the result fh_na
fh_na <- fh %>%
  dplyr::mutate(
    PR...2 = if_else(PR...2 == "2(5)", NA_character_, PR...2),   # NA_character_ is a missing value of class character
    CL...3 = if_else(CL...3 == "3(6)", NA_character_, CL...3)) %>%
  mutate(across(c(PR...2, CL...3), as.numeric))   # change both variables from character to numeric
Using the unique() function, check the values of PR...2
unique(fh_na$PR...2)
[1] 4 7 6 NA 1 2 5 3
NA is not shown with ""
→ NA is recognized as a missing value
Using unique() function, check the values of CL...3
unique(fh_na$CL...3)
[1] 5 7 6 3 NA 1 4 2
NA is not shown with ""
→ NA is recognized as a missing value
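The replace-then-convert step can be tried on a toy table before running it on the real data. A minimal sketch, assuming a made-up tibble `toy` with columns `PR` and `CL` (hypothetical names, not the real column names). Note that `NA_character_` is a real missing value, whereas the quoted string "NA" is ordinary text.

```r
library(dplyr)

# Hypothetical toy data, not the real Freedom House table
toy <- tibble(PR = c("4", NA, "2(5)"),
              CL = c("5", "3(6)", NA))

toy_na <- toy %>%
  mutate(PR = if_else(PR == "2(5)", NA_character_, PR),  # odd value => missing
         CL = if_else(CL == "3(6)", NA_character_, CL)) %>%
  mutate(across(c(PR, CL), as.numeric))                  # character => numeric

toy_na
```

Rows that were already NA stay NA, because if_else() returns a missing value when its condition is missing.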
Using str() function, check the class of PR...2 and CL...3
str(fh_na$PR...2)
num [1:205] 4 7 6 4 NA NA 6 NA 1 1 ...
str(fh_na$CL...3)
num [1:205] 5 7 6 3 NA NA 3 NA 1 1 ...
PR...2 and CL...3 are now recognized as numeric
Check the structure of fh_na
str(fh_na)
tibble [205 × 133] (S3: tbl_df/tbl/data.frame)
$ ...1 : chr [1:205] "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ PR...2 : num [1:205] 4 7 6 4 NA NA 6 NA 1 1 ...
$ CL...3 : num [1:205] 5 7 6 3 NA NA 3 NA 1 1 ...
$ Status...4 : chr [1:205] "PF" "NF" "NF" "PF" ...
$ PR...5 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...6 : num [1:205] 6 7 6 4 NA NA 2 NA 1 1 ...
$ Status...7 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...8 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...9 : num [1:205] 6 7 6 4 NA NA 4 NA 1 1 ...
$ Status...10 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...11 : num [1:205] 7 7 7 4 6 NA 2 NA 1 1 ...
$ CL...12 : num [1:205] 6 7 6 4 6 NA 4 NA 1 1 ...
$ Status...13 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...14 : num [1:205] 7 7 6 4 6 NA 6 NA 1 1 ...
$ CL...15 : num [1:205] 6 7 6 4 6 NA 5 NA 1 1 ...
$ Status...16 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...17 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...18 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ Status...19 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...20 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...21 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...22 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...23 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...24 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...25 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...26 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...27 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...28 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...29 : num [1:205] 7 7 6 NA 7 2 6 NA 1 1 ...
$ CL...30 : num [1:205] 7 7 6 NA 7 2 5 NA 1 1 ...
$ Status...31 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...32 : num [1:205] 7 7 6 NA 7 2 3 NA 1 1 ...
$ CL...33 : num [1:205] 7 7 6 NA 7 3 3 NA 1 1 ...
$ Status...34 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...35 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...36 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...37 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...38 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...39 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...40 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...41 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...42 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...43 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...44 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...45 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...46 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...47 : num [1:205] 6 7 5 NA 7 2 2 NA 1 1 ...
$ CL...48 : num [1:205] 6 7 6 NA 7 3 1 NA 1 1 ...
$ Status...49 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...50 : num [1:205] 7 7 6 NA 7 2 1 NA 1 1 ...
$ CL...51 : num [1:205] 7 7 4 NA 7 3 2 NA 1 1 ...
$ Status...52 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...53 : num [1:205] 7 7 4 NA 7 3 1 NA 1 1 ...
$ CL...54 : num [1:205] 7 6 4 NA 7 2 3 NA 1 1 ...
$ Status...55 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...56 : num [1:205] 7 4 4 NA 6 3 1 5 1 1 ...
$ CL...57 : num [1:205] 7 4 4 NA 4 3 3 5 1 1 ...
$ Status...58 : chr [1:205] "NF" "PF" "PF" NA ...
$ PR...59 : num [1:205] 6 4 7 NA 6 3 2 4 1 1 ...
$ CL...60 : num [1:205] 6 3 6 NA 6 3 3 3 1 1 ...
$ Status...61 : chr [1:205] "NF" "PF" "NF" NA ...
$ PR...62 : num [1:205] 7 2 7 2 7 4 2 3 1 1 ...
$ CL...63 : num [1:205] 7 4 6 1 7 3 3 4 1 1 ...
$ Status...64 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...65 : num [1:205] 7 3 7 1 7 4 2 3 1 1 ...
$ CL...66 : num [1:205] 7 4 7 1 7 3 3 4 1 1 ...
$ Status...67 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...68 : num [1:205] 7 3 6 1 6 4 2 4 1 1 ...
$ CL...69 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...70 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...71 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...72 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...73 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...74 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...75 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...76 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...77 : num [1:205] 7 4 6 1 6 4 3 4 1 1 ...
$ CL...78 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...79 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...80 : num [1:205] 7 4 6 1 6 4 2 4 1 1 ...
$ CL...81 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...82 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...83 : num [1:205] 7 4 6 1 6 4 1 4 1 1 ...
$ CL...84 : num [1:205] 7 5 5 1 6 2 2 4 1 1 ...
$ Status...85 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...86 : num [1:205] 7 3 6 1 6 4 3 4 1 1 ...
$ CL...87 : num [1:205] 7 4 5 1 6 2 3 4 1 1 ...
$ Status...88 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...89 : num [1:205] 6 3 6 1 6 4 3 4 1 1 ...
$ CL...90 : num [1:205] 6 3 5 1 5 2 3 4 1 1 ...
$ Status...91 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...92 : num [1:205] 6 3 6 1 6 4 2 4 1 1 ...
$ CL...93 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...94 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...95 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...96 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...97 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...98 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...99 : num [1:205] 5 3 5 1 5 2 2 4 1 1 ...
[list output truncated]
dim(fh_na)
[1] 205 133
fh_na still needs a bit more work
We have three variables per country and per year (PR, CL, Status)
■PR...2, CL...3, Status...4 are the data for 1972
■PR...5, CL...6, Status...7 are the data for 1973
・・・・・・・・・・・・・・・・・・・・・・・・
■PR...128, CL...129, Status...130 are the data for 2015
■PR...131, CL...132, Status...133 are the data for 2016
Two different classes of variables: numeric (PR, CL) and character (Status)
Solution: Make two variables (value, status)
→ The values of PR and CL are put into value
→ The values of Status are put into status
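A small sketch of why the numeric ratings and the character status need separate columns. The tibble `toy` is hypothetical. In the tidyr versions I am aware of, stacking a numeric and a character column into one value column is expected to fail, so the call is wrapped in tryCatch() and the message is not assumed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one numeric rating and one character status
toy <- tibble(country = c("A", "B"),
              pr_1972 = c(4, 7),
              st_1972 = c("PF", "NF"))

# Try to put both columns into a single `value` column
res <- tryCatch(
  toy %>% pivot_longer(-country, names_to = "name", values_to = "value"),
  error = function(e) e
)

inherits(res, "error")  # TRUE means the two classes could not be combined
```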
Rename ...1
...1 shows the country name
Rename ...1 as country
fh_na <- fh_na %>%
  rename(country = 1)
Split the data into fh_country and fh_na
fh_country <- fh_na %>%
  select(country)
fh_na <- fh_na %>%
  select(-country)
names(fh_na)
[1] "PR...2" "CL...3" "Status...4" "PR...5" "CL...6"
[6] "Status...7" "PR...8" "CL...9" "Status...10" "PR...11"
[11] "CL...12" "Status...13" "PR...14" "CL...15" "Status...16"
[16] "PR...17" "CL...18" "Status...19" "PR...20" "CL...21"
[21] "Status...22" "PR...23" "CL...24" "Status...25" "PR...26"
[26] "CL...27" "Status...28" "PR...29" "CL...30" "Status...31"
[31] "PR...32" "CL...33" "Status...34" "PR...35" "CL...36"
[36] "Status...37" "PR...38" "CL...39" "Status...40" "PR...41"
[41] "CL...42" "Status...43" "PR...44" "CL...45" "Status...46"
[46] "PR...47" "CL...48" "Status...49" "PR...50" "CL...51"
[51] "Status...52" "PR...53" "CL...54" "Status...55" "PR...56"
[56] "CL...57" "Status...58" "PR...59" "CL...60" "Status...61"
[61] "PR...62" "CL...63" "Status...64" "PR...65" "CL...66"
[66] "Status...67" "PR...68" "CL...69" "Status...70" "PR...71"
[71] "CL...72" "Status...73" "PR...74" "CL...75" "Status...76"
[76] "PR...77" "CL...78" "Status...79" "PR...80" "CL...81"
[81] "Status...82" "PR...83" "CL...84" "Status...85" "PR...86"
[86] "CL...87" "Status...88" "PR...89" "CL...90" "Status...91"
[91] "PR...92" "CL...93" "Status...94" "PR...95" "CL...96"
[96] "Status...97" "PR...98" "CL...99" "Status...100" "PR...101"
[101] "CL...102" "Status...103" "PR...104" "CL...105" "Status...106"
[106] "PR...107" "CL...108" "Status...109" "PR...110" "CL...111"
[111] "Status...112" "PR...113" "CL...114" "Status...115" "PR...116"
[116] "CL...117" "Status...118" "PR...119" "CL...120" "Status...121"
[121] "PR...122" "CL...123" "Status...124" "PR...125" "CL...126"
[126] "Status...127" "PR...128" "CL...129" "Status...130" "PR...131"
[131] "CL...132" "Status...133"
names(fh_country)
[1] "country"
Rename the variables of fh_na
colnames(fh_na) <-
  str_replace_all(colnames(fh_na),
                  c("\\.\\.\\." = "-")) %>% # replace "..." with "-"
  str_subset("PR|CL|Status") %>% # keep only the PR, CL and Status variables
  str_c(., "_") %>% # append "_" to build names like "pr_1972"
  str_replace_all(c("-" = "",
                    "[0-9]" = "",
                    "PR" = "pr", # PR => pr
                    "CL" = "cl", # CL => cl
                    "Status" = "st")) %>% # Status => st
  str_c(., rep(setdiff(1972:2016, 1981), # exclude 1981
               each = 3)) # 3 variables per year
Check the variable names of fh_na
names(fh_na)
[1] "pr_1972" "cl_1972" "st_1972" "pr_1973" "cl_1973" "st_1973" "pr_1974"
[8] "cl_1974" "st_1974" "pr_1975" "cl_1975" "st_1975" "pr_1976" "cl_1976"
[15] "st_1976" "pr_1977" "cl_1977" "st_1977" "pr_1978" "cl_1978" "st_1978"
[22] "pr_1979" "cl_1979" "st_1979" "pr_1980" "cl_1980" "st_1980" "pr_1982"
[29] "cl_1982" "st_1982" "pr_1983" "cl_1983" "st_1983" "pr_1984" "cl_1984"
[36] "st_1984" "pr_1985" "cl_1985" "st_1985" "pr_1986" "cl_1986" "st_1986"
[43] "pr_1987" "cl_1987" "st_1987" "pr_1988" "cl_1988" "st_1988" "pr_1989"
[50] "cl_1989" "st_1989" "pr_1990" "cl_1990" "st_1990" "pr_1991" "cl_1991"
[57] "st_1991" "pr_1992" "cl_1992" "st_1992" "pr_1993" "cl_1993" "st_1993"
[64] "pr_1994" "cl_1994" "st_1994" "pr_1995" "cl_1995" "st_1995" "pr_1996"
[71] "cl_1996" "st_1996" "pr_1997" "cl_1997" "st_1997" "pr_1998" "cl_1998"
[78] "st_1998" "pr_1999" "cl_1999" "st_1999" "pr_2000" "cl_2000" "st_2000"
[85] "pr_2001" "cl_2001" "st_2001" "pr_2002" "cl_2002" "st_2002" "pr_2003"
[92] "cl_2003" "st_2003" "pr_2004" "cl_2004" "st_2004" "pr_2005" "cl_2005"
[99] "st_2005" "pr_2006" "cl_2006" "st_2006" "pr_2007" "cl_2007" "st_2007"
[106] "pr_2008" "cl_2008" "st_2008" "pr_2009" "cl_2009" "st_2009" "pr_2010"
[113] "cl_2010" "st_2010" "pr_2011" "cl_2011" "st_2011" "pr_2012" "cl_2012"
[120] "st_2012" "pr_2013" "cl_2013" "st_2013" "pr_2014" "cl_2014" "st_2014"
[127] "pr_2015" "cl_2015" "st_2015" "pr_2016" "cl_2016" "st_2016"
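The renaming logic is easier to follow on a short vector. The sketch below is a simplified variant, not the pipeline used above: the vectors `old` and `years` are hypothetical, and the trailing "...N" is removed in one step with str_remove().

```r
library(stringr)

# Hypothetical subset of the original column names (two years only)
old   <- c("PR...2", "CL...3", "Status...4", "PR...5", "CL...6", "Status...7")
years <- c(1972, 1973)

new <- old %>%
  str_remove("\\.\\.\\.[0-9]+$") %>%                  # drop the trailing "...N"
  str_replace_all(c("PR" = "pr", "CL" = "cl",
                    "Status" = "st")) %>%             # shorten and lower-case the prefix
  str_c("_", rep(years, each = 3))                    # 3 variables per year

new
```

This relies on the columns being in year order, three per year, just as the full pipeline does. If a column were missing or out of order, the years would be attached to the wrong variables.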
Using bind_cols() function, merge fh_na and fh_country
fh_na <- fh_country %>%
  bind_cols(fh_na)
Check fh_na
rmarkdown::paged_table(fh_na)
Make two variables, value and type, from the pr and cl variables
PR_CL_long <- fh_na %>%
  select(country,
         starts_with(c("pr", "cl"))) %>% # select those variables starting with `pr` and `cl`
  pivot_longer(pr_1972:cl_2016, # assign the range of variables
               names_to = "type", # put variable names, such as "pr_1972", into `type`
               values_to = "value") %>% # put the ratings (1 to 7) into `value`
  separate(type,
           into = c("type", "year"), # split `type` into 2: `type` and `year`
           sep = "_") %>% # the two parts are separated by "_"
  drop_na() # drop rows with missing values
ST_long <- fh_na %>%
  select(country, # select country
         starts_with("st")) %>% # select those variables starting with `st`
  pivot_longer(st_1972:st_2016, # assign the range of variables
               names_to = "name", # put variable names, such as "st_1972", into `name`
               values_to = "status") %>% # put the values, such as "NF", into `status`
  separate(name,
           into = c("name", "year"), # split `name` into 2: `name` and `year`
           sep = "_") %>% # the two parts are separated by "_"
  select(-name) %>% # drop name because it is not needed
  drop_na() # drop rows with missing values
Check the variable names of PR_CL_long
names(PR_CL_long)
[1] "country" "type" "year" "value"
Check the variable names of ST_long
names(ST_long)
[1] "country" "year" "status"
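As a possible shortcut, pivot_longer() can split the variable names itself through its `names_sep` argument, and drop missing values through `values_drop_na`, so separate() and drop_na() are not strictly needed. A minimal sketch on a hypothetical tibble `toy`; converting `year` to integer is an extra step added here, not something done in the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: two countries, two years, pr and cl only
toy <- tibble(country = c("A", "B"),
              pr_1972 = c(4, 7), cl_1972 = c(5, NA),
              pr_1973 = c(3, 7), cl_1973 = c(4, 6))

toy_long <- toy %>%
  pivot_longer(-country,
               names_to = c("type", "year"), # "pr_1972" => type "pr", year "1972"
               names_sep = "_",              # split the names at "_"
               values_to = "value",
               values_drop_na = TRUE) %>%    # drop rows whose value is missing
  mutate(year = as.integer(year))            # year is character after the split

toy_long
```

The result has the same four variables as PR_CL_long (country, type, year, value), with one row per country, type and year.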
Using left_join() function, merge PR_CL_long and ST_long with the two shared variables: country and year
fh_all_long <- PR_CL_long %>%
  left_join(ST_long,
            by = c("country", "year"))
DT::datatable(fh_all_long)
Freedom House
Transition of Political Rights between North Korea and South Korea (1972-2016)
korea_PR <- fh_all_long %>%
filter(country == "North Korea" | country == "South Korea") %>%
filter(type == "pr")
korea_PR %>%
ggplot(aes(x = value, y = year,
color = country,
shape = country)) +
geom_point() +
ggtitle("Political Rights between N.Korea and S.Korea: 1972-2016") +
labs(x = "Political Rights", y = "Year") +
theme(legend.position = c(0.5, 0.8))
Transition of Political Rights between Japan and China (1972-2016)
jpn.chi_PR <- fh_all_long %>%
filter(country == "Japan" | country == "China") %>%
filter(type == "pr")
jpn.chi_PR %>%
ggplot(aes(x = value, y = year,
color = country,
shape = country)) +
geom_point() +
ggtitle("Political Rights between Japan and China: 1972-2016") +
labs(x = "Political Rights", y = "Year") +
theme(legend.position = c(0.5, 0.8))
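
One possible variation on the plots above, shown on made-up data: because year is stored as character after the reshaping, converting it to integer and putting it on the x-axis gives a conventional time-series layout. The data frame `toy`, its values, and the added geom_line() are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical ratings for two made-up countries
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c("1972", "1973", "1974"), times = 2),
  value   = c(7, 7, 6, 5, 4, 2)
)
toy$year <- as.integer(toy$year)  # character => integer, so the axis is continuous

p <- ggplot(toy, aes(x = year, y = value,
                     color = country,
                     shape = country)) +
  geom_point() +
  geom_line() +                   # connect the points of each country over time
  labs(x = "Year", y = "Political Rights")

p
```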