✔ What we do here
・Introduce how to make a variable
・Introduce how to merge variables and make a dataframe
・Introduce how to merge multiple dataframes
・Introduce how to read data with various file name extensions
・Introduce how to clean the data you read into RStudio
・Explain technical terms we need in analyzing data
Technical terms explained here: text data, binary data, file extension, path, file, folder, R project, R project folder, working directory, missing value, class, data cleaning, converting data between wide and long format
・Load the tidyverse package, together with haven and readxl
library(haven)
library(readxl)
library(tidyverse)
─ Attaching packages ──────────────────── tidyverse 1.3.1 ─
✓ ggplot2 3.3.3 ✓ purrr 0.3.4
✓ tibble 3.1.2 ✓ dplyr 1.0.6
✓ tidyr 1.1.3 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.1
─ Conflicts ───────────────────── tidyverse_conflicts() ─
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
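・tidyverse contains 8 useful packages, such as readr, which is used to read data
・Make a variable named id
id <- c(1, 2, 3, 4, 5, 6, 7, 8)
・Make a variable named name
name <- c("Thies", "Cox", "McCubbins", "Schwartz", "DeNardo", "Bawn", "Patterson", "Geddes")
・Make a variable named score
score <- c(43, 74, 80, 37, 20, 83, 64, 35)
・Load the tidyverse package to use the tibble() function, and merge the variables into a data frame named df1
library(tidyverse)
df1 <- tibble(id, name, score)
df1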
# A tibble: 8 x 3
id name score
<dbl> <chr> <dbl>
1 1 Thies 43
2 2 Cox 74
3 3 McCubbins 80
4 4 Schwartz 37
5 5 DeNardo 20
6 6 Bawn 83
7 7 Patterson 64
8 8 Geddes 35
- You can also use data.frame() instead of tibble()
df1 <- data.frame(id, name, score)
df1
○ tibble() shows the size of the data frame, such as 8 x 3, and the class of each variable, such as <dbl> and <chr>
→ You should use tibble()
○ If you load the tidyverse package, you can use tibble()
・Add a new variable, department, to df1
・Put $ after the name of the data frame (df1), then put the name of the new variable (department)
df1$department <- c("poli-sci", "econ", "poli-sci", "econ", "art", "music", "communication", "history")
df1
# A tibble: 8 x 4
id name score department
<dbl> <chr> <dbl> <chr>
1 1 Thies 43 poli-sci
2 2 Cox 74 econ
3 3 McCubbins 80 poli-sci
4 4 Schwartz 37 econ
5 5 DeNardo 20 art
6 6 Bawn 83 music
7 7 Patterson 64 communication
8 8 Geddes 35 history
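The `$` assignment shown above can be tried on a small toy data frame. This is only a sketch: `toy` and its values are hypothetical and are not the document's data. It also shows what happens when the new vector does not fit the number of rows.

```r
# Toy data frame (hypothetical values, not the data used in this document)
toy <- data.frame(id = 1:3, score = c(43, 74, 80))

# Add a new variable with `$`: one value per row
toy$department <- c("poli-sci", "econ", "art")

# Two values cannot fill three rows, so R raises an error
# (try() captures the error so the script keeps running)
bad <- try(toy$gender <- c("male", "female"), silent = TRUE)

names(toy)                  # the variable names now include "department"
inherits(bad, "try-error")  # TRUE: the assignment was rejected
```

When the assignment fails, `toy` is left unchanged, so it is safe to correct the vector and run the line again.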
・Add gender to df1
df1$gender <- c("male", "male", "male", "male", "male", "female", "male", "female")
df1
# A tibble: 8 x 5
id name score department gender
<dbl> <chr> <dbl> <chr> <chr>
1 1 Thies 43 poli-sci male
2 2 Cox 74 econ male
3 3 McCubbins 80 poli-sci male
4 4 Schwartz 37 econ male
5 5 DeNardo 20 art male
6 6 Bawn 83 music female
7 7 Patterson 64 communication male
8 8 Geddes 35 history female
・Make another data frame, df2
・df2 includes the following two variables:
① id
② state
・Make a variable named id
id <- c(1, 2, 3, 4, 5, 6, 7, 8)
・Make a variable named state, standing for where they come from
state <- c("California", "Oregon", "NY", "Washington", "Florida", "Wisconsin", "Alabama", "South Carolina")
・Make the data frame df2
df2 <- tibble(id, state)
df2
# A tibble: 8 x 2
id state
<dbl> <chr>
1 1 California
2 2 Oregon
3 3 NY
4 4 Washington
5 5 Florida
6 6 Wisconsin
7 7 Alabama
8 8 South Carolina
・Merge the two data frames (df1 and df2) with the same variable name (id) and name the new data frame M
M <- merge(df1, df2, by = "id")
M
id name score department gender state
1 1 Thies 43 poli-sci male California
2 2 Cox 74 econ male Oregon
3 3 McCubbins 80 poli-sci male NY
4 4 Schwartz 37 econ male Washington
5 5 DeNardo 20 art male Florida
6 6 Bawn 83 music female Wisconsin
7 7 Patterson 64 communication male Alabama
8 8 Geddes 35 history female South Carolina
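In the example above every id appears in both data frames. As a small sketch of what merge() does when that is not the case, the toy data frames below (`left` and `right`, hypothetical names and values) share only some ids. By default merge() keeps only the matching rows, and `all = TRUE` keeps every row and fills the gaps with NA.

```r
# Two toy data frames that share only ids 2 and 3 (hypothetical values)
left  <- data.frame(id = c(1, 2, 3), name = c("Thies", "Cox", "McCubbins"))
right <- data.frame(id = c(2, 3, 4), state = c("Oregon", "NY", "Washington"))

# Default: keep only the ids found in both data frames
inner <- merge(left, right, by = "id")

# all = TRUE: keep every id and fill what is unknown with NA
outer <- merge(left, right, by = "id", all = TRUE)

nrow(inner)  # 2
nrow(outer)  # 4
outer
```

Checking the number of rows before and after a merge is a quick way to notice ids that did not match.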
Question 1:
Make a list of your family or friends (df1) containing the following variables:
① id: (1…..5)
② name
③ age
④ relationship
Question 2:
Make a list of your family or friends (df2) containing the following variables:
① id: (1…..5)
② gender
③ height
Question 3:
Merge the two data frames you made (df1 and df2) with the shared variable (id) and name it M1
File extension (e.g. .html, .Rmd, .csv, .doc, .png, .jpg)
Path, R Project folder, working directory
A path is a string of characters used to uniquely identify a location in a directory structure.
It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory.
The delimiting character is most commonly the slash ("/").
getwd() = get working directory
→ You can see which directory you are working in
For example, let me type getwd() on my computer and hit the return key
getwd()
"/Users/asanomasahiko/Dropbox/statistics/class_materials/R"
・If you are a Mac user, you will see something like this
・If you are a Windows user, you will see C Drive instead of Users
・R is the name of the R Project folder (= working directory) where you are currently working
・You are recommended to work inside your R Project folder
Reasons:
- If you are in your R Project folder, you do not have to specify a full path whenever you need data.
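The point about paths can be sketched in code. The example below builds a throwaway project folder in a temporary location; the names `my_project` and `grades_2021.csv` are hypothetical. It shows that a relative path such as "data/grades_2021.csv" is resolved against the working directory.

```r
# Hypothetical project folder, created in a temporary location
project <- file.path(tempdir(), "my_project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines("id,score\n1,43", file.path(project, "data", "grades_2021.csv"))

old <- setwd(project)   # pretend this is the R Project folder we opened

# Relative path: interpreted from the working directory
rel_path  <- file.path("data", "grades_2021.csv")
found_rel <- file.exists(rel_path)

# Absolute path to the same file, built from getwd()
abs_path  <- file.path(getwd(), rel_path)
found_abs <- file.exists(abs_path)

setwd(old)              # go back to the previous working directory

rel_path
found_rel
found_abs
```

Inside an R Project folder the short relative path is enough, and it keeps working when the whole folder is moved to another computer.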
× 「2021_grades」 => ○ 「grades_2021」
× 「2021 grades」 => ○ 「grades_2021」
・If you type data(), you can see the list of the data sets embedded in R (part of the list is shown here).
data()
state.x77
head(state.x77)
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
state.x77
tail(state.x77)
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Vermont 472 3907 0.6 71.64 5.5 57.1 168 9267
Virginia 4981 4701 1.4 70.08 9.5 47.8 85 39780
Washington 3559 4864 0.6 71.72 4.3 63.5 32 66570
West Virginia 1799 3617 1.4 69.48 6.7 41.6 100 24070
Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149 54464
Wyoming 376 4566 0.6 70.29 6.9 62.9 173 97203
Titanic
head(Titanic)
, , Age = Child, Survived = No
Sex
Class Male Female
1st 0 0
2nd 0 0
3rd 35 17
Crew 0 0
, , Age = Adult, Survived = No
Sex
Class Male Female
1st 118 4
2nd 154 13
3rd 387 89
Crew 670 3
, , Age = Child, Survived = Yes
Sex
Class Male Female
1st 5 1
2nd 11 13
3rd 13 14
Crew 0 0
, , Age = Adult, Survived = Yes
Sex
Class Male Female
1st 57 140
2nd 14 80
3rd 75 76
Crew 192 20
Read the data by data form
・There are two forms of data: text data and binary data
Text data
.txt file: you use this when you do text analysis
.html file: you use this when you do web scraping
.csv file: comma-separated values
Binary data: data we cannot read and understand, but a computer can
.xls file
.xlsx file: newer than the .xls file
.dta file: you can use this in Stata
.rds file: you can use this only in R
You are recommended to use LibreOffice rather than MS Office Excel
→ Free software
→ You can specify the character encoding
→ You can avoid unnecessary errors
How to read a .csv file
・Make a folder named data within your R Project folder
・Put hr96-17.csv into data
・Load the tidyverse package to read the csv file
library(tidyverse)
・Read the file and name it hr
hr <- read_csv("data/hr96-17.csv",
               na = ".") # treat "." as a missing value
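・If the text is garbled, open the file, choose the csv UTF-8 (.csv) form or the character set Unicode (UTF-8), and save it
・Alternatively, specify the character encoding when you read the file
hr <- read_csv("data/hr96-17.csv",
               na = ".",
               locale = locale(encoding = "cp932"))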
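To see why the missing-value marker matters, here is a small self-contained sketch. It uses base R's read.csv() with `na.strings`, which plays the same role as the `na` argument of read_csv(). The file and the variable `votes` are hypothetical and are written to a temporary file so that the example runs without hr96-17.csv.

```r
# Write a tiny csv file in which "." marks a missing value (hypothetical data)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,votes", "1,1200", "2,.", "3,950"), csv_file)

# Without telling R about ".", the whole variable is read as character
raw <- read.csv(csv_file, stringsAsFactors = FALSE)

# With na.strings = ".", the "." becomes NA and votes is read as a number
clean <- read.csv(csv_file, na.strings = ".")

class(raw$votes)
class(clean$votes)
clean$votes
```

A single unexpected symbol is enough to turn a numeric variable into a character variable, so it is worth checking the class right after reading the data.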
How to read an .xls[x] file
・Load the readxl package to read an .xls[x] file
library(readxl)
fh <- read_excel("data/FH_Country.xls")
How to read a .dta file
・A .dta file is binary data
・Load the haven package to read a .dta file
library(haven)
triangle <- read_dta("data/TRIANGLE.DTA")
head(triangle)
# A tibble: 6 x 19
statea stateb year dependa dependb demauta demautb allies dispute1 logdstab
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 20 1920 0.0157 0.280 10 9 0 0 5.82
2 2 20 1921 0.0115 0.224 10 10 0 0 5.82
3 2 20 1922 0.0113 0.201 10 10 0 0 5.82
4 2 20 1923 0.0112 0.213 10 10 0 0 5.82
5 2 20 1924 0.0110 0.213 10 10 0 0 5.82
6 2 20 1925 0.0108 0.191 10 10 0 0 5.82
# … with 9 more variables: lcaprat2 <dbl>, smigoabi <dbl>, opena <dbl>,
# openb <dbl>, minrpwrs <dbl>, noncontg <dbl>, smldmat <dbl>, smldep <dbl>,
# dyadid <dbl>
・Check the variables triangle contains
names(triangle)
[1] "statea" "stateb" "year" "dependa" "dependb" "demauta"
[7] "demautb" "allies" "dispute1" "logdstab" "lcaprat2" "smigoabi"
[13] "opena" "openb" "minrpwrs" "noncontg" "smldmat" "smldep"
[19] "dyadid"
・Save TRIANGLE.DTA in csv form, which is more widely used, with the following command
write_excel_csv(triangle, "data/triangle.csv")
GDP data
・Data file: wb_gdp_pc.csv
・Make a data folder and put wb_gdp_pc.csv into the folder
wb_gdp <- read_csv("data/wb_gdp_pc.csv")
Warning: Missing column names filled in: 'X3' [3]
─ Column specification ────────────────────────────
cols(
`Data Source` = col_character(),
`World Development Indicators` = col_character(),
X3 = col_character()
)
Warning: 265 parsing failures.
row col expected actual file
2 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
3 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
4 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
5 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
6 -- 3 columns 64 columns 'data/wb_gdp_pc.csv'
... ... ......... .......... ....................
See problems(...) for more details.
Pay attention to the Warning: Missing column names filled in: 'X3' [3]
・A Warning is not as serious as an Error, but we need to be cautious about it
How to deal with the Warning
head(wb_gdp)
# A tibble: 6 x 3
`Data Source` `World Development Indicators` X3
<chr> <chr> <chr>
1 Last Updated Date 2019-03-21 <NA>
2 Country Name Country Code Indicator Name
3 Aruba ABW GDP per capita (current US$)
4 Afghanistan AFG GDP per capita (current US$)
5 Angola AGO GDP per capita (current US$)
6 Albania ALB GDP per capita (current US$)
・Open the data using LibreOffice (or Excel) and check it
・The first 4 lines are highlighted in yellow so that you can easily recognize them
・If you use read_csv(), RStudio automatically recognizes the first row of the csv file as variable names
・Skip the first 4 lines with skip = 4 so that the row of variable names is read first
wb_gdp <- read_csv("data/wb_gdp_pc.csv", skip = 4)
Warning: Missing column names filled in: 'X64' [64]
─ Column specification ────────────────────────────
cols(
.default = col_double(),
`Country Name` = col_character(),
`Country Code` = col_character(),
`Indicator Name` = col_character(),
`Indicator Code` = col_character(),
`2018` = col_logical(),
X64 = col_logical()
)
ℹ Use `spec()` for the full column specifications.
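The effect of skipping the leading lines can be reproduced with a tiny file. This is a sketch under stated assumptions: the file below only imitates the layout of wb_gdp_pc.csv (four lines of notes before the row of variable names), and it uses base R's read.csv(), whose `skip` argument works in the same way here.

```r
# A small file that imitates the layout: 4 lines of notes, then the header row
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Data Source,World Development Indicators",
  "",
  "Last Updated Date,2019-03-21",
  "",
  "Country Name,Country Code,1960,1961",
  "Aruba,ABW,,",
  "Afghanistan,AFG,59.8,59.9"
), csv_file)

# skip = 4 drops the notes, so the 5th line supplies the variable names;
# check.names = FALSE keeps names such as "Country Name" and "1960" as they are
gdp_toy <- read.csv(csv_file, skip = 4, check.names = FALSE)

names(gdp_toy)
gdp_toy
```

Opening the file in a spreadsheet first, as suggested above, is the easiest way to count how many lines have to be skipped.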
・str() enables us to check the class of each variable
str(wb_gdp)
spec_tbl_df [264 × 64] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Country Name : chr [1:264] "Aruba" "Afghanistan" "Angola" "Albania" ...
$ Country Code : chr [1:264] "ABW" "AFG" "AGO" "ALB" ...
$ Indicator Name: chr [1:264] "GDP per capita (current US$)" "GDP per capita (current US$)" "GDP per capita (current US$)" "GDP per capita (current US$)" ...
$ Indicator Code: chr [1:264] "NY.GDP.PCAP.CD" "NY.GDP.PCAP.CD" "NY.GDP.PCAP.CD" "NY.GDP.PCAP.CD" ...
$ 1960 : num [1:264] NA 59.8 NA NA NA ...
$ 1961 : num [1:264] NA 59.9 NA NA NA ...
$ 1962 : num [1:264] NA 58.5 NA NA NA ...
$ 1963 : num [1:264] NA 78.8 NA NA NA ...
$ 1964 : num [1:264] NA 82.2 NA NA NA ...
$ 1965 : num [1:264] NA 101 NA NA NA ...
$ 1966 : num [1:264] NA 138 NA NA NA ...
$ 1967 : num [1:264] NA 161 NA NA NA ...
$ 1968 : num [1:264] NA 130 NA NA NA ...
$ 1969 : num [1:264] NA 130 NA NA NA ...
$ 1970 : num [1:264] NA 157 NA NA 3239 ...
$ 1971 : num [1:264] NA 160 NA NA 3498 ...
$ 1972 : num [1:264] NA 136 NA NA 4217 ...
$ 1973 : num [1:264] NA 144 NA NA 5342 ...
$ 1974 : num [1:264] NA 175 NA NA 6320 ...
$ 1975 : num [1:264] NA 188 NA NA 7169 ...
$ 1976 : num [1:264] NA 199 NA NA 7152 ...
$ 1977 : num [1:264] NA 226 NA NA 7751 ...
$ 1978 : num [1:264] NA 249 NA NA 9130 ...
$ 1979 : num [1:264] NA 278 NA NA 11821 ...
$ 1980 : num [1:264] NA 275 664 NA 12377 ...
$ 1981 : num [1:264] NA 266 600 NA 10372 ...
$ 1982 : num [1:264] NA NA 579 NA 9610 ...
$ 1983 : num [1:264] NA NA 582 NA 8023 ...
$ 1984 : num [1:264] NA NA 597 639 7729 ...
$ 1985 : num [1:264] NA NA 712 640 7774 ...
$ 1986 : num [1:264] 6473 NA 648 694 10362 ...
$ 1987 : num [1:264] 7886 NA 721 675 12616 ...
$ 1988 : num [1:264] 9765 NA 762 653 14304 ...
$ 1989 : num [1:264] 11392 NA 863 698 15166 ...
$ 1990 : num [1:264] 12307 NA 923 617 18879 ...
$ 1991 : num [1:264] 13496 NA 845 337 19533 ...
$ 1992 : num [1:264] 14047 NA 641 201 20548 ...
$ 1993 : num [1:264] 14937 NA 430 367 16516 ...
$ 1994 : num [1:264] 16241 NA 321 586 16235 ...
$ 1995 : num [1:264] 16439 NA 388 751 18461 ...
$ 1996 : num [1:264] 16586 NA 513 1010 19017 ...
$ 1997 : num [1:264] 17928 NA 507 717 18353 ...
$ 1998 : num [1:264] 19078 NA 420 814 18895 ...
$ 1999 : num [1:264] 19356 NA 386 1033 19262 ...
$ 2000 : num [1:264] 20621 NA 555 1127 21937 ...
$ 2001 : num [1:264] 20669 NA 526 1282 22229 ...
$ 2002 : num [1:264] 20437 184 870 1425 24741 ...
$ 2003 : num [1:264] 20834 196 979 1846 32776 ...
$ 2004 : num [1:264] 22570 217 1248 2374 38503 ...
$ 2005 : num [1:264] 23300 248 1891 2674 41282 ...
$ 2006 : num [1:264] 24046 269 2585 2973 43749 ...
$ 2007 : num [1:264] 25836 366 3108 3595 48583 ...
$ 2008 : num [1:264] 27086 370 4069 4371 47786 ...
$ 2009 : num [1:264] 24631 444 3118 4114 43339 ...
$ 2010 : num [1:264] 23513 551 3586 4094 39736 ...
$ 2011 : num [1:264] 24984 599 4616 4437 41099 ...
$ 2012 : num [1:264] 24710 649 5102 4248 38391 ...
$ 2013 : num [1:264] 25018 648 5258 4413 40620 ...
$ 2014 : num [1:264] 25528 625 5413 4579 42295 ...
$ 2015 : num [1:264] 25796 590 4171 3953 36038 ...
$ 2016 : num [1:264] 25252 550 3510 4132 37232 ...
$ 2017 : num [1:264] 25655 550 4100 4538 39147 ...
$ 2018 : logi [1:264] NA NA NA NA NA NA ...
$ X64 : logi [1:264] NA NA NA NA NA NA ...
- attr(*, "spec")=
.. cols(
.. `Country Name` = col_character(),
.. `Country Code` = col_character(),
.. `Indicator Name` = col_character(),
.. `Indicator Code` = col_character(),
.. `1960` = col_double(),
.. `1961` = col_double(),
.. `1962` = col_double(),
.. `1963` = col_double(),
.. `1964` = col_double(),
.. `1965` = col_double(),
.. `1966` = col_double(),
.. `1967` = col_double(),
.. `1968` = col_double(),
.. `1969` = col_double(),
.. `1970` = col_double(),
.. `1971` = col_double(),
.. `1972` = col_double(),
.. `1973` = col_double(),
.. `1974` = col_double(),
.. `1975` = col_double(),
.. `1976` = col_double(),
.. `1977` = col_double(),
.. `1978` = col_double(),
.. `1979` = col_double(),
.. `1980` = col_double(),
.. `1981` = col_double(),
.. `1982` = col_double(),
.. `1983` = col_double(),
.. `1984` = col_double(),
.. `1985` = col_double(),
.. `1986` = col_double(),
.. `1987` = col_double(),
.. `1988` = col_double(),
.. `1989` = col_double(),
.. `1990` = col_double(),
.. `1991` = col_double(),
.. `1992` = col_double(),
.. `1993` = col_double(),
.. `1994` = col_double(),
.. `1995` = col_double(),
.. `1996` = col_double(),
.. `1997` = col_double(),
.. `1998` = col_double(),
.. `1999` = col_double(),
.. `2000` = col_double(),
.. `2001` = col_double(),
.. `2002` = col_double(),
.. `2003` = col_double(),
.. `2004` = col_double(),
.. `2005` = col_double(),
.. `2006` = col_double(),
.. `2007` = col_double(),
.. `2008` = col_double(),
.. `2009` = col_double(),
.. `2010` = col_double(),
.. `2011` = col_double(),
.. `2012` = col_double(),
.. `2013` = col_double(),
.. `2014` = col_double(),
.. `2015` = col_double(),
.. `2016` = col_double(),
.. `2017` = col_double(),
.. `2018` = col_logical(),
.. X64 = col_logical()
.. )
・Country Name and Country Code are recognized as character <chr>
・1960 is recognized as double <dbl>
・NA means a missing value
names(wb_gdp)
[1] "Country Name" "Country Code" "Indicator Name" "Indicator Code"
[5] "1960" "1961" "1962" "1963"
[9] "1964" "1965" "1966" "1967"
[13] "1968" "1969" "1970" "1971"
[17] "1972" "1973" "1974" "1975"
[21] "1976" "1977" "1978" "1979"
[25] "1980" "1981" "1982" "1983"
[29] "1984" "1985" "1986" "1987"
[33] "1988" "1989" "1990" "1991"
[37] "1992" "1993" "1994" "1995"
[41] "1996" "1997" "1998" "1999"
[45] "2000" "2001" "2002" "2003"
[49] "2004" "2005" "2006" "2007"
[53] "2008" "2009" "2010" "2011"
[57] "2012" "2013" "2014" "2015"
[61] "2016" "2017" "2018" "X64"
We select variables we use
- We need to solve, one by one, the problems that prevent us from conducting quantitative analysis
- Since we do not know what X64 is, check it
wb_gdp$X64
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[101] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[151] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[176] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[201] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[226] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[251] NA NA NA NA NA NA NA NA NA NA NA NA NA NA
・All values of X64 are missing (= NA)
→ The following are the variables we need
gdp <- wb_gdp %>%
  select("Country Name",
         "1960":"2018")
names(gdp)
[1] "Country Name" "1960" "1961" "1962" "1963"
[6] "1964" "1965" "1966" "1967" "1968"
[11] "1969" "1970" "1971" "1972" "1973"
[16] "1974" "1975" "1976" "1977" "1978"
[21] "1979" "1980" "1981" "1982" "1983"
[26] "1984" "1985" "1986" "1987" "1988"
[31] "1989" "1990" "1991" "1992" "1993"
[36] "1994" "1995" "1996" "1997" "1998"
[41] "1999" "2000" "2001" "2002" "2003"
[46] "2004" "2005" "2006" "2007" "2008"
[51] "2009" "2010" "2011" "2012" "2013"
[56] "2014" "2015" "2016" "2017" "2018"
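Instead of printing a whole variable to see whether it is empty, the check can be done for every variable at once. The sketch below uses base R on a toy data frame; `wb_toy` and its values are hypothetical and only mimic the situation with X64.

```r
# Toy data frame with one variable that contains only missing values
wb_toy <- data.frame(
  country = c("Aruba", "Afghanistan"),
  `1960`  = c(NA, 59.8),
  X64     = c(NA, NA),
  check.names = FALSE   # keep "1960" as a variable name
)

# For each variable: are all of its values missing?
all_missing <- vapply(wb_toy, function(col) all(is.na(col)), logical(1))
all_missing

# Keep only the variables that contain at least one observed value
kept <- wb_toy[, !all_missing, drop = FALSE]
names(kept)
```

A variable such as 1960 has some missing values but is kept, because only variables that are missing everywhere are dropped.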
Fix the names of the variables
・Country Name => country
gdp <- gdp %>%
  rename(country = "Country Name")
names(gdp)
[1] "country" "1960" "1961" "1962" "1963" "1964" "1965"
[8] "1966" "1967" "1968" "1969" "1970" "1971" "1972"
[15] "1973" "1974" "1975" "1976" "1977" "1978" "1979"
[22] "1980" "1981" "1982" "1983" "1984" "1985" "1986"
[29] "1987" "1988" "1989" "1990" "1991" "1992" "1993"
[36] "1994" "1995" "1996" "1997" "1998" "1999" "2000"
[43] "2001" "2002" "2003" "2004" "2005" "2006" "2007"
[50] "2008" "2009" "2010" "2011" "2012" "2013" "2014"
[57] "2015" "2016" "2017" "2018"
・Check the size of gdp
dim(gdp)
[1] 264 60
・The sample size (N) of gdp is 264
・The number of variables is 60
・Using the DT::datatable() function, we can see what the entire data set looks like
DT::datatable(gdp)
・Convert gdp from wide format to long format
・Using the tidyr::pivot_longer() function, we convert gdp from wide to long format and name it gdp_long
gdp_long <- gdp %>%
  tidyr::pivot_longer("1960":"2018",    # Range of variables you want to convert
                      names_to = "year",    # Put the variable names of the wide format into year
                      values_to = "GDP") %>% # Put the values of the wide format into GDP
  drop_na()                             # Drop missing values
・Look at gdp_long
DT::datatable(gdp_long)
・Check the class of each variable in gdp_long
str(gdp_long)
tibble [11,824 × 3] (S3: tbl_df/tbl/data.frame)
$ country: chr [1:11824] "Aruba" "Aruba" "Aruba" "Aruba" ...
$ year : chr [1:11824] "1986" "1987" "1988" "1989" ...
$ GDP : num [1:11824] 6473 7886 9765 11392 12307 ...
・Change the class of year from character to numeric
gdp_long$year <- as.numeric(gdp_long$year)
str(gdp_long)
tibble [11,824 × 3] (S3: tbl_df/tbl/data.frame)
$ country: chr [1:11824] "Aruba" "Aruba" "Aruba" "Aruba" ...
$ year : num [1:11824] 1986 1987 1988 1989 1990 ...
$ GDP : num [1:11824] 6473 7886 9765 11392 12307 ...
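The whole wide-to-long step, including the conversion of year, can be tried on a toy data frame. This sketch assumes the tidyr package is installed; `gdp_toy` and `long_toy` are hypothetical names, and the two rows only imitate the shape of gdp.

```r
library(tidyr)

# Toy data in wide format: one row per country, one variable per year
gdp_toy <- data.frame(
  country = c("Aruba", "Afghanistan"),
  `1960`  = c(NA, 59.8),
  `1961`  = c(NA, 59.9),
  check.names = FALSE   # keep "1960" and "1961" as variable names
)

# Long format: one row per country and year
long_toy <- pivot_longer(gdp_toy,
                         cols = c("1960", "1961"),
                         names_to = "year",    # former variable names go here
                         values_to = "GDP")    # former cell values go here
long_toy <- drop_na(long_toy)                  # drop rows with missing values

# year comes from variable names, so it is character at first
year_class_before <- class(long_toy$year)
long_toy$year <- as.numeric(long_toy$year)

year_class_before
long_toy
```

Because Aruba has no observed value in either year, only the two rows for Afghanistan remain after drop_na().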
GDP
・Using the filter() function, extract the data needed and name it jpn.chi
jpn.chi <- gdp_long %>%
  filter(country == "Japan" | country == "China")
・You should add the following command to avoid garbled text when using Japanese and drawing figures with the ggplot() function
theme_set(theme_classic(base_size = 10,
                        base_family = "HiraginoSans-W3"))
jpn.chi %>%
  ggplot(aes(x = year, y = GDP,
             color = country,
             linetype = country,
             shape = country)) +
  geom_point() +
  geom_line() +
  ggtitle("Transition of GDP Per Capita (1980-2017) between Japan and China") +
  labs(x = "Year", y = "GDP per capita (US$)") +
  theme(legend.position = c(0.1, 0.8)) +
  xlim(1980, 2017) # Show only 1980-2017 (the data of 2018 are excluded)
Freedom House
Freedom House data (1972-2016)
PR: political rights
CL: civil liberties
Status
Variables | Variable Class | Details |
---|---|---|
PR | numeric | political rights (Best = 1, Worst = 7) |
CL | numeric | civil liberties (Best = 1, Worst = 7) |
status | categorical | F: free, PF: partly free, NF: not free |
year | categorical | 1972-2016 |
PR and CL are measured on a one-to-seven scale, with one representing the highest degree of Freedom and seven the lowest.
Load the readxl package to read the Excel file
library(readxl)
Download the Freedom House data and read it
Prior to reading the data, open the original data file (FH_Country.xls) in either LibreOffice or Excel
The data we need are in the second sheet (Country Ratings, Statuses) → sheet = 2
Specify the sheet number and the number of rows to skip → skip = 2
fh <- read_excel("data/FH_Country.xls",
                 sheet = 2,
                 skip = 2)
Check the class of each variable
str(fh)
tibble [205 × 133] (S3: tbl_df/tbl/data.frame)
$ ...1 : chr [1:205] "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ PR...2 : chr [1:205] "4" "7" "6" "4" ...
$ CL...3 : chr [1:205] "5" "7" "6" "3" ...
$ Status...4 : chr [1:205] "PF" "NF" "NF" "PF" ...
$ PR...5 : chr [1:205] "7" "7" "6" "4" ...
$ CL...6 : chr [1:205] "6" "7" "6" "4" ...
$ Status...7 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...8 : chr [1:205] "7" "7" "6" "4" ...
$ CL...9 : chr [1:205] "6" "7" "6" "4" ...
$ Status...10 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...11 : chr [1:205] "7" "7" "7" "4" ...
$ CL...12 : chr [1:205] "6" "7" "6" "4" ...
$ Status...13 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...14 : chr [1:205] "7" "7" "6" "4" ...
$ CL...15 : chr [1:205] "6" "7" "6" "4" ...
$ Status...16 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...17 : chr [1:205] "6" "7" "6" "-" ...
$ CL...18 : chr [1:205] "6" "7" "6" "-" ...
$ Status...19 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...20 : chr [1:205] "7" "7" "6" "-" ...
$ CL...21 : chr [1:205] "7" "7" "6" "-" ...
$ Status...22 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...23 : chr [1:205] "7" "7" "6" "-" ...
$ CL...24 : chr [1:205] "7" "7" "6" "-" ...
$ Status...25 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...26 : chr [1:205] "7" "7" "6" "-" ...
$ CL...27 : chr [1:205] "7" "7" "6" "-" ...
$ Status...28 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...29 : chr [1:205] "7" "7" "6" "-" ...
$ CL...30 : chr [1:205] "7" "7" "6" "-" ...
$ Status...31 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...32 : chr [1:205] "7" "7" "6" "-" ...
$ CL...33 : chr [1:205] "7" "7" "6" "-" ...
$ Status...34 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...35 : chr [1:205] "7" "7" "6" "-" ...
$ CL...36 : chr [1:205] "7" "7" "6" "-" ...
$ Status...37 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...38 : chr [1:205] "7" "7" "6" "-" ...
$ CL...39 : chr [1:205] "7" "7" "6" "-" ...
$ Status...40 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...41 : chr [1:205] "7" "7" "6" "-" ...
$ CL...42 : chr [1:205] "7" "7" "6" "-" ...
$ Status...43 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...44 : chr [1:205] "7" "7" "6" "-" ...
$ CL...45 : chr [1:205] "7" "7" "6" "-" ...
$ Status...46 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...47 : chr [1:205] "6" "7" "5" "-" ...
$ CL...48 : chr [1:205] "6" "7" "6" "-" ...
$ Status...49 : chr [1:205] "NF" "NF" "NF" "-" ...
$ PR...50 : chr [1:205] "7" "7" "6" "-" ...
$ CL...51 : chr [1:205] "7" "7" "4" "-" ...
$ Status...52 : chr [1:205] "NF" "NF" "PF" "-" ...
$ PR...53 : chr [1:205] "7" "7" "4" "-" ...
$ CL...54 : chr [1:205] "7" "6" "4" "-" ...
$ Status...55 : chr [1:205] "NF" "NF" "PF" "-" ...
$ PR...56 : chr [1:205] "7" "4" "4" "-" ...
$ CL...57 : chr [1:205] "7" "4" "4" "-" ...
$ Status...58 : chr [1:205] "NF" "PF" "PF" "-" ...
$ PR...59 : chr [1:205] "6" "4" "7" "-" ...
$ CL...60 : chr [1:205] "6" "3" "6" "-" ...
$ Status...61 : chr [1:205] "NF" "PF" "NF" "-" ...
$ PR...62 : chr [1:205] "7" "2" "7" "2" ...
$ CL...63 : chr [1:205] "7" "4" "6" "1" ...
$ Status...64 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...65 : chr [1:205] "7" "3" "7" "1" ...
$ CL...66 : chr [1:205] "7" "4" "7" "1" ...
$ Status...67 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...68 : chr [1:205] "7" "3" "6" "1" ...
$ CL...69 : chr [1:205] "7" "4" "6" "1" ...
$ Status...70 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...71 : chr [1:205] "7" "4" "6" "1" ...
$ CL...72 : chr [1:205] "7" "4" "6" "1" ...
$ Status...73 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...74 : chr [1:205] "7" "4" "6" "1" ...
$ CL...75 : chr [1:205] "7" "4" "6" "1" ...
$ Status...76 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...77 : chr [1:205] "7" "4" "6" "1" ...
$ CL...78 : chr [1:205] "7" "5" "5" "1" ...
$ Status...79 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...80 : chr [1:205] "7" "4" "6" "1" ...
$ CL...81 : chr [1:205] "7" "5" "5" "1" ...
$ Status...82 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...83 : chr [1:205] "7" "4" "6" "1" ...
$ CL...84 : chr [1:205] "7" "5" "5" "1" ...
$ Status...85 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...86 : chr [1:205] "7" "3" "6" "1" ...
$ CL...87 : chr [1:205] "7" "4" "5" "1" ...
$ Status...88 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...89 : chr [1:205] "6" "3" "6" "1" ...
$ CL...90 : chr [1:205] "6" "3" "5" "1" ...
$ Status...91 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...92 : chr [1:205] "6" "3" "6" "1" ...
$ CL...93 : chr [1:205] "6" "3" "5" "1" ...
$ Status...94 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...95 : chr [1:205] "5" "3" "6" "1" ...
$ CL...96 : chr [1:205] "6" "3" "5" "1" ...
$ Status...97 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...98 : chr [1:205] "5" "3" "6" "1" ...
$ CL...99 : chr [1:205] "5" "3" "5" "1" ...
[list output truncated]
All the variables are recognized as chr (= character)
It is not appropriate for PR (political rights) and CL (civil liberties) to be chr (= character); they should be numeric
→ This should be fixed
Solution:
- You can see - in the spreadsheet
- This means a missing value in the Freedom House data set
- RStudio recognizes a blank as a missing value, but it does not recognize - as one
→ We need to let RStudio recognize that "-" means a missing value
→ Add the following argument: na = "-"
fh <- read_excel("data/FH_Country.xls",
                 sheet = 2,
                 skip = 2,
                 na = "-") # treat "-" as a missing value
str(fh)
tibble [205 × 133] (S3: tbl_df/tbl/data.frame)
$ ...1 : chr [1:205] "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ PR...2 : chr [1:205] "4" "7" "6" "4" ...
$ CL...3 : chr [1:205] "5" "7" "6" "3" ...
$ Status...4 : chr [1:205] "PF" "NF" "NF" "PF" ...
$ PR...5 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...6 : num [1:205] 6 7 6 4 NA NA 2 NA 1 1 ...
$ Status...7 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...8 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...9 : num [1:205] 6 7 6 4 NA NA 4 NA 1 1 ...
$ Status...10 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...11 : num [1:205] 7 7 7 4 6 NA 2 NA 1 1 ...
$ CL...12 : num [1:205] 6 7 6 4 6 NA 4 NA 1 1 ...
$ Status...13 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...14 : num [1:205] 7 7 6 4 6 NA 6 NA 1 1 ...
$ CL...15 : num [1:205] 6 7 6 4 6 NA 5 NA 1 1 ...
$ Status...16 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...17 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...18 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ Status...19 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...20 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...21 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...22 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...23 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...24 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...25 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...26 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...27 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...28 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...29 : num [1:205] 7 7 6 NA 7 2 6 NA 1 1 ...
$ CL...30 : num [1:205] 7 7 6 NA 7 2 5 NA 1 1 ...
$ Status...31 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...32 : num [1:205] 7 7 6 NA 7 2 3 NA 1 1 ...
$ CL...33 : num [1:205] 7 7 6 NA 7 3 3 NA 1 1 ...
$ Status...34 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...35 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...36 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...37 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...38 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...39 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...40 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...41 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...42 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...43 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...44 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...45 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...46 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...47 : num [1:205] 6 7 5 NA 7 2 2 NA 1 1 ...
$ CL...48 : num [1:205] 6 7 6 NA 7 3 1 NA 1 1 ...
$ Status...49 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...50 : num [1:205] 7 7 6 NA 7 2 1 NA 1 1 ...
$ CL...51 : num [1:205] 7 7 4 NA 7 3 2 NA 1 1 ...
$ Status...52 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...53 : num [1:205] 7 7 4 NA 7 3 1 NA 1 1 ...
$ CL...54 : num [1:205] 7 6 4 NA 7 2 3 NA 1 1 ...
$ Status...55 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...56 : num [1:205] 7 4 4 NA 6 3 1 5 1 1 ...
$ CL...57 : num [1:205] 7 4 4 NA 4 3 3 5 1 1 ...
$ Status...58 : chr [1:205] "NF" "PF" "PF" NA ...
$ PR...59 : num [1:205] 6 4 7 NA 6 3 2 4 1 1 ...
$ CL...60 : num [1:205] 6 3 6 NA 6 3 3 3 1 1 ...
$ Status...61 : chr [1:205] "NF" "PF" "NF" NA ...
$ PR...62 : num [1:205] 7 2 7 2 7 4 2 3 1 1 ...
$ CL...63 : num [1:205] 7 4 6 1 7 3 3 4 1 1 ...
$ Status...64 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...65 : num [1:205] 7 3 7 1 7 4 2 3 1 1 ...
$ CL...66 : num [1:205] 7 4 7 1 7 3 3 4 1 1 ...
$ Status...67 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...68 : num [1:205] 7 3 6 1 6 4 2 4 1 1 ...
$ CL...69 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...70 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...71 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...72 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...73 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...74 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...75 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...76 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...77 : num [1:205] 7 4 6 1 6 4 3 4 1 1 ...
$ CL...78 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...79 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...80 : num [1:205] 7 4 6 1 6 4 2 4 1 1 ...
$ CL...81 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...82 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...83 : num [1:205] 7 4 6 1 6 4 1 4 1 1 ...
$ CL...84 : num [1:205] 7 5 5 1 6 2 2 4 1 1 ...
$ Status...85 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...86 : num [1:205] 7 3 6 1 6 4 3 4 1 1 ...
$ CL...87 : num [1:205] 7 4 5 1 6 2 3 4 1 1 ...
$ Status...88 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...89 : num [1:205] 6 3 6 1 6 4 3 4 1 1 ...
$ CL...90 : num [1:205] 6 3 5 1 5 2 3 4 1 1 ...
$ Status...91 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...92 : num [1:205] 6 3 6 1 6 4 2 4 1 1 ...
$ CL...93 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...94 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...95 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...96 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...97 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...98 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...99 : num [1:205] 5 3 5 1 5 2 2 4 1 1 ...
[list output truncated]
All the PR... and CL... variables except PR...2 and CL...3 are now recognized as numeric
PR...2 and CL...3 are still recognized as character
→ This should be fixed
We need to know why these two variables (PR...2 and CL...3) were not changed to numeric
→ We need to change the class of these two variables from character to numeric
Using the unique() function, check the values of PR...2
unique(fh$PR...2)
[1] "4" "7" "6" NA "1" "2" "5" "3" "2(5)"
2(5) is included!
2(5) is not a numeric but a character
The values of a character variable are shown with ""
Since 2(5) is not a numeric, the value is shown with ""
→ NA is an exception in RStudio
→ NA is not recognized as a character
Because PR...2 contains 2(5), PR...2 is recognized as a character variable
→ This is the reason!
Using the unique() function, check the values of CL...3
unique(fh$CL...3)
[1] "5" "7" "6" "3" NA "1" "4" "2" "3(6)"
3(6) is included!
3(6) is not a numeric but a character
The values of a character variable are shown with ""
Since 3(6) is not a numeric, the value is shown with ""
→ NA is an exception in RStudio
→ NA is not recognized as a character
Because CL...3 contains 3(6), CL...3 is recognized as a character variable
Solution:
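Using the if_else() function, replace 2(5) and 3(6) with NA and name the new data frame fh_na
fh_na <- fh %>%
  dplyr::mutate(
    PR...2 = if_else(PR...2 == "2(5)", NA_character_, PR...2), # NA_character_ = a missing value of class character
    CL...3 = if_else(CL...3 == "3(6)", NA_character_, CL...3)) %>%
  mutate(across(c(PR...2, CL...3), as.numeric)) # change both variables to numeric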
unique()
functio, check the value of PR...2
unique(fh_na$PR...2)
[1] 4 7 6 NA 1 2 5 3
NA is not shown with ""
→ NA is recognized as a missing value
Using the unique() function, check the values of CL...3
unique(fh_na$CL...3)
[1] 5 7 6 3 NA 1 4 2
NA is not shown with ""
→ NA is recognized as a missing value
Using the str() function, check the class of PR...2 and CL...3
str(fh_na$PR...2)
num [1:205] 4 7 6 4 NA NA 6 NA 1 1 ...
str(fh_na$CL...3)
num [1:205] 5 7 6 3 NA NA 3 NA 1 1 ...
PR...2 and CL...3 are recognized as numeric
Check the structure of fh_na
str(fh_na)
tibble [205 × 133] (S3: tbl_df/tbl/data.frame)
$ ...1 : chr [1:205] "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ PR...2 : num [1:205] 4 7 6 4 NA NA 6 NA 1 1 ...
$ CL...3 : num [1:205] 5 7 6 3 NA NA 3 NA 1 1 ...
$ Status...4 : chr [1:205] "PF" "NF" "NF" "PF" ...
$ PR...5 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...6 : num [1:205] 6 7 6 4 NA NA 2 NA 1 1 ...
$ Status...7 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...8 : num [1:205] 7 7 6 4 NA NA 2 NA 1 1 ...
$ CL...9 : num [1:205] 6 7 6 4 NA NA 4 NA 1 1 ...
$ Status...10 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...11 : num [1:205] 7 7 7 4 6 NA 2 NA 1 1 ...
$ CL...12 : num [1:205] 6 7 6 4 6 NA 4 NA 1 1 ...
$ Status...13 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...14 : num [1:205] 7 7 6 4 6 NA 6 NA 1 1 ...
$ CL...15 : num [1:205] 6 7 6 4 6 NA 5 NA 1 1 ...
$ Status...16 : chr [1:205] "NF" "NF" "NF" "PF" ...
$ PR...17 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...18 : num [1:205] 6 7 6 NA 7 NA 6 NA 1 1 ...
$ Status...19 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...20 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...21 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...22 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...23 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...24 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...25 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...26 : num [1:205] 7 7 6 NA 7 NA 6 NA 1 1 ...
$ CL...27 : num [1:205] 7 7 6 NA 7 NA 5 NA 1 1 ...
$ Status...28 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...29 : num [1:205] 7 7 6 NA 7 2 6 NA 1 1 ...
$ CL...30 : num [1:205] 7 7 6 NA 7 2 5 NA 1 1 ...
$ Status...31 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...32 : num [1:205] 7 7 6 NA 7 2 3 NA 1 1 ...
$ CL...33 : num [1:205] 7 7 6 NA 7 3 3 NA 1 1 ...
$ Status...34 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...35 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...36 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...37 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...38 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...39 : num [1:205] 7 7 6 NA 7 3 2 NA 1 1 ...
$ Status...40 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...41 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...42 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...43 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...44 : num [1:205] 7 7 6 NA 7 2 2 NA 1 1 ...
$ CL...45 : num [1:205] 7 7 6 NA 7 3 1 NA 1 1 ...
$ Status...46 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...47 : num [1:205] 6 7 5 NA 7 2 2 NA 1 1 ...
$ CL...48 : num [1:205] 6 7 6 NA 7 3 1 NA 1 1 ...
$ Status...49 : chr [1:205] "NF" "NF" "NF" NA ...
$ PR...50 : num [1:205] 7 7 6 NA 7 2 1 NA 1 1 ...
$ CL...51 : num [1:205] 7 7 4 NA 7 3 2 NA 1 1 ...
$ Status...52 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...53 : num [1:205] 7 7 4 NA 7 3 1 NA 1 1 ...
$ CL...54 : num [1:205] 7 6 4 NA 7 2 3 NA 1 1 ...
$ Status...55 : chr [1:205] "NF" "NF" "PF" NA ...
$ PR...56 : num [1:205] 7 4 4 NA 6 3 1 5 1 1 ...
$ CL...57 : num [1:205] 7 4 4 NA 4 3 3 5 1 1 ...
$ Status...58 : chr [1:205] "NF" "PF" "PF" NA ...
$ PR...59 : num [1:205] 6 4 7 NA 6 3 2 4 1 1 ...
$ CL...60 : num [1:205] 6 3 6 NA 6 3 3 3 1 1 ...
$ Status...61 : chr [1:205] "NF" "PF" "NF" NA ...
$ PR...62 : num [1:205] 7 2 7 2 7 4 2 3 1 1 ...
$ CL...63 : num [1:205] 7 4 6 1 7 3 3 4 1 1 ...
$ Status...64 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...65 : num [1:205] 7 3 7 1 7 4 2 3 1 1 ...
$ CL...66 : num [1:205] 7 4 7 1 7 3 3 4 1 1 ...
$ Status...67 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...68 : num [1:205] 7 3 6 1 6 4 2 4 1 1 ...
$ CL...69 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...70 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...71 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...72 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...73 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...74 : num [1:205] 7 4 6 1 6 4 2 5 1 1 ...
$ CL...75 : num [1:205] 7 4 6 1 6 3 3 4 1 1 ...
$ Status...76 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...77 : num [1:205] 7 4 6 1 6 4 3 4 1 1 ...
$ CL...78 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...79 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...80 : num [1:205] 7 4 6 1 6 4 2 4 1 1 ...
$ CL...81 : num [1:205] 7 5 5 1 6 3 3 4 1 1 ...
$ Status...82 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...83 : num [1:205] 7 4 6 1 6 4 1 4 1 1 ...
$ CL...84 : num [1:205] 7 5 5 1 6 2 2 4 1 1 ...
$ Status...85 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...86 : num [1:205] 7 3 6 1 6 4 3 4 1 1 ...
$ CL...87 : num [1:205] 7 4 5 1 6 2 3 4 1 1 ...
$ Status...88 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...89 : num [1:205] 6 3 6 1 6 4 3 4 1 1 ...
$ CL...90 : num [1:205] 6 3 5 1 5 2 3 4 1 1 ...
$ Status...91 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...92 : num [1:205] 6 3 6 1 6 4 2 4 1 1 ...
$ CL...93 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...94 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...95 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...96 : num [1:205] 6 3 5 1 5 2 2 4 1 1 ...
$ Status...97 : chr [1:205] "NF" "PF" "NF" "F" ...
$ PR...98 : num [1:205] 5 3 6 1 6 2 2 5 1 1 ...
$ CL...99 : num [1:205] 5 3 5 1 5 2 2 4 1 1 ...
[list output truncated]
dim(fh_na)
[1] 205 133
fh_na
A bit more work to do
We have three variables per country and per year (PR, CL, Status)
■ PR...2, CL...3, Status...4 are the data for 1972
■ PR...5, CL...6, Status...7 are the data for 1973
・・・・・・・・・・・・・・・・・・・・・・・・
■ PR...128, CL...129, Status...130 are the data for 2015
■ PR...131, CL...132, Status...133 are the data for 2016
Two different classes of variables: numeric and categorical
Solution: Make two variables (value, status)
→ The values of PR and CL are put into value
→ The values of Status are put into status
...1 shows the country name
Rename ...1 as country
fh_na <- fh_na %>%
  rename(country = 1)
Split the data into fh_country and fh_na
fh_country <- fh_na %>%
  select(country)
fh_na <- fh_na %>%
  select(-country)
names(fh_na)
[1] "PR...2" "CL...3" "Status...4" "PR...5" "CL...6"
[6] "Status...7" "PR...8" "CL...9" "Status...10" "PR...11"
[11] "CL...12" "Status...13" "PR...14" "CL...15" "Status...16"
[16] "PR...17" "CL...18" "Status...19" "PR...20" "CL...21"
[21] "Status...22" "PR...23" "CL...24" "Status...25" "PR...26"
[26] "CL...27" "Status...28" "PR...29" "CL...30" "Status...31"
[31] "PR...32" "CL...33" "Status...34" "PR...35" "CL...36"
[36] "Status...37" "PR...38" "CL...39" "Status...40" "PR...41"
[41] "CL...42" "Status...43" "PR...44" "CL...45" "Status...46"
[46] "PR...47" "CL...48" "Status...49" "PR...50" "CL...51"
[51] "Status...52" "PR...53" "CL...54" "Status...55" "PR...56"
[56] "CL...57" "Status...58" "PR...59" "CL...60" "Status...61"
[61] "PR...62" "CL...63" "Status...64" "PR...65" "CL...66"
[66] "Status...67" "PR...68" "CL...69" "Status...70" "PR...71"
[71] "CL...72" "Status...73" "PR...74" "CL...75" "Status...76"
[76] "PR...77" "CL...78" "Status...79" "PR...80" "CL...81"
[81] "Status...82" "PR...83" "CL...84" "Status...85" "PR...86"
[86] "CL...87" "Status...88" "PR...89" "CL...90" "Status...91"
[91] "PR...92" "CL...93" "Status...94" "PR...95" "CL...96"
[96] "Status...97" "PR...98" "CL...99" "Status...100" "PR...101"
[101] "CL...102" "Status...103" "PR...104" "CL...105" "Status...106"
[106] "PR...107" "CL...108" "Status...109" "PR...110" "CL...111"
[111] "Status...112" "PR...113" "CL...114" "Status...115" "PR...116"
[116] "CL...117" "Status...118" "PR...119" "CL...120" "Status...121"
[121] "PR...122" "CL...123" "Status...124" "PR...125" "CL...126"
[126] "Status...127" "PR...128" "CL...129" "Status...130" "PR...131"
[131] "CL...132" "Status...133"
names(fh_country)
[1] "country"
Change the variable names of fh_na to names like pr_1972
colnames(fh_na) <-
  str_replace_all(colnames(fh_na),
                  c("\\.\\.\\." = "-")) %>%    # replace "..." with "-"
  str_subset("PR|CL|Status") %>%               # keep only the names containing PR, CL, or Status
  str_c(., "_") %>%                            # add "_" to the end of each name
  str_replace_all(c("-" = "",                  # remove "-"
                    "[0-9]" = "",              # remove the column numbers
                    "PR" = "pr",               # PR => pr
                    "CL" = "cl",               # CL => cl
                    "Status" = "st")) %>%      # Status => st
  str_c(., rep(setdiff(1972:2016, 1981),       # add the year, excluding 1981
               each = 3))                      # 3 variables per year
Check the variable names of fh_na
names(fh_na)
[1] "pr_1972" "cl_1972" "st_1972" "pr_1973" "cl_1973" "st_1973" "pr_1974"
[8] "cl_1974" "st_1974" "pr_1975" "cl_1975" "st_1975" "pr_1976" "cl_1976"
[15] "st_1976" "pr_1977" "cl_1977" "st_1977" "pr_1978" "cl_1978" "st_1978"
[22] "pr_1979" "cl_1979" "st_1979" "pr_1980" "cl_1980" "st_1980" "pr_1982"
[29] "cl_1982" "st_1982" "pr_1983" "cl_1983" "st_1983" "pr_1984" "cl_1984"
[36] "st_1984" "pr_1985" "cl_1985" "st_1985" "pr_1986" "cl_1986" "st_1986"
[43] "pr_1987" "cl_1987" "st_1987" "pr_1988" "cl_1988" "st_1988" "pr_1989"
[50] "cl_1989" "st_1989" "pr_1990" "cl_1990" "st_1990" "pr_1991" "cl_1991"
[57] "st_1991" "pr_1992" "cl_1992" "st_1992" "pr_1993" "cl_1993" "st_1993"
[64] "pr_1994" "cl_1994" "st_1994" "pr_1995" "cl_1995" "st_1995" "pr_1996"
[71] "cl_1996" "st_1996" "pr_1997" "cl_1997" "st_1997" "pr_1998" "cl_1998"
[78] "st_1998" "pr_1999" "cl_1999" "st_1999" "pr_2000" "cl_2000" "st_2000"
[85] "pr_2001" "cl_2001" "st_2001" "pr_2002" "cl_2002" "st_2002" "pr_2003"
[92] "cl_2003" "st_2003" "pr_2004" "cl_2004" "st_2004" "pr_2005" "cl_2005"
[99] "st_2005" "pr_2006" "cl_2006" "st_2006" "pr_2007" "cl_2007" "st_2007"
[106] "pr_2008" "cl_2008" "st_2008" "pr_2009" "cl_2009" "st_2009" "pr_2010"
[113] "cl_2010" "st_2010" "pr_2011" "cl_2011" "st_2011" "pr_2012" "cl_2012"
[120] "st_2012" "pr_2013" "cl_2013" "st_2013" "pr_2014" "cl_2014" "st_2014"
[127] "pr_2015" "cl_2015" "st_2015" "pr_2016" "cl_2016" "st_2016"
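The renaming logic can be followed more easily on a handful of names. This is a simplified sketch, not the code used above: old_names, years, prefix, and new_names are hypothetical, and the sketch assumes that every year occupies exactly three consecutive columns in the order PR, CL, Status.

```r
library(stringr)

# Hypothetical column names in the style shown above (two years only)
old_names <- c("PR...2", "CL...3", "Status...4",
               "PR...5", "CL...6", "Status...7")
years <- c(1972, 1973)  # assumed: one year per block of three columns

# Drop the "...<number>" suffix, then switch to the short lower-case prefixes
prefix <- str_remove(old_names, "\\.\\.\\.[0-9]+$")
prefix <- str_replace_all(prefix, c("PR" = "pr", "CL" = "cl", "Status" = "st"))

# Attach the year: each year is repeated three times, once per variable
new_names <- str_c(prefix, "_", rep(years, each = 3))
new_names
```

Because the year is assigned by position, a missing or extra column in the source file would shift every later year, so it is worth comparing the number of columns with 3 times the number of years before renaming.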
Using the bind_cols() function, merge fh_na and fh_country
fh_na <- fh_country %>%
  bind_cols(fh_na)
Check the merged fh_na
rmarkdown::paged_table(fh_na)
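bind_cols() matches rows by position, not by a key variable, so it is safe here only because fh_country and fh_na were split from the same data frame and were not reordered. A minimal sketch with hypothetical toy data (ids, vals, and combined are made-up names):

```r
library(dplyr)

# Hypothetical toy data: the rows of both tibbles are in the same order
ids  <- tibble(country = c("A", "B"))
vals <- tibble(pr_1972 = c(4, 7), cl_1972 = c(5, 6))

# Row 1 of ids is placed next to row 1 of vals, row 2 next to row 2
combined <- bind_cols(ids, vals)
combined
```

If the rows could have been sorted or filtered in between, a join on a shared key variable would be the safer choice.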
Make two new variables: value and type
PR_CL_long <- fh_na %>%
  select(country,
         starts_with(c("pr", "cl"))) %>%   # select the variables starting with `pr` and `cl`
  pivot_longer(pr_1972:cl_2016,            # assign the range of variables
               names_to = "type",          # put variable names, such as "pr_1972", into `type`
               values_to = "value") %>%    # put the values of those variables into `value`
  separate(type,
           into = c("type", "year"),       # divide the values of `type` into 2: `type` and `year`
           sep = "_") %>%                  # split the two parts at "_"
  drop_na()                                # drop rows with missing values
ST_long <- fh_na %>%
  select(country,                          # select country
         starts_with("st")) %>%            # select the variables starting with `st`
  pivot_longer(st_1972:st_2016,            # assign the range of variables
               names_to = "name",          # put variable names, such as "st_1972", into `name`
               values_to = "status") %>%   # put the values of those variables into `status`
  separate(name,
           into = c("name", "year"),       # divide the values of `name` into 2: `name` and `year`
           sep = "_") %>%                  # split the two parts at "_"
  select(-name) %>%                        # drop `name` because it is no longer needed
  drop_na()                                # drop rows with missing values
Check the variable names of PR_CL_long
names(PR_CL_long)
[1] "country" "type" "year" "value"
Check the variable names of ST_long
names(ST_long)
[1] "country" "year" "status"
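The conversion from wide to long format is easier to follow on a small table. This is a sketch with hypothetical toy data (wide and long are made-up names, and the values are invented), using the same three steps as above: pivot_longer(), separate(), and drop_na().

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per country, one column per variable and year
wide <- tibble(
  country = c("A", "B"),
  pr_1972 = c(4, 7),
  cl_1972 = c(5, NA),
  pr_1973 = c(3, 7),
  cl_1973 = c(5, 6)
)

long <- wide %>%
  pivot_longer(-country,                                 # reshape every column except country
               names_to = "type",                        # "pr_1972" etc. go into `type`
               values_to = "value") %>%                  # the cell values go into `value`
  separate(type, into = c("type", "year"), sep = "_") %>%  # "pr_1972" => "pr" and "1972"
  drop_na()                                              # drop the row whose value is NA

long
```

Note that separate() leaves year as a character variable; as.numeric() can be applied afterwards if a numeric year is needed.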
Using the left_join() function, merge PR_CL_long and ST_long by the two shared variables: country and year
fh_all_long <- PR_CL_long %>%
  left_join(ST_long,
            by = c("country", "year"))
DT::datatable(fh_all_long)
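A minimal sketch of a join by two shared variables, using hypothetical toy data (scores, status_tbl, and joined are made-up names). Each row of the left table keeps its place and receives the status of the matching country and year.

```r
library(dplyr)

# Hypothetical long data: two rows (pr and cl) per country and year
scores <- tibble(
  country = c("A", "A", "B"),
  year    = c("1972", "1972", "1972"),
  type    = c("pr", "cl", "pr"),
  value   = c(4, 5, 7)
)

# Hypothetical status data: one row per country and year
status_tbl <- tibble(
  country = c("A", "B"),
  year    = c("1972", "1972"),
  status  = c("PF", "NF")
)

# Match on both country and year; the status is repeated for pr and cl
joined <- left_join(scores, status_tbl, by = c("country", "year"))
joined
```

The key variables must have the same class in both tables, which is why year is character in both toy tibbles here.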
Freedom House
Transition of Political Rights in North Korea and South Korea (1972-2016)
korea_PR <- fh_all_long %>%
  filter(country == "North Korea" | country == "South Korea") %>%
  filter(type == "pr")
korea_PR %>%
  ggplot(aes(x = value, y = year,
             color = country,
             shape = country)) +
  geom_point() +
  ggtitle("Political Rights between N.Korea and S.Korea: 1972-2016") +
  labs(x = "Political Rights", y = "Year") +
  theme(legend.position = c(0.5, 0.8))
Transition of Political Rights in Japan and China (1972-2016)
jpn.chi_PR <- fh_all_long %>%
  filter(country == "Japan" | country == "China") %>%
  filter(type == "pr")
jpn.chi_PR %>%
  ggplot(aes(x = value, y = year,
             color = country,
             shape = country)) +
  geom_point() +
  ggtitle("Political Rights between Japan and China: 1972-2016") +
  labs(x = "Political Rights", y = "Year") +
  theme(legend.position = c(0.5, 0.8))