income.csv
Check the working directory you are currently working in:
getwd()
[1] "/Users/asanomasahiko/Dropbox/statistics/class_materials"
R Project, which enables you to conduct your research efficiently in RStudio
Knit button
Name as:
data
data
Load the tidyverse package to read the csv file:
library("tidyverse")
df1 <- read_csv("data/income.csv")
df1
DT::datatable(df1)
Using the str() function, check the structure of df1:
str(df1)
spec_tbl_df [100 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ id : chr [1:100] "AU" "AY" "AB" "AM" ...
$ sex : chr [1:100] "male" "female" "male" "male" ...
$ age : num [1:100] 70 70 69 67 66 66 65 65 65 64 ...
$ height : num [1:100] 160 156 173 166 171 ...
$ weight : num [1:100] 58.3 44 75.7 69.3 76.5 67.3 41.5 53.5 46.8 52.7 ...
$ income : num [1:100] 201 487 424 1735 929 ...
$ generation: chr [1:100] "elder" "elder" "elder" "elder" ...
- attr(*, "spec")=
.. cols(
.. id = col_character(),
.. sex = col_character(),
.. age = col_double(),
.. height = col_double(),
.. weight = col_double(),
.. income = col_number(),
.. generation = col_character()
.. )
numeric
df1
summary(df1)
id sex age height
Length:100 Length:100 Min. :20.00 Min. :148.0
Class :character Class :character 1st Qu.:36.00 1st Qu.:158.1
Mode :character Mode :character Median :45.00 Median :162.9
Mean :45.96 Mean :163.7
3rd Qu.:57.25 3rd Qu.:170.2
Max. :70.00 Max. :180.5
weight income generation
Min. :28.30 Min. : 24.0 Length:100
1st Qu.:48.95 1st Qu.: 134.8 Class :character
Median :59.95 Median : 298.5 Mode :character
Mean :59.18 Mean : 434.4
3rd Qu.:67.33 3rd Qu.: 607.2
Max. :85.60 Max. :2351.0
With stargazer() and type = "text", you can have a nicer table:
library(stargazer)
stargazer(as.data.frame(df1),
type ="text",
digits = 2)
=============================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
-------------------------------------------------------------
age 100 45.96 13.33 20 36 57.2 70
height 100 163.75 7.69 148.00 158.10 170.17 180.50
weight 100 59.18 12.65 28.30 48.95 67.32 85.60
income 100 434.40 445.78 24 134.8 607.2 2,351
-------------------------------------------------------------
With stargazer() and type = "html", you can have a fancier table. Set {r, results = "asis"} in the chunk options:
stargazer(as.data.frame(df1),
type ="html",
digits = 2)
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
---|---|---|---|---|---|---|---|
age | 100 | 45.96 | 13.33 | 20 | 36 | 57.2 | 70 |
height | 100 | 163.75 | 7.69 | 148.00 | 158.10 | 170.17 | 180.50 |
weight | 100 | 59.18 | 12.65 | 28.30 | 48.95 | 67.32 | 85.60 |
income | 100 | 434.40 | 445.78 | 24 | 134.8 | 607.2 | 2,351 |

Descriptive Statistics | Details |
---|---|
N | Number of observations |
Mean | Average value |
St. Dev. | Standard deviation |
Min | Minimum value |
Pctl(25) | 1st quartile (25th percentile) |
Pctl(75) | 3rd quartile (75th percentile) |
Max | Maximum value |
\[\bar{x} = \frac{\sum_{i=1}^n x_i}{n}\]
toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
toefl
[1] 60 80 90 80 85 60 80 90 85 100
(60+80+90+80+85+60+80+90+85+100)/10
[1] 81
Calculate the mean of toefl with R (2)
sum(toefl)
[1] 810
sum(toefl)/10
[1] 81
Calculate the mean of toefl with R (3)
mean(toefl)
[1] 81
Calculate the mean of toefl with R (4)
summary(toefl)
Min. 1st Qu. Median Mean 3rd Qu. Max.
60.00 80.00 82.50 81.00 88.75 100.00
Draw a histogram of toefl using hist()
hist(toefl)
The median is the value separating the higher half from the lower half of a data sample.
For a data set, it may be thought of as “the middle” value.
For example, in the data set 1, 2, 3, the median is 2.
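The data set 1, 2, 3 has an odd number of values, so there is a single middle value. With an even number of values there is no single middle value, and R's `median()` returns the mean of the two middle values. A minimal sketch, where `odd_sample` and `even_sample` are hypothetical vectors used only for illustration:

```r
# Odd number of observations: the middle value after sorting
odd_sample <- c(1, 2, 3)
median(odd_sample)

# Even number of observations: the mean of the two middle values (2 and 3)
even_sample <- c(1, 2, 3, 4)
median(even_sample)
```

The first call returns 2 and the second returns 2.5. The same rule explains a median that is not itself an observed value, as can happen with the ten scores in toefl.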
Using table(), we can make a frequency table of toefl:
table(toefl)
toefl
60 80 85 90 100
2 3 2 2 1
median( )
median(toefl)
[1] 82.5
table( )
table(toefl)
toefl
60 80 85 90 100
2 3 2 2 1
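Base R has no built-in function for the most frequent value, so it is usually read off the frequency table. As a rough sketch, the lookup can also be done in code. The names `freq` and `mode_toefl` are illustrative, and this approach reports only the first value when several values tie for the highest count:

```r
toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)

# Frequency table: one count per distinct score
freq <- table(toefl)

# which.max() gives the position of the largest count (the first one if tied);
# names() recovers the score, which table() stores as a character label
mode_toefl <- as.numeric(names(freq)[which.max(freq)])
mode_toefl
```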
The mode (the most frequent value) of toefl is 80.

In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean.
Variance is a measure of dispersion: it measures how far a set of numbers is spread out from their average value.
Variance is calculated with the following equation:
\[\text{Variance} = \frac{\sum_{i=1}^N (\text{individual.value} - \text{Average})^2}{N}\]
toefl
toefl
[1] 60 80 90 80 85 60 80 90 85 100
Calculate the mean of toefl and name it toefl_mean
toefl_mean <- mean(toefl)
toefl_mean
[1] 81
Calculate (individual.value - Average) and name it x
x <- toefl - toefl_mean
x
[1] -21 -1 9 -1 4 -21 -1 9 4 19
Square x and name it x2
x2 <- x^2
x2
[1] 441 1 81 1 16 441 1 81 16 361
Sum x2 and name it sum_x2
sum_x2 <- sum(x2)
sum_x2
[1] 1440
N <- length(toefl) # Number of observations
N
[1] 10
\[\text{Variance} = \frac{1440}{10} = 144\]
- This is the variance of toefl.
- We can also calculate the variance of toefl with R as follows:
variance_toefl <- var(toefl) * (length(toefl) - 1) / length(toefl)
variance_toefl
[1] 144
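The factor `(length(toefl) - 1) / length(toefl)` is needed because R's `var()` divides the sum of squared deviations by N - 1 (the sample variance), whereas the formula used here divides by N. A small sketch of the difference, with `sample_var` and `pop_var` as illustrative names:

```r
toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
n <- length(toefl)

# var() uses the N - 1 denominator: 1440 / 9
sample_var <- var(toefl)

# Rescaling by (N - 1) / N switches to the N denominator: 1440 / 10
pop_var <- sample_var * (n - 1) / n

sample_var
pop_var
```

Here `sample_var` is 160 and `pop_var` is 144, which matches the step-by-step calculation.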
\[\text{Standard Deviation} = \sqrt{\text{Variance}}\]
- Thus, the standard deviation of toefl is calculated from variance_toefl:
sqrt(variance_toefl)
[1] 12
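Note that R's built-in `sd()` is based on the N - 1 denominator, so calling it directly on toefl does not return 12. The sketch below shows one way to reconcile the two, using the illustrative names `sample_sd` and `pop_sd`:

```r
toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
n <- length(toefl)

# sd() is the square root of var(), which divides by N - 1
sample_sd <- sd(toefl)

# Rescale to the N denominator used in the formula above
pop_sd <- sample_sd * sqrt((n - 1) / n)

sample_sd
pop_sd
```

`sample_sd` is about 12.65 (the square root of 160), while `pop_sd` is 12 up to floating-point rounding.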