income.csv) in the folder (working directory) you are currently working in.
Check the working directory with getwd():
getwd()
[1] "/Users/asanomasahiko/Dropbox/statistics/class_materials"
R Project: a feature that enables you to conduct your research efficiently in RStudio.
[Figures: R Project; Knit button]
Create a folder named data and put income.csv in it.
Use the tidyverse package to read the csv file:
library("tidyverse")
df1 <- read_csv("data/income.csv")
df1
DT::datatable(df1) # interactive table (requires the DT package)
Using the str() function, check the structure of df1:
str(df1)
spec_tbl_df [100 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ id : chr [1:100] "AU" "AY" "AB" "AM" ...
$ sex : chr [1:100] "male" "female" "male" "male" ...
$ age : num [1:100] 70 70 69 67 66 66 65 65 65 64 ...
$ height : num [1:100] 160 156 173 166 171 ...
$ weight : num [1:100] 58.3 44 75.7 69.3 76.5 67.3 41.5 53.5 46.8 52.7 ...
$ income : num [1:100] 201 487 424 1735 929 ...
$ generation: chr [1:100] "elder" "elder" "elder" "elder" ...
- attr(*, "spec")=
.. cols(
.. id = col_character(),
.. sex = col_character(),
.. age = col_double(),
.. height = col_double(),
.. weight = col_double(),
.. income = col_number(),
.. generation = col_character()
.. )
Check the summary statistics of df1 with summary(). For the numeric variables (age, height, weight, income) it reports the minimum, quartiles, mean and maximum:
summary(df1)
      id                sex                 age            height
 Length:100         Length:100         Min.   :20.00   Min.   :148.0
 Class :character   Class :character   1st Qu.:36.00   1st Qu.:158.1
 Mode  :character   Mode  :character   Median :45.00   Median :162.9
                                       Mean   :45.96   Mean   :163.7
                                       3rd Qu.:57.25   3rd Qu.:170.2
                                       Max.   :70.00   Max.   :180.5
     weight          income        generation
 Min.   :28.30   Min.   :  24.0   Length:100
 1st Qu.:48.95   1st Qu.: 134.8   Class :character
 Median :59.95   Median : 298.5   Mode  :character
 Mean   :59.18   Mean   : 434.4
 3rd Qu.:67.33   3rd Qu.: 607.2
 Max.   :85.60   Max.   :2351.0
If you use stargazer() with type = "text", you get a nicer table. as.data.frame() converts the tibble df1 into a plain data frame, which stargazer() expects:
library(stargazer)
stargazer(as.data.frame(df1),
          type = "text",
          digits = 2)
=============================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
-------------------------------------------------------------
age 100 45.96 13.33 20 36 57.2 70
height 100 163.75 7.69 148.00 158.10 170.17 180.50
weight 100 59.18 12.65 28.30 48.95 67.32 85.60
income 100 434.40 445.78 24 134.8 607.2 2,351
-------------------------------------------------------------
If you use stargazer() with type = "html", you get a fancier table. Set {r, results = "asis"} in the chunk options so that the HTML output is rendered as a table instead of being printed as raw text:
stargazer(as.data.frame(df1),
          type = "html",
          digits = 2)

| Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
|---|---|---|---|---|---|---|---|
| age | 100 | 45.96 | 13.33 | 20 | 36 | 57.2 | 70 |
| height | 100 | 163.75 | 7.69 | 148.00 | 158.10 | 170.17 | 180.50 |
| weight | 100 | 59.18 | 12.65 | 28.30 | 48.95 | 67.32 | 85.60 |
| income | 100 | 434.40 | 445.78 | 24 | 134.8 | 607.2 | 2,351 |
| Descriptive Statistics | Details |
|---|---|
| N | Number of observations |
| Mean | Average value |
| St. Dev. | Standard deviation |
| Min | Minimum value |
| Pctl(25) | 1st quartile (25th percentile) |
| Pctl(75) | 3rd quartile (75th percentile) |
| Max | Maximum value |
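The statistics in the table above can also be computed one at a time with base R functions. The following is a minimal sketch, not how stargazer() computes them internally. describe_numeric is a hypothetical helper name. The percentiles use quantile()'s default method, which has not been checked against stargazer's. The function is applied here to the TOEFL scores used later in this section.

```r
# Hypothetical helper: the descriptive statistics listed in the table above
describe_numeric <- function(x) {
  c(
    N       = length(x),                  # number of observations
    Mean    = mean(x),                    # average value
    St.Dev. = sd(x),                      # standard deviation (divides by N - 1)
    Min     = min(x),                     # minimum value
    Pctl25  = unname(quantile(x, 0.25)),  # 1st quartile
    Pctl75  = unname(quantile(x, 0.75)),  # 3rd quartile
    Max     = max(x)                      # maximum value
  )
}

toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
round(describe_numeric(toefl), 2)
```

The same helper could be applied to one column of df1, for example describe_numeric(df1$income).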
The mean (average) of n values x_1, x_2, ..., x_n is their sum divided by n:
\[\bar{x} = \frac{\sum_{i=1}^n x_i}{n}\]
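toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
toefl
 [1] 60 80 90 80 85 60 80 90 85 100
Mean of toefl with R (1): add the values by hand
(60+80+90+80+85+60+80+90+85+100)/10
[1] 81
Mean of toefl with R (2): sum() and divide by the number of observations
sum(toefl)
[1] 810
sum(toefl)/10
[1] 81
Mean of toefl with R (3): mean()
mean(toefl)
[1] 81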
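The formula for the mean can also be written out as an explicit loop: add up every x_i, then divide by n. This is only a teaching sketch, since mean() is the idiomatic choice in practice. manual_mean is a hypothetical name.

```r
# Hypothetical helper that follows x-bar = (sum of x_i) / n step by step
manual_mean <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value   # running sum of x_1, ..., x_n
  }
  total / length(x)          # divide the sum by n
}

toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
manual_mean(toefl)   # [1] 81
mean(toefl)          # built-in function, for comparison
```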
Mean of toefl with R (4): summary() also reports the mean
summary(toefl)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
60.00 80.00 82.50 81.00 88.75 100.00
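Where do the quartiles 80.00 and 88.75 come from? summary() computes them with quantile(). Its default method (type = 7) interpolates linearly between sorted values: for probability p, it looks at position 1 + (n - 1)p in the sorted data. The sketch below reproduces this by hand. q_type7 is a hypothetical helper name, and the sketch assumes a non-empty numeric vector without NA.

```r
# Hypothetical helper: the default (type 7) sample quantile, computed by hand
q_type7 <- function(x, p) {
  s  <- sort(x)                      # order the observations
  h  <- 1 + (length(s) - 1) * p      # (possibly fractional) position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  s[lo] + (h - lo) * (s[hi] - s[lo]) # interpolate between the two neighbouring values
}

toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
q_type7(toefl, 0.75)   # position 7.75: 85 + 0.75 * (90 - 85) = 88.75
quantile(toefl, 0.75)  # built-in, for comparison
```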
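Histogram of toefl using hist():
hist(toefl)
The median is the value separating the higher half from the lower half of a data sample.
For a data set, it may be thought of as “the middle” value.
For example, for the data set 1, 2, 3, the median is 2.
If the number of observations is even, there is no single middle value, and the median is the average of the two middle values.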
Using table(), we can make a frequency table of toefl:
table(toefl)
toefl
60 80 85 90 100
2 3 2 2 1
Calculate the median with median():
median(toefl)
[1] 82.5
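toefl has an even number of observations (10), so median() averages the 5th and 6th sorted values, 80 and 85, giving 82.5. A hand-written version makes the odd and even cases explicit. manual_median is a hypothetical name, and the sketch assumes a non-empty numeric vector without NA.

```r
# Hypothetical helper: median by sorting and picking the middle value(s)
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                 # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2  # even n: average of the two middle values
  }
}

toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
sort(toefl)            # 60 60 80 80 80 85 85 90 90 100
manual_median(toefl)   # (80 + 85) / 2 = 82.5
manual_median(c(1, 2, 3))
```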
The mode is the most frequent value. Find it with table():
table(toefl)
toefl
60 80 85 90 100
2 3 2 2 1
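Base R has no built-in function for the statistical mode. The base function mode() returns the storage mode of an object instead. A common approach is to read the mode off the frequency table. The following is a minimal sketch. Note that which.max() returns only the first maximum, so ties are checked separately.

```r
toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
freq <- table(toefl)                       # frequency of each distinct score
freq[freq == max(freq)]                    # all most-frequent values (there may be ties)
as.numeric(names(freq)[which.max(freq)])   # the first mode, as a number
```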
The mode of toefl is 80 (it appears three times).

In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean.
Variance is a measure of dispersion: it measures how far a set of numbers is spread out from their average value.
Variance is calculated with the following equation:
\[\text{Variance} = \frac{\sum_{i=1}^{N} (\text{individual.value}_i - \text{Average})^2}{N}\]
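The formula can be packed into a small function. This is a minimal sketch: population_variance is a hypothetical name, and the function assumes a numeric vector with no missing values. The step-by-step calculation that follows does the same thing one line at a time.

```r
# Hypothetical helper following the variance formula above (divides by N)
population_variance <- function(x) {
  deviations <- x - mean(x)         # individual.value - Average
  sum(deviations^2) / length(x)     # sum of squared deviations / N
}

toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
population_variance(toefl)   # [1] 144
```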
The toefl data:
toefl
 [1] 60 80 90 80 85 60 80 90 85 100
Calculate the mean of toefl and name it toefl_mean:
toefl_mean <- mean(toefl)
toefl_mean
[1] 81
Calculate (individual.value - Average) and name it x:
x <- toefl - toefl_mean
x
 [1] -21 -1 9 -1 4 -21 -1 9 4 19
Square x and name it x2:
x2 <- x^2
x2
 [1] 441 1 81 1 16 441 1 81 16 361
Sum x2 and name it sum_x2:
sum_x2 <- sum(x2)
sum_x2
[1] 1440
Count the number of observations and name it N:
N <- length(toefl) # Number of observations
N
[1] 10
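Divide the sum of squared deviations by N:
\[\text{Variance} = \frac{1440}{10} = 144\]
- This is the variance of toefl.
- We can also calculate the variance of toefl with R as follows. R's var() divides by N - 1 (the sample variance), so we multiply its result by (N - 1)/N to get the variance defined above:
variance_toefl <- var(toefl) * (length(toefl) - 1) / length(toefl)
variance_toefl
[1] 144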
The standard deviation is the square root of the variance:
\[\text{Standard Deviation} = \sqrt{\text{Variance}}\]
- Thus, the standard deviation of toefl can be calculated from variance_toefl:
sqrt(variance_toefl)
[1] 12
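Be careful with R's sd(): it is the square root of var(), so it also divides by N - 1. It therefore returns a slightly larger value than the 12 calculated above. The following short comparison rescales it back.

```r
toefl <- c(60, 80, 90, 80, 85, 60, 80, 90, 85, 100)
N <- length(toefl)

sd_n  <- sqrt(sum((toefl - mean(toefl))^2) / N)  # divides by N, as in this section: 12
sd_n1 <- sd(toefl)                               # divides by N - 1: sqrt(1440 / 9) = sqrt(160)
sd_n
sd_n1
sd_n1 * sqrt((N - 1) / N)   # rescaling recovers 12 (up to floating-point rounding)
```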
References