library(readxl) # load library to read excel files in R
fresh_15_full <- read_excel("06-Freshman15.xlsx") # load excel file info into R and save it to "fresh_15_full"
fresh_15_full # see data
names(fresh_15_full)
The five variables are as follows: SEX, "WT SEPT", "WT APRIL", "BMI SEPT", and "BMI APRIL".
# 2a
fresh_15_full$"WT SEPT" # show all values of "WT SEPT"
# 2b
fresh_15_full$"WT SEPT"[55] # one way
fresh_15_full[55,2] # another way
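The two forms above do not return the same kind of object when the data is a tibble, which is what `read_excel()` returns. A minimal sketch, using a small made-up tibble (`toy` and its values are hypothetical and are not taken from the Freshman 15 file):

```r
library(tibble) # read_excel() returns tibbles; tibble is installed alongside readxl

# hypothetical stand-in for the real data
toy <- tibble(SEX = c("M", "F", "F"), `WT SEPT` = c(72, 59, 64))

by_name <- toy$"WT SEPT"[2]  # a plain numeric value
by_position <- toy[2, 2]     # a 1 x 1 tibble, not a bare number
as_value <- toy[[2]][2]      # pull column 2 out as a vector, then take element 2

by_name
by_position
as_value
```

If a bare number is needed for later arithmetic, the `$` form or the `[[ ]]` form is probably the safer choice.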
# 2c
fresh_15_full[45:55,] # show rows 45 to 55, all the columns
# 2d
fresh_15_full[45:55,c(4,5)] # show rows 45 to 55, columns 4 and 5
# 3a
summary(fresh_15_full$"WT SEPT") # five-number summary (plus the mean) for "WT SEPT"
# 3a (continued)
summary(fresh_15_full$"WT APRIL") # five-number summary (plus the mean) for "WT APRIL"
3b.
The median for "WT APRIL" is 66.00 pounds and the median for "WT SEPT" is 64.00 pounds. Therefore, the median weight of freshmen in April (spring semester) is larger than the median weight of freshmen in September (fall semester).
# 3c
# range = max - min
range_sept <- 97.00 - 42.00 # define range for weight of sept
range_april <- 105.00 - 47.00 # define range for weight of april
range_sept
range_april
The range in April (58.00) is larger than the range in September (55.00).
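The minimum and maximum do not have to be typed in by hand from the `summary()` output. As a sketch, `range()` returns both values and `diff()` subtracts them. The vector `wt` below is made-up data used only for illustration:

```r
wt <- c(42, 55, 64, 70, 97)  # hypothetical weights, not the real column
range_wt <- diff(range(wt))  # range() gives c(min, max); diff() gives max - min
range_wt
```

The same call should work on a real column, for example `diff(range(fresh_15_full$"WT SEPT"))`, assuming the column has no missing values.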
# 3d
# range rule of thumb: estimate for standard deviation = range/4
approx_sd_wt_sept <- range_sept/4
approx_sd_wt_april <- range_april/4
approx_sd_wt_sept
approx_sd_wt_april
Using the range rule of thumb, the "spread" of the weights in April (approx. 14.5) is larger than the spread of the weights in September (approx. 13.75).
There are 35 Females and 32 Males, so there are more Females in the data set.
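Counts like these can be obtained with `table()`. The sketch below uses a short made-up vector `sex` rather than the real SEX column, so the counts it prints are not the ones quoted above:

```r
sex <- c("F", "M", "F", "F", "M")  # hypothetical values
sex_counts <- table(sex)           # number of rows in each category
sex_counts
names(which.max(sex_counts))       # the category with the most rows
```

On the real data, the analogous call would presumably be `table(fresh_15_full$SEX)`.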
sd_wt_sept <- sd(fresh_15_full$"WT SEPT") # standard deviation of "WT SEPT"
sd_wt_april <- sd(fresh_15_full$"WT APRIL") # standard deviation of "WT APRIL"
sd_wt_sept
sd_wt_april
The exact standard deviation of the weights in September is 11.285... and the approximation using the range rule of thumb is 13.75. The approximation is larger than the actual value.
The exact standard deviation of the weights in April is 11.284... and the approximation using the range rule of thumb is 14.5. Again, the approximation is larger than the actual value.
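To make this kind of comparison repeatable, the rule of thumb can be wrapped in a small helper. This is only a sketch: the function name `approx_sd` and the vector `wt` are hypothetical and chosen for illustration.

```r
# range rule of thumb: standard deviation is roughly range / 4
approx_sd <- function(x) diff(range(x)) / 4

wt <- c(42, 55, 64, 70, 97)  # hypothetical weights, not the real column

# put the exact and approximate values side by side
comparison <- c(exact = sd(wt), approx = approx_sd(wt))
comparison
```

Because the rule of thumb uses only the two most extreme values, the two numbers can differ noticeably, as they do for the real columns above.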
hist(fresh_15_full$"WT SEPT", col="pink") # histogram for "WT SEPT"
# notice the optional argument col="pink", separated by a comma, which gives the histogram some color ;-)
hist(fresh_15_full$"WT APRIL", col="pink") # histogram for "WT APRIL"
hist(fresh_15_full$"BMI SEPT", col="lightblue") # histogram for "BMI SEPT"
hist(fresh_15_full$"BMI APRIL", col="lightblue") # histogram for "BMI APRIL"
# box plot
# compare before and after freshman year weights
boxplot(fresh_15_full[,c(2,3)], horizontal=TRUE, col="aquamarine")
Comparing the two box plots, we see some statistical evidence that supports the "freshman 15 myth," but the effect is not very dramatic. The minimum weight increased from 42.00 in September to 47.00 in April, and the maximum increased from 97.00 to 105.00. The median weight is also larger in April.
Overall, the April box plot is shifted to the right, indicating that the weights increased.
# box plot
# compare two box plots between M vs F
# the part before the tilde (~) is the variable being plotted (the "WT SEPT" column)
# the part after the tilde (~) is the grouping variable (SEX, M/F)
boxplot(fresh_15_full$"WT SEPT" ~ fresh_15_full$SEX, horizontal=TRUE, col="orange")
boxplot(fresh_15_full$"WT APRIL" ~ fresh_15_full$SEX, horizontal=TRUE, col="deepskyblue")
boxplot(fresh_15_full$"BMI SEPT" ~ fresh_15_full$SEX, horizontal=TRUE, col="salmon")
boxplot(fresh_15_full$"BMI APRIL" ~ fresh_15_full$SEX, horizontal=TRUE, col="yellow")
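The formula interface also accepts a `data` argument, which avoids repeating the data frame name on both sides of the tilde. A minimal sketch with a made-up data frame (`toy` and its values are hypothetical); column names that contain spaces need backticks inside the formula:

```r
# hypothetical stand-in for the real data; check.names = FALSE keeps the space in "WT SEPT"
toy <- data.frame(
  SEX = c("F", "F", "F", "M", "M", "M"),
  "WT SEPT" = c(52, 58, 61, 66, 74, 80),
  check.names = FALSE
)

# one box per level of SEX, drawn from the columns of toy
bp <- boxplot(`WT SEPT` ~ SEX, data = toy, horizontal = TRUE, col = "orange")
bp$names  # the group labels used for the boxes
```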
Wouldn't it be nice to do all of them in one graph?
boxplot(fresh_15_full$"WT SEPT", fresh_15_full$"WT APRIL", fresh_15_full$"BMI SEPT", fresh_15_full$"BMI APRIL", col="red", horizontal=TRUE)
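When several vectors are passed like this, the boxes are labeled only 1, 2, 3, and so on. As a sketch, the `names` argument of `boxplot()` can supply readable labels. The vectors `wt_sept` and `wt_april` below are made-up values for illustration:

```r
wt_sept <- c(52, 58, 61, 66, 74)   # hypothetical September weights
wt_april <- c(54, 59, 63, 69, 77)  # hypothetical April weights

# names= labels each box in the order the vectors are given
bp_all <- boxplot(wt_sept, wt_april,
                  names = c("WT SEPT", "WT APRIL"),
                  col = "red", horizontal = TRUE)
bp_all$names
```

Note that weight and BMI are measured on different scales, so placing all four variables on one axis may make the BMI boxes look compressed compared with the weight boxes.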