Stat 50 - Elementary Statistics

Dr. Jorge Basilio

Data Analysis using an Excel file

Lab 2

Name: SOLUTIONS

Due: Saturday, Feb 1 at 11:59 PM

PART 1

library(readxl) # load library to read excel files in R
fresh_15_full <- read_excel("06-Freshman15.xlsx") # load excel file info into R and save it to "fresh_15_full"
fresh_15_full # see data

names(fresh_15_full)

Problem 1

The five variables are as follows:

"SEX" represents the gender of the person (M or F)
"WT SEPT" represents the weight of the person in September (in Fall semester)
"WT APRIL" represents the weight of the person in April (next year in Spring semester)
"BMI SEPT" represents the BMI (body mass index) of the person in September (in Fall semester)
"BMI APRIL" represents the BMI (body mass index) of the person in April (next year in Spring semester)
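The variable names and types listed above can be checked directly in R. This is a minimal sketch, not part of the original solution: `toy` is a hypothetical stand-in built from three rows of the printed data (rows 45 to 47), so that the code runs without the Excel file.

```r
# Hypothetical stand-in for fresh_15_full (rows 45 to 47 of the printed data)
toy <- data.frame(
  SEX = c("F", "F", "F"),
  "WT SEPT" = c(51, 59, 65),
  "WT APRIL" = c(54, 62, 68),
  check.names = FALSE  # keep the spaces in the column names
)
names(toy)  # column names
dim(toy)    # number of rows, then number of columns
str(toy)    # type of each column (chr for SEX, num for the weights)
```

On the real data, the same three calls can be applied to fresh_15_full.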

Problem 2

# 2a
fresh_15_full$"WT SEPT" # show all values of "WT SEPT"

# 2b
fresh_15_full$"WT SEPT"[55] # one way
fresh_15_full[55,2] # another way

# 2c
fresh_15_full[45:55,] # show rows 45 to 55, all the columns

# 2 d
fresh_15_full[45:55,c(4,5)] # show rows 45 to 55, columns 4 and 5
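Indexing by column position (2, 4, 5) works, but it breaks silently if the column order ever changes. As an alternative sketch, the same kind of selection can be written with column names. The data frame `toy` below is a hypothetical stand-in built from three printed rows.

```r
# Hypothetical stand-in for fresh_15_full, using three rows of the printed data
toy <- data.frame(
  SEX = c("F", "F", "M"),
  "WT SEPT" = c(51, 59, 74),
  "BMI SEPT" = c(18.31, 19.64, 20.31),
  "BMI APRIL" = c(19.28, 20.63, 21.34),
  check.names = FALSE  # keep the spaces in the column names
)
wt_second <- toy[["WT SEPT"]][2]                  # one value, column chosen by name
bmi_rows <- toy[2:3, c("BMI SEPT", "BMI APRIL")]  # rows 2 to 3, two columns by name
wt_second
bmi_rows
```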

Problem 3

# 3 a
summary(fresh_15_full$"WT SEPT") # five-number summary (plus the mean) for "WT SEPT"

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  42.00   56.50   64.00   65.06   70.50   97.00

# 3 a
summary(fresh_15_full$"WT APRIL") # five-number summary (plus the mean) for "WT APRIL"

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  47.00   58.00   66.00   66.24   70.00  105.00

3b.

The median of "WT APRIL" is 66.00 kg and the median of "WT SEPT" is 64.00 kg. Therefore, the median weight of freshmen in April (spring semester) is larger than the median weight of freshmen in September (fall semester).
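The medians can also be computed directly with median() instead of being read off the summary() output. A small sketch, assuming only the weights from rows 45 to 55 of the printed data, so the numbers differ from the full-data medians quoted above:

```r
# Weights for rows 45 to 55 only (a subset, not the full data set)
wt_sept  <- c(51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57)
wt_april <- c(54, 62, 68, 56, 65, 58, 77, 78, 68, 68, 61)
med_sept  <- median(wt_sept)
med_april <- median(wt_april)
med_sept
med_april
med_april > med_sept  # TRUE when the April median is the larger one
```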

# 3 c
# range = max - min
range_sept <- 97.00 - 42.00 # define range for weight of sept
range_april <- 105.00 - 47.00 # define range for weight of april
range_sept
range_april

The range in April (58) is larger than the range in September (55).
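The ranges above were typed in by hand from the summary() output. They can also be computed from the data. One thing to watch: in R, range() returns the minimum and maximum, not their difference. A sketch on the September weights from rows 45 to 55 only:

```r
# September weights for rows 45 to 55 only (a subset, not the full data set)
wt_sept <- c(51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57)
range(wt_sept)                      # returns c(min, max), not max - min
range_sept <- diff(range(wt_sept))  # max - min
range_sept
```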

# 3 d
# range rule of thumb: estimate for standard deviation = range/4
approx_sd_wt_sept <- range_sept/4 
approx_sd_wt_april <- range_april/4 
approx_sd_wt_sept
approx_sd_wt_april

Using the range rule of thumb, the "spread" of the weights in April (approx. 14.5) is larger than the "spread" of the weights in September (approx. 13.75).

Problem 4

There are 35 Females and 32 Males, so there are more Females in the data set.
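The solution gives the counts without showing code. One way to get such counts (not necessarily the one used here) is table(). The sketch below uses only the SEX values from rows 45 to 55 of the printed data, so its counts are for that subset, not the full data set.

```r
# SEX values for rows 45 to 55 only (a subset, not the full data set)
sex <- c("F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "F")
counts <- table(sex)  # one count per category
counts
counts[["F"]]         # pull out a single count by name
```

On the real data the call would be table(fresh_15_full$SEX).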

Problem 5

sd_wt_sept <- sd(fresh_15_full$"WT SEPT") # standard deviation of "WT SEPT"
sd_wt_april <- sd(fresh_15_full$"WT APRIL") # standard deviation of "WT APRIL"
sd_wt_sept 
sd_wt_april

The exact standard deviation of the weights in September is 11.285... and the approximation using the range rule of thumb is 13.75. The approximation is larger than the exact value.

The exact standard deviation of the weights in April is 11.284... and the approximation using the range rule of thumb is 14.5. The approximation is larger than the exact value.
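The comparison between the rule of thumb and sd() can be wrapped in a small helper. This is a sketch only: `range_rule_sd` is a hypothetical name, and the data are the September weights from rows 45 to 55, so the result (including which of the two values is larger) need not match the full-data comparison above.

```r
# Hypothetical helper: range rule of thumb, sd is roughly (max - min) / 4
range_rule_sd <- function(x) (max(x) - min(x)) / 4

# September weights for rows 45 to 55 only (a subset, not the full data set)
wt_sept <- c(51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57)
approx_sd <- range_rule_sd(wt_sept)  # (74 - 51) / 4
exact_sd  <- sd(wt_sept)             # sample standard deviation
c(approx = approx_sd, exact = exact_sd)
```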

PART 2

Problem 6

hist(fresh_15_full$"WT SEPT", col="pink") # histogram for "WT SEPT" 
    # notice the optional col="pink" argument, separated by a comma, to give the histogram some color ;-)

hist(fresh_15_full$"WT APRIL", col="pink") # histogram for "WT APRIL"

hist(fresh_15_full$"BMI SEPT", col="lightblue") # histogram for "BMI SEPT"

hist(fresh_15_full$"BMI APRIL", col="lightblue") # histogram for "BMI APRIL"
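By default, hist() builds its title and x-axis label from the expression passed in, which gets long with the $"..." notation. The optional main and xlab arguments set them explicitly. A sketch on the September weights from rows 45 to 55 only:

```r
# September weights for rows 45 to 55 only (a subset, not the full data set)
wt_sept <- c(51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57)
h <- hist(wt_sept, col = "pink",
          main = "Weight in September (rows 45 to 55 only)",  # plot title
          xlab = "WT SEPT")                                   # x-axis label
h$counts  # number of observations in each bin
```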

Problem 7

# box plot 
# compare before and after freshman year weights
boxplot(fresh_15_full[,c(2,3)], horizontal=TRUE, col="aquamarine")

Comparing the two box plots, we see some statistical evidence that supports the "freshman 15 myth", but it is not very dramatic. The minimum weight increased from 42.00 in September to 47.00 in April, and the maximum increased from 97.00 to 105.00. The median weight is also larger in April (66.00 versus 64.00).

Overall, the box plot for April is shifted to the right, indicating that the weights increased.
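Because each student is measured twice, another way to look at the shift is to compute the change (April minus September) for each student. The sketch below uses rows 45 to 55 only. In the printed output the rows appear to be ordered by weight change, so this subset shows only gains and is not representative of the full data set.

```r
# Weights for rows 45 to 55 only (a subset, not the full data set)
wt_sept  <- c(51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57)
wt_april <- c(54, 62, 68, 56, 65, 58, 77, 78, 68, 68, 61)
wt_change <- wt_april - wt_sept  # positive values mean weight gain
summary(wt_change)
mean(wt_change > 0)              # proportion of these students who gained weight
```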

PART 3

Problem 8

# box plot 
# compare two box plots between M vs F
# the part before the tilde (~) is the variable to plot (the "WT SEPT" column)
# the part after the tilde (~) is the grouping variable (SEX, M/F)
boxplot(fresh_15_full$"WT SEPT" ~ fresh_15_full$SEX, horizontal=TRUE, col="orange")

boxplot(fresh_15_full$"WT APRIL" ~ fresh_15_full$SEX, horizontal=TRUE, col="deepskyblue")

boxplot(fresh_15_full$"BMI SEPT" ~ fresh_15_full$SEX, horizontal=TRUE, col="salmon")

boxplot(fresh_15_full$"BMI APRIL" ~ fresh_15_full$SEX, horizontal=TRUE, col="yellow")
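The grouped box plots show the medians as lines. To get the numbers behind them, tapply() can apply median() within each SEX group. A sketch, with `toy` as a hypothetical stand-in built from rows 45 to 55 of the printed data:

```r
# Hypothetical stand-in for fresh_15_full (rows 45 to 55 of the printed data)
toy <- data.frame(
  SEX = c("F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "F"),
  "WT SEPT" = c(51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57),
  check.names = FALSE  # keep the space in the column name
)
# median of "WT SEPT" within each SEX group
med_by_sex <- tapply(toy[["WT SEPT"]], toy$SEX, median)
med_by_sex
```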

BONUS

Wouldn't it be nice to do all of them in one graph?

boxplot(fresh_15_full$"WT SEPT", fresh_15_full$"WT APRIL", fresh_15_full$"BMI SEPT", fresh_15_full$"BMI APRIL", col="red", horizontal=TRUE)
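When several unnamed vectors are passed to boxplot() like this, the boxes are labelled 1, 2, 3, 4. The optional names argument gives them readable labels. A sketch using rows 45 to 55 of the printed data only. Keep in mind that weights and BMIs are on different scales, so a shared axis is mostly useful for a quick look.

```r
# Values for rows 45 to 55 only (a subset, not the full data set)
wt_sept   <- c(51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57)
wt_april  <- c(54, 62, 68, 56, 65, 58, 77, 78, 68, 68, 61)
bmi_sept  <- c(18.31, 19.64, 23.02, 20.63, 22.61, 22.03, 20.31, 20.31, 19.59, 21.05, 23.47)
bmi_april <- c(19.28, 20.63, 24.10, 21.91, 23.81, 23.42, 21.34, 21.36, 20.77, 22.31, 25.11)

bp <- boxplot(wt_sept, wt_april, bmi_sept, bmi_april,
              names = c("WT SEPT", "WT APRIL", "BMI SEPT", "BMI APRIL"),  # box labels
              col = "red", horizontal = TRUE)
bp$stats  # one column per box: lower whisker, Q1, median, Q3, upper whisker
```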

Output of fresh_15_full from PART 1 (middle rows elided):
SEX	WT SEPT	WT APRIL	BMI SEPT	BMI APRIL
<chr>	<dbl>	<dbl>	<dbl>	<dbl>
M	72	59	22.02	18.14
M	97	86	19.70	17.44
M	74	69	24.09	22.43
M	93	88	26.97	25.57
F	68	64	21.51	20.10
M	59	55	18.69	17.40
F	64	60	24.24	22.88
F	56	53	21.23	20.23
F	70	68	30.26	29.24
F	58	56	21.88	21.02
F	50	47	17.63	16.89
M	71	69	24.57	23.85
M	67	66	20.68	20.15
F	56	55	20.97	20.36
F	70	68	27.30	26.73
F	61	60	23.30	22.88
F	53	52	19.48	19.24
M	92	92	24.74	24.69
F	57	58	20.69	20.79
M	67	67	20.49	20.60
F	58	58	21.09	21.24
F	49	50	18.37	18.53
M	68	68	22.40	22.61
F	69	69	28.17	28.43
M	87	88	23.60	23.81
M	81	82	26.52	26.78
M	60	61	18.89	19.27
F	52	53	19.31	19.75
M	70	71	20.96	21.32
F	63	64	21.78	22.22
⋮	⋮	⋮	⋮	⋮
F	63	65	23.87	24.67
F	54	56	18.61	19.34
F	56	58	21.73	22.58
M	54	56	18.93	19.72
M	73	75	25.88	26.72
M	77	79	28.59	29.53
F	63	66	21.89	22.79
F	51	54	18.31	19.28
F	59	62	19.64	20.63
F	65	68	23.02	24.10
F	53	56	20.63	21.91
F	62	65	22.61	23.81
F	55	58	22.03	23.42
M	74	77	20.31	21.34
M	74	78	20.31	21.36
M	64	68	19.59	20.77
M	64	68	21.05	22.31
F	57	61	23.47	25.11
F	64	68	22.84	24.29
F	60	64	19.50	20.90
M	64	68	18.51	19.83
M	66	71	21.40	22.97
F	52	57	17.72	19.42
M	71	77	22.26	23.87
F	55	60	21.64	23.81
M	65	71	22.51	24.45
M	75	82	23.69	25.80
F	42	49	15.08	17.74
M	74	82	22.64	25.33
M	94	105	36.57	40.86

Output for Problem 2c (rows 45 to 55, all columns):
SEX	WT SEPT	WT APRIL	BMI SEPT	BMI APRIL
<chr>	<dbl>	<dbl>	<dbl>	<dbl>
F	51	54	18.31	19.28
F	59	62	19.64	20.63
F	65	68	23.02	24.10
F	53	56	20.63	21.91
F	62	65	22.61	23.81
F	55	58	22.03	23.42
M	74	77	20.31	21.34
M	74	78	20.31	21.36
M	64	68	19.59	20.77
M	64	68	21.05	22.31
F	57	61	23.47	25.11

Output for Problem 2d (rows 45 to 55, columns 4 and 5):
BMI SEPT	BMI APRIL
<dbl>	<dbl>
18.31	19.28
19.64	20.63
23.02	24.10
20.63	21.91
22.61	23.81
22.03	23.42
20.31	21.34
20.31	21.36
19.59	20.77
21.05	22.31
23.47	25.11