Stat 50 - Elementary Statistics

Dr. Jorge Basilio

Introduction to R and Descriptive Statistics

Lab 1

Name: SOLUTIONS

Part 1

# Problem 1
my_feelings <- "I love stats!"
my_feelings

# Problem 2
fresh_15 <- c(72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
71, 55, 65, 75, 42, 74, 94)
fresh_15

# Problem 2 part a
mean(fresh_15) # mean
median(fresh_15) # median
sd(fresh_15) # standard deviation
var(fresh_15) # variance

# Problem 2 part b
summary(fresh_15) # five-number summary (summary() also reports the mean)
boxplot(fresh_15, horizontal=TRUE) # box plot

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  42.00   56.50   64.00   65.06   70.50   97.00
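
The output above lists six values because summary() also reports the mean. A minimal sketch, assuming only base R, of pulling out the five-number summary on its own with fivenum(). Note that fivenum() uses Tukey's hinges, so on other data sets its quartiles can differ slightly from those of summary().

```r
# Same 67 September weights as in Problem 2, repeated so this block runs on its own
fresh_15 <- c(72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
              58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
              54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
              71, 55, 65, 75, 42, 74, 94)

five <- fivenum(fresh_15)  # minimum, lower hinge, median, upper hinge, maximum
five

quantile(fresh_15)         # quartiles as summary() computes them (default type 7)

five[4] - five[2]          # spread of the middle half, based on the hinges
IQR(fresh_15)              # interquartile range, based on quantile()
```

For this data set the hinges and the quartiles agree, matching the 56.50 and 70.50 in the output above.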

# Problem 2 part c
hist(fresh_15) # histogram

library(plotrix) # recall: dot plot needs a special library loaded
dotplot.mtb(fresh_15) # dot plot
stem(fresh_15) # stem-leaf plot

  The decimal point is 1 digit(s) to the right of the |

  4 | 29
  5 | 0122334445566667778899
  6 | 00123334444445567788889
  7 | 0001123444457
  8 | 017
  9 | 2347

Part (d):

The histogram groups the data into classes of width 5, whereas the dot plot shows each individual data point. Because each bar counts every value that falls in its class, the histogram has taller bars in the middle than any single stack of dots. The dot plot shows exactly where each observation falls, so it preserves more detail. The histogram shows the overall trend of the data more clearly.
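
The comparison above can be checked numerically. A minimal sketch, assuming only base R: hist() with plot = FALSE returns the class boundaries and counts without drawing anything, and table() gives the height of each stack in a dot plot. The names h and dot_heights are illustrative and not part of the lab.

```r
# Same 67 September weights as in Problem 2, repeated so this block runs on its own
fresh_15 <- c(72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
              58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
              54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
              71, 55, 65, 75, 42, 74, 94)

# Ask hist() for its classes without drawing the plot
h <- hist(fresh_15, plot = FALSE)
h$breaks         # class boundaries chosen by R's default rule
diff(h$breaks)   # width of each class
h$counts         # number of observations in each class (the bar heights)

# A dot plot stacks one dot per observation, so its stack heights match table()
dot_heights <- table(fresh_15)
max(dot_heights) # tallest stack in the dot plot
max(h$counts)    # tallest bar in the histogram
```

The tallest histogram bar is higher than the tallest stack of dots because each class pools several distinct weights.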

Part 2

library(readxl) # load the readxl package so that R can read Excel (spreadsheet) files
fresh_15_excel <- read_excel("06-Freshman15.xlsx") # store the spreadsheet under a descriptive name such as "fresh_15_excel"

str(fresh_15_excel) # str() shows the "structure" of fresh_15_excel:
    # each column's name, its type, and its first few values

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	67 obs. of  5 variables:
 $ SEX      : chr  "M" "M" "M" "M" ...
 $ WT SEPT  : num  72 97 74 93 68 59 64 56 70 58 ...
 $ WT APRIL : num  59 86 69 88 64 55 60 53 68 56 ...
 $ BMI SEPT : num  22 19.7 24.1 27 21.5 ...
 $ BMI APRIL: num  18.1 17.4 22.4 25.6 20.1 ...

names(fresh_15_excel) # show names of columns

Problem 3

The first variable, "SEX", records the gender of each individual sampled. The second variable, "WT SEPT", gives each individual's weight in September of their freshman year, and "WT APRIL" gives their weight in April of their freshman year.

Similarly, "BMI SEPT" and "BMI APRIL" give the BMI, or body mass index, of each individual in September and in April of their freshman year.
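
Because several of these column names contain a space, they need backticks (or double square brackets) when used in code. A minimal sketch, assuming a small stand-in data frame with the hypothetical name fresh_small, built from the first four rows shown in the str() output above (BMI values as rounded there), so the block runs without the Excel file. The variable wt_change is likewise only an illustration.

```r
# Stand-in for the first four rows of the spreadsheet (values copied from the str() output)
fresh_small <- data.frame(
  "SEX"       = c("M", "M", "M", "M"),
  "WT SEPT"   = c(72, 97, 74, 93),
  "WT APRIL"  = c(59, 86, 69, 88),
  "BMI SEPT"  = c(22, 19.7, 24.1, 27),
  "BMI APRIL" = c(18.1, 17.4, 22.4, 25.6),
  check.names = FALSE  # keep the spaces in the column names
)

names(fresh_small)          # same five column names as the spreadsheet

fresh_small$`WT SEPT`       # backticks are needed because of the space
fresh_small[["WT APRIL"]]   # double brackets with a quoted name also work

# Change in weight from September to April for each individual
wt_change <- fresh_small$`WT APRIL` - fresh_small$`WT SEPT`
wt_change
```

The same backtick syntax should work on fresh_15_excel itself, since read_excel() keeps the column names as they appear in the spreadsheet.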

# END OF LAB