Stat 50 - Elementary Statistics

Dr. Jorge Basilio

Introduction to R and Descriptive Statistics

Lab 1

Name: SOLUTIONS

Part 1

In [1]:
# Problem 1
my_feelings <- "I love stats!"
my_feelings
Out[1]:
'I love stats!'
In [2]:
# Problem 2
fresh_15 <- c(72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
71, 55, 65, 75, 42, 74, 94)
fresh_15
Out[2]:
  1. 72
  2. 97
  3. 74
  4. 93
  5. 68
  6. 59
  7. 64
  8. 56
  9. 70
  10. 58
  11. 50
  12. 71
  13. 67
  14. 56
  15. 70
  16. 61
  17. 53
  18. 92
  19. 57
  20. 67
  21. 58
  22. 49
  23. 68
  24. 69
  25. 87
  26. 81
  27. 60
  28. 52
  29. 70
  30. 63
  31. 56
  32. 68
  33. 68
  34. 54
  35. 80
  36. 64
  37. 57
  38. 63
  39. 54
  40. 56
  41. 54
  42. 73
  43. 77
  44. 63
  45. 51
  46. 59
  47. 65
  48. 53
  49. 62
  50. 55
  51. 74
  52. 74
  53. 64
  54. 64
  55. 57
  56. 64
  57. 60
  58. 64
  59. 66
  60. 52
  61. 71
  62. 55
  63. 65
  64. 75
  65. 42
  66. 74
  67. 94
In [3]:
# Problem 2 part a
mean(fresh_15) # mean
median(fresh_15) # median
sd(fresh_15) # standard deviation
var(fresh_15) # variance
Out[3]:
65.0597014925373
Out[3]:
64
Out[3]:
11.2853895852718
Out[3]:
127.360018091361
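
As a check on the values above, the sample variance can be recomputed directly from its definition, the sum of squared deviations from the mean divided by n - 1, and the standard deviation is its square root. This is only a sketch: the names `n`, `dev`, `var_by_hand`, and `sd_by_hand` are illustrative and not part of the lab, and the data vector is repeated so the cell runs on its own.

```r
# Same data as fresh_15 in Problem 2 (repeated so this block is self-contained)
fresh_15 <- c(72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
              58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
              54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
              71, 55, 65, 75, 42, 74, 94)

n <- length(fresh_15)                 # number of observations
dev <- fresh_15 - mean(fresh_15)      # deviation of each value from the mean
var_by_hand <- sum(dev^2) / (n - 1)   # sample variance divides by n - 1, not n
sd_by_hand <- sqrt(var_by_hand)       # standard deviation is the square root

all.equal(var_by_hand, var(fresh_15)) # compare with the built-in var()
all.equal(sd_by_hand, sd(fresh_15))   # compare with the built-in sd()
```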
In [4]:
# Problem 2 part b
summary(fresh_15) # five number summary
boxplot(fresh_15, horizontal=TRUE) # box plot
Out[4]:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  42.00   56.50   64.00   65.06   70.50   97.00 
Out[4]:
[Figure: horizontal box plot of fresh_15]
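
The points a box plot draws beyond its whiskers can be recovered from the five number summary with the usual 1.5 × IQR rule. The sketch below is an illustration, not part of the lab: the names `q1`, `q3`, `iqr`, `lower_fence`, `upper_fence`, and `outliers` are made up here. One caveat: `boxplot()` builds its box from hinges, which in general can differ slightly from the quartiles that `summary()` reports, so the last line compares against `boxplot.stats()` rather than assuming the two agree.

```r
# Same data as fresh_15 in Problem 2 (repeated so this block is self-contained)
fresh_15 <- c(72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
              58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
              54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
              71, 55, 65, 75, 42, 74, 94)

q1 <- quantile(fresh_15, 0.25, names = FALSE)  # first quartile, as in summary()
q3 <- quantile(fresh_15, 0.75, names = FALSE)  # third quartile, as in summary()
iqr <- IQR(fresh_15)                           # interquartile range, q3 - q1

lower_fence <- q1 - 1.5 * iqr                  # values below this are flagged
upper_fence <- q3 + 1.5 * iqr                  # values above this are flagged

outliers <- fresh_15[fresh_15 < lower_fence | fresh_15 > upper_fence]
sort(outliers)                                 # flagged values by the 1.5 * IQR rule
sort(boxplot.stats(fresh_15)$out)              # what boxplot() itself would flag
```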
In [5]:
# Problem 2 part c
hist(fresh_15) # histogram

library(plotrix) # recall: dot plot needs a special library loaded
dotplot.mtb(fresh_15) # dot plot
stem(fresh_15) # stem-leaf plot
Out[5]:
  The decimal point is 1 digit(s) to the right of the |

  4 | 29
  5 | 0122334445566667778899
  6 | 00123334444445567788889
  7 | 0001123444457
  8 | 017
  9 | 2347

Out[5]:
[Figures: histogram and dot plot of fresh_15]

Part (d):

The histogram groups the data into classes of width 5, whereas the dot plot shows each individual data point. Because the histogram pools nearby values into one class, its bars in the middle are taller. The dot plot shows exactly where every observation falls, so it preserves more detail. The histogram, on the other hand, shows the overall trend of the data more clearly.
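
The grouping described above can be inspected without drawing anything. In this sketch (the name `h` is illustrative), `hist(..., plot = FALSE)` returns the class boundaries and counts that the histogram would plot, while `table()` gives the count at each distinct value, which is what the dot plot stacks.

```r
# Same data as fresh_15 in Problem 2 (repeated so this block is self-contained)
fresh_15 <- c(72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
              58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
              54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
              71, 55, 65, 75, 42, 74, 94)

h <- hist(fresh_15, plot = FALSE)  # compute the histogram without plotting it
h$breaks                           # class boundaries chosen by hist()
diff(h$breaks)                     # width of each class
h$counts                           # number of observations in each class

table(fresh_15)                    # count at each distinct value (dot plot view)
```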

Part 2

In [10]:
library(readxl) # load special library so that R can understand excel (spreadsheet) files
fresh_15_excel <- read_excel("06-Freshman15.xlsx") # save the excel spreadsheet to a descriptive name like "fresh_15_excel", or similar
In [15]:
str(fresh_15_excel) # str shows the "structure" of fresh_15_excel:
    # each column's name, its type, and its first few values
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	67 obs. of  5 variables:
 $ SEX      : chr  "M" "M" "M" "M" ...
 $ WT SEPT  : num  72 97 74 93 68 59 64 56 70 58 ...
 $ WT APRIL : num  59 86 69 88 64 55 60 53 68 56 ...
 $ BMI SEPT : num  22 19.7 24.1 27 21.5 ...
 $ BMI APRIL: num  18.1 17.4 22.4 25.6 20.1 ...
In [13]:
names(fresh_15_excel) # show names of columns
Out[13]:
  1. 'SEX'
  2. 'WT SEPT'
  3. 'WT APRIL'
  4. 'BMI SEPT'
  5. 'BMI APRIL'

Problem 3

The first variable, "SEX", records the sex of each individual sampled. The second variable, "WT SEPT", gives each individual's weight in September of their freshman year, and "WT APRIL" gives their weight in April of that same year.

Similarly, "BMI SEPT" and "BMI APRIL" tell us the BMI, or Body-Mass Index, of the individuals in September and in April of their freshman year.
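
Because these column names contain spaces, they have to be wrapped in backticks (or passed as strings to `[[ ]]`) when a column is pulled out. The sketch below uses a small stand-in data frame, `demo`, built from the first four rows shown by `str()` above, so that it runs without the Excel file. Both `demo` and `change` are hypothetical names, not part of the lab.

```r
# Hypothetical stand-in for fresh_15_excel: first four rows as printed by str()
demo <- data.frame(
  SEX = c("M", "M", "M", "M"),
  "WT SEPT" = c(72, 97, 74, 93),
  "WT APRIL" = c(59, 86, 69, 88),
  check.names = FALSE              # keep the spaces in the column names
)

demo$`WT SEPT`                     # backticks allow a name containing a space
demo[["WT APRIL"]]                 # equivalent access using a string

# Weight change from September to April for each individual in demo
change <- demo$`WT APRIL` - demo$`WT SEPT`
change
mean(change)                       # average change over these four rows only
```

The same column access should work on `fresh_15_excel` itself once the spreadsheet has been read in.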

In [0]:
# END OF LAB