R is a powerful, comprehensive, open-source software environment for doing statistics. You can download and install it on your computer, or use it through a web interface without downloading anything.
We will use R through a web interface in the form of a Jupyter notebook or R Markdown document. In fact, what you are reading here is an R Markdown document that will guide you through the first steps of getting familiar with R.
Navigate to the Labs folder and find the Lab_1 folder. Inside you will find a Jupyter Notebook, which can run R code, titled “Stat50-Lab_1-YourLastName_YourFirstName-W20”. Rename the file, replacing the placeholders with your own last and first name.
At the beginning of your Jupyter Notebook, double-click on the “Lab1” text. Replace the text “FirstName LastName” with your actual first and last name. Click “run cell” (looks like a play button) or hit shift+enter.
Have some fun and make a few calculations
You can perform all sorts of basic arithmetic using R:
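For instance, typing the following into a code cell and running it adds two numbers:

```r
2 + 2  # R evaluates the expression and prints the result
```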
## [1] 4
Of course we know 2+2 is 4. In the above, you’ll see the gray box shows the code, 2+2, and in the box below it you’ll see the output, 4.
We can input much more complicated expressions as well. Just use parentheses liberally!
If we want to evaluate the expression \(\sqrt{\frac{2+5\cdot 7}{10}}\), we need to know the syntax for square root, multiplication, and division: sqrt(), *, and /.
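One way to enter this expression, using the built-in sqrt() function and extra parentheses to keep the whole fraction under the square root:

```r
# The inner parentheses group the numerator 2 + 5*7 before dividing by 10
sqrt((2 + 5 * 7) / 10)
```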
## [1] 1.923538
You can add spaces so that it’s easier to understand/read the code.
Let’s look at one more useful feature of R: assignment.
If you type the symbols less-than and hyphen, you get something that looks like an arrow pointing to the left: <-. You can use this arrow to assign values to variables.
For example, you can create a variable x and assign it the value 3 by typing this into the console:
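A minimal version of that assignment looks like this:

```r
x <- 3  # store the value 3 in a variable named x
```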
Notice there’s no output. This is because we’ve only defined the variable. We need to “call” the variable.
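To call the variable, type its name on a line by itself. The assignment is repeated in this sketch only so that the cell runs on its own:

```r
x <- 3  # same assignment as before
x       # typing the name prints the stored value
```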
## [1] 3
Notice that by typing x on the next line we DO get an output of 3.
You can name variables almost anything.
If you want to assign a variable a description instead of a number we use quotes:
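A small sketch of a text assignment follows. The variable name favorite_color is a hypothetical choice for illustration; any valid name works the same way.

```r
# "favorite_color" is a hypothetical name; the quotes tell R this is text
favorite_color <- "red"
favorite_color  # prints the stored text, quotes included
```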
## [1] "red"
Comments are very useful in programming and coding. They are parts of code that are NOT read or implemented by the program but are there to be helpful for humans, that is, for you or me to read :-)
In R, any text after a hash symbol (#) is not read by the program for that line.
For example,
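Here is one possible illustration. The expression 5 + 6 is an assumed example, chosen only because it gives 11.

```r
# This whole line is a comment, so R skips it.
5 + 6  # R computes 5 + 6 and ignores everything after the #
```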
## [1] 11
If you need to write regular text, then you need to create a different structure that understands plain text called “markdown.”
To do this, click on the “Cell” menu, then under “Cell Type”, select “Change to Markdown M”. Double-click on this cell and you can now start writing.
Let’s assume we have a simple data set of: 1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,10.
When entering data “by hand” we use the following code:
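A sketch of that entry step, using the c() function to combine the values into one vector and the name my_cool_data used throughout this lab:

```r
# c() combines the individual values into a single vector
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
```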
If we want to see our data, we have to type its name, my_cool_data, on a line by itself.
## [1] 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 10
Now my_cool_data is stored and ready to be studied.
We calculate the mean of the data set my_cool_data as follows:
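A minimal sketch, with the data re-entered so the cell runs on its own:

```r
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
mean(my_cool_data)  # the data name goes inside the parentheses
```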
## [1] 4.0625
Notice the parentheses after the mean that sandwich the data name.
Similarly, we can calculate the median, standard deviation, and variance as follows:
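These three calculations follow the same pattern as the mean. The sketch below uses the built-in functions median(), sd(), and var(), with the data re-entered so the cell runs on its own:

```r
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
median(my_cool_data)  # middle value of the sorted data
sd(my_cool_data)      # sample standard deviation
var(my_cool_data)     # sample variance (the square of sd)
```

For this data set the variance happens to equal the mean; that is a coincidence.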
## [1] 4
## [1] 2.015564
## [1] 4.0625
We can also get the five-number summary (together with the mean) using the function summary():
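A short sketch using the built-in summary() function, which reports the minimum, quartiles, median, mean, and maximum in one call:

```r
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
summary(my_cool_data)  # Min, 1st quartile, Median, Mean, 3rd quartile, Max
```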
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 3.000 4.000 4.062 5.000 10.000
If we want to see the box plot, we use:
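A minimal sketch with the built-in boxplot() function, data re-entered so the cell runs on its own:

```r
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
boxplot(my_cool_data)  # draws a vertical box plot by default
```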
Notice that by default, the box plot is vertical. To make it horizontal, we add the argument horizontal=TRUE after the data name, separated by a comma.
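The horizontal version could be written as:

```r
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
# The comma separates the data name from the extra argument
boxplot(my_cool_data, horizontal = TRUE)
```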
A histogram is easy to produce using:
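A minimal sketch with the built-in hist() function, leaving the number of bars at R's default:

```r
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
hist(my_cool_data)  # R chooses the bin widths automatically
```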
A dot plot is easy to produce using:
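One way to draw a dot plot in base R is stripchart() with stacked points. This is an assumption about the approach; the lab may have intended a different function.

```r
my_cool_data <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 10)
# method = "stack" piles repeated values on top of each other;
# pch = 19 draws solid dots (both are choices made for this sketch)
stripchart(my_cool_data, method = "stack", pch = 19)
```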
A stem-and-leaf plot is easy as well. To make it more interesting, we’ll introduce a new data set called “ruth”.
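A sketch using the built-in stem() function. The values of ruth here are read back from the stem-and-leaf display itself, so the order in which they are entered is an assumption.

```r
# Values recovered from the stem-and-leaf display; entry order is assumed
ruth <- c(22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)
stem(ruth)  # prints the plot as text rather than as a picture
```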
##
## The decimal point is 1 digit(s) to the right of the |
##
## 2 | 25
## 3 | 45
## 4 | 1166679
## 5 | 449
## 6 | 0
Finally, pie charts require a bit more to set up.
grades <- c(4, 10, 37, 8, 1) # the data for each "slice" of the pie
lbls <- c("A", "B", "C", "D", "F") # the labels to apply to each slice (in order)
pie(grades, labels = lbls, main = "Final Exam Grades for a Statistics Course")
# Notes:
# "labels = " applies the labels that we defined above
# "main = " creates a title for the pie chart
72, 97, 74, 93, 68, 59, 64, 56, 70, 58, 50, 71, 67, 56, 70, 61, 53, 92, 57, 67,
58, 49, 68, 69, 87, 81, 60, 52, 70, 63, 56, 68, 68, 54, 80, 64, 57, 63, 54, 56,
54, 73, 77, 63, 51, 59, 65, 53, 62, 55, 74, 74, 64, 64, 57, 64, 60, 64, 66, 52,
71, 55, 65, 75, 42, 74, 94
To read “Excel” files (files that end with .xls or .xlsx) we need a special package, which R loads as a “library”. To load a library we need to know its name. The one we want is called “readxl”.
Type library(readxl) into a new cell in your lab notebook.
Next, we want R to read the data in the file “06-Freshman15.xlsx”. To do this, type the following code in a new line:
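A sketch of the reading step, assuming the readxl package is installed and the file sits in the same folder as the notebook. The variable name freshman is a hypothetical choice for illustration.

```r
library(readxl)  # load the package that reads Excel files

# "freshman" is a hypothetical variable name; the file name comes from the lab
freshman <- read_excel("06-Freshman15.xlsx")
```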
To see what is in the file “06-Freshman15.xlsx”, we use the function str(). The functions names() and dim() then list the variable names and the number of rows and columns.
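These inspection steps might look like the sketch below. It assumes the readxl package is installed and the file is in the notebook's folder; freshman is a hypothetical variable name.

```r
library(readxl)
freshman <- read_excel("06-Freshman15.xlsx")  # hypothetical variable name

str(freshman)    # structure: each variable, its type, and its first few values
names(freshman)  # just the variable names
dim(freshman)    # number of rows (observations), then number of columns (variables)
```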
## Classes 'tbl_df', 'tbl' and 'data.frame': 67 obs. of 5 variables:
## $ SEX : chr "M" "M" "M" "M" ...
## $ WT SEPT : num 72 97 74 93 68 59 64 56 70 58 ...
## $ WT APRIL : num 59 86 69 88 64 55 60 53 68 56 ...
## $ BMI SEPT : num 22 19.7 24.1 27 21.5 ...
## $ BMI APRIL: num 18.1 17.4 22.4 25.6 20.1 ...
## [1] "SEX" "WT SEPT" "WT APRIL" "BMI SEPT" "BMI APRIL"
## [1] 67 5
The 67 observations tell us there are 67 pieces of data for each variable. Note that the header row holding the variable names is not counted as an observation. There are also 5 variables.
We see that the variables are SEX in the first row of the str() output, WT SEPT in the second row, etc. In the data itself, each variable is a column.
Next, let’s start some computations.
Work in Progress…