
These are just some of the things I find useful. Feel free to search around for others.

R is a full-featured programming language, so you can define your own functions in it. For example, you might get tired of always typing http://stats191.stanford.edu/data. You could write a small function:

useful_function = function(dataname) {
  return(paste("http://stats191.stanford.edu/data/", dataname, sep = ""))
}
useful_function("heights.table")

## [1] "http://stats191.stanford.edu/data/heights.table"
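
A slightly more general variant is sketched below. The name `data_url` and its `base` argument are hypothetical and not part of the course code. The sketch uses `paste0`, which behaves like `paste` with `sep = ""`, and lets the base address be overridden.

```r
# Hypothetical helper: build a data URL from a file name.
# `base` defaults to the course data directory used above.
data_url = function(dataname, base = "http://stats191.stanford.edu/data/") {
  paste0(base, dataname)  # paste0() concatenates with no separator
}

data_url("heights.table")
data_url("heights.table", base = "data/")  # e.g. a local copy (assumed path)
```

Keeping the base address as an argument with a default means the short call still works, while a different location needs no change to the function body.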

Let’s load the heights data with less code:

h.table = read.table(useful_function("heights.table"), header = TRUE, sep = ",")

When working on a particular project or assignment, it is often easiest
to type commands in a text editor and rerun them several times. The
command *source* is an easy way to do this; it takes either the name
of a file or a URL as its argument. Suppose the webpage
http://stats191.stanford.edu/R/mycode.R contains the function *dataurl*,
which does the same thing as *useful_function* above:

```
dataurl = function(dataname) {
  return(paste("http://stats191.stanford.edu/data/", dataname, sep=''))
}
```

Then, we can execute this as follows

source("http://stats191.stanford.edu/R/mycode.R")
dataurl("heights.table")

## [1] "http://stats191.stanford.edu/data/heights.table"

As you go through the course, you might add some other useful functions to this file.
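
The same mechanism works with a local file. The sketch below writes a small script to a temporary file, as a stand-in for mycode.R, and then sources it. The temporary file is only an assumption made so that the example runs without network access.

```r
# Write a small script to a temporary file (a stand-in for mycode.R)
script = tempfile(fileext = ".R")
writeLines(c(
  "dataurl = function(dataname) {",
  "  return(paste('http://stats191.stanford.edu/data/', dataname, sep = ''))",
  "}"
), script)

# source() evaluates the file, so dataurl is defined afterwards
source(script)
dataurl("heights.table")
```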

The function *c* (concatenate) is used often in R, as are *rep* (repeat
a value) and *seq* (generate a sequence).

X = 3
Y = 4
c(X, Y)

## [1] 3 4

rep(1, 4)

## [1] 1 1 1 1

rep(2, 3)

## [1] 2 2 2

c(rep(1, 4), rep(2, 3))

## [1] 1 1 1 1 2 2 2

seq(0, 10, length.out = 11)

## [1] 0 1 2 3 4 5 6 7 8 9 10

seq(0, 10, by = 2)

## [1] 0 2 4 6 8 10
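
A few further variations on *rep* and *seq*, as a short sketch. The particular values are arbitrary and chosen only for illustration.

```r
rep(c(1, 2), times = 3)   # repeat the whole vector: 1 2 1 2 1 2
rep(c(1, 2), each = 3)    # repeat each element:     1 1 1 2 2 2
seq(10, 0, by = -2.5)     # decreasing, non-integer steps: 10.0 7.5 5.0 2.5 0.0
seq_len(5)                # same as 1:5
```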

You can sort and order sequences

X = c(4, 6, 2, 9)
sort(X)

## [1] 2 4 6 9

Use the ordering of X to rearrange a second vector Y in the same order

Y = c(1, 2, 3, 4)
o = order(X)
X[o]

## [1] 2 4 6 9

Y[o]

## [1] 3 1 2 4
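
The same idea extends to sorting in decreasing order and to reordering the rows of a data frame. This is a small sketch reusing the X and Y values from above. The names `d` and `d_sorted` are introduced here only for illustration.

```r
X = c(4, 6, 2, 9)
Y = c(1, 2, 3, 4)

# Sort in decreasing order
sort(X, decreasing = TRUE)        # 9 6 4 2

# Reorder the rows of a data frame by one of its columns
d = data.frame(X = X, Y = Y)
d_sorted = d[order(d$X), ]
d_sorted$Y                        # 3 1 2 4, matching Y[o] above
```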

R has a very rich plotting library. Most of our plots will be fairly straightforward scatter plots.

X = c(1:40)
Y = 2 + 3 * X + rnorm(40) * 10
plot(X, Y)

The plots can be made nicer by adding colors and using different
symbols. See the help for function *par*.

plot(X, Y, pch = 21, bg = "red")

plot(X, Y, pch = 23, bg = "red")

You can add titles, as well as change the axis labels.

plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor", ylab = "Outcome")

Lines are added with *abline*. We’ll add some lines to our previous
plot: a yellow line with intercept 2, slope 3, width 3, type 2, as well
as a vertical line at x=20 and horizontal line at y=60.

plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor", ylab = "Outcome")
abline(2, 3, lwd = 3, lty = 2, col = "yellow")
abline(h = 60)
abline(v = 20)

You can add points and lines to existing plots.

plot(X[1:20], Y[1:20], pch = 21, bg = "red", xlim = c(min(X), max(X)), ylim = c(min(Y), max(Y)))
points(X[21:40], Y[21:40], pch = 21, bg = "blue")
lines(X[21:40], Y[21:40], lwd = 2, lty = 3, col = "orange")
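
When two groups of points share one plot, a legend can help tell them apart. The sketch below assumes the same simulated data. The seed and the legend labels are arbitrary choices made for this example.

```r
set.seed(1)                          # arbitrary seed so the example is reproducible
X = c(1:40)
Y = 2 + 3 * X + rnorm(40) * 10

plot(X[1:20], Y[1:20], pch = 21, bg = "red",
     xlim = range(X), ylim = range(Y), xlab = "X", ylab = "Y")
points(X[21:40], Y[21:40], pch = 21, bg = "blue")

# pt.bg sets the fill colour of the legend symbols, like bg in plot()
lg = legend("topleft", legend = c("first 20", "last 20"),
            pch = 21, pt.bg = c("red", "blue"))
```

Here `range(X)` is shorthand for `c(min(X), max(X))`.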

You can put more than one plot on each device. Here we create a 2-by-1 grid of plots

par(mfrow = c(2, 1))
plot(X, Y, pch = 21, bg = "red")
plot(Y, X, pch = 23, bg = "blue")

par(mfrow = c(1, 1))
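
Rather than resetting the layout by hand, one option is to keep the value that *par* returns and restore it afterwards. This is a sketch, and the name `old_par` is just a placeholder.

```r
X = c(1:40)
Y = 2 + 3 * X + rnorm(40) * 10

old_par = par(mfrow = c(2, 1))   # par() returns the previous values of what it changes
plot(X, Y, pch = 21, bg = "red")
plot(Y, X, pch = 23, bg = "blue")
par(old_par)                     # back to the earlier layout
```

This restores whatever layout was in effect before, which is not necessarily a 1-by-1 grid.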

Plots can be saved as *pdf*, *png*, *jpg* among other formats. Let’s
save a plot in a file called “myplot.jpg”

```
jpeg("myplot.jpg")
plot(X, Y, pch=21, bg='red')
dev.off()
```

A single *pdf* file can hold several plots, one per page. This example
has two plots in it.

```
pdf("myplots.pdf")
# make whatever plot you want
# first page
plot(X, Y, pch=21, bg='red')
# a new call to plot will make a new page
plot(Y, X, pch=23, bg='blue')
# close the current "device" which is this pdf file
dev.off()
```

This should cover a lot of our plotting needs.

It is easy to use *for* loops in R

for (i in 1:10) { print(i^2) }

## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36
## [1] 49
## [1] 64
## [1] 81
## [1] 100

for (w in c("red", "blue", "green")) { print(w) }

## [1] "red"
## [1] "blue"
## [1] "green"

Note that loops with many iterations can get really slow in R, a drawback shared by many high-level languages.
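
One common way around slow loops is to operate on whole vectors at once. The sketch below is an illustration rather than something these notes prescribe. It computes the same squares with a loop and with a vectorized expression, and the names `squares_loop` and `squares_vec` are made up for the example.

```r
n = 10

# Loop version: preallocate the result, then fill it in
squares_loop = numeric(n)
for (i in 1:n) {
  squares_loop[i] = i^2
}

# Vectorized version: ^ is applied to every element at once
squares_vec = (1:n)^2

all.equal(squares_loop, squares_vec)
```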

R has a built-in help system, which can be accessed and searched as follows:

```
> help(t.test)
> help.search('t.test')
```

Many objects also have examples that show you their usage

`> example(lm)`

In practice, we will often use the distribution function (CDF) and the
quantile function (inverse CDF) of standard random variables such as the
*T*, *F*, chi-squared and normal.

The standard 1.96 (about 2) standard deviation rule for $\alpha = 0.05$ (note that 1-0.05/2=0.975):

qnorm(0.975)

## [1] 1.96
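
As a quick check that the quantile and distribution functions are inverses of each other, the sketch below feeds the quantile back into *pnorm*. The variable name `z` is only a placeholder.

```r
z = qnorm(0.975)          # upper 2.5% point of the standard normal

pnorm(z)                  # back to 0.975
pnorm(z) - pnorm(-z)      # central probability, 0.95
```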

We might want the upper 5% quantile for an F with 2 and 40 degrees of freedom:

qf(0.95, 2, 40)

## [1] 3.232

So, any observed F greater than 3.23 leads to rejection of the null hypothesis at the $\alpha = 0.05$ level. Alternatively, we might have observed an F of 5 with 2 and 40 degrees of freedom, and want the p-value

1 - pf(5, 2, 40)

## [1] 0.01153
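
The distribution functions also accept `lower.tail = FALSE`, which returns the upper tail probability directly instead of subtracting from 1. A short sketch, with `p1` and `p2` as placeholder names:

```r
p1 = 1 - pf(5, 2, 40)
p2 = pf(5, 2, 40, lower.tail = FALSE)   # upper tail computed directly

all.equal(p1, p2)
```

The two agree here. The direct form can be more accurate when the p-value is extremely small, because it avoids subtracting two nearly equal numbers.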

Let’s compare this p-value with the one from a chi-squared with 2 degrees of freedom, which is the limit of an F with 2 numerator degrees of freedom as the denominator degrees of freedom go to infinity (send 40 to infinity). We must also multiply the 5 by 2, because the F statistic is divided by its numerator degrees of freedom (here 2).

1 - pchisq(5 * 2, 2)

## [1] 0.006738

1 - pf(5, 2, 400)

## [1] 0.007165
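
To see the F p-value approach the chi-squared one, the denominator degrees of freedom can be increased step by step. This is a sketch, and the particular values in `df2` are arbitrary.

```r
df2 = c(40, 400, 4000, 40000)

# p-value for an observed F of 5 as the denominator degrees of freedom grow
# (pf is vectorized over its arguments)
p_F = 1 - pf(5, 2, df2)
p_chisq = 1 - pchisq(5 * 2, 2)

p_F - p_chisq   # differences shrink toward 0
```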

Other common distributions used in applied statistics are the normal (*norm*) and Student's *t* (*t*); the same *p* and *q* prefixes apply, as in *pnorm* and *qt*.
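
Each distribution comes with four functions, distinguished by a one-letter prefix: *d* (density), *p* (CDF), *q* (quantile) and *r* (random draws). A brief sketch using the normal and the *t*, where the seed is arbitrary:

```r
dnorm(0)              # density of the standard normal at 0, 1 / sqrt(2 * pi)
pnorm(1.96)           # CDF, about 0.975
qt(0.975, df = 40)    # t quantile, slightly larger than qnorm(0.975)

set.seed(1)           # arbitrary seed for reproducibility
draws = rt(3, df = 40)   # three random draws from a t with 40 degrees of freedom
```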