#### Previous topic

Plotting Unemployment Data

#### Next topic

Discretization of a continuous score

# Some help for R¶

These are just some of the things I find useful. Feel free to search around for others.

## Functions¶

R is a fully functional programming language so one can define functions in it. For example, you might get tired of always typing http://stats191.stanford.edu/data. You could make a small function

    useful_function = function(dataname) {
return(paste("http://stats191.stanford.edu/data/", dataname, sep = ""))
}

useful_function("heights.table")


    ##  "http://stats191.stanford.edu/data/heights.table"



Let’s load the heights data with less code

    h.table = read.table(useful_function("heights.table"), header = T, sep = ",")



## Source¶

When working on a particular project or assignment, it is often easiest to type commands in a text editor and rerun them several times. The command source is an easy way to do this, and it takes either the name of a file or a URL as argument. Suppose we have a webpage http://stats191.stanford.edu/R/mycode.R with the function dataurl above:

dataurl = function(dataname) {
return(paste("http://stats191.stanford.edu/data/", dataname, sep=''))
}

Then, we can execute this as follows

    source("http://stats191.stanford.edu/R/mycode.R")
dataurl("heights.table")


    ##  "http://stats191.stanford.edu/data/ heights.table"



As you go through the course, you might add some other useful functions to this file.

## Concatenation, sequences¶

The function c, concatenation, is used often in R, as are rep and seq

    X = 3
Y = 4
c(X, Y)


    ##  3 4


    rep(1, 4)


    ##  1 1 1 1


    rep(2, 3)


    ##  2 2 2


    c(rep(1, 4), rep(2, 3))


    ##  1 1 1 1 2 2 2


    seq(0, 10, length = 11)


    ##    0  1  2  3  4  5  6  7  8  9 10


    seq(0, 10, by = 2)


    ##   0  2  4  6  8 10



You can sort and order sequences

    X = c(4, 6, 2, 9)
sort(X)


    ##  2 4 6 9



Use an ordering of X to sort a list of Y in the same order

    Y = c(1, 2, 3, 4)
o = order(X)
X[o]


    ##  2 4 6 9


    Y[o]


    ##  3 1 2 4



## Plotting¶

R has a very rich plotting library. Most of our plots will be fairly straightforward, “scatter plots”.

    X = c(1:40)
Y = 2 + 3 * X + rnorm(40) * 10
plot(X, Y) The plots can be made nicer by adding colors and using different symbols. See the help for function par.

    plot(X, Y, pch = 21, bg = "red") plot(X, Y, pch = 23, bg = "red") You can add titles, as well as change the axis labels.

    plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor",
ylab = "Outcome") Lines are added with abline. We’ll add some lines to our previous plot: a yellow line with intercept 2, slope 3, width 3, type 2, as well as a vertical line at x=20 and horizontal line at y=60.

    plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor",
ylab = "Outcome")
abline(2, 3, lwd = 3, lty = 2, col = "yellow")
abline(h = 60)
abline(v = 20) You can add points and lines to existing plots.

    plot(X[1:20], Y[1:20], pch = 21, bg = "red", xlim = c(min(X), max(X)), ylim = c(min(Y),
max(Y)))
points(X[21:40], Y[21:40], pch = 21, bg = "blue")
lines(X[21:40], Y[21:40], lwd = 2, lty = 3, col = "orange") You can put more than one plot on each device. Here we create a 2-by-1 grid of plots

    par(mfrow = c(2, 1))
plot(X, Y, pch = 21, bg = "red")
plot(Y, X, pch = 23, bg = "blue") par(mfrow = c(1, 1))



## Saving plots¶

Plots can be saved as pdf, png, jpg among other formats. Let’s save a plot in a file called “myplot.jpg”

jpeg("myplot.jpg")
plot(X, Y, pch=21, bg='red')
dev.off()


Several plots can be saved using pdf files. This example has two plots in it.

pdf("myplots.pdf")

# make whatever plot you want
# first page
plot(X, Y, pch=21, bg='red')

# a new call to plot will make a new page
plot(Y, X, pch=23, bg='blue')

# close the current "device" which is this pdf file
dev.off()


This should cover a lot of our plotting needs.

## Loops¶

It is easy to use for loops in R

    for (i in 1:10) {
print(i^2)
}


    ##  1
##  4
##  9
##  16
##  25
##  36
##  49
##  64
##  81
##  100


    for (w in c("red", "blue", "green")) {
print(w)
}


    ##  "red"
##  "blue"
##  "green"



Note that big loops can get really slow, a drawback of many high-level languages.

## Builtin help¶

R has a builtin help system, which can be accessed and searched as follows

> help(t.test)
> help.search('t.test')

Many objects also have examples that show you their usage

> example(lm)

## Distributions in R¶

In practice, we will often be using the distribution (CDF), quantile (inverse CDF) of standard random variables like the T, F, chi-squared and normal.

The standard 1.96 (about 2) standard deviation rule for : (note that 1-0.05/2=0.975)

    qnorm(0.975)


    ##  1.96



We might want the upper quantile for an F with 2,40 degrees of freedom:

    qf(0.95, 2, 40)


    ##  3.232



So, any observed F greater than 3.23 will get rejected at the level. Alternatively, we might have observed an F of 5 with 2, 40 degrees of freedom, and want the p-value

    1 - pf(5, 2, 40)


    ##  0.01153



Let’s compare this p-value with a chi-squared with 2 degrees of freedom, which is like an F with infinite degrees of freedom in the denominator (send 40 to infinity). We also should multiply the 5 by 2 because it’s divided by 2 (numerator degrees of freedom) in the F.

    1 - pchisq(5 * 2, 2)


    ##  0.006738


    1 - pf(5, 2, 400)


    ##  0.007165



Other common distributions used in applied statistics are norm, t.