py4sci

Table Of Contents

Previous topic

Plotting Unemployment Data

Next topic

Discretization of a continuous score

This Page

Some help for R

Download the ipython notebook.

These are just some of the things I find useful. Feel free to search around for others.

Functions

R is a fully functional programming language so one can define functions in it. For example, you might get tired of always typing http://stats191.stanford.edu/data. You could make a small function

    useful_function = function(dataname) {
        return(paste("http://stats191.stanford.edu/data/", dataname, sep = ""))
    }

    useful_function("heights.table")

    ## [1] "http://stats191.stanford.edu/data/heights.table"

Let’s load the heights data with less code

    h.table = read.table(useful_function("heights.table"), header = T, sep = ",")

Source

When working on a particular project or assignment, it is often easiest to type commands in a text editor and rerun them several times. The command source is an easy way to do this, and it takes either the name of a file or a URL as argument. Suppose we have a webpage http://stats191.stanford.edu/R/mycode.R with the function dataurl above:

dataurl = function(dataname) {
    return(paste("http://stats191.stanford.edu/data/", dataname, sep=''))
}

Then, we can execute this as follows

    source("http://stats191.stanford.edu/R/mycode.R")
    dataurl("heights.table")

    ## [1] "http://stats191.stanford.edu/data/ heights.table"

As you go through the course, you might add some other useful functions to this file.

Concatenation, sequences

The function c, concatenation, is used often in R, as are rep and seq

    X = 3
    Y = 4
    c(X, Y)

    ## [1] 3 4

    rep(1, 4)

    ## [1] 1 1 1 1

    rep(2, 3)

    ## [1] 2 2 2

    c(rep(1, 4), rep(2, 3))

    ## [1] 1 1 1 1 2 2 2

    seq(0, 10, length = 11)

    ##  [1]  0  1  2  3  4  5  6  7  8  9 10

    seq(0, 10, by = 2)

    ## [1]  0  2  4  6  8 10

You can sort and order sequences

    X = c(4, 6, 2, 9)
    sort(X)

    ## [1] 2 4 6 9

Use an ordering of X to sort a list of Y in the same order

    Y = c(1, 2, 3, 4)
    o = order(X)
    X[o]

    ## [1] 2 4 6 9

    Y[o]

    ## [1] 3 1 2 4

Plotting

R has a very rich plotting library. Most of our plots will be fairly straightforward, “scatter plots”.

    X = c(1:40)
    Y = 2 + 3 * X + rnorm(40) * 10
    plot(X, Y)

_images/helpR_fig_00.png

The plots can be made nicer by adding colors and using different symbols. See the help for function par.

    plot(X, Y, pch = 21, bg = "red")

_images/helpR_fig_01.png
    plot(X, Y, pch = 23, bg = "red")

_images/helpR_fig_02.png

You can add titles, as well as change the axis labels.

    plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor",
        ylab = "Outcome")

_images/helpR_fig_03.png

Lines are added with abline. We’ll add some lines to our previous plot: a yellow line with intercept 2, slope 3, width 3, type 2, as well as a vertical line at x=20 and horizontal line at y=60.

    plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor",
        ylab = "Outcome")
    abline(2, 3, lwd = 3, lty = 2, col = "yellow")
    abline(h = 60)
    abline(v = 20)

_images/helpR_fig_04.png

You can add points and lines to existing plots.

    plot(X[1:20], Y[1:20], pch = 21, bg = "red", xlim = c(min(X), max(X)), ylim = c(min(Y),
        max(Y)))
    points(X[21:40], Y[21:40], pch = 21, bg = "blue")
    lines(X[21:40], Y[21:40], lwd = 2, lty = 3, col = "orange")

_images/helpR_fig_05.png

You can put more than one plot on each device. Here we create a 2-by-1 grid of plots

    par(mfrow = c(2, 1))
    plot(X, Y, pch = 21, bg = "red")
    plot(Y, X, pch = 23, bg = "blue")

_images/helpR_fig_06.png
    par(mfrow = c(1, 1))

Saving plots

Plots can be saved as pdf, png, jpg among other formats. Let’s save a plot in a file called “myplot.jpg”

jpeg("myplot.jpg")
plot(X, Y, pch=21, bg='red')
dev.off()

Several plots can be saved using pdf files. This example has two plots in it.

pdf("myplots.pdf")

# make whatever plot you want
# first page
plot(X, Y, pch=21, bg='red')

# a new call to plot will make a new page
plot(Y, X, pch=23, bg='blue')

# close the current "device" which is this pdf file
dev.off()

This should cover a lot of our plotting needs.

Loops

It is easy to use for loops in R

    for (i in 1:10) {
        print(i^2)
    }

    ## [1] 1
    ## [1] 4
    ## [1] 9
    ## [1] 16
    ## [1] 25
    ## [1] 36
    ## [1] 49
    ## [1] 64
    ## [1] 81
    ## [1] 100

    for (w in c("red", "blue", "green")) {
        print(w)
    }

    ## [1] "red"
    ## [1] "blue"
    ## [1] "green"

Note that big loops can get really slow, a drawback of many high-level languages.

Builtin help

R has a builtin help system, which can be accessed and searched as follows

> help(t.test)
> help.search('t.test')

Many objects also have examples that show you their usage

> example(lm)

Distributions in R

In practice, we will often be using the distribution (CDF), quantile (inverse CDF) of standard random variables like the T, F, chi-squared and normal.

The standard 1.96 (about 2) standard deviation rule for \alpha=0.05: (note that 1-0.05/2=0.975)

    qnorm(0.975)

    ## [1] 1.96

We might want the \alpha=0.05 upper quantile for an F with 2,40 degrees of freedom:

    qf(0.95, 2, 40)

    ## [1] 3.232

So, any observed F greater than 3.23 will get rejected at the \alpha=0.05 level. Alternatively, we might have observed an F of 5 with 2, 40 degrees of freedom, and want the p-value

    1 - pf(5, 2, 40)

    ## [1] 0.01153

Let’s compare this p-value with a chi-squared with 2 degrees of freedom, which is like an F with infinite degrees of freedom in the denominator (send 40 to infinity). We also should multiply the 5 by 2 because it’s divided by 2 (numerator degrees of freedom) in the F.

    1 - pchisq(5 * 2, 2)

    ## [1] 0.006738

    1 - pf(5, 2, 400)

    ## [1] 0.007165

Other common distributions used in applied statistics are norm, t.