Leo Pekelis
March 13th, 2013, Stanford University
following Terza, Basu and Rathouz (2008)
\(y = f(s \beta_s + u \beta_u) + \epsilon\)
\(f: \mathbb{R} \rightarrow \mathbb{R}\) is a known function
\(s\) is the endogenous variable(s)
\(u\) are unobserved covariates
\(s = z \rho + u \alpha_u + \tau\), where \(z\) are the instruments
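A minimal simulation of this setup makes the endogeneity concrete. It assumes \(f\) is the identity, draws \(z\), \(u\), \(\tau\) and \(\epsilon\) as independent standard normals, and sets every coefficient to 1. These values are illustrative and do not come from the source. Because \(u\) enters both equations, the slope from regressing \(y\) on \(s\) alone settles near \(1 + 1/3\) rather than at \(\beta_s = 1\).

```r
# Illustrative simulation of the two-equation model, assuming f = identity
set.seed(1)
n <- 1e5
beta_s <- 1; beta_u <- 1      # outcome-equation coefficients (illustrative)
rho <- 1; alpha_u <- 1        # first-stage coefficients (illustrative)

z   <- rnorm(n)               # instrument, drawn independently of u
u   <- rnorm(n)               # unobserved covariate
tau <- rnorm(n)               # first-stage noise
eps <- rnorm(n)               # outcome noise

s <- z * rho + u * alpha_u + tau      # endogenous variable
y <- s * beta_s + u * beta_u + eps    # outcome with f = identity

# Regressing y on s alone: the slope converges to
# beta_s + beta_u * Cov(s, u) / Var(s) = 1 + 1/3, not beta_s
naive <- unname(coef(lm(y ~ s))["s"])
naive
```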
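for linear models, both approaches give the same answer: two-stage predictor substitution (2SPS) replaces \(s\) by its first-stage fitted value, and two-stage residual inclusion (2SRI) keeps \(s\) and adds the first-stage residual as a regressor
for nonlinear models, 2SPS is generally inconsistent, while 2SRI is consistent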
let \(L(X)\) be the linear subspace spanned by the columns of \(X\)
in \(y = X \beta + \epsilon\), the OLS fit \(X \hat{\beta}\) is the projection of \(y\) onto \(L(X)\)
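In R you can check the projection directly. \(X\hat{\beta}\) with \(\hat{\beta} = (X^\top X)^{-1} X^\top y\) matches the fitted values from `lm()`, and the residual is orthogonal to every column of \(X\). The design matrix and coefficients below are made up for illustration.

```r
set.seed(2)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))                 # design matrix: intercept + 2 covariates
y <- drop(X %*% c(0.5, 2, -1)) + rnorm(n)         # illustrative coefficients

beta_hat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y
proj_y   <- drop(X %*% beta_hat)                  # X beta_hat: projection of y onto L(X)

fit_lm      <- unname(fitted(lm(y ~ X - 1)))      # same fitted values from lm()
resid_dot_X <- drop(crossprod(X, y - proj_y))     # X'(y - X beta_hat), numerically zero
```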
we are worried that \(L(s)\) is not orthogonal to \(L(u)\)
if we had access to \(u\), including it in the regression would orthogonalize for us
(draw a picture)
we assume \(z\) is independent of \(u\)
hence \(L(z)\) is orthogonal to \(L(u)\)
hence projecting \(s\) onto \(L(z)\) orthogonalizes \(s\) with respect to \(u\)
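A quick numerical check of this orthogonality uses illustrative assumptions: standard-normal \(z\), \(u\) and \(\tau\), with \(\rho = \alpha_u = 1\). The fitted values from regressing \(s\) on \(z\) are essentially uncorrelated with \(u\). The orthogonal remainder carries all of the dependence on \(u\).

```r
set.seed(3)
n <- 1e5
z <- rnorm(n); u <- rnorm(n); tau <- rnorm(n)
s <- z + u + tau                  # rho = alpha_u = 1 (illustrative)

s_hat  <- fitted(lm(s ~ z))       # projection of s onto L(1, z)
s_perp <- s - s_hat               # component of s orthogonal to L(1, z)

round(c(cor_s_u     = cor(s, u),       # about 1/sqrt(3): s is correlated with u
        cor_s_hat_u = cor(s_hat, u),   # about 0: the projection is not
        cor_perp_u  = cor(s_perp, u)), # about 1/sqrt(2): the dependence on u sits here
      3)
```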
for simplicity assume \(L(z) = L(s)\)
“super relevance”
(draw a nice picture of why this is causal)
for linear models, substituting the projection of \(s\) onto \(L(z)\) (2SPS) and including the orthogonal part as a regressor (2SRI) do the same thing
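You can verify the linear equivalence numerically with the same kind of illustrative simulation (all coefficients 1, standard-normal noise). 2SPS regresses \(y\) on the first-stage fitted value \(\hat{s}\). 2SRI regresses \(y\) on \(s\) plus the first-stage residual \(s - \hat{s}\). The residual is exactly orthogonal to the intercept and to \(\hat{s}\), so the two coefficients agree to floating-point precision, and both are close to \(\beta_s = 1\).

```r
set.seed(4)
n <- 5e4
z <- rnorm(n); u <- rnorm(n); tau <- rnorm(n); eps <- rnorm(n)
s <- z + u + tau                 # rho = alpha_u = 1 (illustrative)
y <- 1 * s + 1 * u + eps         # beta_s = beta_u = 1, f = identity

first <- lm(s ~ z)
s_hat <- fitted(first)           # projection of s onto L(1, z)
r_hat <- residuals(first)        # orthogonal part, s - s_hat

b_2sps <- unname(coef(lm(y ~ s_hat))["s_hat"])   # 2SPS: substitute s_hat for s
b_2sri <- unname(coef(lm(y ~ s + r_hat))["s"])   # 2SRI: keep s, add the residual

c(twoSPS = b_2sps, twoSRI = b_2sri)  # identical up to rounding, both near 1
```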
nonlinear examples: logistic, Poisson, log-linear
(draw a picture)
Standard 2SPS is inconsistent precisely because the outcome space is nonlinear
different values of \(u\) affect the projection onto \(\mu(\beta_s)\)
(draw a picture)
In contrast, 2SRI gives a consistent estimate of \(u\)
So we project onto the right part of the \(\mu\) space
(draw a final picture)
As \(n \rightarrow \infty\), \(L^\perp(z) \rightarrow L(u)\)
And \(\mu(\hat{\eta}_s, \hat{\eta}_z^{\perp}) \rightarrow \mu(\hat{\eta}_s, \hat{\eta}_u)\)
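Here is a minimal logistic sketch of the contrast. It assumes \(\tau = 0\), so the part of \(s\) orthogonal to \(L(z)\) is exactly \(u\alpha_u\) (the limit described above), and sets all coefficients to 1 for illustration. 2SPS regresses \(y\) on \(\hat{s}\) and returns an attenuated slope, clearly below 1. 2SRI adds the first-stage residual and lands near the infeasible "oracle" fit that sees \(u\) directly.

```r
set.seed(5)
n <- 5e4
z <- rnorm(n); u <- rnorm(n)
s <- z + u                                   # tau = 0 assumed; rho = alpha_u = 1
y <- rbinom(n, 1, plogis(1 * s + 1 * u))     # logistic outcome, beta_s = beta_u = 1

first <- lm(s ~ z)
s_hat <- fitted(first)      # projection of s onto L(1, z)
r_hat <- residuals(first)   # orthogonal part: consistent for u * alpha_u here

b_2sps   <- unname(coef(glm(y ~ s_hat, family = binomial))["s_hat"])
b_2sri   <- unname(coef(glm(y ~ s + r_hat, family = binomial))["s"])
b_oracle <- unname(coef(glm(y ~ s + u, family = binomial))["s"])  # infeasible: uses u
round(c(twoSPS = b_2sps, twoSRI = b_2sri, oracle = b_oracle), 2)
```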
Work with Youssef Zaidan, Resident in Radiation Therapy
“there are several hundred minor salivary glands that are too small to see without a microscope”
“Retrospective studies show that adjuvant radiation therapy improves locoregional control of salivary gland tumors.”
“SEER analysis of minor salivary tumors show that T-stage, site, and grade are important factors for predicting lymph node metastasis”
“Prior SEER analysis showed that adjuvant RT is associated with improved survival for high-grade and/or locally advanced major salivary gland tumors”
“To determine whether addition of postoperative radiation influences survival of a subset of patients with minor salivary gland tumors, through analysis of the SEER database.”
SEER data collection began in 1973
Previous work on major salivary gland tumors fit Cox proportional hazards models:
a univariate model for each covariate
one multivariate model using all covariates and RT
\(\lambda(t|x,RT,u) = \lambda_0(t) e^{RT \beta_{RT} + x \beta_x + u \beta_u}\)
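This small simulation with the survival package shows why the RT coefficient in the multivariate model can mislead when \(u\) is unobserved. It assumes a hypothetical severity score \(u\) that raises both the chance of receiving RT and the hazard. The illustrative log hazard ratios are \(\beta_{RT} = -0.5\) and \(\beta_u = 1\), with a constant baseline hazard and no censoring. The Cox model that includes \(u\) recovers \(\beta_{RT}\). The one that omits \(u\) pushes the RT estimate upward, toward apparent harm.

```r
library(survival)

set.seed(6)
n <- 2e4
u  <- rnorm(n)                          # hypothetical unobserved severity
RT <- rbinom(n, 1, plogis(u))           # sicker patients are more likely to get RT
beta_RT <- -0.5; beta_u <- 1            # illustrative true log hazard ratios

# Exponential event times: constant baseline hazard 0.1, every event observed
surv_time <- rexp(n, rate = 0.1 * exp(RT * beta_RT + u * beta_u))
status    <- rep(1, n)

b_with_u <- unname(coef(coxph(Surv(surv_time, status) ~ RT + u))["RT"])
b_naive  <- unname(coef(coxph(Surv(surv_time, status) ~ RT))["RT"])
round(c(with_u = b_with_u, naive = b_naive), 2)
```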
But what is the IV?
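# let's see whether region is a reasonable IV
setwd("/Users/leopekelis/Desktop/13_youssef_mac")  # local data directory; adjust as needed
# stringsAsFactors = TRUE restores the pre-R 4.0 default that the factor code below relies on
code.data <- read.csv("seer minor salivary gland 1988-2008 - staged and coded v1.1 - Coded Data.csv", stringsAsFactors = TRUE)
bkgd.data <- read.csv("seer minor salivary gland 1988-2008 - staged and coded v1.1 - Background Data.csv", stringsAsFactors = TRUE)
loc.data <- read.csv("seer minor salivary gland 1988-2008 - staged and coded v1.1 - Registry ID.csv", stringsAsFactors = TRUE)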
seer.data = merge(merge(code.data, bkgd.data, by = "Patient.ID"),
loc.data, by = "Patient.ID")
covs = c("Age.at.diagnosis", "Sex", "Race", "Year.of.diagnosis",
"Tumor.location", "T", "N", "Grade", "Histology", "Surgery")
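# drop patients whose radiation sequence is unknown (code 7); unlike seer.data[-which(...), ],
# this does not drop every row when no code 7 is present
seer.data = seer.data[!(seer.data$Radiation.sequence.with.surgery %in% 7), ]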
seer.data$Adj.RT = seer.data$Radiation.sequence.with.surgery %in%
c(1, 3)
# convert categorical columns (identified by position in the merged data) to factors
factor.idx = c(3, 4, 6, 7, 8, 10, 11, 28)
seer.data[factor.idx] = lapply(seer.data[factor.idx], as.factor)
form = as.formula(paste("Adj.RT ~ Registry.ID + ", paste(covs, collapse = "+")))
# combine some locations: repeated level names merge those registries into one level
temp = seer.data$Registry.ID
levels(temp) <- c("Alaska/Hawaii - 1973+", levels(temp)[2:6], "Alaska/Hawaii - 1973+",
levels(temp)[8], "Kentucky / Rural Georgia - 1992+", levels(temp)[10:13],
"Kentucky / Rural Georgia - 1992+", "California SF/SJM/LA - 1973+", "California SF/SJM/LA - 1973+",
levels(temp)[17:18])
# make level 3, "California excluding SF/SJM/LA - 2000+", the reference level
temp = relevel(temp, ref = levels(temp)[3])
seer.data$Registry.ID = temp
adj.rt.vec = NULL
for (i in levels(seer.data$Registry.ID)) {
Adj.RT.temp = seer.data$Adj.RT[seer.data$Registry.ID == i]
adj.rt.vec[i] = sum(Adj.RT.temp)/length(Adj.RT.temp)
}
IV.data = cbind(round(adj.rt.vec, 2), round(table(seer.data$Registry.ID)/dim(seer.data)[1],
2))
colnames(IV.data) = c("Adj.RT.Percent", "Percent.Obs")
print(IV.data) # let's get an overview of locations
## Adj.RT.Percent Percent.Obs
## California excluding SF/SJM/LA - 2000+ 0.22 0.15
## Alaska/Hawaii - 1973+ 0.19 0.02
## Atlanta (Metropolitan) - 1975+ 0.31 0.05
## Connecticut - 1973+ 0.23 0.06
## Detroit (Metropolitan) - 1973+ 0.25 0.10
## Greater Georgia - 2000+ 0.32 0.04
## Iowa - 1973+ 0.23 0.06
## Kentucky / Rural Georgia - 1992+ 0.28 0.04
## Los Angeles - 1992+ 0.23 0.14
## Louisiana - 2000+ 0.33 0.03
## New Jersey - 2000+ 0.28 0.05
## New Mexico - 1973+ 0.21 0.03
## California SF/SJM/LA - 1973+ 0.30 0.13
## Seattle (Puget Sound) - 1974+ 0.22 0.08
## Utah - 1973+ 0.16 0.03
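The loop and the `cbind()` call above can also be written with `tapply()` and `prop.table()`. The toy vectors below stand in for `seer.data$Adj.RT` and `seer.data$Registry.ID`; their values are made up.

```r
# Hypothetical toy data standing in for Adj.RT and Registry.ID
adj_rt   <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
registry <- factor(c("A", "A", "B", "B", "B", "C"))

rt_rate   <- tapply(adj_rt, registry, mean)   # share of patients with adjuvant RT, per registry
obs_share <- prop.table(table(registry))      # share of all patients, per registry

overview <- data.frame(Adj.RT.Percent = round(as.vector(rt_rate), 2),
                       Percent.Obs    = round(as.vector(obs_share), 2),
                       row.names      = levels(registry))
overview
```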
IV.glm = glm(form, data = seer.data, family = binomial)
summary(IV.glm)
##
## Call:
## glm(formula = form, family = binomial, data = seer.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.846 -0.542 -0.343 0.248 2.595
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -4.65e+01 3.99e+02 -0.12
## Registry.IDAlaska/Hawaii - 1973+ -5.15e-01 5.70e-01 -0.90
## Registry.IDAtlanta (Metropolitan) - 1975+ 6.81e-01 3.43e-01 1.98
## Registry.IDConnecticut - 1973+ 1.61e-01 3.29e-01 0.49
## Registry.IDDetroit (Metropolitan) - 1973+ 3.37e-01 2.83e-01 1.19
## Registry.IDGreater Georgia - 2000+ 6.73e-01 3.51e-01 1.92
## Registry.IDIowa - 1973+ 1.11e-01 3.31e-01 0.33
## Registry.IDKentucky / Rural Georgia - 1992+ 4.71e-01 3.74e-01 1.26
## Registry.IDLos Angeles - 1992+ -8.03e-02 2.57e-01 -0.31
## Registry.IDLouisiana - 2000+ 3.00e-01 3.85e-01 0.78
## Registry.IDNew Jersey - 2000+ 1.38e-01 3.25e-01 0.42
## Registry.IDNew Mexico - 1973+ 1.69e-01 4.57e-01 0.37
## Registry.IDCalifornia SF/SJM/LA - 1973+ 6.00e-01 2.55e-01 2.36
## Registry.IDSeattle (Puget Sound) - 1974+ 3.09e-01 3.01e-01 1.03
## Registry.IDUtah - 1973+ -8.82e-01 5.13e-01 -1.72
## Age.at.diagnosis -2.61e-04 3.99e-03 -0.07
## Sex2 1.58e-01 1.32e-01 1.19
## Race2 1.53e-01 3.13e-01 0.49
## Race3 4.73e-03 2.04e-01 0.02
## Race4 -5.03e-02 6.28e-01 -0.08
## Year.of.diagnosis 1.32e-02 2.29e-02 0.58
## Tumor.location2 -7.77e-01 2.68e-01 -2.90
## Tumor.location3 3.57e-01 7.45e-01 0.48
## Tumor.location4 -2.41e-01 4.63e-01 -0.52
## Tumor.location5 -1.34e+00 4.84e-01 -2.76
## Tumor.location6 1.27e-02 4.97e-01 0.03
## Tumor.location7 1.70e-01 3.17e-01 0.53
## Tumor.location8 -1.88e-01 7.61e-01 -0.25
## Tumor.location9 8.66e-02 3.23e-01 0.27
## Tumor.location10 4.83e-01 5.40e-01 0.90
## T2 2.74e-01 1.88e-01 1.46
## T3 1.09e+00 2.43e-01 4.47
## T4 1.18e+00 1.92e-01 6.13
## T5 -4.16e-01 2.32e-01 -1.79
## N2 1.14e+00 3.74e-01 3.05
## N3 1.11e+00 2.64e-01 4.20
## N4 8.20e-02 7.78e-01 0.11
## N5 -4.41e-01 1.97e-01 -2.24
## Grade 6.87e-01 8.33e-02 8.24
## Histology2 -1.81e+00 9.57e-01 -1.89
## Histology3 -6.12e-01 9.63e-01 -0.63
## Histology4 -2.32e+00 9.55e-01 -2.43
## Histology5 -1.87e+01 2.24e+03 -0.01
## Histology6 -8.94e-01 1.62e+00 -0.55
## Histology7 1.28e+00 6.53e+03 0.00
## Histology8 -1.95e+01 4.55e+03 0.00
## Histology9 -1.51e+00 1.07e+00 -1.41
## Histology10 -1.60e+00 1.07e+00 -1.50
## Histology11 -1.81e+00 1.00e+00 -1.81
## Surgery2 1.93e+01 3.97e+02 0.05
## Surgery3 1.93e+01 3.97e+02 0.05
## Surgery4 1.90e+01 3.97e+02 0.05
## Surgery5 1.90e+01 3.97e+02 0.05
## Surgery6 1.96e+01 3.97e+02 0.05
## Surgery7 1.92e+01 3.97e+02 0.05
## Surgery8 2.01e+01 3.97e+02 0.05
## Pr(>|z|)
## (Intercept) 0.9073
## Registry.IDAlaska/Hawaii - 1973+ 0.3667
## Registry.IDAtlanta (Metropolitan) - 1975+ 0.0473 *
## Registry.IDConnecticut - 1973+ 0.6251
## Registry.IDDetroit (Metropolitan) - 1973+ 0.2342
## Registry.IDGreater Georgia - 2000+ 0.0550 .
## Registry.IDIowa - 1973+ 0.7380
## Registry.IDKentucky / Rural Georgia - 1992+ 0.2085
## Registry.IDLos Angeles - 1992+ 0.7546
## Registry.IDLouisiana - 2000+ 0.4368
## Registry.IDNew Jersey - 2000+ 0.6715
## Registry.IDNew Mexico - 1973+ 0.7119
## Registry.IDCalifornia SF/SJM/LA - 1973+ 0.0185 *
## Registry.IDSeattle (Puget Sound) - 1974+ 0.3043
## Registry.IDUtah - 1973+ 0.0856 .
## Age.at.diagnosis 0.9478
## Sex2 0.2329
## Race2 0.6253
## Race3 0.9815
## Race4 0.9362
## Year.of.diagnosis 0.5624
## Tumor.location2 0.0037 **
## Tumor.location3 0.6312
## Tumor.location4 0.6021
## Tumor.location5 0.0057 **
## Tumor.location6 0.9796
## Tumor.location7 0.5927
## Tumor.location8 0.8043
## Tumor.location9 0.7889
## Tumor.location10 0.3706
## T2 0.1448
## T3 7.7e-06 ***
## T4 8.7e-10 ***
## T5 0.0736 .
## N2 0.0023 **
## N3 2.7e-05 ***
## N4 0.9161
## N5 0.0252 *
## Grade < 2e-16 ***
## Histology2 0.0588 .
## Histology3 0.5255
## Histology4 0.0150 *
## Histology5 0.9933
## Histology6 0.5807
## Histology7 0.9998
## Histology8 0.9966
## Histology9 0.1578
## Histology10 0.1341
## Histology11 0.0704 .
## Surgery2 0.9611
## Surgery3 0.9611
## Surgery4 0.9618
## Surgery5 0.9618
## Surgery6 0.9607
## Surgery7 0.9615
## Surgery8 0.9596
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2389.7 on 2117 degrees of freedom
## Residual deviance: 1564.6 on 2062 degrees of freedom
## AIC: 1677
##
## Number of Fisher Scoring iterations: 17
##
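The z-tests above examine one registry at a time. A complementary relevance check, not part of the original analysis, tests all instrument coefficients jointly. It compares the first-stage model with and without the instrument in a likelihood-ratio test via `anova(..., test = "Chisq")`. The sketch uses simulated data with a hypothetical four-level `region` factor and one covariate `x`.

```r
set.seed(8)
n <- 2000
region <- factor(sample(c("R1", "R2", "R3", "R4"), n, replace = TRUE))  # hypothetical instrument
x      <- rnorm(n)                                                     # ordinary covariate
trt    <- rbinom(n, 1, plogis(-1 + 0.8 * (region == "R2") - 0.6 * (region == "R4") + 0.5 * x))

fit_without <- glm(trt ~ x, family = binomial)
fit_with    <- glm(trt ~ region + x, family = binomial)

# Likelihood-ratio test of all region coefficients at once (3 df here)
lr <- anova(fit_without, fit_with, test = "Chisq")
lr
p_joint <- lr[2, "Pr(>Chi)"]
```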
No unmeasured confounders?
# check the exclusion restriction: given RT and the covariates, region should not predict survival in a Cox model
library(survival)
## Loading required package: splines
names(seer.data)[14] = "Survival.Time"
form.ph = as.formula(paste("Surv(Survival.Time,as.numeric(Vital.status.recode)) ~ Adj.RT + Registry.ID + ",
paste(covs, collapse = "+")))
ex.ph = coxph(form.ph, data = seer.data)
## Warning: Loglik converged before variable 43,44,45 ; beta may be infinite.
summary(ex.ph)
## Call:
## coxph(formula = form.ph, data = seer.data)
##
## n= 2118, number of events= 576
##
## coef exp(coef) se(coef)
## Adj.RTTRUE -1.70e-01 8.44e-01 1.15e-01
## Registry.IDAlaska/Hawaii - 1973+ 5.66e-01 1.76e+00 3.53e-01
## Registry.IDAtlanta (Metropolitan) - 1975+ 4.39e-03 1.00e+00 2.37e-01
## Registry.IDConnecticut - 1973+ -3.00e-01 7.41e-01 2.19e-01
## Registry.IDDetroit (Metropolitan) - 1973+ -9.78e-02 9.07e-01 1.87e-01
## Registry.IDGreater Georgia - 2000+ -2.37e-02 9.77e-01 2.43e-01
## Registry.IDIowa - 1973+ -1.65e-01 8.48e-01 2.13e-01
## Registry.IDKentucky / Rural Georgia - 1992+ -3.85e-01 6.81e-01 2.99e-01
## Registry.IDLos Angeles - 1992+ -8.54e-02 9.18e-01 1.78e-01
## Registry.IDLouisiana - 2000+ 5.39e-02 1.06e+00 2.64e-01
## Registry.IDNew Jersey - 2000+ 6.41e-02 1.07e+00 2.37e-01
## Registry.IDNew Mexico - 1973+ 1.05e-01 1.11e+00 3.06e-01
## Registry.IDCalifornia SF/SJM/LA - 1973+ -1.50e-01 8.61e-01 1.82e-01
## Registry.IDSeattle (Puget Sound) - 1974+ 4.47e-02 1.05e+00 2.05e-01
## Registry.IDUtah - 1973+ -6.31e-01 5.32e-01 2.98e-01
## Age.at.diagnosis 5.75e-02 1.06e+00 3.48e-03
## Sex2 3.16e-01 1.37e+00 9.03e-02
## Race2 -5.84e-01 5.58e-01 2.38e-01
## Race3 -1.73e-01 8.41e-01 1.35e-01
## Race4 -1.52e+00 2.19e-01 1.01e+00
## Year.of.diagnosis -3.00e-02 9.70e-01 1.70e-02
## Tumor.location2 -3.25e-01 7.23e-01 1.96e-01
## Tumor.location3 1.66e-02 1.02e+00 3.26e-01
## Tumor.location4 -2.91e-01 7.47e-01 2.65e-01
## Tumor.location5 -2.40e-01 7.86e-01 2.91e-01
## Tumor.location6 -5.44e-02 9.47e-01 3.13e-01
## Tumor.location7 3.38e-01 1.40e+00 2.17e-01
## Tumor.location8 2.92e-01 1.34e+00 3.87e-01
## Tumor.location9 -5.50e-02 9.46e-01 2.30e-01
## Tumor.location10 -2.30e-01 7.94e-01 3.50e-01
## T2 2.13e-01 1.24e+00 1.42e-01
## T3 6.51e-01 1.92e+00 1.59e-01
## T4 6.40e-01 1.90e+00 1.29e-01
## T5 5.83e-02 1.06e+00 1.52e-01
## N2 2.46e-01 1.28e+00 2.13e-01
## N3 8.43e-01 2.32e+00 1.47e-01
## N4 1.64e+00 5.15e+00 3.98e-01
## N5 6.52e-02 1.07e+00 1.15e-01
## Grade 4.97e-01 1.64e+00 5.51e-02
## Histology2 -3.94e-01 6.74e-01 6.01e-01
## Histology3 -2.87e-01 7.51e-01 6.08e-01
## Histology4 -6.85e-01 5.04e-01 6.00e-01
## Histology5 -1.50e+01 3.11e-07 1.14e+03
## Histology6 -1.48e+01 3.69e-07 3.37e+03
## Histology7 -1.68e+01 5.06e-08 6.88e+03
## Histology8 -3.78e-01 6.85e-01 1.18e+00
## Histology9 -3.59e-01 6.98e-01 6.55e-01
## Histology10 -9.10e-01 4.03e-01 8.43e-01
## Histology11 3.31e-01 1.39e+00 6.17e-01
## Surgery2 -5.23e-01 5.93e-01 1.91e-01
## Surgery3 -4.41e-01 6.43e-01 2.90e-01
## Surgery4 -4.77e-01 6.21e-01 2.16e-01
## Surgery5 2.70e-01 1.31e+00 2.74e-01
## Surgery6 -1.13e-01 8.93e-01 1.82e-01
## Surgery7 -4.53e-01 6.36e-01 2.21e-01
## Surgery8 -1.03e+00 3.56e-01 7.30e-01
## z Pr(>|z|)
## Adj.RTTRUE -1.48 0.13891
## Registry.IDAlaska/Hawaii - 1973+ 1.60 0.10900
## Registry.IDAtlanta (Metropolitan) - 1975+ 0.02 0.98523
## Registry.IDConnecticut - 1973+ -1.37 0.17063
## Registry.IDDetroit (Metropolitan) - 1973+ -0.52 0.60067
## Registry.IDGreater Georgia - 2000+ -0.10 0.92215
## Registry.IDIowa - 1973+ -0.78 0.43809
## Registry.IDKentucky / Rural Georgia - 1992+ -1.29 0.19810
## Registry.IDLos Angeles - 1992+ -0.48 0.63161
## Registry.IDLouisiana - 2000+ 0.20 0.83821
## Registry.IDNew Jersey - 2000+ 0.27 0.78732
## Registry.IDNew Mexico - 1973+ 0.34 0.73073
## Registry.IDCalifornia SF/SJM/LA - 1973+ -0.82 0.41079
## Registry.IDSeattle (Puget Sound) - 1974+ 0.22 0.82751
## Registry.IDUtah - 1973+ -2.12 0.03394 *
## Age.at.diagnosis 16.50 < 2e-16 ***
## Sex2 3.50 0.00047 ***
## Race2 -2.45 0.01421 *
## Race3 -1.28 0.20132
## Race4 -1.50 0.13392
## Year.of.diagnosis -1.76 0.07762 .
## Tumor.location2 -1.65 0.09806 .
## Tumor.location3 0.05 0.95945
## Tumor.location4 -1.10 0.27199
## Tumor.location5 -0.82 0.40941
## Tumor.location6 -0.17 0.86208
## Tumor.location7 1.56 0.11811
## Tumor.location8 0.75 0.45067
## Tumor.location9 -0.24 0.81105
## Tumor.location10 -0.66 0.51094
## T2 1.51 0.13169
## T3 4.10 4.2e-05 ***
## T4 4.97 6.7e-07 ***
## T5 0.38 0.70126
## N2 1.16 0.24747
## N3 5.71 1.1e-08 ***
## N4 4.12 3.8e-05 ***
## N5 0.57 0.57101
## Grade 9.03 < 2e-16 ***
## Histology2 -0.66 0.51184
## Histology3 -0.47 0.63718
## Histology4 -1.14 0.25358
## Histology5 -0.01 0.98953
## Histology6 0.00 0.99649
## Histology7 0.00 0.99805
## Histology8 -0.32 0.74849
## Histology9 -0.55 0.58341
## Histology10 -1.08 0.28067
## Histology11 0.54 0.59084
## Surgery2 -2.74 0.00622 **
## Surgery3 -1.52 0.12859
## Surgery4 -2.21 0.02718 *
## Surgery5 0.98 0.32501
## Surgery6 -0.62 0.53421
## Surgery7 -2.05 0.04049 *
## Surgery8 -1.42 0.15682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95
## Adj.RTTRUE 8.44e-01 1.18e+00 0.6741
## Registry.IDAlaska/Hawaii - 1973+ 1.76e+00 5.68e-01 0.8815
## Registry.IDAtlanta (Metropolitan) - 1975+ 1.00e+00 9.96e-01 0.6312
## Registry.IDConnecticut - 1973+ 7.41e-01 1.35e+00 0.4828
## Registry.IDDetroit (Metropolitan) - 1973+ 9.07e-01 1.10e+00 0.6287
## Registry.IDGreater Georgia - 2000+ 9.77e-01 1.02e+00 0.6066
## Registry.IDIowa - 1973+ 8.48e-01 1.18e+00 0.5585
## Registry.IDKentucky / Rural Georgia - 1992+ 6.81e-01 1.47e+00 0.3788
## Registry.IDLos Angeles - 1992+ 9.18e-01 1.09e+00 0.6476
## Registry.IDLouisiana - 2000+ 1.06e+00 9.47e-01 0.6288
## Registry.IDNew Jersey - 2000+ 1.07e+00 9.38e-01 0.6695
## Registry.IDNew Mexico - 1973+ 1.11e+00 9.00e-01 0.6103
## Registry.IDCalifornia SF/SJM/LA - 1973+ 8.61e-01 1.16e+00 0.6026
## Registry.IDSeattle (Puget Sound) - 1974+ 1.05e+00 9.56e-01 0.6995
## Registry.IDUtah - 1973+ 5.32e-01 1.88e+00 0.2969
## Age.at.diagnosis 1.06e+00 9.44e-01 1.0520
## Sex2 1.37e+00 7.29e-01 1.1488
## Race2 5.58e-01 1.79e+00 0.3498
## Race3 8.41e-01 1.19e+00 0.6452
## Race4 2.19e-01 4.57e+00 0.0300
## Year.of.diagnosis 9.70e-01 1.03e+00 0.9387
## Tumor.location2 7.23e-01 1.38e+00 0.4920
## Tumor.location3 1.02e+00 9.84e-01 0.5362
## Tumor.location4 7.47e-01 1.34e+00 0.4446
## Tumor.location5 7.86e-01 1.27e+00 0.4444
## Tumor.location6 9.47e-01 1.06e+00 0.5128
## Tumor.location7 1.40e+00 7.13e-01 0.9176
## Tumor.location8 1.34e+00 7.47e-01 0.6273
## Tumor.location9 9.46e-01 1.06e+00 0.6030
## Tumor.location10 7.94e-01 1.26e+00 0.3996
## T2 1.24e+00 8.08e-01 0.9379
## T3 1.92e+00 5.22e-01 1.4044
## T4 1.90e+00 5.27e-01 1.4738
## T5 1.06e+00 9.43e-01 0.7870
## N2 1.28e+00 7.82e-01 0.8427
## N3 2.32e+00 4.31e-01 1.7398
## N4 5.15e+00 1.94e-01 2.3615
## N5 1.07e+00 9.37e-01 0.8518
## Grade 1.64e+00 6.08e-01 1.4762
## Histology2 6.74e-01 1.48e+00 0.2076
## Histology3 7.51e-01 1.33e+00 0.2282
## Histology4 5.04e-01 1.98e+00 0.1554
## Histology5 3.11e-07 3.22e+06 0.0000
## Histology6 3.69e-07 2.71e+06 0.0000
## Histology7 5.06e-08 1.98e+07 0.0000
## Histology8 6.85e-01 1.46e+00 0.0680
## Histology9 6.98e-01 1.43e+00 0.1936
## Histology10 4.03e-01 2.48e+00 0.0771
## Histology11 1.39e+00 7.18e-01 0.4160
## Surgery2 5.93e-01 1.69e+00 0.4074
## Surgery3 6.43e-01 1.55e+00 0.3643
## Surgery4 6.21e-01 1.61e+00 0.4064
## Surgery5 1.31e+00 7.63e-01 0.7652
## Surgery6 8.93e-01 1.12e+00 0.6249
## Surgery7 6.36e-01 1.57e+00 0.4120
## Surgery8 3.56e-01 2.81e+00 0.0850
## upper .95
## Adj.RTTRUE 1.057
## Registry.IDAlaska/Hawaii - 1973+ 3.519
## Registry.IDAtlanta (Metropolitan) - 1975+ 1.598
## Registry.IDConnecticut - 1973+ 1.138
## Registry.IDDetroit (Metropolitan) - 1973+ 1.308
## Registry.IDGreater Georgia - 2000+ 1.572
## Registry.IDIowa - 1973+ 1.287
## Registry.IDKentucky / Rural Georgia - 1992+ 1.223
## Registry.IDLos Angeles - 1992+ 1.302
## Registry.IDLouisiana - 2000+ 1.771
## Registry.IDNew Jersey - 2000+ 1.698
## Registry.IDNew Mexico - 1973+ 2.022
## Registry.IDCalifornia SF/SJM/LA - 1973+ 1.230
## Registry.IDSeattle (Puget Sound) - 1974+ 1.563
## Registry.IDUtah - 1973+ 0.953
## Age.at.diagnosis 1.066
## Sex2 1.637
## Race2 0.889
## Race3 1.097
## Race4 1.596
## Year.of.diagnosis 1.003
## Tumor.location2 1.062
## Tumor.location3 1.928
## Tumor.location4 1.256
## Tumor.location5 1.392
## Tumor.location6 1.749
## Tumor.location7 2.144
## Tumor.location8 2.858
## Tumor.location9 1.486
## Tumor.location10 1.579
## T2 1.634
## T3 2.617
## T4 2.442
## T5 1.428
## N2 1.942
## N3 3.101
## N4 11.241
## N5 1.338
## Grade 1.832
## Histology2 2.189
## Histology3 2.471
## Histology4 1.634
## Histology5 Inf
## Histology6 Inf
## Histology7 Inf
## Histology8 6.908
## Histology9 2.520
## Histology10 2.102
## Histology11 4.664
## Surgery2 0.862
## Surgery3 1.136
## Surgery4 0.948
## Surgery5 2.243
## Surgery6 1.276
## Surgery7 0.981
## Surgery8 1.488
##
## Concordance= 0.853 (se = 0.013 )
## Rsquare= 0.374 (max possible= 0.979 )
## Likelihood ratio test= 991 on 56 df, p=0
## Wald test = 555 on 56 df, p=0
## Score (logrank) test = 1158 on 56 df, p=0
##
# conclusion? Utah is the only registry still associated with survival after adjusting for RT
# (HR 0.53, p = 0.034), so it fails the exclusion restriction: drop Utah patients
seer.data = seer.data[!(seer.data$Registry.ID %in% "Utah - 1973+"), ]
seer.data$Registry.ID = droplevels(seer.data$Registry.ID)
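# 2SRI first stage: fitted probability of adjuvant RT given registry (the IV) and covariates
Adj.RT.IV = glm(form, data = seer.data, family = binomial)$fitted.values
# first-stage residual, used as the estimate of the unobserved u in the second stage
seer.data$U.est = seer.data$Adj.RT - Adj.RT.IV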
form.final = as.formula(paste("Surv(Survival.Time,as.numeric(Vital.status.recode)) ~ Adj.RT + U.est + ",
paste(covs, collapse = "+")))
final.ph = coxph(form.final, data = seer.data)
## Warning: Loglik converged before variable 30,31,32 ; beta may be infinite.
summary(final.ph)
## Call:
## coxph(formula = form.final, data = seer.data)
##
## n= 2062, number of events= 560
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Adj.RTTRUE 7.38e-01 2.09e+00 3.97e-01 1.86 0.06316 .
## U.est -1.03e+00 3.57e-01 4.19e-01 -2.46 0.01399 *
## Age.at.diagnosis 5.63e-02 1.06e+00 3.49e-03 16.14 < 2e-16 ***
## Sex2 3.01e-01 1.35e+00 9.18e-02 3.28 0.00103 **
## Race2 -4.71e-01 6.25e-01 2.08e-01 -2.26 0.02388 *
## Race3 -1.47e-01 8.63e-01 1.30e-01 -1.13 0.25821
## Race4 -1.55e+00 2.12e-01 1.01e+00 -1.53 0.12488
## Year.of.diagnosis -3.44e-02 9.66e-01 1.65e-02 -2.08 0.03782 *
## Tumor.location2 -2.76e-01 7.58e-01 1.99e-01 -1.39 0.16436
## Tumor.location3 -1.03e-01 9.02e-01 3.24e-01 -0.32 0.74975
## Tumor.location4 -3.98e-01 6.72e-01 2.63e-01 -1.51 0.13121
## Tumor.location5 -5.15e-02 9.50e-01 2.96e-01 -0.17 0.86208
## Tumor.location6 -1.74e-01 8.40e-01 3.14e-01 -0.55 0.57915
## Tumor.location7 2.36e-01 1.27e+00 2.17e-01 1.09 0.27524
## Tumor.location8 1.83e-01 1.20e+00 3.96e-01 0.46 0.64479
## Tumor.location9 -8.39e-02 9.19e-01 2.32e-01 -0.36 0.71762
## Tumor.location10 -4.49e-01 6.38e-01 3.56e-01 -1.26 0.20711
## T2 1.54e-01 1.17e+00 1.43e-01 1.08 0.28225
## T3 4.77e-01 1.61e+00 1.70e-01 2.80 0.00506 **
## T4 4.81e-01 1.62e+00 1.47e-01 3.28 0.00106 **
## T5 6.47e-02 1.07e+00 1.52e-01 0.42 0.67108
## N2 1.50e-01 1.16e+00 2.21e-01 0.68 0.49785
## N3 7.38e-01 2.09e+00 1.60e-01 4.63 3.7e-06 ***
## N4 1.69e+00 5.42e+00 3.93e-01 4.30 1.7e-05 ***
## N5 9.85e-02 1.10e+00 1.17e-01 0.84 0.39898
## Grade 4.21e-01 1.52e+00 6.83e-02 6.16 7.2e-10 ***
## Histology2 -1.63e-01 8.50e-01 6.07e-01 -0.27 0.78869
## Histology3 -2.42e-01 7.85e-01 6.07e-01 -0.40 0.69017
## Histology4 -4.08e-01 6.65e-01 6.11e-01 -0.67 0.50469
## Histology5 -1.44e+01 5.61e-07 1.13e+03 -0.01 0.98980
## Histology6 -1.45e+01 5.29e-07 2.97e+03 0.00 0.99612
## Histology7 -1.67e+01 5.86e-08 6.09e+03 0.00 0.99782
## Histology8 1.17e-01 1.12e+00 1.19e+00 0.10 0.92178
## Histology9 -1.84e-01 8.32e-01 6.57e-01 -0.28 0.77963
## Histology10 -5.69e-01 5.66e-01 8.44e-01 -0.67 0.50048
## Histology11 5.06e-01 1.66e+00 6.21e-01 0.82 0.41461
## Surgery2 -1.02e+00 3.62e-01 2.84e-01 -3.57 0.00035 ***
## Surgery3 -9.91e-01 3.71e-01 3.72e-01 -2.67 0.00765 **
## Surgery4 -9.54e-01 3.85e-01 2.85e-01 -3.35 0.00080 ***
## Surgery5 -2.45e-01 7.83e-01 3.57e-01 -0.69 0.49218
## Surgery6 -7.21e-01 4.86e-01 2.99e-01 -2.41 0.01599 *
## Surgery7 -1.01e+00 3.63e-01 2.94e-01 -3.45 0.00056 ***
## Surgery8 -1.71e+00 1.81e-01 7.78e-01 -2.20 0.02791 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Adj.RTTRUE 2.09e+00 4.78e-01 0.9604 4.552
## U.est 3.57e-01 2.80e+00 0.1569 0.812
## Age.at.diagnosis 1.06e+00 9.45e-01 1.0507 1.065
## Sex2 1.35e+00 7.40e-01 1.1291 1.618
## Race2 6.25e-01 1.60e+00 0.4152 0.940
## Race3 8.63e-01 1.16e+00 0.6683 1.114
## Race4 2.12e-01 4.73e+00 0.0291 1.538
## Year.of.diagnosis 9.66e-01 1.03e+00 0.9354 0.998
## Tumor.location2 7.58e-01 1.32e+00 0.5137 1.120
## Tumor.location3 9.02e-01 1.11e+00 0.4775 1.703
## Tumor.location4 6.72e-01 1.49e+00 0.4009 1.126
## Tumor.location5 9.50e-01 1.05e+00 0.5314 1.698
## Tumor.location6 8.40e-01 1.19e+00 0.4536 1.556
## Tumor.location7 1.27e+00 7.89e-01 0.8284 1.937
## Tumor.location8 1.20e+00 8.33e-01 0.5521 2.611
## Tumor.location9 9.19e-01 1.09e+00 0.5834 1.449
## Tumor.location10 6.38e-01 1.57e+00 0.3178 1.282
## T2 1.17e+00 8.57e-01 0.8810 1.545
## T3 1.61e+00 6.21e-01 1.1542 2.248
## T4 1.62e+00 6.18e-01 1.2131 2.158
## T5 1.07e+00 9.37e-01 0.7913 1.438
## N2 1.16e+00 8.61e-01 0.7534 1.791
## N3 2.09e+00 4.78e-01 1.5302 2.860
## N4 5.42e+00 1.84e-01 2.5097 11.726
## N5 1.10e+00 9.06e-01 0.8778 1.387
## Grade 1.52e+00 6.57e-01 1.3322 1.741
## Histology2 8.50e-01 1.18e+00 0.2585 2.794
## Histology3 7.85e-01 1.27e+00 0.2391 2.579
## Histology4 6.65e-01 1.50e+00 0.2007 2.204
## Histology5 5.61e-07 1.78e+06 0.0000 Inf
## Histology6 5.29e-07 1.89e+06 0.0000 Inf
## Histology7 5.86e-08 1.71e+07 0.0000 Inf
## Histology8 1.12e+00 8.90e-01 0.1091 11.576
## Histology9 8.32e-01 1.20e+00 0.2297 3.015
## Histology10 5.66e-01 1.77e+00 0.1082 2.963
## Histology11 1.66e+00 6.03e-01 0.4916 5.600
## Surgery2 3.62e-01 2.76e+00 0.2077 0.632
## Surgery3 3.71e-01 2.69e+00 0.1793 0.769
## Surgery4 3.85e-01 2.60e+00 0.2205 0.673
## Surgery5 7.83e-01 1.28e+00 0.3889 1.575
## Surgery6 4.86e-01 2.06e+00 0.2706 0.874
## Surgery7 3.63e-01 2.75e+00 0.2041 0.646
## Surgery8 1.81e-01 5.53e+00 0.0393 0.831
##
## Concordance= 0.854 (se = 0.013 )
## Rsquare= 0.375 (max possible= 0.979 )
## Likelihood ratio test= 968 on 43 df, p=0
## Wald test = 550 on 43 df, p=0
## Score (logrank) test = 1127 on 43 df, p=0
##
# does the Cox variance for beta_AdjRT (which treats U.est as known) hold up? bootstrap check
# note: this resamples the second stage only, holding U.est fixed; a full 2SRI bootstrap
# would also refit the first-stage glm inside boot.func.ART
# boot.func.ART <- function(data, idx) {
#   data.temp = data[idx, ]
#   final.ph = coxph(form.final, data = data.temp)
#   return(c(final.ph$coef[1], final.ph$var[1, 1]))
# }
# library(boot)
# final.ph.boot = boot(data = seer.data, statistic = boot.func.ART, R = 1000)
# var.est = var(final.ph.boot$t[, 1])
# z.boot = final.ph.boot$t[, 1] / sqrt(final.ph.boot$t[, 2])
# p.val = sum(z.boot > 0) / 1000
# exp(coef(final.ph)[1] + c(-1, 1) * 2 * sqrt(var.est))
# ## [1] 0.8712999 5.0171531
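For reference, here is a sketch of a bootstrap that refits both stages in every replicate, run on simulated data rather than SEER. The data frame `sim`, its columns, and the helper functions `fit_2sri()` and `boot_stat()` are hypothetical. The first stage and residual mirror the `Adj.RT.IV` / `U.est` steps above, and `boot()` comes from the boot package. The standard deviation of the replicates estimates the standard error of the treatment coefficient with first-stage uncertainty included.

```r
library(survival)
library(boot)

# Hypothetical simulated data: 3-level instrument 'region', unobserved u,
# binary treatment 'trt', exponential survival times with no censoring
set.seed(9)
n <- 2000
sim <- data.frame(region = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
                  u = rnorm(n))
sim$trt    <- rbinom(n, 1, plogis(-0.5 + 1.0 * (sim$region == "B") + sim$u))
sim$time   <- rexp(n, rate = 0.1 * exp(-0.5 * sim$trt + sim$u))
sim$status <- 1

# One 2SRI fit: logistic first stage on the instrument, response residual,
# then a Cox second stage with the residual added as a covariate
fit_2sri <- function(d) {
  first   <- glm(trt ~ region, data = d, family = binomial)
  d$u_est <- d$trt - fitted(first)
  second  <- coxph(Surv(time, status) ~ trt + u_est, data = d)
  unname(coef(second)["trt"])
}

# Bootstrap statistic: resample patients, then refit BOTH stages
boot_stat <- function(d, idx) fit_2sri(d[idx, ])

boot_out <- boot(data = sim, statistic = boot_stat, R = 200)
est      <- fit_2sri(sim)
se_boot  <- sd(boot_out$t[, 1])
exp(est + c(-1, 1) * 2 * se_boot)   # rough interval for the hazard ratio, as above
```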
Terza, J. V., Basu, A., and Rathouz, P. J., Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics, 2008. 27(3): p. 531-543.
Mahmood, U., et al., Adjuvant radiation therapy for high-grade and/or locally advanced major salivary gland tumors. Arch Otolaryngol Head Neck Surg. 137(10): p. 1025-30.
Overview of the SEER program. Surveillance, Epidemiology, and End Results program Web site. http://seer.cancer.gov/about. Accessed March 13, 2013