# Two-Stage Residual Inclusion Estimation: IVs for non-linear models

March 13th, 2013, Stanford University

# Outline

1. Model / Set-up
2. 2SPS vs 2SRI
3. Geometric Interpretation for LM
4. Geometric Interpretation for GLM
5. Application: Adjuvant RT for Minor Salivary Gland Tumors

# Model / Set-up

• following Terza, Basu and Rathouz (2008)

• $$y = f(s \beta_s + u \beta_u) + \epsilon$$

• $$f: R \rightarrow R$$ a known function

• $$s$$ is endogenous variable(s)

• $$u$$ unobserved covariates

• $$s = z \rho + u \alpha_u + \tau$$

• $$z$$ is the instrumental variable
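The set-up above can be simulated directly, which makes the endogeneity of $$s$$ visible. This is only a sketch: the choice of $$f$$ (the inverse logit), the normal distributions, and every parameter value are illustrative assumptions, not part of the talk.

```r
# Simulate one data set from the model (all numeric choices are assumptions)
set.seed(1)
n <- 5000
z <- rnorm(n)                       # instrument
u <- rnorm(n)                       # unobserved covariate
tau <- rnorm(n, sd = 0.5)           # first-stage noise
rho <- 1; alpha.u <- 1              # first-stage coefficients
beta.s <- 1; beta.u <- 2            # outcome coefficients

s <- z * rho + u * alpha.u + tau    # endogenous variable
f <- plogis                         # assumed known f: R -> R (inverse logit)
mu <- f(s * beta.s + u * beta.u)    # conditional mean of y given (s, u)
y <- rbinom(n, size = 1, prob = mu) # y = f(.) + epsilon with E[epsilon | s, u] = 0

# s is correlated with the omitted u, while the instrument z is not
c(cor.s.u = cor(s, u), cor.z.u = cor(z, u))
```

Because $$u$$ is never observed in practice, a regression of $$y$$ on $$s$$ alone absorbs part of the effect of $$u$$ into the coefficient on $$s$$.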

# 2SPS vs 2SRI

• 2SPS - two-stage predictor substitution
  • same as two-stage least squares when $$f$$ is linear
  • estimate $$\hat{s} = z \hat{\rho}$$
  • $$y = f(\hat{s} \beta_s) + \epsilon_1$$
• 2SRI - two-stage residual inclusion
  • estimate $$\hat{x}_u = s - z \hat{\rho} = s - \hat{s}$$
  • $$y = f(s \beta_s + \hat{x}_u \beta_u) + \epsilon_2$$
• for linear models, both approaches give the same estimate of $$\beta_s$$

• for nonlinear models, 2SPS is generally inconsistent, while 2SRI is consistent
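The linear-model equivalence can be checked numerically. The sketch below uses simulated data with assumed parameter values (true $$\beta_s = 1$$) and ordinary least squares in both stages; the variable names are chosen for illustration.

```r
# 2SPS and 2SRI coincide when f is the identity (simulated data, assumed values)
set.seed(2)
n <- 2000
z <- rnorm(n)
u <- rnorm(n)
s <- z + u + rnorm(n)
y <- 1 * s + 2 * u + rnorm(n)              # linear outcome model, beta_s = 1

stage1 <- lm(s ~ z)                         # first stage
s.hat <- fitted(stage1)                     # predictor for 2SPS
x.u.hat <- resid(stage1)                    # residual for 2SRI

b.ols <- unname(coef(lm(y ~ s))["s"])               # ignores endogeneity
b.2sps <- unname(coef(lm(y ~ s.hat))["s.hat"])      # substitute s.hat for s
b.2sri <- unname(coef(lm(y ~ s + x.u.hat))["s"])    # keep s, add the residual

round(c(ols = b.ols, twosps = b.2sps, twosri = b.2sri), 3)
```

The two IV estimates agree up to floating-point error because $$s = \hat{s} + \hat{x}_u$$ and the first-stage residual is orthogonal to $$\hat{s}$$, so both regressions span the same space.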

# Geometric Interpretation for Linear Models

• let $$L(X)$$ be the linear sub-space spanned by $$X$$

• fitting $$y = X \beta + \epsilon$$ by least squares projects $$y$$ onto $$L(X)$$

• (draw a picture)
• we are worried that $$L(s)$$ is not orthogonal to $$L(u)$$

• if we had access to $$u$$, regressing on both $$s$$ and $$u$$ would orthogonalize for us

• (draw a picture)
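In place of the picture, the projection view of least squares can be verified in a few lines. This sketch uses a simulated design matrix (an assumption for illustration) and builds the projection matrix onto $$L(X)$$ explicitly.

```r
# Least squares as orthogonal projection onto L(X) (simulated example)
set.seed(3)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))            # columns span L(X)
y <- drop(X %*% c(1, 2, -1)) + rnorm(n)

P <- X %*% solve(crossprod(X)) %*% t(X)      # projection matrix onto L(X)
y.hat <- drop(P %*% y)                       # projected outcome
e <- y - y.hat                               # residual, lies in the orthogonal complement

max(abs(crossprod(X, e)))                    # numerically zero: e is orthogonal to L(X)
all.equal(y.hat, unname(fitted(lm(y ~ X - 1))))  # same as the lm() fit
```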

# Geometric Interpretation for Linear Models

• we assume $$z$$ is independent of $$u$$

• hence $$L(z)$$ is orthogonal to $$L(u)$$ (in expectation, and approximately in large samples)

• hence projecting $$s$$ onto $$L(z)$$ orthogonalizes $$s$$ with respect to $$u$$

• for simplicity assume $$L(z) = L(s)$$

• “super relevance”

• (draw a nice picture of why this is causal)

• for linear models, projecting onto $$L(z)$$ and taking the orthogonal part does the same thing

• (draw a picture)
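The orthogonalization argument can be illustrated with cosines between centered vectors. This is a sketch on simulated data with assumed coefficients (true $$\beta_s = 1$$); the helper names `center` and `cosine` are introduced here for illustration.

```r
# Projecting s onto L(z) removes the part of s that is aligned with u
set.seed(4)
n <- 10000
z <- rnorm(n)
u <- rnorm(n)
s <- z + u + rnorm(n, sd = 0.5)
y <- 1 * s + 2 * u + rnorm(n)

center <- function(v) v - mean(v)
cosine <- function(a, b) sum(a * b) / sqrt(sum(a * a) * sum(b * b))
zc <- center(z); sc <- center(s); uc <- center(u); yc <- center(y)

s.proj <- zc * sum(zc * sc) / sum(zc * zc)   # projection of s onto L(z)
s.perp <- sc - s.proj                        # part of s orthogonal to L(z)

# cosines with u: s is not orthogonal to u, its projection onto L(z) nearly is
c(s = cosine(sc, uc), s.proj = cosine(s.proj, uc), s.perp = cosine(s.perp, uc))

# regressing y on the projected s gives the IV estimate of beta_s
b.iv <- sum(s.proj * yc) / sum(s.proj * s.proj)
b.iv
```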

# Geometric Interpretation for Generalized Linear Models

• examples: logistic, Poisson, log-linear

• (draw a picture)

• Standard 2SPS is not consistent exactly because the outcome space is non-linear

• different values of $$u$$ affect projection onto $$\mu(\beta_s)$$

• (draw a picture)

• In contrast, 2SRI gives a consistent estimate of $$u$$

• So we project onto the right part of the $$\mu$$ space

• (draw a final picture)

• As $$n \rightarrow \infty$$, $$L^\perp(z) \rightarrow L(u)$$

• And $$\mu(\hat{\eta}_s, \hat{\eta}_z^\perp) \rightarrow \mu(\hat{\eta}_s,\hat{\eta}_u)$$
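The contrast between the two estimators in a non-linear model can be seen in a small simulation. This sketch assumes a logistic outcome model, true $$\beta_s = 1$$, and $$\tau = 0$$, so that the first-stage residual recovers $$u$$ in large samples; none of these values come from the talk.

```r
# 2SPS vs 2SRI for a logistic outcome (simulated data, assumed values)
set.seed(5)
n <- 20000
z <- rnorm(n)
u <- rnorm(n)
s <- z + u                                   # first stage with tau = 0 (assumption)
y <- rbinom(n, 1, plogis(1 * s + 2 * u))     # true beta_s = 1, beta_u = 2

stage1 <- lm(s ~ z)
s.hat <- fitted(stage1)
r.hat <- resid(stage1)                       # estimates u here, since alpha_u = 1

# 2SPS: replace s by its first-stage prediction
b.2sps <- unname(coef(glm(y ~ s.hat, family = binomial))["s.hat"])
# 2SRI: keep s and add the first-stage residual as a regressor
b.2sri <- unname(coef(glm(y ~ s + r.hat, family = binomial))["s"])

round(c(truth = 1, twosps = b.2sps, twosri = b.2sri), 3)
```

In this set-up the 2SPS coefficient is pulled toward zero because the unobserved variation in $$u$$ is averaged over inside the non-linear link, while the 2SRI coefficient stays close to the true value.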

# Application: Adjuvant RT for Minor Salivary Gland Tumors

• Work with Youssef Zaidan, Resident in Radiation Therapy

• “there are several hundred minor salivary glands that are too small to see without a microscope”

• “Retrospective studies show that adjuvant radiation therapy improves locoregional control of salivary gland tumors.”

• “SEER analysis of minor salivary tumors show that T-stage, site, and grade are important factors for predicting lymph node metastasis”

• “Prior SEER analysis showed that adjuvant RT is associated with improved survival for high-grade and/or locally advanced major salivary gland tumors”

• “To determine whether addition of postoperative radiation influences survival of a subset of patients with minor salivary gland tumors, through analysis of the SEER database.”

# The SEER database

• “The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. Population.”
• Data collection began in 1973

• about 200 variables are available to pull

# The Analysis

• Previous work on major salivary gland tumors ran Cox proportional hazards models

• individual univariate models, one for each covariate

• one multivariate model using all covariates and RT

• $$\lambda(t|x,RT,u) = \lambda_0(t) e^{RT \beta_{RT} + x \beta_x + u \beta_u}$$

• can use 2SRI to estimate the causal effect of $$RT$$
• But what is the IV?
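The mechanics of 2SRI with a Cox model can be sketched on simulated data before turning to SEER. Everything here is hypothetical: the three-level instrument `z`, the single covariate `x`, and all coefficients are assumptions, and the example only shows the two-stage workflow (logistic first stage, raw residual added to `coxph`), not a claim about what the estimates should be.

```r
library(survival)

# Simulated stand-in for the SEER set-up (all names and values are assumptions)
set.seed(6)
n <- 3000
z <- factor(sample(c("A", "B", "C"), n, replace = TRUE))  # hypothetical regions
x <- rnorm(n)                                             # observed covariate
u <- rnorm(n)                                             # unobserved confounder
region.effect <- c(A = 0, B = 0.8, C = -0.8)[as.character(z)]
RT <- rbinom(n, 1, plogis(-0.5 + region.effect + 0.5 * x + u))

haz <- 0.1 * exp(-0.5 * RT + 0.3 * x + 1 * u)   # proportional hazards in RT, x, u
t.event <- rexp(n, rate = haz)
t.cens <- rexp(n, rate = 0.05)                  # independent censoring
d <- data.frame(time = pmin(t.event, t.cens),
                status = as.numeric(t.event <= t.cens),
                RT = RT, x = x, z = z)

# Stage 1: model treatment on the instrument and covariates, keep the residual
stage1 <- glm(RT ~ z + x, data = d, family = binomial)
d$U.est <- d$RT - fitted(stage1)

# Stage 2: Cox model with the residual included alongside the treatment
fit.naive <- coxph(Surv(time, status) ~ RT + x, data = d)
fit.2sri <- coxph(Surv(time, status) ~ RT + U.est + x, data = d)
rbind(naive = coef(fit.naive)["RT"], twosri = coef(fit.2sri)["RT"])
```

The standard errors reported by the second-stage `coxph` fit treat `U.est` as known, so they do not account for first-stage estimation.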

# The IV - Geographic Region

# let's see if region is a reasonable IV
setwd("/Users/leopekelis/Desktop/13_youssef_mac")

code.data <- read.csv("seer minor salivary gland 1988-2008 - staged and coded v1.1 - Coded Data.csv")
bkgd.data <- read.csv("seer minor salivary gland 1988-2008 - staged and coded v1.1 - Background Data.csv")
loc.data <- read.csv("seer minor salivary gland 1988-2008 - staged and coded v1.1 - Registry ID.csv")

seer.data = merge(merge(code.data, bkgd.data, by = "Patient.ID"),
loc.data, by = "Patient.ID")

covs = c("Age.at.diagnosis", "Sex", "Race", "Year.of.diagnosis",
"Tumor.location", "T", "N", "Grade", "Histology", "Surgery")

RT.unknown = which(seer.data$Radiation.sequence.with.surgery == 7)
# remove these
seer.data = seer.data[-RT.unknown, ]
seer.data$Adj.RT = seer.data$Radiation.sequence.with.surgery %in% c(1, 3)
factor.idx = c(3, 4, 6, 7, 8, 10, 11, 28)
for (i in factor.idx) {
    seer.data[, i] = as.factor(seer.data[, i])
}
form = as.formula(paste("Adj.RT ~ Registry.ID + ", paste(covs, collapse = "+")))

# combine some locations
temp = seer.data$Registry.ID
levels(temp)[8], "Kentucky / Rural Georgia - 1992+", levels(temp)[10:13],
"Kentucky / Rural Georgia - 1992+", "California SF/SJM/LA - 1973+", "California SF/SJM/LA - 1973+",
levels(temp)[17:18])

temp = factor(levels(temp)[as.numeric(temp)], levels = c(levels(temp)[3],
levels(temp)[-3]))
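seer.data$Registry.ID = temp
adj.rt.vec = NULL
for (i in levels(seer.data$Registry.ID)) {
    Adj.RT.temp = seer.data$Adj.RT[seer.data$Registry.ID == i]
    # share of patients in registry i who received adjuvant RT
    # (assumed accumulation step; adj.rt.vec is otherwise never filled)
    adj.rt.vec = c(adj.rt.vec, mean(Adj.RT.temp))
}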
IV.data = cbind(round(adj.rt.vec, 2), round(table(seer.data$Registry.ID)/dim(seer.data)[1], 2))
colnames(IV.data) = c("Adj.RT.Percent", "Percent.Obs")
print(IV.data)  # let's get an overview of locations

##                                        Adj.RT.Percent Percent.Obs
## California excluding SF/SJM/LA - 2000+           0.22        0.15
## Alaska/Hawaii - 1973+                            0.19        0.02
## Atlanta (Metropolitan) - 1975+                   0.31        0.05
## Connecticut - 1973+                              0.23        0.06
## Detroit (Metropolitan) - 1973+                   0.25        0.10
## Greater Georgia - 2000+                          0.32        0.04
## Iowa - 1973+                                     0.23        0.06
## Kentucky / Rural Georgia - 1992+                 0.28        0.04
## Los Angeles - 1992+                              0.23        0.14
## Louisiana - 2000+                                0.33        0.03
## New Jersey - 2000+                               0.28        0.05
## New Mexico - 1973+                               0.21        0.03
## California SF/SJM/LA - 1973+                     0.30        0.13
## Seattle (Puget Sound) - 1974+                    0.22        0.08
## Utah - 1973+                                     0.16        0.03

IV.glm = glm(form, data = seer.data, family = binomial)
summary(IV.glm)

##
## Call:
## glm(formula = form, family = binomial, data = seer.data)
##
## Deviance Residuals:
##    Min     1Q Median     3Q    Max
## -2.846 -0.542 -0.343  0.248  2.595
##
## Coefficients:
##                                              Estimate Std. Error z value Pr(>|z|)
## (Intercept)                                 -4.65e+01   3.99e+02   -0.12   0.9073
## Registry.IDAlaska/Hawaii - 1973+            -5.15e-01   5.70e-01   -0.90   0.3667
## Registry.IDAtlanta (Metropolitan) - 1975+    6.81e-01   3.43e-01    1.98   0.0473 *
## Registry.IDConnecticut - 1973+               1.61e-01   3.29e-01    0.49   0.6251
## Registry.IDDetroit (Metropolitan) - 1973+    3.37e-01   2.83e-01    1.19   0.2342
## Registry.IDGreater Georgia - 2000+           6.73e-01   3.51e-01    1.92   0.0550 .
## Registry.IDIowa - 1973+                      1.11e-01   3.31e-01    0.33   0.7380
## Registry.IDKentucky / Rural Georgia - 1992+  4.71e-01   3.74e-01    1.26   0.2085
## Registry.IDLos Angeles - 1992+              -8.03e-02   2.57e-01   -0.31   0.7546
## Registry.IDLouisiana - 2000+                 3.00e-01   3.85e-01    0.78   0.4368
## Registry.IDNew Jersey - 2000+                1.38e-01   3.25e-01    0.42   0.6715
## Registry.IDNew Mexico - 1973+                1.69e-01   4.57e-01    0.37   0.7119
## Registry.IDCalifornia SF/SJM/LA - 1973+      6.00e-01   2.55e-01    2.36   0.0185 *
## Registry.IDSeattle (Puget Sound) - 1974+     3.09e-01   3.01e-01    1.03   0.3043
## Registry.IDUtah - 1973+                     -8.82e-01   5.13e-01   -1.72   0.0856 .
## Age.at.diagnosis                            -2.61e-04   3.99e-03   -0.07   0.9478
## Sex2                                         1.58e-01   1.32e-01    1.19   0.2329
## Race2                                        1.53e-01   3.13e-01    0.49   0.6253
## Race3                                        4.73e-03   2.04e-01    0.02   0.9815
## Race4                                       -5.03e-02   6.28e-01   -0.08   0.9362
## Year.of.diagnosis                            1.32e-02   2.29e-02    0.58   0.5624
## Tumor.location2                             -7.77e-01   2.68e-01   -2.90   0.0037 **
## Tumor.location3                              3.57e-01   7.45e-01    0.48   0.6312
## Tumor.location4                             -2.41e-01   4.63e-01   -0.52   0.6021
## Tumor.location5                             -1.34e+00   4.84e-01   -2.76   0.0057 **
## Tumor.location6                              1.27e-02   4.97e-01    0.03   0.9796
## Tumor.location7                              1.70e-01   3.17e-01    0.53   0.5927
## Tumor.location8                             -1.88e-01   7.61e-01   -0.25   0.8043
## Tumor.location9                              8.66e-02   3.23e-01    0.27   0.7889
## Tumor.location10                             4.83e-01   5.40e-01    0.90   0.3706
## T2                                           2.74e-01   1.88e-01    1.46   0.1448
## T3                                           1.09e+00   2.43e-01    4.47  7.7e-06 ***
## T4                                           1.18e+00   1.92e-01    6.13  8.7e-10 ***
## T5                                          -4.16e-01   2.32e-01   -1.79   0.0736 .
## N2                                           1.14e+00   3.74e-01    3.05   0.0023 **
## N3                                           1.11e+00   2.64e-01    4.20  2.7e-05 ***
## N4                                           8.20e-02   7.78e-01    0.11   0.9161
## N5                                          -4.41e-01   1.97e-01   -2.24   0.0252 *
## Grade                                        6.87e-01   8.33e-02    8.24  < 2e-16 ***
## Histology2                                  -1.81e+00   9.57e-01   -1.89   0.0588 .
## Histology3                                  -6.12e-01   9.63e-01   -0.63   0.5255
## Histology4                                  -2.32e+00   9.55e-01   -2.43   0.0150 *
## Histology5                                  -1.87e+01   2.24e+03   -0.01   0.9933
## Histology6                                  -8.94e-01   1.62e+00   -0.55   0.5807
## Histology7                                   1.28e+00   6.53e+03    0.00   0.9998
## Histology8                                  -1.95e+01   4.55e+03    0.00   0.9966
## Histology9                                  -1.51e+00   1.07e+00   -1.41   0.1578
## Histology10                                 -1.60e+00   1.07e+00   -1.50   0.1341
## Histology11                                 -1.81e+00   1.00e+00   -1.81   0.0704 .
## Surgery2                                     1.93e+01   3.97e+02    0.05   0.9611
## Surgery3                                     1.93e+01   3.97e+02    0.05   0.9611
## Surgery4                                     1.90e+01   3.97e+02    0.05   0.9618
## Surgery5                                     1.90e+01   3.97e+02    0.05   0.9618
## Surgery6                                     1.96e+01   3.97e+02    0.05   0.9607
## Surgery7                                     1.92e+01   3.97e+02    0.05   0.9615
## Surgery8                                     2.01e+01   3.97e+02    0.05   0.9596
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2389.7 on 2117 degrees of freedom
## Residual deviance: 1564.6 on 2062 degrees of freedom
## AIC: 1677
##
## Number of Fisher Scoring iterations: 17
##

# The IV - Geographic Region

1. Relevance?
  • not completely insignificant: Atlanta and California SF/SJM/LA differ from the reference region at the 5% level
2. No unmeasured confounders?
3. Exclusion restriction
  • can we check this?

# The IV - Geographic Region

# check for exclusion restriction by running cox model
library(survival)

## Loading required package: splines

names(seer.data)[14] = "Survival.Time"
form.ph = as.formula(paste("Surv(Survival.Time,as.numeric(Vital.status.recode)) ~ Adj.RT + Registry.ID + ", paste(covs, collapse = "+")))
ex.ph = coxph(form.ph, data = seer.data)

## Warning: Loglik converged before variable 43,44,45 ; beta may be infinite.

summary(ex.ph)

## Call:
## coxph(formula = form.ph, data = seer.data)
##
## n= 2118, number of events= 576
##
##                                                  coef exp(coef)  se(coef)      z Pr(>|z|)
## Adj.RTTRUE                                  -1.70e-01  8.44e-01  1.15e-01  -1.48  0.13891
## Registry.IDAlaska/Hawaii - 1973+             5.66e-01  1.76e+00  3.53e-01   1.60  0.10900
## Registry.IDAtlanta (Metropolitan) - 1975+    4.39e-03  1.00e+00  2.37e-01   0.02  0.98523
## Registry.IDConnecticut - 1973+              -3.00e-01  7.41e-01  2.19e-01  -1.37  0.17063
## Registry.IDDetroit (Metropolitan) - 1973+   -9.78e-02  9.07e-01  1.87e-01  -0.52  0.60067
## Registry.IDGreater Georgia - 2000+          -2.37e-02  9.77e-01  2.43e-01  -0.10  0.92215
## Registry.IDIowa - 1973+                     -1.65e-01  8.48e-01  2.13e-01  -0.78  0.43809
## Registry.IDKentucky / Rural Georgia - 1992+ -3.85e-01  6.81e-01  2.99e-01  -1.29  0.19810
## Registry.IDLos Angeles - 1992+              -8.54e-02  9.18e-01  1.78e-01  -0.48  0.63161
## Registry.IDLouisiana - 2000+                 5.39e-02  1.06e+00  2.64e-01   0.20  0.83821
## Registry.IDNew Jersey - 2000+                6.41e-02  1.07e+00  2.37e-01   0.27  0.78732
## Registry.IDNew Mexico - 1973+                1.05e-01  1.11e+00  3.06e-01   0.34  0.73073
## Registry.IDCalifornia SF/SJM/LA - 1973+     -1.50e-01  8.61e-01  1.82e-01  -0.82  0.41079
## Registry.IDSeattle (Puget Sound) - 1974+     4.47e-02  1.05e+00  2.05e-01   0.22  0.82751
## Registry.IDUtah - 1973+                     -6.31e-01  5.32e-01  2.98e-01  -2.12  0.03394 *
## Age.at.diagnosis                             5.75e-02  1.06e+00  3.48e-03  16.50  < 2e-16 ***
## Sex2                                         3.16e-01  1.37e+00  9.03e-02   3.50  0.00047 ***
## Race2                                       -5.84e-01  5.58e-01  2.38e-01  -2.45  0.01421 *
## Race3                                       -1.73e-01  8.41e-01  1.35e-01  -1.28  0.20132
## Race4                                       -1.52e+00  2.19e-01  1.01e+00  -1.50  0.13392
## Year.of.diagnosis                           -3.00e-02  9.70e-01  1.70e-02  -1.76  0.07762 .
## Tumor.location2                             -3.25e-01  7.23e-01  1.96e-01  -1.65  0.09806 .
## Tumor.location3                              1.66e-02  1.02e+00  3.26e-01   0.05  0.95945
## Tumor.location4                             -2.91e-01  7.47e-01  2.65e-01  -1.10  0.27199
## Tumor.location5                             -2.40e-01  7.86e-01  2.91e-01  -0.82  0.40941
## Tumor.location6                             -5.44e-02  9.47e-01  3.13e-01  -0.17  0.86208
## Tumor.location7                              3.38e-01  1.40e+00  2.17e-01   1.56  0.11811
## Tumor.location8                              2.92e-01  1.34e+00  3.87e-01   0.75  0.45067
## Tumor.location9                             -5.50e-02  9.46e-01  2.30e-01  -0.24  0.81105
## Tumor.location10                            -2.30e-01  7.94e-01  3.50e-01  -0.66  0.51094
## T2                                           2.13e-01  1.24e+00  1.42e-01   1.51  0.13169
## T3                                           6.51e-01  1.92e+00  1.59e-01   4.10  4.2e-05 ***
## T4                                           6.40e-01  1.90e+00  1.29e-01   4.97  6.7e-07 ***
## T5                                           5.83e-02  1.06e+00  1.52e-01   0.38  0.70126
## N2                                           2.46e-01  1.28e+00  2.13e-01   1.16  0.24747
## N3                                           8.43e-01  2.32e+00  1.47e-01   5.71  1.1e-08 ***
## N4                                           1.64e+00  5.15e+00  3.98e-01   4.12  3.8e-05 ***
## N5                                           6.52e-02  1.07e+00  1.15e-01   0.57  0.57101
## Grade                                        4.97e-01  1.64e+00  5.51e-02   9.03  < 2e-16 ***
## Histology2                                  -3.94e-01  6.74e-01  6.01e-01  -0.66  0.51184
## Histology3                                  -2.87e-01  7.51e-01  6.08e-01  -0.47  0.63718
## Histology4                                  -6.85e-01  5.04e-01  6.00e-01  -1.14  0.25358
## Histology5                                  -1.50e+01  3.11e-07  1.14e+03  -0.01  0.98953
## Histology6                                  -1.48e+01  3.69e-07  3.37e+03   0.00  0.99649
## Histology7                                  -1.68e+01  5.06e-08  6.88e+03   0.00  0.99805
## Histology8                                  -3.78e-01  6.85e-01  1.18e+00  -0.32  0.74849
## Histology9                                  -3.59e-01  6.98e-01  6.55e-01  -0.55  0.58341
## Histology10                                 -9.10e-01  4.03e-01  8.43e-01  -1.08  0.28067
## Histology11                                  3.31e-01  1.39e+00  6.17e-01   0.54  0.59084
## Surgery2                                    -5.23e-01  5.93e-01  1.91e-01  -2.74  0.00622 **
## Surgery3                                    -4.41e-01  6.43e-01  2.90e-01  -1.52  0.12859
## Surgery4                                    -4.77e-01  6.21e-01  2.16e-01  -2.21  0.02718 *
## Surgery5                                     2.70e-01  1.31e+00  2.74e-01   0.98  0.32501
## Surgery6                                    -1.13e-01  8.93e-01  1.82e-01  -0.62  0.53421
## Surgery7                                    -4.53e-01  6.36e-01  2.21e-01  -2.05  0.04049 *
## Surgery8                                    -1.03e+00  3.56e-01  7.30e-01  -1.42  0.15682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                                             exp(coef) exp(-coef) lower .95 upper .95
## Adj.RTTRUE                                   8.44e-01   1.18e+00    0.6741     1.057
## Registry.IDAlaska/Hawaii - 1973+             1.76e+00   5.68e-01    0.8815     3.519
## Registry.IDAtlanta (Metropolitan) - 1975+    1.00e+00   9.96e-01    0.6312     1.598
## Registry.IDConnecticut - 1973+               7.41e-01   1.35e+00    0.4828     1.138
## Registry.IDDetroit (Metropolitan) - 1973+    9.07e-01   1.10e+00    0.6287     1.308
## Registry.IDGreater Georgia - 2000+           9.77e-01   1.02e+00    0.6066     1.572
## Registry.IDIowa - 1973+                      8.48e-01   1.18e+00    0.5585     1.287
## Registry.IDKentucky / Rural Georgia - 1992+  6.81e-01   1.47e+00    0.3788     1.223
## Registry.IDLos Angeles - 1992+               9.18e-01   1.09e+00    0.6476     1.302
## Registry.IDLouisiana - 2000+                 1.06e+00   9.47e-01    0.6288     1.771
## Registry.IDNew Jersey - 2000+                1.07e+00   9.38e-01    0.6695     1.698
## Registry.IDNew Mexico - 1973+                1.11e+00   9.00e-01    0.6103     2.022
## Registry.IDCalifornia SF/SJM/LA - 1973+      8.61e-01   1.16e+00    0.6026     1.230
## Registry.IDSeattle (Puget Sound) - 1974+     1.05e+00   9.56e-01    0.6995     1.563
## Registry.IDUtah - 1973+                      5.32e-01   1.88e+00    0.2969     0.953
## Age.at.diagnosis                             1.06e+00   9.44e-01    1.0520     1.066
## Sex2                                         1.37e+00   7.29e-01    1.1488     1.637
## Race2                                        5.58e-01   1.79e+00    0.3498     0.889
## Race3                                        8.41e-01   1.19e+00    0.6452     1.097
## Race4                                        2.19e-01   4.57e+00    0.0300     1.596
## Year.of.diagnosis                            9.70e-01   1.03e+00    0.9387     1.003
## Tumor.location2                              7.23e-01   1.38e+00    0.4920     1.062
## Tumor.location3                              1.02e+00   9.84e-01    0.5362     1.928
## Tumor.location4                              7.47e-01   1.34e+00    0.4446     1.256
## Tumor.location5                              7.86e-01   1.27e+00    0.4444     1.392
## Tumor.location6                              9.47e-01   1.06e+00    0.5128     1.749
## Tumor.location7                              1.40e+00   7.13e-01    0.9176     2.144
## Tumor.location8                              1.34e+00   7.47e-01    0.6273     2.858
## Tumor.location9                              9.46e-01   1.06e+00    0.6030     1.486
## Tumor.location10                             7.94e-01   1.26e+00    0.3996     1.579
## T2                                           1.24e+00   8.08e-01    0.9379     1.634
## T3                                           1.92e+00   5.22e-01    1.4044     2.617
## T4                                           1.90e+00   5.27e-01    1.4738     2.442
## T5                                           1.06e+00   9.43e-01    0.7870     1.428
## N2                                           1.28e+00   7.82e-01    0.8427     1.942
## N3                                           2.32e+00   4.31e-01    1.7398     3.101
## N4                                           5.15e+00   1.94e-01    2.3615    11.241
## N5                                           1.07e+00   9.37e-01    0.8518     1.338
## Grade                                        1.64e+00   6.08e-01    1.4762     1.832
## Histology2                                   6.74e-01   1.48e+00    0.2076     2.189
## Histology3                                   7.51e-01   1.33e+00    0.2282     2.471
## Histology4                                   5.04e-01   1.98e+00    0.1554     1.634
## Histology5                                   3.11e-07   3.22e+06    0.0000       Inf
## Histology6                                   3.69e-07   2.71e+06    0.0000       Inf
## Histology7                                   5.06e-08   1.98e+07    0.0000       Inf
## Histology8                                   6.85e-01   1.46e+00    0.0680     6.908
## Histology9                                   6.98e-01   1.43e+00    0.1936     2.520
## Histology10                                  4.03e-01   2.48e+00    0.0771     2.102
## Histology11                                  1.39e+00   7.18e-01    0.4160     4.664
## Surgery2                                     5.93e-01   1.69e+00    0.4074     0.862
## Surgery3                                     6.43e-01   1.55e+00    0.3643     1.136
## Surgery4                                     6.21e-01   1.61e+00    0.4064     0.948
## Surgery5                                     1.31e+00   7.63e-01    0.7652     2.243
## Surgery6                                     8.93e-01   1.12e+00    0.6249     1.276
## Surgery7                                     6.36e-01   1.57e+00    0.4120     0.981
## Surgery8                                     3.56e-01   2.81e+00    0.0850     1.488
##
## Concordance= 0.853 (se = 0.013 )
## Rsquare= 0.374 (max possible= 0.979 )
## Likelihood ratio test= 991 on 56 df, p=0
## Wald test = 555 on 56 df, p=0
## Score (logrank) test = 1158 on 56 df, p=0
##

# Results of 2SRI for Minor Salivary Gland Tumors

# conclusion? Utah is directly associated with survival (hazard ratio 0.53, p = 0.034),
# which violates the exclusion restriction, so drop the Utah patients
utah.patients = which(seer.data$Registry.ID == levels(seer.data$Registry.ID)[15])
seer.data = seer.data[-utah.patients, ]
seer.data$Registry.ID = factor(seer.data$Registry.ID, levels = levels(seer.data$Registry.ID)[1:14])

# now get the fitted values for IV
Adj.RT.IV = glm(form, data = seer.data, family = binomial)$fitted.values
seer.data$U.est = seer.data$Adj.RT - Adj.RT.IV
form.final = as.formula(paste("Surv(Survival.Time,as.numeric(Vital.status.recode)) ~ Adj.RT + U.est + ", paste(covs, collapse = "+")))
final.ph = coxph(form.final, data = seer.data)

## Warning: Loglik converged before variable 30,31,32 ; beta may be infinite.

summary(final.ph)

## Call:
## coxph(formula = form.final, data = seer.data)
##
## n= 2062, number of events= 560
##
##                        coef exp(coef)  se(coef)      z Pr(>|z|)
## Adj.RTTRUE         7.38e-01  2.09e+00  3.97e-01   1.86  0.06316 .
## U.est             -1.03e+00  3.57e-01  4.19e-01  -2.46  0.01399 *
## Age.at.diagnosis   5.63e-02  1.06e+00  3.49e-03  16.14  < 2e-16 ***
## Sex2               3.01e-01  1.35e+00  9.18e-02   3.28  0.00103 **
## Race2             -4.71e-01  6.25e-01  2.08e-01  -2.26  0.02388 *
## Race3             -1.47e-01  8.63e-01  1.30e-01  -1.13  0.25821
## Race4             -1.55e+00  2.12e-01  1.01e+00  -1.53  0.12488
## Year.of.diagnosis -3.44e-02  9.66e-01  1.65e-02  -2.08  0.03782 *
## Tumor.location2   -2.76e-01  7.58e-01  1.99e-01  -1.39  0.16436
## Tumor.location3   -1.03e-01  9.02e-01  3.24e-01  -0.32  0.74975
## Tumor.location4   -3.98e-01  6.72e-01  2.63e-01  -1.51  0.13121
## Tumor.location5   -5.15e-02  9.50e-01  2.96e-01  -0.17  0.86208
## Tumor.location6   -1.74e-01  8.40e-01  3.14e-01  -0.55  0.57915
## Tumor.location7    2.36e-01  1.27e+00  2.17e-01   1.09  0.27524
## Tumor.location8    1.83e-01  1.20e+00  3.96e-01   0.46  0.64479
## Tumor.location9   -8.39e-02  9.19e-01  2.32e-01  -0.36  0.71762
## Tumor.location10  -4.49e-01  6.38e-01  3.56e-01  -1.26  0.20711
## T2                 1.54e-01  1.17e+00  1.43e-01   1.08  0.28225
## T3                 4.77e-01  1.61e+00  1.70e-01   2.80  0.00506 **
## T4                 4.81e-01  1.62e+00  1.47e-01   3.28  0.00106 **
## T5                 6.47e-02  1.07e+00  1.52e-01   0.42  0.67108
## N2                 1.50e-01  1.16e+00  2.21e-01   0.68  0.49785
## N3                 7.38e-01  2.09e+00  1.60e-01   4.63  3.7e-06 ***
## N4                 1.69e+00  5.42e+00  3.93e-01   4.30  1.7e-05 ***
## N5                 9.85e-02  1.10e+00  1.17e-01   0.84  0.39898
## Grade              4.21e-01  1.52e+00  6.83e-02   6.16  7.2e-10 ***
## Histology2        -1.63e-01  8.50e-01  6.07e-01  -0.27  0.78869
## Histology3        -2.42e-01  7.85e-01  6.07e-01  -0.40  0.69017
## Histology4        -4.08e-01  6.65e-01  6.11e-01  -0.67  0.50469
## Histology5        -1.44e+01  5.61e-07  1.13e+03  -0.01  0.98980
## Histology6        -1.45e+01  5.29e-07  2.97e+03   0.00  0.99612
## Histology7        -1.67e+01  5.86e-08  6.09e+03   0.00  0.99782
## Histology8         1.17e-01  1.12e+00  1.19e+00   0.10  0.92178
## Histology9        -1.84e-01  8.32e-01  6.57e-01  -0.28  0.77963
## Histology10       -5.69e-01  5.66e-01  8.44e-01  -0.67  0.50048
## Histology11        5.06e-01  1.66e+00  6.21e-01   0.82  0.41461
## Surgery2          -1.02e+00  3.62e-01  2.84e-01  -3.57  0.00035 ***
## Surgery3          -9.91e-01  3.71e-01  3.72e-01  -2.67  0.00765 **
## Surgery4          -9.54e-01  3.85e-01  2.85e-01  -3.35  0.00080 ***
## Surgery5          -2.45e-01  7.83e-01  3.57e-01  -0.69  0.49218
## Surgery6          -7.21e-01  4.86e-01  2.99e-01  -2.41  0.01599 *
## Surgery7          -1.01e+00  3.63e-01  2.94e-01  -3.45  0.00056 ***
## Surgery8          -1.71e+00  1.81e-01  7.78e-01  -2.20  0.02791 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                   exp(coef) exp(-coef) lower .95 upper .95
## Adj.RTTRUE         2.09e+00   4.78e-01    0.9604     4.552
## U.est              3.57e-01   2.80e+00    0.1569     0.812
## Age.at.diagnosis   1.06e+00   9.45e-01    1.0507     1.065
## Sex2               1.35e+00   7.40e-01    1.1291     1.618
## Race2              6.25e-01   1.60e+00    0.4152     0.940
## Race3              8.63e-01   1.16e+00    0.6683     1.114
## Race4              2.12e-01   4.73e+00    0.0291     1.538
## Year.of.diagnosis  9.66e-01   1.03e+00    0.9354     0.998
## Tumor.location2    7.58e-01   1.32e+00    0.5137     1.120
## Tumor.location3    9.02e-01   1.11e+00    0.4775     1.703
## Tumor.location4    6.72e-01   1.49e+00    0.4009     1.126
## Tumor.location5    9.50e-01   1.05e+00    0.5314     1.698
## Tumor.location6    8.40e-01   1.19e+00    0.4536     1.556
## Tumor.location7    1.27e+00   7.89e-01    0.8284     1.937
## Tumor.location8    1.20e+00   8.33e-01    0.5521     2.611
## Tumor.location9    9.19e-01   1.09e+00    0.5834     1.449
## Tumor.location10   6.38e-01   1.57e+00    0.3178     1.282
## T2                 1.17e+00   8.57e-01    0.8810     1.545
## T3                 1.61e+00   6.21e-01    1.1542     2.248
## T4                 1.62e+00   6.18e-01    1.2131     2.158
## T5                 1.07e+00   9.37e-01    0.7913     1.438
## N2                 1.16e+00   8.61e-01    0.7534     1.791
## N3                 2.09e+00   4.78e-01    1.5302     2.860
## N4                 5.42e+00   1.84e-01    2.5097    11.726
## N5                 1.10e+00   9.06e-01    0.8778     1.387
## Grade              1.52e+00   6.57e-01    1.3322     1.741
## Histology2         8.50e-01   1.18e+00    0.2585     2.794
## Histology3         7.85e-01   1.27e+00    0.2391     2.579
## Histology4         6.65e-01   1.50e+00    0.2007     2.204
## Histology5         5.61e-07   1.78e+06    0.0000       Inf
## Histology6         5.29e-07   1.89e+06    0.0000       Inf
## Histology7         5.86e-08   1.71e+07    0.0000       Inf
## Histology8         1.12e+00   8.90e-01    0.1091    11.576
## Histology9         8.32e-01   1.20e+00    0.2297     3.015
## Histology10        5.66e-01   1.77e+00    0.1082     2.963
## Histology11        1.66e+00   6.03e-01    0.4916     5.600
## Surgery2           3.62e-01   2.76e+00    0.2077     0.632
## Surgery3           3.71e-01   2.69e+00    0.1793     0.769
## Surgery4           3.85e-01   2.60e+00    0.2205     0.673
## Surgery5           7.83e-01   1.28e+00    0.3889     1.575
## Surgery6           4.86e-01   2.06e+00    0.2706     0.874
## Surgery7           3.63e-01   2.75e+00    0.2041     0.646
## Surgery8           1.81e-01   5.53e+00    0.0393     0.831
##
## Concordance= 0.854 (se = 0.013 )
## Rsquare= 0.375 (max possible= 0.979 )
## Likelihood ratio test= 968 on 43 df, p=0
## Wald test = 550 on 43 df, p=0
## Score (logrank) test = 1127 on 43 df, p=0
##

# variance for beta_AdjRT should be the same ... testing
# boot.func.ART <- function(data,idx) {
#   data.temp = data[idx,]
#   final.ph = coxph(form.final,data=data.temp)
#   return(c(final.ph$coef[1],final.ph$var[1,1]))
# }
# library(boot)
# final.ph.boot = boot(data=seer.data,statistic=boot.func.ART,R=1000)
# var.est = var(final.ph.boot$t[,1])
# z.boot = final.ph.boot$t[,1] / final.ph.boot$t[,2]
# p.val = sum(z.boot > 0) / 1000
# exp(coef(final.ph)[1] + c(-1,1) * 2 * sqrt(var.est))
# [1] 0.8712999 5.0171531
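A bootstrap for 2SRI can also re-estimate the first stage inside every resample, so that the uncertainty in the estimated residual is carried through. The sketch below does this on simulated logistic data rather than SEER; the function `est.2sri`, the data, and the number of replicates are assumptions for illustration, and it uses a plain `replicate` loop instead of the `boot` package.

```r
# Bootstrap both stages of 2SRI (simulated data, assumed values)
set.seed(7)
n <- 2000
z <- rnorm(n)
u <- rnorm(n)
d <- data.frame(z = z, s = z + u)
d$y <- rbinom(n, 1, plogis(d$s + 2 * u))     # true beta_s = 1

# Hypothetical helper: run stage 1 and stage 2 on one data set, return beta_s
est.2sri <- function(data) {
  data$r.hat <- resid(lm(s ~ z, data = data))                   # stage 1 residual
  fit <- glm(y ~ s + r.hat, data = data, family = binomial)     # stage 2
  unname(coef(fit)["s"])
}

R <- 200                                      # number of bootstrap replicates
boot.est <- replicate(R, est.2sri(d[sample(n, replace = TRUE), ]))

est <- est.2sri(d)
se.boot <- sd(boot.est)
ci <- est + c(-1, 1) * 1.96 * se.boot         # normal-approximation interval
c(estimate = est, se = se.boot, lower = ci[1], upper = ci[2])
```

Resampling rows and refitting both stages differs from the commented-out code above, which resamples rows of a data frame that already contains a fixed `U.est`.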

# References

1. Joseph V. Terza, Anirban Basu, Paul J. Rathouz, Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling, Journal of Health Economics, Volume 27, Issue 3, May 2008, Pages 531-543