# Checking the assumptions in Cox regression:

# methods and illustration for the melanoma data

 

# Cox's regression model assumes that the hazard for an individual with vector of fixed covariates

# x = (x1,x2, ... ,xp)T takes the form: a(t|x) = a0(t)*exp{bTx}. The model is based on two key assumptions:

 

# 1) Log-linear effect of numeric covariates: log a(t|x) = log a0(t) + bTx

#���� This means that one unit's increase in a numeric covariate xj should have the same effect whatever

#���� the value of xj and whatever the values of the other covariates (if no interaction effects are included).

 

# 2) Proportional hazards: a(t|x1)/ a(t|x2) = exp{{bT(x1-x2)}.

#���� This means that the hazard ratio is independent of time t. In particular if x1 and x2 are equal

#���� except for the j-th component, where x2j = x1j +1, the hazard rate becomes exp(bj).

 

 

# We will illustrate the model checking methods using the melanoma data.

# The commands below assume that the melanoma data set has been stored

# in the dataframe melanoma, that ulcer is defined as a factor, and that

# the survival library has been attached.

 

 

# We start out the model:

cox.fit=coxph(Surv(lifetime,status==1)~ulcer+thickn,data=melanoma)

 

 

# We first check whether there is a log-linear effect of the numeric covariate thickness

 

# A simple method is to make a categorical variable for thickness, estimate a separate effect

# for each thickness group and see if we get a fairly linear trend for the estimates:

quantile(melanoma$thickn[melanoma$status==1])

melanoma$thickg=cut(melanoma$thickn,breaks=c(0,2,4,6,20),labels=1:4)

cox.fitg=coxph(Surv(lifetime,status==1)~ulcer+thickg,data=melanoma)

summary(cox.fitg)

th=rep(0,4)

for (i in 1:4) th[i]=mean(melanoma$thickn[melanoma$thickg==i])

plot(th,c(0,cox.fitg$coef[2:4]),pch=1)

 

# The plot indicates that the effect of thickness is not log-linear.

 

 

# A more advanced method is to fit a penalized smoothing spline for the effect of thickness:

cox.fitps=coxph(Surv(lifetime,status==1)~ulcer+pspline(thickn),data=melanoma)

print(cox.fitps)

termplot(cox.fitps)

 

# (When you use the termplot command, you need to click repeatedly on the graphical window to see the plots.)

 

# Both the formal test (from the print command) and the plot (from the termplot command) indicate that the effect of thickness is not log-linear.

# (As there are only 7 individuals with a survival time of more than 12 years, you should not put much emphasis on the estimated curve for long survival times.)

 

 

# The model check above indicates that it is better to use log2-thickness, so we define log2-thickness as a new numeric covariate and continue with the model:

melanoma$log2thick=log(melanoma$thickn,2)

cox.fit2=coxph(Surv(lifetime,status==1)~ulcer+log2thick,data=melanoma)

 

# We then check for a log-linear effect of the covariate log2-thickness:

melanoma$log2thick=log(melanoma$thickn,2)

cox.fitlps=coxph(Surv(lifetime,status==1)~ulcer+pspline(log2thick),data=melanoma)

print(cox.fitlps)

termplot(cox.fitlps)

 

# Now both the formal test (from the print command) and the plot (from the termplot command) indicate that the effect of log2-thickness is reasonably log-linear.

 

 

 

# We then check the assumption of proportional hazards

 

# A simple method is to fit a stratified Cox model, where we stratify according to the levels of a categorical covariate,

# and plot the stratum-specific cumulative baseline hazard estimates on a log-scale.

# If we have proportional hazards, the curves should be fairly parallel (i.e. have a constant vertical distance).

 

# We first check for ulceration:

cox.fitstru=coxph(Surv(lifetime,status==1)~strata(ulcer)+log2thick,data=melanoma)

plot(survfit(cox.fitstru,newdata=data.frame(log2thick=0)),fun="cumhaz",lty=c(1,2),log=T)

 

# We then check for thickness:

cox.fitstrth=coxph(Surv(lifetime,status==1)~ulcer+strata(thickg),data=melanoma)

plot(survfit(cox.fitstrth,newdata=data.frame(ulcer=0)),fun="cumhaz",lty=c(1,2,3,4),log=T)

 

# For both covariates the curves are fairly parallel, but there is a tendency, especially for tumor-thickness, that the curves are closer together for large values of t than for smaller ones.

 

 

# We finally make a formal test for proportionality of the covariates. This is done by, for each covariate x,

# adding time-dependent covariate x*log(t), and testing whether the time-dependent covariates are significant using a score test

cox.zph(cox.fit2,transform='log')

 

# The test shows that the effect of tumor-thickness is not proportional.

 

#We also make plots that give nonparametric estimates of the (possible) time dependent effect of the covariates:

par(mfrow=c(1,2))

plot(cox.zph(cox.fit2))

 

# The plot indicate that the effect of tumor thickness is decreasing with time t.