Survival Analysis

When applied to people in a medical context, life data analysis is often referred to as survival analysis.  Here we present such an example.

This is a hypothetical case study of the influence of various patient characteristics on survival rates for breast cancer. The survival analysis technique employed is Cox Regression. This technique is useful in situations where we have censored observations, that is, where some of the patients do not die during the observation period, so their survival times are known only to exceed their follow-up time. (If all patients had died during the observation period, then we could have used another technique, such as linear regression, to generate a predictive model of survival times.)
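To make the idea of censoring concrete, here is a minimal sketch of how such data might be represented: each patient has a follow-up time and a flag saying whether death was observed. The `Patient` class, the `mean_months` helper, and the four records are hypothetical illustrations, not the study data; only the 133.8-month observation period comes from the case study.

```python
from dataclasses import dataclass

OBSERVATION_END = 133.8  # months; length of the observation period in the case study


@dataclass
class Patient:
    months: float  # follow-up time in months
    died: bool     # True if death was observed; False if the observation is censored


# Hypothetical records for illustration only (not the study data).
patients = [
    Patient(12.0, True),
    Patient(47.5, True),
    Patient(OBSERVATION_END, False),  # still alive when observation ended
    Patient(OBSERVATION_END, False),  # still alive when observation ended
]


def mean_months(records):
    """Arithmetic mean of the follow-up times in `records`."""
    return sum(p.months for p in records) / len(records)


deaths_only = [p for p in patients if p.died]

# Averaging only the observed deaths ignores everyone who outlived the study.
print("Mean over observed deaths only:", mean_months(deaths_only))

# Averaging all follow-up times treats censoring times as if they were death
# times, so the result is only a lower bound on the true mean survival time.
print("Mean over all follow-up times:", mean_months(patients))
```

Neither average is a sound estimate of survival time, which is why a method that handles censoring explicitly is needed.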

Data and Method

The observation period runs for 133.8 months. The modeling sample contains 746 patients, including 50 patients who died during the observation period and 696 who survived beyond the end of the observation period.

Our dependent variable (or "status" variable) has two values: "survived" vs. "died."  In this simple example, we are testing only four predictors:

  • Age, in years, at the start of the observation period
  • Pathological tumor size, in centimeters
  • Number of positive axillary lymph nodes
  • Estrogen receptor status (positive vs. negative)

Here are the value ranges for the predictor variables:

  • Age: 22 to 88
  • Pathological tumor size: 0.10 to 7.00 centimeters
  • Number of positive lymph nodes: zero to 35
  • Estrogen receptor status: positive vs. negative


First, for those who have a statistical background: the Cox Regression used a backward stepwise variable selection method based on likelihood-ratio tests, which compare the -2 log likelihood of the maximum partial likelihood fits with and without each variable. Significance criteria were set at 0.05 for inclusion in the model and 0.10 for removal from the model.
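For readers who want to see what the -2 log likelihood refers to, the following is a rough sketch of the Cox partial likelihood and of a one-variable likelihood-ratio test. The function names and the tiny dataset are our own hypothetical illustrations, and the Breslow-style treatment of tied death times is an assumption; this is not the routine that produced the printout.

```python
import math


def neg2_log_partial_likelihood(beta, X, times, events):
    """-2 * log partial likelihood of a Cox model (Breslow-style handling of ties).

    beta   : list of coefficients, one per predictor
    X      : list of predictor rows, one row per patient
    times  : follow-up time for each patient
    events : True if death was observed, False if censored
    """
    # Linear predictor (sum of coefficient * value) for each patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    log_pl = 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored patients contribute only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        log_pl += eta[i] - math.log(risk)
    return -2.0 * log_pl


def lr_p_value_1df(change_in_neg2ll):
    """P-value for a likelihood-ratio test with one degree of freedom.

    Uses the identity P(chi-square_1 > x) = erfc(sqrt(x / 2)).
    """
    x = max(change_in_neg2ll, 0.0)
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical three-patient dataset with a single predictor.
X = [[1.0], [2.0], [0.5]]
times = [1.0, 2.0, 3.0]
events = [True, True, False]

without_predictor = neg2_log_partial_likelihood([0.0], X, times, events)
with_predictor = neg2_log_partial_likelihood([0.5], X, times, events)
print("-2LL, coefficient fixed at 0:", without_predictor)
print("-2LL, coefficient 0.5       :", with_predictor)
```

In a real stepwise procedure the coefficients are re-estimated at each step, and the change in -2 log likelihood between the two fitted models is passed to a test such as `lr_p_value_1df` and compared with the inclusion and removal criteria.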

Here is some of the actual computer printout from the final step of the stepwise regression analysis:

[Table: Cox regression model statistics]

Since this is intended to be a non-technical discussion, we will not explain all the statistics in this table. But some key things to note are:

  • Estrogen status was removed as a predictor because it did not reach the 0.05 significance criterion for inclusion, and it showed no appreciable correlation with the dependent variable.  (The column labeled "Sig" shows the statistical significance of included variables; the column labeled "R" shows the degree of unique correlation with the dependent variable.)
  • Number of positive axillary lymph nodes was the strongest predictor of survival rates over the course of the observation period (R = .1443 / Sig. = .0001)
  • Pathological tumor size was the second-best predictor (R = .1259 / Sig. = .0007), and is nearly as strong a predictor as number of positive axillary lymph nodes
  • Age, although significant, is somewhat less influential than the other two predictors (R = -.0893 / Sig. = .0094)

Note that both the number of positive axillary lymph nodes and the pathological tumor size are positively correlated with the dependent variable, which means that higher values of each are associated with more rapid mortality. In contrast, age is negatively correlated with the dependent variable, which means that older age is associated with somewhat slower mortality in this sample; equivalently, younger age is predictive of somewhat shorter survival.
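The direction of each effect is usually read from the hazard ratio, exp(B), where B is the Cox regression coefficient: a positive B gives a ratio above 1 (hazard rises as the predictor increases) and a negative B gives a ratio below 1 (hazard falls as the predictor increases). The sketch below illustrates this; the coefficient values are hypothetical placeholders chosen only for their signs, not the values from the printout.

```python
import math


def hazard_ratio(coef, delta=1.0):
    """Multiplicative change in hazard when a predictor increases by `delta` units."""
    return math.exp(coef * delta)


# Hypothetical coefficients chosen only to show the sign convention.
# They are NOT the estimates from the case study printout.
hypothetical_coefs = {
    "positive_lymph_nodes": 0.08,
    "tumor_size_cm": 0.30,
    "age_years": -0.03,
}

for name, coef in hypothetical_coefs.items():
    hr = hazard_ratio(coef)
    direction = "raises" if hr > 1.0 else "lowers"
    print(f"{name}: one-unit increase {direction} the hazard (ratio {hr:.3f})")

# Effect of a ten-year age difference under the same hypothetical coefficient.
print("age_years, +10 years:", round(hazard_ratio(hypothetical_coefs["age_years"], 10.0), 3))
```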

The following chart shows the cumulative survival function during the observation period:

[Figure: breast cancer cumulative survival graph]

Several things are immediately apparent from this chart:

  • All patients survive through the tenth month of the observation period, at which time we begin to observe a fairly constant mortality rate which runs through the fortieth month
  • At the fortieth month, the mortality rate increases and continues at this fairly constant increased rate through the forty-fifth month
  • At the forty-fifth month, there is a five-month period without additional mortality, after which time the mortality continues at a fairly constant rate until the end of the observation period, by which time approximately 11% of the original sample has died
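As a simple illustration of how a cumulative survival curve can be built from censored data, here is a sketch of the Kaplan-Meier product-limit estimator. This is a different, nonparametric technique from the Cox model behind the chart above, and the `kaplan_meier` function and its five-patient input are hypothetical, not the study data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : True if death was observed, False if censored
    Returns a list of (time, estimated survival probability) pairs,
    one for each distinct time at which a death was observed.
    """
    death_times = sorted({t for t, died in zip(times, events) if died})
    survival = 1.0
    curve = []
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Deaths observed exactly at time t.
        deaths = sum(1 for u, died in zip(times, events) if died and u == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data in months (not the study data).
times = [10, 20, 20, 30, 40]
events = [True, True, False, False, False]

for month, surv in kaplan_meier(times, events):
    print(f"month {month}: estimated cumulative survival {surv:.2f}")
```

The curve steps down only at months where a death occurs and stays flat in between, which is why periods without additional mortality appear as plateaus on a chart of this kind.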

Conclusions and Implications

The case study presented here is relatively simple, and is for illustrative purposes only. However, with the addition of more candidate predictors (e.g., progesterone receptor status, histologic grade, etc.), an even more powerful model could emerge.

By understanding the influence of patient characteristics on mortality rates over time, we are in a better position to estimate survival times for individual patients, and to defend using different or more aggressive therapeutic approaches for some patients.

The foregoing case study is an edited version of one originally furnished by SPSS, and is used with their permission. 
