We present BGPhazard
, an R package which computes hazard
rates from a Bayesian nonparametric view. This is achieved by computing
the posterior distribution of a gamma or a beta process through a Gibbs
sampler. The purpose of this document is to guide the user on how to use
the package rather than conducting a thorough analysis of the
theoretical results. Nevertheless, section 2 briefly discuss the main
results of the models proposed by Nieto-Barajas and Walker (2002) and by
Nieto-Barajas (2003). These results will be helpful to understand the
usage of the functions contained in the package. In section 3 we show
some examples to illustrate the models.
Survival analysis focuses on studying data related to the occurrence time of an event. A typical function is the survival function, which in nonparametric statistics is estimated through the product limit estimator (Kaplan & Meier, 1958). This estimator is used as an approximation to the survival function. In some cases, its stair-step nature can return misleading estimators in a neighborhood of the steps: just before and after each we will have significant differences between estimators.
Lack of smoothness on nonparametric estimations give rise to methods whose outputs are smooth functions. One of many approaches is given by Nieto-Barajas and Walker (2002) for the survival function. The theory behind these models combines Bayesian Statistics and Survival Analysis to obtain hazard rate estimates. Bayesian Statistics let us introduce previous knowledge to a data set to improve estimations. Nieto-Barajas & Walker (2002) estimate the hazard function in segments by introducing dependence between each other so that information is shared and a smooth hazard rate is obtained. We review the three models contained in the package: beta model for discrete data, gamma model for continuous data and cox-gamma model for continuous data in a proportional hazards model.
A consequence of using a nonparametric Bayesian model with a dependence structure is that the resulting estimators are smoother than those obtained with a frequentist nonparametric model.
In this brief review, we examine a generalization of the independent gamma process of Walker and Mallick (1997) –gamma model–; then, a generalization of the beta process introduced by Hjort (1990) –beta model– which is often used to model discrete failure times, and lastly, the proportional risk model extension to the gamma process that copes with explanatory variables that remain constant during time (Nieto-Barajas, 2003).
We provide nonparametric prior distributions for the hazard rate based on the dependence processes previously defined and we obtain the posterior distributions through a Bayesian update.
Let λk represent the gamma process and let πk represent the beta process. Let θk represent both λk and πk. For interpretation of the Markov model, the main priority is to ensure
E[θk + 1|θk] = a + bθk
A latent process {uk} is introduced in order to obtain {θk} from θ1 → u1 → θ2 → u2 → ⋯
Walker & Mallick (1997) consider {λk} as independent gamma variables, i.e., λk ∼ Ga(αk, βk) independent for k = 1, 2, ... Nieto-Barajas & Walker (2002) consider a dependent process for {λk}. They start with λ1 ∼ Ga(α1, β1) and take uk|λk ∼ Po(ckλk) and λk + 1|uk ∼ Ga(αk + 1 + uk, βk + 1 + ck) and so on. These updates arise from the joint density
f(u, λ) = Ga(λ|α, β)Po(u|cλ) and so constitute a Gibbs type update. The difference is that they are changing the parameters (α, β, c) at each update so the chain is not stationary and marginally the {λk} are not gamma.
However, $$ \mathrm{E}[\lambda_{k+1}|\lambda_k] = \frac{\alpha_{k+1}+c_k\lambda_k}{\beta_{k+1}+c_k} $$
If ck = 0, then P(uk = 0) = 1 and hence the {λk} are independent gamma and we have the prior process of Walker and Mallick (1997).
An important result is that if we take αk = α1 and βk = β1 to be constant for all k, then the process {uk} is a Poisson-gamma process with implied marginals λk ∼ Ga(α1, β1). One can note that if u1 is Poisson distributed and λ2|u1 is conditionally gamma, then λ2 is never gamma.
Nieto-Barajas and Walker (2002) start with π1 ∼ Be(α1, β1) and take uk|πk ∼ Bi(ck, πk), πk + 1|uk ∼ Be(αk + 1 + uk, βk + 1 + ck − uk) and so on. These arise from a binomial-beta conjugate set-up, from the joint density
f(u, π) = Be(π|α, β)Bi(u|c, π)
Clearly
$$ \mathrm{E}[\pi_{k+1}|\pi_k] = \frac{\alpha_{k+1}+c_k\pi_k}{\alpha_{k+1}+\beta_{k+1}+c_k} $$
As with the gamma process, if we choose ck = 0, then P(uk = 0) = 1 and so the {uk} become independent beta and we obtain the prior of Hjort (1990).
A significant result is that if we take αk = α1 and βk = β1 to be constant for all k, then the process {uk} is a binomial-beta process with marginals uk ∼ BiBe(α1, β1, ck). Moreover, the process {πk} becomes strictly stationary and marginally πk ∼ Be(α1, β1).
We use the gamma and beta processes to define nonparametric prior distributions. In order to obtain f(θ|data), we introduce the latent variable u and so constitute a Gibbs update for f(θ|u, data) and f(u|θ, data). As we generate a sample from f(θ, u|data), we automatically obtain a sample from f(θ|data). Therefore, given u from f(u|θ, data), we can obtain a sample from f(θ, u|data) by simulating from θ ∼ f(θ|u, data).
It can be shown that f(θ|u, data) ∝ f(data|θ) × f(θ|u) and f(u|θ, data) ∝ f(data|θ, u) × f(u|θ) ∝ f(u|θ) because u and data are conditionally independent given θ.
Let T be a continuous random variable with cumulative distribution function F(t) = P(T ≤ t) on [0, ∞). Consider the time axis partition 0 = τ0 < τ1 < τ2 < ⋯, and let λk be the hazard rate in the interval (τk − 1, τk], then the hazard function is given by $$ h (t) = \sum_{k=1}^{\infty} \lambda_kI_{(\tau_{k-1},\tau_k]}(t) $$
So, the cumulative distribution and density functions, given {λk}, are F(t|{λk}) = 1 − e−H(t), f(t|{λk}) = h(t)e−H(t), where H(t) = ∫0th(s)ds. We also have that f(λk|uk − 1, uk) = Ga(αk + uk − 1 + uk, βk + ck − 1 + ck)
Therefore, given a sample T1, T2, ..., Tn from f(t{λk}), it is straightforward to derive f(λk|uk − 1, uk, data) = Ga(αk + uk − 1 + uk + nk, βk + ck − 1 + ck + mk), where nk = number of uncensored observations in (τk − 1, τk], mk = ∑irki, and
$$ r_{ki} = \left \{ \begin{array}{l l} \tau_k - \tau_{k-1} & t_i > \tau_k \\ t_i - \tau_{k-1} & $t$ \in (\tau_{k-1},\tau_k] \\ 0 & $otherwise$ \end{array} \right. $$
Additionally,
$$ P(u_k=u | \lambda_k,\lambda_{k+1},data) \propto f(u|\lambda) \propto \frac{[c_k(c_k+\beta_{k+1})\lambda_k\lambda_{k+1}]^u}{\Gamma(u+1)\Gamma(\alpha_{k+1}+u)} $$
with u = 0, 1, 2, .... Hence, with these full conditional distributions, a Gibbs sampler is straightforward to implement in order to obtain posterior summaries.
We can learn about the {ck} by assigning an independent exponential distribution with mean ϵ for each ck, k = 1, .., K − 1. The Gibbs sampler can be extended to include the full conditional densities for each ck. It is not difficult to derive that a ck from f(ck|u, λ, data) can be taken from the density
$$ f(c_k|u_k,\lambda_k,\lambda_{k+1}) \propto (\beta_{k+1}+c_k)^{\alpha_{k+1}+u_k}c_k^{u_k}\exp\left\{ -c_k \left (\lambda_{k+1}+\lambda_k+\frac{1}{\epsilon} \right) \right\} $$ for ck > 0.
Dependence between ck′s can be introduced through a hierarchical model via assigning a distribution to ϵ ∼ Ga(a0, b0). The update would be given by: $$ f(\epsilon|\{c_k\}) = Ga\left(\epsilon |a_0 + K, b_0 + \sum_{k=1}^K c_k \right) $$ where K is the number of intervals generated by the time axis partition. This hierarchical specification of the initial distribution for ck let us assign a better value for ck.
Simulating from this distribution is not so straightforward. However, we construct a hybrid algorithm using a Metropolis-Hastings scheme taking advantage of the Markov chain generated by the Gibbs sampling.
Let T be a discrete random variable taking values in the set {τ1, τ2, ...} with probability density function f(τk) = P(T = τk). Let πk be the hazard rate at τk, then the cumulative distribution and the density functions, given {πk}, are $F(\tau_j|\{\pi_k\}) = 1 - \prod_{k=1}^j(1-\pi_k)$ and $f(\tau_j|\{\pi_k\}) = \pi_j \prod_{k=1}^{j-1}(1-\pi_k)$
The conditional distribution of πk is f(πk|uk − 1, uk) = Be(πk|αk + uk − 1 + uk, βk + ck − 1 − uk − 1 + ck − uk),
Thus, given a sample T1, T2, ..., Tn form f(⋅|{πk}) it is straightforward to derive f(πk|uk − 1, uk, data) = Be(πk|αk + uk − 1 + uk + nk, βk + ck − 1 − uk − 1 + ck − uk + mk), where nk = number of failures at τk, mk = ∑irki and
$$ r_{ki} = \left \{ \begin{array}{l l} 1 & \quad t_i > \tau_k \\ 0 & \quad $otherwise$ \end{array} \right. $$
Additionally, $$ P(u_k=u | \pi_k,\pi_{k+1}, data) \propto \frac{\theta_k^u}{\Gamma(u+1)\Gamma(c_k-u+1)\Gamma(\alpha_{k+1}+u)\Gamma(\beta_{k+1}+c_k-u)} $$ with u = 0, 1, ..., ck and $$ \theta_k = \frac{\pi_k\pi_{k+1}}{(1-\pi_k)(1-\pi_{k+1})} $$ As before, obtaining posterior summaries via Gibbs sampler is simple.
We can learn about the {ck} via assigning each ck an independent Poisson distribution with mean ϵ. The Gibbs sampler can be extended to include the full conditional densities for each ck. A ck from f(ck|u, π, data) can be taken from the density
$$ f(c_k|u_k,\pi_k,\pi_{k+1}) \propto \frac{\Gamma(\alpha_{k+1}+\beta_{k+1}+c_k)}{\Gamma(\beta_{k+1}+c_k-u_k)\Gamma(c_k-u_k+1)} \left[\epsilon (1-\pi_{k+1})(1-\pi_k) \right]^{c_k} $$ with ck ∈ {uk, uk + 1, uk + 2, ...}.
Dependence between ck’s can be introduced through a hierarchical model via assigning a distribution to ϵ ∼ Ga(a0, b0). So the update would be given by:
$$ f(\epsilon|\{c_k\}) = Ga\left(\epsilon |a_0 + K, b_0 + \sum_{k=1}^K c_k \right) $$ where K is the number of discrete values in random variable T.
Differing from most of the previous Bayesian analysis of the proportional hazards model, Nieto-Barajas (2003) models the baseline hazard rate function with a stochastic process. Let Ti be a non negative random variable which represents the failure time of i and Zi = (Zi1, ..., Zip) the vector containing its p explanatory variables. Therefore, the hazard function for individual i is: λi(t) = λ0(t)exp{Zi′θ} where λ0(t) is the baseline hazard rate and θ is regression’s coefficient vector. The cumulative hazard function for individual i becomes $$ H_i(t)=\sum_{k=1}^{\infty}\lambda_k W_{i,k}(t,\lambda), $$ where,
$$ W_{i,k}(t,\theta) = \left \{ \begin{array}{l l} (\tau_k - \tau_{k-1})\exp\{Z_i'\theta \} & t_i > \tau_k \\ (t_i - \tau_{k-1})\exp\{Z_i'\theta \} & t_i \in (\tau_{k-1},\tau_k] \\ 0 & otherwise \end{array} \right. $$
Given a sample of possible right-censored observations where T1, ..., Tn are uncensored and Tnu + 1, ..., Tn are right-censored, the conditional posterior distributions for the parameters of the semi-parametric model are:
where $n_k=\sum_{i=1}^{n_u}I_{(\tau_{k-1},\tau_k]}(t_i)$ and $m_k(\theta)=\sum_{i=1}^n W_{i,k}(t_i,\theta)$.
Similarly as we pointed out in the previous two cases, we incorporate a hyper prior process for the {ck} such that ck ∼ Ga(1, ϵk). The set of full conditional posterior distributions can then be extended to include
$$ f(c_k|u_k,\lambda_k,\lambda_{k+1}) \propto (\beta_{k+1}+c_k)^{\alpha_{k+1}+u_k}c_k^{u_k}\exp\left\{ -c_k \left (\lambda_{k+1}+\lambda_k+\frac{1}{\epsilon} \right) \right\} $$
Dependence between ck’s can be introduced through a hierarchical model via assigning a distribution to ϵ ∼ Ga(a0, b0). So the update would be given by: where K is the number of intervals generated by the time axis partition. This hierarchical specification of the initial distribution for ck let us assign a better value for ck.
For this model, we will be using the control observations of the 6-MP
data set (Freireich, E. J., et al., 1963) –data from a trial of 42
leukemia patients organised by pairs in which the first element of the
pair is treated with a drug and the other is control–. For examples 1 to
4, we use a partition of unitary length intervals; for the last three
examples –5, 6 & 7–, we use uniformly-dense intervals. The
"time"
column is taken as the observed times vector
–times
– and the "cens"
column as the censoring
status vector –delta
–:
data(gehan)
times <- gehan$time[gehan$treat == "6-MP"]
delta <- gehan$cens[gehan$treat == "6-MP"]
Now that we have a time and censoring status vector, we can run several examples for this model. Default values are used for each function unless otherwise noted. Every example shows our estimate overlapped with the Nelson-Aalen / Kaplan-Meier estimator, so the user can compare them.
We obtain with our model the Nelson-Aalen and Kaplan-Meier estimators
by defining ck as a null
vector –through fixing type.c = 1
–. A unitary partitioned
axis is obtained by fixing type.t = 2
. In Figure we show
that our estimators –under independence– return the same results as the
Nelson-Aalen and Kaplan-Meier estimators.
ExG1 <- GaMRes(times, delta, type.t = 2, K = 35, type.c = 1,
iterations = 3000)
GaPloth(ExG1, confint = FALSE)
The influence of c –or
c.r
– can be understood as a dependence parameter: the
greater the value of each ck, k = 0, 1, 2, ..., K − 1,
the higher the dependence between intervals k and k + 1. For this example, we assign a
fixed value to vector c, so we
use type.c = 2
. Comparing with the previous example, we see
that the estimates that were zero, now have a positive value –see
Figures and –. Note how this model compares to Example 5 –see Figure –.
The difference between those examples is how the partition is
defined.
ExG2 <- GaMRes(times, delta, type.t = 2, K = 35, type.c = 2,
c.r = rep(50, 34), iterations = 3000)
GaPloth(ExG2)
Additionally, we can get further detail on the Gibbs sampler with a diagnosis of the resulting Markov chain. We can run this diagnosis for each entry of λ, u, c or ϵ. In Figure we show the diagnosis for λ6 which includes the trace, the ergodic mean, the ACF function and the histogram for the generated chain.
GaPlotDiag(ExG2, variable = "lambda", pos = 6)
As we reviewed, we can learn about the {ck} via
assigning an exponential distribution with mean ϵ. The estimates using ϵ = 1 –type.c = 3
– and
unitary length intervals –type.t = 2
– are shown in Figure .
We can compare the hazard function with the previous example, where
c was fixed, and observe that
because of the variability given to c, the change on the estimated
values between two contiguous intervals are greater in this example. The
survival function echoes the shape of the Kaplan-Meier with higher
decrease rates.
Comparing this example with Example 6 –Figure –, as with the previous example, the main difference is given by the partition of the time axis. This affects the estimates as we will note later.
ExG3 <- GaMRes(times, delta, type.t = 2, K = 35, type.c = 3, epsilon = 1,
iterations = 3000)
GaPloth(ExG3)
Previous example can be extended with a hierarchical model, assigning
a distribution to ϵ ∼ Ga(a0, b0)
with a0 = b0 = 0.01.
In order to set up the model we should set type.c = 4
. The
result displayed on Figure is a soft hazard function and a survival
function that decreases faster than the Kaplan-Meier estimate.
ExG4 <- GaMRes(times, delta, type.t = 2, K = 35, type.c = 4,
iterations = 3000)
GaPloth(ExG4)
This example illustrates the same concept as Example 2 –how c introduces dependence–, but with a
different partition of the time axis. To get this partition we set
type.t = 1
. Figure shows that the survival function is
close to the Kaplan-Meier estimate. The fact that it gets closer to the
K-M estimates does not makes it a better estimate, we only could say
that this partition yields, in average, a smaller hazard rate than the
Example 2 built with a unitary length partition.
ExG5 <- GaMRes(times, delta, type.t = 1, K = 8, type.c = 2,
c.r = rep(50, 7), iterations = 3000)
GaPloth(ExG5)
We can compare this example on Figure with Example 3 –Figure –. We use less intervals, so the survival function is smoother. Our estimate decreases faster than the Kaplan-Meier estimate.
ExG6 <- GaMRes(times, delta, type.t = 1, K = 8, type.c = 3,
iterations=3000)
GaPloth(ExG6)
The survival curve for this particular example results in a smoothed version of the Kaplan-Meier estimate (Figure ). As with previous examples, it can be compared with the unitary partition example (see Example 4, Figure ).
ExG7 <- GaMRes(times, delta, type.t = 1, K = 8, type.c = 4,
iterations = 3000)
GaPloth(ExG7)
%
For this model, we use survival data on 26 psychiatric inpatients
admitted to the University of Iowa hospitals during the years 1935-1948.
This sample is part of a larger study of psychiatric inpatients
discussed by Tsuang and Woolson (1977). We take the "time"
column as the observed times vector –times
– and the
"death"
column as the censoring status vector
–delta
–:
data(psych)
times <- psych$time
delta <- psych$death
As with the Gamma Example, we obtain the Nelson-Aalen and
Kaplan-Meier estimators by defining ck as a null
vector through fixing type.c = 1
–see Figure – The
conclusion does not change: the independence case of our model results
on the N-A and K-M estimators.
ExB1 <- BeMRes(times, delta, type.c = 1, iterations = 3000)
BePloth(ExB1, confint = FALSE)
The influence of c –or
c.r
– can be also understood as a dependence parameter: the
greater the value of each ck, k = 0, 1, 2, ..., K − 1,
the higher the dependence between intervals k and k + 1. In this example, we fix each
c entry at 100. As we are
defining vector c with fixed
values, we should fix type.c = 2
. We see on Figure that the
hazard function estimate turns smoother than on the previous example.
The steps on the survival function appear to be more uniform than on the
Kaplan-Meier estimate.
ExB2 <- BeMRes(times, delta, type.c = 2, c.r = rep(100, 39),
iterations = 3000)
BePloth(ExB2)
Additionally, we can get further detail on the Gibbs sampler with a diagnosis of the resulting Markov chain. We can run this diagnosis for each entry from π, u, c or ϵ. In Figure we show the diagnosis for π10 which includes plot of the trace, the ergodic mean, the ACF function and the histogram of the chain.
BePlotDiag(ExB2, variable = "Pi", pos = 6)
As with the gamma model, we can learn about the {ck} via assigning an exponential distribution with mean ϵ. The estimates using ϵ = 1 and unitary length intervals are shown on Figure . Note that the confidence intervals have widen: it is a signal that we have introduced variability to the model –because of the distribution assigned to c–.
ExB3 <- BeMRes(times, delta, type.c = 3, epsilon = 1, iterations = 3000)
BePloth(ExB3)
The previous example can be extended with a hierarchical model,
assigning a distribution to ϵ ∼ Ga(a0, b0),
with a0 = b0 = 0.01.
In order to set up the model we should set type.c = 4
. The
result displayed on Figure is a soft hazard function and a survival
function that decreases faster than the Kaplan-Meier estimate.
ExB4 <- BeMRes(times, delta, type.c = 4, iterations = 3000)
BePloth(ExB4)
In this example, we simulate the data from a Weibull model, frequently used with continuous and non negative data. The advantage of setting a model is that we know in advance the results, so we can compare the estimates with the exact values from model. The Weibull model has the probability functions: h0(t) = abtb − 1, S0(t) = e−atb
We construct the proportional hazard model as: hi(ti|Zi) = h0(t)eθ′Zibtb − 1, Si(ti|Zi) = exp{−H0(ti)eθ′Zi}
Note that this Weibull proportional hazard model is also another Weibull model with parameters (ai* = aeθ′Zi, b). Based on the previous densities and on fixed parameters a, b, θ1 and θ2 -these last two covariates are simulated from uniform distributions on the interval (0,1)-, we simulate n observations. We gather the results of the model in a table –see Table –. ti|Zi ∼ Weibull(ai*, b) and Zi = (Zi1, Zi2) are the explanatory variables; ci, the censoring time; δi, censoring indicator, and min{ti, ci} represents values deeming censorship time.
We generate a size n = 100 sample based on the simulation model with parameters $a=$0.1, b = 1, θ = (1, 1) y Zi ∼ U(0, 1), i = 1, 2. The result is a table with n = 100 observations.
On the other hand, we use almost every default parameter from
CGaMRes
excluding K = 10
,
iterations = 3000
and thpar = 10
.
Theoretically, our model should estimate a constant risk function at
a × b = 0.1.
Below, we show the code for the Weibull model and the calls for the
plots for the hazard and survival functions –command
CGaPloth(M)
–, the predictive distribution for an
observation defined as the median of the data –CGaPred(M)
–,
and the plots for θ1 and θ2
–PlotTheta(M)
–.
SampWeibull <- function(n, a = 10, b = 1, beta = c(1, 1)) {
M <- matrix(0, ncol = 7, nrow = n)
for(i in 1:n){
M[i, 1] <- i
M[i, 2] <- x1 <- runif(1)
M[i, 3] <- x2 <- runif(1)
M[i, 4] <- rweibull(1, shape = b,
scale = 1 / (a * exp(cbind(x1, x2) %*% beta)))
M[i, 5] <- rexp(1)
M[i, 6] <- M[i, 4] > M[i, 5]
M[i, 7] <- min(M[i, 4], M[i, 5])
}
colnames(M) <- c("i", "x_i1", "x_i2", "t_i", "c_i", "delta",
"min{c_i, d_i}")
return(M)
}
dat <- SampWeibull(100, 0.1, 1, c(1, 1))
dat <- cbind(dat[, c(4, 6)], dat[, c(2, 3)])
CG <- CGaMRes(dat, K = 10, iterations = 3000, thpar = 10)
CGaPloth(CG)
PlotTheta(CG)
CGaPred(CG)
Because of the way we built the model, we can compare against theoretical results on the Weibull model. In Figure we see that the hazard rate estimate is practically the same from t = 8 to t = 37. The estimate for the survival functions gets very close to the real value. Plots and histograms for θ from Figure show estimated values for the regression coefficients (θ1, θ2) and they are consistently near to 1. Finally, the plots from Figure show the hazard rate estimate over a equally dense partition for a future individual where its explanatory variable is equal to the median of the observations –xF–. Note that the effect over the hazard function is given by the product of the baseline hazard function and exp {xF′θ}.