Section 8 Predictions from time-to-event models

8.1 Parametric time-to-event model

Another key aspect is to use a fitted model to make predictions: what fraction of germinated/emerged seeds will we find in, e.g., one/two weeks? And in one month? For example, let’s consider the hydro-time model we have fitted in some previous posts (the first one is at this link):

P(t,Ψ)=Φ{Ψ(θH/t)ΨbσΨb}

In the above model, P is the cumulative proportion of germinated seeds at time t and water potential Ψ, Φ is a gaussian cumulative distribution function for base water potential, Ψb is the median base water potential in the seed lot (in MPa), θH is the hydro-time constant (in MPa day or MPa hour) and σΨb represents the variability of Ψb within the population.

The code below can be used to fit the above model to the ‘rape’ dataset in the ‘drcSeedGerm’ package and retreive the estimated parameters, with robust standard errors:

library(drcSeedGerm)
library(drcte)
data(rape)
modHTE <- drmte(nSeeds ~ timeBef + timeAf + Psi, 
                data = rape, fct = HTnorm())
summary(modHTE, units = Dish)
## 
## Model fitted: Hydrotime model with normal distribution of Psib (Bradford et al., 2002)
## 
## Robust estimation: Cluster robust sandwich SEs 
## 
## Parameter estimates:
## 
##                        Estimate Std. Error  t value  Pr(>|t|)    
## ThetaH:(Intercept)     0.751069   0.131968   5.6913 3.075e-08 ***
## Psib50:(Intercept)    -0.906981   0.039530 -22.9444 < 2.2e-16 ***
## sigmaPsib:(Intercept)  0.236995   0.031309   7.5696 4.974e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now, we may wonder: if we have a seed lot with the above characteristics (θ=0.751 MPa d, Ψb50=0.907 MPa and σΨb=0.237), what will the proportion of germinated seeds be, e.g., at 1, 3, 5, 7 days after water imbibition, when the base water potential in the substrate is, e.g., 0, -0.25 and -0.5 MPa? To predict this from the model object we can build a data frame with the values of predictors (see below the use of the expand.grid() function) and use it as the ‘newdata’ argument to the predict() method.

newd <- expand.grid(time = c(1, 3, 5, 7), 
                    psi = c(0, -0.25, -0.5))
predict(modHTE, newdata = newd)
##    time   psi Prediction
## 1     1  0.00 0.74468884
## 2     3  0.00 0.99720253
## 3     5  0.00 0.99929640
## 4     7  0.00 0.99962993
## 5     1 -0.25 0.34568239
## 6     3 -0.25 0.95689600
## 7     5 -0.25 0.98375378
## 8     7 -0.25 0.98981312
## 9     1 -0.50 0.07326801
## 10    3 -0.50 0.74565419
## 11    5 -0.50 0.86069050
## 12    7 -0.50 0.89697826

With time-to-event models, the ‘newdata’ argument takes a data frame, where the first column is always time and the succeeding columns, wherever needed, represent the environmental covariates, in the same order as they appear in the model definition. If several models have been simultaneously fitted by using the ‘curveid’ argument (not in this case, though), predictions are made for all models, always using the same ‘newdata’.

We can also obtain standard errors and confidence intervals for the predictions, by adding the se.fit = TRUE and interval = TRUE arguments. We also recommend to add the robust = T argument, so that we obtain robust standard errors, accounting for the clustering of seeds within Petri dishes (lack of independence). With parametric time-to-event models, robust standard errors are obtained by using a cluster-robust sandwich variance-covariance matrix (Zeileis et al. 2020); in this case, a clustering variable needs to be provided with the units argument.

# Naive standard errors and confidence intervals
predict(modHTE, newdata = newd, se.fit = T, interval = T)
##    time   psi Prediction           SE      Lower     Upper
## 1     1  0.00 0.74468884 0.0452433821 0.65601344 0.8333642
## 2     3  0.00 0.99720253 0.0008053309 0.99562411 0.9987809
## 3     5  0.00 0.99929640 0.0002384085 0.99882913 0.9997637
## 4     7  0.00 0.99962993 0.0001358778 0.99936362 0.9998963
## 5     1 -0.25 0.34568239 0.0488012921 0.25003362 0.4413312
## 6     3 -0.25 0.95689600 0.0060636345 0.94501149 0.9687805
## 7     5 -0.25 0.98375378 0.0027953655 0.97827496 0.9892326
## 8     7 -0.25 0.98981312 0.0019555787 0.98598025 0.9936460
## 9     1 -0.50 0.07326801 0.0182524957 0.03749378 0.1090422
## 10    3 -0.50 0.74565419 0.0143630866 0.71750306 0.7738053
## 11    5 -0.50 0.86069050 0.0098002073 0.84148245 0.8798986
## 12    7 -0.50 0.89697826 0.0084761895 0.88036524 0.9135913
# Cluster robust standard errors and confidence intervals
predict(modHTE, newdata = newd, se.fit = T, interval = T,
        robust = T, units = Dish)
##    time   psi Prediction           SE      Lower     Upper
## 1     1  0.00 0.74468884 0.1450176755 0.46045941 1.0289183
## 2     3  0.00 0.99720253 0.0035012373 0.99034023 1.0040648
## 3     5  0.00 0.99929640 0.0011070141 0.99712670 1.0014661
## 4     7  0.00 0.99962993 0.0006427237 0.99837022 1.0008897
## 5     1 -0.25 0.34568239 0.1576339911 0.03672545 0.6546393
## 6     3 -0.25 0.95689600 0.0251911861 0.90752218 1.0062698
## 7     5 -0.25 0.98375378 0.0129567170 0.95835908 1.0091485
## 8     7 -0.25 0.98981312 0.0093141326 0.97155775 1.0080685
## 9     1 -0.50 0.07326801 0.0622953823 0.00000000 0.1953647
## 10    3 -0.50 0.74565419 0.0498559815 0.64793826 0.8433701
## 11    5 -0.50 0.86069050 0.0424570614 0.77747619 0.9439048
## 12    7 -0.50 0.89697826 0.0388178876 0.82089660 0.9730599

We are currently studying a way to avoid that confidence intervals return unrealistic predictions (see above some values that are higher than 1). We may note that cluster robust standard errors are higher than naive standard errors: the seed in the same Petri dish are correlated and, thus, they do not contribute full information.

8.2 Non-parametric time-to-event models

The predict() method can also be used to make predictions from NPMLE and KDE fits. In this case, no environmental covariates are admissible and, therefore, we can provide ‘newdata’ as a vector of times to make predictions. In the code below we fit the NPMLE of a time-to-event model to four species of the genus Verbascum, for which the data are available as the ‘verbascum’ data frame. We also make predictions relating to the proportion of germinated seeds at 1, 3, 5, and 7 days from water imbibition.

data(verbascum)
mod <- drmte(nSeeds ~ timeBef + timeAf, fct = NPMLE(),
             curveid = Species, data = verbascum)

# Define the values for predictions
newd <- c(1, 3, 5, 7)
predict(mod, newdata = newd, se.fit = T, interval = T,
        robust = T, units = Dish)
##      Species newdata Prediction         SE    Lower Upper
## 1   arcturus       1       0.00 0.00000000 0.000000  0.00
## 2   arcturus       3       0.00 0.00000000 0.000000  0.00
## 3   arcturus       5       0.00 0.00000000 0.000000  0.00
## 4   arcturus       7       0.00 0.00000000 0.000000  0.00
## 5  blattaria       1       0.00 0.00000000 0.000000  0.00
## 6  blattaria       3       0.09 0.05594901 0.010000  0.24
## 7  blattaria       5       0.73 0.06893809 0.600000  0.86
## 8  blattaria       7       0.80 0.06154616 0.690000  0.91
## 9   creticum       1       0.00 0.00000000 0.000000  0.00
## 10  creticum       3       0.33 0.08076917 0.167625  0.48
## 11  creticum       5       0.97 0.02382806 0.910000  1.00
## 12  creticum       7       0.97 0.02382806 0.910000  1.00

Standard errors are estimated by using a resampling (bootstrap) approach, that is performed at the group level, whenever a grouping variable is provided, by way of the ‘units’ argument (Davison and Hinkley, 1997).

For KDE models, we can make predictions in the very same way, although we are still unsure about the most reliable way to obtain standard errors. For this reason, the use of the ‘predict’ method with this type of non-parametric models does not yet provide standard errors and confidence intervals.

8.3 Predictions from a time-to-event model from literature

In some cases, we do not have a fitted model, but we have some literature information. For example, we have seen a manuscript where the authors say that, for a certain species, emergences appeared to follow a log-logistic time-course with the following parameters: ‘b’ (the slope at inflection point) equal to -1, ‘d’ (maximum germinated proportion) equal to 0.83 and ‘e’ (median germination time for the germinated fraction) equal to 12.3. Considering that a log-logistic time-to-event model is represented as LL.3(), we can make predictions by using the following code:

predict(LL.3(), coefs = c(-1, 0.83, 12.3),
        newdata = c(1, 3, 5, 7, 10))
##   newdata prediction
## 1       1 0.06240602
## 2       3 0.16274510
## 3       5 0.23988439
## 4       7 0.30103627
## 5      10 0.37219731