key: cord-0945884-jzkyer54
authors: Wu, Hongji; Yuan, Hao; Yang, Zijing; Hou, Yawen; Chen, Zheng
title: Implementation of an alternative method for assessing competing risks: restricted mean time lost
date: 2021-06-25
journal: American journal of epidemiology
DOI: 10.1093/aje/kwab235
sha: c557fe8c2deb05422b0a7f81ebf8b81c44a21d84
doc_id: 945884
cord_uid: jzkyer54

In clinical and epidemiological studies, hazard ratios are often applied to compare treatment effects between two groups for survival data. For competing risks data, the corresponding quantities of interest are cause-specific hazard ratios (cHRs) and subdistribution hazard ratios (sHRs). However, both have limitations related to model assumptions and clinical interpretation. Therefore, we recommend restricted mean time lost (RMTL) as an alternative that is easy to interpret in a competing risks framework. Based on the difference in restricted mean time lost (RMTLd), we propose a new estimator, hypothesis test and sample size formula. The simulation results show that the estimation of the RMTLd is accurate and that the RMTLd test has robust statistical performance (both type I error and power). The results of three example analyses also verify the performance of the RMTLd test. From the perspectives of clinical interpretation, application conditions and statistical performance, we recommend that the RMTLd be reported with the HR in the analysis of competing risks data and that the RMTLd even be regarded as the primary outcome when the proportional hazards assumption fails. The R code (crRMTL) is publicly available from GitHub (https://github.com/chenzgz/crRMTL.1).

Keywords: survival analysis, competing risks, hazard ratio, restricted mean time lost, sample size, hypothesis test

events, which reflects the risk of the cause of interest without ignoring the presence of other competing events.

In the clinical analysis of competing risks data, the estimations and statistical tests based on the cHR and sHR still have some limitations: 1) The hazard ratio (both the cHR and the sHR) should be described as a relative rate, not as a relative risk (6). Without the assumption of independence of competing events, the cHR cannot be linked to the comparison of CIFs for an event between two groups (7), i.e., cHR > 1 does not necessarily imply CIF_1 > CIF_0; that is, even if the hazard due to a main cause in a control group is always higher than that in a treatment group, the risk of the main cause in the control group is not necessarily always higher than that in the treatment group. Although the sHR can be linked to the comparison of CIFs, i.e., sHR > 1 can indicate that CIF_1 > CIF_0 and vice versa, it reflects the relative change in the instantaneous rate of occurrence of a given type of event among subjects who have not yet experienced that event. Researchers may find this difficult to interpret because individuals who have had a competing event are retained in the risk set (8). 2) Both the cause-specific Cox model and the Fine-Gray model depend on an assumption of proportionality of the CSH and the SDH, respectively; as a consequence, many published survival analyses report only a single cHR or sHR, which is an average of specific hazard ratios at different time points. However, if this assumption is violated, a single HR is difficult to interpret because the true HR varies over time. 3) Because of the semi-parametric nature of the two regression models, the "relative" measures cHR and sHR cannot be converted into "absolute" hazard rates without baseline hazards, which may make their clinical interpretation difficult to conceptualize.

Considering the above limitations, especially the problem of clinical interpretation, some researchers have recommended an alternative statistic (9-11): restricted mean time lost (RMTL). The RMTL can be estimated as the area under the CIF curve up to a specified time point and interpreted as the mean time lost due to a specific cause during a predefined time window. Thus, compared with that of HRs, the clinical interpretation of the RMTL, which is based on a time scale, can easily be understood by doctors and patients (12-14). The difference in RMTL (RMTLd) is used to quantify the treatment effect and is also directly associated with comparisons of CIFs.

Although Andersen (9) and Zhao (10) introduced the concept of RMTL, neither discussed the corresponding estimation and hypothesis test based on the RMTLd.

Lyu (11) presented a statistical inference framework and sample size estimator based on the RMTLd, but it seemed to be relatively conservative based on simulations. Therefore, in this article, we introduce a new RMTLd-based statistical inference framework and sample size formula and demonstrate its performance through simulation and illustrative examples.

Without loss of generality, only one event of interest (j = 1) and one competing event (j = 2) are assumed. T is defined as the observed time (time to event or censoring time).

The CIF is estimated nonparametrically, and the RMTL, μ, is the area under the CIF curve up to τ, which can be interpreted as the mean time lost due to a specific cause within the τ-year window. The variance of the estimated μ can be obtained from the martingale approximation (15) (for the detailed process, see Web Appendix 1).

Here, z_α denotes the upper 100α% quantile of the standard normal distribution.
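The point estimation described above can be sketched in code. The following is a minimal illustration and not the crRMTL implementation: it assumes the standard Aalen-Johansen form of the nonparametric CIF estimator and integrates the resulting step function up to τ. The function names and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are assumptions made for this example, and the martingale-based variance is not reproduced.

```python
from typing import List, Sequence, Tuple


def cif_aalen_johansen(times: Sequence[float], status: Sequence[int],
                       cause: int = 1) -> Tuple[List[float], List[float]]:
    """Nonparametric CIF for `cause` (assumed Aalen-Johansen form).

    status coding (assumed): 0 = censored, 1 = event of interest,
    2 = competing event. Returns the distinct event times and the CIF
    value reached at each of them.
    """
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    surv = 1.0  # overall event-free probability just before the current time
    cif = 0.0
    out_t: List[float] = []
    out_f: List[float] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_here = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            s = data[i][1]
            n_here += 1
            if s != 0:
                d_any += 1
            if s == cause:
                d_cause += 1
            i += 1
        if d_any > 0:
            cif += surv * d_cause / n_at_risk   # increment for this cause
            surv *= 1.0 - d_any / n_at_risk     # update all-cause survival
            out_t.append(t)
            out_f.append(cif)
        n_at_risk -= n_here
    return out_t, out_f


def rmtl(event_times: Sequence[float], cif_values: Sequence[float],
         tau: float) -> float:
    """Area under the step-function CIF on [0, tau]."""
    area = 0.0
    prev_t, prev_f = 0.0, 0.0
    for t, f in zip(event_times, cif_values):
        if t >= tau:
            break
        area += prev_f * (t - prev_t)
        prev_t, prev_f = t, f
    area += prev_f * (tau - prev_t)
    return area
```

For two groups, the RMTLd is then simply the difference between the two `rmtl` values evaluated at the same τ.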

The null and alternative hypotheses of the RMTLd test are H0: RMTLd = 0 and H1: RMTLd ≠ 0, respectively.
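A two-sided Wald-type test of no difference in RMTL, together with the matching confidence interval, can be sketched as follows. This is an illustration under stated assumptions rather than the crRMTL code: the two groups are taken to be independent, the group-specific RMTL estimates and their variances are supplied by the caller, and the function name `rmtld_wald` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rmtld_wald(mu1: float, var1: float, mu0: float, var0: float,
               alpha: float = 0.05) -> dict:
    """Wald-type inference for RMTLd = mu1 - mu0 (independent groups assumed).

    mu1, mu0   : estimated RMTL in the treatment and control groups
    var1, var0 : estimated variances of those two estimates
    """
    diff = mu1 - mu0
    se = sqrt(var1 + var0)                      # variances add under independence
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        "rmtld": diff,
        "se": se,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }
```

The null hypothesis is rejected at level α when the returned `p_value` is below α, which is equivalent to the confidence interval excluding 0.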

Hence, the total sample size is (for the detailed derivation, see Web Appendix 2) 
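To make the idea of an RMTLd-based sample size concrete, the sketch below uses the generic formula for a two-sided Wald test of a difference between two independent estimates. It is not the formula derived in Web Appendix 2, which may differ in its variance terms. The per-subject variances `sigma2_0` and `sigma2_1` (so that the variance of each group's RMTL estimate is roughly `sigma2_k / n_k`), the allocation proportion `p1` and the function name are all assumptions of this example.

```python
from math import ceil
from statistics import NormalDist


def rmtld_sample_size(delta: float, sigma2_0: float, sigma2_1: float,
                      alpha: float = 0.05, power: float = 0.80,
                      p1: float = 0.5) -> int:
    """Total sample size for a two-sided Wald test of RMTLd (generic formula).

    delta    : RMTLd to be detected
    sigma2_k : assumed per-subject variance in group k
    p1       : proportion of subjects allocated to the treatment group
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n_total = ((z_alpha + z_beta) ** 2
               * (sigma2_0 / (1.0 - p1) + sigma2_1 / p1)
               / delta ** 2)
    return ceil(n_total)
```

With equal allocation, unit per-subject variances and a target difference of 0.2, this returns a total of 785 subjects; halving the target difference roughly quadruples the requirement.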

In the simulation setup, we assessed the performance of the estimation of the RMTLd, the RMTLd test and the RMTLd-based sample size under different scenarios: 1) no difference between groups (Figure 1A); 2) a proportional SDH with sHR ≈ 0.905 (Figure 1B); 3) a proportional SDH with sHR ≈ 0.741 (Figure 1C); 4) an early difference between groups (Figure 1D); 5) a late difference with curves separated at t = 1 year (Figure 1E); and 6) a late difference with curves separated at t = 2 years (Figure 1F).

Let the types of events (of interest and competing) be generated through the binomial distributions ( ) and an unbalanced design (n_0 = 300, n_1 = 500; n_0 = 500, n_1 = 1000). For all scenarios, a nominal level of α = 0.05 is applied, and the specific time point τ is selected as the minimum of the maximum follow-up times of the two groups (16). All simulations are performed using 10,000 replications.

To evaluate the performance of the RMTLd estimation, we determined the true RMTLd at τ = 4 years with a total sample size of n = 1,000,000 (n_0 = n_1) under the different scenarios. The true RMTLd between groups for the event of interest under scenarios A-F in Figure 1A.

To assess the statistical power, several situations were considered (Figure 1B-F):

1) The proportional SDH assumption is met: failure times were generated from the CIFs (18)

the group indicator (Z = 0 and 1 for the control group and treatment group, respectively).

Meanwhile, we considered two scenarios, sHR ≈ 0.905 and sHR ≈ 0.741, corresponding to Figure 1B and Figure 1C, respectively; 2) The proportional SDH assumption is violated: both the early difference (Figure 1D) and the late difference (Figure

The specific parameter settings of all scenarios are presented in Web Table 1.
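For readers who want to experiment with the proportional SDH scenarios, the sketch below generates one subject's data in the style of the Fine and Gray simulation design, in which the event type is a Bernoulli draw and the time to the event of interest is obtained by inverting a CIF of the form 1 − [1 − p(1 − e^(−t))]^exp(βZ). This form is an assumption; the generating CIFs and parameter values actually used are those of reference (18) and Web Table 1. The baseline parameter `p`, the competing-event coefficient `beta2` and the uniform censoring bound `c_max` are hypothetical. Under this form the sHR equals exp(β), so β = −0.1 and β = −0.3 give approximately 0.905 and 0.741, the values quoted for Figure 1B and Figure 1C.

```python
import math
import random


def simulate_subject(z: int, beta1: float, p: float = 0.3, beta2: float = 0.0,
                     c_max: float = 6.0, rng=random):
    """Return (observed_time, status) for one subject with group indicator z.

    status: 0 = censored, 1 = event of interest, 2 = competing event.
    All default parameter values are illustrative assumptions.
    """
    e = math.exp(beta1 * z)
    p1 = 1.0 - (1.0 - p) ** e            # P(event of interest ever occurs | z)
    if rng.random() < p1:                # Bernoulli draw for the event type
        cause = 1
        u = rng.random()
        # Invert the conditional CIF F1(t | z) / p1 = u for t.
        inner = (1.0 - u * p1) ** (1.0 / e)
        arg = max(1.0 - (1.0 - inner) / p, 1e-300)  # guard against rounding to 0
        t = -math.log(arg)
    else:
        cause = 2
        t = rng.expovariate(math.exp(beta2 * z))    # exponential competing time
    c = rng.uniform(0.0, c_max)          # independent uniform censoring (assumed)
    if c < t:
        return c, 0
    return t, cause
```

Calling this function repeatedly with z = 0 and z = 1 yields a two-group data set whose CIFs for the event of interest satisfy the proportional SDH assumption by construction.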

The results for the performance criteria of the estimation of the RMTLd are summarized in Table 1 and Table 2. Considering that the true RMTLd in scenario A is approximately equal to 0, we replaced the mean relative bias (Rel bias) with bias to assess the performance (20). In summary, the estimation of the RMTLd has a small bias (or mean relative bias) under all scenarios, and the root mean square error decreases with increasing sample size and decreasing censoring. Meanwhile, the relative standard error is approximately equal to 1, and the coverage falls within a reasonable range.
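The criteria named above can be computed from the replicated estimates along the following lines. This is a sketch under assumptions: the relative standard error is taken here to be the mean estimated standard error divided by the empirical standard deviation of the estimates (so that a value near 1 indicates a well-calibrated variance estimator), and coverage is based on Wald-type intervals. The exact definitions used in Table 1 and Table 2 may differ, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def performance(estimates, std_errors, truth, alpha=0.05):
    """Summarize simulation replicates of an estimator.

    estimates  : RMTLd estimate from each replication
    std_errors : estimated standard error from each replication
    truth      : true RMTLd for the scenario
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bias = mean(estimates) - truth
    rmse = sqrt(mean([(e - truth) ** 2 for e in estimates]))
    covered = [abs(e - truth) <= z * s for e, s in zip(estimates, std_errors)]
    return {
        "bias": bias,
        # Relative bias is undefined when the truth is 0 (as in scenario A).
        "rel_bias": bias / truth if truth != 0 else None,
        "rmse": rmse,
        "rel_se": mean(std_errors) / stdev(estimates),
        "coverage": sum(covered) / len(covered),
    }
```

Returning `None` for the relative bias when the true value is 0 mirrors the switch from relative bias to bias in scenario A.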

For each scenario (Figure 1A-F), the type I error rate and statistical power results are summarized in Table 3. The type I error rates in Table 3

For each scenario ( Figure Table 4 . Under the proportional subdistribution hazards scenarios (B and C), the power of the RMTLd test and that of the Gray test are approximately equal to the predefined level of 80%. In the early difference scenario (D), the power of the RMTLd test is larger than the prespecified level, while that of the Gray test is much lower than 80%. In the late difference scenarios (E and F), the observed power of the RMTLd test is close to 80%, but that of the Gray test decreases noticeably when the difference is smaller (F).

In summary, the sample size based on the RMTLd achieves approximately the nominal power of 80%, except in the early difference scenario. Therefore, the validity of the

The rate of the event of interest in the non-surgery group (n = 101) was 29.70%, while the rate was 5.82% in the surgery group (n = 498). The corresponding censoring rates were 38.61% and 81.73%, respectively. Figure 2A shows the CIF curve of the event of interest, and Table 5

Due to the semi-parametric nature of the regression model, neither the CSH nor the SDH could be obtained for either group, resulting in empty cells in Table 5.

Next, we let 1 τ = 25.667 years, which corresponds to the shortest maximum follow-up time between the two groups. Table 5 shows that the RMTL of the non-surgery group was Figure 1F), and it showed that the RMTLd test had higher power than the Gray test, as shown in Table 3.

Example 3. The Adaptive COVID-19 Treatment Trial (ACTT-1) is a placebo-controlled trial to assess remdesivir use in patients hospitalized with COVID-19 (2, 24). The data were reconstructed (for the detailed process, see Web Appendix 3) because the original data were not publicly available (2); the event of interest was defined as recovery, and the corresponding competing event was death. In ACTT-1, 541 patients were assigned to the remdesivir group, and 521 were assigned to the placebo group. The proportions of recovered patients in the remdesivir and control groups were 70.98% and 63.92%, respectively, and the censoring rates were 17.56% and 20.92%, respectively. Figure 3A shows the CIF curve of recovery between groups.

The results based on the CSH and SDH (Table 5) showed significant differences, and the proportional CSH assumption was satisfied (P = 0.056), while the proportional SDH assumption was violated (P = 0.002). Regarding the RMTL, we note that, unlike the event of interest in example 1 (in which death from cervical cancer was a negative outcome), the event of interest in this example, i.e., recovery, was a positive outcome. Thus, a larger RMTL indicated better therapy. From Table 5 days and re-estimated that the sample sizes based on the sHR (25) and the RMTLd were 658 and 517, respectively. Moreover, based on different time points, we calculated the RMTLd-based sample sizes. As Figure 3B shows, the RMTLd-based sample sizes were always smaller than the sHR-based sample sizes.

The presence of competing risks makes treatment effect assessment in clinical trials and epidemiological studies with time-to-event endpoints more cumbersome. The commonly reported quantitative measures are the cHR and sHR, where the former might be used to study the etiology of diseases from biological mechanisms and the latter might be more suitable for predicting an individual's risk of a specific outcome (7).

However, based on our examples, there are still some limitations to the above two HR-based indicators. First, as a "relative" measure, HRs (both the cHR and sHR) cannot be easily understood when a baseline hazard is lacking (e.g., that of a control group), even though the proportional CSH and SDH assumptions were satisfied in example 1. Moreover, the cHR = 0.132 and sHR = 0.158 in example 1 cannot be directly interpreted as an 86.8% or 84.2% decrease, respectively, in the "risk" of death from cervical cancer for the surgery group; rather, they should be understood as an 86.8% or 84.2% decrease, respectively, in the "hazard" of death from cervical cancer, which is difficult to interpret clinically (2, 6).
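As a quick check of the arithmetic behind these percentages, the relative reduction implied by a hazard ratio is one minus the ratio:

```latex
\[
(1 - \mathrm{cHR}) \times 100\% = (1 - 0.132) \times 100\% = 86.8\%, \qquad
(1 - \mathrm{sHR}) \times 100\% = (1 - 0.158) \times 100\% = 84.2\%.
\]
```

Both figures describe reductions in a rate (the cause-specific or subdistribution hazard), not in the cumulative risk of the event.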

Furthermore, because the proportional assumptions were violated in example 2, the CSH and SDH curves of the two groups in Web Figure 1 (obtained through the nonparametric technique) have a late difference, showing that the cHRs and sHRs may vary over time.

Therefore, a weighted average HR alone may fail to quantify and interpret the treatment effect.

As an alternative statistic, some researchers (9-11) developed the RMTL, which corresponds to the area under the CIF curve. Thus, the RMTL can easily be implemented and interpreted on a time scale. Meanwhile, as an "absolute" measure, the RMTLd can be used to supplement the cHR and sHR to evaluate the treatment effect. Moreover, the RMTLd-based test does not require any model assumptions.

Based on the RMTLd, we introduced a new statistical inference framework and sample size estimator. From our simulation results, the performance of the RMTLd estimation and of the RMTLd test is acceptable and robust. However, notably, the simulation results for 45% censoring are not shown in Table 1 and Table 2 because we set the true RMTLd at t = 4 years; that is, the final follow-up time should be equal to or greater than 4 years for the generation of survival data, which is violated with 45% censoring (for more discussion, see Web Table 2, Web Table 3 and Web Figure 2). In summary, the proposed RMTLd estimation is accurate, and the RMTLd test has well-controlled type I error rates and has similar power to (or even larger power than) the Gray test.

Web Table 1: The parameter settings for the simulations
Web Table 2: Simulation results of the estimation of RMTLd under proportional SDH
Web Table 3

The true RMTLd in scenario A is approximately equal to 0 (in the denominator of relative bias), so we replace relative bias with bias to assess the performance of the RMTLd under scenario A in the text. Meanwhile, when high censoring exists (i.e., 45% censoring), the estimation of the RMTLd is biased and results in undercoverage. The main reason for this phenomenon is that our true RMTLd was chosen to be at t = 4 years; that is, the final observed follow-up time was equal to or greater than 4 years in the generation of survival data. Therefore, when high censoring exists, the RMTLd may fail to be estimated because the final follow-up time may be less than 4 years. See associated Web Figure 1 for more details.

Reporting and design of randomized controlled trials for COVID-19: A systematic review

How to Quantify and Interpret Treatment Effects in Comparative Clinical Studies of COVID-19

Competing risks in epidemiology: possibilities and pitfalls

Ignoring competing events in the analysis of survival data may lead to biased results: a nonmathematical illustration of competing risk analysis

A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions

Relative rates not relative risks: addressing a widespread misinterpretation of hazard ratios

Competing risk regression models for epidemiologic data

Practical recommendations for reporting Fine-Gray model analyses for competing risk data

Decomposition of number of life years lost according to causes of death

Estimating Treatment Effect With Clinical Interpretation From a Comparative Clinical Trial With an End Point Subject to Competing Risks

The use of restricted mean time lost under competing risks data

Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome

Interpretability of Cancer Clinical Trial Results Using Restricted Mean Survival Time as an Alternative to the Hazard Ratio

Novel Risk Modeling Approach of Atrial Fibrillation

Two-sample tests of the equality of two cumulative incidence functions

How to Quantify and Interpret Treatment Effects in Comparative Clinical Studies of COVID-19

Remdesivir for the Treatment of Covid-19 -Final Report

Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves
