key: cord-0806146-1dklzvx8
authors: Homer, Victoria; Yap, Christina; Bond, Simon; Holmes, Jane; Stocken, Deborah; Walker, Katrina; Robinson, Emily J; Wheeler, Graham; Brown, Sarah; Hinsley, Samantha; Schipper, Matthew; Weir, Christopher J; Rantell, Khadija; Prior, Thomas; Yu, Ly-Mee; Kirkpatrick, John; Bedding, Alun; Gamble, Carrol; Gaunt, Piers
title: Early phase clinical trials extension to guidelines for the content of statistical analysis plans
date: 2022-02-07
journal: BMJ
DOI: 10.1136/bmj-2021-068177
sha: e350dbd88c74a4891551b103bd9f26c34778399e
doc_id: 806146
cord_uid: 1dklzvx8

This paper reports guidelines for the content of statistical analysis plans for early phase clinical trials, ensuring specification of the minimum reporting analysis requirements, by detailing extensions (11 new items) and modifications (25 items) to existing guidance after a review by various stakeholders.

Item 1b: Trial registration number.

"A trial registration number should be provided which uniquely identifies a clinical trial and its existence on a publicly-accessible registry. The International Committee of Medical Journal Editors (ICMJE) mandates the registration of clinical trials in a primary register of the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) or in ClinicalTrials.gov before recruitment of the first patient as a condition of consideration for publication [4] . This identifier should be clearly listed in all relevant documentation including the protocol and the SAP."

[1]

Item 2: SAP version number with dates.

Explanation: "Sequentially numbering and dating each SAP version avoids any confusion over which document is the most recent. Transparent tracking of version numbers and amendments facilitates trial conduct, review and oversight. The first final version of a document will be Version 1.0. It is recommended that subsequent final documents will have an increase of "1.0" in the version number (1.0, 2.0, etc.).

While the document is under review, subsequent draft versions will increase by "0.1", e.g., 1.1, 1.2, 1.3, etc. When the revised document is deemed final, the version will increase by "1.0" over the version being revised, e.g., the draft 1.3 will become a final 2.0." [1]
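The numbering convention above is mechanical enough to express as two small functions. What follows is a minimal sketch. It assumes versions are stored as (major, minor) integer pairs rather than decimals, so that, for example, a tenth draft "1.10" cannot be confused with "1.1". The function names are hypothetical.

```python
def next_draft(version: tuple[int, int]) -> tuple[int, int]:
    """Draft under review: increase by 0.1 (1.0 -> 1.1 -> 1.2 ...)."""
    major, minor = version
    return (major, minor + 1)


def finalise(version: tuple[int, int]) -> tuple[int, int]:
    """Document deemed final: increase by 1.0 over the version being revised (draft 1.3 -> 2.0)."""
    major, _ = version
    return (major + 1, 0)


def label(version: tuple[int, int]) -> str:
    """Render a (major, minor) pair as the version string used on the SAP."""
    return f"{version[0]}.{version[1]}"


v = (1, 0)                 # first final version
for _ in range(3):         # three rounds of review
    v = next_draft(v)
print(label(v))            # 1.3
print(label(finalise(v)))  # 2.0
```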

Example: "Version 1.0 (21 Apr 2017)" [5]

Protocol version

Item 3: Reference to version of protocol being used.

Explanation: "Referencing the version of the protocol being used is helpful as it links the SAP to the protocol and serves as a reminder that the SAP is not a standalone document and needs to be read in conjunction with the corresponding version of the protocol. This avoids the need for the author to duplicate information from the protocol in the SAP. If there have been protocol amendments after the SAP has been written then the SAP needs to be reviewed against the amendments, and updated where necessary. The information in SAP revision history table may be extended to record that the SAP has been reviewed in light of protocol amendments but no changes were required." [1]

Example: "This statistical analysis plan is based on protocol version 5 dated 24 February 2015." [5]

SAP revisions

Item 4a: SAP revision history.

Justification for each SAP revision.

Timing of SAP revisions in relation to interim analyses, etc.

Explanation: "A clear explanation of the changes made between each version of the SAP is essential, along with a justification for the revision and the date. This is important to maintain transparency. After the first version of the SAP is agreed and signed off, the SAP revision history should include the following information: the previous version number, the SAP section changed, details of the change made along with justification for the revision, and date of revision. A justification for each SAP revision is necessary to document the reasons for changes. This ensures the external validity of the trial as it demonstrates that changes are not being made based on unblinded trial data. From a regulatory perspective when SAP revisions occur after unblinded interim analyses have been conducted the people involved in deciding, writing, or approving the SAP should ideally have no knowledge of unblinded data particularly if the trial will be used for a licence application. In other situations, it may be sufficient for the justification to document the reason for the change is not based upon comparative data and for the approver to have no knowledge of unblinded data." [1]

Example:
Version 2.0 (12-Aug-2016): Updated to match protocol: objectives, DLT definition. Using cohorts of 3-5 patients.
Version 3.0 (09-Mar-2018): Addition of definitions of populations for analysis and protocol changes. [6]

Roles and responsibility

Item 5: Names, affiliations, and roles of SAP contributors.

"Individuals who contribute significantly to SAP development should have their contributions described. Listing the SAP contributors, their affiliations and their roles in the SAP development process provides due recognition, accountability, and transparency. Naming of authors and statements of author's contributions is standard for SAPs published in journals such as Trials, but rare in unpublished SAPs. Contributors may be non-signatory members if only the statistician writing the SAP, supervising senior statistician and the chief investigator/clinical lead will sign and approve the SAP." [1]

Signatures of:

Item 6a: Person writing the SAP.

Explanation: "The signature of the person writing the SAP is crucial as it identifies who is responsible for the SAP and that they have approved the SAP. In all circumstances this should be signed and dated. If an update has been made then the author of the update should sign the updated version." [1]

Item 6b: Senior statistician responsible.

Explanation: "The signature of the senior statistician responsible for overseeing the trial is important as it highlights that the SAP has been reviewed and approved by an experienced statistician. In some circumstances the senior statistician may be the person writing the SAP and such a dual role should be reflected in the signatories. The signature should always be dated." [1]

Item 6c: Chief investigator/clinical lead.

"The signature of the chief investigator/clinical lead demonstrates that they have reviewed and approved the SAP. Once the final version has been approved and signed off it avoids any post-hoc changes being made without the justification and approval of all signatory members to maintain internal and external trial validity. The signature should always be dated." [1]

Section 2: Introduction

Background and rationale

Item 7: Synopsis of trial background and rationale including a brief description of research question and brief justification for undertaking the trial.

Explanation: "The full rationale for undertaking the trial and trial background are explained in detail in the protocol so only a brief synopsis is necessary within a SAP to avoid duplication of information. The synopsis should include justification for undertaking the trial, why the trial is needed and description of the research question. This item would be regarded as essential if the SAP is to be accessible externally (e.g., published in a journal or on a website) but is optional if the SAP is an internal document only." [1]

Example: "There is substantial non-clinical, preclinical and clinical data that the therapy can arrest the autoimmune mediated destruction of pancreatic beta cells by induction of functional Tregs that inhibit islet specific autoreactive Teffs. However, prior to embarking on large proof of concept trials in type 1 diabetes it is essential that the dose of the therapy that induces an increase in Treg proportion while resolving qualitative defects is determined." [7] "This is a phase I clinical trial of the combination of the experimental drug combination in patients with advanced solid tumours. It is a dose escalation study to establish the recommended phase II dose followed by an expansion phase to further assess tolerability, PK/PD profile and antitumor activity of the recommended dose of the combination." [5]

Objectives

Item 8: Description of specific question, objectives or hypotheses. It should be made clear what the key objectives are (for example, primary and secondary objectives that encompass toxicity, efficacy, PK, PD, or some combination).

The trial objectives reflect the scientific questions to be answered by the trial, defining its rationale and scope. This information may be provided in sufficient detail within the protocol, in which case a reference would be sufficient. If the protocol contains insufficient detail, then additional detail may be required within the SAP. From the trial objectives or hypotheses, it should be clear whether the final trial conclusions (and, where appropriate, the dose to be taken forward) are to be based on toxicity, efficacy, PK, PD or some combination of these. Where the design jointly assesses toxicity and efficacy, it should be clear which one takes precedence if they lead to different conclusions.

Example: "ADaPT aims to establish a dose of the treatment sufficient to raise circulating DHEA levels in severely injured trauma and hip fracture patients with rule-based escalation supplemented by Bayesian hierarchical models." [8] "CLARITY aims to assess the eradication of detectable minimal residual disease (MRD) using the drug combination." [9]

Section 3: Study Methods

Item 9a: Brief description of trial design, including the trial phase and the design method (dose escalation e.g., CRM or single-arm phase II e.g., Simon's Two Stage). If the trial has a randomised element to it, summary information regarding the randomisation, including the allocation ratio, should be specified.

Specify the trial design, including references where appropriate. Including the trial phase (e.g., phase I with dose expansion cohort, or phase I/II) is important in the context of early phase trials as there can be less clear distinction between trial phases. The content and level of detail required in the SAP is directly dependent on the methodology that underpins the trial. By making this apparent at an early stage, it encourages transparency and focuses the SAP.

While randomisation is rare in early phase clinical trials, it can occur. If a trial has a randomised element, it is important to: i) state whether the analysis is intended to be comparative, ii) provide the allocation ratio, and iii) specify which aspects are blinded. For example, in a placebo-controlled trial, the trial may be blind to active treatment vs placebo within a cohort, while the dose level used in each cohort is open.

Example: "This is a prospective, single centre, cross-sectional, randomised, pharmacokinetics study with rule based escalation supplemented by Bayesian hierarchical models. Further details regarding the proposed Bayesian models can be found in section X. The randomised element of the trial randomises patients 1:1 to receive IMP either orally or sublingually. The randomisation will not formally be comparative but aid the evaluation of the secondary trial objectives. The randomisation will not be blinded." [8] "This is an open-label, multi-centre, dose-escalating adaptive platform phase Ib/IIa trial. The trial will employ a two-stage modified Time-To-Event Continual Reassessment Method for Partial Ordering (PO TiTE-CRM, described in section X) to determine the Maximum Tolerated Dose (MTD) of the drug in combination with radiotherapy." [10]

Trial design

Item 9b: Treatment information, including the dose levels of intervention(s). Where appropriate, and if multiple doses are used, the following should also be reported: the ordering and combination (in the instance of multiple agents under investigation) of dose levels, and the dose level to start at.

All relevant treatment information should be made available in the SAP, or suitably referenced to in supporting documents (such as the protocol).

In early phase clinical trials, multiple dose levels of treatment may be under investigation. If so, these dose levels and their ordering should be clearly stated for all trial designs (not just those with a partial ordering component). Where multiple doses are used, it is best practice to specify the dose levels under investigation in advance. However, this is sometimes not possible, for example when the IMP is administered intravenously (IV). In that case, careful and thorough documentation should be provided on how doses will be chosen and how dose escalation will occur.

Where only a single dose is under investigation, the details provided in this section will be briefer.

Example: "Five doses (1×10^10; 3×10^10; 1×10^11; 3×10^11 and 1×10^12) of drug will be investigated using a 3+3 design.

The first cohort will be treated at dose 1×10^10. The doses given to subsequent cohorts will be adaptively selected based on the incidence of DLTs. This design will require a maximum sample size of 30 patients, and could stop the trial early if excess toxicity is observed at a dose." [11]

[Figure A3a: Dose pathway] [10]

"Patients will receive x mg/kg of drug subcutaneously for ten weeks once per week in the first instance, during out-patient appointments at site. For the purposes of safety, it is proposed that the first 2 patients will be recruited as Sentinel patients. These will be recruited in series, and each will be assessed for 2 weeks before the next patient will be recruited. Data on Sentinel patients will be assessed by an independent safety monitoring committee. If the safety monitoring committee is satisfied that the product has an acceptable safety profile in the sentinel patients, the study will be opened to general recruitment." [12]

Trial design

Item 9c: Details regarding the statistical methodology underpinning the trial, including the choice of the number of parameters in the model if applicable, its empirical form and all formulae. It is also important to ensure all model parameters are given, including where appropriate, the weights of the model.

The statistical methodology underpins the trial and ensures that achieving the objectives and hypotheses is feasible. Clear detailing and explanation of the statistical methodology should be made available. This information may be provided within the protocol, in which case a reference to the relevant section(s) of the protocol would be sufficient. However, if the protocol contains insufficient detail, as protocols usually target clinical rather than statistical readers, then additional detail may be required within the SAP. By including details regarding the mathematical form of the model (where appropriate for trial design), and the number of parameters in the model, transparency in the trial is promoted. It may also be appropriate to justify why the model specification was chosen.

Where parameters are to be sampled from a distribution, it is imperative that these distributions, and the elicitation of these (be it through expert elicitation or chosen from standard distributions) are given here to ensure observed results do not influence critical parameters required for analysis.
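To illustrate the level of detail item 9c asks for (empirical form, number of parameters, prior), the sketch below fits a one-parameter power ("empiric") CRM model, $p_i = s_i^{\exp(\beta)}$, where $s_i$ is the skeleton value for dose $i$. It is a minimal sketch under stated assumptions:

- a Normal(0, 1.34) prior on $\beta$ (a commonly used default, treated here as an assumption);
- simple grid integration for the posterior mean of $\beta$;
- plug-in toxicity estimates.

The function names and the skeleton are hypothetical.

```python
import math


def crm_fit(skeleton, doses_given, dlts, prior_sd=math.sqrt(1.34), grid_size=2001, bound=8.0):
    """Fit the one-parameter power CRM model p_i = skeleton[i] ** exp(beta).

    beta ~ Normal(0, prior_sd**2); its posterior mean is computed on a uniform grid
    and plugged back in to give an estimated DLT probability for every dose.
    """
    step = 2 * bound / (grid_size - 1)
    betas = [-bound + k * step for k in range(grid_size)]
    log_post = []
    for b in betas:
        lp = -0.5 * (b / prior_sd) ** 2                       # log prior, constant omitted
        for d, y in zip(doses_given, dlts):
            log_p = math.exp(b) * math.log(skeleton[d])        # log P(DLT) at dose d
            lp += log_p if y else math.log(-math.expm1(log_p))  # log(1 - P(DLT)) if no DLT
        log_post.append(lp)
    top = max(log_post)
    w = [math.exp(lp - top) for lp in log_post]               # unnormalised posterior weights
    beta_hat = sum(b * wi for b, wi in zip(betas, w)) / sum(w)
    p_hat = [s ** math.exp(beta_hat) for s in skeleton]
    return beta_hat, p_hat


def recommend(p_hat, target):
    """Index of the dose whose estimated DLT probability is closest to the target."""
    return min(range(len(p_hat)), key=lambda i: abs(p_hat[i] - target))


skeleton = [0.05, 0.12, 0.25, 0.40, 0.55]                      # hypothetical skeleton
beta_hat, p_hat = crm_fit(skeleton, doses_given=[0, 0, 0, 1, 1, 1], dlts=[0, 0, 0, 0, 1, 0])
print(round(beta_hat, 3), recommend(p_hat, target=0.25))
```

Writing out the prior variance, the integration method and the estimator (posterior mean of $\beta$ versus posterior mean of each $p_i$) is what allows a reader to reproduce each dose recommendation from the accumulated data.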

For models with a TiTE component, the mathematical form of the weight formulae should be explicitly stated.
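For a single linear weight, as in the TiTE-CRM example quoted later in this item, the weight formula fits in a few lines. This is a minimal sketch under three assumptions:

- follow-up and the DLT window are measured in the same units (e.g., days);
- patients who have had a DLT receive full weight;
- the weighted likelihood uses $w p$ in place of $p$ for patients without a DLT.

The names are hypothetical. Piecewise schemes such as the 8, 12 and 52 week example would need their own, explicitly stated, formulae.

```python
def tite_weight(follow_up: float, window: float, had_dlt: bool) -> float:
    """Linear TiTE weight: the fraction of the DLT window observed, capped at 1.

    Patients who have already experienced a DLT contribute full weight.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if had_dlt:
        return 1.0
    return min(max(follow_up, 0.0) / window, 1.0)


def tite_likelihood_term(p_dlt: float, weight: float, had_dlt: bool) -> float:
    """Weighted likelihood contribution: p for a DLT, 1 - w * p for a patient without one."""
    return p_dlt if had_dlt else 1.0 - weight * p_dlt


w = tite_weight(follow_up=21, window=28, had_dlt=False)   # 21 of 28 days observed
print(w, tite_likelihood_term(0.25, w, had_dlt=False))
```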

If the trial makes purely rule-based dose escalation decisions (e.g., 3+3), and there is negligible statistical methodology underpinning the trial design, this section can be omitted.
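For a model-based design such as EffTox (the example quoted below), stating the model in full lets a reader check it numerically. Below is a minimal sketch. It assumes the standard EffTox parameterisation: a logistic link, a linear toxicity term, a quadratic efficacy term, a standardised dose `x`, and an association parameter `psi`. The parameter values in the usage lines are hypothetical and do not come from any trial.

```python
import math


def inv_logit(z: float) -> float:
    """Inverse logistic transform g^{-1}(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def efftox_probs(x, mu_t, beta_t, mu_e, beta_e1, beta_e2, psi):
    """Marginal and joint probabilities at standardised dose x under the EffTox model.

    Returns (pi_e, pi_t, joint), where joint[(a, b)] = P(efficacy = a, toxicity = b).
    """
    pi_t = inv_logit(mu_t + beta_t * x)                       # linear toxicity model
    pi_e = inv_logit(mu_e + beta_e1 * x + beta_e2 * x ** 2)   # quadratic efficacy model
    # Association term: (e^psi - 1) / (e^psi + 1) lies in (-1, 1); psi = 0 gives independence
    assoc = pi_e * (1 - pi_e) * pi_t * (1 - pi_t) * (math.exp(psi) - 1) / (math.exp(psi) + 1)
    joint = {}
    for a in (0, 1):
        for b in (0, 1):
            indep = pi_e ** a * (1 - pi_e) ** (1 - a) * pi_t ** b * (1 - pi_t) ** (1 - b)
            joint[(a, b)] = indep + (-1) ** (a + b) * assoc
    return pi_e, pi_t, joint


pi_e, pi_t, joint = efftox_probs(0.5, -1.0, 1.5, -0.5, 2.0, -1.0, 0.3)  # hypothetical values
print(round(pi_e, 3), round(pi_t, 3), {k: round(v, 3) for k, v in joint.items()})
```

Because the association term is added to two cells and subtracted from the other two, the marginal probabilities of efficacy and toxicity are unchanged by `psi`, and the four joint probabilities always sum to one.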

"The EffTox design [13, 14] (and version 4.0.12 of the EffTox software, and a proprietary implementation of EffTox written in Python, where necessary) is used for dose escalation/deescalation decisions. This design establishes the optimal dose which is both safe and effective in terms of the definition of tolerability and efficacy as above.

EffTox estimates the probability of efficacy and toxicity at each dose given the patient outcomes observed and the investigators' prior beliefs. The design then uses contours to calculate the utility score of each dose given its associated probabilities of efficacy and toxicity. A dose is preferable to another if it has a higher utility score. When invoked to provide the next dose allocation, the EffTox design disregards the doses that are probably intolerable or ineffective. Of the remaining doses, it selects the dose with the greatest utility score. We seek a dose of drug x to be given in combination with treatment y that is associated with a probability of efficacy of 45% or more, and a probability of toxicity of 40% or less. The EffTox design will infer that a dose is probably ineffective if there is at least a 97% probability that the rate of efficacy is less than 45%. It will infer that a dose is probably intolerable if there is at least a 95% probability that the rate of toxicity is greater than 40%. Initial patients will receive dose level 1. The model will be updated after each patient or cohort of patients is evaluated for DLT and efficacy outcomes. The model is updated using all accumulated information to provide the recommended dose for the next patient. The EffTox design does not skip untried doses in escalation or de-escalation. When calculating the next dose, EffTox calculates the Bayesian posterior probabilities of toxicity and efficacy at each dose using the patients' outcomes accumulated thus far. Marginal probabilities of toxicity and efficacy are modelled in linear (1) and quadratic (2) form respectively. The marginal probability of toxicity at dose x is given by:

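$$\pi_T(x, \theta) = g^{-1}(\mu_T + \beta_T x), \qquad \beta_T > 0, \qquad (1)$$

so that $\pi_T(x, \theta)$ increases with $x$, and the marginal probability of efficacy at dose x is given by:

$$\pi_E(x, \theta) = g^{-1}(\mu_E + \beta_{E,1} x + \beta_{E,2} x^2), \qquad (2)$$

where $g^{-1}$ is the inverse logistic transform. As it is expected that higher doses of the combination do not necessarily result in greater efficacy, a quadratic form is utilised to allow for this non-monotone dose-response relationship. The joint probability model is:

$$\pi_{a,b} = \pi_E^{a}(1-\pi_E)^{1-a}\,\pi_T^{b}(1-\pi_T)^{1-b} + (-1)^{a+b}\,\pi_E(1-\pi_E)\,\pi_T(1-\pi_T)\,\frac{e^{\psi}-1}{e^{\psi}+1}, \qquad a, b \in \{0, 1\},$$

with $\theta = (\mu_T, \beta_T, \mu_E, \beta_{E,1}, \beta_{E,2}, \psi)$, where the $(x, \theta)$ notation has been suppressed for readability and $\psi$ is an association parameter.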

The model hyperparameters are based on informative prior guesses for efficacy and toxicity at each dose level elicited from clinical investigators (Table A3b). The prior effective sample size (ESS) used is 1.3. Thall et al. [15] advise ESS values between 0.5 and 1.5. The greater the ESS, the stronger weight the investigators' prior beliefs bear on the posterior beliefs. Prior beliefs on the six model parameters are assumed to be described by normal distributions. The hyperparameter values associated with our value for ESS are calculated by the EffTox software (Table A3c)." [16] "For patients who have not completed the scheduled treatments, the TiTE-CRM model will weight their safety data based on the proportion of days the patient has been assessed for over the DLT assessment period using a linear function." [17] "Linear weighting functions will be employed for any patient with a length of follow up between the three time points. One weight function to calculate weights between 8-12 weeks and another for weights between 12-52 weeks. For the weighting function $w(u_i; t_1, t_2, t_3)$, where $u_i$ is the time-to-toxicity of patient $i$ and $t_1, t_2, t_3$ are the time points with values 8, 12 and 52 respectively. Then for

Item 9d: Rules of the trial design and model.

Here information on the target objective (toxicity, response, PK, or PD, either singularly or in combination), classification of overdosing, and any stopping boundaries should be given. This may include the desired certainty in these estimates.

Moreover, where dose decisions (e.g., escalation, de-escalation, remaining at the current dose, or stopping early) are to occur, details regarding dose escalation transitions and dose skipping should be given.

The primary objectives for most early phase trials are investigated by trying to attain a fixed probability of an event occurring, or value of a continuous outcome. For safety and dose escalation trials, this may be targeting a toxicity probability (e.g., probability of toxicity between 25% and 33%), or attaining a fixed figure from a continuous scale (e.g., an Area Under the Curve 0-24 hours after administration ($AUC_{0-24}$) above a desired threshold); whereas for single arm phase II trials, this may be targeting an efficacy/response probability (e.g., at least 72% (13/18) of patients achieving an objective response). Some trial designs, such as the EffTox and Emax designs, target both. These targets should be made clear. Moreover, for multi-stage designs where the continuation of the trial is based on formal interim analyses, the target probabilities at each interim, and where appropriate, stopping boundaries (e.g., at the interim if at least 50% of patients have achieved an objective response, then the trial shall continue) should also be explicitly given.

Furthermore, the desired certainty that these targets have been attained should be made explicit. For example, we may seek a 70% posterior probability that the true toxicity rate falls between 25% and 33%, or evidence that the lower bound of a 95% confidence interval for the protection rate is greater than 70%. For fixed designs, such as A+B dose escalation designs or single arm, single stage phase II designs such as A'Hern's, this certainty is ascertained from exact probability distributions. Where interim analyses and stopping rules are implemented, the desired certainty in the interim results should also be given.
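The posterior-probability form of certainty (for example, a 70% posterior probability that the true toxicity rate lies between 25% and 33%) can be computed exactly for a simple Beta-binomial model. This is a minimal sketch for a single dose with binary DLT outcomes. It assumes a Beta prior with integer parameters; the default Beta(1, 1), i.e. uniform, is an illustrative choice and not a recommendation. The sketch uses the identity linking the Beta CDF to a binomial tail, so only the standard library is needed. The function names and data are hypothetical.

```python
from math import comb


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of a Beta(a, b) distribution at x, for positive integers a and b.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))


def prob_in_target(dlts: int, n: int, lower: float, upper: float, a0: int = 1, b0: int = 1) -> float:
    """Posterior P(lower < p < upper) after `dlts` DLTs in `n` patients, Beta(a0, b0) prior."""
    a, b = a0 + dlts, b0 + (n - dlts)
    return beta_cdf_int(upper, a, b) - beta_cdf_int(lower, a, b)


posterior = prob_in_target(dlts=3, n=12, lower=0.25, upper=0.33)   # hypothetical data
print(round(posterior, 3), posterior >= 0.70)                       # 70% certainty required
```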

For early phase trials with outcomes or dose escalation decisions that depend on toxicity, the classification of over- and under-dosing, and the certainty in these that would warrant action, should be given either in the SAP or suitably referenced (for example, to the protocol).

Details should be provided on how the design will be implemented in the trial and on the adaptations that will be made based on accruing data on key outcomes (e.g., toxicity, efficacy, or both). For instance, dose escalation trials should state explicitly the rules for dose escalation and de-escalation, especially regarding dose skipping: for example, no doses (or at most one dose) may be skipped per escalation, whereas doses may be skipped when de-escalating for safety. Where the model's recommendation can be overridden (due to safety concerns) and the selected dose downgraded, the criteria for doing so should be documented. This information should either be given in the SAP or suitably referenced (for example, to the protocol).
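Dose-transition rules of this kind are easiest to audit when written as a single function applied to every model recommendation. The following is a minimal sketch, assuming 0-based dose indices and that "skipping" refers to untried doses. De-escalation is unrestricted. An optional `escalation_ready` flag stands in for conditions such as a minimum follow-up at the dose below. The names and rules are illustrative, not a specific trial's.

```python
def next_dose(current: int, recommended: int, highest_tried: int, n_levels: int,
              max_skip: int = 0, escalation_ready: bool = True) -> int:
    """Apply escalation limits to a model-recommended dose (0-based indices).

    - Staying or de-escalating is unrestricted: any lower dose may be chosen.
    - Escalation may go at most `max_skip` untried doses beyond the next untried dose
      above `highest_tried`; with max_skip=0 no untried dose is skipped.
    - If `escalation_ready` is False, escalation is limited to doses already tried.
    """
    recommended = max(0, min(recommended, n_levels - 1))
    if recommended <= current:
        return recommended
    ceiling = highest_tried + 1 + max_skip if escalation_ready else highest_tried
    ceiling = min(ceiling, n_levels - 1)
    return max(current, min(recommended, ceiling))


print(next_dose(current=1, recommended=4, highest_tried=1, n_levels=5))  # escalate one level only
print(next_dose(current=3, recommended=0, highest_tried=3, n_levels=5))  # de-escalation allowed
```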

Example: "We seek a dose of drug x to be given in combination with treatment y that is associated with a probability of efficacy of 45% or more, and a probability of toxicity of 40% or less. The EffTox design will infer that a dose is probably ineffective if there is at least a 97% probability that the rate of efficacy is less than 45%. It will infer that a dose is probably intolerable if there is at least a 95% probability that the rate of toxicity is greater than 40%." [16] "The Simon's two-stage minimax design requires 3/18 successes at the interim analysis to continue, and 9/37 successes at the final analysis." [3] "The trial may stop early for safety. In the event that all dose levels are toxic, the trial will stop before reaching the maximum number of patients. If P(risk of DLT > 0.35|dose = 1, current data) > 0.65 for the lowest dose level and at least three patients have complete data for the toxicity endpoint (a DLT or have completed the toxicity window) we will stop the trial.

The dose suggested for the next patient is the optimal dose as defined above. However, escalation to an untried dose is subject to no dose skipping, and is only permissible if at least 2 patients have been given the dose immediately below for at least 8 weeks. There is no restriction on deescalation." [18]

Trial design

Item 9e: Experimental details and design specifics.

For dose escalation trials, information regarding cohort size, including whether this is fixed or flexible should be given.

For model-based and model-assisted designs, details on the prior including full skeleton (if applicable) and its elicitation should be given.

For single arm phase II trials, the target sample size and, where appropriate, the timing of any interim analyses should be given.

Most dose escalation trial designs rely on patients being enrolled in dose cohorts and then being evaluated at the appropriate time before decisions regarding dose escalation are made. Therefore, it is imperative to include information regarding cohort size, the target total trial sample size, and the timing of dose escalation decisions. It should also be made clear whether dose escalation decisions will wait until the full cohort has been evaluated. For example, the total cohort size may be n=8, but dose escalation decisions may be made once more than 4 patients are evaluable, or after each patient in the cohort has completed at least cycle 4. Sufficient detail regarding cohorts and dose escalation decisions may be contained within the protocol, in which case a suitable reference to the relevant section is sufficient. If flexible cohort sizes are to be used, then reference should be made to how the size of the cohort will be ascertained.

For novel agents, sentinel dosing may be used to aid safety evaluation before recruitment to the full cohort commences. If sentinel participants are to be used, a clear description should be included here of the number of sentinel participants, how much treatment and follow-up they need to complete before recruitment to the full cohort can commence, and whether they will be evaluated with the full cohort. If adequate details are captured in a supporting document (such as the protocol), a suitable reference to these may be given here instead.

Where model-based or model-assisted designs are used and continually updated, so that the notion of cohorts is less relevant, information regarding when the model will be updated should be specified (e.g., after each DLT, after at least every three evaluable patients, or after a minimum treatment period).

Where simulations have been run to assess the operating characteristics of the trial design, summary details may be given here, with reference to another document (such as a simulation report) where greater detail is contained.

Dose expansion cohorts are often used to gain a better insight into the safety or efficacy profile at the proposed dose. If these are to be used, information regarding the sample size of the expansion cohort should be included here. Moreover, if results from the dose expansion cohort contradict those from the original dose escalation phase, clarification should be provided regarding the consequences (e.g., whether doses could be altered).

For single arm phase II trials reference to the total and, where appropriate, interim sample sizes should be made (e.g., the interim analyses will take place after the first 9 evaluable patients have received their outcome assessment visit). It is not necessary to include the full power and sample size calculations, as these will be detailed later (see item 11).
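Interim sample sizes and continuation rules translate directly into operating characteristics that can be reported alongside them. The sketch below uses the numbers in the item 9d Simon example: continue if at least 3 of the first 18 patients respond, and declare success if at least 9 of 37 respond overall. In Simon's notation these are $r_1 = 2$ and $r = 8$. The example does not state $p_0$ or $p_1$, so the response rates in the usage lines are hypothetical. The function names are also hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def simon_oc(n1: int, r1: int, n: int, r: int, p: float):
    """Operating characteristics of a Simon two-stage design at true response rate p.

    Stop after stage 1 if <= r1 responses among n1 patients; otherwise treat n - n1 more
    and declare the treatment promising if total responses exceed r.
    Returns (probability of early termination, P(declared promising), expected sample size).
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(0, r + 1 - x1)                     # stage-2 responses still required
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        promising += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return pet, promising, expected_n


for p in (0.10, 0.20, 0.30, 0.40):                    # hypothetical true response rates
    pet, promising, en = simon_oc(18, 2, 37, 8, p)
    print(f"p={p:.2f}  PET={pet:.3f}  P(promising)={promising:.3f}  E[N]={en:.1f}")
```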

A definition of the end of the trial (or a suitable reference to it), including any formal stopping rules, should be included. If simulations have been run to assess the operating characteristics of the stopping rules, these should either be included or referenced (for example, to a simulation report).
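A formal stopping rule can be tabulated as explicit boundaries. Take the rule used in the examples in this section: stop if P(risk of DLT > 0.35 | dose = 1, current data) > 0.65 once at least three patients have complete data. The examples do not state the underlying model, so this minimal sketch swaps in a Beta-binomial model for the lowest dose with a uniform Beta(1, 1) prior. Both the model and the prior are assumptions, and the function names are hypothetical.

```python
from math import comb


def prob_exceeds(threshold, dlts, n, a0=1, b0=1):
    """Posterior P(p > threshold) for a Beta(a0, b0) prior after `dlts` DLTs in `n` patients.

    Beta CDF via P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a), for integer a, b.
    """
    a, b = a0 + dlts, b0 + n - dlts
    m = a + b - 1
    cdf = sum(comb(m, j) * threshold ** j * (1 - threshold) ** (m - j) for j in range(a, m + 1))
    return 1.0 - cdf


def should_stop(dlts, n_complete, threshold=0.35, cutoff=0.65, min_complete=3, a0=1, b0=1):
    """Stop if enough patients have complete data and the lowest dose is probably too toxic."""
    return n_complete >= min_complete and prob_exceeds(threshold, dlts, n_complete, a0, b0) > cutoff


def stopping_boundaries(max_n, **rule):
    """For each number of complete patients, the fewest DLTs that trigger stopping (None if none)."""
    return {n: next((d for d in range(n + 1) if should_stop(d, n, **rule)), None)
            for n in range(1, max_n + 1)}


print(stopping_boundaries(12))
```

Under these assumptions, two DLTs among the first three complete patients at the lowest dose would trigger stopping, while one would not (posterior probabilities of about 0.87 and 0.56 respectively).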

For model-based and model-assisted designs, since the choice of prior distribution, and where appropriate the skeleton distribution, can influence the posterior results, transparency regarding model specification is encouraged and so the full form of the priors should be included in the SAP or be suitably referenced. This further allows for full trial reproducibility and replication if needed. Moreover, an indication as to how this was elicited (e.g., using an expert or expert panel, or using statistical packages, functions or programs) should also be included for transparency.
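One transparent way to document how a CRM skeleton was obtained is to generate it from a small number of stated inputs. The sketch below follows our reading of the indifference-interval approach of Lee and Cheung for the one-parameter power ("empiric") model. It takes four inputs: the target DLT rate, a half-width $\delta$, the prior MTD level and the number of dose levels. Successive skeleton values are chosen so that, at the point where the model's recommendation switches between adjacent doses, their DLT probabilities sit at $\theta + \delta$ and $\theta - \delta$. Treat it as an assumption-laden sketch; the function name is hypothetical.

```python
import math


def crm_skeleton(target: float, halfwidth: float, prior_mtd: int, n_levels: int) -> list[float]:
    """Empiric-model skeleton from an indifference interval target +/- halfwidth.

    prior_mtd is a 1-based dose level whose prior DLT probability equals the target.
    """
    if not 0 < halfwidth < min(target, 1 - target):
        raise ValueError("need 0 < halfwidth < min(target, 1 - target)")
    if not 1 <= prior_mtd <= n_levels:
        raise ValueError("prior_mtd must be between 1 and n_levels")
    lo, hi = math.log(target - halfwidth), math.log(target + halfwidth)
    skel = [0.0] * n_levels
    skel[prior_mtd - 1] = target
    for k in range(prior_mtd - 1, 0, -1):         # doses below the prior MTD
        skel[k - 1] = math.exp(lo * math.log(skel[k]) / hi)
    for k in range(prior_mtd - 1, n_levels - 1):  # doses above the prior MTD
        skel[k + 1] = math.exp(hi * math.log(skel[k]) / lo)
    return skel


print([round(s, 3) for s in crm_skeleton(target=0.25, halfwidth=0.05, prior_mtd=3, n_levels=5)])
```

Reporting the four inputs (here hypothetical) together with the resulting skeleton allows the prior to be regenerated exactly.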

Example: "30 patients will be recruited. Simulations have been used to justify this sample size, results are given within the simulation report in Appendix X. The trial may stop early for safety. In the event that all dose levels are toxic, the trial will stop before reaching the maximum number of patients. If P(risk of DLT > 0.35|dose = 1, current data) > 0.65 for the lowest dose level and at least three patients have complete data for the toxicity endpoint (a DLT or have completed the toxicity window) we will stop the trial." [18] "There will be a maximum of 12 patients treated in each group in the phase I component. Once a dose has been decided upon for each group there will be a 9 patient expansion in each of these doses for phase II." "The phase II Simon's two-stage minimax design incorporates an interim analysis of the accumulating data. The interim analysis (stage-1) takes place once 18 patients have been evaluated for the primary outcome, which is based on the response level of ALP. If three or more successful responses (i.e. 25% or more reduction in ALP level) are observed in stage-1 then the trial will continue into stage-2. Recruitment will not be halted while stage-1 is assessed. Further patients will be recruited during stage-2 in order to obtain the necessary sample size of 37 patients; allowing for 10% patient drop out during trial duration, this number could reach a total of 41 patients being required." [3] "The six dose levels scheduled for a combination of drug x and drug y, together with the prior probabilities of a DLT at those levels, are presented in Table A3e. The prior guess of MTD is at Dose 4, but to exercise caution as this combination regimen has never been studied in this patient population, we will start at Dose 2. If the combination dose is too toxic, the design allows for de-escalation to dose level 1."

Randomisation

Item 10: Where appropriate, randomisation details, e.g., whether any minimisation or stratification occurred (including stratifying factors used or the location of that information if it is not held within the SAP) and, where applicable, details on blinding.

While randomisation in the context of early phase clinical trials is uncommon, it can occur. If randomisation is used, this should be clearly stated and details regarding the randomisation provided. This will typically include the method of randomisation, e.g., stratification, block, or minimisation, and information on factor levels (where appropriate). It may be that sufficient information is available in other trial specific documents (such as the protocol), in which case reference to this is acceptable.
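For stratified block randomisation, the key items to document are the strata, the allocation ratio, the block size and how the list is generated. The sketch below pre-generates a permuted-block list per stratum, using the strata (age group by gender) and 1:1:1 ratio from the Ferric Maltol example below. The block size, number of blocks and seed are hypothetical; the example does not state its randomisation method beyond stratification.

```python
import random
from itertools import product


def stratified_block_schedule(strata, arms, block_size, blocks_per_stratum, seed):
    """Pre-generate a permuted-block allocation list for each stratum.

    Each block contains every arm equally often (equal allocation), in random order.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed so the schedule can be regenerated and audited
    schedule = {}
    for stratum in strata:
        allocations = []
        for _ in range(blocks_per_stratum):
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            allocations.extend(block)
        schedule[stratum] = allocations
    return schedule


strata = list(product(["10-14 years", "15-17 years"], ["male", "female"]))
arms = ["7.8 mg BID", "16.6 mg BID", "30 mg BID"]
schedule = stratified_block_schedule(strata, arms, block_size=6, blocks_per_stratum=2, seed=20220207)
print(schedule[("10-14 years", "female")][:6])
```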

Example: "Approximately 36 eligible subjects aged 10-17 years were to be randomised at a ratio of 1:1:1 to one of three doses of Ferric Maltol (7.8 mg, 16.6 mg or 30 mg BID) for nine days (Days 1 to 9). Randomisation was to be stratified by age (10-14 years, 15-17 years) and gender (male, female)." [20]

Sample size

Item 11: Full sample size calculation or justification, or reference to the relevant sample size calculation section in the protocol (instead of replication in the SAP).

The sample size calculation may be included in full in the SAP or a reference to the sample size calculation in the protocol or other document may be provided. The sample size calculation is an important piece of information for every trial as it determines how many patients are required in the primary analysis to ensure the trial is appropriately justified to detect a clinically important difference.

For phase I trials, it may be sufficient to justify the trial's sample size by the number of patients per cohort and the total number of cohorts expected to be enrolled. Moreover, for dose escalation trials where dose escalation is based solely on the observed toxicity, it may be useful to detail the minimum number of participants expected to be recruited in the scenario that either no DLTs are seen (if this is different to the maximum sample size), or all doses are found to be too toxic.
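For rule-based designs, the minimum, typical and maximum sample sizes can be read off a small simulation. Below is a minimal sketch of a basic 3+3 design with the rules stated in the docstring. It simplifies in two ways: the dose below a stopping dose is not expanded to six patients, and clearing the top dose declares it the MTD. The true DLT rates in the usage lines are hypothetical.

```python
import random


def simulate_3plus3(true_dlt_probs, rng):
    """Simulate one basic 3+3 trial; return (patients treated, MTD index or None).

    Rules assumed here: 0/3 DLTs -> escalate; 1/3 -> treat 3 more at the same dose;
    at most 1/6 -> escalate; 2 or more DLTs at a dose -> stop, MTD = the dose below.
    """
    n_treated = 0
    for level, p in enumerate(true_dlt_probs):
        dlts = sum(rng.random() < p for _ in range(3))
        n_treated += 3
        if dlts == 1:                                  # expand the cohort to 6
            dlts += sum(rng.random() < p for _ in range(3))
            n_treated += 3
        if dlts >= 2:
            return n_treated, (level - 1 if level > 0 else None)
    return n_treated, len(true_dlt_probs) - 1


def sample_size_summary(true_dlt_probs, n_sims=10_000, seed=1):
    """Minimum, mean and maximum number of patients over repeated simulated trials."""
    rng = random.Random(seed)
    sizes = [simulate_3plus3(true_dlt_probs, rng)[0] for _ in range(n_sims)]
    return min(sizes), sum(sizes) / n_sims, max(sizes)


scenario = [0.05, 0.10, 0.15, 0.25, 0.35, 0.50, 0.65]  # hypothetical true DLT rates, 7 levels
print(sample_size_summary(scenario))
```

Under these rules, a design with 7 dose levels needs 21 patients if no DLTs occur at any level and only 3 if the first cohort has two or more DLTs. It can never exceed 6 per level, i.e. 42 in total, which matches the maximum quoted in the 3+3 example below.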

Where the sample size has been verified by simulation (to ensure the trial can yield a successful result), the operating characteristics and results should be included. Again, it may be appropriate to include only summary information in this section of the SAP, with a suitable reference to a supporting document such as a simulation report (see point 33) where greater detail is given. Where single-arm phase II trial designs are used, it is important to include all relevant information on which the trial design is based, e.g., the design (A'Hern, Simon's two-stage, etc.), statistical significance level (alpha), power, the exact type I and type II error rates (where calculated), the effect size, including p0 (the largest unacceptable response rate) and p1 (the smallest acceptable response rate), and, where appropriate, the assumed dropout rate. Moreover, where fixed designs are used, it is important to document clearly how deviations from the planned sample size will affect the conclusions drawn from the trial. For example, if a trial requires 22 responses out of 30 patients to be classified as a success, the SAP should state how an increase or decrease in the number of evaluable patients (e.g., to 32 or 27) would change the number of required responses and the success criteria.

In all scenarios, details of any sample size calculations, including the software used (and version), must be provided to allow the calculation to be reproduced.
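For a single-stage (A'Hern-type) design, the exact binomial calculation behind the decision rule is short enough to reproduce, and re-running it for different numbers of evaluable patients produces the kind of revised decision table described above. This is a minimal sketch assuming a one-sided exact binomial test: for n evaluable patients it finds the smallest number of responses r with P(X >= r | p0) <= alpha, then reports the exact type I error and the power at p1. The function names are hypothetical, and published A'Hern tables or validated software should be used to confirm any design. With the parameters of the A'Hern example in this section (p0 = 0.1, one-sided alpha = 2.5%, n = 50) it gives r = 10, matching the rule quoted there.

```python
from math import comb
from typing import Iterable, List, Tuple


def upper_tail(n: int, r: int, p: float) -> float:
    """Exact P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def min_responses(n: int, p0: float, alpha: float) -> int:
    """Smallest r with P(X >= r | p0) <= alpha, i.e. the exact one-sided decision rule."""
    for r in range(n + 2):
        if upper_tail(n, r, p0) <= alpha:
            return r
    raise RuntimeError("unreachable: P(X >= n + 1) is always 0")


def decision_table(
    ns: Iterable[int], p0: float, p1: float, alpha: float
) -> List[Tuple[int, int, float, float]]:
    """(n, responses required, exact type I error, exact power) for each evaluable n."""
    rows = []
    for n in ns:
        r = min_responses(n, p0, alpha)
        rows.append((n, r, upper_tail(n, r, p0), upper_tail(n, r, p1)))
    return rows


if __name__ == "__main__":
    # p0, p1 and alpha from the A'Hern example in this section; the n range is illustrative.
    for n, r, type1, power in decision_table(range(46, 55), 0.1, 0.3, 0.025):
        print(f"n={n}: success if >= {r} responses (type I {type1:.4f}, power {power:.4f})")
```

Printing the table for a range of n around the target, together with the software and version used, is one way of making both the planned rule and its adjustment for over- or under-recruitment reproducible.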

"There was no formal sample size calculation in the Phase I stage. The design was based on the traditional 3+3 design for phase I trials. The recruitment plan was to recruit 3-6 participants to be treated at each of up to 7 dose levels until the MTD could be identified. Participants who did not complete the first cycle of treatment for reasons other than toxicity were replaced at the current dose level. A maximum of 42 participants evaluable for toxicity were required to complete all of the dose levels for Phase I." [21]

"We use an A'Hern design to investigate whether 12 months of combined treatment of drug x and drug y leads to MRD eradication in the bone marrow of at least 30% of patients. Over this time horizon, using drug x as a monotherapy, we would expect no more than 10% of patients to eradicate MRD from their bone marrow, thus we compare p1 = 0.3 to p0 = 0.1. Using statistical significance (alpha) of 2.5% and statistical power of 95.5%, this design requires at least 10 patients to achieve MRD-eradication in the bone marrow out of 50 to approve the combined treatment. This means:

- If the true rate of MRD-eradication in bone marrow after 12 months of treatment with drug x & drug y is 10%, the statistical design will correctly reject the treatment at least 97.5% of the time;
- If the true rate of MRD-eradication in bone marrow after 12 months of treatment with drug x & drug y is 30%, the statistical design will correctly approve the treatment at least 95.5% of the time." [9]

"The Simon's two-stage design requires a total of 37 evaluable patients receiving the confirmed dose. This was calculated using the following parameters: alpha = 0.10, beta = 0.2, p0 = 0.15, p1 = 0.30.

[Note: the values for p0 (0.15) and p1 (0.30) correspond to the required reduction in patients experiencing raised levels of ALP from 85% to 70%, i.e., 1-0.85=0.15 and 1-0.70=0.30] The Simon's two-stage minimax design requires 3/18 successes at the interim analysis to continue, and 9/37 successes at the final analysis. The phase II Simon's two-stage design incorporates an interim analysis of the accumulating data. The interim analysis (stage-1) takes place once 18 patients have been evaluated for the primary outcome, which is based on the response level of ALP. If three or more successful responses (i.e., 25% or more reduction in ALP level) are observed in stage-1 then the trial will continue into stage-2. Recruitment will not be halted while stage-1 is assessed. Further patients will be recruited during stage-2 in order to obtain the necessary sample size of 37 patients; allowing for 10% patient drop out during trial duration, this number could reach a total of 41 patients being required. If overall there are nine or more successful responses from 37 evaluable patients, then we conclude that the treatment warrants further investigation. If the prescribed patient number is not met then the appropriate decision criterion, corresponding to the total number of evaluable patients, will be selected from table A3f. Patients treated at the confirmed dose during the dose confirmatory stage will contribute to the total evaluable patient requirement.

Method for calculation used "Sample Size Tables for Clinical Studies Software", Sze-Huey Tan (2008). 
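The operating characteristics of a Simon two-stage design follow directly from two binomial distributions. The sketch below assumes the usual parameterisation (stop after stage 1 if there are at most r1 responses among n1 patients; declare success if there are more than r responses among all n patients) and computes the probability of early termination, the probability of declaring success and the expected sample size at a given true response rate. For the design in the example above this means n1 = 18, r1 = 2, n = 37, r = 8 (continue with 3/18, succeed with 9/37); evaluating at p0 = 0.15 gives the exact type I error and at p1 = 0.30 the exact power. The function names are hypothetical, and no results from the original trial are implied.

```python
from math import comb
from typing import Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simon_oc(n1: int, r1: int, n: int, r: int, p: float) -> Tuple[float, float, float]:
    """Operating characteristics of a Simon two-stage design at true response rate p.

    Stop after stage 1 if there are <= r1 responses among n1 patients; otherwise
    enrol to n in total and declare success if there are > r responses overall.
    Returns (probability of early termination, probability of success, expected N).
    """
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    p_success = 0.0
    for x1 in range(r1 + 1, n1 + 1):  # stage-1 outcomes that continue to stage 2
        needed = max(0, r + 1 - x1)  # stage-2 responses needed for > r overall
        p_success += binom_pmf(x1, n1, p) * sum(
            binom_pmf(x2, n2, p) for x2 in range(needed, n2 + 1)
        )
    expected_n = n1 + (1 - pet) * n2
    return pet, p_success, expected_n


if __name__ == "__main__":
    # Design from the quoted example: continue if >= 3/18, success if >= 9/37.
    for label, p in [("p0 = 0.15", 0.15), ("p1 = 0.30", 0.30)]:
        pet, success, en = simon_oc(n1=18, r1=2, n=37, r=8, p=p)
        print(f"{label}: P(early stop)={pet:.3f}  P(success)={success:.3f}  E[N]={en:.1f}")
```

The same function, called with the number of patients actually evaluated, is one way of deriving the revised decision criteria that such a SAP refers to when the prescribed number is not met.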

This section is not always appropriate. Relevant details on phase I trials will typically be captured elsewhere.

For single arm phase II trials where hypothesis testing is to be undertaken, outline the intended analysis framework.

For all early phase trials, regardless of the framework used for the primary analysis, other estimands may be important in drawing trial conclusions. The SAP should clearly specify the framework for each estimand or provide a global statement.

"The main analysis for the single-arm cohorts will be Bayesian in nature." [22] "The A'Hern's design is employed under a frequentist hypothesis framework."

Item 13a: Information pertaining to interim dose decisions (e.g. escalation, de-escalation, remain at current dose or stop early).

Dose escalation can be poorly documented, which leaves dose escalation decisions open to ambiguity. Clear descriptions of the dose escalation procedure and associated analyses should be provided. This will typically include who will perform the analyses, what interim analyses will be carried out, when they will be performed (e.g., timing and frequency), and who will ultimately decide whether to escalate the dose (e.g., the model, or the DMC/TSC/TMG). Clearly documenting the timing of dose escalation decisions in relation to data collection and the proportion of the trial elapsed avoids dose escalation decisions being made on non-robust or incorrect data. If the trial does not have a dose escalation portion, this section is not necessary. If separate SAPs have been written for dose escalation analyses, these should be referenced.
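When rule-based escalation is used, the rules can be written down precisely enough to be simulated, which also yields the minimum and maximum sample sizes discussed under Sample size. The sketch below encodes one common variant of the traditional 3+3 design (the design referred to in the phase I sample size example). Variants differ, for example in whether six patients are required at the declared MTD; this sketch does not simulate such an expansion cohort, and the function names and dose-toxicity scenario are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def three_plus_three(
    true_dlt_probs: Sequence[float], rng: random.Random
) -> Tuple[Optional[int], int]:
    """Simulate one 3+3 escalation; return (MTD index or None, patients treated).

    Rules (one common variant): 0/3 DLTs -> escalate; 1/3 -> treat 3 more at the
    same dose and escalate only if 1/6; >= 2 DLTs -> stop, MTD is the dose below.
    """
    dose, n_treated = 0, 0
    while True:
        p = true_dlt_probs[dose]
        dlts = sum(rng.random() < p for _ in range(3))
        n_treated += 3
        if dlts == 1:  # expand the cohort to six at the same dose
            dlts += sum(rng.random() < p for _ in range(3))
            n_treated += 3
        if dlts >= 2:  # too toxic: MTD is the previous dose (None if lowest)
            return (dose - 1 if dose > 0 else None), n_treated
        if dose == len(true_dlt_probs) - 1:  # highest dose tolerated
            return dose, n_treated
        dose += 1


def operating_characteristics(
    true_dlt_probs: Sequence[float], n_sims: int = 10_000, seed: int = 1
) -> Tuple[Dict[Optional[int], float], float]:
    """MTD selection frequencies and mean sample size over repeated simulated trials."""
    rng = random.Random(seed)
    picks: Counter = Counter()
    total_n = 0
    for _ in range(n_sims):
        mtd, n = three_plus_three(true_dlt_probs, rng)
        picks[mtd] += 1
        total_n += n
    return {k: v / n_sims for k, v in picks.items()}, total_n / n_sims


if __name__ == "__main__":
    scenario = [0.05, 0.10, 0.15, 0.25, 0.35, 0.50, 0.60]  # hypothetical true DLT rates
    freqs, mean_n = operating_characteristics(scenario)
    print("MTD selection frequencies:", freqs)
    print("mean sample size:", round(mean_n, 1))
```

With seven dose levels the simulated number of patients can never exceed 42 (six per level), matching the maximum quoted in the phase I example; with no toxicity at any dose 21 patients are treated, and if the lowest dose is highly toxic the trial can stop after three.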

Example: "After the initial 10 patients are assigned to fixed doses as described in Section X the data will be examined by the Dose Determination Committee (DDC) following each new patient. A set of interim analyses will be conducted where the accumulated data will be analysed. A set of candidate models (presented in Section Y) will be fitted to the data. Each model will provide an estimate and standard error (SE) of the target doses that achieve the two targets of a minimum Treg increase and a therapeutic Treg increase. Each model will also provide a recommended dose to assign to the next patient. At each DDC meeting, the choice of the next dose to assign to the next patient will be decided. The choice will be made after consideration of the analyses, but will not be bound by formal decision rules. The choice of dose will always lie below the maximum of 1.5 × 10^6 IU/m^2 BSA.

The target response rates are those that achieve a:

1. Minimum Treg increase defined by the Trial Steering Committee (TSC) at a 10% maximum increase of Treg
2. Therapeutic Treg increase, defined by TSC at a 20% maximum increase" [7]

"The recommended dose (the dose with estimated DLT probability closest to the target of 35%) for each of the subsequent cohorts is determined using the CRM incorporating all of the accumulated DLT outcomes but for added safety, the design includes a restriction to prevent skipping of untested doses when escalating. Recruitment continues until either the maximum sample size is reached, the trial is stopped early due to unacceptable levels of DLT at the lowest dose or when there are four consecutive cohorts allocated to the recommended MTD (providing sufficient evidence that the MTD is reached). The two early stopping rules allow for early termination:

1. If there is a high probability (> 0.7) that the posterior probability of DLT at the lowest dose is greater than the target DLT rate of 35%, indicating that the lowest dose is too toxic.
2. If four consecutive cohorts (three patients in each cohort) have already been allocated at the current MTD, which would also be the recommended dose level for the next cohort if the trial continued.

The value of 0.7 was selected so that the design will recommend stopping early for excessive toxicity if we observe 2 or 3 DLTs out of the first 3 patients at the lowest dose level." [23]

Statistical interim analyses and stopping guidance

Item 13b: Information on other interim analyses specifying what and when interim analyses will be conducted.

Information needed to conduct any other interim analyses, aside from dose escalation analyses, should be detailed. Information to be recorded includes the statistical methods to be used, who will perform the analyses, what interim analyses will be carried out, when they will be performed (e.g., timing and frequency), and what decisions can be taken. If there are multiple interim analysis timepoints, researchers may choose to include checklists detailing which analyses are to be carried out at each timepoint. If interim analyses are not planned, this should be stated for clarity. Moreover, if sample size re-estimation to verify initial assumptions is to be performed following such interim analyses, this should be indicated here, together with the assumptions that may be tested (e.g., variance of the primary outcome, overall event rates, dropout rates). If interim analyses are described in sufficient detail in other documents, such as the protocol, a suitable reference may be given to avoid duplication. Finally, if separate SAPs have been written for interim analyses, these should be referenced.

Example: "Only one interim analysis is planned and will take place once 18 patients have been evaluated for the primary outcome (response in ALP level, measured from baseline to day 99). The interim report will be prepared and supplied to the DMC when the study has recruited and evaluated 18 patients at the chosen MED dose of drug X (including those recruited on the MED dose during the dose confirmatory stage), or annually whichever is earliest." [3]

"No formal interim analysis is planned for this trial. However, accumulating un-blinded data will be presented by component/cohort and treatment arm on a yearly basis to an independent Data Monitoring Committee (DMC) for monitoring of safety, recruitment, data quality and activity. After the trial has opened, a Trial Safety Committee (TSC), with an independent chair, will meet at least annually following the DMC to provide overall supervision for the trial and provide advice through its independent chair. The ultimate decision for the continuation of the trial lies with the TSC." [22]

Statistical interim analyses and stopping guidance

Item 13c: Any planned adjustment of the significance level due to interim analysis.

Many early phase trial designs feature formal interim analyses, either in dose escalation trials or in multi-stage single-arm phase II designs, to inform the future conduct of the trial. These interim analyses and, where appropriate, any adjustments to control the type I error rate are often determined by the trial design. If alpha spending functions are to be used to control the type I error rate, the chosen approach should be clearly specified, justified and referenced. If no adjustment for alpha spending is to be made, this should also be clearly stated.
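Where an alpha spending function is specified, its cumulative form can be stated explicitly in the SAP. The sketch below evaluates the two Lan-DeMets spending functions that are most often quoted (O'Brien-Fleming-type and Pocock-type) at planned information fractions and reports the alpha newly spent at each look. The function names and the three equally spaced looks in the usage lines are illustrative assumptions. Converting the spent alpha into critical values requires recursive numerical integration over the joint distribution of the test statistics, which dedicated group sequential software performs and which is not attempted here.

```python
from math import e, log, sqrt
from statistics import NormalDist
from typing import Callable, List, Sequence

_Z = NormalDist()  # standard normal distribution


def obf_spending(t: float, alpha: float) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = _Z.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _Z.cdf(z / sqrt(t))


def pocock_spending(t: float, alpha: float) -> float:
    """Lan-DeMets Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1 + (e - 1) * t) if t > 0 else 0.0


def alpha_increments(
    fractions: Sequence[float], alpha: float, spend: Callable[[float, float], float]
) -> List[float]:
    """Alpha newly spent at each look, given increasing cumulative information fractions."""
    spent, previous = [], 0.0
    for t in fractions:
        cumulative = spend(t, alpha)
        spent.append(cumulative - previous)
        previous = cumulative
    return spent


if __name__ == "__main__":
    looks = [1 / 3, 2 / 3, 1.0]  # hypothetical: three equally spaced analyses
    for name, fn in [("O'Brien-Fleming-type", obf_spending), ("Pocock-type", pocock_spending)]:
        print(name, [round(a, 5) for a in alpha_increments(looks, 0.05, fn)])
```

Both functions spend exactly the total alpha at full information (t = 1); the O'Brien-Fleming-type function spends very little early on, which is why it is often preferred when early stopping for efficacy should require strong evidence.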

"This is not a confirmatory study, we will not consider multiple testing although we do acknowledge that any finding relating to secondary endpoints will be treated as hypothesis generating." [7] "There are three sources of multiplicity in this study: multiplicity due to interim analyses, multiplicity due to multiple doses, and multiplicity due to multiple endpoints.

The overall type 1 error rate for the study is protected against multiplicity due to interim analyses, because of the alpha-and beta-spending rules described in the preceding section.

The overall type 1 error rate for the study will be protected against multiplicity due to multiple doses by using a step-down, or gatekeeper, procedure. The statistical significance of the difference in response between the low dose group and the placebo group will be assessed if and only if the corresponding difference between the high dose and placebo has already been shown to be statistically significant.

The study has a single primary endpoint, corresponding to the single primary estimand.

All other endpoints are secondary or exploratory. Therefore, no adjustment to nominal p-values will be made to protect the overall type 1 error rate for the study against multiplicity of endpoints." [24]

Statistical interim analyses and stopping guidance

Item 13d: Details of guidelines for stopping the trial early.

Details should be provided on the guidelines to be used for stopping the trial early, including whether these stopping rules are binding or advisory, and any alterations to recruitment that may be implemented before the trial is stopped early.

Information on specific stopping boundaries and/or thresholds to be used, including posterior probability cut-offs should be included.

A description of instances where model prediction can be overridden for safety reasons should be pre-specified. The risk of overdosing should be quantified and justified during the design. Such calculations will often be given in supporting documents, e.g., in the simulation report or the protocol. Reference to these documents should be made. It should be clear whether a statistical method will be considered within the early stopping guidelines.
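Posterior probability cut-offs of this kind can be made concrete. Below is a minimal sketch of a one-parameter continual reassessment method (CRM), assuming a power model p_i = s_i^exp(a) with skeleton s_i and a normal prior on a with variance 1.34 (a value often used for this model); the posterior is approximated on a grid. It recommends the dose whose posterior mean DLT probability is closest to the target, applies a no-skipping restriction, and stops if the posterior probability that the DLT rate at the lowest dose exceeds the target is above a cut-off, mirroring the rules in the CRM example earlier in this section (target 35%, cut-off 0.7). The skeleton, prior and function names are hypothetical; the sketch does not implement the time-to-event weighting of TiTE-CRM and is not a validated implementation.

```python
from math import exp
from statistics import NormalDist
from typing import List, Optional, Sequence, Tuple


def crm_posterior(
    skeleton: Sequence[float], n_treated: Sequence[int], n_dlt: Sequence[int],
    prior_sd: float = 1.34 ** 0.5, grid_lim: float = 10.0, grid_size: int = 2001,
) -> Tuple[List[float], List[float]]:
    """Grid posterior for a in the power model p_i = skeleton_i ** exp(a)."""
    prior = NormalDist(0.0, prior_sd)
    step = 2 * grid_lim / (grid_size - 1)
    grid = [-grid_lim + k * step for k in range(grid_size)]
    weights = []
    for a in grid:
        lik = 1.0
        for s, n, y in zip(skeleton, n_treated, n_dlt):
            p = s ** exp(a)
            lik *= p ** y * (1.0 - p) ** (n - y)  # binomial likelihood at each dose
        weights.append(prior.pdf(a) * lik)
    total = sum(weights)
    return grid, [w / total for w in weights]


def crm_decision(
    skeleton: Sequence[float], n_treated: Sequence[int], n_dlt: Sequence[int],
    target: float, stop_cutoff: float,
) -> Tuple[Optional[int], List[float], float]:
    """Return (next dose index or None to stop, posterior mean DLT rates, P(lowest too toxic))."""
    grid, post = crm_posterior(skeleton, n_treated, n_dlt)
    mean_tox = [sum(w * s ** exp(a) for a, w in zip(grid, post)) for s in skeleton]
    # Safety rule: posterior probability that the DLT rate at the lowest dose exceeds target.
    p_low_toxic = sum(w for a, w in zip(grid, post) if skeleton[0] ** exp(a) > target)
    if p_low_toxic > stop_cutoff:
        return None, mean_tox, p_low_toxic
    best = min(range(len(skeleton)), key=lambda i: abs(mean_tox[i] - target))
    # No skipping: never more than one level above the highest dose already tried.
    tried = [i for i, n in enumerate(n_treated) if n > 0]
    ceiling = max(tried) + 1 if tried else 0
    return min(best, ceiling), mean_tox, p_low_toxic


if __name__ == "__main__":
    skeleton = [0.05, 0.12, 0.25, 0.35, 0.45]  # hypothetical prior guesses of DLT rates
    dose, tox, p_stop = crm_decision(skeleton, [3, 3, 0, 0, 0], [0, 1, 0, 0, 0], 0.35, 0.7)
    print("next dose index:", dose, "P(lowest dose too toxic):", round(p_stop, 3))
```

Dose indices are zero-based. Whether a given cut-off stops the trial after, say, 2 or 3 DLTs in the first 3 patients depends on the skeleton and prior, so that property should be checked by simulation for the actual design rather than assumed from this sketch.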

"Two additional criteria have been added to the modified TiTE-CRM to allow for early termination of either group. They are as follows:

• If there is a high probability (>80%) that the posterior probability of DLT at the lowest dose is greater than the target DLT rate, indicating that the lowest dose is too toxic. If the model recommends early stopping due to this safety criteria, the TMG and Safety Committee will be alerted and the latter, with support of any external evidence, will recommend if the trial should be stopped.
• We would allow the trial to stop early before the full recruitment of 21 patients if nine patients have already been allocated at the most current MTD, which would be the recommended dose level for the next cohort if the trial continues, in consultation with the DMC.

A "look ahead" strategy will be implemented if the next recommended dose level by the modified TiTE-CRM model will not be influenced by the outcome of the remaining patient(s) of a particular cohort (DLT or no DLT). By implementing this strategy, we enable the next cohort of patients to be recruited immediately without awaiting the final observations from the current cohort, therefore reducing waiting times." [17] "The clinical trial will be subject to periodic reviews by an independent safety monitoring committee. The trial will be suspended if any of the following conditions are met:

1. ≥1 patient in the first Sentinel patients experiences a Serious Adverse Event related to IMP
2. ≥33% of patients (with n > 3) recruited to the study show a significant decrease in the functional rating score (>50%) compared to baseline during the 10-week dosing period.

Life (>50%) compared to baseline during the 10 week dosing period." [12]

Timing of final analysis

Timing of final analysis, e.g., all outcomes analysed collectively or timing stratified by planned length of follow-up.

Explanation: "Information on the timing of final analyses should be included, if relevant. Information on timing of final analysis should explain whether all outcomes are analysed collectively or whether timing is stratified by length of follow-up required. Details should be provided on whether there are short-term and long-term outcomes and how they will be reported i.e. will all outcomes be analysed collectively or will the short-term outcomes be published earlier and the long-term outcomes reported at a later date." [1]

Example: "A preliminary final analysis will be undertaken to present available data for the escalation phase of the study once an MTD has been determined and agreed. Once the escalation phase is complete and all patients have been followed up for the full duration in accordance with the schedule then the planned final analysis for this drug will be undertaken. This will take into account secondary and exploratory outcome measures." [10]

"For this study, the end of the trial is defined as "the final visit of the last patient undergoing the trial". A final visit should take place 30-35 days after the last administration of IMP. All patients will be followed up for survival (unless they withdraw their consent) once every 3 months until death or until the last patient last visit (LPLV) time point, whichever occurs first. After the LPLV, the trial data will be monitored, then locked, final data listings will be produced and the analyses will be carried out." [2]

Timing of outcome assessments

Item 15: Time points at which the outcomes are measured including visit "windows".

Explanation: "The time points at which outcomes are measured is helpful information that can be found in the protocol often in table format. The SAP should either refer to the relevant section of the protocol for details or include this information. If outcomes are required to be measured within a particular time window in relation to each planned visit in order to contribute to the analysis then this should also be specified." [1] 

Indications of uncertainty

Level of statistical significance.

Where applicable, and if traditional tests of significance and cut-off values are to be used to gauge statistical significance, the significance level to be used, including whether tests will be one- or two-sided, should be documented. Where a trial has a formal sample size calculation, the significance level used for the primary outcome should be consistent with that used in the sample size calculation. However, secondary outcomes need not use the same significance level; if the level differs between outcomes, the critical value for each outcome should be documented.

"There is no statistical significance level defined for the primary outcome in CAMELLIA as it is a dose-finding trial and does not involve hypothesis testing; there will be no adjustment for multiplicity. The secondary outcomes described here will be assessed at the 5% significance level and/or using 95% two-sided confidence intervals, as appropriate." [2] "Unless specified otherwise, a two-sided significance level of 5% will be used in frequentist analyses." [8]

Indications of uncertainty

Description of any planned adjustment for multiplicity, and if so, including how the type I error is to be controlled.

Multiple testing in the context of early phase trials is generally not recommended, as these trials tend not to test hypotheses formally but rather to make recommendations for future confirmatory trials. [25] However, if adjustments for multiplicity are to be made, authors should pre-define the methods to be used and the outcomes to which they will be applied. The rationale for adjustment and the method(s) chosen should also be justified.

Example: "There will be no adjustment for multiplicity in this trial." [21]

Indications of uncertainty

Either confidence or credible intervals to be reported (appropriately picked dependent on the trial methodology).

The intervals (either confidence or credible) are essential to the interpretation of statistical analyses reported for any of the primary or secondary outcomes. Typically, confidence intervals (CI) and p-values will be reported if the trial uses a frequentist framework, whereas credible intervals (CrI), and where appropriate posterior probabilities, will be reported if the trial uses a Bayesian framework. The level of the CI or CrI to be reported should be decided at the design stage to avoid bias being introduced by modification based on trial data. These levels may be consistent across outcomes or vary between primary, secondary, exploratory and safety outcomes; if they vary, this should be clearly specified.

If models are employed at any point, it may be appropriate to specify here the model output that will be reported.

Example: "95% confidence intervals, calculated using Wilson's method, will be used in frequentist analyses." [8] "The proposed target doses of each model with their standard errors and with a 95% confidence interval will be reported." [7] "The posterior probability of DLT at each dose level will be reported with 95% credible intervals."
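For a binomial proportion such as a response or DLT rate, the Wilson score interval named in the first example can be computed directly. The sketch below is a minimal version using only the Python standard library; the function name and the default 95% level are assumptions. Unlike the simple Wald interval, the Wilson interval always lies within [0, 1] and has non-zero width when x = 0 or x = n, situations that are common with the small samples of early phase trials.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(x: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for the binomial proportion x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for a 95% interval
    phat = x / n
    denom = 1 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    # Clip only to guard against floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    print(wilson_interval(3, 12))  # hypothetical: 3 responses among 12 evaluable patients
```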

Item 19a: Definition of adherence to the intervention and how this is assessed including extent of exposure

Explanation: "Authors should pre-specify their definition of adherence to the intervention. Non-adherence to the intervention can include not completing the intervention, (e.g., not consuming all prescribed drugs or consuming a lower dose than is prescribed). This may be reported to aid generalizability of results or may be linked to an analysis population specification." [1]

Example: "Adherence/Compliance will be assessed by the date of protocol treatment, dose delays, discontinuation and reasons for delays or discontinuation for each patient." [21]

Adherence and protocol deviations

Item 19b: Description of how adherence to the intervention will be presented.

"Along with defining adherence to the intervention it is also crucial to describe how adherence to the intervention will be presented. This process avoids any bias being caused by adherence being defined after unblinding of data." [1]

Example: "The treatment that patients received in each cohort will be reported in table X (patient disposition and treatment) and figure Y (treatment received by cohort). Specifically, the treatment received, dose delays, dose intensity, discontinuation and reasons for delays or discontinuation will be reported." [21]

Adherence and protocol deviations

Item 19c: Definition of protocol deviations for the trial.

Explanation: "A protocol deviation is defined as a failure to adhere to the protocol such as the wrong intervention being administered, incorrect data being collected and documented, errors in applying inclusion/exclusion criteria or missed follow-up visits. A protocol deviation should be defined as major or minor. A deviation may be considered a serious breach if it affects efficacy, the safety, physical or mental integrity of the participants in the trial, or the scientific value of the trial. Protocol deviations should be defined prior to unblinding of data to avoid any bias being caused and due consideration given to inclusion of participants within analysis populations. [26] Protocol deviations may be defined in another document and referenced within the SAP." [1]

Example: "A protocol deviation is defined as a failure to adhere to the protocol. Major and minor deviations are defined in the protocol. A deviation may be considered a serious breach if it affects efficacy, the safety, physical or mental integrity of the participants in the trial, or the scientific value of the trial. For this study protocol deviations will be defined as deviations from the treatment schedule as per the protocol." [21]

Adherence and protocol deviations

Item 19d: Description of which protocol deviations will be summarized.

Explanation: "A description should be provided on how protocol deviations will be summarised. Providing details of whether the deviation is major or minor is helpful if sensitivity analyses are to be conducted by removing patients with major deviations to assess impact on overall conclusions or to align with analysis populations. The approach to summarising the protocol deviation should also be made clear e.g., number and type of protocol deviations by intervention group or listing of all deviations." [1]

Example:

Protocol deviations will be reported for each dose level, tabulated according to their major/minor classification.

Item 20: Clear definition of the trial/dose cohort(s) including how cohorts will be referred to, how patients enter cohorts, the minimum number of patients needed to be in a cohort (and how long they have been in) before dose escalation decisions can be made.

Trial level definitions of patient populations (e.g., per-protocol, intention to treat, safety) should also be given.

Details regarding evaluable patients, and what happens to unevaluable patients, should also be given.

These definitions should also be provided for any interim analysis populations.

The analysis populations should be specified in advance. This includes how the analysis populations will be defined and which dose escalation decisions and outcomes will be analysed according to each analysis population. It is important to define populations clearly, even if the terms are considered standard. For example, there is no consistent definition of intention to treat (ITT), and the phrase has different meanings for different authors; a clear definition of these patient populations also facilitates the definition of outcomes under the estimands framework (further details given in section 6: Analysis). A patient may be evaluable for some populations but not others.

In the context of dose escalation trials, it is also important to define the cohorts and how they will be referred to (for example, according to the dose received or the order of enrolment). It should also be made clear how many patients can enter each cohort, the minimum number per cohort, and how much trial treatment/follow-up they must have completed before dose escalation decisions can be made.

For all types of early phase trials, the criteria for a patient to be considered evaluable for outcome assessment, and when patients are to be replaced, should be stated (e.g., must complete at least one IMP administration). In early phase trials it is common for patients who are not evaluable (for example, due to withdrawal or non-compliance) to be replaced.
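Evaluability and replacement criteria are easiest to apply consistently when stated as an explicit rule. The sketch below encodes a rule of the form used in the dose escalation population example later in this section: a patient is DLT-evaluable if they experienced a DLT, or received at least 75% of the planned doses in the DLT assessment period; non-evaluable patients are flagged for possible replacement, and a cohort is ready for a dose decision once a minimum number of patients are evaluable. The dataclass, the function names, the eight planned doses (so that 75% corresponds to 6 doses) and the minimum of three evaluable patients per cohort are all hypothetical.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Tuple


@dataclass
class Patient:
    patient_id: str
    doses_received: int  # doses given within the DLT assessment period
    had_dlt: bool


def dlt_evaluable(p: Patient, planned_doses: int, min_fraction: float = 0.75) -> bool:
    """Evaluable if a DLT occurred, or if at least min_fraction of planned doses were received."""
    return p.had_dlt or p.doses_received >= ceil(min_fraction * planned_doses)


def cohort_status(
    cohort: List[Patient], planned_doses: int, min_evaluable: int = 3
) -> Tuple[List[Patient], List[Patient], bool]:
    """Split a cohort into evaluable patients and those flagged for replacement."""
    evaluable = [p for p in cohort if dlt_evaluable(p, planned_doses)]
    to_replace = [p for p in cohort if not dlt_evaluable(p, planned_doses)]
    ready_for_decision = len(evaluable) >= min_evaluable
    return evaluable, to_replace, ready_for_decision


if __name__ == "__main__":
    cohort = [Patient("P01", 8, False), Patient("P02", 5, False), Patient("P03", 1, True)]
    ev, rep, ready = cohort_status(cohort, planned_doses=8)
    print([p.patient_id for p in ev], [p.patient_id for p in rep], ready)
```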

If the trial has a formal sample size calculation and does not recruit to target, the SAP should specify the minimum percentage of the target sample size that would need to be recruited to justify completing the full analysis, and what analysis will be performed and reported if recruitment falls below this threshold.

This information should be made clear either in the SAP or in a suitably referenced supporting document (e.g., the trial protocol).

Example: "Dose escalation population: Assessment of the proportion of DLTs for each dose level will be based upon assessment of patients who complete ≥75% of their doses (≥ 6 doses) during the DLT assessment period or who experience a DLT at any time after initiation of the infusion of the first dose. Patients who withdraw early from the study for reasons other than DLT will not be assessable for DLT, and may be replaced by another patient at the same dose level. Safety population: The safety analysis population will include all patients who received at least part of one dose of drug X. Efficacy (disease response) population: The efficacy population will include all patients who have received at least part of one dose of study treatment and at least one post-treatment response assessment." [2]

"All patients will be analysed on an intention to treat basis. Any patients discovered to be ineligible after being entered into the trial will be listed. Participants who did not complete the first cycle of treatment for reasons other than toxicity were replaced at the current dose level. A maximum of 42 participants evaluable for toxicity were required to complete all of the dose levels for Phase I. All patients starting cycle 1 treatment were evaluable for toxicity." [21]

"The primary analysis population will consist of all participants who receive at least one dose of any trial treatment and, have at least one response assessment available. Only participants, for whom written informed consent has not been received, will not be included in this population.

The safety population will include all participants who receive at least one dose of any trial treatment. Only participants for whom written informed consent has not been received, will not be included in this population." [27]

Section 5: Trial Population

Reporting of screening data (if collected) to describe representativeness of trial sample.

"If a trial collects screening data then it is important that the data are appropriately presented to describe the representativeness of the trial sample. This information is not only important for the trial but also important for future trials in the area. The process for screening patients e.g. how patients will be screened and what data will be collected, should be fully described within the trial protocol. According to the CONSORT guidelines [28] as a minimum the number of patients who are assessed for eligibility should be provided with this information presented in a flow diagram, however, more detailed tabulations may be provided. The SAP should describe how this data will be summarised and presented." [1]

Example: "Information relating to screening data including the number of participants screened, found to be ineligible (with reasons where available) or declined to participate (with reasons where available) will be presented as in Table X." [2]

Eligibility

Item 22: Summary of eligibility criteria.

Explanation: "The trial inclusion and exclusion criteria should be specified in the protocol. Details of how eligibility data will be summarised should be provided. Some CONSORT diagrams provide details of the number of patients screened followed by a breakdown of how many patients were eligible and how many were excluded due to violating each inclusion/exclusion criteria." [1]

Example: "The number of patients falling into the exclusion criteria will be tabulated by cohort and any ineligible patients randomized will be reported, with reasons for ineligibility in Table X." [21]

Recruitment

Item 23: Information to be included in the CONSORT flow diagram.

"Information included within a CONSORT flow diagram displays the progress of all participants through the trial. The CONSORT guidelines say that "you must complete a flow diagram in order to be compliant with the CONSORT 2010 standard." [28] They provide a CONSORT flow diagram template that can be used and adapted to create a trial specific flow diagram. All necessary information that is displayed in a CONSORT flow diagram should be listed in the SAP so it is clear where the patient throughput will begin to be summarised and how, specific follow-up time points that will be presented along with information on withdrawals and loss to follow up. Alternatively, a study specific CONSORT flow diagram template can be included in the SAP highlighting the information that will be collected." [1]

Example: "The flow of participants through each stage of the trial, including numbers of participants assigned to a schedule, receiving intended treatment, completing the study protocol, and analysed for the primary outcome is provided following CONSORT. Protocol violations/deviations and information relating to the screening data including the number of ineligible patients entering the study, together with reasons will be reported. Information on number of participants screened, found to be ineligible (with reasons where available), refused to participate (with reasons where available) will also be included.

A CONSORT diagram will be prepared, an example CONSORT diagram is given in Appendix 3." [18]

A CONSORT diagram will be produced to highlight the flow of patients through the trial, and a dose decision by cohort diagram will be produced to show the number of patients enrolled to each cohort and the decisions of the DDC.

Item 24a: Level of withdrawal, e.g., from intervention and/or from follow-up.

"In this section, all the possible levels of withdrawal should be listed, which may differ from trial to trial. Participants may withdraw from the intervention but continue with follow-up; withdraw from follow-up but allow data collected to date to be used; withdraw from follow-up and withdraw consent for data collected to date to be used; or be lost to contact/follow-up. Some clarification within the SAP about how each level of withdrawal will be categorised and presented is important." [1]

Example: "The level of consent withdrawal will be tabulated and reported as a line listing containing the requisite dosing information (e.g., dose cohort assigned) and will be classified as:

• Consent to continue follow-up and data collection,

• Consent to continue data collection only,

• Complete - no further follow-up or data collection" [21]

Withdrawal/follow-up

Item 24b: Timing of withdrawal/lost to follow-up data.

Example: "This will be presented in tabular format, with numbers of withdrawals, discontinuations or dropouts, number of days to withdrawal, and reasons for withdrawal, drop outs or discontinuations for each Cohort, as in Table X." [21]

Withdrawal/follow-up

Item 24c: Reasons and details of how withdrawal/lost to follow-up data will be presented.

"Patients can withdraw and be lost to follow up for many different reasons e.g. moved home, unable to participate any longer, withdrawn by clinician reasons etc. It is useful for the trial team to attempt to ascertain reasons for all withdrawals and loss to follow up. According to ICH E6 'Although a subject is not obliged to give his/her reason(s) for withdrawing prematurely from a trial, the investigator should make a reasonable effort to ascertain the reason(s), while fully respecting the subject's rights'. [24] Details of how this data will be presented should be included in the SAP. This information may be presented by intervention arm within a CONSORT flow diagram or in a table." [1]

Example: "Withdrawals/loss to follow-up together with reasons will be reported by treatment schedule." [18]

Baseline patient characteristics

Item 25a: List of baseline characteristics to be summarized.

Presentation of baseline characteristics is crucial for every trial as it allows the reader to see whether the characteristics are balanced across any intervention groups or consistent with the target population. Details of which baseline characteristics will be summarised in the final report should be specified along with the population for which characteristics will be presented.

If there is a randomised element to the trial, it is important to present baseline characteristics for the entire trial and by randomised treatment and, at a minimum, to report baseline characteristics by any factors over which the randomisation was stratified or minimised.

For dose escalation trials, it may be preferable to present baseline characteristics by allocated dose or enrolment cohort as well as across all dose levels.

For single arm phase II trials, baseline characteristics can be presented over the entire population or by appropriate subgroup.

For trials with a suitably small sample size, it may be appropriate to report individual baseline characteristics as a line listing. If so, this should be stated here, together with the information to be listed.

Example: "These characteristics will be presented by analysis cohort. At a minimum, this will include:

• Age at time of trauma (years),

• Total injury severity score (for the trauma cohorts only), and

• Mechanism of injury.

Further characteristics may be added at the discretion of the trial statistician, TMG, and DMC." [8]

"Baseline characteristics, including important prognostic, demographic and clinical variables will be reported overall for the main population." [18]

"Line listings will also be produced of baseline patient characteristics recorded on the Registration Form and Screening Form. Tabulated data will include: age at registration, sex, disease status at trial entry, disease history including time from first diagnosis to registration." [6]

Baseline patient characteristics

Item 25b: Details of how baseline characteristics will be descriptively summarized.

Explanation: "It is important to describe how baseline characteristics will be summarised and presented in the final analysis report. Formal statistical comparisons of baseline data by randomised groups are not advocated [30, 31] but if such comparisons are planned these should be justified. It is recommended that prognostic baseline characteristics are presented for the analysis population included in the primary analysis of the primary outcome as well as for all randomised participants in order to assess whether attrition has introduced selection bias and/or upset the balance achieved at randomisation." [1]
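As a minimal sketch of a purely descriptive summary (no hypothesis tests), the Python function below tabulates one continuous baseline variable by cohort and overall, reporting either the mean and SD or the median and IQR, together with the number of missing values. The function name `summarise_baseline` and the record layout (a list of dicts holding a group key and a variable key, with `None` for a missing value) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev


def summarise_baseline(records, group_key, var_key, use_median=False):
    """Summarise one continuous baseline variable by group and overall.

    Purely descriptive: no statistical hypothesis tests are carried out.
    Returns {group: {"n", "missing", and either "mean"/"sd" or "median"/"q1"/"q3"}}.
    """
    values = defaultdict(list)
    missing = defaultdict(int)
    for r in records:
        # Each record contributes to its own group and to the "Overall" column
        for g in (r[group_key], "Overall"):
            if r.get(var_key) is None:
                missing[g] += 1
            else:
                values[g].append(r[var_key])

    summary = {}
    for g in set(values) | set(missing):
        x = values[g]
        row = {"n": len(x), "missing": missing[g]}
        if use_median and len(x) >= 2:
            q1, _, q3 = quantiles(x, n=4)  # quartiles, default method
            row.update(median=median(x), q1=q1, q3=q3)
        elif len(x) >= 2:
            row.update(mean=mean(x), sd=stdev(x))
        summary[g] = row
    return summary
```

Whether the mean and SD or the median and IQR are reported would itself be pre-specified in the SAP; the `use_median` switch simply makes that choice explicit.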

Example: "Baseline characteristics will be presented descriptively (without statistical hypothesis testing) on characteristics collected at the point of trial entry." [8]

Section 6: Analysis

Examples of estimands are given after the explanations of items 26a-e.

List and describe each primary and secondary estimand, including details of:

The SAP should define each estimand explicitly, clearly identifying the primary and secondary variables.

Definitions of estimands are captured in items 26a-e, based on ICH E9 (R1), which details the estimand framework that has been adopted by various clinical trial regulators. [32, 33]

Estimand definition b

Item 26a: Treatment (including treatment combinations).

Details regarding the treatment of interest and, where applicable (e.g., in a randomised phase I trial), any alternative treatments to which comparisons will be made. In dose-escalation trials, where multiple doses may be under investigation, it should be made clear whether patients will be analysed according to their cohort, the dose received, pooled across all dose levels, or some combination of these.

Item 26b: Population.

The trial population, defined with reference to item 20, pertinent to the outcome should be clearly stated.

Item 26c: Variable of interest.

The endpoint to be obtained for each patient that is required to address the scientific question. If an outcome is recorded at multiple time points, it should be made clear which of these time points are required for the specific outcome. Detailed explanations should be provided; for example, for survival outcomes, state the time origin from which survival is measured (e.g., the time of randomisation or the time of administration of the intervention) and how censoring will be handled. Details given here should include specific measurements and units, which is especially pertinent when multiple collection methods are used. Details also need to be provided on what data manipulations or derivations will be performed and how they will be carried out (e.g., change from baseline, Quality of Life (QoL) score, Time To Event (TiTE), logarithmic transformations). If the calculation of a score is more complex but a validated algorithm is available, then providing a reference and a link to the algorithm is sufficient. Scoring, including handling of missing data, should follow guidance proposed by the instrument developers, unless there is good reason to use an alternative technique, which should be described and justified. Sufficient detail needs to be provided for the reader to understand how the scores or results are to be calculated for each outcome.

For dose escalation trials in which escalation depends on observed rates of dose limiting toxicities (DLTs), the definition of a DLT and its reporting window, and how the maximum tolerated dose (MTD) and the recommended phase II dose (RP2D) will be identified, should be specified (or suitably referenced).

Item 26d: Intercurrent event handling strategy.

Intercurrent events of interest should be defined here. Details of the strategy, including analysis adjustments, for dealing with each intercurrent event should be specified. The five strategies for handling intercurrent events are: the treatment policy strategy; the composite strategy; the hypothetical strategy; the principal stratum strategy; and the while on treatment strategy. These strategies can be used independently or in combination, but how they will be used should be clearly specified in advance of any analysis. [32]

Estimand definition b

Item 26e: Summary measures.

An indication of the population-level summary measure of the variable that will be used. The summary measure provides a basis for comparison between treatments or doses.

The estimand is described by the following attributes:

[26a] Treatment: Drug X infusions at days 1, 8 and 15 at a dose specific to the entry cohort as recommended by the CRM design.

[26b] Population: The evaluable population as defined in item 20. [The evaluable patient population is defined as those who meet the eligibility criteria and have received at least the day 1 infusion, excluding those who have withdrawn for non-treatment related reasons.]

[26c] Variable of interest: Incidence of dose limiting toxicity (DLT) within the first 8 days of treatment. A DLT will be any adverse event (categorised as per CTCAE) which is graded as severe (grade 3) or higher and is deemed to be at least potentially related to treatment. Any patient who withdraws or dies due to treatment related reasons will be categorised as having experienced a DLT.

[26d] The following intercurrent events (IEs) of interest will be considered:

(1) Day 8 toxicity assessment not performed for patient-related reasons.

(2) Day 8 toxicity assessment not performed due to site error.

(3) Day 8 toxicity assessment not being performed at the right time (performed either earlier or later than scheduled).

For IE (1), the reasons why the assessment was not performed will be investigated. Depending on the reasons for non-attendance a decision will be made regarding whether they are to be:

• Included in the analysis and assumed to have experienced a DLT;

• Included in the analysis and assumed not to have experienced a DLT; or

• Excluded from the analysis and replaced by recruitment of an additional patient.

For IE (2), data from subsequent visit(s) will be used to ascertain whether a suspected DLT occurred during the DLT reporting window. The main estimand will use all patients who had their day 8 assessment and those who can definitely be ascertained to have experienced a DLT within the reporting window (using data from subsequent visits). Any patient who did not have the day 8 assessment and who either did not experience a DLT or experienced a DLT outside the reporting window will be excluded from the analysis. The sensitivity estimand will include the entire population as defined above, covering all patients in the population whether or not their day 8 assessment was performed. For patients who missed the day 8 assessment the following will hold: any patient who experiences an event fulfilling the criteria of a DLT at any point up until their safety visit will be assumed to have experienced a DLT; any patient who does not experience such an event up until their safety visit will be assumed not to have experienced a DLT.

For IE (3) an analogous approach to the strategy defined to handle IE (2) will hold. Where it is the case that the safety assessment occurs prior to completion of the DLT reporting window, then data will also be ascertained from the first safety visit occurring after the completion of the DLT reporting window.

[26e] Summary measure: The number (count), proportion and percentage of patients experiencing a DLT per dose cohort. The estimated DLT probability for each dose from the CRM model, and the subsequent recommended dose.
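The inclusion rules for IE (2) in the example above can be made explicit in code. The Python sketch below assumes a hypothetical per-patient record with three flags (`day8_done`, `dlt_in_window`, `dlt_by_safety_visit`), derives each patient's DLT status under the main and sensitivity estimands, and then computes the count, proportion and percentage of patients with a DLT per dose cohort as in [26e]. It does not implement the other rules of the example, such as counting treatment-related withdrawal or death as a DLT, or the case-by-case decisions for IE (1).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Patient:
    cohort: int
    day8_done: bool            # day 8 toxicity assessment performed
    dlt_in_window: bool        # DLT confirmed within the reporting window (any visit)
    dlt_by_safety_visit: bool  # any DLT-qualifying event up to the safety visit


def dlt_status(p: Patient, estimand: str):
    """DLT status (True/False) under the given estimand, or None if excluded."""
    if p.day8_done:
        return p.dlt_in_window
    if estimand == "main":
        # Missed day 8: kept only if a DLT inside the window is definitely ascertained
        return True if p.dlt_in_window else None
    if estimand == "sensitivity":
        # Missed day 8: any DLT-qualifying event up to the safety visit counts as a DLT
        return p.dlt_by_safety_visit
    raise ValueError(f"unknown estimand: {estimand}")


def summarise_dlts(patients, estimand):
    """Count, proportion and percentage of patients with a DLT per dose cohort."""
    tally = defaultdict(lambda: [0, 0])  # cohort -> [patients with DLT, patients analysed]
    for p in patients:
        status = dlt_status(p, estimand)
        if status is None:
            continue
        tally[p.cohort][0] += int(status)
        tally[p.cohort][1] += 1
    return {
        cohort: {"dlt": d, "n": n, "proportion": d / n, "percent": 100 * d / n}
        for cohort, (d, n) in sorted(tally.items())
    }
```

Running `summarise_dlts` once with `"main"` and once with `"sensitivity"` gives the two sets of cohort summaries side by side, which makes any difference caused by missed day 8 assessments easy to see.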

The primary estimand is described by the following attributes:

[26a] Treatment: 7 infusions of drug X at dose Y mg/kg approximately 7 days apart starting on day 1.

[26b] Population: The modified intention to treat (mITT) population as defined in item 20. [The mITT population contains all patients who have received at least one infusion at the confirmed dose of drug X.]

[26c] Variable of interest: Serum alkaline phosphatase (ALP) at visit 3 (pre-infusion) and at follow up visit 10 (day 99) as evaluated at central laboratory.

The primary estimand is the response in serum ALP at day 99, defined as a reduction of 25% or more from baseline. Baseline ALP will be measured pre-infusion at the first treatment visit (overall trial visit 3), and again at follow-up visit 10 (day 99). The response will be calculated using the formula:

$$\text{Percentage change in ALP} = \frac{\text{ALP}_{\text{day 99}} - \text{ALP}_{\text{baseline}}}{\text{ALP}_{\text{baseline}}} \times 100$$

Using the above formula, a negative value indicates a reduction, whereas a positive value indicates an increase. The clinically meaningful reduction required corresponds to a value of -25% or less (≤ -25%). The proportion of patients with a clinically meaningful reduction will be calculated as

$$\text{Proportion} = \frac{\text{number of patients with a percentage change} \le -25\%}{\text{number of patients in the analysis population}}$$

Where patients have their follow-up visit 10 ALP sample missing, they shall be treated as non-responders and included in the denominator of the above equation. The number of non-responders for the primary outcome will be reported.

[26d] Intercurrent event:

The key intercurrent event pertains to blood samples not being analysed by, or returned from, the central laboratory (e.g., due to samples haemolysing or being lost in transit). To mitigate this, further samples will be analysed locally. It is our intention to use the principal stratum strategy, and thus to analyse only patients who have centrally analysed samples in the primary estimand.

[26e] Summary measure: The number and proportion of patients with a clinically meaningful reduction will be reported.
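A minimal sketch of the response derivation in [26c], assuming each patient is represented by a hypothetical (baseline, day 99) pair of ALP values with `None` for a missing day 99 sample. It follows the rule in [26c] that patients with a missing day 99 sample count as non-responders and stay in the denominator; restricting the analysis to centrally analysed samples, as described in [26d], would be a filtering step applied before calling it.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Percentage change from baseline; negative values indicate a reduction."""
    return (follow_up - baseline) / baseline * 100


def alp_response(patients, threshold=-25.0):
    """patients: list of (baseline ALP, day 99 ALP or None if the sample is missing)."""
    responders = missing = 0
    for baseline, day99 in patients:
        if day99 is None:
            missing += 1  # non-responder, but kept in the denominator
        elif percent_change(baseline, day99) <= threshold:
            responders += 1
    n = len(patients)
    return {"responders": responders, "n": n,
            "proportion": responders / n, "missing_as_non_responders": missing}
```

The function assumes strictly positive baseline values, which holds for ALP measured in the usual units.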

Item 27a: What estimator and analysis method will be used and how the results will be presented.

Conclusions can be affected substantially by the analysis method(s) used, therefore, it is extremely important to pre-specify the analysis method(s) so there is no possibility of the method being chosen because it gives the most positive results. For each outcome, the SAP should specify what analysis method(s) will be used for statistical comparisons. The population and summary measure used should be consistent with that specified in the definition of the estimand, in items 26a-e. If more than one method is to be used to analyse the primary outcome, e.g., adjusted and unadjusted for covariates, then the primary analysis method should be identified.

Where line listings are to be used, it may be prudent to state here which information will be reported.

For dose escalation trials, the criteria for deciding to escalate doses and how the final dose will be chosen (e.g., the dose with the DLT probability closest to, but not exceeding, 33%) should be described.
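A rule such as "closest to, but not exceeding, 33%" can be stated unambiguously as a short function. This is a sketch only: `select_dose` is a hypothetical name, doses are indexed from 0, and returning `None` when every dose exceeds the target is an assumed convention that a real SAP would need to specify (for example, as a stopping rule).

```python
def select_dose(dlt_probs, target=0.33):
    """Index of the dose whose estimated DLT probability is closest to, but not
    above, the target; None if every dose exceeds the target."""
    admissible = [(target - p, i) for i, p in enumerate(dlt_probs) if p <= target]
    if not admissible:
        return None
    # Smallest gap below the target wins; ties go to the lower dose index
    return min(admissible)[1]
```

For example, `select_dose([0.05, 0.12, 0.28, 0.41])` picks the third dose (index 2), because 0.41 exceeds the target even though it lies numerically closer to 0.33 than 0.28 does.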

For all model-based and model-assisted early phase trials it is useful to include the formulae (or a sufficient reference to them) and the mathematical specification of the model required for the analysis. If these formulae have been specified in earlier sections, such as in item 9, or in a supporting document, a reference to this is sufficient. Moreover, if transformations are to be applied, these should be specified along with the rationale for the transformation and the resulting interpretation.
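As an illustration of the level of mathematical specification intended, the one-parameter power model often used with the continual reassessment method (CRM) is shown below. This is one common parametrisation given as an example, not a model prescribed by this guidance. Here $s_j$ is the prior guess (skeleton) of the DLT probability at dose $j$ of $J$, $d_i$ and $y_i$ are the dose received and the DLT indicator (1 = DLT) of patient $i$, $\mathcal{D}$ denotes the observed data, and $\sigma$ is a prior standard deviation chosen at the design stage.

```latex
\begin{aligned}
p_j(a) &= s_j^{\exp(a)}, \qquad j = 1, \dots, J, \qquad a \sim N(0, \sigma^2),\\
\pi(a \mid \mathcal{D}) &\propto \pi(a) \prod_{i=1}^{n} p_{d_i}(a)^{y_i}\,\{1 - p_{d_i}(a)\}^{1 - y_i},\\
\hat{p}_j &= \int s_j^{\exp(a)}\, \pi(a \mid \mathcal{D})\, da .
\end{aligned}
```

The next cohort would then typically be assigned the dose whose $\hat{p}_j$ is closest to the target DLT probability, subject to any overdose control rules.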

Where the analysis method is novel or non-conventional, it is recommended that the code required to produce the analysis and, where appropriate, to inform dose escalation decisions is made available, so that the critical decisions and conclusions drawn from the trial can be checked. While the main body of the SAP is not the appropriate place for this, it is suggested that the code is appended (see item 35).

Making the code available allows the critical decisions of the trial to be replicated and reproduced.

Example: "The estimator in order to determine the optimal dose is the EffTox design (as described in item 9c). The optimal dose will be reported with its associated probability of DLT and response."

"We can then calculate the posterior probability of efficacy for each treatment schedule." [18]

"The estimand is estimated by the number and proportion of patients with a clinically meaningful ALP reduction as described in item 26e."

Item 27b: Any adjustments for covariates.

For each estimator which has an underlying statistical model, the SAP should specify whether adjustment will be made, and if so, the covariates to be used (including the categories if applicable), and how these will be included in the model (e.g., as fixed effects, or random effects). For the primary endpoint, it must be clear whether the adjusted or unadjusted analysis is the primary analysis as failing to pre-specify can lead to bias.

"Baseline covariates will be adjusted for in the modelling as necessitated by clinical indication and in order to aid model convergence/diagnostics."

"No adjustments for covariates will be made."

Item 27c: Methods used to check assumptions of the underlying statistical methods and goodness of fit for the model.

For each estimator that requires post-estimation checks, there may be a number of assumptions that need to hold for the analysis to be valid and to ensure that the conclusions drawn (and, where appropriate, dose escalation decisions) are correct. Checks to assess these underlying assumptions should be pre-specified.
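Where the estimator is fitted by Markov chain Monte Carlo, as in the first example below, one of the convergence statistics commonly monitored is the potential scale reduction factor $\hat{R}$. The Python sketch below computes the classic split-$\hat{R}$ described by Gelman and colleagues (not the newer rank-normalised version) for a single parameter, from post-warm-up draws held as plain lists. In practice a validated implementation in the analysis software would normally be used, and the threshold regarded as acceptable (values close to 1, for example below 1.01 or 1.1) should itself be pre-specified.

```python
import math
from statistics import mean, variance


def split_rhat(chains):
    """Split-R-hat (potential scale reduction factor) for one parameter.

    chains: list of equal-length lists of post-warm-up draws, one per chain.
    Each chain is split in half so that within-chain trends also inflate R-hat.
    """
    half = len(chains[0]) // 2
    splits = []
    for c in chains:
        splits.append(c[:half])
        splits.append(c[half:2 * half])
    n = half
    chain_means = [mean(s) for s in splits]
    b = n * variance(chain_means)           # between-chain variance
    w = mean(variance(s) for s in splits)   # mean within-chain variance
    var_plus = (n - 1) / n * w + b / n      # pooled posterior variance estimate
    return math.sqrt(var_plus / w)
```

Chains that have settled in different regions give between-chain variation far larger than within-chain variation, and the statistic then rises well above 1.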

Example: "The first method of checking model adequacy will be the presence of divergent transitions. The presence of any divergent transitions will indicate that the proposed model does not fit the observed data satisfactorily, and that alternative models need to be considered. First, alternative specifications for any fixed effects will be considered. Analytical functions of time-varying covariates will be considered (e.g., Time² or √Time) to address the potential of non-linear progression. Secondly, and where appropriate, alternative specifications for the random effects will be considered. It is anticipated that the terms used in the random effects structure will be a subset of those used in the fixed effects structure. In all situations, a saturated model is likely to provide a good fit. However, we will prefer a more efficient model with fewer parameters, if possible. In all cases, the final functional form of the models used will be presented. While all models will be run on multiple chains, and a warm-up sample discarded with the aim of minimising the possibility of non-convergence, non-convergence is possible. Model convergence will be assessed through visual inspection of history, density, and autocorrelation plots. The convergence statistic R̂ and the effective sample size will also be monitored. As with all convergence plots, such methodology is only appropriate for detecting non-convergence; should any of the aforementioned convergence plots or statistics suggest evidence of non-convergence, sensitivity to the warm-up and sample, inclusion of different baseline covariates, and alternative model specifications will be considered." [34]

"Given that only descriptive statistics are to be presented, there is no appropriate method for checking assumptions. The type of diagnostic statistics (either means and SDs or medians and IQRs) will be chosen based on the distribution of observed data."

Section 7: Suggested SAP Appendices

Simulation Report

Operating characteristics of the trial design to assess the probability of trial success under different plausible scenarios.

The estimand and scenarios assessed through simulations should be analogous to the estimand used in the main trial, appraising all the underlying assumptions and limitations of the model.

Where model-based designs are used, assessment of the design's operating characteristics is needed to ensure the trial will yield a (successful) result and provide sufficient overdose control. [40] Proper documentation of simulation studies is favoured by regulators.

The simulations used to assess the utility of the trial design should, at a minimum, test scenarios in which each dose level is the maximum tolerated dose (MTD) and in which all doses are ineffective or unsafe (e.g., too toxic or failing to achieve the desired response). With regard to simulation output for dose-escalation trials, for each scenario it is preferable to report:

• Prior distributions or skeletons (as appropriate),

• The true DLT rate,

• The probability of each dose being selected as the MTD (where applicable),

• The percentage or mean number of patients being treated at each dose level,

• The probability or mean number of patients being treated above the true and estimated (where different) MTD,

• The probability the trial stops early due to excess toxicity (e.g., when the lowest dose is too toxic).
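To make the simulation outputs listed above concrete, the Python sketch below estimates several of them (probability of each dose being selected as the MTD, mean number of patients per dose, probability of stopping because the lowest dose is too toxic) for a rule-based 3+3 design. The 3+3 design is swapped in here only because its rules fit in a few lines; a model-based design such as the CRM would replace `run_3plus3` with its own dose-finding model. It assumes a simple 3+3 variant (escalate after 0/3 or 1/6 DLTs; after 2 or more DLTs stop and declare the next lower dose the MTD; declare the top dose the MTD if it is tolerated; no de-escalation or expansion at the MTD). The seed is fixed and reported so that the results can be replicated.

```python
import random


def run_3plus3(true_dlt, rng):
    """Simulate one trial under a simple 3+3 variant.

    Returns (index of the dose declared MTD or None, patients treated per dose).
    """
    n_at = [0] * len(true_dlt)
    dose = 0
    while True:
        dlts = sum(rng.random() < true_dlt[dose] for _ in range(3))
        n_at[dose] += 3
        if dlts == 1:  # 1/3 DLTs: treat 3 more patients at the same dose
            dlts += sum(rng.random() < true_dlt[dose] for _ in range(3))
            n_at[dose] += 3
        if dlts >= 2:  # >=2/3 or >=2/6: this dose is too toxic
            return (dose - 1 if dose > 0 else None), n_at
        if dose == len(true_dlt) - 1:  # top dose tolerated: declare it the MTD
            return dose, n_at
        dose += 1


def operating_characteristics(true_dlt, n_sims=10_000, seed=2022):
    rng = random.Random(seed)  # fixed, reported seed so results can be replicated
    n_doses = len(true_dlt)
    selected = [0] * n_doses
    stopped = 0
    treated = [0] * n_doses
    for _ in range(n_sims):
        mtd, n_at = run_3plus3(true_dlt, rng)
        if mtd is None:
            stopped += 1
        else:
            selected[mtd] += 1
        treated = [t + k for t, k in zip(treated, n_at)]
    return {
        "p_select_as_mtd": [s / n_sims for s in selected],
        "p_stop_lowest_dose_too_toxic": stopped / n_sims,
        "mean_patients_per_dose": [t / n_sims for t in treated],
    }
```

Calling `operating_characteristics` once per scenario (for example, once with each dose in turn set as the true MTD, and once with every dose too toxic) produces the scenario-by-scenario table described above.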

With regard to simulation output for efficacy estimands, for each scenario it is preferable to report:

• Prior distributions (as appropriate),

• The true efficacy or response rate (as appropriate),

• For trials with a formal interim analysis, the probability of stopping for either efficacy or futility (depending on trial design), under scenarios in which the underlying truth both supports and contradicts that decision,

• The probability of yielding a successful result at the end of the trial.

Sufficient information should be included to allow replication by someone without prior knowledge of the trial. The methodology and rules underpinning the design (e.g., doses under investigation and their order, target dose, model type and parameters, including where appropriate model weights, and how to select a dose) should be the same as for the main model to be used for analysis (as specified in the main SAP). In addition to output and results, at a minimum the following is recommended:

• The estimated duration of the trial,

• The number of patients to be enrolled per cohort and in total,

• If a flexible cohort size is to be used, how this cohort size was sampled,

• How many simulations are run,

• If the trial design has a TiTE component, the time between patient enrolments,

• The random seed, if one was used.
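For a single-arm design with a formal interim analysis, such as Simon's two-stage design, the probability of stopping early for futility and the probability of a successful result can be computed exactly from binomial probabilities rather than by simulation. The Python sketch below assumes the usual two-stage rules: stop after stage 1 if $r_1$ or fewer of the first $n_1$ patients respond; otherwise enrol up to $n$ patients and declare success if more than $r$ respond in total. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_sf(k, n, p):
    """P(X > k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(max(k + 1, 0), n + 1))


def simon_two_stage_oc(p, r1, n1, r, n):
    """Exact operating characteristics at true response rate p."""
    # Probability of early termination for futility after stage 1
    pet = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    # Success requires continuing past stage 1 and more than r responses overall
    p_success = sum(binom_pmf(x1, n1, p) * binom_sf(r - x1, n - n1, p)
                    for x1 in range(r1 + 1, n1 + 1))
    expected_n = n1 + (1 - pet) * (n - n1)
    return {"p_early_stop_futility": pet, "p_success": p_success,
            "expected_n": expected_n}
```

For example, `simon_two_stage_oc(0.10, r1=1, n1=10, r=5, n=29)` and `simon_two_stage_oc(0.30, r1=1, n1=10, r=5, n=29)` give the operating characteristics of an illustrative design with $r_1/n_1 = 1/10$ and $r/n = 5/29$ when the true response rate equals a hypothetical unacceptable rate of 0.10 and a hypothetical acceptable rate of 0.30, respectively.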

Further guidance, particularly for CRM trials, is available. [41]

Dose transition pathways

Item 34: For dose-escalation trials, indication of the dose transition pathways (either using tables or trees/graphs) under different DLT scenarios.

For any dose escalation component of an early phase trial, dose transition pathways (DTPs) are a useful tool to assist decision-making, particularly in the instance of novel methodology or complex dose escalation/overdose control rules. DTPs make dose escalation decisions transparent and can facilitate the work of the relevant safety monitoring board. [42] This section is not applicable for single arm phase II trials with efficacy outcomes.

Figure A3b: Dose Transition Pathways [18] [6]

Code

Item 35: Full model specification and programming code used for evaluation of dose-escalation decisions.

Optional section, encouraged for novel model-assisted and model-based phase I designs. In these instances, the full model specification and programming code should be made available in the SAP (or a suitably referenced document). If model code is included, it should be appropriately annotated. Where established methodology is used for which publicly available specialist software exists, an appropriately referenced indication of the functions and packages to be used (including an example of such functions) is sufficient. This allows dose escalation decisions to be replicated and promotes reproducibility.
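As an illustration of annotated model code for dose-escalation decisions, the Python sketch below implements a one-parameter power-model CRM by grid integration and uses it to enumerate dose transition pathways (item 34) for the next cohort. The skeleton, target DLT probability, prior SD, cohort size and the rule never to skip an untried dose are all hypothetical choices made for illustration; where an established package exists it would normally be preferred.

```python
import math

# Hypothetical design inputs, for illustration only
SKELETON = [0.05, 0.10, 0.20, 0.30, 0.45]  # prior guesses of DLT probability per dose
TARGET = 0.25                               # target DLT probability
PRIOR_SD = math.sqrt(1.34)                  # SD of the normal prior on parameter a


def posterior_dlt_probs(doses, dlts, grid_size=2001):
    """Posterior mean DLT probability per dose under the power model
    p_j(a) = s_j ** exp(a), a ~ N(0, PRIOR_SD**2), using an equally spaced grid.

    doses: 0-based dose index given to each patient so far.
    dlts:  1 if that patient had a DLT, else 0.
    """
    lo, hi = -5 * PRIOR_SD, 5 * PRIOR_SD
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + g * step for g in range(grid_size)]
    log_post = []
    for a in grid:
        lp = -0.5 * (a / PRIOR_SD) ** 2  # log prior, up to a constant
        for d, y in zip(doses, dlts):
            log_p = math.exp(a) * math.log(SKELETON[d])            # log P(DLT)
            lp += log_p if y else math.log(-math.expm1(log_p))    # or log P(no DLT)
        log_post.append(lp)
    peak = max(log_post)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(v - peak) for v in log_post]
    total = sum(w)
    return [sum(wi * s ** math.exp(a) for wi, a in zip(w, grid)) / total
            for s in SKELETON]


def recommend(doses, dlts):
    """Dose with posterior DLT probability closest to TARGET, never skipping an untried dose."""
    probs = posterior_dlt_probs(doses, dlts)
    best = min(range(len(probs)), key=lambda j: abs(probs[j] - TARGET))
    highest_tried = max(doses, default=-1)  # -1: no data yet, so start at the lowest dose
    return min(best, highest_tried + 1)


def dose_transition_pathways(doses, dlts, cohort_size=3):
    """Next recommended dose for every possible DLT count in the next cohort."""
    current = recommend(doses, dlts)
    pathways = {}
    for k in range(cohort_size + 1):
        new_doses = doses + [current] * cohort_size
        new_dlts = dlts + [1] * k + [0] * (cohort_size - k)
        pathways[k] = recommend(new_doses, new_dlts)
    return current, pathways


if __name__ == "__main__":
    # Hypothetical data: first cohort of 3 treated at dose level 1 (index 0), no DLTs
    current, pathways = dose_transition_pathways([0, 0, 0], [0, 0, 0])
    print(f"Next cohort treated at dose level {current + 1}")
    for k, nxt in pathways.items():
        print(f"  {k} DLT(s) in that cohort -> following cohort at dose level {nxt + 1}")
```

Applying `dose_transition_pathways` recursively to each branch would extend the table over several future cohorts, giving the tree form of DTPs described in item 34.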

Optional section detailing exemplar tables, graphs and report templates.

While not necessary, a template may be appended to the SAP to aid in producing reports.

Reference should be made to any other Standard Operating Procedures (SOPs) or documents that are adhered to and followed when writing the SAP.

These reports may detail the intended layout, content, tables, and graphs to be produced. This template may be stored separately from the SAP, in which case a suitable reference is sufficient.

• Baseline characteristics,

• Patient disposition,

• Treatment (received, discontinuation, compliance),

• Adverse Events (AEs), to include sections on DLTs, Serious Adverse Events (SAEs) and non-serious AEs,

• Dose-escalation content, e.g., modelling output and recommendation for next cohort (where appropriate),

• Efficacy estimands (where appropriate),

• PK estimands (where appropriate),

• Sensitivity analysis results.

This item was called 'Confidence intervals and P values' in the Gamble et al. paper. It has been changed to 'Indications of uncertainty' to reflect that many early phase trial designs are underpinned by Bayesian methodology.

This item was called 'Outcome definitions

• AE: Adverse Events
• AIC: Akaike information criterion
• ALP: Alkaline Phosphatase
• AUC: Area Under the Curve
• BID: Bis In Die (twice a day)
• CI: Confidence Interval
• CONSORT: CONsolidated Standards Of Reporting Trials
• CrI: Credible Interval
• CRM: Continual Reassessment Method
• CTCAE: Common Terminology Criteria for Adverse Events
• DDC: Dose Determination Committee
• DLT: Dose Limiting Toxicity
• DMC: Data Monitoring Committee
• DMP: Data Management Plan
• DTP: Dose Transition Pathways
• ESS: Effective Sample Size
• ICH: International Council for Harmonisation
• IE: Intercurrent Event
• IMP: Investigational Medicinal Product
• IQR: Interquartile Range
• ITT: Intention To Treat
• IV: Intravenous
• LPLV: Last Patient Last Visit
• MED: Minimum Effective Dose
• mITT: Modified Intention To Treat
• MRD: Minimum Residual Disease
• MTD: Maximum Tolerated Dose
• 0: The largest unacceptable response rate
• 1: The smallest acceptable response rate
• PD: Pharmacodynamics
• PK: Pharmacokinetics
• PO: Partial Ordering
• QoL: Quality of Life
• RP2D: Recommended Phase II Dose
• SAE: Serious Adverse Event
• SAP: Statistical Analysis Plan
• SAR: Serious Adverse Reaction
• SD: Standard Deviation
• SOP: Standard Operating Procedures
• SUSAR: Suspected Unexpected Serious Adverse Reaction
• TiTE: Time To Event
• TMG: Trial Management Group
• TSC: Trial Steering Committee

References

Full title: A Phase I dose escalation trial of the Humanized Anti-CD47 Monoclonal Antibody Hu5F9-G4 in Haematological Malignancies. Statistical Analysis Plan

A single arm, two-stage, multi-centre, phase II clinical trial investigating the safety and activity of the use of BTT1023, a human monoclonal antibody targeting vascular adhesion protein (VAP-1), in the treatment of patients with primary sclerosing cholangitis (PSC)

Clinical trial registration: a statement from the International Committee of

ComPAKT: A Phase I multi-centre trial of the Combination of olaparib (PARP inhibitor) and AZD5363 (AKT inhibitor) in patients with advanced solid tumours (CCR4058) Statistical Analysis Plan

Phase I/II study to determine the maximum tolerated dose and activity of the combination of romidepsin and carfilzomib in relapsed or refractory peripheral T-cell lymphoma. Statistical Analysis Plan

Adaptive study of IL-2 dose on regulatory T cells in type 1 diabetes

A prospective, phase II, single centre, cross-sectional, randomised study investigating Dehydroepiandrosterone and Pharmacokinetics in Trauma. Statistical Analysis Plan

CLARITY: Assessment of VenetoCLAx (ABT-199) in combination with IbRutInib in relapsed/refracTory Chronic LymphocYtic Leukaemia. Statistical Analysis Plan

ADePT DDR. Accelerating the Development and implementation of Personalised Treatments of DNA Damage Response agents and radiotherapy +/-immunotherapy for head and neck squamous small cell cancer. Statistical Analysis Plan

A Phase I Clinical Trial of a replication defective type 5 adenovirus vector expressing nitroreductase and GMCSF (AdNRGM) given via trans-perineal, template-guided, intra-prostatic injection, followed by intravenous CB1954, in patients with locally relapsed Prostate Cancer. Statistical Analysis Plan

ALS: A Phase II pilot safety and tolerability study of ILB in patients with Motor Neurone Disease (MND) / Amyotrophic Lateral Sclerosis (ALS)

Dose-Finding Based on Efficacy-Toxicity Trade-Offs

Efficacy-Toxicity trade-offs based on L^p norms

Effective sample size for computing prior hyperparameters in Bayesian phase I-II dose-finding

MATCHPOINT: MAnagement of Transformed CHronic myeloid leukaemia: POnatinib and INTensive chemotherapy: a dose-finding study

A Phase I trial of WEE1 inhibition with Chemotherapy and Radiotherapy as adjuvant treatment, and a Window of Opportunity trial with Cisplatin in Patients with Head and Neck Cancer. Statistical Analysis Plan

Full title: A Phase 1 study of the safety, tolerability and biological effects of intravenous EnAdenotucirev, a novel oncolytic virus, in combination with chemoradiotherapy in locally advanced rectal cancer. Statistical Analysis Plan

International phase I/II expansion trial of the MEK inhibitor selumetinib in combination with dexamethasone for the treatment of relapsed/refractory RAS-pathway mutated paediatric and adult Acute Lymphoblastic Leukaemia. Statistical Analysis Plan

A Phase 1, Open-Label, Randomised, Repeat Dose, Parallel Group Study to Evaluate the Pharmacokinetics, Safety and Tolerability of Ferric Maltol at Three Dosage Levels in Paediatric Subjects Aged 10-17 Years of Age with Iron Deficiency (with or without Anaemia)

Trial of Temsirolimus for Advanced Cancers. A Phase I/II single-arm trial to evaluate the combination of cisplatin and gemcitabine with the mTOR inhibitor temsirolimus for treatment of advanced cancers, including first-line treatment of patients with advanced transitional cell carcinoma of the urothelium

A phase II, randomiSed study of CHOP-R in combination with acalabruTinib comparEd to CHOP-R in patients with newLy diagnosed Richter's Syndrome (RS) and a pLAtfoRm for initial investigations into activity of novel treatments in relapsed/refractory and newly diagnosed RS. Statistical Analysis Plan

TRAFIC: statistical design and analysis plan for a pragmatic early phase 1/2 Bayesian adaptive dose escalation trial in rheumatoid arthritis

Study BP42698: A DOUBLE-BLIND, RANDOMIZED, PARALLEL-GROUP, PHASE 2 STUDY TO INVESTIGATE THE EFFECT OF RO7049665 ON THE TIME TO RELAPSE FOLLOWING STEROID TAPERING IN PATIENTS WITH AUTOIMMUNE HEPATITIS

Guideline on multiplicity issues in clinical trials

Medicines for Healthcare products Regulatory Agency. Good Clinical Practice Guide. United Kingdom: The Stationery Office

Phase Clinical Trial Network. Bortezomib, Vorinostat, and Dexamethasone Combination Therapy in Relapsed Myeloma: Results of the Phase 2 MUK four Trial

CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials

International Conference on Harmonisation. Topic E6 (R1) Guideline for Good Clinical Practice (CPMP/ICH/135/95) European Medicines Agency

CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials

Testing for baseline balance in clinical trials

International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use

ICH E9 -Statistical Principles for Clinical Trials

A first-in-human clinical trial of a bioactive dressing designed to reduce scarring of skin burns. Statistical Analysis Plan

Diagnostics for assumptions in moderate to large simple clinical trials: Do they really help?

The consequences of proportional hazards based model selection

Effects of TAMoxifen on the Mutant Allele Burden and Disease Course in Patients with MyeloprolifeRative Neoplasms. Statistical Analysis Plan

Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals

Inference and missing data

Using simulation studies to evaluate statistical methods

How to design a dose-finding study using the continual reassessment method

Dose transition pathways: The missing link between complex dose-finding designs and simple decision-making

Explanation: "Details of the statistical packages to be used to conduct the statistical analyses may be provided in the SAP. While version numbers of software may change during the lifetime of the trial and so should not be specified in the SAP they should be included within final reports." [1]

Example: "Statistical analyses will be carried out using relevant statistical software; SAS, Stata or R." [11]

References

Item 32a: References to be provided for nonstandard statistical methods.

"References should be provided in a SAP for any non-standard statistical methods that will be used. If there is any doubt on whether a method is non-standard then it is better to include a reference."

Item 32b: Reference to Data Management Plan.

Explanation: "Reference should be made to the Data Management Plan (DMP) with the version number that was used when writing the SAP. This is important as both documents should be linked with information in the DMP that is also important for the final analysis report. If there is no DMP, then the location of this information (e.g., data handling and cleaning) should be provided."

Item 27d: Details of alternative methods to be used if distributional assumptions do not hold.

Because randomisation and blinding are uncommon in early phase trials, a blinded review of distributional assumptions may be neither relevant nor possible. It is therefore important to pre-specify the alternative methods and models to be used if the underlying assumptions do not hold. As for the main estimator, and where possible, the formulae and mathematical specification of these alternative models should be given. The approach should be chosen carefully, because bias may be introduced either by choosing the method of analysis on the basis of tests of assumptions [35, 36] or by performing hypothesis tests whose underlying assumptions do not hold. Three approaches may be considered: i) pre-specify in the SAP the alternative analyses and how the statistician will choose between them, so that the process is transparent; ii) select a method of analysis that is robust to the assumptions; or iii) state in the SAP the method of analysis to be used and specify that a sensitivity analysis will be performed under an alternative set of assumptions and the results compared.

Example: "The relationship of the primary endpoint with dose will be explored by fitting a number of candidate models. The list of candidate models were fitted to the primary endpoint divided by 100 with the targets defined as 0.10 and 0.20, respectively. The candidate models include the linear, the quadratic, the cubic, the logistic and the Emax (with 3 and 4 parameters). The mathematical formula of the models are given by: [...] The estimated equation of each convergent model will also be plotted in the scatter plot of the primary endpoint against dose. For each model, its estimated coefficients (a to d where applicable) with their standard error will be reported. The Akaike Information Criterion (AIC) and the deviance of each model will be reported as measures of adequacies of fit. The residual error of each regression will also be reported.
The residual values of each model against its predicted values will be plotted, as well as a quantile-quantile plot of its residuals. The proposed target doses of each model with their standard errors and with a 95% confidence interval will be reported." [7] "A priori it is not thought that any alternative model specifications will be warranted. If this ends up not to be true, the alterations to this will be detailed in all generated reports and marked as a deviation from the SAP". "No modelling is here to be performed and so specification of alternative models is not appropriate".
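Approach i) requires the rule for choosing between candidate models to be written down before the data are seen. A minimal sketch of one such rule, assuming Gaussian errors and polynomial candidates only (linear, quadratic, cubic in dose), fits each model by least squares and ranks the models by the Akaike Information Criterion (AIC), one of the fit measures named in the example above. The logistic and Emax models in that example are non-linear in their parameters and need iterative fitting, which this sketch does not attempt. The function names, candidate dictionary and data are hypothetical.

```python
import math

# Hypothetical pre-specified candidates: name -> polynomial degree in dose.
CANDIDATES = {"linear": 1, "quadratic": 2, "cubic": 3}


def fit_polynomial(x, y, degree):
    """Least-squares coefficients [c0, c1, ...] of y = c0 + c1*x + ... + c_degree*x**degree.

    Solves the normal equations (X'X)c = X'y by Gauss-Jordan elimination with
    partial pivoting; adequate for the handful of dose levels in an early phase trial.
    """
    p = degree + 1
    # Augmented normal-equation matrix [X'X | X'y].
    a = [
        [sum(xi ** (i + j) for xi in x) for j in range(p)]
        + [sum(yi * xi ** i for xi, yi in zip(x, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    # The coefficient block is now diagonal, so each unknown is read off directly.
    return [a[i][p] / a[i][i] for i in range(p)]


def gaussian_aic(x, y, coefs):
    """AIC = -2 log L + 2k for a least-squares fit with normal errors (k counts the variance too)."""
    n = len(y)
    rss = sum(
        (yi - sum(c * xi ** i for i, c in enumerate(coefs))) ** 2
        for xi, yi in zip(x, y)
    )
    # Maximised log-likelihood with the ML variance estimate rss / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    k = len(coefs) + 1
    return -2 * log_lik + 2 * k


def rank_candidates(x, y, candidates):
    """Fit every pre-specified candidate; return (name, AIC) pairs, lowest AIC first."""
    results = [(name, gaussian_aic(x, y, fit_polynomial(x, y, deg)))
               for name, deg in candidates.items()]
    return sorted(results, key=lambda item: item[1])


# Hypothetical data: two patients per dose level, endpoint already divided by 100.
dose = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
response = [0.05, 0.07, 0.09, 0.12, 0.15, 0.14, 0.17, 0.19, 0.18, 0.21]
for name, aic in rank_candidates(dose, response, CANDIDATES):
    print(f"{name:10s} AIC = {aic:.2f}")
```

Whichever criterion is used (lowest AIC here), stating it in the SAP in advance keeps the choice of model from depending on which result looks most favourable.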

Item 27e: Any planned sensitivity analyses for each outcome where applicable.

For each outcome, where applicable and in line with the definition of the estimand, the SAP should specify whether any sensitivity analyses will be conducted. Planned sensitivity analyses should be defined and described in the same level of detail as the primary and secondary estimators. Any parts of the estimand that change in a sensitivity analysis (e.g., a change in analysis population) should be clearly defined and explained. Moreover, although in early phase clinical trials a high proportion of missing data is unlikely to trigger sensitivity analyses, any threshold percentage of missing data that would do so should be clearly stated.

Example: "There are two planned sensitivity estimands to the primary estimand planned. For the first, where patients do not have a centrally analysed ALP value, the locally analysed ALP value will be imputed and the primary estimand repeated. The second sensitivity analyses will repeat the primary estimand this time using only locally analysed ALP for all patients. This sensitivity analyses is intended to only be exploratory and so no significance testing will be performed on sensitivity results." "Sensitivity analyses will also be performed in the per-protocol population, which is defined as those patients who completed the treatment as originally allocated with no dose modification or missing doses (i.e. patients that have received all 7 infusions as scheduled in the protocol at the MED dose). For sensitivity analyses, only the primary outcome measure (with central processing) will be assessed." [3] "Sensitivity analyses will be carried out during the trial for dose-decision meetings and also for final analysis for estimating the optimal dose. In addition to repeating the analysis using the sensitivity population defined in Section 4.2, we will also repeat the analysis using different weight functions.
Therefore the 2 sensitivity analyses are:
1. Sensitivity population and analysis using weights according to length of follow-up only, i.e. not taking into account how much dose has been received
2. Main population but with the most toxic scenario, i.e. we assume that all patients currently in follow-up within the DLT Window of 13 weeks have a DLT" [18] "For all the Bayesian analysis listed above, where prior distributions are specified in advance, sensitivity to prior will be assessed." [34] "There are no planned sensitivity analyses for this study." [21]

Analysis methods

Any planned subgroup analyses for each outcome including how subgroups are defined.

All pre-planned subgroup analyses should be clearly specified. It may be appropriate to define each subgroup analysis using the estimand framework, with the same considerations: how the patient populations will be defined, how patients will be assigned to subgroup categories, and how the results will be presented. Because of their limited sample sizes, early phase trials rarely support a large number of subgroup analyses, and these should generally be avoided. However, subgroup analyses may sometimes be appropriate, for example when the aim is to demonstrate consistency across subgroups.

Example: "Due to the lack of statistical power for subgroup analyses in this early phase II trial, results provided will be exploratory only. Therefore, results should not be over interpreted and instead used as a guide for further subgroup analyses in a larger phase III setting. Subgroups to be studied include:
• Results by mutation type
• Primary disease
• Oestrogen receptor data
• Sex
• Health Economics" [37] "Exploratory subgroup analyses may be performed based on stage of liver disease and/or prior treatments. For exploratory subgroup analyses, only the primary estimand (with central processing) will be assessed and no hypothesis testing performed." [3] "No subgroup analyses are planned." [2]

Missing data

Reporting and assumptions/statistical methods to handle missing data (e.g., multiple imputation).

Although most trials will have some missing data [38], which can introduce bias depending on the pattern of 'missingness' [39], formal methods such as multiple imputation are generally not advocated in early phase clinical trials because of their small sample sizes. Nevertheless, the SAP should state how missing data will be handled and reported, including details of any statistical methods used to handle missing data and their assumptions. What data are considered missing, and the methods used to deal with them, depend directly on the definition of the estimand, as given in items 26. If missing outcome data are to be imputed, the variables involved and the details of the imputation process should be given, either here or by reference to another supporting document containing further details. Since conclusions drawn from any imputation depend on the statistical method used, it is crucial to pre-specify which methods will be used under which circumstances, and which will be considered the primary analysis. This again promotes transparency and minimises ambiguity in the methods.

Example: "The incident of missing data will be reported and if it rises more than 10% then sensitivity analyses will be carried out as appropriate. Summary tables will present the population size either in the title or in the column headers, and thus the number of missing values for any particular variable/visit will be documented." [7] "No data imputation is planned." [18]

Additional analyses

Details of any additional statistical analyses required, e.g., complier-average causal effect analysis.

"Any additional analyses to be conducted should be specified with reasons these are required, a description of the additional analysis and how it will be conducted. This may include pre-specified exploratory analyses that are hypothesis generating or confirmatory of issues identified in other trials." [1]

Example: "No imputation or additional analyses are planned a priori." [9] "Additional analyses will be performed which combines translational data with clinical outcomes."
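The item heading names complier-average causal effect (CACE) analysis as one kind of additional analysis. A minimal sketch, assuming a randomised two-arm comparison with one-sided non-compliance (no control-arm patient receives the intervention) and the exclusion restriction, uses the simple Wald (instrumental-variable) estimator: the intention-to-treat difference in means divided by the proportion of the intervention arm who actually received the intervention. Since randomisation is uncommon in early phase trials, this applies only where such a comparison exists; the function name and data are hypothetical.

```python
from statistics import fmean


def cace_estimate(y_intervention, received_intervention, y_control):
    """Complier-average causal effect via the Wald (instrumental-variable) ratio.

    Assumes randomisation, one-sided non-compliance and that allocation
    affects the outcome only through treatment actually received.
    """
    itt_effect = fmean(y_intervention) - fmean(y_control)
    # Proportion of the intervention arm who actually received the intervention.
    compliance = fmean(1.0 if r else 0.0 for r in received_intervention)
    if compliance == 0:
        raise ValueError("No compliers in the intervention arm; CACE is undefined.")
    return itt_effect / compliance


# Hypothetical data for illustration only.
y_int = [3.0, 5.0, 4.0, 6.0]
took = [True, True, False, True]
y_ctl = [3.0, 4.0, 3.0, 4.0]
print(cace_estimate(y_int, took, y_ctl))  # ITT difference 1.0 / compliance 0.75
```

The ratio rescales the diluted intention-to-treat effect to the compliers; with full compliance it reduces to the intention-to-treat estimate.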

Item 30: Sufficient detail on summarizing safety data outside of that used for dose escalation (e.g., non-DLT safety data), e.g., information on severity, expectedness, and causality; details of how adverse events are coded or categorized; how adverse event data will be analysed, i.e., by grade, incidence case analysis, intervention emergent analysis.

Where information on DLTs is collected, the incidence and details of DLTs will typically be reported alongside the relevant outcome. However, the full safety profile must be considered in every clinical trial. Safety data should be reviewed, and the SAP should describe how the remaining safety data will be summarised during interim and final analyses, including the analysis population to be used. This may include the severity, causality and expectedness of adverse events, events resulting in dose reductions, and how, and by whom, adverse events will be coded or categorised. The method of summarising adverse event data should be described, making clear whether descriptive summaries will count events or patients and what analyses will be conducted (e.g., whether adverse events will be compared descriptively or by formal statistical testing).

Example: "All safety analysis will be conducted on the safety analysis set. In order to assess toxicity throughout the trial, the following will be presented at each DMC meeting and in the primary analysis report.
• The number of deaths in the trial will be reported by cohort and treatment arm with cause of death reported.
• The number of serious adverse events (SAEs) will be presented by cohort and treatment arm. A summary for each SAE categorisation code (e.g., SAR, SUSAR, unrelated SAE) will be presented.
• The number of grade 3 or greater adverse events reported by cohort and treatment arm (for the randomised component).
• The number and proportion of patients experiencing a grade 3 or greater adverse event.
• The number and proportion of patients experiencing any adverse event." [22] "Details of all AEs will be documented and reported from the date of commencement of protocol defined treatment until 28 days after the administration of the last treatment.
All AEs will be followed up until resolution or until last trial visit (whichever occurs soonest). Any AEs ongoing at the patient's last trial visit will be marked as unresolved. In addition to safety outcomes as detailed below, the following will be reported, stratified per-protocol and, where appropriate, according to trial stage.
• Toxicities will be tabulated by CTCAE v5.0 classification, grade and number (and percentage) of patients affected,
• A line listings given of grade 3, 4 or 5 adverse events deemed at least possibly related to treatment,
• Duration of adverse events will be summarised,
• SAEs will be reported as frequency and number of patients experiencing them, together with outcome (e.g., death, resolved etc.), and
• Line listings of all SAEs and DLTs.
• Incidence and summary characteristics of adverse events of particular interest shall be reported. Adverse events that are of particular interest are those which are perceived to be related to tolerability of the gel, e.g., redness or itchiness of the wound." [34]

Statistical software

Item 31: Details of statistical packages to be used to carry out design, simulation and analyses
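Because software version numbers belong in final reports rather than in the SAP, an analysis or simulation script can record them at run time. A minimal sketch, assuming the work is done in Python rather than the SAS, Stata or R named in the example for this item: it collects the interpreter, operating system and package versions and uses a dedicated, seeded random number generator so that design simulations can be reproduced. The function name, seed value and package list are hypothetical.

```python
import importlib.metadata
import platform
import random


def record_environment(seed, packages=()):
    """Collect interpreter, operating system and package versions for the final report."""
    info = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        "simulation_seed": seed,
    }
    for name in packages:
        try:
            info[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


SEED = 20220207  # hypothetical seed; record it alongside the versions
rng = random.Random(SEED)  # dedicated generator, so simulations do not share global state
first_draw = rng.random()
env = record_environment(SEED, packages=("numpy", "scipy"))
for key, value in env.items():
    print(f"{key}: {value}")
```

Writing this record into the output of every run ties each reported result to the exact software that produced it, without fixing versions in the SAP itself.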