Statistical Issues in Clinical Trials for Treatment of Opiate Dependence

Editor: Ram B. Jain, Ph.D.

NIDA Research Monograph 128
1992

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Public Health Service
Alcohol, Drug Abuse, and Mental Health Administration
National Institute on Drug Abuse
5600 Fishers Lane
Rockville, MD 20857

ACKNOWLEDGMENT

This monograph is based on the papers and discussions from a technical review on “Statistical Issues in Clinical Trials for Treatment of Opiate Dependence” held on December 2-3, 1991, in Bethesda, MD. The technical review was sponsored by the National Institute on Drug Abuse (NIDA).

COPYRIGHT STATUS

NIDA has obtained permission from the copyright holders to reproduce certain previously published material as noted in the text. Further reproduction of this copyrighted material is permitted only as part of a reprinting of the entire publication or chapter. For any other use, the copyright holder's permission is required. All other material in this volume except quoted passages from copyrighted sources is in the public domain and may be used or reproduced without permission from the Institute or the authors. Citation of the source is appreciated.

Opinions expressed in this volume are those of the authors and do not necessarily reflect the opinions or official policy of the National Institute on Drug Abuse or any other part of the U.S. Department of Health and Human Services.

The U.S. Government does not endorse or favor any specific commercial product or company. Trade, proprietary, or company names appearing in this publication are used only because they are considered essential in the context of the studies reported herein.
NIDA Research Monographs are indexed in the “Index Medicus.” They are selectively included in the coverage of “American Statistics Index,” “BioSciences Information Service,” “Chemical Abstracts,” “Current Contents,” “Psychological Abstracts,” and “Psychopharmacology Abstracts.”

DHHS publication number (ADM)92-1947
Printed 1992

Contents

Introduction
  Ram B. Jain
Drug Dependence (Addiction) and Its Treatment
  Frank J. Vocci, Jerome H. Jaffe, and Ram B. Jain
Background and Design of a Controlled Clinical Trial (ARC 090) for the Treatment of Opioid Dependence
  Rolley E. Johnson and Paul J. Fudala
Clinical Endpoints: Discussion Session
  Ram B. Jain
Design of Clinical Trials for Treatment of Opiate Dependence: What Is Missing?
  Ram B. Jain
Comments
  Sudhir C. Gupta
Rejoinder
  Ram B. Jain
Summary of Discussion
  Ram B. Jain
Efficacy of Urinalysis in Monitoring Heroin and Cocaine Abuse Patterns: Implications in Clinical Trials for Treatment of Drug Dependence
  Edward J. Cone and Sandra L. Dickerson
Comments
  Nancy L. Geller
Summary of Discussion
  Ram B. Jain
Open/Panel Discussion: Design Issues
  Ram B. Jain
A Bayesian Nonparametric Approach to Analysis of Treatment for Drug Dependence Data
  Ram C. Tiwari
Three Estimators of the Probability of Opiate Use From Incomplete Data
  Alan J. Gross
Summary of Discussion
  Ram B. Jain
Issues in the Analysis of Clinical Trials for Opiate Dependence
  Dean Follmann, Margaret Wu, and Nancy Geller
Summary of Discussion
  Ram B. Jain
Analysis of Clinical Trials for Treatment of Opiate Dependence: What Are the Possibilities?
  Ram B. Jain
Summary of Discussion
  Ram B. Jain
Toward a Dynamic Analysis of Disease-State Transition Monitored by Serial Clinical Laboratory Tests
  T.S. Weng
Summary of Discussion
  Alan J. Gross
A Markov Model for NIDA Data on Treatment of Opiate Dependence
  Mei-Ling Ting Lee
Summary of Discussion
  Alan J. Gross
Open/Panel Discussion: Analysis Issues
  Ram B. Jain
Open/Panel Discussion: General Issues
  Ram B. Jain
List of Participants
List of NIDA Research Monographs

Introduction

Ram B. Jain

The Medications Development Division (MDD) of the National Institute on Drug Abuse (NIDA) came into existence in August 1990. Its mandate from the U.S. Congress is to develop medications for the treatment of drug dependence, primarily heroin and cocaine dependence.
The organizational structure of MDD allows for five branches, one of which is the Biometrics Branch. I happened to be the first one to join the Biometrics Branch, and it was and still is a great learning opportunity for me. I found that drug dependence is not a disease in the traditional sense that cancer or heart disease is; that its treatment is not a treatment in the traditional sense, since drug dependence is not treated the way a cancer or an infection is treated; and that the characteristics of the data generated by clinical studies in the drug abuse area are unique and not seen in other branches of medicine (a dropout rate of more than 50 percent!).

The data generated by these studies are the product of a continuous dynamic interaction between the pharmacological effect of the therapeutic agent, the effect of nonpharmacological services provided as part of the total treatment, and, most importantly, the drug-seeking behavior of the addict, which is shaped and influenced by the environmental stimuli around him or her. How does one statistically adjust for this multidimensional “noise”? What is being treated here is not quite obvious: Is it a medical condition, a mental disorder, a behavioral abnormality, or all of them at the same time?

Between September 1988 and May 1990, Drs. Rolley E. Johnson and Paul J. Fudala conducted a randomized, double-blind, “double dummy” clinical trial (ARC 090) to evaluate the efficacy of 8 mg sublingual doses of buprenorphine compared with 20 mg and 60 mg oral doses of methadone in 162 patients. This study was conducted at NIDA’s Addiction Research Center (ARC). These data were provided to me for analysis. The primary data consisted of binary (positive vs. negative) data points obtained by assaying the urine samples for the presence of opiates. Since the urine samples were obtained three times a week from each patient in this 25-week study, each patient could provide up to 75 data points.
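The shape of these data can be made concrete with a short sketch. The Python below is illustrative only: the function names, the use of `None` for a sample that was never provided (for example, after dropout), the rule that a missing sample breaks a run of negatives, and the hypothetical patient record are all assumptions made for this example, not the analysis applied to ARC 090. It shows the arithmetic behind the 75-sample maximum and two simple summaries of one patient's binary urinalysis series.

```python
from typing import List, Optional

SAMPLES_PER_WEEK = 3
STUDY_WEEKS = 25
MAX_SAMPLES = SAMPLES_PER_WEEK * STUDY_WEEKS  # 3 x 25 = 75 per patient

# True = opiate positive, False = negative, None = sample not provided.
Sample = Optional[bool]


def percent_positive(samples: List[Sample]) -> Optional[float]:
    """Percentage of positive results among the samples actually provided."""
    provided = [s for s in samples if s is not None]
    if not provided:
        return None  # nothing to summarize
    return 100.0 * sum(provided) / len(provided)


def longest_negative_run(samples: List[Sample]) -> int:
    """Longest run of consecutive negative samples.

    Assumption for this sketch: a missing sample ends the current run,
    because abstinence cannot be confirmed without a result.
    """
    best = run = 0
    for s in samples:
        if s is False:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


# Hypothetical patient who provides 7 of the first 8 samples, then drops out.
patient: List[Sample] = [True, True, False, None, False, False, False, True]
patient += [None] * (MAX_SAMPLES - len(patient))

print(percent_positive(patient))      # 3 of 7 provided samples are positive
print(longest_negative_run(patient))  # counted in samples, not days
```

How missing samples are treated is a choice, not a given: counting them as positive, ignoring them, or letting them break a drug-free run can each be defended, and each leads to a different summary for the same patient.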
Many endpoints could be defined and clinically defended using these data (e.g., percent-positive samples; a drug-free period of, say, 28 days or more), and several different statistical methods could be used to analyze them. After spending several months with these data, finding myself more informed every day than the day before, I determined that more could be learned: I could use expert opinion from outside.

During the summer of 1991, I began planning for a workshop (a NIDA technical review) on the design and analysis of clinical trials in the treatment of opiate dependence. Many well-known statisticians, including those who had many years of experience in managing and analyzing clinical trials, were contacted and asked if they would like to write and present research papers on the design and analysis of clinical trials in the treatment of opiate dependence and/or participate in this workshop. Commitments were obtained for five research papers. Each paper was to present the results of analyzing a part of the ARC 090 data. I also decided to present two papers, one on design and one on analysis.

The statisticians who agreed to write research papers and/or participate (and finally came to the workshop) included Drs. Joseph Collins (Veterans’ Administration Medical Center), Lloyd D. Fisher (University of Washington), Dean Follmann (National Heart, Lung, and Blood Institute [NHLBI]), Nancy L. Geller (NHLBI), Albert J. Getson (Merck Sharp & Dohme), Joel B. Greenhouse (Carnegie-Mellon University), Alan J. Gross (Medical University of South Carolina), Sudhir C. Gupta (Northern Illinois University), A.S. Hedayat (University of Illinois), Nicholas P. Jewell (University of California at Berkeley), Peter A. Lachenbruch (University of California, Los Angeles), Jack C. Lee (National Institute of Child Health and Human Development [NICHD]), Mei-Ling Ting Lee (Boston University), Shou-Hua Li (National Institute of Dental Research), Taesung Park (NICHD), Carol K. Redmond (University of Pittsburgh), Saul Rosenberg (NIDA), Vincent Shu (Abbott Laboratories), Richard Stein (Food and Drug Administration [FDA]), Ram C. Tiwari (University of North Carolina), L.J. Wei (Harvard School of Public Health), T.S. Weng (FDA), and Margaret Wu (NHLBI).

Without the presence, interaction, guidance, and advice of clinicians working in the drug abuse area, talking about designing and analyzing clinical trials for treatment of drug dependence would have been an exercise in futility, and therefore we requested participation from well-known clinicians in government, industry, and academia. Those who agreed to participate (and came to the workshop) included Jack D. Blaine (NIDA), Robert J. Chiarello (NIDA), Edward J. Cone (ARC), Paul J. Fudala (University of Pennsylvania), Harold Gordon (NIDA), David A. Gorelick (ARC), Charles W. Gorodetzky (CIBA-Geigy Corporation), Charles V. Grudzinskas (NIDA), John Hyde (FDA), Donald R. Jasinski (Johns Hopkins University), Rolley E. Johnson (Johns Hopkins University), Michael Murphy (Hoechst Roussel Pharmaceutical, Inc.), Frank J. Vocci (NIDA), and Curtis Wright (FDA).

The NIDA technical review on “Statistical Issues in Clinical Trials for Treatment of Opiate Dependence” took place on December 2-3, 1991, at the Bethesda Marriott, Bethesda, MD. It consisted of four sessions: a Clinical Session, a Design Session chaired by Dr. Gross, a two-part Analysis Session chaired by Drs. Wei and Fisher, respectively, and a General Issues Session cochaired by Drs. Lachenbruch and Jack C. Lee. Drs. Vocci and Johnson presented papers during the Clinical Session; Dr. Cone (with Sandra L. Dickerson) and I presented papers during the Design Session; and Drs. Follmann (with Drs. Geller and Wu), Gross, Gupta, Mei-Ling Ting Lee, Weng, and I presented papers during the Analysis Session. All papers presented during the Design and Analysis Sessions were available for precirculation and were peer reviewed prior to the meeting.
Authors were also invited to write rejoinders to referees’ comments. Drs. Geller, Greenhouse, Gross, Gupta, Jewell, Jack C. Lee, Redmond, and Tiwari were the reviewers. After the authors had presented their papers, reviewers also presented their comments at the workshop. Following the reviewers’ comments and rejoinders, if any, there was a brief open discussion of each paper that was presented.

Individual papers during the Clinical Session were followed by a Discussion Session. The aim of this discussion session was to obtain the opinion of FDA about what kind of endpoints would be adequate and/or appropriate in clinical trials for treatment of drug dependence, what statistical methods should be used to analyze the data generated from these trials, and, in general, what strategy should be used to design these trials. The discussants for this session were Drs. Hyde, Gorodetzky, Stein, and Wright.

All three Statistical Sessions concluded with a combined open/panel discussion. At each of these discussion sessions, a series of questions was presented (by NIDA) to the panels for discussion. Additional questions, as appropriate, could be presented by any of the participants at the workshop. The members of the Design Panel were Drs. Hedayat (chair), Getson, Gross, Gupta, Jasinski, Mei-Ling Ting Lee, Redmond, and Wu. The members of the Analysis Panel were Drs. Redmond (chair), Fisher, Follmann, Greenhouse, Gross, and Hedayat. The members of the General Issues Panel were Drs. Lachenbruch (cochair), Jack C. Lee (cochair), Collins, Fisher, Gupta, Jewell, Murphy, Shu, and Tiwari.

I was honored to organize and be a participant in this NIDA technical review. The workshop was a tremendous success. There was a free exchange of opinion and information between the statisticians and clinicians. There were more agreements than disagreements. There was unanimous agreement that these trials need a lot more work in both the design and analysis areas.
However, in the unbiased opinion of a very prominent statistician, not connected with NIDA in any way to the best of my knowledge, one of the papers presented at this workshop was what might be called a breakthrough.

This monograph presents the revised manuscripts as provided by the authors. Some of the revisions in these manuscripts may be a direct result of referees’ comments and authors’ rejoinders. Consequently, except for two papers, referees’ comments and/or authors’ rejoinders are not being reproduced, but all the referees have been given credit for their comments. Dr. Tiwari, who reviewed Dr. Gupta’s paper, showed interest (after the workshop) in writing a paper. His paper is also included in this monograph. However, Dr. Gupta could not submit an acceptable revised manuscript in time for publication of this monograph. Consequently, his manuscript could not be included in the monograph.

Summaries of discussions on individual papers presented in the statistical sessions are also presented. Dr. Gross prepared the summary of discussions that followed the papers by Drs. Mei-Ling Ting Lee and Weng. I prepared all other summaries. I also prepared the summaries for the discussion session that took place during the Clinical Session and for the open/panel discussions during the Statistical Sessions. I have tried to give credit to individual speakers/participants to the best of my ability. I have tried to reproduce opinions as close to those of individual speakers as possible. I have tried not to inject my own biases to the degree I could. However, I take responsibility for all errors and omissions and tender my apologies to those whom I may have misrepresented and/or offended.

This is just a beginning. NIDA’s MDD is busy planning the development of, or is in the process of developing, a variety of medications for the treatment of dependence on cocaine, heroin, and other substances that have the potential for abuse.
In addition to buprenorphine (to treat heroin abuse), for which a multicentered pivotal trial is ongoing, a trial for l-alpha-acetylmethadol (LAAM) (to treat heroin abuse) will soon be initiated. This LAAM trial should lead to approval for its marketing by FDA sometime in late 1992 or early 1993. A pivotal trial for a sustained release formulation of naltrexone should be under way sometime in 1993. There are definite plans for developing a combination formulation of buprenorphine and naltrexone. New compounds are being acquired from industry and elsewhere and are being tested for their potential for treatment of drug abuse.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

Drug Dependence (Addiction) and Its Treatment

Frank J. Vocci, Jerome H. Jaffe, and Ram B. Jain

INTRODUCTION AND SOME DEFINITIONS

What is drug dependence or drug addiction? How does one become an addict or dependent on a drug? There is no simple or single answer to these questions. Dependence and addiction are terms often used synonymously (as they are in this chapter). Unfortunately, these terms are often used in different ways in different contexts. Furthermore, according to Jaffe (1992):

    . . . science has been given no exclusive . . . right to the use of [these] terms . . . . Among the many behaviors that have been labeled “addictions” in the mass media are: eating salt; buying lottery tickets; using gasoline, computers, or foreign capital; taking educational courses; watching television; running; and engaging in sex. Some of the uses of the term are deliberately metaphorical.

This chapter, however, attempts to summarize how dependence or addiction is currently viewed by most psychiatrists, physicians, and many behavioral psychologists.
Although the concept of dependence has historically been divided into psychological dependence and physical (physiological) dependence, the current approach recognizes that such terms tend to contribute to an unscientific dualism. Today, most researchers believe that the mind does not exist independently of the brain. Drug dependence involves body, brain, and behavior as influenced by the environment.

Abuse of drugs does not necessarily constitute drug dependence. One may keep abusing a drug but may never be dependent on it and may never need to take it to feel normal. For someone to change from being a drug abuser who is nondependent to someone who is drug dependent, the sense of control must change so that the individual begins to feel a need to take the drug to feel normal, and therefore the flexibility to use or not to use the drug is diminished. During this transition from nondependency to dependency, the pattern of drug abuse does not have to change, although quite often there is an escalation in the number of times the drug is used or the amount that is used.

“Drug tolerance is a state of decreased responsiveness to the pharmacological effect of a drug resulting from a prior exposure to that drug or a related drug. When exposure to drug A produces tolerance to it and also to drug B, the organism is said to be cross-tolerant to drug B” (Goldstein et al. 1974). Drug tolerance can occur because of alterations in the central nervous system or because of more rapid metabolism (usually by hepatic induction).

Although still used, “physical dependence” is another term that conveys a sense of sharp distinction between the brain and the “mind.” Physical dependence is used to mean that the use of a given drug has produced an altered body physiology so that, when the drug is stopped, there are physiological abnormalities (which eventually pass) that can be prevented by continued use of the drug.
Physical dependence can be revealed by stopping the drug or by giving an antagonist that displaces the drug from its site of action in the body. Physical dependence can result from the therapeutic uses of a drug, for example, by using opioids to relieve pain in cancer therapy or benzodiazepines to treat anxiety. The discontinuation of a drug that one is physically dependent on can result in various pathophysiologic disturbances collectively known as a withdrawal or abstinence syndrome. It is entirely possible that an individual could be physically dependent on a drug but still not be “addicted” to a drug; that is, the appearance of withdrawal symptoms does not necessarily cause the individual to continue using the drug.

Then, what is drug dependence? According to Goldstein and colleagues (1974), drug dependence consists of three distinct and independent components: tolerance, physical dependence, and drug-seeking behavior resulting in compulsive abuse (psychic craving). Of course, these features are noticed in different degrees in drug dependence on different drugs. In the case of some drugs, only one or two of these components are noticed. “An example of tolerance and physical dependence without compulsive abuse is provided by the morphine congener and antagonist nalorphine” (Goldstein et al. 1974).

According to earlier concepts formulated in the 1930s, 1940s, and 1950s, a drug was not considered to be addictive unless it produced physical dependence characterized by an easily observable withdrawal syndrome. This view led to popular misconceptions about the dependence potential of both nicotine and cocaine. However, addiction is still an evolving concept. Currently, many researchers and clinicians believe that life-threatening intensity or easy observability of a withdrawal syndrome is not a necessary element in addiction.
For example, nicotine is believed to be addicting even though its withdrawal syndrome is not dramatic and no one has ever died from its withdrawal.

An increasing trend in the diagnosis of dependence is to characterize the addictive disorders in terms of the pattern of use, loss of control over amounts ingested, and continued use despite medical, legal, occupational, or interpersonal problems. There are now two widely recognized sets of standard criteria that are used to determine whether a given individual should be considered to be dependent on a drug: the DSM-III-R criteria developed by the American Psychiatric Association (1987) and the ICD-10 criteria developed by the World Health Organization (1990).

The DSM-III-R criteria for drug dependence include behaviors that allow an observer to infer that the individual has a decreased freedom to choose whether or not to use the drug. To be diagnosed as drug dependent, a person must meet at least three of the following criteria (American Psychiatric Association 1987):

• Ingestion of larger amounts (of drug) or over a longer period of time than intended, signifying loss of control over behavior
• Desire to or unsuccessful attempt to cut down drug use, once again representing loss of control over behavior
• Great deal of time spent in procuring drug and recovering from its effects
• Frequent intoxication or withdrawal when expected to fulfill major role obligations at work, school, or home; i.e., interference with obligations of life; e.g., reinforcing things in life like watching TV, reading books, interactions with people, etc.
• Other activities given up or reduced due to substance use
• Continued use despite problems at work, in life (e.g., marital problems), or legal problems
• Marked tolerance
• Characteristic withdrawal symptoms
• Substance use to relieve withdrawal

In addition, these symptoms or behaviors must persist for more than 1 month.
Furthermore, drug dependence can be graded as mild, moderate, or severe depending on the number of criteria met. A full remission means no use or use with no dependence in the past 6 months.

The criteria used in ICD-10 are somewhat different. According to ICD-10, for someone to be diagnosed as (drug) dependent, at least three of the following should have been experienced or exhibited at some time during the previous year (World Health Organization 1990):

• A strong desire or sense of compulsion to take the substance
• An impaired capacity to control substance-taking behavior in terms of onset, termination, or levels of use
• Substance use with intention of relieving withdrawal symptoms and with awareness that this strategy is effective
• Physiological withdrawal state
• Evidence of tolerance such that increased doses of the substance are required in order to achieve effects originally produced by lower doses
• Narrowing of the personal repertoire of patterns of substance use
• Progressive neglect of alternative pleasures or interests in favor of substance use
• Persisting with substance use despite clear evidence of overtly harmful consequences

However, neither of these sets of criteria is used by the Federal Government for admission to a methadone maintenance program. According to Federal regulations, dependence criteria for admission to a methadone maintenance program are at least 1 year of addiction history, physiological addiction for at least 1 year, and continuous or episodic addiction for most of the preceding year (Methadone maintenance criteria 1989). It would be inappropriate to view this as a formal definition of addiction; rather, it should be seen as specifying a degree of addiction or opioid dependence that justifies admission to a specialized program.
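The two counting rules described above can be written down as a minimal sketch. The Python below is not a diagnostic instrument: the function and constant names are hypothetical, the inputs assume a clinician has already judged which criteria are met, and the DSM-III-R "three" is read here as "at least three."

```python
DSM_III_R_LISTED = 9   # criteria listed in the text for DSM-III-R
ICD_10_LISTED = 8      # criteria listed in the text for ICD-10
MIN_CRITERIA = 3       # threshold stated in the text for both systems


def _check_count(met: int, listed: int) -> None:
    """Reject counts that cannot occur given the number of listed criteria."""
    if not 0 <= met <= listed:
        raise ValueError(f"criteria met must be between 0 and {listed}")


def meets_dsm_iii_r(criteria_met: int, months_persisting: float) -> bool:
    """At least three criteria, persisting for more than 1 month."""
    _check_count(criteria_met, DSM_III_R_LISTED)
    return criteria_met >= MIN_CRITERIA and months_persisting > 1


def meets_icd_10(criteria_met_in_previous_year: int) -> bool:
    """At least three criteria experienced at some time in the previous year."""
    _check_count(criteria_met_in_previous_year, ICD_10_LISTED)
    return criteria_met_in_previous_year >= MIN_CRITERIA


# Hypothetical case: four DSM-III-R criteria for six weeks (1.5 months).
print(meets_dsm_iii_r(criteria_met=4, months_persisting=1.5))
print(meets_icd_10(criteria_met_in_previous_year=2))
```

Written this way, the difference between the two systems is easy to see: both use a count of three, but DSM-III-R adds a persistence requirement while ICD-10 looks back over the previous year.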
In one sense, however, one could say that there is no standard definition of drug dependence or any standard diagnostic test that can be administered to classify a drug-dependent individual in need of treatment. In the case of opioid dependence, however, there is a naloxone challenge test that, by displacing opioids from the receptors in the brain, will produce signs of physical dependence, that is, withdrawal symptoms, in anyone who has been using opioids for a few days or longer. This test can also be given to an individual who might be taking opioids for therapeutic purposes (and will produce the same withdrawal symptoms after even a few doses of opioids). Hence, the presence of a withdrawal syndrome (even a severe one) does not necessarily mean the individual is addicted. The presence of a withdrawal syndrome is neither a necessary nor a sufficient condition for the diagnosis of drug dependence. However, as noted above, in an individual with a history of abuse, the presence of a withdrawal syndrome should be documented when that person is seeking admission to a methadone maintenance program.

Hence, for the purpose of a clinical trial, the definition (DSM-III-R or ICD-10) of dependence with or without additional criteria (e.g., naloxone challenge scores) can be used. Using DSM-III-R criteria allows entrance into clinical trials of patients who would not necessarily meet criteria for admission to a methadone maintenance program.

TREATMENT OF OPIOID ADDICTION

There are more than 1 million opioid abusers in the United States who can possibly benefit from a treatment program. Of these, about 110,000 are in methadone maintenance programs, and about 3,000 are in naltrexone treatment. Many others are treated in detoxification programs, therapeutic communities, and 12-step, drug-free programs; it is likely that the overwhelming majority of this population is not participating in any kind of treatment.
Although pharmacologically based treatments are only one approach to treatment, this approach plays an important role in the American system. There are primarily two pharmacological approaches to treatment of opioid dependence: agonist therapy and antagonist therapy.

Agonist therapy for opioid dependence consists of replacing the abused opioid with another, most likely a synthetic, opioid (called an opioid agonist or partial agonist) with relatively less potential for abuse. The ideal replacement opioid should have a less intense or no euphoric effect, should have a longer pharmacological effect, and should have a withdrawal effect less severe than that of the abused opioid. Replacement (maintenance) therapy may last indefinitely, although in many treatment programs the ultimate goal is to remove the addicts from all drugs and opioids.

Antagonist therapy for opioid addiction treats addicts with an opioid antagonist that blocks binding of opioids to their receptors and thus blocks all effects of external opioids and, perhaps in some cases, the action of endogenous opioid peptides. However, this therapy is likely to be successful only for those who are extremely motivated to stop using opioids or to comply with taking an antagonist (e.g., physicians who may risk losing their license to practice if they are not off the drug). In addition, the currently available antagonist agent naltrexone is not well liked by addicts for several reasons. In some individuals, it may produce negative mood states, although these adverse effects are not usually seen in individuals who have not been dependent on opioids. In many cases, however, unwillingness to take the antagonist may stem from its therapeutic effect: it blocks the effects of opioid agonists.

As noted above, in addition to agonist and antagonist therapy, there are drug-free programs.
The relapse rates for addicts who enter these programs are very high, but for the small percentages who remain in therapeutic communities (TCs) for 6 months or more, the outcome is generally quite positive (Vaillant 1992).

OPIOID AGONIST THERAPY

Currently, the only Food and Drug Administration (FDA)-approved opioid agonist pharmacotherapy for drug dependence is methadone maintenance with counseling. Methadone, given orally once a day to a tolerant individual, has little or no euphoric effect. Its pharmacological effect lasts for about 24 hours (thus, need for methadone arises about every 24 hours), and it has less severe though longer lasting withdrawal symptoms than heroin. Methadone has been found to be an effective treatment in reducing the use of illicit opioids that are generally administered through an intravenous (IV) route. Since IV use and sharing of injection equipment have been associated with the spread of human immunodeficiency virus (HIV) infection, reduction in IV heroin use indirectly reduces the risk of HIV infection. Although a decrease in heroin use is seen within days after methadone is started, patients in methadone maintenance treatment must be stabilized on methadone for a certain length of time before they can draw maximum benefit from the treatment. Compared with drug-free programs, considerably higher retention rates are seen in methadone treatment.

It must be mentioned here that by FDA regulation, methadone maintenance treatment must include other services such as counseling in addition to the administration of oral methadone. Hence, there are nonpharmacological aspects of methadone maintenance treatment. These additional services aid addicts, for example, after they have stabilized to the point of ceasing to participate in crime-related activities, improving social and family relationships, and remaining in rehabilitation. The quality and quantity of these services can powerfully affect the results of treatment.
Although research has shown that doses of methadone above 60 mg are more effective than lower doses in reducing heroin use, there are substantial variations in the methadone dose (10 mg per day to as much as 100 mg per day) administered in different clinics as well as in the quality and quantity of nonpharmacological services. Hence, success rates in reducing IV heroin use vary greatly from one clinic to another (Ball and Ross 1991; D’Aunno and Vaughn 1992). On the average, over a 1-month period, on a 10-mg daily dose, four of five addicts continue using heroin; on a 20 to 40 mg per day dose, about half the addicts still use heroin; on a 40 to 60 mg per day dose, only one of five addicts will use heroin; and on more than 60 mg per day doses, fewer than one in five addicts continue to use heroin, provided other services are of high quality. However, methadone treatment is not without problems. Methadone has a protracted withdrawal, and, therefore, it is difficult to withdraw from methadone. It follows that it would be desirable to have an alternative opioid agonist that induces less severe physical dependence and from which it is easier to withdraw. Methadone is a full agonist, and fatal accidental overdoses in unintended users (e.g., nontolerant drug users, children) have been reported. A treatment agent with less toxicity would be an advantage. Methadone must be used every day, which can be costly and time-consuming and hinders rehabilitation; alternatively, addicts must be allowed take-home doses. Take-home privileges have resulted in diversion of methadone into illicit markets and, according to isolated reported cases, in the creation of methadone addicts. Hence, an agent that has longer pharmacological action (e.g., can be used twice or thrice a week rather than every day and is less susceptible to diversion) would be an advance. In addition, in certain neighborhoods and communities, methadone is not well accepted and has been perceived as a stigma. 
Alternative treatments that are more acceptable to such communities would be an advantage.

REFERENCES

American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 3d ed., revised. Washington, DC: American Psychiatric Association, 1987.
Ball, J.C., and Ross, A. The Effectiveness of Methadone Maintenance Treatment. New York: Springer-Verlag, 1991.
D’Aunno, T., and Vaughn, T.E. Variations in methadone treatment practices: Results from a national study. JAMA 267:253-258, 1992.
Goldstein, A.; Aronow, L.; and Kalman, S.M. Principles of Drug Action: The Basis of Pharmacology. New York: Wiley, 1974. 854 pp.
Jaffe, J.H. Current concepts of addiction. In: O’Brien, C.P., ed. Addictive States. New York: Raven Press, 1992. pp. 1-21.
Methadone maintenance criteria. Federal Register 54(40):8954-8971, 1989.
Vaillant, G.E. Is there a natural history of addiction? In: O’Brien, C.P., ed. Addictive States. New York: Raven Press, 1992. pp. 1-21.
World Health Organization. 1990 draft of chapter V: Mental and behavioral disorders. Clinical descriptions and diagnostic guidelines. International Classification of Diseases. 10th rev. Geneva: World Health Organization, 1990.

AUTHORS

Frank J. Vocci, Ph.D.
Deputy Director
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55

Jerome H. Jaffe, M.D.
Deputy Director
Office of Treatment Improvement
Rockwall II, 10th floor

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55

5600 Fishers Lane
Rockville, MD 20857

Background and Design of a Controlled Clinical Trial (ARC 090) for the Treatment of Opioid Dependence

Rolley E. Johnson and Paul J. Fudala

INTRODUCTION

The initial clinical abuse liability study of buprenorphine was reported by Jasinski and coworkers (1978).
They noted that acute single doses of buprenorphine produced morphine-like subjective, physiologic, and behavioral effects. They also found buprenorphine to be acceptable to the addict population and to block the effects of subcutaneously administered morphine. Buprenorphine appeared to have a long duration of action similar to methadone but, unlike methadone, was associated with a limited physiologic withdrawal syndrome. In the same report, the chronic subcutaneous administration of 8 mg/day of buprenorphine was equivalent to 60 mg/day of orally given methadone for subject-reported “liking.” A later study (Mello and Mendelson 1980) provided additional data regarding the potential efficacy of buprenorphine by showing that it suppressed the rate of heroin self-administration by individuals participating in a clinical laboratory study.

The relative ineffectiveness by the oral route of administration (Jasinski et al. 1982) led to studies using sublingual buprenorphine. These investigations demonstrated that sublingually given buprenorphine was two-thirds as potent as when administered subcutaneously (Jasinski et al. 1989). Subsequent studies focused on various dose-induction procedures and the appropriate dose levels for the treatment of street opioid- and methadone-dependent individuals (Jasinski et al. 1983; Reisinger 1985; Seow et al. 1986; Bickel et al. 1988a; Kosten and Kleber 1988). Bickel and colleagues (1988a) reported that sublingual buprenorphine, 2 mg/day, was significantly less effective than 30 mg of orally administered methadone in attenuating the effects of a hydromorphone challenge. The same authors later reported that the opioid-blocking activity of buprenorphine was dose related up to 8 mg/day (Bickel et al. 1988b), with little apparent increase in benefit when the dosage was increased to 16 mg/day.

Still to be determined were appropriate induction and dosing schedules for a clinical comparison of buprenorphine with methadone.
Thus, an inpatient trial was conducted to address these therapeutic issues. Results from this study indicated that a rapid 3-day dose-induction procedure was both effective and acceptable to study participants (Johnson et al. 1989). It was also concluded that daily dosing was probably more appropriate than alternate-day dosing (Fudala et al. 1990).

The present study was designed to meet Food and Drug Administration regulatory requirements for a well-designed, well-controlled clinical trial that could be used in support of a New Drug Application for buprenorphine. To this end, the investigators attempted to control or account for those aspects of the study that could confound the data analyses or interpretation (Hargreaves 1983), including issues such as choosing appropriate design and outcome measures, subject characteristics, attrition, blinding, and others. This chapter describes the background and design of a controlled clinical trial comparing the efficacy of buprenorphine and methadone for the short-term maintenance and detoxification of opioid addicts.

DESIGN

Patients

Inclusion criteria included the following:

1. Male or female volunteers seeking treatment for opioid dependence
2. Age 21 to 50 years
3. Length of present addiction of at least 4 months
4. At least two episodes of heroin use per day
5. Daily value of heroin use of $50 or greater
6. A rating of 4 or greater on a self-reported level of withdrawal scale 12 hours after the last heroin dose (0=no withdrawal, 9=worst withdrawal ever experienced)
7. Three consecutively collected daily urines, at least two of which were positive for opioids but negative for methadone

Exclusion criteria included the following:

1. Any acute or chronic medical or psychiatric condition that may have compromised an individual's ability to complete the study
2. A score of 7 or higher on the interviewer severity rating of need for psychiatric/psychological treatment on the Addiction Severity Index (ASI)
3. Clinically significant abnormalities in laboratory values
4. Alanine or aspartate aminotransferase levels greater than 99 units/L on admission

Individuals were recruited through a contract service that identified potential patients from treatment, general medical, and other facilities having contact with chronic drug abusers. This service used the Shipley Institute of Living Scale and the Hopkins Symptom Checklist 90 (Revised) to ensure that prospective study participants could read and understand both the informed consent and study questionnaires and also as aids in identifying individuals who might not be qualified for the study.

The study was conducted under protocol 090 at the Addiction Research Center of the Intramural Division of the National Institute on Drug Abuse (NIDA), Baltimore, MD, using its outpatient facilities. Individuals were enrolled in the trial between September 1988 and November 1989. Each patient gave informed consent for participation in the study. The consent forms and experimental procedures were approved by the local institutional review board in accordance with the U.S. Department of Health and Human Services guidelines for the protection of human subjects.

Methods

The study was conducted using a double-blind, double-dummy (both an oral and sublingual dosage form given), parallel groups design. One dosage form contained the assigned treatment; the other was a matching placebo. The three treatment groups were:

1. Buprenorphine, 8 mg/day sublingually (n=53)
2. Methadone, 20 mg/day orally (n=55)
3. Methadone, 60 mg/day orally (n=54)

The 20 mg/day dosage was chosen since one-tenth of the patients in methadone clinics were treated during the initial 3 months and longer with this or a lesser dose (U.S. Department of Health and Human Services 1984; Allison et al. 1985).
Also, it has been reported that 31 percent of patients entering methadone treatment can be successfully maintained on a dose of 20 mg/day or less for 4 weeks (Peachey and Lei 1988). The 60 mg/day dosage was chosen because it was reported as the approximate median daily dosage used in maintenance therapy (U.S. Department of Health and Human Services 1984) and one that the authors hypothesized would give results significantly better than those obtained from the 20 mg/day group. The 8 mg/day dosage of buprenorphine was selected based on previous reports indicating possible efficacy (Johnson et al. 1989; Fudala et al. 1990) and effects comparable to those seen with 40 to 60 mg/day of methadone (Jasinski et al. 1978). The working hypothesis of the study was that buprenorphine 8 mg/day and methadone 60 mg/day would be more effective than methadone 20 mg/day and that buprenorphine would be at least 80 percent as effective as methadone 60 mg/day. The dose-induction procedure is shown in table 1. Patients were subsequently continued on their maintenance dosage through study day 120. The study consisted of 120 days of induction/maintenance followed by 49 days of gradual dosage reduction and 11 days of placebo dosing. Patients who wished to voluntarily terminate their participation in the study or who were administratively discharged were given a 21-day methadone detoxification. For the purposes of data analysis, the study was divided into a 17-week maintenance phase (days 1 through 119) and an 8-week detoxification phase (days 120 through 175), since the detoxification phase was considered to begin with the last maintenance dose. The gradual detoxification was carried out by decreasing each treatment group’s dosage by the same percentage for a given week of the study. 
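The division of study days into analysis phases described above can be expressed as a small lookup. The following is a minimal sketch only, not the investigators' software: the function names are hypothetical, and the label "extension" for days 176 through 180 is an assumption introduced here for illustration, since the text does not name that interval.

```python
# Sketch of the analysis phases stated in the text:
#   maintenance phase:     study days 1 through 119  (17 weeks)
#   detoxification phase:  study days 120 through 175 (8 weeks)
# Days 176-180 are labeled "extension" here; that label is an assumption.

MAINTENANCE_DAYS = range(1, 120)   # days 1..119
DETOX_DAYS = range(120, 176)       # days 120..175
EXTENSION_DAYS = range(176, 181)   # days 176..180 (hypothetical label)


def analysis_phase(study_day: int) -> str:
    """Return the analysis phase that a study day falls into."""
    if study_day in MAINTENANCE_DAYS:
        return "maintenance"
    if study_day in DETOX_DAYS:
        return "detoxification"
    if study_day in EXTENSION_DAYS:
        return "extension"
    raise ValueError(f"study day {study_day} is outside days 1-180")


def phase_week(study_day: int) -> tuple:
    """Return (phase, week number within that phase, counted from 1)."""
    phase = analysis_phase(study_day)
    first_day = {
        "maintenance": MAINTENANCE_DAYS.start,
        "detoxification": DETOX_DAYS.start,
        "extension": EXTENSION_DAYS.start,
    }[phase]
    return phase, (study_day - first_day) // 7 + 1


if __name__ == "__main__":
    for day in (1, 119, 120, 175, 180):
        print(day, phase_week(day))
```

Counting weeks from the first day of each phase reproduces the 17-week and 8-week lengths given in the text; other week-numbering conventions are equally possible.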
Although the study was designed to be carried out over 175 days (25 weeks), patient participation and data collection were extended to a total of 180 days to parallel existing Federal methadone regulations for long-term detoxification.

TABLE 1. ARC 090 trial: dose-induction procedure

                                   Study Day
Drug/Dosage            1    2    3    4    5    6    7    8    9   10
Buprenorphine 8 mg     2    4    8    8    8    8    8    8    8    8
Methadone 60 mg       20   30   40   50   60   60   60   60   60   60
Methadone 20 mg       20   30   30   30   30   25   25   25   25   20

Stratification

Patients were stratified into treatment groups by the following criteria:

1. Age (21 to 35 and 36 to 50 years).
2. Gender.
3. Clinical Institute Narcotic Assessment scores (less than 30 and greater than or equal to 30) (Peachey and Lei 1988). These scores reflect the results of a naloxone challenge test that was given to all patients immediately prior to their receiving the first dose of study medication.

Each stratification factor had two levels for a total of eight strata. Treatment assignment was performed randomly for each stratum using a permuted block design with possible block sizes of three, six, or nine.

The naloxone challenge test was used as a stratification variable to ensure approximately equivalent levels of physical dependence between groups. Age was used since various authors have shown differences in relapse and retention rates based on a patient's age (Richman 1966; Babst et al. 1971; Brown et al. 1973). Gender differences have been reported to affect retention of patients in methadone maintenance (Hser et al. 1991) and therapeutic community treatment programs (Sansone 1980). Also, since the present study incorporated fixed-dosage regimens, potential pharmacokinetic differences due to gender were controlled by stratification.

Clinic Milieu

Thirty to sixty minutes of individual counseling per week, using a relapse prevention model, was offered but not required.
Medical safety was evaluated using hematology and blood chemistry panels and urinalyses collected on study days 30, 60, 90, 120, and 180. Vital signs were recorded every 2 weeks, and urine pregnancy tests were obtained every 2 months. Patient case report forms and medical records were maintained for each participant. Observed urine samples were collected three times weekly on Monday, Wednesday, and Friday. To promote patients’ compliance with the urine collection process, individuals were required to submit a sample on the day(s) following a missed, scheduled collection. However, because of potential carryover and other confounds, these samples were not analyzed. Level 1 to level 2 clinical services (Childress et al. 1991) were provided to all patients.

Treatment compliance was maximized by requiring participants to come to the clinic daily to receive medication. Individuals who missed 3 consecutive days of medication were dropped from the study, with their third missed day considered to be the last day of study participation. Every effort was made to retain individuals in the study. For example, whenever possible, medications were delivered to and data collected from patients who were incarcerated in the Baltimore metropolitan area. The last day of study participation for individuals administratively discharged or those who voluntarily terminated from the study was their actual discharge or termination date.

One, zero, and three patients, randomized to the buprenorphine and methadone 20 and 60 mg/day groups, respectively, had their dosages halved due to an inability to tolerate them. Since this was a fixed-dosage protocol, these patients were considered treatment failures effective on the first day of dosage adjustment, although data collection continued. Study staff members (except pharmacy personnel) were blind to this provision of the protocol.

Primary Dependent Variables

Three primary dependent variables were identified a priori:

1. Patient retention time in the study
2. Monday, Wednesday, and Friday urine samples negative for opioids
3. Failure to maintain drug abstinence as assessed by two consecutive Monday urine samples positive for opioids following 4 weeks of treatment

The criterion for the last variable was chosen to give patients time to stabilize in treatment and to account for the probability that patients would more likely challenge the pharmacologic blockade early in treatment. Monday urine samples were selected since it was felt that patients were more likely to use (or use more) illicit opioids on weekends. A 1-week interval between samples was chosen so that a positive result would not be due to a previous sample.

Secondary Dependent Variables

Collected within the first 7 study days were results from the following:

1. Buss-Durkee Hostility Scale
2. Diagnostic Interview Schedule
3. Early Experience Questionnaire
4. Elliot Huizinga Lifetime Events Survey
5. Eysenck Impulsivity, Venturesomeness, and Empathy Questionnaire
6. Eysenck Personality Questionnaire
7. Hopkins Symptom Checklist 90 (Revised)
8. Personality Diagnostic Questionnaire
9. ASI (also obtained at study completion or termination and 3, 6, and 12 months thereafter)

The following patient-reported data were collected daily:

1. An adjective checklist (interval scale from 0 to 9) assessing opioid withdrawal symptoms, with additional items measuring urge and need for an opioid, frequent urination, and “hooked on” and “liking” for the study medication
2. A structured questionnaire (true/false) assessing opioid withdrawal symptoms

Collected three times weekly were urine samples assayed for barbiturates, benzodiazepines, cocaine metabolite, methadone, and phencyclidine.

Data collected biweekly (patient reported) included:

1. A visual analog scale assessing “want” and “need” for an opioid and cocaine
2. A 14-item medication adverse effects questionnaire
3. Beck Depression Inventory

Collected at 30, 60, 90, and 120 days and at termination were:

1. Hematology and blood chemistry panels
2. Urinalyses
3. Vital signs

Urine Toxicology

Urine samples were assayed in triplicate using appropriate positive and negative controls, once with radioimmunoassay (Abuscreen; Roche Diagnostic Systems Inc., Montclair, NJ) and twice with enzyme-multiplied immunoassay technique (EMIT; Syva Corporation, Palo Alto, CA). A single assay was considered to be positive if the amount of analyte in the sample was greater than a predetermined cutoff value (e.g., 300 ng/mL for opioids). If a sample tested negative at least twice out of the three assays, it was considered negative; otherwise, it was considered positive.

Study Medications

Buprenorphine hydrochloride was obtained from Reckitt and Colman (Hull, England) through NIDA’s Research Technology Branch (Rockville, MD). Drug solutions were aseptically prepared in 30 percent ethanol (vol/vol) and stored at room temperature. All solutions were administered sublingually in a volume of 1 mL using Ped-Pod oral dispensers (SoloPak Laboratories, Franklin Park, IL). Buprenorphine solutions have been shown to be stable in these dispensers for at least 3 months. To maximize the amount of buprenorphine absorbed from the sublingual mucosa, all patients were instructed to refrain from speaking and to hold the solution under the tongue for 10 minutes.

Methadone HCl (methadone hydrochloride oral concentrate USP, 10 mg/mL) and cherry flavor concentrate (Mallinckrodt Inc., St. Louis, MO) were used. A 2 mg/mL methadone HCl solution was prepared from the concentrate and distilled water. Final methadone dosages were prepared to a volume of 30 mL using this solution in a vehicle of cherry flavor concentrate:water (1:4) containing denatonium benzoate (Bitrex; J.H. Walker and Co., Inc., Mt. Vernon, NY), 0.2 ng/mL, to mask the flavor of the solutions.
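The decision rules stated under Urine Toxicology, together with the third primary dependent variable, are simple enough to sketch in code. The sketch below is illustrative only and is not the study's software. The function names are hypothetical, missing Monday samples are not handled, and the reading of "following 4 weeks of treatment" as requiring both Monday samples to fall after week 4 is an assumption.

```python
# Illustrative sketch of the urine-toxicology decision rules in the text.
# All function names and the handling of the 4-week period are assumptions.

OPIOID_CUTOFF_NG_PER_ML = 300.0  # example cutoff quoted in the text


def assay_is_positive(analyte_ng_per_ml, cutoff=OPIOID_CUTOFF_NG_PER_ML):
    """A single assay is positive if the analyte exceeds the cutoff."""
    return analyte_ng_per_ml > cutoff


def sample_is_positive(assay_results):
    """Combine three assay results (True = positive) for one urine sample.

    The sample is negative if at least two of the three assays are
    negative; otherwise it is positive.
    """
    if len(assay_results) != 3:
        raise ValueError("expected exactly three assay results")
    negatives = sum(1 for positive in assay_results if not positive)
    return negatives < 2


def first_failure_week(monday_positive, initial_weeks=4):
    """Return the study week (1-based) of the second of two consecutive
    positive Monday samples, ignoring the first `initial_weeks` weeks,
    or None if the criterion is never met.

    monday_positive[i] is True if the Monday sample of week i + 1 was
    positive. Assumption: both samples of the pair must fall after the
    initial period; missing samples are not modeled.
    """
    for i in range(initial_weeks + 1, len(monday_positive)):
        if monday_positive[i - 1] and monday_positive[i]:
            return i + 1
    return None


if __name__ == "__main__":
    assays = [assay_is_positive(x) for x in (450.0, 280.0, 310.0)]
    print("sample positive:", sample_is_positive(assays))
    mondays = [True, True, True, True, False, True, True, False]
    print("failure at week:", first_failure_week(mondays))
```

Under this reading, positive Monday samples during the first 4 weeks never count toward failure, which matches the stated intent of giving patients time to stabilize in treatment.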
SUMMARY

This study represents the largest clinical trial reported to date that demonstrated the efficacy of buprenorphine for opioid dependence treatment (Johnson et al. 1992). Although the study design was adequate to demonstrate differences between treatment groups, there has not been a consensus regarding the most appropriate method for analyzing various outcome measures of this and similar studies. To present a comprehensive review of these methods, other chapters in this monograph focus on various analytical techniques for assessing one of these measures: urine toxicology screens for illicit opioids.

REFERENCES

Allison, M.; Hubbard, R.L.; and Rachal, J.V. Treatment Process in Methadone, Residential, and Outpatient Drug-Free Programs. National Institute on Drug Abuse Treatment Research Monograph Series. DHHS Pub. No. (ADM)85-1388. Rockville, MD: U.S. Department of Health and Human Services, U.S. Public Health Service, Alcohol, Drug Abuse and Mental Health Administration, 1985.
Babst, D.V.; Chambers, C.D.; and Warner, A. Patient characteristics associated with retention in a methadone maintenance program. Br J Addict 66:195-204, 1971.
Bickel, W.K.; Stitzer, M.L.; Bigelow, G.E.; Liebson, I.A.; Jasinski, D.R.; and Johnson, R.E. A clinical trial with buprenorphine: Comparison with methadone in the detoxification of heroin addicts. Clin Pharmacol Ther 43:72-78, 1988a.
Bickel, W.K.; Stitzer, M.L.; Bigelow, G.E.; Liebson, I.A.; Jasinski, D.R.; and Johnson, R.E. Buprenorphine: Dose-related blockade of opioid challenge effects in opioid dependent humans. J Pharmacol Exp Ther 247:47-53, 1988b.
Brown, B.S.; DuPont, R.L.; Bass, U.F. III; Brewster, G.W.; Glendinning, S.T.; Kozel, N.J.; and Meyers, M.B. Impact of a large-scale narcotics treatment program. A six month experience. Int J Addict 8:49-57, 1973.
Childress, A.R.; McClellan, A.T.; Woody, G.E.; and O’Brien, C.P. Are there minimum conditions necessary for methadone maintenance to reduce intravenous drug use and AIDS risk behaviors? In: Pickens, R.W.; Leukefeld, C.G.; and Schuster, C.R., eds. Improving Drug Abuse Treatment. National Institute on Drug Abuse Research Monograph 106. DHHS Pub. No. (ADM)91-1754. Washington, DC: Supt. of Docs., U.S. Govt. Print. Off., 1991. pp. 167-177.
Fudala, P.J.; Jaffe, J.H.; Dax, E.M.; and Johnson, R.E. Use of buprenorphine in the treatment of opioid addiction. II. Physiologic and behavioral effects of daily and alternate-day administration and abrupt withdrawal. Clin Pharmacol Ther 47:525-534, 1990.
Hargreaves, W.A. Methadone dosage and duration for maintenance treatment. In: Cooper, J.R.; Altman, F.; Brown, B.S.; and Czechowicz, D., eds. Research on the Treatment of Narcotic Addiction. State of the Art. National Institute on Drug Abuse Treatment Research Monograph Series. DHHS Pub. No. (ADM)83-1281. Rockville, MD: U.S. Department of Health and Human Services, U.S. Public Health Service, Alcohol, Drug Abuse, and Mental Health Administration, 1983. pp. 19-79.
Hser, Y.; Anglin, M.D.; and Liu, Y. A survival analysis of gender and ethnic differences in responsiveness to methadone maintenance treatment. Int J Addict 25:1295-1315, 1991.
Jasinski, D.R.; Fudala, P.J.; and Johnson, R.E. Sublingual versus subcutaneous buprenorphine in opiate abusers. Clin Pharmacol Ther 45:513-519, 1989.
Jasinski, D.R.; Haertzen, C.A.; Henningfield, J.E.; Johnson, R.E.; Makhzoumi, H.M.; and Miyasato, K. Progress report of the NIDA Addiction Research Center. In: Harris, L.S., ed. Problems of Drug Dependence, 1981: Proceedings of the 43rd Annual Scientific Meeting, The Committee on Problems of Drug Dependence, Inc. National Institute on Drug Abuse Research Monograph 41. Washington, DC: Supt. of Docs., U.S. Govt. Print. Off., 1982. pp. 42-52.
Jasinski, D.R.; Henningfield, J.E.; Hickey, J.E.; and Johnson, R.E. Progress report of the NIDA Addiction Research Center, Baltimore, Maryland, 1982. In: Harris, L.S., ed. Problems of Drug Dependence, 1982: Proceedings of the 44th Annual Scientific Meeting, The Committee on Problems of Drug Dependence, Inc. National Institute on Drug Abuse Research Monograph 43. DHHS Pub. No. (ADM)83-1264. Washington, DC: Supt. of Docs., U.S. Govt. Print. Off., 1983. pp. 92-98.
Jasinski, D.R.; Pevnick, J.S.; and Griffith, J.D. Human pharmacology and abuse potential of the analgesic buprenorphine. Arch Gen Psychiatry 35:501-516, 1978.
Johnson, R.E.; Cone, E.J.; Henningfield, J.E.; and Fudala, P.J. Use of buprenorphine in the treatment of opiate addiction. I: Physiologic and behavioral effects during a rapid dose induction. Clin Pharmacol Ther 46:335-343, 1989.
Johnson, R.E.; Jaffe, J.H.; and Fudala, P.J. A controlled trial of buprenorphine treatment for opioid dependence. JAMA 267:2750-2755, 1992.
Kosten, T.R., and Kleber, H.D. Buprenorphine detoxification from opioid dependence: A pilot study. Life Sci 42:635-641, 1988.
Mello, N.K., and Mendelson, J.H. Buprenorphine suppresses heroin use by heroin addicts. Science 207:657-659, 1980.
Peachey, J.E., and Lei, H. Assessment of opioid dependence with naloxone. Br J Addict 83:193-201, 1988.
Reisinger, M. Buprenorphine as new treatment for heroin dependence. Drug Alcohol Depend 16:257-262, 1985.
Richman, A. Follow-up of criminal narcotic addicts. Can Psychiatric Assoc J 11:107-115, 1966.
Sansone, J. Retention patterns in a therapeutic community for the treatment of drug abuse. Int J Addict 15:711-736, 1980.
Seow, S.S.W.; Quigley, A.J.; Ilett, K.F.; Dusci, L.J.; Swensen, G.; Harrison-Stewart, A.; and Rappaport, L. Buprenorphine: A new maintenance opiate? Med J Australia 144:407-411, 1986.
U.S. Department of Health and Human Services. National Summary of Narcotic Treatment Programs. Annual Report for Treatment Programs Using Methadone. Washington, DC: Supt. of Docs., U.S. Govt. Print. Off., 1984.
ACKNOWLEDGMENTS

This work was supported through the NIDA intramural research budget. The authors acknowledge Jean Fralich, Louise Glezen, and John Hickey for medical monitoring and coordinating the study; Ed Bunker, Charles Collins, Jose deBorja, Nancy Kreiter, Ivan Montoya, and Renea Siebold for data entry, computer programing, and statistical analysis; Ed Brown, Tommy Calloway, Denise Dickerson, Marge Ewell, and Ramona Parker for patient recruitment; C. Dan Baker and Faye Hodges for patient counseling; and Anna Dorbert and Lillian Morgan for nursing services.

AUTHORS

Rolley E. Johnson, Pharm.D.
Associate Professor
Department of Psychiatry and Behavioral Sciences
The Johns Hopkins School of Medicine
Building G, Room 2725
5510 Nathan Shock Drive
Baltimore, MD 21224

Paul J. Fudala, Ph.D.
Assistant Professor
Department of Psychiatry
University of Pennsylvania School of Medicine and the Department of Veterans Affairs Medical Center
Building 15
University and Woodland Avenues
Philadelphia, PA 19104

Clinical Endpoints: Discussion Session

Ram B. Jain

Discussants: John Hyde, Charles Gorodetzky, Richard Stein, and Curtis Wright

The aim of this discussion session was to obtain the opinion of the U.S. Food and Drug Administration (FDA) about what kind of endpoints would be adequate and/or appropriate in clinical trials for treatment of drug dependence, what statistical methods should be used to analyze the data generated from these trials, and in general, what strategy should be used to design these trials. Drs. Hyde, Stein, and Wright represented FDA, and Dr. Gorodetzky presented the pharmaceutical industry's viewpoint because FDA policy might affect its ability to conduct clinical trials.

Dr. Wright reminded participants that although most funded research is exploratory in nature, generating new and exciting information on the cutting edge of science, most of the drug approval work at FDA is confirmatory in nature, calling for regulatory decisions to approve or not approve drugs.
As a consequence, results obtained by applying a new mathematical technique should be backed up or linked with results obtained by a mathematical technique that is known to work.

Drug approval is easy when information about a new drug is coherent and robust and there is a large effect size. The results obtained in large phase III trials, which are generally used to support a new drug application (NDA), should cohere with the results obtained from the earlier phase I and II trials in selected and general human populations and from preclinical work on animals; they should get the same answers in all those places. The conclusions obtained from analysis of data should be robust; that is, they should not be dependent on a specific experimental design, a specific method of analysis, or the specific way a trial may have been conducted. Different trials, probably using different designs, should lead to the same conclusions. This is what Dr. Stein called clinical robustness as opposed to statistical robustness. The effect size should be relatively large.

The results of the pivotal trials should not depend on a set of assumptions made at any stage of development. Outcome variables (endpoints) selected for the pivotal trials should tap several different kinds of domains. Subjective self-reports (e.g., “How are you doing today?”) should be linked or obtained in parallel with observer rating by a clinical staff member or physician about, for example, how the addict was doing that day. Physiologic measures or responses (for example, urine screens, hair analysis, naloxone challenge scores) should be obtained along with behavioral measures such as retention rates. Common or similar results across different domains sampled strengthen an NDA. FDA's Pilot Drug Evaluation Division would permit four primary variables without penalizing for multiplicity. An approval may become difficult if effect is shown for only one variable in one population in one study only.
The results obtained by analyzing a data set validated by FDA's Division of Scientific Investigations using a specific method (of analysis) are cross-validated by analyzing data using some other techniques to see whether the findings are robust. Implicit assumptions built into data collection, reduction, and analysis are evaluated. Knowledge of what took place at each step along the way, from preclinical work to analysis of phase II trials, is helpful. Trial designs that not only meet the requirements of a particular analytical technique to be used but also are robust toward dropouts, violation of protocol assumptions, and alternative analytical techniques are preferable. This is so because trials designed to prove efficacy may also be looked at to try to determine the dose, to evaluate adverse reactions, or to develop specific instructions for use for subpopulations. It is also important to look at what information may have been thrown away and what information may be so confounded that dose, duration of treatment, patient acceptability, specific adverse events, and management of patient dropouts are so distorted that the trial cannot be used to make a regulatory decision.

Dr. Stein believed it important to evaluate the social impact of the proposed drug in these populations. How healthy and how productive the patients may be after the treatment is probably a primary variable for these populations. The endpoints should be reliable and quantifiable. Simple surrogate measures such as how frequently the drug is abused, what is the abuse pattern, and how much and what kind of drug is being abused are important. An acceptable analysis should be able to identify how each patient did during the treatment and what his or her contribution is to the overall analysis.

Dr. Gorodetzky commented about the use of four primary variables.
The number of primary variables to be used will depend on the kind of experiment designed and whether it is aimed at the consumer, at the science, or at medicine. Some kind of compromise is possible. A clinical trial is an experiment in which one has to think very specifically about the objectives and the operational manner in which one is going to attempt to reach those objectives. One may not want to do certain things in a given situation that might be interesting to do in another context. It is not as simple as choosing one variable or four variables; the question is how some very practical questions can be answered and how specific objectives can be drawn up for clinical trials.

The end product of an approved drug is a package insert aimed at the users: the practicing physicians and other scientists. The package insert communicates what should be expected from the approved drug. As Dr. Wright put it, what should be communicated to these users is fairly basic practical data: For example, is the patient going to be arrested less often? Is the patient going to be using drugs less? Is the patient going to come back to the clinic? If a package insert communicates information that is too complex, it will not be understandable to its users. As Dr. Wright pointed out, combination variables are good at supporting a fairly robust statistical outcome, but they can make it extremely difficult to go back to the original data for dose selection, to develop instructions for use for subpopulations, and to establish relationships between adverse events and treatment drugs.

There was some discussion about the retention rates in these trials. How should this variable be used? What does this variable mean? Dr. Vocci wanted to use this variable as an outcome measure not only because it is important for the analysis, but also because, if a treatment works for only a subpopulation, there is an interest in knowing the characteristics of that subpopulation.
This variable might tell who is going to be a possible treatment success. Retention is important because, before patients can benefit from the treatment and, thus, start changing their behavior (other than drug-taking behavior), they must stay in the treatment for a certain length of time. This reflects on the effectiveness of a treatment program vs. the effectiveness of a drug.

According to Dr. Gorodetzky, retention is a complex variable and may have more practical consequences than some of the other outcome variables. Because treatment milieu differs substantially from one clinic to another, the largest treatment by investigator (clinic) interaction is likely to be discovered for retention in multicenter trials. People may drop out of these trials for different reasons: because of a 4-hour questionnaire they are asked to complete on the last day of the treatment; because the treatment failed for them; or because of how they get paid, how much they are paid, and when. Dropouts modify treatment effects in these trials in unknown ways.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

Design of Clinical Trials for Treatment of Opiate Dependence: What Is Missing?

Ram B. Jain

INTRODUCTION

A typical trial to evaluate the safety and efficacy of a new pharmacotherapy for the treatment of drug dependence, including opiate dependence, would be double blind and would use one or more doses of the new pharmacological agent as well as a placebo and/or an active control as the alternative treatment arm. The primary outcome variable of interest will be the frequency and/or the amount of the addicting/abused opiates used by the subjects in the trial in different treatment arms.
The only practical way to determine either the frequency or the amount of the addicting/abused opiates used by the addicts would be through self-reports. However, these self-reports are not likely to be very reliable. Consequently, the addicts are asked to provide urine samples as specified in the protocol. These urine samples are assayed to determine the presence and/or the amount of the addicting/abused opiates.

Let T1 and T2 be two consecutive time points (figure 1) at which a subject provides urine samples for testing. If episodes A, B, and C are three independent episodes of opiate abuse, then A will not be detected at either T1 or T2, B will be detected at only one of the two time points, and C will be detected at both T1 and T2, because the amount of opiate abused differed among the episodes and, as such, so did the duration for which the opiates stay in the urine. To detect episode A, or to avoid underestimation of the frequency of opiate abuse, the urine samples should have been collected and assayed earlier; in other words, to avoid underestimation, the urine samples should be collected as frequently as possible. To avoid episode C being detected twice, or to avoid overestimation of the frequency of opiate abuse, the urine samples should be collected as infrequently as possible. The phenomenon of two or more consecutive samples detecting the same episode of opiate abuse is called the carryover from one positive sample to another positive sample.

FIGURE 1. Detection of drug abuse by urine assays. [Plot of the amount of opiate in urine, with the assay cutoff, against the time at which urine samples are collected, for episodes A, B, and C.]

There are substantial variations in drug-seeking behavior from one addict to another: Some abuse large amounts in relatively few episodes; some use small amounts in relatively large numbers of episodes; some abuse drugs during weekends only; and some use them every day.
For this reason, it is difficult to determine whether two or more consecutive positive urine samples represent one or more episodes of drug abuse or, in other words, whether there is a carryover. Also, since the estimation of carryover is difficult, carryover or overestimation rather than underestimation of the frequency of opiate abuse is more of a concern. However, complete elimination of the probability of carryover may not be achievable. Hence, it is probably best to design the trials so that the probability of carryover from one positive urine to another positive urine is minimized and the probability of detecting an episode of drug abuse is maximized. This chapter provides suggestions as to how a trial can be designed to achieve this and what may still be missing. The issues that reflect on the design of these trials can be studied under the following titles:

1. Sampling schemes used to obtain urine samples
2. Frequency and timing of the collection of urine samples
3. Qualitative vs. quantitative analysis of urine samples

SAMPLING SCHEMES USED TO OBTAIN URINE SAMPLES

In one of the earlier trials conducted to evaluate the safety and efficacy of LAAM, Ling and colleagues (1976) collected urine samples once a week using a random time sampling scheme. In a random time sampling scheme, although the subjects know how many times during a given week they will be asked to provide their urine samples, they do not know on which days of the week they will be asked to provide a urine sample. It is randomly decided who will provide a urine sample on which day of the week.
For example, if the protocol calls for collection of one urine sample per week from each subject and if the urine samples are to be collected Monday through Friday only, 20 percent of the total subjects in the study will provide urine samples on Monday, 20 percent of the total subjects in the study (from the remaining 80 percent of the total subjects) will provide urine samples on Tuesday, and so on until all the subjects who have not provided their urine samples by Thursday will be asked to provide their urine samples on Friday. Consequently, the probability of a subject providing a urine sample will vary from day to day, ranging from zero to one. Consequently, this type of sampling scheme is not truly random. In addition, a subject X may provide a urine sample on Monday of one week and on Friday of the next week, thus being allowed free drug-seeking behavior for 10 days. On the other hand, a subject Y may provide a urine sample on Friday of one week and on Monday of the next week, thus being allowed free drug-seeking behavior for only 2 days. Thus, a random time sampling scheme has the potential to make alternate treatment groups incomparable for analysis. As said earlier, this sampling scheme is not truly random, but for lack of better terminology, it is called a random time sampling scheme. This type of sampling scheme was earlier advocated by Goldstein and Brown (1970). Certain other types of random time sampling schemes are discussed in Harford and Kleber (1978) and Goldstein and Brown (1970). However, since these schemes are not in practical use, they will not be discussed further. According to a report published by the Council on Scientific Affairs (1987), opiates stay in the urine for about 48 hours. Hence, unless urine samples are collected at less than 48-hour intervals, carryover is not likely to be a problem. 
Consequently, once-a-week, 5-days-a-week random time sampling is not likely to lead to carryover, but since an addict may be tested as far apart as 10 days, it certainly will lead to underestimation of the frequency of opiate abuse. But for twice and thrice a week, 5-days-a-week random time sampling, as can be seen from tables 1 and 2, the probability of being tested less than 48 hours apart, that is, on consecutive days, is 54.9 and 45.8 percent, respectively, which is likely to lead to a serious carryover. The probability of being tested more than 48 hours apart, that is, probability of underestimation, is 18.9 and 13.7 percent, respectively.

TABLE 1. Probabilities of being tested in a twice-a-week, 5-days-a-week random time testing*

Days tested   Probability of being   Number of free drug-seeking   Minimum (maximum) number of free
              tested on these days   days during the week          drug-seeking days during 2 weeks
M,T           .1600000               5                             5 (8)
M,W           .1142857               4                             4 (7)
M,Th          .0628571               3                             3 (6)
M,F           .0628571               2                             2 (5)
T,W           .1142859               4                             4 (7)
T,Th          .0628571               3                             3 (6)
T,F           .0628571               2                             2 (5)
W,Th          .0857142               3                             3 (6)
W,F           .0857142               2                             2 (5)
Th,F          .1885714               2                             2 (5)

*Total probability of being tested on consecutive days=.5485713; probability of being tested more than 48 hours apart during the same week=.1885713.

Hence, random time sampling could render treatment groups incomparable for analysis and may result in serious underestimation of the frequency of opiate abuse and/or a serious carryover from one positive sample to another positive sample depending on the frequency of sampling.

To further dwell on the merits and demerits of random time sampling, another type of sampling scheme called fixed time sampling needs to be defined. In a fixed time sampling scheme, all subjects are asked to provide urine samples on the same days of the week.
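The two totals in the footnote to table 1 can be reproduced directly from the table's rows. The following is a minimal sketch, assuming that the rows list the ten day pairs in the order M,T; M,W; M,Th; M,F; T,W; T,Th; T,F; W,Th; W,F; Th,F. The names TABLE_1 and total_probability are hypothetical and are introduced only for this illustration.

```python
# Probabilities from table 1 (twice-a-week, 5-days-a-week random time testing),
# keyed by the pair of weekdays tested (0 = Monday ... 4 = Friday).
# ASSUMPTION: the row-to-day-pair mapping follows the order stated in the lead-in.
TABLE_1 = {
    (0, 1): 0.1600000, (0, 2): 0.1142857, (0, 3): 0.0628571, (0, 4): 0.0628571,
    (1, 2): 0.1142859, (1, 3): 0.0628571, (1, 4): 0.0628571,
    (2, 3): 0.0857142, (2, 4): 0.0857142,
    (3, 4): 0.1885714,
}


def total_probability(table, min_gap_days, max_gap_days):
    """Sum the probabilities of all day pairs whose gap (in days) lies in the range."""
    return sum(
        p for (first, second), p in table.items()
        if min_gap_days <= second - first <= max_gap_days
    )


# Tests on consecutive days are 24 hours apart (risk of carryover).
consecutive = total_probability(TABLE_1, 1, 1)
# Tests 3 or 4 days apart are more than 48 hours apart (risk of underestimation).
over_48_hours = total_probability(TABLE_1, 3, 4)

print(f"consecutive days: {consecutive:.3f}")
print(f"more than 48 hours apart: {over_48_hours:.3f}")
```

Under that assumed row order, the two sums agree with the 54.9 and 18.9 percent quoted in the text for twice-a-week sampling.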
TABLE 2. Probabilities of being tested in a three-times-a-week, 5-days-a-week random time testing*

Days tested   Probability of being   Number of free drug-seeking   Minimum (maximum) number of free
              tested on these days   days during the week          drug-seeking days during 2 weeks
M,T,W         .1885714               4                             4 (6)
M,T,Th        .1487258               3                             3 (5)
M,T,F         .0227026               2                             2 (4)
M,W,Th        .1090656               3                             3 (5)
M,W,F         .0151350               2                             2 (4)
M,Th,F        .1142857               2                             2 (4)
T,W,Th        .1090656               3                             3 (5)
T,W,F         .0151350               2                             2 (4)
T,Th,F        .1142857               2                             2 (4)
W,Th,F        .1600000               2                             2 (4)

*Total probability of being tested on consecutive days=.457637; probability of being tested more than 48 hours apart during the same week=.1369883.

In a double-blind, double-dummy clinical trial to compare the efficacy and safety of 8-mg sublingual doses of buprenorphine with 20- and 60-mg doses of methadone, conducted at the Addiction Research Center of the National Institute on Drug Abuse between September 1988 and May 1990 (the ARC 090 trial), a fixed time sampling scheme was used to obtain urine samples three times a week on Mondays, Wednesdays, and Fridays. Because the urine samples were obtained at least 48 hours apart, the probability of carryover is minimal. According to Dr. Edward J. Cone (personal communication, July 1991) of the Addiction Research Center, the mean time to detect (cutoff=800 ng/mL) intramuscular administration of 6 mg of morphine by an enzyme-multiplied immunoassay technique (EMIT) assay was 21.82 hours (n=5, SD=5.34). Given that the urine half-life of morphine is 4 to 6 hours, on the average, up to 96 mg of morphine can be consumed by an addict during one episode and still result in only one positive urine if the consecutive urines are collected and assayed at least 48 hours apart. However, since Friday and Monday samples were collected 72 hours apart, the potential for underestimation is certainly there, but this is likely to happen only when opiates are abused on Fridays but not on Saturdays and Sundays. At worst, the addicts have 3 free days of drug-seeking behavior.
But because everybody has the same number of free days uniformly across the whole study period, the comparability of different treatment groups is maintained.

The strongest argument in favor of random time sampling is that the addicts try to avoid drug abuse detection, and as such, if they know they will be tested, they will not show up for their scheduled visits. In certain special treatment situations in which a positive result is associated with certain contingencies, this might be true, but in a clinical trial environment there is no reason to expect any such contingencies. As such, the argument to use random time sampling is merely philosophical, with no advantage and many drawbacks, including a substantial potential to render the data nonanalyzable. If a protocol calls for administrative withdrawal after a certain number of positive urines, the addict may be switched to an alternate, possibly more beneficial treatment rather than being withdrawn, and makeup urines may be collected on days following a missed visit; these makeup urines may or may not be used in the analysis. In addition, there are no published data to suggest that such a practice does occur in a noncontingent treatment environment. Hence, a fixed time sampling should be the design of choice.

FREQUENCY AND TIMING OF THE COLLECTION OF URINE SAMPLES

When and how frequently the urine samples should be collected depends on the kinetics of the drug of abuse and the sensitivity of the assay used to analyze the urine samples. For heroin, with a cutoff of 300 ng/mL, a sample every 48 hours seems to be the optimal choice, because as pointed out by the Council on Scientific Affairs (1987), heroin stays in the urine for about 48 hours provided EMIT-type assays are used. This is likely to minimize the probability of carryover and maximize the probability of detecting an episode of opiate abuse.
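The dependence of sampling frequency on drug kinetics can be illustrated with the morphine figures quoted in the preceding section (6 mg detected for a mean of 21.82 hours; urine half-life of 4 to 6 hours; up to 96 mg within a 48-hour interval). The following is a rough sketch, not the author's calculation: it assumes simple first-order elimination, so that each doubling of the dose extends the detection time by one half-life, and it uses the upper half-life value of 6 hours. The function name and its parameters are hypothetical.

```python
import math


def detection_time_hours(dose_mg, ref_dose_mg=6.0, ref_detection_hours=21.82,
                         half_life_hours=6.0):
    """Approximate detection time for a dose, scaled from a reference dose.

    ASSUMPTION: first-order elimination, so every doubling of the dose adds
    one half-life to the time the urine concentration stays above the cutoff.
    """
    doublings = math.log2(dose_mg / ref_dose_mg)
    return ref_detection_hours + half_life_hours * doublings


# Detection time for successive doublings of the 6-mg reference dose.
for dose in (6, 12, 24, 48, 96, 192):
    print(f"{dose:>4} mg: {detection_time_hours(dose):.2f} hours")
```

Under these assumptions a 96-mg episode remains detectable for somewhat less than 48 hours, whereas a further doubling of the dose would exceed the 48-hour sampling interval, which is consistent with the 96-mg figure cited for morphine.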
With a lower cutoff and/or a more sensitive assay such as gas chromatography/mass spectrometry, the samples may have to be collected and assayed less frequently. Otherwise, the probability of carryover may be increased. However, this may decrease the probability of detecting an episode of drug abuse. Also, for shorter acting drugs, the samples may have to be collected more frequently. For longer acting drugs, they may have to be collected less frequently. The timing of sample collection should be such that the days of heavy use do not go undetected. For example, to detect use on weekends, it may be necessary to collect the first sample of the week on Monday.

In summary, the decision of when and how frequently the samples should be collected should be made by a joint team: a statistician, who should ensure that the probability of carryover is minimized and the probability of detecting the drug abuse is maximized to the degree possible; a pharmacologist/pharmacokineticist, who should ensure that reliable information on the kinetics of the drug of abuse is available and is provided to the statistician; and a physician/clinician, who is adequately informed of the pattern of drug abuse and should be primarily responsible for the timing of sample collection.

QUALITATIVE VS. QUANTITATIVE ANALYSIS OF URINE SAMPLES

Currently, the clinical trials in the drug abuse area are designed to estimate the frequency of drug abuse and not the amount of drug abuse. However, a replacement drug may decrease the frequency of drug abuse, but the addicts may still be using the same amount of the drug (of abuse), though in a smaller number of episodes. The amount of drug abuse may be estimated by analyzing the urine samples quantitatively rather than qualitatively, that is, by estimating the amount of the drug of abuse in the urine, rather than just the presence or absence of the drug of abuse.
However, a real-life relationship between the amount of drug present in the urine and the actual amount of drug consumed is confounded by many factors. A relationship between the amount of drug present in the urine and the actual amount of drug consumed may be established in laboratory experiments, and an inference can be drawn about the amount of drug consumed from the amount of drug present in the urine. However, a relationship established in the laboratory is not likely to hold in real-life situations because of the uncertainty of the timing of the episodes of drug abuse, the variations in the purity of drugs of abuse with different geographic locations and times, the effect of multiple episodes of drug abuse on the metabolism of these drugs, the interactions between multiple drugs of abuse consumed by the addicts in same or different episodes, the differences in frequency and timing of drugs abused by the addicts, and so on. And, of course, how accurately this relationship can be determined will also depend on the accuracy of the quantitative assays used to analyze urine samples. In addition, instead of urine samples, plasma samples may be better determinants of this relationship, but once again, this relationship too will be confounded by the same factors that confound this relationship for urine samples. At best, a relationship between the amount of drug present in the urine or plasma samples and the actual amount of drug abused is very complex and not easy to capture in real-life situations. However, a joint effort by statisticians, pharmacokineticists, and physicians/clinicians to model this relationship is likely to be fruitful. It must also be mentioned that the estimation of the amount of drug abuse should not be done in lieu of the estimation of the frequency of drug abuse. Both should be done simultaneously. 
Because of the strong relationship between the frequency of intravenous use and human immunodeficiency virus infection, it is of paramount importance that the replacement drugs should decrease the frequency as well as the amount of drug abuse.

WHAT IS MISSING?

1. The statistical/pharmacokinetic methods and designs to model the relationship between the amount of drugs present in the urine or plasma samples and the actual amount of drugs abused are missing.
2. The present methods to estimate the frequency of drug abuse provide, at best, a lower bound on the frequency of drug abuse because of:
   * The inability to detect possible multiple episodes of drug abuse during the time between the collection of two consecutive urine samples, and
   * The need to do infrequent sampling to minimize the carryover from one positive sample to another positive sample.
3. The probability of carryover is not entirely eliminated, and the degree of carryover is not known. It will be helpful if methods/techniques can be developed to ascertain whether multiple, consecutive positive samples are due to one or multiple episodes of drug abuse. This may, for example, be done by using self-reported episodes of drug abuse during the time consecutive urine samples are collected.

REFERENCES

Council on Scientific Affairs. Scientific issues in drug testing. JAMA 257:3110-3114, 1987.
Goldstein, A., and Brown, B.W. Urine testing schedules in methadone maintenance treatment of heroin addiction. JAMA 214:311-315, 1970.
Harford, R.J., and Kleber, H.D. Comparative validity of random-interval and fixed-interval urinalysis schedules. Arch Gen Psychiatry 35:356-359, 1978.
Ling, W.; Charuvastra, V.; Kaim, S.C.; and Klett, C.J. Methadyl acetate and methadone as maintenance treatments for heroin addicts. Arch Gen Psychiatry 33:709-720, 1976.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

Comments on “Design of Clinical Trials for Treatment of Opiate Dependence: What Is Missing?” by Jain

Sudhir C. Gupta

This chapter discusses the following three important issues in the design of clinical trials for opiate dependence:

1. Random vs. fixed time sampling scheme for collecting urine samples
2. Frequency and timing for collecting urine samples
3. Estimating the amount of drug abuse in addition to the frequency of drug abuse

SAMPLING SCHEME FOR COLLECTING URINE SAMPLES

As discussed by Dr. Jain, the main problem with using a random time sampling scheme is that the methods for analyzing the data obtained using this scheme may not be available. This means that suitable methods should first be developed before the analysis of the data can be carried out. As pointed out by Dr. Jain, this approach is not recommended. The trial should be designed so as to allow an efficient interpretation of the data. A fixed time sampling scheme is thus recommended.

The strongest argument in favor of random time sampling is that addicts try to avoid drug abuse detection. In a fixed time sampling scheme, if they know that they will test positive because of drug abuse, they may not show up for their scheduled visits. However, Dr. Jain has pointed out that this is not to be expected in this trial because subjects who are known to be drug addicts do not have anything to gain by avoiding detection of drug abuse.

In a fixed time sampling scheme all the subjects are required to provide urine samples on each of the scheduled days. Sometimes it may become necessary to use a random time sampling scheme if enough resources are not available to handle all the subjects in one day.
If a random time sampling scheme is to be used under such circumstances, then it should be modified to yield truly random samples as indicated below.

Suppose the protocol calls for collection of two urine samples per week from each subject. Then there should be an equal probability for a subject to be tested on any 2 of the 5 days of the week. Let MTh denote that a subject is to be tested on Monday and Thursday, etc. A subject may be tested on MT, MW, MTh, MF, TW, TTh, TF, WTh, WF, or ThF, resulting in 10 possibilities as pointed out by Dr. Jain. A subject should be assigned to one of these 10 possibilities randomly. This random assignment should be done separately for each week, and it should not be known to the subjects in advance of their urine collection. In the case of two urine samples per week, the expected number of free drug-seeking days is 3.15 using table 1 of Dr. Jain’s chapter. For the above suggestion the probability is 0.10 for a subject to be tested on any of the 10 possible pairs of days. The expected number of free drug-seeking days is then 3.0. A similar method can be used for three urine samples per week, reducing the expected number of free drug-seeking days to 2.2. The corresponding expected number is 2.74 from Dr. Jain’s chapter.

FREQUENCY AND TIMING FOR COLLECTING URINE SAMPLES

As pointed out by Dr. Jain, the frequency of collecting urine samples should be determined so as to minimize the probability of carryover and to maximize the probability of detecting opiate abuse. As discussed in Gupta (1991), a model that incorporates subject and carryover effects can be developed using the approach of Bonney (1987). However, in this approach the subject effects and carryover effects are confounded, and a separate estimate of carryover effect is not provided. This does not seem to be a serious limitation.

ESTIMATING THE AMOUNT OF DRUG ABUSE IN ADDITION TO THE FREQUENCY OF DRUG ABUSE

Dr.
Jain has clearly discussed the problems associated with estimating the amount of drug abuse in addition to the frequency of abuse. As pointed out by Dr. Jain, currently the clinical trials in this area are designed to estimate the frequency of drug abuse and not the amount of drug abuse. If the addict tests positive for drug abuse, then it is important to find out the extent to which the drug was abused. In other words, it is important to know if a replacement therapy is effective in reducing the total amount of drug abused in addition to reducing the frequency of drug abuse. A relationship between the amount of drugs present in the urine and the amount of drug consumed by the addict may be established in laboratory experiments, from which an estimate of the amount of drug consumed may be obtained. However, as Dr. Jain has clearly pointed out, such estimates are confounded by many factors. Therefore, such estimates derived using the results obtained in the laboratory will not be precise. Under these circumstances it will be best to study the extent rather than the exact amount of drug abuse.

Let us assume, for example, that the extent of drug abuse is categorized as low, medium, or high. Let the outcome variable Y be coded as 0 if the assay shows absence of abused opiates in the urine. Similarly, Y = 1, 2, 3 will be used to denote that the assay shows the extent of drug abused to be low, medium, and high, respectively. Since the outcome variable takes more than two distinct values, an appropriate polytomous logistic regression model can be developed for comparing the probabilities under different treatments after adjusting for the effects of covariates. A patient provides repeated observations up to a maximum of 17 weeks. Since each dose of a treatment drug provides one observation, a maximum of 51 replications for a treatment can be obtained for any patient. These observations from the same patient will not be independent.
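Thus, conditional probabilities will be used under the polytomous logistic regression setup. Suppose that there are $n_i$ observations for the $i$th patient, which are denoted by $Y_{ij}$, and let $X_{ij} = (X_{1ij}, X_{2ij}, \ldots, X_{pij})'$ denote the vector of covariates associated with $Y_{ij}$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n_i$. Let

$$Y_j^* = [Y_{i1}, Y_{i2}, \ldots, Y_{i,j-1}], \qquad y_j^* = [y_{i1}, y_{i2}, \ldots, y_{i,j-1}],$$

$$P(Y_{i1} = y_{i1} \mid X = X_{i1}) = \pi_{1 y_{i1}},$$

$$P(Y_{ij} = y_{ij} \mid Y_j^* = y_j^*, X = X_{ij}) = \pi_{j y_{ij}},$$

$$y_{ij} = 0, 1, 2, 3; \quad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, n_i.$$

Following the approach of Bonney (1987) as discussed by Gupta (1991) for the case of dichotomous outcome variables, the conditional probabilities as defined above can be modeled by considering $Y_j^*$ as covariates. Let

$$\beta_s = (\beta_{1s}, \beta_{2s}, \ldots, \beta_{ps})', \qquad \eta_{js} = (\eta_{1s}, \eta_{2s}, \ldots, \eta_{j-1,s})',$$

$$\pi_{10} = \frac{1}{1 + \sum_{t=1}^{3} e^{\alpha_t + \beta_t' X_{i1}}}, \qquad \pi_{1s} = \frac{e^{\alpha_s + \beta_s' X_{i1}}}{1 + \sum_{t=1}^{3} e^{\alpha_t + \beta_t' X_{i1}}},$$

$$\pi_{j0} = \frac{1}{1 + \sum_{t=1}^{3} e^{\alpha_t + \beta_t' X_{ij} + \eta_{jt}' Y_j^*}}, \qquad \pi_{js} = \frac{e^{\alpha_s + \beta_s' X_{ij} + \eta_{js}' Y_j^*}}{1 + \sum_{t=1}^{3} e^{\alpha_t + \beta_t' X_{ij} + \eta_{jt}' Y_j^*}}, \qquad s = 1, 2, 3.$$

The logits for comparison with Y = 0 are thus obtained as given below. The logits for comparing with other values of Y can be written down in a similar way.

$$\log\left(\frac{\pi_{1s}}{\pi_{10}}\right) = \alpha_s + \beta_s' X_{i1},$$

$$\log\left(\frac{\pi_{js}}{\pi_{j0}}\right) = \alpha_s + \beta_s' X_{ij} + \eta_{js}' Y_j^*,$$

$$s = 1, 2, 3; \quad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, n_i.$$

Following Gupta (1991), finally the model can be written as given below.

$$\pi_{j0} = \frac{1}{1 + \sum_{t=1}^{3} \exp(\alpha_t + \beta_t' X_{ij} + \gamma_t' W_{ij} + \delta_t' Z_{ij})},$$

$$\pi_{js} = \frac{\exp(\alpha_s + \beta_s' X_{ij} + \gamma_s' W_{ij} + \delta_s' Z_{ij})}{1 + \sum_{t=1}^{3} \exp(\alpha_t + \beta_t' X_{ij} + \gamma_t' W_{ij} + \delta_t' Z_{ij})},$$

where

$$W_{ij} = (W_{1ij}, W_{2ij}, \ldots, W_{n-1,ij})', \qquad Z_{ij} = (Z_{1ij}, Z_{2ij}, \ldots, Z_{n-1,ij})',$$

$$\gamma_s = (\gamma_{1s}, \gamma_{2s}, \ldots, \gamma_{n-1,s})', \qquad \delta_s = (\delta_{1s}, \delta_{2s}, \ldots, \delta_{n-1,s})',$$

and $W_{uij} = 1$ if $Y_{iu} = 0$ and $u$ ...

FIGURE 6. The relationship of %safe time to the urinalysis testing schedule. Test days are indicated by an asterisk. [Bars for each testing schedule over the days M through S show safe time and detected time.]

A second analysis of urine testing schedules was performed by simulating two random cocaine uses occurring during the same week. The time between the two doses was varied from 6 hours to 84 hours.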
Sets of 100 randomly selected times, separated by the minimum interval between cocaine use, were generated. The effectiveness of testing three times per week was compared with testing only once per week. The numbers of times that two uses resulted in 0, 1, and 2 positive results are shown in table 2 along with the number of times that two uses occurred within the same detection time period resulting in a single positive result.

FIGURE 7. Relationship of drug testing schedules to %week undetected for cocaine, morphine, heroin, and codeine (EMIT d.a.u. analysis, 300-ng/mL cutoff). [Panels: A. Cocaine (n=4); B. Morphine (n=6); C. Heroin (n=6). Horizontal axis: days/week tested.]

When the testing schedule called for only 1-day-per-week testing, a substantial amount of drug use went undetected. The number of times that no drug use was detected varied from 64 to 39 percent depending on the time interval between uses. Positive results ranged from 36 to 61 percent. There were only a few occurrences of random multiple drug use occurring within the same detection time.

With a 3-days-per-week testing schedule (Monday, Wednesday, Friday), detection efficiency increased substantially over the 1-day-per-week testing schedule. The number of times that no positive results were obtained by the 3-days-per-week schedule varied from 6 to 16 percent. Single positive results (one use went undetected) were obtained 43 to 60 percent of the time, and double positive results (both uses were detected) were obtained at a frequency of 27 to 45 percent. When the single and double positive results are combined, the efficiency of detection of cocaine use for the week averaged 87.3 percent across the different drug use patterns. There were a maximum of seven instances of drug use occurring in the same detection time window when the second drug use could occur within 6 hours of the first use. In these instances, two uses appeared as a single use from the testing result. As the drug use interval lengthened to 24 hours, this phenomenon disappeared and was no longer a problem.

TABLE 1. Effect of urinalysis testing schedules on detection of a single cocaine use during a week of testing*

Urinalysis testing   Tests/   %Drug episodes detected                        Average single drug use episodes
schedule             week     Trial #1  Trial #2  Trial #3  Trial #4  Mean   resulting in two positive tests (percent)
M                    1        13        18        26        22        20     0
M,Th                 2        34        32        47        48        41     0
M,W,F                3        61        53        67        66        63     0
M,W,Th,F             4        68        59        75        74        69     13.8
M,T,W,Th,F           5        77        69        81        81        79     26.3
M,T,W,Th,F,Sa        6        93        89        95        95        93     33.0
M,T,W,Th,F,Sa,S      7        100       100       100       100       100    48.3

*Each trial consists of 100 randomly generated times during the week that a person might self-administer a single dose of cocaine. A detection time of 35.8 hours was used in the determination of %drug episodes detected.

The data shown in tables 1 and 2 were generated to challenge the earlier conclusion that a 3-days-per-week schedule was the best compromise between maximizing drug detection and minimizing carryover. A Monday, Wednesday, Friday testing schedule demonstrated a mean efficiency of 63 percent in detecting single incidents of cocaine use. The increase in efficiency by further testing was relatively minimal until the frequency was increased to 6 days or more per week.
Carryover of drug use from one test to another was not a factor with the Monday, Wednesday, Friday testing schedule but did occur at higher frequency testing schedules. When multiple cocaine use was simulated, that is, 2-times-per-week separated by a minimum time interval, the 3-days-per-week testing schedule was substantially better than a 1-day-per-week schedule.

TABLE 2. Effect of urinalysis testing schedules on detection of two cocaine uses separated by a minimum hourly interval between uses during a week of testing*

Urinalysis   Minimum hours   Number of positive    Number of times that two uses occurred
testing      between         test results          in the same detection period and were
schedule     drug use        0      1      2       counted as one positive test
M            6               64     36     -       3
             12              67     33     -       3
             18              59     41     -       1
             24              50     50     -       1
             36              51     49     -       0
             48              48     52     -       0
             72              47     53     -       0
             84              39     61     -       0
M,W,F        6               16     50     34      7
             12              13     60     27      5
             18              13     51     36      3
             24              6      49     45      0
             36              11     47     42      0
             48              12     43     45      0
             72              15     43     42      0
             84              16     43     41      0

*Each trial consisted of 100 randomly generated time pairs (separated by a minimum interval) during the week that a person might self-administer two single doses of cocaine. A detection time of 35.8 hours was used in the determination of number of positive results.

SUMMARY AND CONCLUSIONS

Urinalysis can be used as an objective criterion for monitoring the outcome of a treatment program or a clinical trial. Important factors to consider when implementing a drug testing program include standardization of assay technology and cutoffs between participating centers and selection of identical testing schedules. Also, it is vitally important to minimize the amount of safe time (time that drug use can go undetected) occurring in a testing schedule. The detection times for cocaine and heroin have been shown to vary with selection of cutoff and with the drug dose. Obviously, the selection of cutoffs is under program control, whereas the amount of illicit drug use is under subject control.
Fortunately, changes in the illicit drug dose by the subject demonstrate a log-linear relationship to detection time. Hence, a higher drug dose by the subject only extends the detection time slightly (and improves the probability of detection) without greatly increasing the risks of drug carryover from one urine test to another.

The most efficient testing schedule for judging the outcome of clinical trials for cocaine and heroin appears to be a 3-days-a-week schedule (Monday, Wednesday, Friday or Tuesday, Thursday, Saturday). When different schedules were challenged by simulating random times at which cocaine use might occur during the week, the 3-days-per-week schedule was the most efficient without the risk of carryover. The 3-days-per-week schedule also performed better than 1-day-per-week when multiple random drug use was simulated. Overall, the 3-days-per-week testing schedule with specified assay technology and cutoffs was the best compromise for maximizing detection of drug use, minimizing carryover, and providing a standardized methodology for outcome comparison between programs.

REFERENCES

Cone, E.J.; Dickerson, S.; Paul, B.D.; and Mitchell, J.M. Forensic drug testing for opiates: IV. Analytical sensitivity, specificity and accuracy of commercial urine opiate immunoassays. J Anal Toxicol 16:72-78, 1992.
Cone, E.J.; Menchen, S.L.; Paul, B.D.; Mell, L.D.; and Mitchell, J. Validity testing of commercial urine cocaine metabolite assays: I. Assay detection times, individual excretion patterns, and kinetics after cocaine administration to humans. J Forensic Sci 34:15-31, 1989.
Cone, E.J., and Mitchell, J. Validity testing of commercial urine cocaine metabolite assays: II. Sensitivity, specificity, accuracy and confirmation by gas chromatography/mass spectrometry. J Forensic Sci 34:32-45, 1989.
Goldstein, A., and Brown, B.W., Jr. Urine testing schedules in methadone maintenance treatment of heroin addiction. JAMA 214:311-315, 1970.
Gorodetzky, C.W.
Detection of drugs of abuse in biological fluids. In: Born, G.V.R.; Eichler, O.; Farah, A.; Herken, H.; and Welch, A.D., eds. Handbook of Experimental Pharmacology. Vol. 45. Berlin: Springer-Verlag, 1977. pp. 319-409.
Harford, R.J., and Kleber, H.D. Comparative validity of random-interval and fixed-interval urinalysis schedules. Arch Gen Psychiatry 35:356-359, 1978.
Mandatory guidelines for Federal workplace drug testing programs; final guidelines; notice. Federal Register 53:11970-11989, Apr. 11, 1988.

ACKNOWLEDGMENT

Dr. Nancy L. Geller of the National Heart, Lung, and Blood Institute reviewed and commented on the manuscript.

AUTHORS

Edward J. Cone, Ph.D.
Chief
Laboratory of Chemistry and Drug Metabolism

Sandra L. Dickerson, B.S.
Medical Technologist

Addiction Research Center
National Institute on Drug Abuse
P.O. Box 5180
Baltimore, MD 21224

Comments on “Efficacy of Urinalysis in Monitoring Heroin and Cocaine Abuse Patterns: Implications in Clinical Trials for Treatment of Drug Dependence” by Cone and Dickerson

Nancy L. Geller

Cone and Dickerson consider fixed-interval scheduling for drug use monitoring in trials for treatment of drug dependence. They conclude that changes in drug dose by the user alter detectability of drugs by urinalysis only slightly and that the Monday, Wednesday, Friday monitoring schedule is optimal because it maximizes the chance of detection of an episode of drug use and minimizes the chance of having two detections of the same episode.

The conclusion that dose alters detectability only slightly assumes a log-linear relationship between drug dose and detection times. This is equivalent to a one-compartment pharmacokinetic model. The data for morphine and heroin in Cone and Dickerson’s figure 3 (this volume) suggest that a higher order compartmental model might be more appropriate. Such a possibility should be investigated.
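The link between a log-linear dose and detection-time relationship and a one-compartment model can be made explicit. The following is a sketch under a standard one-compartment assumption with first-order elimination, treating the measured urine concentration as proportional to the amount of drug remaining; the symbols $D$ (dose), $V$ (apparent volume), $k$ (elimination rate constant), $c$ (assay cutoff), and $t_d$ (detection time) are introduced here for illustration only.

```latex
C(t) = \frac{D}{V}\, e^{-kt},
\qquad
C(t_d) = c
\;\Longrightarrow\;
t_d = \frac{1}{k}\ln\frac{D}{Vc}
    = \frac{1}{k}\ln D - \frac{1}{k}\ln(Vc),
\qquad
t_d(2D) - t_d(D) = \frac{\ln 2}{k} = t_{1/2}.
```

Under these assumptions the detection time is linear in the logarithm of the dose, and doubling the dose lengthens detection by only one elimination half-life. A model with two or more compartments would add further exponential terms to $C(t)$, and the relationship would then no longer be exactly log-linear.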
The authors’ conclusion that the Monday, Wednesday, Friday test schedule is optimal rests on certain assumptions:

1. If there is any episode of drug use, the test schedule should be able to detect it most of the time.
2. Detection of drug use within approximately 36 hours of that use is certain; that is, there are no false negatives.
3. Having two tests detect one episode of drug use should be avoided if possible.
4. If drug use is detected, there has indeed been drug use; that is, there are no false positives.
5. Drug detection will be done in multiples of 24 hours.

TABLE 1. Effect of urinalysis testing schedules on detection of a single random episode of cocaine use during a week of testing*

                      Probability of detection       Probability of drug episode
                      of drug episode                resulting in two positive tests
Urinalysis testing    Simulated                      Simulated
schedule              (n=400)        Actual          (n=400)        Actual
None                  0              0               0              0
M                     .20            .213            0              0
M,Th                  [illegible]    .426            0              0
M,W,F                 .63            .639            0              0
M,W,Th,F              .69            .712            .138           .140
M,T,W,Th,F            .79            .785            .263           .281
M,T,W,Th,F,Sa         .93            .927            .330           .351
Every day             1.00           1.00            .483           .492

*A detection time of 35.8 hours and a zero probability of false negative tests were assumed in both the simulations and calculations.

The probabilities that are simulated, according to the assumptions above, can be calculated exactly and are shown in table 1. As in the simulations, the exact calculations assumed that an episode of drug use is equally likely to occur at any time during the week (i.e., uniformly distributed). However, a trial participant who is going to take the drug may recognize that he or she is less likely to test positive next time if the drug is used soon after a urine test. Similarly, the probabilities of the model assumed for Cone and Dickerson’s table 2 (this volume) can be explicitly calculated, but again, the times of an episode of drug taking may not be uniformly distributed.
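The exact calculation behind the "Actual" columns of table 1 can be sketched in a few lines of Python. This is an illustration under the stated assumptions (a 35.8-hour detection time, no false negatives or false positives, tests at the same time on each test day, and one episode uniformly distributed over a repeating week), not the code used for the chapter. The function name `exact_probabilities`, the day codes, and in particular the code "Su" for Sunday are introduced here for illustration only.

```python
WEEK_HOURS = 168.0
# Day codes follow table 1; "Su" (Sunday) is an assumed code for the "Every day" row.
DAY_INDEX = {"M": 0, "T": 1, "W": 2, "Th": 3, "F": 4, "Sa": 5, "Su": 6}


def exact_probabilities(days, detection_hours=35.8):
    """Return (P(at least one positive test), P(at least two positive tests))
    for a single episode of use occurring uniformly at random in a repeating week."""
    if not days:
        return 0.0, 0.0
    times = sorted(24.0 * DAY_INDEX[d] for d in days)
    n = len(times)
    # gaps[i] = hours from test i to the next test, wrapping around the week.
    # With a single weekly test the gap is the full week.
    gaps = [(times[(i + 1) % n] - times[i]) % WEEK_HOURS or WEEK_HOURS
            for i in range(n)]
    one = two = 0.0
    for i in range(n):
        before = gaps[i - 1]  # length of the interval that ends at test i
        # Use within `detection_hours` before test i is caught by test i.
        one += min(before, detection_hours)
        # It is also caught by the following test if that test falls
        # inside the same detection window.
        two += max(0.0, min(before, detection_hours - gaps[i]))
    return one / WEEK_HOURS, two / WEEK_HOURS


if __name__ == "__main__":
    schedules = [[], ["M"], ["M", "Th"], ["M", "W", "F"], ["M", "W", "Th", "F"],
                 ["M", "T", "W", "Th", "F"], ["M", "T", "W", "Th", "F", "Sa"],
                 ["M", "T", "W", "Th", "F", "Sa", "Su"]]
    for s in schedules:
        p1, p2 = exact_probabilities(s)
        print(",".join(s) or "None", round(p1, 3), round(p2, 3))
```

Under these assumptions, each test contributes a detection window of 35.8 hours, and tests on consecutive days share an 11.8-hour overlap, which is where the two-positive probabilities come from. A nonuniform distribution of use times would require integrating over that distribution instead of summing interval lengths.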
Simulation is a rich tool and could allow more complicated scenarios to be evaluated, including nonuniform times of drug use. The possibility of false positives and false negatives could be built into a simulation model, which is equivalent to varying the cutoff for detection from 300 ng/mL. Testing at more than one time of day, such as mornings or afternoons, could also be evaluated. Software for simulating stochastic processes, such as the General Purpose Simulation System, might be used so that, in addition, random test times could be assessed.

The conclusions in Cone and Dickerson’s chapter follow logically from their assumptions. However, more complex assumptions might be more realistic and could be considered in further work.

ACKNOWLEDGMENT

Dean A. Follmann, Ph.D., National Heart, Lung, and Blood Institute, National Institutes of Health, is acknowledged for helpful discussions and for presenting these comments at the technical review in my absence.

AUTHOR

Nancy L. Geller, Ph.D.
Chief
Biostatistics Research Branch
National Heart, Lung, and Blood Institute
National Institutes of Health
Federal Building, Room 2A-11
7550 Wisconsin Avenue
Bethesda, MD 20892

Summary of Discussion: “Efficacy of Urinalysis in Monitoring Heroin and Cocaine Abuse Patterns: Implications in Clinical Trials for Treatment of Drug Dependence” by Cone and Dickerson

Ram B. Jain

Dr. Weng suggested that blood samples from each subject be obtained prior to entry into the trial so that their individual pharmacokinetic profiles could be studied and their metabolic rates evaluated. The differences in metabolic rates will have a bearing on the detectability of drugs in urine. Individual pharmacokinetic profiles could also be used to appropriately schedule collection of urine samples. This suggestion was appreciated; however, as pointed out by Dr.
Johnson, it is not practical to obtain blood samples from every subject, since some have poor venous access due to abuse of their veins from frequent injections. In addition, the process of randomization should equally distribute fast and slow metabolizers across different treatment groups.

Dr. Wright inquired about the cross-reactivity between opiates of abuse and replacement (treatment) opiates (and over-the-counter drugs) in immunoassays and about the need for confirmatory testing. According to Dr. Cone, the probability of false positives in immunoassays to detect opiates is very small unless a subject is using codeine. Use of a confirmatory assay such as gas chromatography/mass spectrometry would add little unless there was a need for quantitative data.

Dr. Gorodetzky asked if, in testing an individual by immunoassay following drug usage, negative results could be followed by positive results. It was acknowledged that this does not happen very often except with marijuana.

Dr. Fisher suggested that urine specimens be collected every day to collect the maximum amount of information. He suggested that this information could then be used to more appropriately interpret and/or modify information obtained from Monday, Wednesday, and Friday specimens. He also suggested the need for estimating the amount of opiates used by using a method such as area under the curve. This method is probably impractical since many urine specimens would have to be collected over time, or timed plasma specimens with knowledge of duration since injection and amount of drug injected would be required.

Dr. Johnson said different subjects may need different amounts of opiates to have the same effect, and because of risk of human immunodeficiency virus infection from intravenous injection of drugs using shared needles, it is important to know the exposure frequency and what the treatment drug can do to reduce this frequency.

Dr.
Gordon also proposed to collect urine specimens more often than three times a week and, based on the results of a certain number of successive specimens (e.g., positive, negative, positive, positive), develop an algorithm to decide whether two or more consecutive positive specimens represent independent episodes of drug abuse or carryover. The proposal was well taken, but the same algorithm cannot be applied to all subjects since the probability of carryover varies from subject to subject. Such an algorithm has the potential to underestimate the probability of drug abuse. However, such an algorithm used in conjunction with self-reported drug use might be a possibility.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

Open/Panel Discussion: Design Issues

Ram B. Jain

Panel Members: A.S. Hedayat (Chair), Albert J. Getson, Alan J. Gross, Sudhir Gupta, Don Jasinski, Mei-Ling Ting Lee, Carol K. Redmond, and Margaret Wu

The three primary issues discussed were:

• Fixed time vs. random time sampling, including sampling frequency
• Estimation of carryover
• Estimation of the amount of drug abuse

FIXED TIME VS. RANDOM TIME SAMPLING

It was opined that the objectives of the clinical trials would determine the adequacy of fixed or random time sampling. If the objective was merely to evaluate the efficacy of a treatment drug, fixed time sampling would probably be the sampling scheme of choice. If determination of the effectiveness of the treatment drug was the objective of the trial, then random time sampling would probably be the sampling scheme of choice. It was pointed out that determination of pharmacological efficacy of a treatment drug was the primary objective of a clinical trial such as the ARC 090 trial completed at the Addiction Research Center.
The pharmacological efficacy was primarily evaluated by posttreatment frequency of drug abuse. However, as pointed out by Dr. Johnson, variables such as retention rates, withdrawal symptoms and signs, and opiate- and cocaine-craving scores were also evaluated in the ARC 090 trial. Since the frequency of drug abuse is not directly measurable, frequency of detected drug (ab)use from urine samples on a per-sample or per-week basis is the surrogate measure used to represent posttreatment frequency of drug abuse. Using this surrogate measure, it is possible that multiple episodes of drug abuse are counted as one, but this is the limitation of the sampling techniques currently used.

In addition to reduction in the frequency of drug abuse, as Dr. Vocci mentioned, there is also interest in knowing when the treatment drug starts working. Some individuals in the buprenorphine and methadone 60 mg arms of the ARC 090 study stopped abusing drugs almost immediately and remained drug-free throughout the trial. Some individuals need to build a reservoir of treatment drug in the body before the drug shows its effect; it takes these individuals some time (4 to 6 weeks) before they stop abusing drugs. Eventually, a receptor occupancy may be reached for all individuals that may be consistent with no more drug (ab)use.

Since agonists, partial agonists, and antagonists act differently, there is also interest in evaluating the pattern of cessation of drug abuse, that is, the pattern of positive and negative urine samples. There is interest in knowing, for example, whether some daily users are being converted to weekend-only users and the maximum duration for which they can remain drug-free. One of the outcome variables analyzed for the ARC 090 trial was the time to the (first) drug-free period of 28 days or more as determined by negative urines.

However, as pointed out by Drs.
Wright and Getson, it is critical to remember that reduction in frequency and/or amount of drug abuse is only a small part of the claims that can be made for a treatment drug. These medications may also alter symptoms of drug (ab)use; suffering from drug (ab)use; social functioning (behavior) such as employment stability, family life, and crime-related activities; or target behaviors such as needle sharing and injection of illicit drugs. Evaluation of efficacy should be married to the development of the treatment compound as a whole. Efficacy trials should be followed by effectiveness trials, which may focus more on the sociological behaviors, as mentioned earlier. For these effectiveness trials, random time sampling may be the sampling scheme of choice. These effectiveness trials should lead to a broader understanding of the compound as a whole. An efficacy trial determines whether or not the drug works; an effectiveness trial generates additional information helpful in writing a good label (package insert) for the treatment drug. An efficacious drug in the hands of a good clinician would work more effectively since these clinicians are likely to supplement treatment drugs with services such as family and/or employment counseling. However, if these effectiveness variables are allowed to interact with efficacy variables in the efficacy trials, the sample size requirements would become prohibitive and it may not be possible to show the pharmacological efficacy of the treatment drugs. It was suggested that efficacy trials may include an additional treatment arm in which subjects get other services, such as counseling, after only 2, 4, or 8 weeks of drug therapy.

If frequency of detectable drug abuse is the primary outcome variable in an efficacy trial, the drug abuse phenomenon should be viewed on a continuum, and as such, fixed time sampling should be appropriate. In fact, Dr.
Fisher strongly favored collection of urine samples more often than three times a week, probably every day, since more information is generally better. However, collecting urine samples too often may have a negative effect on dropout rates and will shape the patient population remaining in the trial in such a way that generalizations to the addict population-at-large may be difficult. In fact, dropout rates are substantial (as much as 60 to 80 percent) in these trials. Cost may be another factor that should be considered. A compromise may be to do less frequent sampling in those who remain in the trial for a certain period and get as much information as possible on those who dropped out by sending out nurses, social workers, etc. There was a strong feeling that additional information should be obtained on those who drop out of the trial because, with dropout rates as high as they are in these trials, there certainly is a serious problem in making inferences for the total addict population. Also, such high dropout rates make it difficult to do an intent-to-treat analysis. Dr. Jasinski pointed out that clinical trials are unique experiments as opposed to drug treatment programs. The lack of resources (financial and others) and practical considerations such as frequent collection of urine samples, if desired, should not stand in the way of doing these experiments. Resources should be obtained and study centers identified where these experiments can be successfully conducted. There were other arguments in favor of and against both fixed and random time sampling. It was suggested that fixed time sampling results in nonrandom missed observations. Since missing at random may be an assumption required to do some analyses, this may create a potential bias in these analyses. However, even in random time sampling, addicts are able to determine how often they will be tested and when, and thus, even random time sampling cannot ensure random missed observations. 
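One way to make the fixed versus random comparison in this section concrete is a small Monte Carlo sketch. The code below is illustrative only and is not part of the discussion. It borrows the 35.8-hour detection time and the no-false-negative assumption from Geller's comments, assumes a single uniformly timed episode of use, and assumes that a "random" schedule means three distinct test days drawn afresh each week. All function names are hypothetical, and the sketch deliberately ignores dropout, missed visits, and strategic timing of use, which the panel considered central.

```python
import bisect
import random

WEEK_HOURS = 168.0
DETECTION_HOURS = 35.8  # assumed detection time (no false negatives)


def fixed_tests(n_weeks, days=(0, 2, 4)):
    """Test times in hours for a fixed schedule; days 0, 2, 4 = Mon, Wed, Fri."""
    return [w * WEEK_HOURS + 24.0 * d for w in range(n_weeks) for d in days]


def random_tests(n_weeks, per_week, rng):
    """Test times when `per_week` distinct days are drawn at random each week."""
    return [w * WEEK_HOURS + 24.0 * d
            for w in range(n_weeks)
            for d in sorted(rng.sample(range(7), per_week))]


def detection_rates(tests, n_weeks, n_episodes, rng):
    """Estimate P(>= 1 positive test) and P(>= 2 positive tests) for one
    uniformly timed episode of use, given a sorted list of test times."""
    detected = doubled = 0
    # Keep the last week as a buffer so tests following each episode exist.
    horizon = (n_weeks - 1) * WEEK_HOURS
    for _ in range(n_episodes):
        use = rng.uniform(0.0, horizon)
        first = bisect.bisect_left(tests, use)
        last = bisect.bisect_right(tests, use + DETECTION_HOURS)
        positives = last - first  # tests falling inside the detection window
        detected += positives >= 1
        doubled += positives >= 2
    return detected / n_episodes, doubled / n_episodes


if __name__ == "__main__":
    rng = random.Random(1992)
    print("fixed M,W,F :", detection_rates(fixed_tests(200), 200, 20000, rng))
    print("random 3/wk :", detection_rates(random_tests(200, 3, rng), 200, 20000, rng))
```

Under these simplifying assumptions, randomly drawn days sometimes fall on consecutive days, so part of the detection window is spent on double detection (carryover) rather than on additional coverage. Whether that cost is outweighed by making test days unpredictable to participants is exactly the kind of question a richer simulation would have to address.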
The data do not exist to show which type of sampling leads to higher noncompliance, including dropout rates. It may be that it is just the frequency (e.g., once a week vs. five times a week) of urine collection, irrespective of the type of sampling, that has a bearing on the noncompliance problem.

In fixed time sampling, staffing requirements (to collect urine samples) are known in advance, which helps in planning for resources. A Food and Drug Administration audit of a trial done using fixed time sampling is relatively easier to conduct. Also, the choice between fixed time and random time sampling may be a choice between dealing with a possible treatment-by-day interaction and a relatively large error term (noise).

Fixed time sampling may be used to collect data from some experimental units, and random sampling may be used to collect data from other experimental units. But analysis of these data may present unknown challenges and possible interpretation problems. Alternatively, the data may be collected frequently using fixed time sampling, and only randomly selected data points may be used for analyses. In addition, irrespective of the type of sampling used, a clinical trial that has the ability to test (internally validate) some of the assumptions used in the analyses is preferred over one that does not have such an ability.

It was brought to attention that, since efficacy trials do not have any negative contingencies associated with results of urine samples, the question of whether data are missing at random may be a nonquestion since addicts may not have a reason to miss clinic visits. Dr. Blaine agreed. He explained that in their gepirone study, which did not have any negative contingencies associated with urine results, patients’ admission of drug (ab)use matched urine test results most of the time.
However, he added that absence of negative contingencies amounts to permission for drug abuse, as can be seen from the substantially higher percentage of positive urines (60 to 70 percent) from one of their buprenorphine trials, which did not have negative contingencies, compared with some treatment clinics (10 to 12 percent positive urines), which do have negative contingencies. Dr. Vocci emphasized that “if you are looking for efficacy in a clinical trial, you are better off . . . allowing individuals to use [drugs] in a manner that is not proscribed by the policies of the clinic.” Artificially controlling drug abuse may result in prohibitive sample size requirements if a pharmacological effect is to be shown.

It was also suggested that the question of fixed vs. random time sampling should be decided by simulation methods using known pharmacokinetic profiles of the drugs that the urine samples are supposed to detect. These simulation methods may allow for a permissible degree of carryover and an inability to detect episodes of drug abuse. However, since pharmacokinetic profiles of the drugs of abuse are dose dependent and the dose and timing of drugs consumed by an addict are not known, such an exercise may be very difficult.

ESTIMATION OF CARRYOVER

It was mentioned that, in addition to a parallel design, a crossover design should be considered. A crossover design may be able to better handle the problem of carryover. However, as Dr. Hedayat pointed out, crossover designs have their own problems in interpretation of results. Parallel designs may be used to answer certain questions, whereas crossover trials may be designed to answer other questions.

Dr. Mei-Ling Lee visualized the problem in a different way. She observed that in these trials researchers are working with a mixture of distributions, and data should be analyzed as a mixing distributions problem.
However, to analyze these data as a mixture of distributions, an estimate of the amount of drug present in the urine or plasma samples will be needed. Dr. Follmann agreed. Binary data would not be sufficient. Dr. Collins commented that, unless the timing of the episodes of drug abuse is known, concentration profiles of drug in the urine or plasma samples may not be fully informative. As such, information obtained from pharmacokinetic profiles may have to be supplemented with that obtained by asking each addict about the timing of drug abuse, if any, each time he or she is asked to provide a urine or plasma sample.

If binary data must be used, information obtained from a self-reported measure of drug abuse (e.g., “Did you use the drug during the last 24 hours?”) when combined with the urine results may be able to help decide if a carryover existed. Dr. Wright suggested we should be looking for other sources of information, such as the staff and clinicians present at the time of clinic visits. The clinic staff should be able to judge if the subject may or may not have abused the drugs since the last clinic visit or since the last time a urine specimen was provided. Various pieces of information from various sources, including urine test results, can be put together according to a certain predefined set of rules and converted into some sort of scores that may be interpreted as a new episode of drug abuse or a carryover from the previous episode.

ESTIMATION OF THE AMOUNT OF DRUG ABUSE

A reduction in the frequency of drug abuse does not guarantee reduction in the amount of drug abuse. A daily user may be converted to an occasional user, but he or she may still be using the same amount of drug. Instead of using the same amount in, for example, 10 episodes, he or she might be using the same amount in 5 episodes. However, the binary data currently obtainable from urine assays cannot provide estimates of the amount of drug abuse.
Hence, it was of interest to discuss this issue of being able to estimate the amount of drug abuse. There was a strong sentiment at the meeting against attempts to estimate drug consumed from the drug present in the urine samples. As Dr. Gorodetzky put it, “Do not ask too much from a qualitative . . . urine. You can be as precise as you want in terms of quantity of morphine in a . . . urine sample. It is not going to tell you a . . . thing about how much drug was taken, when and how many times. You cannot do it. . . . There are some theoretical models you can build, if you knew what time, when the time of drug administration was, if you knew when patients urinated, what the time was since last urination, what the volume was in this timed collection. Then maybe, if you had good enough data on which to base it, you could make some inferences. . . . Right now, it cannot be done.” Dr. Jasinski expressed similar sentiments.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

A Bayesian Nonparametric Approach to Analysis of Treatment for Drug-Dependence Data

Ram C. Tiwari

INTRODUCTION

In the National Institute on Drug Abuse ARC 090 trial to evaluate the efficacy of buprenorphine as compared with methadone 20 mg and methadone 60 mg for treatment of opiate dependence, urine samples were obtained from patients three times (Monday, Wednesday, and Friday) every week for a period of 25 weeks. During the first 17 weeks of the study, the patients were maintained on the treatment drug; during the rest of the study, they were detoxified from the treatment drug. The urine samples were assayed for the presence of opiates. This chapter analyzes the data set collected from the first 17 weeks. As each dose of a treatment drug provides one observation, a maximum of 51 urine samples were obtained from each patient.
The data also contain some missing observations due to no-shows during the course of study or due to withdrawals from the study. The accommodation of missing observations is an important issue and so is the use of information from the withdrawals from the study. To accommodate some missing observations, we have reduced 51-dimensional data to 17-dimensional data by developing a weekly index of urine samples being positive or negative (see also Jain, this volume). A week is considered to be negative for opiates if at least two observations in this week are negative. Otherwise, the week is considered to be positive for opiates. Thus, the weeks with censored observations and two or more missing observations are automatically considered to be positive. This assumption does result in some loss of information, e.g., one who has three negative urines during a week is treated the same way as one who has only two negative urines during the week.

The next section presents a Bayesian approach to analysis of the binary response data. The analysis of ARC 090 trial data is presented in the last section.

BAYES ESTIMATION

Here, we consider a Bayesian nonparametric approach to the estimation of the conditional probabilities of the binary responses. To simplify the notation, denote a typical point $(t_1, \ldots, t_r)$ of the product space $\mathcal{T}_r = \{0,1\}^r$ by $t_r$ $(r = 1, 2, \ldots)$. By $t_r0$ denote the point in $\mathcal{T}_{r+1}$ that is obtained by augmenting $t_r$ by 0, that is, $t_r0 = (t_1, \ldots, t_r, 0)$, and similarly for $t_r1$. Finally, denote by $[t_r]$ the cylinder set of all points in $\mathcal{T} = \{0,1\}^\infty$ whose first $r$ coordinates form the vector $t_r$, that is, $[t_r] = \{s = (s_1, s_2, \ldots) \in \mathcal{T} : (s_1, \ldots, s_r) = t_r\}$. Let $\mathcal{F}_r$ be the collection of the empty set and the finite disjoint unions of cylinder sets $[t_r]$, $t_r \in \mathcal{T}_r$, $r = 1, 2, \ldots$. The $\mathcal{F}_r$'s form an increasing sequence of $\sigma$-fields, and $\mathcal{F}^* = \bigcup_{r=1}^{\infty} \mathcal{F}_r$ is a field. The $\sigma$-field $\mathcal{F}$ in $\mathcal{T}$ is the smallest $\sigma$-field containing $\mathcal{F}^*$.
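Consider a sequence of blocks $\omega$ of numbers in the closed unit interval $[0, 1]$:

$$\omega = \{\pi, (\pi_1, \pi_0), (\pi_{11}, \pi_{10}, \pi_{01}, \pi_{00}), \ldots\}. \tag{1}$$

For $t \in \{0, 1\}$, let $\pi^t = \pi$ if $t = 1$, and $\pi^t = 1 - \pi$ if $t = 0$, and similarly for the subscripted coordinates of $\omega$. Define a probability measure $P_r$ on $(\mathcal{T}, \mathcal{F}_r)$ such that

$$P_r([t_r]) = \prod_{j=1:\, t_j=1}^{r} \pi_{t_1 \ldots t_{j-1}} \prod_{j=1:\, t_j=0}^{r} \left(1 - \pi_{t_1 \ldots t_{j-1}}\right) = \pi^{t_1}\, \pi_{t_1}^{t_2} \cdots \pi_{t_{r-1}}^{t_r}, \qquad t_r \in \mathcal{T}_r,\ r = 1, 2, \ldots \tag{2}$$

(Here and throughout $t_0$ is interpreted as the empty sequence, $\emptyset$, and $\pi_\emptyset$ is interpreted as $\pi$.) Then, it can be checked that

$$P_{r+1}([t_r1]) + P_{r+1}([t_r0]) = P_{r+1}([t_r]) = P_r([t_r]), \qquad t_r \in \mathcal{T}_r,\ r = 1, 2, \ldots$$

Thus, the restriction of $P_{r+1}$ to $\mathcal{F}_r$ is $P_r$, and $P_r$ uniquely extends to a probability measure $P_\omega$ on $(\mathcal{T}, \mathcal{F})$.

Let $\Omega = \{\omega\}$ be the space of all blocks $\omega$ with its coordinates lying in $[0, 1]$. If $\Omega$ is equipped with the product $\sigma$-field, $\sigma(\Omega)$, then the map $\omega \to P_\omega$ defines a transition function from $(\Omega, \sigma(\Omega))$ into $(\mathcal{T}, \mathcal{F})$. Consider a probability measure $Q$ on $(\Omega, \sigma(\Omega))$ such that, under $Q$, the coordinates of $\omega$ are mutually independent with

$$P_\omega([1]) = \pi \sim \mathrm{Beta}(\alpha([1]), \alpha([0])), \quad \text{and} \quad P_\omega([t_r1] \mid [t_r]) = \pi_{t_r} \sim \mathrm{Beta}(\alpha([t_r1]), \alpha([t_r0])), \qquad t_r \in \mathcal{T}_r,\ r = 1, 2, \ldots, \tag{3}$$

where $\mathrm{Beta}(a, b)$ denotes a Beta distribution with parameters $a$ and $b$; then, under $Q$, the random probability measure $P_\omega$ is said to have a Polya tree process (Ferguson 1974). The posterior distribution of a Polya tree process, given an observation, is also a Polya tree process and is obtained by updating one of the Beta distributions at each level of the tree.

If in (3) $\alpha$ is a finite measure on $(\mathcal{T}, \mathcal{F})$, then, under $Q$, the joint distribution of $\{P_\omega([t_r]);\ t_r \in \mathcal{T}_r\}$ is a Dirichlet distribution with parameter $\{\alpha([t_r]);\ t_r \in \mathcal{T}_r\}$, $r = 1, 2, \ldots$ (see Basu and Tiwari 1982). Furthermore, if $\{A_1, A_2, \ldots, A_k\} \subset \mathcal{F}^*$ is a partition of $\mathcal{T}$, then $(P_\omega(A_1), P_\omega(A_2), \ldots, P_\omega(A_k))$ have a Dirichlet distribution with parameters $(\alpha(A_1), \alpha(A_2), \ldots, \alpha(A_k))$, since $A_j \in \mathcal{F}^*$ $(j = 1, 2, \ldots, k)$ implies that $A_j \in \mathcal{F}_r$ $(j = 1, \ldots, k)$, for some $r$ $(r = 1, \ldots, n)$. The following result is useful.

Theorem 1. (Blackwell 1973).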
The random probability measure $P_\omega$ is a Dirichlet process on $(\mathcal{T}, \mathcal{F})$ with parameter $\alpha$.

To prove theorem 1 it suffices to show that, for an arbitrary measurable partition $\{A_1, A_2, \ldots, A_k\} \subset \mathcal{F}$ of $\mathcal{T}$, the distribution of $(P_\omega(A_1), P_\omega(A_2), \ldots, P_\omega(A_k))$ is a Dirichlet distribution with parameters $(\alpha(A_1), \alpha(A_2), \ldots, \alpha(A_k))$ (cf. Ferguson 1973). This follows from the following lemmas.

Lemma 1. Let $A$ be an arbitrary set in $\mathcal{F}$. Then, under the assumption (3), the random variable $P_\omega(A)$ has a $\mathrm{Beta}(\alpha(A), \alpha(A^c))$ distribution.

Proof. Let $\mathcal{C} = \{A \in \mathcal{F} : P_\omega(A) \text{ has a } \mathrm{Beta}(\alpha(A), \alpha(A^c)) \text{ distribution}\}$. Then, by definition, $\mathcal{C} \subset \mathcal{F}$. It is easy to see that $\mathcal{C}$ contains $\mathcal{F}^*$. Also, $\mathcal{C}$ is a monotone class. To see this, let $\{A_n\}_{n=1}^{\infty}$ be an increasing sequence of sets in $\mathcal{C}$, and let $A = \bigcup_{n=1}^{\infty} A_n$. Then, clearly $A \in \mathcal{F}$, and by continuity from below $P_\omega(A_n) \uparrow P_\omega(A)$, for each $\omega \in \Omega$, as $n \to \infty$, which is a stronger result than convergence in distribution. Moreover, all finite moments of the random variable $P_\omega(A_n)$ converge to the corresponding moments of a random variable $X$, say, having a $\mathrm{Beta}(\alpha(A), \alpha(A^c))$ distribution. Hence, $P_\omega(A)$ has this distribution and $A \in \mathcal{C}$. For a decreasing sequence of sets in $\mathcal{C}$, we argue in a similar way. Thus, $\mathcal{C}$ is a monotone class containing the field $\mathcal{F}^*$, and hence $\mathcal{C}$ contains $\mathcal{F}$, the smallest $\sigma$-field containing $\mathcal{F}^*$.

Lemma 2. For any arbitrary finite partition $(A_1, A_2, \ldots, A_k)$ of $\mathcal{T}$ into $\mathcal{F}$-measurable sets, the random variables $(P_\omega(A_1), P_\omega(A_2), \ldots, P_\omega(A_k))$ have the Dirichlet distribution with parameters $(\alpha(A_1), \alpha(A_2), \ldots, \alpha(A_k))$.

Proof. By an approximation theorem (see Billingsley 1979, Theorem 11.4, p. 140), there exists a sequence $\{(A_{1n}, A_{2n}, \ldots, A_{kn})\}_{n=1}^{\infty}$ of partitions of $\mathcal{T}$ into $\mathcal{F}^*$-measurable sets such that $\alpha(A_j \Delta A_{jn}) \to 0$ as $n \to \infty$, $j = 1, 2, \ldots, k$. Also, from Lemma 1, the random variable $P_\omega(A_j \Delta A_{jn})$ has a $\mathrm{Beta}(\alpha(A_j \Delta A_{jn}), \alpha(A_j^c \Delta A_{jn}))$ distribution, $j = 1, 2, \ldots, k$. Therefore, for $j = 1, 2, \ldots, k$ we have

$$|P_\omega(A_j) - P_\omega(A_{jn})| \le P_\omega(A_j \Delta A_{jn}) \to 0 \quad \text{w.p. 1 } [Q] \text{ as } n \to \infty,$$

since $E_Q P_\omega(A_j \Delta A_{jn}) = \alpha(A_j \Delta A_{jn}) / \alpha(\mathcal{T}) \to 0$ as $n \to \infty$.
Thus, the random variables $(P_\omega(A_{1n}), P_\omega(A_{2n}), \ldots, P_\omega(A_{kn}))$ converge in probability and hence in distribution to the random variables $(P_\omega(A_1), P_\omega(A_2), \ldots, P_\omega(A_k))$. Since, for each $n$, the random variables $(P_\omega(A_{1n}), \ldots, P_\omega(A_{kn}))$ have a Dirichlet distribution with parameters $(\alpha(A_{1n}), \alpha(A_{2n}), \ldots, \alpha(A_{kn}))$, and all finite moments of $(P_\omega(A_{1n}), P_\omega(A_{2n}), \ldots, P_\omega(A_{kn}))$ converge to the corresponding moments of the random variables $(X_1, X_2, \ldots, X_k)$, say, having a Dirichlet distribution with parameters $(\alpha(A_1), \alpha(A_2), \ldots, \alpha(A_k))$, it follows that $(P_\omega(A_1), P_\omega(A_2), \ldots, P_\omega(A_k))$ have a Dirichlet distribution with parameters $(\alpha(A_1), \alpha(A_2), \ldots, \alpha(A_k))$.

For more on the Dirichlet process, see Ferguson (1973, 1974) or a recent survey article by Ferguson and colleagues (1992).

The map $t = (t_1, t_2, \ldots) \to \sum_{r=1}^{\infty} t_r / 2^r$ from $\mathcal{T}$ into $[0, 1]$ induces a random probability measure on $[0, 1]$. If $P_\omega$ is a Polya tree process on $\mathcal{T}$, then the induced random probability measure on $[0, 1]$ is also a Polya tree process. Furthermore, if

$$\sum_{r=1}^{\infty} \sum_{t_r \in \mathcal{T}_r} \mathrm{Var}(\pi_{t_r}) < \infty, \tag{4}$$

then the induced random distribution function on $[0, 1]$ is absolutely continuous w.p. 1 $[Q]$ (cf. Kraft 1964 and Métivier 1971). If $P_\omega$ is a Dirichlet process on $(\mathcal{T}, \mathcal{F})$ with parameter $\alpha$, then (4) simplifies to

$$\sum_{r=1}^{\infty} \sum_{t_r \in \mathcal{T}_r} \frac{\alpha([t_r1])\, \alpha([t_r0])}{\alpha([t_r])^2 \left(\alpha([t_r]) + 1\right)} < \infty. \tag{5}$$

Suppose there are $m$ patients involved in the study on a treatment. Corresponding to the $i$th patient, let $y^i = (y^i_1, y^i_2, \ldots, y^i_n)$ denote the vector of observations on the response variable $y$ taking on only two values: 1 = presence, and 0 = absence of opiate. Thus, $y^i \in \mathcal{T}_n$. Let [illegible]. Given $P_\omega$, let $t^1, t^2, \ldots, t^m$ be i.i.d. observations from $P_\omega$. Then, the likelihood function of $\omega$ is

$$L(\omega \mid t^1 = y^1, \ldots, t^m = y^m) = \prod_{i=1}^{m} P_\omega([y^i]) = \prod_{r=0}^{n-1} \prod_{t_r \in \mathcal{T}_r} \pi_{t_r}^{m([t_r1])} \left(1 - \pi_{t_r}\right)^{m([t_r0])}, \tag{6}$$

where $m([t_r]) = \sum_{i=1}^{m} \delta_{y^i}([t_r])$, and $m([t_r0]) = m([t_r]) - m([t_r1])$, $t_r \in \mathcal{T}_r$, and $\delta_a$ is the degenerate measure at $a$.
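Clearly, under $Q$,

$$P_\omega([t_r]) \sim \mathrm{Beta}\left(\alpha([t_r]),\ \alpha(\mathcal{T}) - \alpha([t_r])\right), \qquad t_r \in \mathcal{T}_r,\ r = 1, 2, \ldots$$

Furthermore, from (3) and (6) it can be easily checked that the coordinates of $\omega$ are mutually independent a posteriori, and

$$\pi \mid t^1 = y^1, \ldots, t^m = y^m \sim \mathrm{Beta}\left(\alpha([1]) + m([1]),\ \alpha([0]) + m([0])\right)$$

and

$$\pi_{t_r} \mid t^1 = y^1, \ldots, t^m = y^m \sim \mathrm{Beta}\left(\alpha([t_r1]) + m([t_r1]),\ \alpha([t_r0]) + m([t_r0])\right), \qquad t_r \in \mathcal{T}_r,\ r = 1, 2, \ldots \tag{7}$$

From (2), (3), and (7), it follows that if $P_\omega$ is a Polya tree process, then $P_\omega$, given the data, is again a Polya tree process. In particular, if $P_\omega$ is a Dirichlet process with parameter $\alpha$, then $P_\omega$, given the data, is also a Dirichlet process on $(\mathcal{T}, \mathcal{F})$ with updated parameter $\alpha(\cdot) + m(\cdot)$. Also, from (7), under squared error loss, the Bayes estimators of the conditional probabilities are given by

$$\hat{\pi} = \frac{\alpha([1]) + m([1])}{\alpha(\mathcal{T}) + m}$$

and

$$\hat{\pi}_{t_r} = \frac{\alpha([t_r1]) + m([t_r1])}{\alpha([t_r]) + m([t_r])} = \frac{\alpha([t_r])}{\alpha([t_r]) + m([t_r])} \left(\frac{\alpha([t_r1])}{\alpha([t_r])}\right) + \frac{m([t_r])}{\alpha([t_r]) + m([t_r])} \left(\frac{m([t_r1])}{m([t_r])}\right), \tag{8}$$

where $t_r \in \mathcal{T}_r$. The posterior variances of the conditional probabilities are given by

$$\mathrm{Var}_Q(\pi_{t_r} \mid t^1 = y^1, \ldots, t^m = y^m) = \frac{\left(\alpha([t_r1]) + m([t_r1])\right)\left(\alpha([t_r0]) + m([t_r0])\right)}{\left(\alpha([t_r]) + m([t_r])\right)^2 \left(\alpha([t_r]) + m([t_r]) + 1\right)}.$$

Also, the Bayes estimators of the unconditional probabilities $P_\omega([t_r])$, $t_r \in \mathcal{T}_r$, and their posterior variances are given by

$$\hat{P}([t_r]) = E_Q\left(P_\omega([t_r]) \mid t^1 = y^1, \ldots, t^m = y^m\right) = \frac{\alpha([t_r]) + m([t_r])}{\alpha(\mathcal{T}) + m} = \frac{\alpha(\mathcal{T})}{\alpha(\mathcal{T}) + m} \left(\frac{\alpha([t_r])}{\alpha(\mathcal{T})}\right) + \frac{m}{\alpha(\mathcal{T}) + m} \left(\frac{m([t_r])}{m}\right) \tag{9}$$

and

$$\mathrm{Var}_Q\left(P_\omega([t_r]) \mid t^1 = y^1, \ldots, t^m = y^m\right) = \frac{\left(\alpha([t_r]) + m([t_r])\right)\left(\alpha(\mathcal{T}) - \alpha([t_r]) + m - m([t_r])\right)}{\left(\alpha(\mathcal{T}) + m\right)^2 \left(\alpha(\mathcal{T}) + m + 1\right)},$$

respectively.

ANALYSIS OF ARC 090 TRIAL DATA

As mentioned earlier, we have reduced 51-dimensional data to 17-dimensional data by developing a weekly index of urine samples being positive or negative. A week is considered to be negative for opiates if at least two observations in this week are negative. Otherwise, the week is considered to be positive for opiates. Clearly, this approach takes into account the censored observations. We have denoted the positive weeks by 1's and the negative weeks by 0's. For simplicity, we assume that the parameter $\alpha$ of the Dirichlet process is given by $\alpha([t_r]) = 1/2^r$, $t_r \in \mathcal{T}_r$. This corresponds to the Lebesgue measure on $[0, 1]$.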
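The weekly index and the Bayes estimates in (8) and (9) under the prior $\alpha([t_r]) = 1/2^r$ can be illustrated with a short Python sketch. This is not the author's program. The function names, the encoding of weekly histories as strings of "0" and "1", and the toy histories at the bottom are all assumptions made here for illustration, and no trial data are reproduced.

```python
from collections import Counter


def weekly_index(samples):
    """Collapse one week's urine results (1 = positive, 0 = negative,
    None = missing) to the weekly indicator: 0 (negative week) if at
    least two samples are negative, otherwise 1 (positive week)."""
    negatives = sum(1 for s in samples if s == 0)
    return 0 if negatives >= 2 else 1


def prefix_counts(histories):
    """m([t_r]): number of patients whose weekly history begins with t_r.
    The empty prefix "" counts all patients, i.e. m."""
    counts = Counter()
    for h in histories:
        for r in range(len(h) + 1):
            counts[h[:r]] += 1
    return counts


def alpha(prefix):
    """Prior parameter alpha([t_r]) = 2**(-r); alpha of the whole space is 1."""
    return 2.0 ** (-len(prefix))


def unconditional(prefix, counts):
    """Bayes estimate of P([t_r]) as in (9)."""
    return (alpha(prefix) + counts[prefix]) / (alpha("") + counts[""])


def conditional(prefix, counts):
    """Bayes estimate of P([t_r 1] | [t_r]) as in (8)."""
    return ((alpha(prefix + "1") + counts[prefix + "1"])
            / (alpha(prefix) + counts[prefix]))


if __name__ == "__main__":
    # Hypothetical two-week histories for four patients (not trial data).
    histories = ["11", "11", "10", "00"]
    counts = prefix_counts(histories)
    print(unconditional("1", counts), conditional("1", counts))
```

With no data, `counts` is empty and the estimates reduce to the prior guesses $1/2^r$ and $1/2$. As the number of patients grows, the estimates move toward the empirical proportions, which is the weighted-average form shown in (8) and (9).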
Thus, when there are no sample data, the prior guess of the unconditional probability $P([t_r])$ is $1/2^r$ and that of the conditional probability $\pi_{t_r} = P([t_r1] \mid [t_r])$ is $1/2$. The corresponding Bayes estimates $\hat{P}([t_r])$ and $\hat{\pi}_{t_r}$ for some selected sequences for the three treatments buprenorphine, methadone 20 mg, and methadone 60 mg are given in columns 2 and 3 of tables 1, 2, and 3. For example, if $r = 5$ and $[t_r] = [11111]$, then from table 1 we observe that $\hat{P}([t_r]) = 0.37094907407$ and $\hat{\pi}_{t_r} = \hat{P}([111111] \mid [11111]) = 0.99921996880$. Graphs of unconditional probabilities for some sequences (up to length five) for the three treatments are given in figures 1, 2, and 3.

TABLE 1. Conditional and unconditional probabilities of some selected sequences for buprenorphine treatment

t_r                P([t_r])            P([t_r 1] | [t_r])
1                  7.1296296296E-01    7.0779220779E-01
11                 5.0462962963E-01    8.8532110092E-01
10                 2.0833333333E-01    2.7777777778E-01
00                 2.6388888889E-01    1.4912280702E-01
111                4.4675925926E-01    8.3160621762E-01
100                1.5046296296E-01    3.7692307692E-01
1111               3.7152777778E-01    9.9844236760E-01
1110               7.5231481481E-02    9.9230769231E-01
1001               5.6712962963E-02    6.6326530612E-01
11111              3.7094907407E-01    9.9921996880E-01
11101              7.4652777778E-02    7.4806201550E-01
10011              3.7615740741E-02    9.9230769231E-01
00000              2.0428240741E-01    2.7337110482E-01
111111             3.7065972222E-01    9.4964871194E-01
111011             5.5844907407E-02    6.6580310881E-01
100000             7.4363425926E-02    5.0000000000E-01
100111             3.7326388889E-02    9.9612403101E-01
1111111            3.5199652778E-01    9.9979449240E-01
1110111            3.7181712963E-02    9.9805447471E-01
1000001            3.7181712963E-02    9.9805447471E-01
1001111            3.7181712963E-02    9.9805447471E-01
11111111           3.5192418981E-01    9.4727646454E-01
11101111           3.7109375000E-02    9.9902534113E-01
10011111           3.7109375000E-02    9.9902534113E-01
111111111          3.3336950231E-01    9.4439622437E-01
111011111          3.7073206018E-02    9.9951219512E-01
100111111          3.7073206018E-02    9.9951219512E-01
1111111111         3.1483289931E-01    9.4115112873E-01
1110111111         3.7055121528E-02    9.9975597853E-01
1001111111         3.7055121528E-02    9.9975597853E-01
11111111111        2.9630533854E-01    9.3748664897E-01
11101111111        3.7046079282E-02    9.9987795948E-01
10011111111        3.7046079282E-02    9.9987795948E-01
111111111111       2.7778229890E-01    9.9999186211E-01
111011111111       3.7041558160E-02    9.9993897229E-01
100111111111       3.7041558160E-02    9.9993897229E-01
1111111111111      2.7778003834E-01    9.9999593102E-01
1110111111111      3.7039297598E-02    9.9996948428E-01
1001111111111      3.7039297598E-02    9.9996948428E-01
11111111111111     2.7777890806E-01    9.9999796550E-01
11101111111111     3.7038167318E-02    9.9998474168E-01
10011111111111     3.7038167318E-02    9.9998474168E-01
111111111111111    2.7777834292E-01    9.9999898275E-01
111011111111111    3.7037602177E-02    9.9999237072E-01
100111111111111    3.7037602177E-02    9.9999237072E-01

The tables and figures show that for all the three treatments the probabilities for consecutive positive weeks of a fixed length are larger than the probabilities of any other sequences of the same length. The probabilities of consecutive positive weeks for buprenorphine and methadone 60 mg are smaller than the corresponding probabilities for methadone 20 mg.

TABLE 2.
Conditional and unconditional probabilities of some selected sequences for methadone 20-mg treatment

t_r                P([t_r])            P([t_r,1] | [t_r])
1                  7.9464285714E-01    8.3707865168E-01
00                 1.8303571429E-01    4.0243902439E-01
11                 6.6517857143E-01    9.6979865772E-01
10                 1.2946428571E-01    5.6896551724E-01
001                7.3660714286E-02    7.4242424242E-01
111                6.4508928571E-01    8.8754325259E-01
100                5.5803571429E-02    6.6000000000E-01
0011               5.4687500000E-02    6.6326530612E-01
1111               5.7254464286E-01    9.6783625731E-01
1110               7.2544642857E-02    7.4615384615E-01
1001               3.6830357143E-02    9.8484848485E-01
00111              3.6272321429E-02    9.9230769231E-01
[illegible]        7.1986607143E-02    5.0000000000E-01
11111              5.5412946428E-01    9.9949647533E-01
11101              5.4129464286E-02    9.9484536082E-01
10100              5.4129464286E-02    9.9484536082E-01
10011              3.6272321429E-02    9.9230769231E-01
001111             3.5993303571E-02    9.9612403101E-01
000001             3.5993303571E-02    9.9612403101E-01
111111             5.5385044643E-01    9.9974811083E-01
111011             5.3850446429E-02    9.9740932642E-01
100111             3.5993303571E-02    9.9612403101E-01
0011111            3.5853794643E-02    9.9805447471E-01
1111111            5.5371093750E-01    9.9987402368E-01
1110111            5.3710937500E-02    6.6623376623E-01
00111111           3.5784040179E-02    9.9902534113E-01
11111111           5.5364118303E-01    9.9993700390E-01
11101111           3.5784040179E-02    9.9902534113E-01
001111111          3.5749162946E-02    9.9951219512E-01
111111111          5.5360630580E-01    9.9996849997E-01
111011111          3.5749162946E-02    9.9951219512E-01
0011111111         3.5731724330E-02    9.9975597853E-01
1111111111         5.5358886719E-01    9.6772720113E-01
1110111111         3.5731724330E-02    9.9975597853E-01
00111111111        3.5723005022E-02    9.9987795948E-01
11111111111        5.3572300502E-01    9.6665907130E-01
11101111111        3.5723005022E-02    9.9987795948E-01
001111111111       3.5718645368E-02    9.9993897229E-01
111111111111       5.1786150251E-01    9.9999579071E-01
111011111111       3.5718645368E-02    9.9993897229E-01
0011111111111      3.5716465541E-02    9.9996948428E-01
1111111111111      5.1785932268E-01    9.9999789535E-01
1110111111111      3.5716465541E-02    9.9996948428E-01
00111111111111     3.5715375628E-02    9.9998474168E-01
11111111111111     5.1785823277E-01    9.9999894767E-01
11101111111111     3.5715375628E-02    9.9998474168E-01
001111111111111    3.5714830671E-02    9.9999237072E-01
111111111111111    5.1785768781E-01    9.9999947383E-01
111011111111111    3.5714830671E-02    9.9999237072E-01

ACKNOWLEDGMENTS

Dr. Ram Jain of the Division of Medications Development, National Institute on Drug Abuse, made helpful comments. Also, Stavros Tourkodimitris, a graduate student, helped with the calculations.

TABLE 3. Conditional and unconditional probabilities of some selected sequences for methadone 60-mg treatment

t_r                P([t_r])            P([t_r,1] | [t_r])
0                  2.2727272727E-01    5.0000000000E-01
1                  7.7272727273E-01    7.1176470588E-01
01                 1.1363636364E-01    8.2000000000E-01
10                 2.2272727273E-01    4.1836734694E-01
11                 5.5000000000E-01    8.3057851240E-01
011                9.3181818182E-02    9.8780487805E-01
101                9.3181818182E-02    7.9268292683E-01
100                1.2954545455E-01    4.2982456140E-01
111                4.5681818182E-01    7.9850746269E-01
110                9.3181818182E-02    4.0243902439E-01
0111               9.2045454545E-02    7.9629629630E-01
1011               7.3863636364E-02    7.4615384615E-01
1001               5.5681818182E-02    6.6326530612E-01
1111               3.6477272727E-01    9.4859813084E-01
1110               9.2045454545E-02    4.0123456790E-01
1101               3.7500000000E-02    9.8484848485E-01
01111              7.3295454545E-02    9.9612403101E-01
10111              5.5113636364E-02    9.9484536082E-01
10011              3.6931818182E-02    9.9230769231E-01
11111              3.4602272727E-01    9.9917898194E-01
11100              5.5113636364E-02    6.6494845361E-01
11101              3.6931818182E-02    9.9230769231E-01
11011              3.6931818182E-02    9.9230769231E-01
011111             7.3011363636E-02    9.9805447471E-01
101111             5.4829545454E-02    9.9740932642E-01
100111             3.6647727273E-02    9.9612403101E-01
111111             3.4573863636E-01    9.4700082169E-01
0111111            7.2869318182E-02    9.9902534113E-01
1011111            5.4687500000E-02    9.9870129870E-01
1001111            3.6505681818E-02    9.9805447471E-01
1111111            3.2741477273E-01    9.4425162690E-01
01111111           7.2798295454E-02    7.4975609756E-01
10111111           5.4616477273E-02    9.9934980494E-01
10011111           3.6434659091E-02    9.9902534113E-01
11111111           3.0916193182E-01    9.9988513669E-01
011111111          5.4580965909E-02    9.9967469095E-01
101111111          5.4580965909E-02    9.9967469095E-01
100111111          3.6399147727E-02    9.9951219512E-01
111111111          3.0912642045E-01    9.9994256174E-01
0111111111         5.4563210227E-02    9.9983729255E-01
1011111111         5.4563210227E-02    9.9983729255E-01
1001111111         3.6381392045E-02    9.9975597853E-01
1111111111         3.0910866477E-01    9.9997127922E-01
01111111111        5.4554332386E-02    9.9991863303E-01
10111111111        5.4554332386E-02    9.9991863303E-01
10011111111        3.6372514204E-02    9.9987795948E-01
11111111111        3.0909978693E-01    9.9998563920E-01
011111111111       5.4549893466E-02    9.9995931321E-01
101111111111       5.4549893466E-02    9.9995931321E-01
100111111111       3.6368075284E-02    9.9993897229E-01

FIGURE 1. Unconditional probabilities for buprenorphine treatment (1 = positive week, 0 = negative week)

FIGURE 2. Unconditional probabilities for methadone 20-mg treatment (1 = positive week, 0 = negative week)

FIGURE 3. Unconditional probabilities for methadone 60-mg treatment (1 = positive week, 0 = negative week)

REFERENCES

Basu, D., and Tiwari, R.C. A note on the Dirichlet process. In: Kallianpur, G.; Krishnaiah, P.R.; and Ghosh, J.K., eds. Statistics and Probability: Essays in Honor of C.R. Rao. New York: North-Holland, 1982. pp. 89-103.

Billingsley, P. Probability and Measure. New York: Wiley, 1979.

Blackwell, D. Discreteness of Ferguson selections. Ann Stat 1:356-358, 1973.

Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann Stat 1:209-230, 1973.

Ferguson, T.S. Prior distributions on spaces of probability measures. Ann Stat 2:615-629, 1974.

Ferguson, T.S.; Phadia, E.G.; and Tiwari, R.C. Bayesian nonparametric inference. In: Ghosh, M., and Pathak, P.K., eds. Essays in Honor of D. Basu. Vol. 17, IMS Lecture Notes - Monograph Series. Hayward, CA: Institute of Mathematical Statistics, 1992.

Kraft, C.H. A class of distribution function processes which have derivatives. J Appl Probability 1:385-388, 1964.

Métivier, M.
Sur la construction de mesures aléatoires presque sûrement absolument continues par rapport à une mesure donnée. Z Wahrscheinlichkeitstheorie verw. Geb. 20:332-344, 1971.

AUTHOR

Ram C. Tiwari, Ph.D.
Associate Professor
Department of Mathematics
University of North Carolina at Charlotte
Charlotte, NC 28223

Three Estimators of the Probability of Opiate Use From Incomplete Data

Alan J. Gross

INTRODUCTION

Drug testing in biologic fluids, especially urine, has become the usual method by which addicts in a treatment program are evaluated to determine whether they are adhering to the treatment regime in which they have been placed. Unfortunately, issues such as sensitivity and specificity of the various tests have caused difficulties in the past. The Council on Science Affairs (1987) has dealt with these important issues. Besides these issues of sensitivity and specificity, there are concerns about whether a random time or a fixed time sampling scheme should be used when collecting urine specimens from subjects who are involved in a clinical trial that is designed to test the safety and efficacy of a new pharmacotherapy for treatment of opiate dependence.

It is the purpose of this chapter to consider three estimators of the probability that an addict tests positive for a particular opiate and to compare these estimates for the ARC 090 data, which were generated by means of fixed time sampling. The properties of these estimators will also be investigated, and some preliminary results will be given. Although other important issues exist in this area of research, such as random time sampling schemes, they are not specifically addressed in this chapter.

ESTIMATORS OF THE PROBABILITY OF OPIATE USE

In an effort to estimate the probability of opiate use by an addict during the time period in which he or she is enrolled in a clinical trial designed to reduce drug dependence, the following assumptions and definitions are required.
Assume that an individual within a clinical trial is scheduled to present $m$ times for testing of the presence of the opiate for which he or she is being treated. On the $i$th visit, the random variables $U_i$ and $A_i$ are defined as

$$U_i = \begin{cases} 1 & \text{if the individual tests positive for the opiate,} \\ 0 & \text{if the individual tests negative} \end{cases}$$

and

$$A_i = \begin{cases} 1 & \text{if the individual appears for the test and is still in the trial,} \\ 0 & \text{if the individual does not appear,} \end{cases} \qquad i = 1, \ldots, m.$$

It is noted that $m$ can and does change from subject to subject, and in the data set that is considered when treatment groups are compared, once a subject has been censored from the trial, that subject never returns for any future testing.

It is further assumed that

(a) $\{U_i\}$ are Bernoulli random variables such that $\mathrm{pr}\{U_i = 1\} = p$, $0 < p < 1$, and $\mathrm{corr}(U_i, U_j) = \rho_{ij}$, $0 < \rho_{ij} < 1$, where in the first case

(i) $\rho_{ij} = \rho^{|i-j|}$, (1)

in the second case

(ii) $\rho_{ij} = \rho$, $i \ne j$, (2)

and in the third case, the correlation structure is

(iii) $\rho_{ij} = \rho$ if $j = i + 1$, and $\rho_{ij} = 0$ if $j > i + 1$, $i = 1, \ldots, m - 1$; (3)

and

(b) $\{A_i\}$ are iid Bernoulli random variables such that $\mathrm{pr}\{A_i = 1\} = \pi$, $0 < \pi < 1$, $i = 1, \ldots, m$.

The three correlation structures considered here deal then, respectively, with the following three scenarios:

1. There is correlation between all pairs of visits for a given individual. In this case, the correlation between successive visits is assumed to be greater (in absolute value) than the correlation between visits that are more distant. The structure is the same as in the simple autoregressive model.

2. The correlation in this case is assumed to stay constant between all pairs of visits, successive as well as more distant, within an individual. This assumption may tend to be somewhat conservative.

3. The correlation between successive visits within an individual is assumed to be present and constant within individuals. It is also assumed constant from individual to individual. However, more distant visits are assumed to be uncorrelated.
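The three correlation structures can be written out as small matrices to make the differences concrete. The sketch below is illustrative only; the function names and the structure labels ("autoregressive", "constant", "adjacent") are hypothetical conveniences, not notation from the chapter.

```python
def pair_correlation(i, j, rho, structure):
    """Correlation between visits i and j under one of the three structures."""
    lag = abs(i - j)
    if lag == 0:
        return 1.0
    if structure == "autoregressive":   # case (i): rho ** |i - j|
        return rho ** lag
    if structure == "constant":         # case (ii): rho for every pair of distinct visits
        return rho
    if structure == "adjacent":         # case (iii): rho for neighbours, 0 otherwise
        return rho if lag == 1 else 0.0
    raise ValueError(f"unknown structure: {structure}")

def correlation_matrix(m, rho, structure):
    """m x m correlation matrix of (U_1, ..., U_m)."""
    return [[pair_correlation(i, j, rho, structure) for j in range(m)]
            for i in range(m)]

for name in ("autoregressive", "constant", "adjacent"):
    print(name, correlation_matrix(4, 0.5, name)[0])
```

The first row of each matrix shows how quickly the dependence on the first visit dies out: geometrically in the first case, not at all in the second, and immediately after one visit in the third.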
The dependence or correlation structure presented in this chapter differs, to some extent, from the correlated binomial random variables that were considered in other applications. These earlier applications include correlated binomial models to predict the probability of rainfall on a given day, recognizing that the occurrence or nonoccurrence of rain on a given day depends on the occurrence or nonoccurrence on the previous day. Such models were developed by Gabriel (1959), Gabriel and Neumann (1962), and Klotz (1973). In the model considered by Klotz (1973), it can be shown that

$$\rho^{|i-j|} = \left[\frac{p^* - p}{1 - p}\right]^{|i-j|},$$

where $p^* = \mathrm{pr}\{X_i = 1 \mid X_{i-1} = 1\}$. As a second example, in ophthalmology studies a particular disease may be present in one eye, both eyes, or neither eye of a patient. Rosner (1984) considers a correlated binomial model in this situation because, clearly, absence or presence of disease in the two eyes of an individual is not independent from eye to eye. Finally, in this vein, Kupper and Haseman (1978) and Haseman and Kupper (1979) apply correlated binomial models to analyzing data within and among animal litters for which the responses are dichotomous, e.g., occurrence or nonoccurrence of a malformation.

Consider now the estimator for the probability of opiate use. Define $P$ as

$$P = \begin{cases} \sum_{i=1}^{m} U_i A_i \Big/ \sum_{i=1}^{m} A_i, & \sum_{i=1}^{m} A_i \ge 1, \\ 1, & A_1 = A_2 = \cdots = A_m = 0. \end{cases} \qquad (4)$$

This definition indicates that an individual who is never present for a test to determine his or her drug abuse status is very likely, if not certain, to be still abusing the drug. It should be noted that (1) this definition does not distinguish all sequences, for example, it does not distinguish 000111 and 111000; and (2) $P$ represents an average across visits for each individual. Although these features are somewhat limiting, the definition is an initial attempt to deal with such binomial data in a relatively simple manner.
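A minimal sketch of the estimator in (4) follows, assuming each subject's record is held as two equal-length lists of 0/1 values. The function name and this data layout are hypothetical choices made for the example.

```python
def proportion_positive(u, a):
    """Estimator P of (4): share of attended visits with a positive test.

    u[i] is 1 if the test at visit i is positive, 0 otherwise.
    a[i] is 1 if the subject appeared at visit i, 0 otherwise.
    A subject who never appears is assigned P = 1.
    """
    if len(u) != len(a):
        raise ValueError("u and a must have the same length")
    attended = sum(a)
    if attended == 0:
        return 1.0
    return sum(ui * ai for ui, ai in zip(u, a)) / attended

# The two sequences mentioned in the text give the same value of P.
always_present = [1] * 6
print(proportion_positive([0, 0, 0, 1, 1, 1], always_present))
print(proportion_positive([1, 1, 1, 0, 0, 0], always_present))

# A subject who never shows up is treated as positive.
print(proportion_positive([0, 0, 0], [0, 0, 0]))
```

Results recorded at visits the subject did not attend are ignored because each `u[i]` is multiplied by `a[i]`, which mirrors the product $U_i A_i$ in the numerator of (4).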
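The first principal goal of this chapter is to obtain $E(P)$ and $\mathrm{Var}(P)$ under the three correlation structures as indicated by the three points listed above. This constitutes the next section.

E(P) AND Var(P)

Define $V = \sum_{i=1}^{m} A_i$. Then,

$$E(P) = E(P \mid V \ge 1)\,\mathrm{pr}\{V \ge 1\} + \mathrm{pr}\{V = 0\}.$$

Now, $\mathrm{pr}\{V = 0\} = (1 - \pi)^m$. Furthermore,

$$E(P \mid V \ge 1) = E_V E\left[\frac{\sum_{i=1}^{m} U_i A_i}{\sum_{i=1}^{m} A_i} \,\Big|\, V \ge 1\right] = E_V E\left[\frac{1}{v}\sum_{j=1}^{v} U_{i_j} \,\Big|\, V \ge 1\right],$$

where $(U_{i_1}, \ldots, U_{i_v})$ is the vector of the $u$-values obtained by the subject on his or her $v$ visits, $v \ge 1$. Thus, $E(P \mid V \ge 1) = p$ and

$$E(P) = p[1 - (1 - \pi)^m] + (1 - \pi)^m = p + q(1 - \pi)^m. \qquad (5)$$

Derivation of $\mathrm{Var}(P)$ is considerably more complicated. Thus, some preliminary considerations are in order prior to obtaining $\mathrm{Var}(P)$ for the three cases that are dealt with in this chapter.

Let the random vector $U'$ be a single $m$-dimensional observation from a population whose cdf is $F_{U'}(u')$; that is, $U'$ is multivariate in nature. Suppose $E(U_i^2) < \infty$, $i = 1, \ldots, m$, let $\rho_{ij} = \mathrm{corr}(U_i, U_j)$, $i \ne j$, and suppose, in total generality, $E(U_i) = p_i$ and $\mathrm{Var}\,U_i = \sigma_i^2$, $i = 1, \ldots, m$. Assume, further, a hypergeometric sampling process such that $(U_{k_1}, \ldots, U_{k_v})$ are sampled from $(U_1, \ldots, U_m)$, $v \le m$, without replacement. Then if $S = \sum_{j=1}^{v} U_{k_j}$, we may rewrite $S$ as $S = \sum_{i=1}^{m} U_i I(U_i)$, where

$$I(U_i) = \begin{cases} 1 & \text{if } U_i \text{ is selected in the sample,} \\ 0 & \text{otherwise.} \end{cases} \qquad (6)$$

The sampling process is assumed independent of $U_1, \ldots, U_m$. That is, $E(U_i I(U_i)) = E(U_i)E(I(U_i))$. Note: $E(I(U_i)) = \mathrm{pr}\{I(U_i) = 1\} = v/m$. Thus, $E(U_i I(U_i)) = p_i v/m$. Hence, $E(S) = (v/m)\sum_{i=1}^{m} p_i$, and

$$E(S^2) = E\Big(\sum_{i=1}^{m} U_i I(U_i)\Big)^2 = \sum_{i=1}^{m} E(U_i^2)\,E(I(U_i)) + 2\sum_{i<j} E(U_i U_j)\,E(I(U_i)I(U_j)),$$

where, under sampling without replacement, $E(I(U_i)I(U_j)) = v(v-1)/[m(m-1)]$ for $i \ne j$.

In the first case, $E(U_i) = p$, $\mathrm{Var}\,U_i = pq$, and $\mathrm{corr}(U_i, U_j) = \rho^{|i-j|}$, $0 < \rho < 1$, which yields

$$\mathrm{Var}(P \mid v \ge 1) = pq\left\{\frac{2}{m(m-1)}\left(\frac{\rho}{1-\rho}\right)\left[m - \frac{1-\rho^m}{1-\rho}\right] + \left(1 - \frac{2}{m(m-1)}\left(\frac{\rho}{1-\rho}\right)\left[m - \frac{1-\rho^m}{1-\rho}\right]\right)E(V^{-1})\right\}. \qquad (12)$$

In the second case, $E(U_i) = p$, $\mathrm{Var}\,U_i = pq$, and $\mathrm{corr}(U_i, U_j) = \rho$, $i \ne j$, $0 < \rho < 1$. Thus, one can show, without difficulty,

$$\mathrm{Var}\Big(\sum_{i=1}^{m} U_i A_i \,\Big|\, v\Big) = vpq[1 + (v - 1)\rho]. \qquad (13)$$

Hence,

$$\mathrm{Var}(P \mid v \ge 1) = pq[\rho + (1 - \rho)E(V^{-1})]. \qquad (14)$$

Finally, in the third case, $E(U_i) = p$, $\mathrm{Var}(U_i) = pq$, $\mathrm{corr}(U_i, U_{i+1}) = \rho$, $0 < \rho < 1$, and $\mathrm{corr}(U_i, U_j) = 0$ for $j > i + 1$, $i = 1, \ldots, m - 2$. Here,

$$\mathrm{Var}(P \mid v \ge 1) = pq\left\{\frac{2\rho}{m} + \left(1 - \frac{2\rho}{m}\right)E(V^{-1})\right\}. \qquad (15)$$

Define, generically, $\theta = \mathrm{Var}(P \mid v \ge 1)$ to represent $\mathrm{Var}(P \mid v \ge 1)$ for all three cases of interest. It then follows that, unconditionally,

$$\mathrm{Var}(P) = [\theta + q^2(1 - \pi)^m][1 - (1 - \pi)^m]. \qquad (16)$$

Thus, to review, $E(P)$ is given by (5) and $\mathrm{Var}(P)$ is given by (16). It can be shown without much difficulty that if $\theta$ is given by (12) or (15), $\lim_{m \to \infty} \mathrm{Var}(P) = 0$, implying that $P$ is a consistent estimator of $p$ assuming $\pi < 1$. On the other hand, if $\theta$ is given by (14), then consistency does not hold. Finally, if $\theta$ is given by (12), it is noted that at $\rho = 0$,

$$\theta = pqE(V^{-1}) \qquad (17)$$

and at $\rho = 1$,

$$\theta = pq. \qquad (18)$$

If

$$f(\rho) = \left(\frac{\rho}{1-\rho}\right)\left\{m - \frac{1-\rho^m}{1-\rho}\right\},$$

it can be demonstrated with some difficulty that $f(\rho)$ is an increasing function of $\rho$, $0 < \rho < 1$, and so the bounds on $\mathrm{Var}(P)$ are

$$[pqE(V^{-1}) + q^2(1 - \pi)^m][1 - (1 - \pi)^m] \le \mathrm{Var}(P) \le [pq + q^2(1 - \pi)^m][1 - (1 - \pi)^m]. \qquad (19)$$

Consideration of $f(\rho)$ is contained in the appendix. Finally, Mendenhall and Lehman (1960) show that

$$E(V^{-1}) \approx \left(\frac{m-2}{m}\right)\big(m\pi - (1 + \pi)\big)^{-1} \qquad (20)$$

and that this approximation provides two significant figure accuracy for $m\pi > 5$.

COMPARISON OF TREATMENT GROUPS

The goal of this section is to develop a test of the hypothesis $H_0\colon p_i = p_{i'}$, where $p_i$ is the probability that an individual in the $i$th treatment group tests positive for the presence of the opiate. It is noted, in the example to be presented in the next section, that there are three treatment groups in question; therefore, multiple comparison methods are used in comparing the results among the three groups. In this section, the notation adds subscripts $\kappa j$ to represent the $j$th time the $\kappa$th individual is tested within a treatment group.

Let $P_{i\kappa}$ be the proportion of the trials in which the $\kappa$th individual tests positive on the $i$th treatment, $\kappa = 1, \ldots, n_i$. Then, it is clear from (4) that

$$P_{i\kappa} = \begin{cases} \sum_{j=1}^{m_{i\kappa}} U_{i\kappa j} A_{i\kappa j} \Big/ \sum_{j=1}^{m_{i\kappa}} A_{i\kappa j}, & V_{i\kappa} \ge 1, \\ 1, & V_{i\kappa} = 0. \end{cases} \qquad (4^*)$$

Define $\bar{P}_i = \sum_{\kappa=1}^{n_i} P_{i\kappa}/n_i$. Again, note that

$$E(P_{i\kappa}) = p_i + q_i(1 - \pi_i)^{m_{i\kappa}} \qquad (5^*)$$

since within each treatment group it is assumed $p_{i1} = \cdots = p_{in_i} = p_i$.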
Thus,

$$E(\bar{P}_i) = p_i + \frac{q_i}{n_i}\sum_{\kappa=1}^{n_i}(1 - \pi_i)^{m_{i\kappa}}. \qquad (19)$$

If the same reasoning is followed concerning $\mathrm{Var}\,P_{i\kappa}$, one finds

$$\mathrm{Var}\,P_{i\kappa} = [\theta_{i\kappa} + q_i^2(1 - \pi_i)^{m_{i\kappa}}][1 - (1 - \pi_i)^{m_{i\kappa}}] \qquad (16^*)$$

where, generically, $\theta_{i\kappa} = \mathrm{Var}(P_{i\kappa} \mid V_{i\kappa} \ge 1)$. Since $P_{i\kappa}$ and $P_{i\kappa'}$ are stochastically independent for $\kappa \ne \kappa'$, it follows that

$$\mathrm{Var}\,\bar{P}_i = \sum_{\kappa=1}^{n_i}\frac{[\theta_{i\kappa} + q_i^2(1 - \pi_i)^{m_{i\kappa}}][1 - (1 - \pi_i)^{m_{i\kappa}}]}{n_i^2}. \qquad (20)$$

It is easy to show that for all three cases, i.e., all three $\theta_{i\kappa}$, $\mathrm{Var}\,\bar{P}_i \to 0$ as $n_i \to \infty$, $i = 1, 2, 3$. Thus, $\bar{P}_i$ is a consistent estimator of $p_i + q_i a_i$, where

$$a_i = \lim_{n_i \to \infty}\frac{1}{n_i}\sum_{\kappa=1}^{n_i}(1 - \pi_i)^{m_{i\kappa}}.$$

In order to reduce the bias, let

$$P_i^R = \frac{\bar{P}_i - \hat{a}_i}{1 - \hat{a}_i}, \qquad \hat{a}_i = \frac{1}{n_i}\sum_{\kappa=1}^{n_i}(1 - \hat{\pi}_i)^{m_{i\kappa}}. \qquad (21)$$

$P_i^R$ is termed the reduced estimator, and its realizations are presented as reduced estimates in table 2, $i = 1, 2, 3$. The variance of (21) can be approximated by the delta method. Thus,

$$\mathrm{Var}\,P_i^R \approx \mathrm{Var}\,\bar{P}_i\,\{1 - \hat{a}_i\}^{-2} + \mathrm{Var}\,\hat{\pi}_i\,(1 - \bar{P}_i)^2\left\{\frac{1}{n_i}\sum_{\kappa=1}^{n_i} m_{i\kappa}(1 - \hat{\pi}_i)^{m_{i\kappa}-1}\right\}^2\{1 - \hat{a}_i\}^{-4}. \qquad (22)$$

Hence, to test for treatment difference, the following confidence intervals, all at $(1 - \alpha/6)$, can be used:

$$(P_i^R - P_j^R) \pm z_{\alpha/6}\sqrt{\mathrm{Var}\,P_i^R + \mathrm{Var}\,P_j^R}, \qquad (23)$$

$i \ne j$; $i, j = 1, 2, 3$, where $z_{\alpha/6}$ is the $\alpha/6$th percentile of the standard normal pdf. Thus, confidence intervals for $p_1 - p_2$, $p_1 - p_3$, and $p_2 - p_3$ can be constructed using (23) with overall confidence of at least $(1 - \alpha)$.

ANALYSIS OF THE EXISTING DATA AND CONCLUSIONS

The methodology that has been developed in this chapter is now applied to the double blind, three-armed controlled ARC trial. This trial was conducted to evaluate the efficacy of buprenorphine (arm 1), methadone 20 mg (arm 2), and methadone 60 mg (arm 3) in the treatment of opiate addiction. Data on only the first 17 weeks of the study were used to study how well the patients were maintained on the treatment drug. No analysis was performed on weeks 18 through 27. Table 1 shows the summary results for the three treatment groups. The correlation coefficient in each treatment group was estimated as an average (unweighted) of the serial correlations for the patients in that group.
It should be noted from table 1 that there is no statistically significant difference among the $\hat{\pi}$'s. That is, roughly 77 percent of all individuals presented urine samples in the first 17 weeks of the study. Furthermore, it can also be shown that all three of the correlation coefficients are not statistically significantly different from zero. However, it was decided to use the estimated values of $\rho$ for illustrative purposes.

TABLE 1. Estimates of $p$, $\pi$, and $\rho$

Treatment Group     p       π       ρ       n
Buprenorphine       0.483   0.773   0.087   53
Methadone 20 mg     0.687   0.767   0.013   55
Methadone 60 mg     0.564   0.786   0.133   54

TABLE 2. Estimates of the probability of opiate use

Treatment Group               p       Var_1 P       Var_2 P       Var_3 P
Buprenorphine (raw)           0.483   3.83×10⁻⁴     6.76×10⁻⁴     3.33×10⁻⁴
Methadone 20 mg (raw)         0.687   3.80×10⁻⁴     4.15×10⁻⁴     3.51×10⁻⁴
Methadone 60 mg (raw)         0.564   4.79×10⁻⁴     9.05×10⁻⁴     4.16×10⁻⁴
Buprenorphine (reduced)       0.468   4.05×10⁻⁴     7.08×10⁻⁴     3.52×10⁻⁴
Methadone 20 mg (reduced)     0.685   3.87×10⁻⁴     4.22×10⁻⁴     3.57×10⁻⁴
Methadone 60 mg (reduced)     0.558   4.90×10⁻⁴     9.25×10⁻⁴     4.26×10⁻⁴

Table 2 provides the raw and the reduced estimates of $p_i$, $i = 1, 2, 3$, the probability of detecting the opiate in each of the three treatment groups. Also, the three estimates of variance are provided for the three different patterns of correlation that are assumed: $\mathrm{Var}_1 P$ assumes visits $i$ and $j$ have correlation $\rho^{|i-j|}$; $\mathrm{Var}_2 P$ assumes all visits, adjacent as well as nonadjacent, have correlation $\rho$; and, finally, $\mathrm{Var}_3 P$ is such that adjacent visits have correlation $\rho$ and nonadjacent visits have correlation zero.

Overall, 95 percent confidence intervals for $p_k - p_l$, $k \ne l$; $k, l = 1, 2, 3$, are then easily obtained for each pair of treatment differences. The formula used is

$$(\bar{P}_k - \bar{P}_l) \pm z_{0.00833}\sqrt{\mathrm{Var}_j \bar{P}_k + \mathrm{Var}_j \bar{P}_l},$$

to ensure an overall 95 percent confidence, where $j = 1, 2, 3$ indexes the three variance estimates based on the three correlation patterns assumed. Note that $z_{0.00833} = 2.400$.
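The interval formula can be checked numerically against the raw estimates in table 2. The sketch below is an illustration rather than the author's computation: the function name is hypothetical, the variances are entered on the assumption that the table 2 entries are multiples of 10⁻⁴, and the critical value is computed from the standard normal distribution instead of being fixed at the quoted 2.400.

```python
from math import sqrt
from statistics import NormalDist

def bonferroni_interval(p_a, var_a, p_b, var_b, alpha=0.05, comparisons=3):
    """Two-sided interval for p_a - p_b with a Bonferroni-adjusted critical value.

    With alpha = 0.05 and three pairwise comparisons, the upper tail area
    used for each interval is alpha / 6 = 0.00833.
    """
    z = NormalDist().inv_cdf(1 - alpha / (2 * comparisons))
    half_width = z * sqrt(var_a + var_b)
    diff = p_a - p_b
    return diff - half_width, diff + half_width

# Raw estimates and Var_1 entries from table 2 (variances assumed to be x 10^-4).
methadone_20 = (0.687, 3.80e-4)
buprenorphine = (0.483, 3.83e-4)

lower, upper = bonferroni_interval(*methadone_20, *buprenorphine)
print(round(lower, 3), round(upper, 3))
```

Under these assumptions the interval for methadone 20 mg minus buprenorphine is close to the first entry of table 3; small discrepancies are expected because the computed critical value is slightly below 2.400 and the table inputs are rounded.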
As one would expect, the variance is largest when $\rho$ is constant across all visits, takes an intermediate value when visits $i$ and $j$ have correlation $\rho^{|i-j|}$, and is smallest when adjacent visits have correlation $\rho$ and nonadjacent visits have zero correlation.

TABLE 3. Confidence intervals of the difference of the probabilities of opiate use

Group Difference                c.i.1               c.i.2               c.i.3
Meth 20 - bup (raw)             0.137 to 0.271      0.124 to 0.284      0.141 to 0.267
Meth 20 - meth 60 (raw)         0.052 to 0.194      0.036 to 0.210      0.056 to 0.190
Meth 60 - bup (raw)             0.010 to 0.152      -0.014 to 0.176     0.015 to 0.147
Meth 20 - bup (reduced)         0.149 to 0.285      0.136 to 0.298      0.153 to 0.281
Meth 20 - meth 60 (reduced)     0.056 to 0.198      0.039 to 0.215      0.060 to 0.194
Meth 60 - bup (reduced)         0.018 to 0.162      -0.007 to 0.187     0.023 to 0.157

If one examines the confidence intervals that are generated from table 2 (see table 3), it is clear that buprenorphine is superior to methadone 20 mg, regardless of the correlation pattern assumed. Furthermore, methadone 60 mg is clearly superior to methadone 20 mg. However, it is still not clear that buprenorphine is a better treatment regime than methadone 60 mg. The analysis is nevertheless suggestive of this conclusion, since the only situation where the confidence interval contains the null value of zero is when the correlation between pairs of visits remains constant over all pairs of visits. If, however, this is the situation, then mathematically random testing vs. systematic testing does not make a great deal of difference, since fixed as well as randomly spaced times between visits will have the same correlation.

It should be noted at this point that the true correlational structure between pairs of visits does not and should not solely determine whether random or systematic testing is appropriate. If the sampling scheme is determined on the basis that an individual is still using a given substance during a given week or throughout the study period, then random sampling is likely appropriate to detect his or her use.
However, if the extent of drug abuse is of importance, such as with the ARC 090 study, then capturing all the episodes of drug abuse is important and, hence, systematic sampling is likely to be more useful since drug-seeking behavior is not random. Finally, it is noted that carryover effects are probably quite important but are not considered here.

APPENDIX

Theorem: Suppose $0 < \rho < 1$ and

$$f(\rho) = \left(\frac{\rho}{1-\rho}\right)\left\{m - \frac{1-\rho^m}{1-\rho}\right\}. \qquad (A.1)$$

Then $f(0) = 0$ and $f(1) = m(m-1)/2$, and $f(\rho)$ increases in $\rho$, $0 < \rho < 1$.

Proof: $f(0) = 0$ is trivial. To find $f(1)$, an application of L'Hôpital's rule twice is needed. Finally, to show $f(\rho)$ increases in $\rho$, it is noted that

$$\ln f(\rho) = \ln\rho - 2\ln(1-\rho) + \ln[(m-1) + \rho^m - m\rho]. \qquad (A.2)$$

Thus,

$$\frac{d\ln f(\rho)}{d\rho} = \frac{(m-1)[1 - \rho^{m+1}] - (m+1)\rho[1 - \rho^{m-1}]}{\rho(1-\rho)\{m - 1 + \rho^m - m\rho\}}.$$

The denominator is positive for $0 < \rho < 1$ since $m(1-\rho) > 1 - \rho^m$ for $\rho$ in this domain. This is easily established by induction on $m$. Finally, it is necessary to establish that

$$(m-1)[1 - \rho^{m+1}] - (m+1)\rho[1 - \rho^{m-1}] \ge 0 \qquad (A.3)$$

for $m > 1$ and 0

¥, and m, > m,, take Rank (Y,,) < Rank (Y,). If the equal sample averages are below the overall mean, take Rank (Y,,) > Rank ( Y,) Treating ties in this manner is consistent with the ranking based on the shrinkage or Empirical Bayes estimates of the 6,s. This generalizes a method for breaking ties suggested by Sahlroot and Pledger (1991). Call this rank test T,. The assumption that 6, = 6, may be untenable. For example, the chance of a positive test may increase as the study progresses and subjects lose interest. Even in this case, the subject average Y;, provides an unbiased estimate of his or her average 6, over the complete visits. Therefore, tests of the hypothesis that HE [8,,(1-D,)| D,] = E16,,(1-0,,) | B,] can in principle be made by looking at either the ranks of the Y.s or the difference of the unweighted within-group averages. One problem with the former test is that a variance estimate requires additional structure to be put on the 6s. Although the test based on ranks can be used in a straightforward manner, Hf may be of limited interest if the distribution of the D,,s differs from that of the D,,s. For example, if 6, increases with j for both groups and subjects drop out early for one group, the group with the earlier dropouts will tend to have a lower test average. The test statistic could identify the better treatment as the one with the earlier dropouts even if later 6 6s were higher for the group with earlier dropouts. Since the rank test of Hf? may result in a misleading inference, it is necessary to examine whether the missed visits and dropout times differ between the two groups. Tests of equality of the pattern of missingness between the two groups should be calculated. Rank tests, tests of means, or logrank tests could be used. Informally, these tests could be used to see how meaningful the test of Hf? is. 
For example, if a similar pattern of missingness is expected to occur over time in the treatment groups, one could use a test of the difference in proportions of missing data in the two treatment groups, which is analogous to (1) with D,, replacing ¥,, and a suitable estimates of variance, say Om» replacing or- Call this test statistic M,. A Wilcoxon test based on D, 18 also considered; call this test statistic M,. 101 More formally, a test for missingness can be combined with a test of efficacy using a multivariate test (O’Brien 1984; Pocock et al. 1987). Here the null hypothesis is H®:(E [6,).E [n,,]) = (E [6,).E [;.]) and the alternative is HY:(E [8,).E [x,,)) > (E [0,1E [x,)) or (E [6,).E [xy]) < (E [B,L.E [r),]); that is, one treatment is better than the other with respect to both the proportion of positive tests and the proportion of missing tests. As an example, consider combining two rank tests, such as 7, and M,. O’Brien (1984) proposed ranking each outcome separately, as one would to perform a Wilcoxon rank sum test on each outcome and summing the ranks over the two outcomes. He then proposed calculating a Wilcoxon rank sum test for these sums. Call the resulting statistic 0,. In the case of more than two samples, O’Brien suggested ranking each outcome over all samples, forming the rank sums for each subject, and then using one-way analysis of variance on the sums. Alternatively, a Kruskal-Wallis test could be used on the sums. More complex combinations of test statistics may be formed using the method proposed by Pocock and colleagues (1987). To combine T, with M, in an O’Brien-type statistic, the correlation p(T,, M,) between T, and M, would need to be estimated. Pocock and colleagues (1987) gave an explicit reduction of the formula for O’Brien’s generalized least-squares statistic when endpoints are equally correlated and the within-group data are iid. The within-group data here are not iid, however, since the variance of ¥,, depends on m,. 
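The two-sample rank-sum procedure attributed to O'Brien above (rank each outcome over the pooled sample, add the ranks for each subject, then apply a Wilcoxon rank sum test to the sums) can be sketched as follows. This is a simplified illustration: the function names and the representation of each subject as a pair (proportion positive, proportion missed) are hypothetical, ties receive average ranks, and the normal approximation below does not apply a tie correction to the variance.

```python
from math import sqrt

def average_ranks(values):
    """Ranks 1..n of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def obrien_rank_sum(group1, group2):
    """Rank-sum statistic for group1 and its normal-approximation z score.

    Each group is a list of (proportion_positive, proportion_missed) pairs.
    """
    pooled = group1 + group2
    n1, n2 = len(group1), len(group2)
    # Rank each outcome separately over the pooled sample.
    outcome_ranks = [average_ranks([subject[k] for subject in pooled]) for k in range(2)]
    # Sum the two ranks for each subject, then rank the sums.
    sums = [r0 + r1 for r0, r1 in zip(*outcome_ranks)]
    final = average_ranks(sums)
    w = sum(final[:n1])
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    sd = sqrt(n1 * n2 * (n + 1) / 12)  # no tie correction in this sketch
    return w, (w - mean) / sd

# Hypothetical subjects: group 1 is lower on both outcomes.
group1 = [(0.1, 0.0), (0.2, 0.1), (0.3, 0.2)]
group2 = [(0.6, 0.3), (0.7, 0.4), (0.8, 0.5)]
w, z = obrien_rank_sum(group1, group2)
print(w, round(z, 3))
```

A negative z score indicates that the first group tends to have lower combined ranks, that is, fewer positive tests and fewer missed tests taken together.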
Estimation of the correlation between these test statistics requires further work. SIMPLE IMPUTATION If the assumption that 6, = 6, is untenable, an imputation of a specific value for each missing data point may be reasonable. We will call this simple imputation. One possibility is to replace missing responses with positive responses. This is appropriate if it seems likely that subjects would have tested positive if they had 102 been tested. Another rationale for this imputation is that it defines a new endpoint: missed test or positive test. This endpoint tests a new hypothesis HOE [n,,] = E [n,,), where n,, is the probability that the ith person in the kth group is positive or missing for the jth test. One could argue that both positive tests as well as missing data suggest failure of the program. An advantage of the simple imputation approach is that the analysis then proceeds as if complete data had been obtained, and (1) or its rank analog can be used. The test statistic T, is referred to as (1) with a value 1 imputed for missing values. MODEL-BASED IMPUTATION The basic idea here is to use a model to provide an accurate test of the original hypothesis H{":E [6;,] = E [8,,] even if individuals likely to test positive tend to drop out or if [) # 0. The atithors attempt to succinctly describe 6,, for each group with a model, estimate the parameters of the model separately i in each group, and then compare the estimate of 37/37, 8;/(n,m) to the estimate of Yr 8p n,m). Although this approach is heavily model based, one can allow for quite general effects of dropping out, missed visits, and other factors as long as they are correctly incorporated into the model. To justify our procedure in a simple setting, ignore the treatment identifier k and suppose that the following model holds: In On = B+p (4) 1-6, g Jad y = ay+a, By (5) 1-x, where Bis a random parameter with some distribution H with mean zero, and B, a, a, are fixed-effects parameters. 
In other words, each subject draws a random propensity for a positive test, B, from H, which also affects the probability of a missed visit. If a, is not zero, the missing observations are said to be informative with respect to the parameter of interest, B, (Wu and Carroll 1988). 103 Although maximum likelihood estimation could be performed using the (Y, D,)), a simpler approach can be argued as was done by Wu and Bailey (1989). Suppose the Ys are ignored. The model defined by (5) can be viewed as an Empirical Bayes model. The information from the ith individual is given by 2,0; or equivalently m; since m= m3, D,. For the ith individual and any prior H, the posterior expectation E By; Im) is increasing (decreasing) as a function of m, when a. is negative (positive). A proof of this result is presented elsewhere. In other words, if a., is positive, individuals who are likely to test positive are also likely to miss tests. This result suggests that a simple way to capture the information contained in is to use the model 8 In fh = B+p, + h(m) i where B, has a distribution H( ) with mean 0, and h( ) is some function that is allowed to be either increasing or decreasing, such as a polynomial, perhaps with restricted coefficients. The above argument justifies fitting a logistic regression model with random subject effects and other fixed effects that describe the missingness. Such models have been discussed in a more general setting by Pierce and Sands (1975), Stiratelli and colleagues (1984), and Follmann and Lambert (1989). The approach of Follmann and Lambert (1989) is used and H is estimated via nonparametric maximum likelihood, along with parametric estimates of the fixed effects. Under this approach, His assumed to follow a distribution with a finite number of support points. The number of support points is estimated by the data. For the problem at hand, consider the following model: 0, In 0 = By +Bout BM + Bal (6) -6, fork = 1,2. 
Note that the probability of a positive response is assumed free of $j$, and the test derived from it is appropriate for the hypothesis $H_0\colon E[\theta_{i1}] = E[\theta_{i2}]$. Therefore, results from this procedure can be compared with other tests of the same hypothesis from the previous sections.

The numerator of the model-based test statistic is

$$\sum_i \hat{E}[\theta_{i1}]/n_1 - \sum_i \hat{E}[\theta_{i2}]/n_2,$$

where the expectations are Empirical Bayes posterior expectations using the within-group estimates of the random effects distribution, the fixed effects, and each individual's data. The asymptotic variance of this test statistic can be estimated by the delta method, given a covariance matrix for the estimates. The authors use the observed Fisher Information (pretending that the number of support points is known) to estimate this covariance (Follmann and Lambert 1989).

In general, trends in the $\theta$s could be made to depend on $j$ via a covariate, for example, polynomials of $j$ or $\log(j)$. If the $\theta_{ij}$s depend on $j$ and this dependence is accurately summarized via the random effects model, the original hypothesis $H_0^{(1)}\colon E[\theta_{ij1}] = E[\theta_{ij2}]$ can be tested, even with missing data.

EXAMPLE

A recent randomized clinical trial compared three treatments (buprenorphine, methadone at 20 mg [methadone 20], and methadone at 60 mg [methadone 60]) for their ability to reduce opiate use within a group of addicts. This section focuses on the buprenorphine and methadone 20 groups. Respectively, 53 and 55 subjects were randomized to these two groups. The methadone 60 group contained 54 subjects. Following randomization, urine tests were conducted three times per week for a total of 17 weeks. One subject who was assigned to the buprenorphine group never took a test. Although one might include this subject with some imputed value in an analysis of the treatments, she is excluded for simplicity.

Figure 1 displays the proportions of positive tests over visits for the two groups.
Buprenorphine is almost always better, and there seems to be little trend in the proportion positive. Figure 2 displays the proportion of missed tests over time. Both groups show increasing trends that seem fairly comparable. Figure 3 displays the scatter plot of $Y_{i\cdot}$ vs. $1 - m_i/51$, that is, the proportion positive vs. the proportion missed, for the two groups. A moderate positive correlation ($\rho = .51$) is seen between the two in the buprenorphine group, whereas the correlation is less strong in the methadone 20 group ($\rho = .18$). Also note that subjects in the buprenorphine group who always test positive have many missed tests. Some subjects in the methadone 20 group who always test positive rarely show up.

FIGURE 1. Proportion of positive tests over time, by treatment (horizontal axis: visit; vertical axis: proportion positive; series: buprenorphine, methadone 20)

Table 1 shows some sample statistics for the two groups. Buprenorphine has a lower average response proportion, a larger random-effects variance, and also a larger variance of $\sum_i Y_{i\cdot}/n$. The latter discrepancy is influenced both by the larger $\tau^2$ and the average response being closer to .5 for the buprenorphine group. The average proportion missing is somewhat larger in the methadone 20 group.

Table 2 provides the results for the various tests discussed in the text. For all test statistic numerators, the methadone group result is subtracted from the buprenorphine group result. The first two tests provide very similar results for the hypothesis that the average $\theta_i$ is the same for the two groups. The first test is obtained from the results of table 1. Both tests indicate that the buprenorphine group has a substantially lower probability of a positive test. For the rank test, the authors determined the number of times that the Empirical Bayes approach adjudicated tied observations.
With no ties, each of the 107 observations forms a “cluster” of 1. For these data, there were 69 clusters, ranging in size from 2 to 25. For example, for $Y_{i\cdot} = .5$, there are four observations: one in the buprenorphine group with $m_i = 4$ and three in the methadone 20 group with $m_i = 4$, 2, and 2. For $Y_{i\cdot} = 1.0$, there were 25 observations. Following the Empirical Bayes method of breaking ties, there were 92 clusters. Interestingly, when the ties are not broken, the Wilcoxon rank test statistic is -2.85. Thus, how ties are treated makes a difference here.

FIGURE 2. Proportion of missed tests over time, by treatment (horizontal axis: visit; series: buprenorphine, methadone 20)

Although the overall proportion missing is somewhat higher in the methadone group, a test of the difference in these proportions is not significant. However, this difference explains why the test statistic that imputes 1 for missing observations is larger in absolute value than the analogous test without imputation. Buprenorphine is better both with respect to missing data and with respect to the proportion of positive tests.

The O’Brien rank test shows that buprenorphine was better than methadone 20 simultaneously with respect to efficacy and missingness. In calculating this statistic, average ranks were used for ties. The final ranking had 80 clusters: 1 of size five, 4 of size three, and 15 of size two. It is not surprising that the value of this test statistic is smaller in absolute value than the rank test for the proportion of positive tests. The difference in proportions of missingness, although not significant by itself, has a modest diluting effect.

FIGURE 3. Proportion of positive and missed tests, by subject (horizontal axis: proportion missed; vertical axis: proportion positive; symbols: buprenorphine, methadone 20)
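The rank-sum idea behind the O’Brien test can be sketched in a few lines. This is one reading of the procedure, not the authors' implementation: each endpoint is ranked over the pooled subjects with average ranks for ties, the ranks are summed within subject, the sums are ranked again, and a tie-corrected Wilcoxon-type z is formed. The function names and the toy data are hypothetical.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def obrien_rank_sum_z(group_a, group_b):
    """z-statistic for group A vs. group B on several endpoints at once.

    group_a, group_b: lists of per-subject tuples, one value per endpoint
    (for example, proportion positive and proportion missed).
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    n_end = len(pooled[0])
    # Rank every endpoint separately over the pooled sample.
    per_endpoint = [midranks([s[e] for s in pooled]) for e in range(n_end)]
    # Sum the ranks within subject, then rank the sums.
    scores = [sum(per_endpoint[e][i] for e in range(n_end)) for i in range(n)]
    final = midranks(scores)
    w = sum(final[:n_a])
    mean_w = n_a * (n + 1) / 2.0
    # Permutation variance of the rank sum; this form handles ties.
    mean_rank = (n + 1) / 2.0
    s2 = sum((r - mean_rank) ** 2 for r in final) / (n - 1)
    var_w = n_a * n_b * s2 / n
    return (w - mean_w) / math.sqrt(var_w)


# Hypothetical subjects: (proportion positive, proportion missed).
toy_a = [(0.1, 0.0), (0.2, 0.1), (0.3, 0.2)]
toy_b = [(0.7, 0.5), (0.8, 0.6), (0.9, 0.7)]
z = obrien_rank_sum_z(toy_a, toy_b)
```

A negative z means the first group tends to have lower values on the combined endpoints, which matches the sign convention of subtracting the methadone result from the buprenorphine result.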
The test based on the model for informative censoring provides the smallest p-value of all tests of the hypothesis $E[\theta_1] = E[\theta_2]$. Its statistic is substantially larger in absolute value than that of the test based on the $Y_{i\cdot}$s. This is not surprising since tests that require more assumptions are generally more efficient.

The estimated models for the two groups are presented in table 3. Using the Wald statistics, the missing data are seen to be highly informative for the methadone 20 group and not as informative for the buprenorphine group. Since subjects with fewer missing observations tend to drop out later, it is somewhat misleading to talk about the separate effects of $m_{ik}$ and $L_{ik}$. However, note that for the methadone 20 group, subjects with larger $m_{ik}$s (i.e., fewer missing observations) tend to have a lower proportion of positive tests. Subjects who drop out later are more likely to test positive.

The estimated average proportion of positive tests for the two groups in table 3 is quite close to the average of the $Y$s. However, the variance of $\sum_i E[\theta_{ik}]/n_k$ is substantially smaller than the variance of $\sum_i Y_{i\cdot}/n$ from table 1. As mentioned previously, the smaller variability is not surprising since the model introduces more “structure” to the data. However, the ratio of the estimated variances is similar for the two approaches.

For simplicity, the detailed comparison involved two groups. Finally, the authors evaluated all three arms using the O’Brien rank test, which simultaneously tests equality of efficacy and missingness over all three arms. The Kruskal-Wallis chi-square test with two degrees of freedom had the value of 8.55 (p = .01). The authors then considered the three pairwise comparisons and used a Bonferroni correction to determine an (approximately normal) critical value of 2.39. Buprenorphine is better than methadone 20 (z = -2.70), but not better than methadone 60 (z = -.95). Methadone 60 was better than methadone 20, but not significantly so (z = 2.15).

TABLE 1. Some summary statistics for the buprenorphine and methadone 20 groups

                                            Group
Statistic                          Buprenorphine   Methadone 20
                                      (n=52)          (n=55)
$\sum_i Y_{i\cdot}/n$                  .49             .69
$\tau^2$                               .11             .07
Var($\sum_i Y_{i\cdot}/n$)             .0025           .0016
$\sum_i D_{i\cdot}/n$                  .48             .58
Sample variance of $D_{i\cdot}$        .1163           .1119

TABLE 2. Tests comparing the response proportions between the buprenorphine and methadone 20 groups

Test                                                  Hypothesis                                                  Z Value
Difference in average $Y_{i\cdot}$s with              $E[\theta_1] = E[\theta_2]$                                  -3.05
  random effects variance
Rank version of above with Empirical                  $E[\theta_1] = E[\theta_2]$ or                               -3.01
  Bayes adjudication of ties                          $E[\theta_{1j}(1-D_{1j})] = E[\theta_{2j}(1-D_{2j})]$
Difference in average $D_{i\cdot}$s                   $E[\pi_1] = E[\pi_2]$                                        -1.63
Difference in average $Y_{i\cdot}$s with              Equal expected proportion of positive or missed tests       -3.48
  missing-to-1 imputation
O’Brien’s rank test                                   $(E[\theta_1], E[\pi_1]) = (E[\theta_2], E[\pi_2])$          -2.70
Difference in average $E[\theta_{ik}]$s               $E[\theta_1] = E[\theta_2]$                                  -3.82

TABLE 3. Parameter estimates for the models of informative censoring. The estimated mixing distribution for the buprenorphine (methadone 20) group had 4 (2) points of support. Estimated Wald statistics are provided in parentheses.

                                            Group
Effect                             Buprenorphine   Methadone 20
$\beta_0$                             1.20             .38
$\beta_1$                             -.022           -.136
                                     (-1.29)         (-6.14)
$\beta_2$                             -.021            .131
                                     (-1.14)          (5.59)
$\sum_i E[\theta_{ik}]/n_k$            .48             .68
Variance of $\sum_i E[\theta_{ik}]/n_k$   .00133       .00116

DISCUSSION

This chapter briefly introduces and illustrates several techniques that may be useful for dichotomous repeated measures with a substantial proportion of missing data. A more rigorous evaluation would be useful before definitive recommendations are made. Nonetheless, several points can be offered.

The rank test with Empirical Bayes adjudication is an appealing procedure because it allows an unbiased robust comparison of two proportions as long as it is assumed that the probability of a positive test does not vary with $j$. The analogous test of means may be more substantially affected by $Y$s based on few observations. Furthermore, the means test requires some structure to derive a variance estimate.
The simple imputation of missing to positive might be favored either if one felt that subjects were taking drugs on missed days or if a combined endpoint were thought reasonable.

The attraction of the model-based approach is that, in principle, it can provide a test of the original hypothesis. The disadvantage to this approach is in the implementation. Issues of covariate selection and model fit need to be explored. Additionally, optimization requires some care because of the possibility of local maxima and numerical instability. For example, in the methadone 20 group, it seemed that an additional support point at $-\infty$ would slightly increase the log likelihood. However, this point was not included because of numerical problems with the information matrix.

In general, there may be additional information to aid investigators in deciding how to deal with each specific missing datum. For example, some missing data might correspond to occasions when the subject was strongly suspected of using opiates. Such additional information can be incorporated into a combination procedure in which the imputation from missing to positive is made for a subset of the data. The other procedures discussed in this chapter could then be applied to the partially transformed data.

However, it is important to recognize that no statistical procedure will improve the results from a trial with more than two-thirds of the endpoint data missing. Ultimately, the quality of evidence from such a trial is more like that of an observational study.

REFERENCES

Follmann, D.A., and Lambert, D. Generalizing logistic regression by nonparametric mixing. J Am Stat Assoc 84:295-300, 1989.

O’Brien, P.C. Procedures for comparing samples with multiple endpoints. Biometrics 40:1079-1087, 1984.

Pierce, D.A., and Sands, B.R. Extra-Bernoulli Variation in Binary Data. Technical Report 46. Corvallis, OR: Oregon State University, Department of Statistics, 1975.

Pocock, S.J.; Geller, N.L.; and Tsiatis, A.A. The analysis of multiple endpoints in clinical trials. Biometrics 43:487-498, 1987.

Sahlroot, J.T., and Pledger, G.W. “Monitoring Plasma Levels in Response to Monitored Plasma Concentrations: Can Unblinded Staff Adhere to Objective Criteria?” Unpublished manuscript, 1991.

Stiratelli, R.; Laird, N.; and Ware, J.H. Random effects models for serial observations with binary response. Biometrics 40:961-971, 1984.

Wu, M.C., and Bailey, K.R. Estimation and comparison of changes in the presence of informative censoring: Conditional linear models. Biometrics 45:939-955, 1989.

Wu, M.C., and Carroll, R.J. Estimation and comparison of changes in the presence of informative censoring by modeling the censoring process. Biometrics 44:175-188, 1988.

Wu, M.C.; Hunsberger, S.; and Zucker, D. Comparison of changes in the presence of censoring: Parametric and nonparametric methods. In: Proceedings of the Biopharmaceutical Section of the American Statistical Association. Alexandria, VA: American Statistical Association, 1991. pp. 291-299.

ACKNOWLEDGMENTS

Dr. Ram B. Jain provided the data; Mario Stylianou prepared the figures and analyzed some of the data; and Dr. Jack C. Lee reviewed the manuscript and provided useful comments.

AUTHORS

Dean Follmann, Ph.D.
Mathematical Statistician

Margaret Wu, Ph.D.
Mathematical Statistician

Nancy Geller, Ph.D.
Chief

Biostatistics Research Branch
National Heart, Lung, and Blood Institute
National Institutes of Health
Federal Building, Room 2A-11
Bethesda, MD 20892

Summary of Discussion: “Issues in the Analysis of Clinical Trials for Opiate Dependence” by Follmann, Wu, and Geller

Ram B. Jain

Dr. Jack C. Lee of the National Institute of Child Health and Human Development, National Institutes of Health, who reviewed this paper, expressed concern about the missing-at-random assumption implicitly made by the authors in some of their models. The same concern was expressed by many other participants at one time or another.
The assumption of missing at random is questionable since a missed visit might be dependent on opiate abuse during the days just prior to the missed visit. Dr. Follmann replied that the model-based imputation method he presented did not require the assumption of missing at random. The $\alpha$ parameters of the missingness model will be zero if the data are missing at random.

I believe the distinction between a missed observation and a censored observation was lost during this discussion. An observation is considered to be censored when a subject permanently drops out of the study. For the censoring to be informative, the total experience or abuse history of the subject up to the time of censoring should play a role and should be investigated. A missed observation, on the other hand, is a temporary event. For a missed observation to be informative, a single event or only a few events just prior to the missed visit should play a role.

Dr. Lee also made a number of other suggestions that can be incorporated in the models to describe the phenomenon of drug addiction. He suggested that cyclic effects introduced by, for example, the pattern of drug abuse and their relationship with missed visits can be incorporated in the models, and that the covariates that may affect the subject-level parameters $\beta_{0i}$ should be included in the models. He suggested that the whole patient population may be divided into four or five relatively homogeneous strata, and these strata can then be separately analyzed. These analyses may not need the assumption of missing at random. Dr. Lee was also of the view that the total study, for example, may be divided into three periods and that each of these periods may be studied separately. This might help in studying the issue of compliance. Finally, he thought a goodness-of-fit test was lacking from the presentation.

There was a rather involved discussion about imputation of missing values and the effect this may have on the inferences that are drawn. Dr. Hedayat was concerned about always imputing missing observations to a single value. Variations in patient characteristics across different areas may justify imputation to different values. Dr. Wright was concerned about one treatment being favored over another because of the specific imputation procedures used by the statistician. Dr. Fisher was in favor of some kind of sensitivity analysis in which missing observations are imputed to different values in different treatment groups to describe the possibilities under different sets of imputed observations. However, Dr. Geller was of the view that one may reach any conclusion when the missing (censored) data are as massive as in drug abuse trials.

Another suggestion was to consider some of the multiple imputation procedures used by Dr. Don Rubin, Harvard University. This might give a handle on the variability induced by the imputation procedure itself. It was pointed out that one of the natural sets to select for multiple imputation would be the entire history of the patient.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

Analysis of Clinical Trials for Treatment of Opiate Dependence: What Are the Possibilities?

Ram B. Jain

INTRODUCTION

One of the major primary outcome variables in clinical trials for treatment of opiate dependence is the frequency of drug abuse, that is, of opiates (primarily heroin), after therapy for opiate dependence has been initiated. Because the episodes of opiate abuse are not directly observable, an estimate of the frequency of opiate abuse is obtained from urine samples collected with a prespecified frequency and tested for the presence of opiates and their metabolites. Hence, a data sequence of binary numbers for each addict is available for analysis. Analyses of these data present serious obstacles.
To obtain the “true” estimate of the frequency of opiate abuse, each positive urine sample would have to represent an independent episode of opiate abuse. However, depending on the amount of opiate consumed by an addict during a given episode, this will not always be true. Two or more consecutive positive urine samples may represent the same episode of opiate abuse. In other words, there is a probability that the treatment effect will be confounded with the carryover from one positive urine to another. It is difficult to estimate carryover because the probability of carryover varies from day to day for a given addict and from one addict to another because of differentials in drug-seeking behavior. Consequently, using the available information on the kinetics of opiates, the frequency of urine samples is selected in such a way that the probability of carryover is minimized and the probability of being able to detect an episode of opiate abuse is maximized. Note that the probability of carryover is not entirely eliminated. This is the first obstacle in analyzing these trials.

In clinical trials among drug addicts, the dropout rate is unavoidably high, on the order of 80 percent in a placebo group. Also, even during the period the addicts stay in the trial, they miss about one in every five scheduled visits for treatment. Hence, the number of missing or censored data points may be as much as or more than the number of available data points, which reduces the power of the statistical tests of hypotheses.

In 15- to 20-week trials, urine samples may be collected up to three times a week. Hence, each addict may have 50-or-so data points for analysis. As such, the problem of analyzing these trials may be perceived as a 50-or-so dimensional problem.
The selection of a powerful statistical method that will accommodate 50-or-so dimensions, with sample sizes on the order of 150 to 500 patients and substantial missing data, and still detect “true” treatment differences is a serious challenge.

Before the possibilities for analyzing these data are considered, it would be beneficial to understand the nature of treatment for opiate dependence. Agonist therapy for opiate dependence essentially consists of replacing the abused opiate with another, most likely a synthetic opiate (called an opiate agonist or an opiate partial agonist), with relatively less potential for abuse. In replacement treatment, the next dose is given when the effect of the previous dose is about to wear off. If the next dose is not given in time, the addict is more likely to go out and seek the illegal drug of abuse. Overdosing amounts to exposing the addict to the addictive potential of the replacement opiate. Hence, each dose of the replacement opiate has its own pharmacological effect and may be considered as one unit of treatment. According to Blaine and colleagues (1981), replacement therapy “is intended to... achieve a more pharmacologically stable physiological state.” Each unit of replacement therapy, if successful, should lead to a physiological state that is pharmacologically more stable than with the previous unit of replacement treatment. Hence, attainment of a fully pharmacologically stable physiological state at which the addict does not seek the abused opiate and is ready for detoxification is going to be a gradual, one-step-at-a-time process.

WHAT ARE THE POSSIBILITIES?

Let $P_{ij}$ ($i = 0, 1, 2, \ldots, m$) be the probabilities (table 1) of an addict using the abused opiate before entering the trial ($i = 0$) and after scheduled dose $i$ ($i > 0$) of the replacement opiate $j$. If each unit of the replacement opiate is consistently successful and has no “reverse” therapeutic effect, $P_{i+1,j} < P_{ij}$, $i = 0, \ldots, m-1$.
However, because of missed doses and other factors, $P_{ij}$ can assume any value between 0 and 1. Also, a data point after scheduled dose $s$, $s = 2, \ldots, m$, for a given addict may not be available because of the missed dose $s$ or because he or she dropped out of the trial after dose $n$, $n$ [...] 0) is scheduled to be collected for the treatment group receiving the replacement opiate $j$. Again, not all data points are available because addicts miss one or more of the scheduled doses before urine sample $k$ is scheduled to be collected, fail to provide one or more scheduled urine samples (for example, because of nonvisits), or drop out of the study after providing $u$ ($u < k$) urine samples.

The most practical way to estimate the probability of opiate use is to assay urine sample $k$ for the presence of the abused opiate(s) and/or its (their) metabolites. However, for reasons more fully described in Jain’s chapter “Design of Clinical Trials for Treatment of Opiate Dependence: What Is Missing?” (this volume), the probability of a urine sample detecting an episode of opiate abuse depends on several factors, primarily the duration between the last episode of drug abuse and the time the current urine sample was obtained. Let $P_{kj}$ (table 1) be the probability of urine sample $k$ for treatment $j$ being declared positive for opiate. Then, in an experiment, the best that can be done is to estimate $P_{kj}$ and hope that it is the best available estimate of the probability of opiate use.

There are at least three distinct possibilities for analyzing these data or estimating $P_{kj}$. First, reduce the multiple data points for each addict to one and then use regular inference procedures to compare the efficacy of different treatments. For example, multiple data points obtained from urine samples for an addict may be reduced to a single data point defined as the proportion of positive urines, or alternatively, his or her overall profile/pattern of +/- urines can be classified by some rank order procedure as a single rank.
Let this possibility be denoted as DATA-REDUC-1.

Second, if sequential performance of successive units of replacement opiate is of interest, estimates of the $P_{kj}$s can be obtained, trends studied, and a summary statistic $\sum_k w_k P_{kj}$ obtained to evaluate the program performance of the different treatments. The weights $w_k$ can be defined in many different ways, which are well documented in the statistical literature. Let this possibility be denoted as ANAL-SEQ-UNIT. It is in order here to clarify the major distinction between the summary statistic obtained from ANAL-SEQ-UNIT and the single statistic obtained from DATA-REDUC-1 procedures. Whereas the summary statistic obtained from ANAL-SEQ-UNIT procedures is adjusted for differentials in treatment performances and sample sizes over time, the single statistic obtained from DATA-REDUC-1 basically ignores these differences. However, the latter is simpler to compute and understand.

Third, attention can be focused on only the positive urines, and a correlational structure between times to the various positive urines or failures can be studied. In other words, the data can be analyzed as a multiple failure problem. Let this possibility be denoted as MULT-FAIL.

Some of these possibilities were explored in analyzing the ARC 090 data for buprenorphine vs. methadone 60 mg treatment, and some of these results are presented.

DATA-REDUC-1

If the multiple data points are reduced to one data point for each addict, one of the first temptations would be to use some form of parametric or nonparametric analysis of variance. However, censored observations are not permitted in analysis of variance, and then what is to be done with missing observations? Both missing and censored observations can be considered as “negative” or “positive,” as can some other combination of “negative” and “positive,” probably depending on the reason for missing and censored observations.
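The three conventions for handling missing or censored results when reducing a subject's sequence to one proportion can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the use of `None` for a missing or censored result, and the example sequence are all hypothetical and are not taken from the ARC 090 data.

```python
def proportion_positive(results, missing_as):
    """Proportion of positive urines for one subject.

    results    : list of '+', '-', or None (missing or censored)
    missing_as : 'negative', 'positive', or 'excluded'
    Returns None when every result is excluded.
    """
    if missing_as not in ("negative", "positive", "excluded"):
        raise ValueError("missing_as must be negative, positive, or excluded")
    if missing_as == "excluded":
        # A different denominator is used for each subject.
        observed = [r for r in results if r is not None]
        if not observed:
            return None
        return sum(r == "+" for r in observed) / len(observed)
    fill = "+" if missing_as == "positive" else "-"
    filled = [fill if r is None else r for r in results]
    return sum(r == "+" for r in filled) / len(filled)


# Hypothetical sequence: two positives, one negative, three missing.
sequence = ["+", None, "-", "+", None, None]
summary = {rule: proportion_positive(sequence, rule)
           for rule in ("negative", "positive", "excluded")}
```

For this hypothetical sequence the three rules give one third, five sixths, and two thirds, which shows how strongly the single summary value depends on the convention when half of the results are missing.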
Such imputation, however, produces at least as much “made-up” data as real observed data. This is probably not acceptable to most analysts. If the proportion of positive urines is to be computed for addict $v$, $v = 1, \ldots, n$, in treatment $j$ for use in parametric analysis of variance, the censored and missing observations may be excluded from the analysis; that is, a different denominator is used for each addict. This would violate the assumption that each subject in a given treatment group is drawn from the same population. In addition, since the probability $P_{kj}$ of a positive urine for urine sample $k$ varies with $k$, the single variable $y$ derived from multiple data points will be the sum of $u$ binomial variables with parameters $n = 1$ and $P = P_{kj}$. Is $y$ normally distributed, and with what parameters? However, irrespective of theoretical objection, this possibility was explored for the ARC 090 data, and the results are given in table 2. No significant differences were observed.

TABLE 2. Parametric analysis of variance results for the ARC 090 study (maintenance period only)

Treatment Group     Missing Values     N     Mean p (SD)     t       p
                    Treated as
Buprenorphine       Negative           48    .33 (.26)      -.46    .65
                    Positive           48    .54 (.33)      -.85    .40
                    Excluded           47    .45 (.35)      -.29    .78
Methadone 60 mg     Negative           51    .35 (.30)
                    Positive           51    .60 (.33)
                    Excluded           47    .48 (.36)

An additional problem with both parametric and nonparametric analysis of variance is that information about the pattern of positive and negative urines or temporal correlations is lost. Also, since the kinetics of the drugs are different, the information about the relationship between drug effect and time is lost.

Survival methods that permit censored observations can be used with a little more confidence. However, in addition to the problem of missing observations, the definition of what constitutes a failure may be subjective, and depending on the definition used, the power of the statistical procedure may become too low (because of too few failures) or the trial may be over too soon, leaving most of the observed data unused. For example, if the first positive urine is used as a measure of treatment failure, ARC 090 would probably be over in a week or so. For ARC 090, two consecutive Monday positive urines, starting with the fourth Monday of treatment, were used as the measure of treatment failure; the results are given in table 3. No significant differences were found. Kaplan-Meier survival curves are displayed in figure 1. The number of failures in each group was 25. Another measure of treatment failure, that is, the beginning of the first drug-free period of 28 days or more, was also used. The number of failures using this criterion was 13 in the buprenorphine group and 7 in the methadone 60 mg group. The results are given in table 4, and Kaplan-Meier curves are plotted in figure 2. As can be seen from table 4, different test statistics can give different results. Only the Breslow statistic provides a significant result. Hence, depending on the definition of a failure, different methods of inference can give different results.

Another possibility is being explored by Dr. John Harter, director of the Pilot Drug Evaluation Division of the Food and Drug Administration, in analyzing analgesic trials; it uses a combination of a sorting routine and a nonparametric rank sum test. The sorting routine first sorts all subjects by their pain intensities at time (sample) 1; then each distinct subgroup obtained after the first sort is sorted by its pain intensity at time (sample) 2, and so on.
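Sorting by the first sample, then within subgroups by the second, and so on, amounts to a lexicographic sort of each subject's result profile. The sketch below applies that reading to the six hypothetical urine profiles of table 5a, ranking positive results ahead of negative ones and giving tied profiles their average rank. The function names are illustrative and are not part of the routine under development at the Food and Drug Administration.

```python
# Subject -> (treatment, results on urine samples 1, 2, 3), as in table 5a.
profiles = {
    1: ("A", "+--"),
    2: ("A", "-++"),
    3: ("A", "++-"),
    4: ("B", "-+-"),
    5: ("B", "+-+"),
    6: ("B", "+++"),
}


def profile_ranks(profiles):
    """Rank subjects by lexicographic order of their profiles.

    '+' sorts before '-', so the all-positive profile gets rank 1.
    Subjects with identical profiles share the average rank.
    """
    def key(sid):
        return tuple(0 if c == "+" else 1 for c in profiles[sid][1])

    ordered = sorted(profiles, key=key)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and key(ordered[j + 1]) == key(ordered[i]):
            j += 1
        for sid in ordered[i:j + 1]:
            ranks[sid] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks


def rank_sums(profiles):
    """Sum of profile ranks within each treatment group."""
    ranks = profile_ranks(profiles)
    sums = {}
    for sid, (treatment, _) in profiles.items():
        sums[treatment] = sums.get(treatment, 0) + ranks[sid]
    return sums


sums_by_treatment = rank_sums(profiles)
```

For these six profiles the rank sums are 11 for treatment A and 10 for treatment B, the values a rank sum test would then compare.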
After the last sort, the subjects are ranked according to their profiles in ascending or descending order. Then, ranks are summed for each treatment group, and a rank sum test may be used to evaluate treatment differences. This approach may also be tried for drug abuse trials.

TABLE 3. Results of survival analysis of the ARC 090 study (maintenance period only) using two consecutive Monday positive urines starting with the fourth Monday of treatment as treatment failure

95-percent Brookmeyer-Crowley confidence intervals in days for median survival time
    Buprenorphine                   48.0-90.0
    Methadone 60 mg                 35.0-77.0
Mantel-Cox chi-sq (p)               0.50 (.48)
Likelihood ratio chi-sq (p)         .09 (.77)

FIGURE 1. Kaplan-Meier survival curves for ARC 090 study (B = buprenorphine, H = methadone 60 mg; horizontal axis: time in days to two consecutive Monday positive urines after fourth Monday)

TABLE 4. Results of survival analysis of the ARC 090 study (maintenance period only) using the beginning of the first drug-free period of 28 days or more as a treatment failure

Mantel-Cox chi-sq (p)               2.95 (.09)
Breslow chi-sq (p)                  5.51 (.02)
Likelihood ratio chi-sq (p)         2.94 (.09)

FIGURE 2. Kaplan-Meier curves for ARC 090 study when treatment “failure” is defined as the first drug-free period of 28 days or more (series: buprenorphine, methadone 60 mg; horizontal axis: time in days to first clean urine period of 28 days or more)

Consider six addicts on two different treatments who have their urine results on three samples as shown in table 5a. The subjects are first sorted according to their results on sample 1 as shown in table 5b. Subjects 1, 3, 5, and 6 have positive urines and as such are subgrouped first, followed by subjects 2 and 4, each of whom has a negative urine. If there were to be more than two distinct scores, the procedure would be the same. After the first sort, the distinct subgroup of subjects 1, 3, 5, and 6 is sorted first by their results on urine sample 2, and then the second distinct subgroup of subjects 2 and 4 is sorted by their results on urine sample 2. This creates three distinct subgroups: subjects 3 and 6 with a (+,+) profile; subjects 1 and 5 with a (+,-) profile; and subjects 2 and 4 with a (-,+) profile (see table 5c). As shown in table 5d, each of these three subgroups is then sorted by results on the third urine sample, thus creating six distinct subgroups of subjects: subject 6 with a profile (+,+,+) ranked 1, subject 3 with a profile (+,+,-) ranked 2, subject 5 with a profile (+,-,+) ranked 3, subject 1 with a profile (+,-,-) ranked 4, subject 2 with a profile (-,+,+) ranked 5, and subject 4 with a profile (-,+,-) ranked 6. For subjects with the same profiles, average ranks can be calculated. The sum of ranks for treatment A (11) may then be compared with the sum of ranks for treatment B (10).

TABLE 5a. Results of three urine samples from a hypothetical trial

Patient           Treatment     Results of Urine Sample
Identification    Received      1    2    3
1                 A             +    -    -
2                 A             -    +    +
3                 A             +    +    -
4                 B             -    +    -
5                 B             +    -    +
6                 B             +    +    +

TABLE 5b. Results from a hypothetical trial after first sort

Patient           Treatment     Results After
Identification    Received      First Sort
1                 A             +
3                 A             +
5                 B             +
6                 B             +
2                 A             -
4                 B             -

TABLE 5c. Results from a hypothetical trial after first two sorts

Patient           Treatment     Results After
Identification    Received      First Two Sorts
3                 A             +    +
6                 B             +    +
1                 A             +    -
5                 B             +    -
2                 A             -    +
4                 B             -    +

TABLE 5d. Results from a hypothetical trial after three sorts

Patient           Treatment     Results After
Identification    Received      Three Sorts       Rank
6                 B             +    +    +       1
3                 A             +    +    -       2
5                 B             +    -    +       3
1                 A             +    -    -       4
2                 A             -    +    +       5
4                 B             -    +    -       6

There are two problems with this approach. First, as with any nonparametric rank test, the magnitude of treatment differences on the original measurement scale is not available. Second, this approach puts too much weight on the first observation. For example, subject 3 with a profile of (+,+,-) is given the rank of 2, whereas subject 2 with a profile of (-,+,+) is given the rank of 5. Both subjects have two out of three positives, but they are considered (almost) opposite extremes in this approach. However, it may be possible to come up with certain variations of this approach that do not rely on the first observations so heavily. For example, if clinically acceptable, the first few results may be ignored, or a ranking mechanism may be developed based on some combination of result profiles and number of positives.

ANAL-SEQ-UNIT

The first approach to explore this possibility would be to construct 2x2 tables for each urine testing opportunity and compute, for example, a Mantel-Haenszel z-statistic (Mantel and Haenszel 1959; Miller 1981). For the ARC 090 study, the z-scores are displayed in figures 3, 4, and 5 when missing values are considered as missing, negative, and positive, respectively. A consistent pattern of superiority of buprenorphine over methadone 60 mg is observed, except probably during the middle of the study period. However, the degree of superiority of buprenorphine seems to be decreasing over the first half of the study period and then increasing again during the second half of the study period.
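For a single urine testing opportunity, the z-statistic for one 2x2 table can be sketched as below. It compares the observed number of positives in one arm with its expectation under no treatment difference, using the hypergeometric variance of the Mantel-Haenszel approach. The function name and the counts are hypothetical and are not ARC 090 data; how missing results are counted (left out of the totals, or added to the negatives or positives) is decided before the counts are passed in.

```python
import math


def mh_z(pos_a, n_a, pos_b, n_b):
    """z-statistic for one 2x2 table at a single testing opportunity.

    pos_a, n_a : positives and subjects counted in arm A
    pos_b, n_b : positives and subjects counted in arm B
    Returns None when the table carries no information
    (all results positive, all negative, or fewer than two subjects).
    """
    n = n_a + n_b
    pos = pos_a + pos_b
    if n < 2:
        return None
    expected = n_a * pos / n
    # Hypergeometric variance of the arm A positive count.
    variance = n_a * n_b * pos * (n - pos) / (n * n * (n - 1))
    if variance == 0:
        return None
    return (pos_a - expected) / math.sqrt(variance)


# Hypothetical opportunity: 10 of 40 positive in arm A, 20 of 40 in arm B.
z_example = mh_z(10, 40, 20, 40)
```

A negative value indicates fewer positives than expected in arm A. Repeating the calculation for every opportunity gives a sequence of z-scores of the kind plotted against urine-testing opportunity.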
Some, but not all, of this pattern is explainable by the process of self-selection as the study progresses and by the different sample sizes at different times during the study period. The differences probably lie in the different kinetics of buprenorphine and methadone. Methadone is probably catching up with buprenorphine during the first half of the study period, as may be suggested by the percent of positive urines at different times during the study, as seen in figure 6.

FIGURE 3. Mantel-Haenszel z-statistic for each urine testing opportunity when missing values are considered as missing

FIGURE 4. Mantel-Haenszel z-statistic for each urine testing opportunity when missing values are considered as negative

However, a summary Mantel-Haenszel statistic cannot be validly calculated from individual 2x2 tables because this summary statistic does not account for correlations between individual 2x2 tables. How then can one calculate a summary index of program effectiveness? First, a simple though somewhat questionable alternative may be to score the direction of the relative efficacy of the two drugs for each time point or urine testing opportunity and use a binomial test to evaluate if, “overall,” one drug is more effective than the other. Another alternative may be to use a weighted summary statistic for correlated tables as described in Wei and Johnson (1985). However, the use of, for example, a 51-dimensional variance-covariance matrix as in the ARC 090 study with the data as sparse as they are, particularly during the last few weeks of the study, would certainly lead to some problems.
For example, solving such a huge and sparse matrix will be numerically difficult, and the dimension of the problem will adversely affect the power of the statistic.

The use of parametric repeated measures analysis is even more problematic. In addition to the inadmissibility of missing and censored observations, the degree of robustness of repeated measures analysis of variance for binary data is unknown when there are as many repeated measures as in these studies. Also, these studies do not generate traditional repeated measures data. Each pair of consecutive repeated measures is interrupted by the administration of the replacement opiate and possibly by the use of the opiate of abuse. This differs from, for example, using a new instructional method for several months and comparing the effects of the traditional and the new method over a period of time. It is not certain if repeated measures theory is applicable to these data. At best, these data seem to be multiply interrupted time series data.

FIGURE 5. Mantel-Haenszel z-statistic for each urine testing opportunity when missing values are considered as positive

MULT-FAIL

Several authors have considered the problem of analyzing multiple failures under various configurations (e.g., failures of the same type over time or failures of different types at a fixed point in time and space). Recent work in this area has been done by several researchers (Lagakos et al. 1978; Hsieh et al. 1983; Prentice et al. 1981; Gail et al. 1980; Lawless 1987; Wei and Lachin 1984; Thall and Lachin 1988; Wei and Stram 1988; Wei et al. 1989; Lin 1990).
Some of these authors, for example, Wei and Stram (1988) and Wei and colleagues (1989), used a regression-based approach, whereas others, such as Wei and Lachin (1984) and Thall and Lachin (1988), used multivariate versions of log rank (Mantel 1966) and/or the Gehan test (Gehan 1965) to analyze multiple failures. The regression approach of Wei and colleagues (1989), which is a multivariate version of Cox's proportional hazards model (Cox 1972), imposes the least restrictive structure on recurring events (failures) and thus is very appealing.

FIGURE 6. Percent positive urines for ARC 090 study (buprenorphine and methadone 60 mg; horizontal axis: urine-testing opportunity; vertical axis: percent positive samples)

The regression approach of Wei and colleagues (1989) will be an excellent choice if the number of failures in the model is limited and there are many subjects in the study. In the ARC 090 study, the number of subjects, 162 across three treatment groups, was probably sufficient to use this model, but each subject could also experience up to 51 failures in the 17-week maintenance phase of the study. Hence, to use this model, there was no choice but to use an algorithm that reduces the maximum number of failures to 17. Even though this approach does permit censored observations, the missing observations must still be handled in some way. In fact, the algorithm used to reduce 51-dimensional data to 17-dimensional data more or less solved this problem, except when all three samples during a week were missing. A weekly index was developed for urine samples being positive or negative for opiates. If at least one of the three samples was positive or all samples for a given week were missing, that week was considered to be positive for opiates. Otherwise, that week was considered to be negative for opiates. Thus, the maximum number of failures was limited to 17 for this analysis.
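The weekly index rule just described can be sketched in a few lines. This is only an illustration of the stated rule, not the program used for ARC 090: the function name and the encoding of results ('+', '-', and None for a missing sample) are assumptions, and censoring after dropout is not represented.

```python
def weekly_index(samples, per_week=3):
    """Collapse thrice-weekly urine results into one result per week.

    samples: list of '+', '-', or None (missing), in collection order
             (hypothetical encoding chosen for this sketch).
    Returns a list of booleans, one per week: True means the week is
    counted as positive for opiates.
    """
    weeks = []
    for start in range(0, len(samples), per_week):
        week = samples[start:start + per_week]
        any_positive = any(result == '+' for result in week)
        all_missing = all(result is None for result in week)
        # Positive if at least one sample is positive or every sample is missing
        weeks.append(any_positive or all_missing)
    return weeks
```

For example, `weekly_index(['+', '-', None, '-', '-', None, None, None, None])` gives one positive week, one negative week, and one week counted as positive because all three samples are missing; a full 51-sample record reduces to 17 weekly results.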
However, to avoid too many ties, the time (in days) to each failure used to compute various statistics was defined as the time to the first positive urine or missing observation (if all observations were missing during a week) during the week in consideration. This algorithm does result in some loss of information; for example, a subject who has three positive urines during a week is treated the same way as one who has only one positive urine during that week. Hopefully, this loss of information will be random and uniform across different treatment groups and will result in a valid comparison. No formal statistical tests were done to verify this.

Only one covariate, treatment assignment, was used for analyzing ARC 090 data (1 = buprenorphine, 0 = methadone 60 mg). Thus, 17 regression coefficients, one for each week, were estimable. A joint test of the hypothesis $H_0: \beta_k = 0$, $k = 1, \ldots, 17$, was conducted. An estimate of the common regression coefficient, $Q = \sum_i c_i \hat{\beta}_i$, $i = 1, \ldots, 17$, was also obtained and tested for $Q = 0$. The weights $c_i$ were optimally calculated by the program MULCOX (Lin 1990). A negative regression coefficient indicates a decreased hazard rate for buprenorphine compared with methadone 60 mg; that is, a negative regression coefficient favors buprenorphine treatment. Also, a hazard ratio of less than one favors buprenorphine treatment.

The hypothesis $H_0: \beta_k = 0$ was not rejected (Wald statistic with 17 degrees of freedom = 20.89, p = .23). However, the estimate of $Q = \sum_i c_i \hat{\beta}_i$ was found to be significantly different from zero (Q = -0.294, p = .04), indicating an “average” superiority of buprenorphine over methadone 60 mg. The 95-percent confidence interval for the common hazard ratio of .746 was (.566, .983). The hazard ratios for each week are plotted in figure 7, indicating consistent superiority of buprenorphine.

FIGURE 7. Hazard ratios for each week for ARC 090 study (horizontal axis: study week; vertical axis: hazard ratio)

WHAT ARE THE PROBLEMS?
The biggest problems in analyzing these data are:

1. The order of dimension (51-dimensional)
2. The sparseness of data
3. The problem of missing values

None of these problems seems to be handled too well by any of the possibilities explored in this chapter. DATA-REDUC-1 methods do reduce the data to one dimension but at a tremendous cost: complete loss of information about correlational structures between various dimensions and more or less no ability to handle missing and/or censored observations. In fact, some of the DATA-REDUC-1 methods make no distinction between missing and censored observations. ANAL-SEQ-UNIT methods do handle one dimension at a time but have difficulty combining information unless some sort of miniature data reduction scheme can be implemented. MULT-FAIL methods do handle censored data, but heavy censoring causes loss of power, and the dimension of the data must be reduced somewhat by using a miniature data reduction scheme. However, the possibly informative nature of censoring causes interpretational difficulties. The sparseness is either ignored or subjectively handled in DATA-REDUC-1 and ANAL-SEQ-UNIT methods. None of the methods has any ability to handle missing values without outside intervention. The solution may be to consider a missing observation as the third state as discussed by Weng (this volume). Another possibility suggested by Dr. Gross of the Medical University of South Carolina is to consider a quadrinomial model with four categories (positive, negative, missing, and censored) and then to consider a conditional binomial model in which conditioning is on the latter two categories.

OTHER PRIMARY VARIABLES AND THEIR ANALYSES

One of the other three primary variables of interest in these clinical trials is the retention rate in the treatment program. These data can easily be analyzed by any one of the survival analytic techniques. However, the problem of informative dropouts may have to be handled in some way.
Some of the work in this area is due to Dr. Margaret Wu of the National Heart, Lung, and Blood Institute (Wu and Bailey 1988, 1989; Wu and Carroll 1988).

One of the self-reported measures of drug abuse is the “craving” scores obtained periodically during the course of the study. Before entry into the trial (time 0) and at times i, i = 1, ..., m, addicts are asked to report how much craving or need or desire they had during the last few days (e.g., a week or since the last time they visited the clinic) for the abused drug. Usually they are asked to “mark” the intensity of their craving or need on a 100-mm-long line called a craving scale, such as the one shown in figure 8. A score of zero means no craving, and a score of 100 means the most intense craving ever experienced.

FIGURE 8. Craving scale used in drug abuse research (a line running from 0 to 100 mm)

Let $S_{ij}$ be the craving score reported by an addict on treatment j at time i, i = 0, 1, ..., m. These data, like the urine data, have missing and censored observations. There are several ways to analyze these data. Regression analysis can be performed on either $S_{ij}$ (i = 0, ..., m) or on $S_{ij} - S_{0j}$ (i = 1, ..., m), and standard tests for $\beta = 0$ or $\beta_1 - \beta_2 = 0$ can be performed. Alternatively, regression analysis for multiple failures based on a proportional hazards model such as those described by Wei and colleagues (1989) can also be used.

Another important outcome variable in the drug abuse trials is the physician's (or staff's or patient's) global impression of an addict's status with respect to his or her drug-seeking behavior at different time points during the study as compared with a previous time point or compared with his or her status at the time of entry into the study. Generally, these physician's (or staff's or patient's) scores are obtained on a 3- to 5-point rating scale.
These data can be analyzed the same way as craving scores data, or in addition, change in the status can be evaluated by a one-sample or two-sample (Feuer and Kessler 1989) McNemar's chi-square test statistic.

REFERENCES

Blaine, J.D.; Thomas, D.B.; Barnett, G.; Whysner, J.A.; and Renault, P.F. Levo-alpha acetylmethadol (LAAM): Clinical utility and pharmaceutical development. In: Lowinson, J.H., and Ruiz, P., eds. Substance Abuse Clinical Problems and Perspectives. Baltimore/London: Williams & Wilkins, 1981. pp. 360-388.
Cox, D.R. Regression models and life tables (with discussion). J Royal Stat Soc B 34:187-220, 1972.
Feuer, E.J., and Kessler, L.G. Test statistic and sample size for a two-sample McNemar test. Biometrics 45:629-636, 1989.
Gail, M.H.; Santner, T.J.; and Brown, C.C. An analysis of comparative carcinogenesis experiments based on multiple times to tumor. Biometrics 36:255-266, 1980.
Gehan, E.A. A generalized two-sample Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52:203-223, 1965.
Hsieh, F.Y.; Crowley, J.; and Tormey, D.C. Some test statistics for use in multistate survival analysis. Biometrika 70:111-119, 1983.
Lagakos, S.W.; Sommer, C.J.; and Zelen, M. Semi-Markov models for partially censored data. Biometrika 65:311-317, 1978.
Lawless, J.F. Regression methods for Poisson process data. J Am Stat Assoc 82:808-815, 1987.
Lin, D.Y. MULCOX: A computer program for the Cox regression analysis of multiple failure time variables. Comput Methods Programs Biomed 32:125-135, 1990.
Mantel, N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 50:163-170, 1966.
Mantel, N., and Haenszel, W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719-748, 1959.
Miller, R.G., Jr. Survival Analysis. New York: Wiley, 1981.
Prentice, R.L.; Williams, B.J.; and Peterson, A.V.
On the regression analysis of multivariate failure time data. Biometrika 68:373-379, 1981.
Thall, P.F., and Lachin, J.M. Analysis of recurrent events: Nonparametric methods for random-interval count data. J Am Stat Assoc 83:339-347, 1988.
Wei, L.J., and Johnson, W.E. Combining dependent tests with incomplete repeated measurements. Biometrika 72:359-364, 1985.
Wei, L.J., and Lachin, J.M. Two-sample asymptotically distribution-free tests for incomplete multivariate observations. J Am Stat Assoc 79:653-661, 1984.
Wei, L.J.; Lin, D.Y.; and Weissfeld, L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc 84:1065-1073, 1989.
Wei, L.J., and Stram, D.O. Analyzing repeated measurements with possibly missing observations by modelling marginal distributions. Stat Med 7:139-148, 1988.
Wu, M.C., and Bailey, K. Analysing changes in the presence of informative right censoring caused by death and withdrawal. Stat Med 7:337-346, 1988.
Wu, M.C., and Bailey, K.R. Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model. Biometrics 45:939-955, 1989.
Wu, M.C., and Carroll, R.J. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44:175-188, 1988.

ACKNOWLEDGMENT

An earlier version of this chapter was reviewed by Dr. Alan J. Gross of the Medical University of South Carolina; his helpful comments led to certain useful changes in this version of the chapter.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

Summary of Discussion: “Analysis of Clinical Trials for Treatment of Opiate Dependence: What Are the Possibilities?”

Ram B.
Jain

During my talk I made a comment about the difficulty of explaining certain statistical methods to clinicians. Dr. Gorodetzky thought the likelihood of explaining the details of some of the statistical analysis to clinicians is very remote. If the statisticians can agree on what is an appropriate method of analysis, a qualitative discussion or description of the method along with discussion of results in relation to analysis would be sufficient. For Dr. Fisher, war was too important to be left to generals. He would not hesitate to speak on clinical matters, and sometimes the best statistical ideas do come from the clinicians. Dr. Gorodetzky remarked:

    I think the real problem is sometimes we tend not to talk each other's languages, and we tend to be way out here clinically and way out here statistically. If we can come a little bit more towards the middle with a little bit of mathematical understanding from a clinician and a little bit of clinical understanding from the statistician, . . . there can be a very productive interchange.

Dr. Geller found pictures (e.g., cumulative hazard plots) to be very useful in helping clinicians understand some complicated statistical concepts. One should not try to explain every little detail because it is really not important to clinicians.

Dr. Geller also found ranking methods to be a rich tool for analyzing multiple endpoints data (e.g., proportion positive and proportion missing). However, there may be some price to pay (e.g., loss in power) when parametric methods are applicable but nonparametric procedures are used. In addition, the magnitude of treatment effects is not easily discernible when ranking methods are used. There are ways to go back to the original unranked data, but they do not always work and may not always be desirable. Dr.
Fisher did not think that the description of, for example, the magnitude of a treatment effect must correspond precisely to the specific test of hypothesis used to compute p values.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician
Biometrics Branch
Medications Development Division
National Institute on Drug Abuse
Parklawn Building, Room 11A-55
5600 Fishers Lane
Rockville, MD 20857

Toward a Dynamic Analysis of Disease-State Transition Monitored by Serial Clinical Laboratory Tests*

T.S. Weng

INTRODUCTION

In many clinical trials dealing with the monitoring and/or management of a chronic disease under medical treatment, it is customary to follow each patient up to some censoring time. The observations usually consist of longitudinal counts of patients in cohorts with common disease states identified by an ad hoc laboratory test repeatedly administered over a fixed sequence of time points. For example (see Jain’s chapter, “Analysis of Clinical Trials for Treatment of Opiate Dependence: What Are the Possibilities?,” this volume), in a randomized clinical trial (ARC 090) to evaluate the efficacy of buprenorphine for the treatment of opiate addiction, 162 qualified patients were put through a 17-week maintenance phase in three separate treatment groups: Group 1 was maintained on 8 mg of buprenorphine administered sublingually daily, and groups 2 and 3 were maintained on 20 mg and 60 mg, respectively, of methadone (positive control) administered orally daily. To evaluate the frequency of opiate abuse, all patients were asked to provide urine samples three times weekly on Mondays, Wednesdays, and Fridays. These samples were assayed to detect the presence of opiates (mainly heroin or morphine). A positive sample was defined as a possible treatment failure. Due to missed clinic visits or other reasons, 19.8, 17.7, and 17.7 percent, respectively, of urine samples from the three treatment groups were uncollected.
Furthermore, the percentages of patients lost to followup in these groups were noted to run up to 60.4, 80.0, and 63.0 percent, respectively. If it were not for the massive, possibly nonrandom missing observations and loss of patients to followup, as well as for the seemingly time-dependent nature of the data encountered, this clinical trial could have been analyzed by the popular method of survival analysis using either the multivariate versions of the Gehan and log-rank tests (Gehan 1965; Peto and Peto 1972; Wei and Lachin 1984) or the generalized versions of Cox's semiparametric, proportional hazard model for censored failure time with covariates acting as treatment responses (Prentice et al. 1981; Gail 1981; Wei et al. 1989; Lin 1990).

*The views presented here are those of the author. No support or endorsement by the Food and Drug Administration is intended or should be inferred.

The purpose of this chapter is to propose a stochastic compartmental model as an alternative for modeling the data generated from the aforementioned study, thereby evaluating the efficacy of buprenorphine against methadone in the treatment of opiate addiction.

The plan of this chapter is as follows: First, a closed three-compartment system is introduced by which patients are classified according to their patterns and directions of response to medication during the course of treatment. This is followed by the introduction of a Markov process, which provides a natural context for addressing the problem of statistical dependence among successive observations and characterizes the dynamics of disease-state transition within the compartmental system. Based on the assumption that this synthesized stochastic compartmental model is piecewise stationary in time, an iterative weighted conditional nonlinear least-squares procedure is then developed to facilitate parameter estimation.
The results are then applied to analyze the ARC 090 study to draw conclusions on the efficacy of buprenorphine treatment. Finally, a general discussion is given.

COMPARTMENTAL MODEL

The patient pool in the ARC 090 study can be partitioned into three cohorts or compartments (see figure 1 below) that are each numbered 1, 2, or 3 depending on whether they encompass patients who have tested negative (-) for opiates, positive (+) for opiates, or have missed the test with the potential for being lost to followup (M/L). In figure 1, the compartments are represented as boxes with arrows between boxes indicating the direction of disease-state transitions. Let N be the total number of patients and $N_i(t)$ be the number of patients in compartment i (i = 1,2,3) at time t > 0, and let $\lambda_{ij}(t)$ denote the transition rate from compartment i to compartment j (i,j = 1,2,3) at time t > 0. Patients will then be included in different compartments depending on the results of their urinary tests or on whether they comply with the urinary test schedule. This compartmental system, within which all compartments (or states) communicate with one another, is regarded as closed in the sense that $\sum_i N_i(t) = N$ at any time t > 0. The individual patients in the system are assumed to act independently without being influenced by others.

FIGURE 1. Schematic diagram of a closed three-compartment model with $\lambda_{ij}(t)$ representing rates of transition between pairs of compartments at time t > 0 (i,j = 1,2,3). Compartments: [1] negative (-), $N_1(t)$; [2] positive (+), $N_2(t)$; [3] missed or lost to followup (M/L), $N_3(t)$.

STOCHASTIC PROCESS

The dynamics of changes in disease states within this system may be described by a Markov process $\{X(t): 0 \le t < \infty\}$ defined on the state space S = {1, negative; 2, positive; 3, missing or lost to followup} with the associated transition probability matrix (or transition matrix, for brevity)

$P(t, t_0) = [P_{ij}(t, t_0)], \quad 0 \le t_0 \le t < \infty,$

and the transition rate matrix (or rate matrix, for brevity)

$K(t) = [\lambda_{ij}(t)],$

where

$P_{ij}(t, t_0) = \Pr\{X(t) = j \mid X(t_0) = i\}, \quad (3.1)$

$0 \le \lambda_{ij}(t) = \lim_{\delta t \to 0} P_{ij}(t + \delta t, t)/\delta t, \; j \ne i, \quad \text{and} \quad \lambda_{ii}(t) = -\sum_{j \ne i} \lambda_{ij}(t). \quad (3.2)$

For the time being, let $\lambda_{ij}(t) = \lambda_{ij}$ for all i,j so that {X(t)} becomes a stationary process with $P(t, t_0)$ being a function of $t - t_0$ only. Without loss of generality, therefore, it may be assumed that $t_0 = 0$, so one can simply write P(t) = P(t,0). Under these assumptions, the transition matrix P(t) is uniquely given by the Kolmogorov forward differential equation

$(d/dt)P(t) = KP(t) \quad (3.3)$

with the initial condition P(0) = I, a 3x3 identity matrix. In the above equation, the rate matrix K is singular, as can be seen by the expressions in (3.2). Thus, the eigenvalues of K are given by 0, $-\alpha$, and $-\beta$ ($0 < \alpha, \beta < 1$; $\alpha \ne \beta$), where

$\alpha, \beta = \tfrac{1}{2}(\lambda_1 + \lambda_2 + \lambda_3) \pm \tfrac{1}{2}\sqrt{(\lambda_1 + \lambda_2 + \lambda_3)^2 - 4(\gamma_1 + \gamma_2 + \gamma_3)}, \quad (3.4)$

with $\lambda_i = -\lambda_{ii}$ (i = 1,2,3), $\gamma_1 = \lambda_{22}\lambda_{33} - \lambda_{23}\lambda_{32}$, $\gamma_2 = \lambda_{11}\lambda_{33} - \lambda_{13}\lambda_{31}$, and $\gamma_3 = \lambda_{11}\lambda_{22} - \lambda_{12}\lambda_{21}$. The explicit forms of the elements of $P(t) = [P_{ij}(t)]$ are then given by (Chiang 1980, pp. 416-426):

$P_{11}(t) = \frac{\gamma_1}{\alpha\beta} + \frac{\gamma_1 - \alpha(\beta - \lambda_1)}{\alpha(\alpha - \beta)} \exp(-\alpha t) + \frac{\gamma_1 - \beta(\alpha - \lambda_1)}{\beta(\beta - \alpha)} \exp(-\beta t),$

$P_{12}(t) = \frac{\gamma_2}{\alpha\beta} + \frac{\gamma_2 - \alpha\lambda_{12}}{\alpha(\alpha - \beta)} \exp(-\alpha t) + \frac{\gamma_2 - \beta\lambda_{12}}{\beta(\beta - \alpha)} \exp(-\beta t),$

$P_{21}(t) = \frac{\gamma_1}{\alpha\beta} + \frac{\gamma_1 - \alpha\lambda_{21}}{\alpha(\alpha - \beta)} \exp(-\alpha t) + \frac{\gamma_1 - \beta\lambda_{21}}{\beta(\beta - \alpha)} \exp(-\beta t),$

$P_{22}(t) = \frac{\gamma_2}{\alpha\beta} + \frac{\gamma_2 - \alpha(\beta - \lambda_2)}{\alpha(\alpha - \beta)} \exp(-\alpha t) + \frac{\gamma_2 - \beta(\alpha - \lambda_2)}{\beta(\beta - \alpha)} \exp(-\beta t), \quad (3.5)$

$P_{31}(t) = \frac{\gamma_1}{\alpha\beta} + \frac{\gamma_1 - \alpha\lambda_{31}}{\alpha(\alpha - \beta)} \exp(-\alpha t) + \frac{\gamma_1 - \beta\lambda_{31}}{\beta(\beta - \alpha)} \exp(-\beta t),$

$P_{32}(t) = \frac{\gamma_2}{\alpha\beta} + \frac{\gamma_2 - \alpha\lambda_{32}}{\alpha(\alpha - \beta)} \exp(-\alpha t) + \frac{\gamma_2 - \beta\lambda_{32}}{\beta(\beta - \alpha)} \exp(-\beta t),$ and

$P_{i3}(t) = 1 - P_{i1}(t) - P_{i2}(t), \quad i = 1,2,3.$

It is noted that

$\lim_{t \to 0} P_{ij}(t) = 1$ if $i = j$, and $0$ otherwise, $\quad (3.6)$

and that for all i,

$\lim_{t \to \infty} P_{ij}(t) = \gamma_j/(\alpha\beta) = \pi_j \; (j = 1,2,3)$, say. $\quad (3.7)$

The $\pi_j$'s, known as the (asymptotic) state probabilities, are independent of the initial state i.
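The relations among the rate matrix, its nonzero eigenvalues, and the limiting state probabilities can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the chapter's method: the rate matrix `K_EXAMPLE` is hypothetical, the function names are invented here, and the closed form assumes real, distinct roots $\alpha \ne \beta$. Each element is computed as $\gamma_j/(\alpha\beta)$ plus two exponential terms, written so that the diagonal and off-diagonal cases of (3.5) share one expression.

```python
import math

def principal_minor(K, j):
    """2x2 principal minor of K with row and column j deleted (gamma_j)."""
    a, b = [i for i in range(3) if i != j]
    return K[a][a] * K[b][b] - K[a][b] * K[b][a]

def spectrum(K):
    """Return (alpha, beta, gamma) following the definitions around (3.4)."""
    lam = [-K[i][i] for i in range(3)]            # lambda_i = -lambda_ii
    gamma = [principal_minor(K, j) for j in range(3)]
    s, p = sum(lam), sum(gamma)                   # alpha + beta and alpha * beta
    root = math.sqrt(s * s - 4.0 * p)             # assumes real, distinct roots
    return (s + root) / 2.0, (s - root) / 2.0, gamma

def transition_matrix(K, t):
    """P(t) for a stationary three-state process with rate matrix K."""
    alpha, beta, gamma = spectrum(K)
    P = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            d = 1.0 if i == j else 0.0
            # With K[i][i] = -lambda_i these reduce to the numerators in (3.5)
            a_num = gamma[j] - alpha * K[i][j] - alpha * beta * d
            b_num = gamma[j] - beta * K[i][j] - alpha * beta * d
            P[i][j] = (gamma[j] / (alpha * beta)
                       + a_num / (alpha * (alpha - beta)) * math.exp(-alpha * t)
                       + b_num / (beta * (beta - alpha)) * math.exp(-beta * t))
    return P

# Hypothetical transition rates (rows sum to zero); not estimates from ARC 090.
K_EXAMPLE = [[-0.3, 0.2, 0.1],
             [0.4, -0.5, 0.1],
             [0.2, 0.2, -0.4]]
```

For this hypothetical matrix, `spectrum(K_EXAMPLE)` gives $\alpha$ = 0.7 and $\beta$ = 0.5 up to rounding, `transition_matrix(K_EXAMPLE, 0.0)` is the identity matrix as in (3.6), and for large t every row approaches $\gamma_j/(\alpha\beta)$ as in (3.7).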
It is further noted that the elements of the rate matrix K contain structural information about the process, for example,

$\lambda_i^{-1}$ = expected length of time (or mean residence time) for a patient in state i to remain in that state. $\quad (3.8)$

Also useful for checking the validity of parameter estimates (see section titled “Analysis of the ARC 090 Study and Conclusion”) are the relations

$\lambda_1 + \lambda_2 + \lambda_3 = \alpha + \beta \quad (3.9)$

and

$\gamma_1 + \gamma_2 + \gamma_3 = \alpha\beta, \quad (3.10)$

which follow immediately from expression 3.4.

WEIGHTED CONDITIONAL NONLINEAR LEAST-SQUARES ESTIMATION

Suppose that the Markov process {X(t)} is piecewise stationary (Faddy 1976) so that the transition rates $K(t) = [\lambda_{ij}(t)]$ may take on different sets of constant values for disjoint segments (time intervals) of {X(t)}. These stationary segments are in fact chosen to approximate the true process, which may be time dependent. The chosen segments should each contain a sufficient number of observations to make the parameters (namely, the transition rates) statistically estimable. To fix the idea, let K(t) = K ... [text missing in source]

[text missing in source] ... $\sum_{k} n_{10}(k) \log[v - v(1 - u - v)^k] + \sum_{k} n_{11}(k) \log[u + v(1 - u - v)^k] \;\cdots\; C \;\cdots\; \sum_{k} \sum_{i=0}^{1} \sum_{j=0}^{1} \cdots \{ -\log(u + v) + \log[u^{j} v^{1-j} + (-1)^{i+j} u^{1-i} v^{i} (1 - u - v)^k] \}$, where C is a constant.

Estimates for each patient for the maintenance phase of the study are calculated. MLEs of u and v, denoted by $\hat{u}$ and $\hat{v}$, can be obtained numerically by the Newton-Raphson method. Therefore, $\hat{p}_1 = \hat{u}/(\hat{u} + \hat{v})$, the estimate of the probability of using opiate for each patient, can be obtained.

For weeks 1 through 17, $\hat{p}_1$, the MLE of the probability of using opiate, for patients assigned to buprenorphine, methadone 20 mg, and methadone 60 mg has means (standard deviations) of 0.4734 (0.3719), 0.6288 (0.3616), and 0.4970 (0.3157), respectively. These figures are listed in table 1. Note that patients who have only one urine sample result do not contribute any information in the study of Markov transition probabilities and, hence, are not included in the computation.
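To make the two-state calculation concrete, the sketch below evaluates k-step transition probabilities, a log-likelihood that bridges runs of missing samples, and the stationary probability $u/(u + v)$. It is a minimal illustration, not the chapter's program: it assumes u is the one-step probability of moving from negative (0) to positive (1) and v the probability of moving from positive to negative, encodes a missing sample as None, and conditions on the first observed result; the function names are invented here, and no Newton-Raphson maximization is included.

```python
import math

def k_step(u, v, k):
    """k-step transition matrix of a two-state chain.

    Assumes u = P(0 -> 1) and v = P(1 -> 0) in one step, with 0 < u + v.
    Entry [i][j] is the probability of state j after k steps from state i.
    """
    r = (1.0 - u - v) ** k
    s = u + v
    return [[(v + u * r) / s, (u - u * r) / s],
            [(v - v * r) / s, (u + v * r) / s]]

def log_likelihood(seq, u, v):
    """Log-likelihood of one subject's sequence of 0, 1, or None (missing).

    A run of m consecutive missing samples between two observed results is
    bridged with the (m + 1)-step transition probability. The first observed
    result is conditioned on (an assumption of this sketch).
    """
    total = 0.0
    prev, gap = None, 0
    for x in seq:
        if x is None:
            gap += 1
            continue
        if prev is not None:
            total += math.log(k_step(u, v, gap + 1)[prev][x])
        prev, gap = x, 0
    return total

def stationary_use_probability(u, v):
    """Long-run probability of a positive sample, u / (u + v)."""
    return u / (u + v)
```

For instance, with the hypothetical values u = 0.3 and v = 0.2, the sequence `[0, None, 1, 1]` contributes one two-step transition from 0 to 1 and one one-step transition from 1 to 1, and `stationary_use_probability(0.3, 0.2)` is 0.6. Maximizing `log_likelihood` over u and v for each subject would yield estimates of the kind summarized in the text.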
Because there are many cases where the length of the binary sequence is very short, weighted means of $\hat{p}_1$ are calculated in table 2. The weights used in the calculation are the numbers of informative transitions, that is, transitions that do not begin or end with a missing value, in the sample sequences. The weighted means of $\hat{p}_1$ are 0.3664 for buprenorphine, 0.6260 for methadone 20 mg, and 0.4854 for methadone 60 mg.

MLEs of U, V, and $P_1$ vary widely. This phenomenon, however, may be attributed to the heterogeneity of individuals involved in the study. To take into account the variability among individuals within each treatment group, an empirical Bayes approach is undertaken.

By adjusting Anderson and Goodman's (1957) results for complete data, procedures for testing the homogeneity for a data set with missing values will be investigated. Censored cases can be incorporated by modifying Aalen and Johansen’s (1978) method. This will be done in a subsequent paper.

TABLE 1. Summary statistics for $P_1$, the probability of using opiate

Method             Number of Cases    Mean (standard deviation)
Buprenorphine      50                 0.4734 (0.3719)
Methadone 20 mg    53                 0.6288 (0.3616)
Methadone 60 mg    52                 0.4970 (0.3751)

TABLE 2. Summary statistics of $P_1$, the probability of using opiate: weighted mean. (The weights used in the calculation are the numbers of informative transitions in the sample sequences.)

Method             Weighted Mean
Buprenorphine      0.3664
Methadone 20 mg    0.6260
Methadone 60 mg    0.4854

LIMITATIONS AND DISCUSSION

1. In the section titled Incomplete Data, random variables U and V are assumed to have independent Beta priors. Similar empirical Bayes inferences can be derived if U and V are assumed to have dependent joint distributions.

2. An alternative method to incorporate missing values in a Markov model is to consider the missing value as a third state in the process.
Thus, one can transform the problem into a three-state Markov model with complete data. In this case independent Dirichlet priors should be used, instead of independent Beta priors, for entries in the one-step transition probabilities matrix. It is necessary, however, to justify the randomness of the missing state.

3. For a better application of the Markov chain model for urine testings, a systematic sampling scheme with an equal number of days in between tests should be planned. This can be done by taking urine samples on Monday, Wednesday, and Friday for one week and then on Tuesday and Thursday the next week. Continuing to take samples in this 2-week alternating pattern will best use the proposed Markov model, which can handle missing observations for Sunday and Saturday in alternating weeks.

REFERENCES

Aalen, O.O., and Johansen, S. An empirical transition matrix for nonhomogeneous Markov chains based on censored observations. Scand J Statist 5:141-150, 1978.
Anderson, T.W., and Goodman, L.A. Statistical inference about Markov chains. Ann Math Stat 28:89-110, 1957.
Basawa, I.V., and Prakasa Rao, B.L.S. Statistical Inference for Stochastic Processes. London: Academic Press, 1980.
Billingsley, P. Statistical Inferences for Markov Processes. Chicago: University of Chicago Press, 1961.
Cox, D.R., and Miller, H.D. The Theory of Stochastic Processes. London: Methuen, 1965.
Keiding, N., and Gill, R.D. Random truncation models and Markov processes. Ann Stat 18:582-602, 1990.
Martin, J.J. Bayesian Decision Problems and Markov Chains. New York: Wiley, 1967.
Muenz, L.R., and Rubinstein, L.V. Markov models for covariate dependence of binary sequences. Biometrics 41:91-101, 1985.

ACKNOWLEDGMENTS

Dr. Ram B. Jain and Dr. Peter A. Lachenbruch made helpful comments.

AUTHOR

Mei-Ling Ting Lee, Ph.D.
Statistics Department
Harvard University
One Oxford Street
Cambridge, MA 02138

Summary of Discussion: “A Markov Model for NIDA Data on Treatment of Opiate Dependence” by Mei-Ling Ting Lee*

Alan J. Gross

In her chapter, Dr. Mei-Ling Ting Lee considers a two-state Markov chain to model urine sample test results in which the outcome is either positive or negative for the presence of a particular opiate. At times, study subjects do not appear for these tests, and as a result, missing values are present in the data. These missing values are taken into account by adjusting the likelihood function using Markov properties.

The vector of observations on a given subject is assumed to form a sequence of Markov-dependent Bernoulli trials with time-independent transition probabilities. The Markov property allows formation of the likelihood function based on a sample sequence of test results of a given subject. Maximum likelihood estimates are obtained for the one-step transition probabilities, and then, assuming stationarity, an estimate of the stationary drug abuse probability of each subject is obtained. Since these transition probabilities vary among subjects, their variability is modeled by assuming that beta prior density functions govern their behavior. However, this aspect of the research has not been totally developed. Finally, an estimate for the stationary drug abuse probability is obtained over each of the three treatment groups.

Some generalizations that may be considered include (1) development of a three-state model in which the third state is for the missed visits by subjects (this may be preferable to the present model since missing observations currently are treated by multiplying the transition matrix to the power corresponding to the number of consecutive missing observations) and (2) relaxation of time homogeneity and equally spaced observations currently required in the model.

*The chapter by Dr. Lee was reviewed by Dr. Jewell prior to the technical review. Dr. Gross reviewed the revised chapter submitted by Dr. Lee and wrote this summary of the discussion that took place at the meeting.

Some difficulties that exist in applying this model are:

* The Markov assumption (that is, the probability an individual tests drug-free on the current test depends only on whether he or she tested drug-free on his or her immediate past test) is questionable. A subject who tests drug-free is likely to have tested drug-free on more than just one previous test, that is, several tests back.

* Much information is likely to lie within the missing data. This aspect of the model development process is where future efforts should be placed.

* If, indeed, the transition probabilities are time homogeneous, then times between transitions should be roughly exponential. Thus, a test for exponentiality should be considered to check this assumption.

* Concern exists as to whether the data have been sampled for a sufficiently long period to assume stationarity. At this point, the assumption may be somewhat optimistic.

Dr. Mei-Ling Lee's chapter describes an interesting and potentially useful method for analyzing clinical trial data with this structure. Modifications of the procedure presented herein, once they are implemented, will lead to an improved model.

AUTHOR

Alan J. Gross, Ph.D.
Professor of Biostatistics
Department of Biostatistics, Epidemiology, and Systems Science
Medical University of South Carolina
171 Ashley Avenue
Charleston, SC 29425-2503

Open/Panel Discussion: Analysis Issues

Ram B. Jain

Panel Members: Carol K. Redmond (chair), Lloyd D. Fisher, Dean Follmann, Joel B. Greenhouse, Alan J. Gross, and A.S.
Hedayat

The primary issues discussed during this session were:

• Statistical approaches to analyze urine data
• Validity and importance of craving scores, physician/staff/patient global scores, and withdrawal symptoms and signs data
• Treatment of missing/censored observations

STATISTICAL APPROACHES TO ANALYZE URINE DATA

The following statistical approaches to analyze urine data were discussed:

1. Parametric and nonparametric methods to calculate summary statistics, for example, estimate of p across clinic visits
2. Model-based, nonsurvival type methods, including those based on Markovian theory
3. Single- and multiple-failure survival methods

It was mentioned that the statistical approach to analyze urine data would depend on the nature of the primary outcome variable in the study. A fair number of participants were in favor of analyzing these data by parametric model-based approaches (e.g., one proposed by Follmann et al., this volume), including those using Markovian theory, because these models can provide for (1) explicit consideration of missing/censored observations, (2) inclusion of covariates representing design and population characteristics, and (3) consideration of other (for example, sociological) variables, which may help to better understand the total phenomenon of drug dependence. Dr. Hedayat was of the view that if we are interested in implications of our actions in the future, the model-based approach should be the approach of choice. There were others (e.g., Dr. Follmann) who thought that calculation of summary statistics, such as an estimate of p across clinic visits, is simple and straightforward and is sufficient to go through the Food and Drug Administration (FDA) approval process. According to Dr. Follmann, if a model must be used, it should be restricted to modeling missing/censored observations. Correlations and/or interactions between other variables, for example, correlations between day (Monday vs.
Wednesday) of visit and urine sample results, are not important enough to get the drugs approved by FDA. Modeling is more suitable to describe patterns of behavior.

Dr. Gross preferred some combination of summary statistics such as p and a model allowing for covariates where a model would be superimposed on, for example, p. Dr. Greenhouse was interested in a problem-driven approach. According to him, heterogeneity is a serious and most important issue in these trials, and an approach that deals with these kinds of issues, for example, one that is based on Empirical Bayes’ methods (Lee, this volume), would be preferable.

Dr. Jack C. Lee reminded the panel that drug addiction is a chronic condition. He thought summary statistics and modeling that analyze response are more appropriate for analyzing data generated by studies in acute conditions. In an area such as drug dependence, application of survival-type methods should not be summarily dismissed and should be considered as in other chronic diseases (e.g., cancer) where it is more important to study survival than response. For Dr. Follmann, it was more a matter of taste. If survival-type methods are to be considered for these trials, there would be many remissions and relapses.

There was a fair amount of discussion about the pros and cons of a hypothesis-testing vs. a model-based approach for analyzing these trials. Dr. Follmann and others were in favor of formulating one or more (very) specific primary hypotheses that will help get the drug approved by FDA. Modeling is more suitable for describing the pattern of behavior. When reminded that these trials result in possible multiple endpoints, it was said that as long as hypotheses are prespecified, be it a (linear) combination of several things and/or variables, it would be in the best interest of getting a drug approved by FDA. However, if several endpoints have to be integrated, it should be described in advance how they would be integrated and the rationale for doing so.
Dr. Hedayat thought that hypothesis testing generates binary (e.g., yes vs. no) results that are too restricted. He thought that scientifically it is more important to know, understand, and learn from the process/phenomenon and that this can be done by modeling the process. It was said that FDA has to make a decision and that a physician, when treating an individual patient, must know whether the drug works. However, the information that hypothesis testing may provide to the physician may be too little. He or she not only wants to know whether or not the drug works but also how, for example, to titrate the drug for different patients (e.g., a child vs. an adult). Dr. Greenhouse noted that there is a place for Bayesian and decision theory in designing and analyzing these trials because these approaches result in meaningful measures such as the probability of clinically effective response. “These sorts of methods give us a formalism for doing sensitivity analysis to assess the robustness of our analyses,” he said. Understanding and learning from the process is important and necessary for future benefits, but a decision also must be made within the confines of what is known and knowable now.

There was some discussion about what research should be done to develop better methods of analyzing these trials. It was felt that the process of missing/censoring should be modeled. There was also a strong feeling to get more and better data on those who drop out. When it was mentioned that subjects who missed three consecutive visits in the ARC 090 study were dropped, Dr. Fisher said if they had wanted to come back they should have been welcomed back into the study. This would have generated more data. Dr. Rolley E.
Johnson explained the practical difficulties in putting these people back in the study, including the problem of titrating them back to their assigned doses, the ethical problem of putting them back on opiate treatment if they have gone through the worst part of the withdrawal process, and the possibility of the blind being broken during the period they were out of the trial. There is also the difficulty of analyzing data on these patients if they are reentered into the trial. Should the postreentry data on these patients be integrated with the rest of the data? Or should these data be analyzed separately, and if so, what conclusions can be drawn from these data? Should the last observation be carried forward for these patients for the purpose of analyzing the main data set? Some of the patients who dropped out may not want to come back or should not be allowed to come back at all because treatment was a failure for them or because treatment cured them and they did not need the treatment any more.

A suggestion that incentives be built into the design to improve compliance (i.e., reduce the dropout rate) resulted in an involved, informative discussion. Doubts were raised about the appropriateness of introducing incentives to reduce dropout rates. Incentives may affect the outcome variable itself, in which case it may not be known whether one is evaluating the effect of incentives or the treatment. Or there might be an incentive-therapy interaction. This will depend on what the incentives are tied to. For example, if the incentives, particularly the financial incentives, are tied to producing a negative urine, then it will result in informative missed visits (i.e., a clinic visit might be missed because a positive urine may be detected), or financial incentive, rather than treatment, might be the factor to promote abstinence from drug use. Dr.
Fisher recognized that there might be an interaction between therapy and incentives and that incentive also may affect compliance, but, “nevertheless, a beneficial differential drug effect on the top of money would probably mean some sort of beneficial drug effect.” A concern that different treatment groups may somehow be imbalanced in terms of incentives received by them could be rectified if patients can be stratified “on some covariate which would reflect their propensity or likelihood to take a financial incentive,” said Dr. Follmann. Dr. Johnson thought that such incentives/contingencies have the potential to dilute the treatment differences, resulting in prohibitive sample size requirements. Dr. Peter A. Lachenbruch was concerned about using financial incentives because this money might end up being used for buying illegal drugs.

Dr. Hedayat insisted on putting more emphasis on characterizing patient populations to blend the concepts of sampling and design and to come up with robust designs that minimize natural deficiencies, rather than, for example, create contingencies to strive for “noise-free” (e.g., dropouts) experiments. Dr. Gross mentioned that randomized response models that have a well-developed theory might be suitable for implementing Dr. Hedayat's recommendations. However, Dr. Greenhouse reminded the panel that randomized response models would answer a very limited question in the context of clinical trials in the drug abuse area.

VALIDITY AND IMPORTANCE OF CRAVING SCORES, PHYSICIAN/STAFF/PATIENT GLOBAL SCORES, AND WITHDRAWAL SYMPTOMS AND SIGNS DATA

Dr. Greenhouse suggested that some of these variables may be independent or explanatory variables, unrelated to the pharmacological effect of the treatment; as such, they would be time-varying covariates. If so, this presents challenging statistical analysis problems. Dr. Fisher said that physician/staff/patient global scores are very crude measures and often do not work. Dr.
Michael Murphy replied that “the approval of Alzheimer’s drugs has now basically stopped in its tracks because no one has been able to show global improvement rating. . . . It depends on how one does the global . . . . They range from bad to worse, and there is a big controversy right now in the field as to how one might get at that very useful information.”

As Dr. Charles Gorodetzky put it, these are difficult-to-measure behavioral endpoint variables. However, these are the variables that answer questions such as “what have you done to help the patient?” These are difficult issues, but they must be dealt with. A need was felt for some sort of global summary measure that can describe the status of patients in a global sense, for example, how he or she is doing after the trial is over.

Dr. Nancy L. Geller pointed out that an optimal linear combination of multiple endpoints can be obtained; that is, a test of missingness can be combined with a test for efficacy. Weights assigned to different endpoints can be determined by experts in the field (e.g., cardiologists in cardiovascular trials). This methodology is very powerful and can detect relatively smaller differences. However, interpretation of these weighted summary measures may not always be easy.

Dr. Donald R. Jasinski thought craving scores were not a valid measure of any form of chemotherapeutic effect and probably had nothing to do with the whole process of drug dependence. In addition, there was no agreed-on meaning of craving. “It is probably related to the environment and to learned behavior . . . ,” Dr. Jasinski said. Dr. Jasinski was also concerned about moving away from dichotomous measures to continuous or ordinal scale measures to do some sort of parametric analysis. He thought all these measures, such as global scores and symptoms and signs measures, could as well be measured on a binary scale, which probably would be as meaningful as measuring them on a continuous or ordinal scale.
Drug dependence is a chronic relapsing disorder. As in other trials of chronic diseases (e.g., oncology trials), a partial or complete remission (as determined by urine data) is good enough to show that the drug is efficacious. The key issue in these trials is to establish pharmacological efficacy. It is very difficult to show differences in behavioral outcome measures.

TREATMENT OF MISSING/CENSORED OBSERVATIONS

I asked the panel which of the several methods of treating missing/censored observations presented in the meeting is more acceptable than others. The methods presented for consideration were estimate, substitute (e.g., by 1), model, and ignore. Dr. Lee observed estimate, substitute, and ignore to be different forms of modeling, and as such, modeling and missing at random are the only alternatives. Dr. Lachenbruch wanted missingness to be considered as a response by itself.

Dr. Fisher recommended doing a sensitivity analysis where different values for missing/censored observations on a range from zero (a negative sample) to one (a positive sample) are assumed and different values for different treatments are assigned. Thus, for each pair of treatment arms, a probability matrix can be constructed. Each cell of this probability matrix would provide the significance level, or p value, available from test of treatment differences when specific pairs of values attached to that cell are assigned to missing/censored observations for the two treatment groups. From this matrix, a set of pairs of values (assigned to missing/censored observations) for which one treatment can be inferred to be better than others can be determined. A professional judgment then can be made as to whether, under reasonable assumptions for missing/censored data, one treatment is better. This approach provides for scenarios under which conclusions could be different.

AUTHOR

Ram B. Jain, Ph.D.
Mathematical Statistician Biometrics Branch Medications Development Division National Institute on Drug Abuse Parklawn Building, Room 11A-55 5600 Fishers Lane Rockville, MD 20857

Open/Panel Discussion: General Issues

Ram B. Jain

Panel Members: Peter A. Lachenbruch (Cochair), Jack C. Lee (Cochair), Joseph Collins, Lloyd D. Fisher, Sudhir C. Gupta, Nicholas P. Jewell, Michael Murphy, Vincent Shu, and Ram C. Tiwari

INTRODUCTION

The issues primarily discussed during this session were:

• Need for multiple outcome criteria to deal with various aspects of drug addiction
• High and differential dropout rates and their influence on estimation of treatment effect
• Alternate study designs: comparative dose trials and “enrichment” designs

NEED FOR MULTIPLE OUTCOME CRITERIA TO DEAL WITH VARIOUS ASPECTS OF DRUG ADDICTION

Almost all the participants at the meeting agreed that there is a need for multiple outcome criteria in these trials. However, what should be measured should be clearly defined. Dr. Murphy would definitely incorporate urine screens as a measure in his studies, particularly because it would be easy to standardize this measure across the centers. However, this would not be the key, pivotal outcome in his studies because this measure is of “questionable clinical relevance.” He would like to study the impact of urine screens “on the sites we select, the patients we enroll, the numbers of patients we enroll, and how often we can retain them.” If the patients become free of opiates but replace opiates with other illicit drugs like Valium, cocaine, amphetamine, or marijuana, he would consider this a failure of (opiate) treatment. This substitution of other drugs for opiates should be factored into data analysis.

Dr. Gupta warned against having too many outcome measures because correlations between these measures must be considered in the analysis. Too many measures make the problem multidimensional and too complicated.
Obtaining a clear, solid conclusion from the trial is important. However, because the effect of (opiate) treatment drugs lasts for only a limited time (24 hours or so), he was in favor of analyzing data collected on each day separately. According to Dr. Jewell, the primary interest should be evaluating whether the patient has improved since the treatment began, and as such, treating multiple observations from a patient as anything more than a single data point should not be done.

There was no consensus at the meeting about how to handle the data on multiple outcome variables. Dr. Murphy was in favor of integrating data from urine screens with other measures into an overall index of disease severity. He liked the idea of collapsing across measures. Dr. Collins wanted each measure to be analyzed separately because, in general, investigators may be interested in knowing exactly which measures show treatment differences, and the direction and magnitude of any such differences. A drug may reduce the opiate abuse but may have a problem in retaining patients on the treatment. On the other hand, a drug may not be so effective in reducing the frequency of drug abuse but may be very effective in retaining patients on the treatment. The investigator should be the one to make a final decision about the usefulness of the proposed treatment. However, as Dr. Fisher pointed out, if there are too many outcome measures, it is more than likely that the direction and magnitude of treatment differences on these variables will be different at different levels. This could be a potential source of confusion and indecision for the investigator. It would be helpful only if a composite measure is obtained from different outcome measures. The analysis of this combined measure should be helpful in deciding if the drug has an overall effect. Subsequently, individual measures can be analyzed and evaluated.
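
One way to picture the composite measure discussed above is the following minimal sketch. It is not a method proposed at the meeting: it assumes, purely for illustration, that each outcome is standardized to a z-score across patients, oriented so that a higher value is better, and averaged with equal weights. The measure names, the patient data, and the function composite_scores are all hypothetical.

```python
from statistics import mean, pstdev


def composite_scores(patients, directions):
    """Return one equal-weight composite score per patient.

    patients   -- list of dicts mapping measure name -> observed value
    directions -- dict mapping measure name -> +1 if higher is better,
                  -1 if lower is better (an assumption made by the analyst)
    """
    scores = [0.0] * len(patients)
    for measure, sign in directions.items():
        values = [p[measure] for p in patients]
        mu, sd = mean(values), pstdev(values)
        for i, value in enumerate(values):
            # A measure with no spread carries no information; score it as 0.
            z = 0.0 if sd == 0 else (value - mu) / sd
            scores[i] += sign * z / len(directions)
    return scores


# Hypothetical data for three patients (illustration only).
patients = [
    {"weeks_retained": 12, "pct_negative_urines": 90, "craving": 20},
    {"weeks_retained": 2, "pct_negative_urines": 30, "craving": 80},
    {"weeks_retained": 7, "pct_negative_urines": 60, "craving": 50},
]
directions = {"weeks_retained": +1, "pct_negative_urines": +1, "craving": -1}

scores = composite_scores(patients, directions)
for patient_index, score in enumerate(scores):
    print(patient_index, round(score, 3))
```

Whatever weighting is chosen would have to be fixed before the trial begins, and the individual measures can still be examined one at a time after the composite has been analyzed.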
I suggested that an investigator be asked at the beginning of a trial about the performance required on each outcome measure for the proposed drug to be minimally useful and/or successful. For example, for the proposed drug to be useful and/or successful, patients must participate in the study for at least 4 weeks, should have at least 75 percent negative urines, and should have a craving score of no more than 50 on at least 75 percent of their clinic visits. The performances on different measures for each patient can then be combined to obtain some sort of combined score without statistical manipulations. These combined scores can then be subjected to formal analysis.

Dr. Fudala did not think a single composite score would be clinically meaningful. Dr. Shu expressed the need for a more focused study and the need for defining a primary outcome measure, for example, the reduction in incidence of drug abuse. Other secondary measures, such as rating scales to measure patients’ physical or psychological dependence on drugs, can be developed. Craving scores, a nonthreatening measure, can also be used. Correlations between craving scores and incidence of drug abuse can be studied. The bias in obtaining some of the soft measures can be reduced and/or eliminated if blinded evaluating physicians, separated from treating physicians, take these measures. As Dr. Geller put it, there are likely to be situations where combining several measures (test statistics) would be appropriate, and in some cases, it would be more appropriate to have one primary and several secondary measures (statistics).

HIGH AND DIFFERENTIAL DROPOUT RATES AND THEIR INFLUENCE ON ESTIMATION OF TREATMENT EFFECT

Almost all participants were concerned about the high dropout rates in these trials. Dr. Jewell thought, with that much missing/dropout data “one can say anything about treatment effect.” In “a trial of this kind . . .
where 80 percent of the patients on placebo were not available with regard to any kind of outcome or restricted . . . outcome, there is just no way one would be able to make sense of what that meant with regard to an active drug . . . .” He advised that pilot studies that concentrate on the ability to collect outcome data of specified kinds be conducted so that, for instance, specific patient populations available for specific treatments and followup can be identified.

There was an agreement that high and differential dropout rates for different reasons in different treatment groups can compromise estimation of treatment effects. For example, the drugs may “cure” certain patients, and as such, they do not need treatment anymore (and they drop out). In other cases, the drug may be a failure, and as such, the patients do not come back. Some patients may drop out because of toxicity even if the drug was working for them. Other times the pattern of dropouts may be related to covariates such as age, sex, and marital status.

In the opinion of Dr. Gupta, dropout rates may contain information about the treatment effect, and as such, dropout rates should be analyzed as a separate outcome variable in addition to other variables related to the improvement in the condition of patients. Dr. Jack C. Lee also said, “. . . if the missingness has any valuable information for the relevance of the outcome variable, which I think in this case is, then make it a part of the analysis . . . .”

According to Dr. Murphy, the only legitimate discontinuation from the study is due to toxicity. Dropouts have “only a fraction to do with the therapeutic intervention in place. . . . it is an issue of staff investment.” Clinical staff should prevent that from happening. Dr. Johnson commented that an administrative dropout after three consecutive missed visits may be one of the reasons for a large dropout rate in the ARC 090 trial, but “. . .
when you don’t put a restriction on with medications that have physical dependence-producing properties you bring in a lot of complications that I don’t know how you deal with . . . .” Dr. Follmann suggested that efforts could have been made to collect urine samples from the patients even after they were out of the trial. Dr. Johnson retorted that they (patients) could have been paid for coming back to provide urine samples but that a provision like that would constitute payment for dropping out of the trial, which brings in an ethical problem because these payments could be used to buy drugs of abuse. Dr. Johnson remarked, “. . . how do you get people back when they did not come to begin with?” Missed visits are an inherent phenomenon in these patient populations.

It was suggested that sensitivity analysis recommended by Dr. Fisher and the approaches proposed by Drs. Weng (this volume) and Follmann and colleagues (this volume) are promising tools to adjust for missing/censored data.

ALTERNATE STUDY DESIGNS

Comparative Dose Trials

Most of the participants recognized the usefulness of comparative dose trials. However, two issues must be dealt with in these kinds of designs. First, the selection of doses should be done carefully. As Dr. Murphy said, “Early in the development especially, it is very difficult to judge where one is on the dose effect curve. One could be asymptotically too high or too low.” Dr. Shu suggested the physicians should be allowed to titrate in the first studies to select the doses they feel comfortable with, and these selected doses can be used in dose comparison trials. The second difficulty in doing these kinds of trials lies in that, in the neuropsychiatric area, the effect sizes are very small. Relatively large sample sizes may be required to detect differences between these small effects for different treatment groups. However, Dr. Fisher was, in general, not in favor of doing forced dose escalation studies.
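
The sample size concern raised above for dose comparison trials can be made concrete with a small sketch. It uses the standard normal-approximation formula for comparing two independent proportions with a two-sided test; that choice of formula, the response rates, the significance level, and the power are assumptions chosen only for illustration, and n_per_group is a hypothetical helper rather than anything presented at the meeting.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p1 vs. p2.

    Normal-approximation formula for a two-sided test of two
    independent proportions with equal allocation.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    p_bar = (p1 + p2) / 2                        # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Assumed proportions of positive urines in two dose groups (illustrative only).
for low_dose_rate, high_dose_rate in [(0.50, 0.30), (0.50, 0.40), (0.50, 0.45)]:
    print(low_dose_rate, high_dose_rate, n_per_group(low_dose_rate, high_dose_rate))
```

Because the required number of patients grows with the inverse square of the difference in rates, halving the difference between two dose groups roughly quadruples the sample size. The calculation ignores dropout, which in these trials would inflate the requirement further.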
“Enrichment” Designs

It would be of interest to physicians to minimize the damage to those who cannot benefit from a proposed new therapy. Those who can benefit probably can be distinguished from those who cannot in a double blind trial where everybody receives medication. Then, in a random double blind second (e.g., dose comparison) trial, or probably the second phase of the same trial, only the responders can be studied. At the request of Dr. Donald F. Klein, professor of psychiatry at the College of Physicians and Surgeons of Columbia University, I asked the participants to comment on the usefulness of such designs in the drug abuse area.

Dr. Jewell thought the main idea behind these kinds of designs is to identify subgroups based on some sort of covariate information that amounts to evaluation of interaction effects. If so, these trials could result in large sample size requirements. Dr. Fisher said that these trials are initiated when physicians intuitively feel that some substrata of patient population are biologically and/or psychologically so different from the others that for some there is no hope, whereas others can respond. This is an efficient way to show activity. As Dr. Jack C. Lee said, these designs can be useful in determining the profiles of the drug addicts who respond. This information can then be used to determine eligibility for future studies. These designs can be used to demonstrate heterogeneity between responders and nonresponders, but this heterogeneity, if present, has to be rooted, in Dr. Murphy's opinion, in some sort of biological substrate. It should flow from theory into practice.

Several participants expressed reservations and struck a note of caution about the use of “enrichment” designs. The analysis of these designs is difficult because of, among other things, possible carryover effects. The interpretation of results obtained by analyzing these trials presents serious challenges. Dr.
Vocci presented two examples where one can incorrectly conclude that an alternate medication is ineffective. “If you give an agonist and find out who responds to the agonist, and then you are testing an agonist in an enrichment design vs. a partial agonist, what you may be doing is undermedicating the patients with partial agonist and making it look worse . . . . A study was done in which anxiety patients were tested for benzodiazepine receptor, benzodiazepine sensitivity, if they were actually helped by benzodiazepine, and then they were randomized to the same benzodiazepine or a nonbenzodiazepine, and what happened was the patients who were randomized to the nonbenzodiazepine had this mild withdrawal, which was not helped by the new agent, and many of them dropped out as a result. . . . This withdrawal syndrome actually exacerbated their anxiety.” Dr. Greenhouse presented a similar example from a trial done in the psychiatric area.

Consequently, although enrichment designs do have a place in the drug development process (particularly when a new chemical entity is being studied), where a post hoc analysis of the data will be meaningful in identifying easily discernible (by the average physician), clinically relevant baseline variables that separate responders from nonresponders, prospectively, that “good” group should be studied in a traditional clinical trial.

AUTHOR

Ram B. Jain, Ph.D. Mathematical Statistician Biometrics Branch Medications Development Division National Institute on Drug Abuse Parklawn Building, Room 11A-55 5600 Fishers Lane Rockville, MD 20857

Technical Review Committee Meeting: Statistical Issues in Clinical Trials for Treatment of Opiate Dependence

Medications Development Division National Institute on Drug Abuse December 2-3, 1991

List of Participants

Jack D. Blaine, M.D.
Chief Treatment Research Branch Division of Clinical Research National Institute on Drug Abuse Parklawn Building, Room 10A-30 5600 Fishers Lane Rockville, MD 20857
Robert J. Chiarello, M.D. Medical Officer Treatment Research Branch Division of Preclinical Research National Institute on Drug Abuse Parklawn Building, Room 10A-30 5600 Fishers Lane Rockville, MD 20857
Joseph Collins, Sc.D. Chief Cooperative Studies Program Coordinating Center, 151E Department of Veterans Affairs Medical Center Perry Point, MD 21902
Edward J. Cone, Ph.D. Chief Laboratory of Chemistry and Drug Metabolism Addiction Research Center National Institute on Drug Abuse P.O. Box 5180 Baltimore, MD 21224
Sandra L. Dickerson, B.S. Medical Technologist Addiction Research Center National Institute on Drug Abuse P.O. Box 5180 Baltimore, MD 21224
Lloyd D. Fisher, Ph.D. Professor Department of Statistics SC-32 University of Washington 19220 64th Place, N.E. Seattle, WA 98155
Dean Follmann, Ph.D. Mathematical Statistician Biostatistics Research Branch National Heart, Lung, and Blood Institute Federal Building, Room 2A11 Bethesda, MD 20892
Paul J. Fudala, Ph.D. Assistant Professor Department of Psychiatry University of Pennsylvania School of Medicine and the Department of Veterans Affairs Medical Center Building 15 University and Woodland Avenues Philadelphia, PA 19104
Nancy L. Geller, Ph.D. Chief Biostatistics Research Branch National Heart, Lung, and Blood Institute Federal Building, Room 2A11 Bethesda, MD 20892
Albert J. Getson, Ph.D. Associate Director Merck Clinical Biostatistical Research Labs BL 3-2 West Point, PA 19486
Harold Gordon, Ph.D. Project Officer Division of Epidemiology and Prevention Research National Institute on Drug Abuse 5600 Fishers Lane 615 Rockwall II Rockville, MD 20852
David A. Gorelick, M.D., Ph.D. Chief Treatment Research Branch Addiction Research Center National Institute on Drug Abuse P.O. Box 5180 4940 Eastern Avenue Baltimore, MD 21224
Charles W.
Gorodetzky, M.D., Ph.D. Executive Director CNS Drug Development CIBA-Geigy Corporation 556 Morris Avenue Summit, NJ 07901
Joel B. Greenhouse, Ph.D. Associate Professor Department of Statistics Carnegie-Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213-3890
Alan J. Gross, Ph.D. Professor Department of Biostatistics, Epidemiology, and Systems Science Medical University of South Carolina College of Graduate Studies 171 Ashley Avenue Charleston, SC 29425-2503
Charles V. Grudzinskas, Ph.D. Director Medications Development Division National Institute on Drug Abuse Parklawn Building, Room 11A-55 5600 Fishers Lane Rockville, MD 20857
Sudhir C. Gupta, Ph.D. Associate Professor Statistics Department Northern Illinois University DeKalb, IL 60115-2888
A.S. Hedayat, Ph.D. Professor Department of Mathematics, Statistics, and Computer Sciences, MC249 University of Illinois P.O. Box 438 Chicago, IL 60680
John Hyde, M.D., Ph.D. Medical Officer Pilot Drug Evaluation Division, HFD-007 Food and Drug Administration 5600 Fishers Lane Rockville, MD 20857
Ram B. Jain, Ph.D. Mathematical Statistician Biometrics Branch Medications Development Division National Institute on Drug Abuse Parklawn Building, Room 11A-55 5600 Fishers Lane Rockville, MD 20857
Donald R. Jasinski, M.D. Chief Center for Chemical Dependence Francis Scott Key Medical Center 4940 Eastern Avenue A-1-East Baltimore, MD 21224
Nicholas P. Jewell, Ph.D. Professor of Biostatistics University of California at Berkeley School of Public Health Berkeley, CA 94720
Rolley E. Johnson, Pharm.D. Assistant Professor Department of Psychiatry and Behavioral Sciences Johns Hopkins School of Medicine Building G, Room 2725 B5510 Nathan Shock Drive Baltimore, MD 21224
William F. Krol, Ph.D. Chief Statistical Programming Section, 151E Department of Veterans Affairs Medical Center Perry Point, MD 21902
Peter A. Lachenbruch, Ph.D.
Professor of Biostatistics University of California, Los Angeles School of Public Health Los Angeles, CA 90024-1772 Young Jack C. Lee, Ph.D. Chief Biometry and Mathematical Statistics Branch National Institute of Child Health and Human Development National Institutes of Health Executive Plaza North, Room 630 Bethesda, MD 20892 Mei-Ling Ting Lee, Ph.D. Assistant Professor Mathematics Department Boston University 111 Cummington Street Boston, MA 02215 Shou-Hua Li, Ph.D. Statistician Biometrics Unit, EODPP Westwood Building, Room 533 National Institute of Dental Research Bethesda, MD 20892 Michael Murphy, M.D. Medical Director Neurosciences—SBU Hoechst Roussel Pharmaceutical, Inc. Route 202-206 North Somerville, NJ 08876-1258 Taesung Park, Ph.D. Mathematical Statistician National Institute of Child Health and Human Development National Institutes of Health Executive Plaza North, Room 630 Bethesda, MD 20892 Carol K. Redmond, Ph.D. Professor Department of Biostatistics University of Pittsburgh 318 Parran Hall Pittsburgh, PA 15261 Saul Rosenberg, Ph.D. Mathematical Statistician Medications Development Division National Institute on Drug Abuse Parklawn Building, Room 11A-55 5600 Fishers Lane Rockville, MD 20857 Vincent Shu, Ph.D. Section Head Clinical Statistics and System Development D436 AP6C Abbott Laboratories One North Abbott Park Road Abbott Park, IL 60064 Richard Stein, Ph.D. Mathematical Statistician Food and Drug Administration HFD-007, Room 9B45 5600 Fishers Lane Rockville, MD 20857 Ram C. Tiwari, Ph.D. Associate Professor Department of Mathematics University of North Carolina at Charlotte Charlotte, NC 28223 Frank J. Vocci, Ph.D. Deputy Director Medications Development Division National Institute on Drug Abuse Parklawn Building, Room 11A-55 5600 Fishers Lane Rockville, MD 20857 L.J. Wei, Ph.D. Professor Department of Biostatistics Harvard School of Public Health 677 Huntington Avenue Boston, MA 02115 T.S. Weng, Ph.D.
Mathematical Statistician Division of Biometric Sciences Office of Science and Technology Center for Devices and Radiological Health Food and Drug Administration Suite 405 1801 Rockville Pike Rockville, MD 20852 Curtis Wright, M.D. Medical Officer Pilot Drug Evaluation Division Food and Drug Administration 5600 Fishers Lane Rockville, MD 20852 Margaret Wu, Ph.D. Mathematical Statistician Biostatistics Research Branch National Heart, Lung, and Blood Institute National Institutes of Health Federal Building, Room 2A11 Bethesda, MD 20892 National Institute on Drug Abuse Research Monograph Series While limited supplies last, single copies of the monographs may be obtained free of charge from the Office for Substance Abuse Prevention’s National Clearinghouse for Alcohol and Drug Information (ONCADI). Please contact ONCADI also for information about availability of coming issues and other publications of the National Institute on Drug Abuse relevant to drug abuse research. Additional copies may be purchased from the U.S. Government Printing Office (GPO) and/or the National Technical Information Service (NTIS) as indicated. NTIS prices are for paper copy; add $3 handling charge for each order. Microfiche copies are also available from NTIS. Prices from either source are subject to change. Addresses are: ONCADI National Clearinghouse for Alcohol and Drug Information P.O. Box 2345 Rockville, MD 20852 (301) 468-2600 (800) 729-6686 GPO Superintendent of Documents U.S. Government Printing Office Washington, DC 20402 (202) 275-2981 NTIS National Technical Information Service U.S. Department of Commerce Springfield, VA 22161 (703) 487-4650 For information on availability of NIDA Research Monographs 1 through 70 (1975-1986) and others not listed, write to NIDA, Community and Professional Education Branch, Room 10A-54, 5600 Fishers Lane, Rockville, MD 20857. 71 OPIATE RECEPTOR SUBTYPES AND BRAIN FUNCTION. Roger M. Brown, Ph.D.; Doris H. Clouet, Ph.D.; and David P. Friedman, Ph.D., eds.
GPO out of stock NTIS PB #89-151955/AS $31 72 RELAPSE AND RECOVERY IN DRUG ABUSE. Frank M. Tims, Ph.D., and Carl G. Leukefeld, D.S.W., eds. GPO Stock #017-024-01302-1 $6 NTIS PB #89-151963/AS $31 73 URINE TESTING FOR DRUGS OF ABUSE. Richard L. Hawks, Ph.D., and C. Nora Chiang, Ph.D., eds. GPO Stock #017-024-01313-7 $3.75 NTIS PB #89-151971/AS $23 74 NEUROBIOLOGY OF BEHAVIORAL CONTROL IN DRUG ABUSE. Stephen I. Szara, M.D., D.Sc., ed. GPO Stock #017-024-01314-5 $3.75 NTIS PB #89-151989/AS $23 75 PROGRESS IN OPIOID RESEARCH. PROCEEDINGS OF THE 1986 INTERNATIONAL NARCOTICS RESEARCH CONFERENCE. John W. Holaday, Ph.D.; Ping-Yee Law, Ph.D.; and Albert Herz, M.D., eds. GPO out of stock ONCADI out of stock Not available from NTIS 76 PROBLEMS OF DRUG DEPENDENCE, 1986: PROCEEDINGS OF THE 48TH ANNUAL SCIENTIFIC MEETING, THE COMMITTEE ON PROBLEMS OF DRUG DEPENDENCE, INC. Louis S. Harris, Ph.D., ed. GPO out of stock ONCADI out of stock NTIS PB #88-208111/AS $53 77 ADOLESCENT DRUG ABUSE: ANALYSES OF TREATMENT RESEARCH. Elizabeth R. Rahdert, Ph.D., and John Grabowski, Ph.D., eds. GPO Stock #017-024-01348-0 $4 ONCADI out of stock NTIS PB #89-125488/AS $23 78 THE ROLE OF NEUROPLASTICITY IN THE RESPONSE TO DRUGS. David P. Friedman, Ph.D., and Doris H. Clouet, Ph.D., eds. GPO out of stock NTIS PB #88-245683/AS $31 79 STRUCTURE-ACTIVITY RELATIONSHIPS OF THE CANNABINOIDS. Rao S. Rapaka, Ph.D., and Alexandros Makriyannis, Ph.D., eds. GPO out of stock NTIS PB #89-109201/AS $31 80 NEEDLE SHARING AMONG INTRAVENOUS DRUG ABUSERS: NATIONAL AND INTERNATIONAL PERSPECTIVES. Robert J. Battjes, D.S.W., and Roy W. Pickens, Ph.D., eds. GPO out of stock NTIS PB #88-236138/AS $31 81 PROBLEMS OF DRUG DEPENDENCE, 1987: PROCEEDINGS OF THE 49TH ANNUAL SCIENTIFIC MEETING, THE COMMITTEE ON PROBLEMS OF DRUG DEPENDENCE, INC. Louis S. Harris, Ph.D., ed. GPO Stock #017-024-01354-4 $17 NTIS PB #89-109227/AS Contact NTIS for price 82 OPIOIDS IN THE HIPPOCAMPUS. Jacqueline F. McGinty, Ph.D., and David P.
Friedman, Ph.D., eds. GPO out of stock NTIS PB #88-245691/AS $23 83 HEALTH HAZARDS OF NITRITE INHALANTS. Harry W. Haverkos, M.D., and John A. Dougherty, Ph.D., eds. GPO out of stock NTIS PB #89-125496/AS $23 84 LEARNING FACTORS IN SUBSTANCE ABUSE. Barbara A. Ray, Ph.D., ed. GPO Stock #017-024-01353-6 $6 NTIS PB #89-125504/AS $31 85 EPIDEMIOLOGY OF INHALANT ABUSE: AN UPDATE. Raquel A. Crider, Ph.D., and Beatrice A. Rouse, Ph.D., eds. GPO Stock #017-024-01360-9 $5.50 NTIS PB #89-123178/AS $31 86 COMPULSORY TREATMENT OF DRUG ABUSE: RESEARCH AND CLINICAL PRACTICE. Carl G. Leukefeld, D.S.W., and Frank M. Tims, Ph.D., eds. GPO Stock #017-024-01352-8 $7.50 NTIS PB #89-151997/AS $31 87 OPIOID PEPTIDES: AN UPDATE. Rao S. Rapaka, Ph.D., and Bhola N. Dhawan, M.D., eds. GPO Stock #017-024-01366-8 $7 NTIS PB #89-158430/AS $45 88 MECHANISMS OF COCAINE ABUSE AND TOXICITY. Doris H. Clouet, Ph.D.; Khursheed Asghar, Ph.D.; and Roger M. Brown, Ph.D., eds. GPO Stock #017-024-01359-5 $11 NTIS PB #89-125512/AS $39 89 BIOLOGICAL VULNERABILITY TO DRUG ABUSE. Roy W. Pickens, Ph.D., and Dace S. Svikis, B.A., eds. GPO Stock #017-022-01054-2 $5 NTIS PB #89-125520/AS $23 90 PROBLEMS OF DRUG DEPENDENCE 1988: PROCEEDINGS OF THE 50TH ANNUAL SCIENTIFIC MEETING, THE COMMITTEE ON PROBLEMS OF DRUG DEPENDENCE, INC. Louis S. Harris, Ph.D., ed. GPO Stock #017-024-01362-5 $17 91 DRUGS IN THE WORKPLACE: RESEARCH AND EVALUATION DATA. Steven W. Gust, Ph.D., and J. Michael Walsh, Ph.D., eds. GPO Stock #017-024-01384-6 $10 NTIS PB #90-147257/AS $39 92 TESTING FOR ABUSE LIABILITY OF DRUGS IN HUMANS. Marian W. Fischman, Ph.D., and Nancy K. Mello, Ph.D., eds. GPO Stock #017-024-01379-0 $12 NTIS PB #90-148933/AS $45 93 AIDS AND INTRAVENOUS DRUG USE: FUTURE DIRECTIONS FOR COMMUNITY-BASED PREVENTION RESEARCH. C.G. Leukefeld, D.S.W.; R.J. Battjes, D.S.W.; and Z. Amsel, D.Sc., eds. GPO Stock #017-024-01388-9 $10 NTIS PB #90-148941/AS $39 94 PHARMACOLOGY AND TOXICOLOGY OF AMPHETAMINE AND RELATED DESIGNER DRUGS.
Khursheed Asghar, Ph.D., and Errol De Souza, Ph.D., eds. GPO Stock #017-024-01386-2 $11 NTIS PB #90-148958/AS $39 95 PROBLEMS OF DRUG DEPENDENCE 1989: PROCEEDINGS OF THE 51ST ANNUAL SCIENTIFIC MEETING, THE COMMITTEE ON PROBLEMS OF DRUG DEPENDENCE, INC. Louis S. Harris, Ph.D., ed. GPO Stock #017-024-01399-4 $21 NTIS PB #90-237660/AS $67 96 DRUGS OF ABUSE: CHEMISTRY, PHARMACOLOGY, IMMUNOLOGY, AND AIDS. Phuong Thi Kim Pham, Ph.D., and Kenner Rice, Ph.D., eds. GPO Stock #017-024-01403-6 $8 NTIS PB #90-237678/AS $31 97 NEUROBIOLOGY OF DRUG ABUSE: LEARNING AND MEMORY. Lynda Erinoff, Ph.D., ed. GPO Stock #017-024-01404-4 $8 NTIS PB #90-237686/AS $31 98 THE COLLECTION AND INTERPRETATION OF DATA FROM HIDDEN POPULATIONS. Elizabeth Y. Lambert, M.S., ed. GPO Stock #017-024-01407-9 $4.75 NTIS PB #90-237694/AS $23 99 RESEARCH FINDINGS ON SMOKING OF ABUSED SUBSTANCES. C. Nora Chiang, Ph.D., and Richard L. Hawks, Ph.D., eds. GPO Stock #017-024-01412-5 $5 NTIS PB #91-141119 $23 100 DRUGS IN THE WORKPLACE: RESEARCH AND EVALUATION DATA. VOL. II. Steven W. Gust, Ph.D.; J. Michael Walsh, Ph.D.; Linda B. Thomas, B.S.; and Dennis J. Crouch, M.B.A., eds. GPO Stock #017-024-01458-3 $8 101 RESIDUAL EFFECTS OF ABUSED DRUGS ON BEHAVIOR. John W. Spencer, Ph.D., and John J. Boren, Ph.D., eds. GPO Stock #017-024-01426-7 $6 NTIS PB #91-172858/AS $31 102 ANABOLIC STEROID ABUSE. Geraline C. Lin, Ph.D., and Lynda Erinoff, Ph.D., eds. GPO Stock #017-024-01425-7 $8 NTIS PB #91-172866/AS $31 103 DRUGS AND VIOLENCE: CAUSES, CORRELATES, AND CONSEQUENCES. Mario De La Rosa, Ph.D.; Elizabeth Y. Lambert, M.S.; and Bernard Gropper, Ph.D., eds. GPO Stock #017-024-01427-3 $9 NTIS PB #91-172841/AS $31 104 PSYCHOTHERAPY AND COUNSELING IN THE TREATMENT OF DRUG ABUSE. Lisa Simon Onken, Ph.D., and Jack D. Blaine, M.D., eds.
GPO Stock #017-024-01429-0 $4 NTIS PB #91-172874/AS $23 105 PROBLEMS OF DRUG DEPENDENCE, 1990: PROCEEDINGS OF THE 52ND ANNUAL SCIENTIFIC MEETING, THE COMMITTEE ON PROBLEMS OF DRUG DEPENDENCE, INC. Louis S. Harris, Ph.D., ed. GPO Stock #017-024-01435-4 $22 106 IMPROVING DRUG ABUSE TREATMENT. Roy W. Pickens, Ph.D.; Carl G. Leukefeld, D.S.W.; and Charles R. Schuster, Ph.D., eds. GPO Stock #017-024-01439-7 $12 NTIS PB #92-105873 Paperback $50 Microfiche $19 107 DRUG ABUSE PREVENTION INTERVENTION RESEARCH: METHODOLOGICAL ISSUES. Carl G. Leukefeld, D.S.W., and William J. Bukoski, Ph.D., eds. GPO Stock #017-024-01446-0 $9 NTIS PB #92-160985 Paperback $35 Microfiche $17 108 CARDIOVASCULAR TOXICITY OF COCAINE: UNDERLYING MECHANISMS. Pushpa V. Thadani, Ph.D., ed. GPO Stock #017-024-01446-0 $7 NTIS PB #92-106608 Paperback $35 Microfiche $17 109 LONGITUDINAL STUDIES OF HIV INFECTION IN INTRAVENOUS DRUG USERS: METHODOLOGICAL ISSUES IN NATURAL HISTORY RESEARCH. Peter Hartsock, Dr.P.H., and Sander G. Genser, M.D., M.P.H., eds. GPO Stock #017-024-01445-1 $4.50 NTIS PB #92-106616 Paperback $26 Microfiche $12.50 110 THE EPIDEMIOLOGY OF COCAINE USE AND ABUSE. Susan Schober, Ph.D., and Charles Schade, M.D., M.P.H., eds. GPO Stock #017-024-01456-7 $11 NTIS PB #92-14624-0 Paperback $43 Microfiche $17 111 MOLECULAR APPROACHES TO DRUG ABUSE RESEARCH VOLUME I: RECEPTOR CLONING, NEUROTRANSMITTER EXPRESSION, AND MOLECULAR GENETICS. Theresa N.H. Lee, Ph.D., ed. Not for sale at GPO NTIS PB #92-135743 Paperback $35 Microfiche $17 112 EMERGING TECHNOLOGIES AND NEW DIRECTIONS IN DRUG ABUSE RESEARCH. Rao S. Rapaka, Ph.D.; Alexandros Makriyannis, Ph.D.; and Michael J. Kuhar, Ph.D., eds. GPO Stock #017-024-01455-9 $11 113 ECONOMIC COSTS, COST-EFFECTIVENESS, FINANCING, AND COMMUNITY-BASED DRUG TREATMENT. William S. Cartwright, Ph.D., and James M. Kaple, Ph.D., eds.
Not for sale at GPO NTIS PB #92-155795 Paperback $35 Microfiche $17 114 METHODOLOGICAL ISSUES IN CONTROLLED STUDIES ON EFFECTS OF PRENATAL EXPOSURE TO DRUG ABUSE. M. Marlyne Kilbey, Ph.D., and Khursheed Asghar, Ph.D., eds. GPO Stock #017-024-01459-1 $12 NTIS PB #92-146216 Paperback $43 Microfiche $17 115 METHAMPHETAMINE ABUSE: EPIDEMIOLOGIC ISSUES AND IMPLICATIONS. Marissa A. Miller, D.V.M., M.P.H., and Nicholas J. Kozel, M.S., eds. GPO Stock #017-024-01460-5 $4 116 DRUG DISCRIMINATION: APPLICATIONS TO DRUG ABUSE RESEARCH. Richard A. Glennon, Ph.D.; Torbjorn U.C. Jarbe, Ph.D.; and Jerry Frankenheim, Ph.D., eds. GPO Stock #017-024-01470-2 $13 117 METHODOLOGICAL ISSUES IN EPIDEMIOLOGICAL, PREVENTION, AND TREATMENT RESEARCH ON DRUG-EXPOSED WOMEN AND THEIR CHILDREN. M. Marlyne Kilbey, Ph.D., and Khursheed Asghar, Ph.D., eds. GPO Stock #017-024-01472-9 $12 118 DRUG ABUSE TREATMENT IN PRISONS AND JAILS. Carl G. Leukefeld, D.S.W., and Frank M. Tims, Ph.D., eds. GPO Stock #017-024-01473-7 $16 119 PROBLEMS OF DRUG DEPENDENCE 1991: 53RD ANNUAL SCIENTIFIC MEETING, THE COMMITTEE ON PROBLEMS OF DRUG DEPENDENCE, INC. Louis S. Harris, Ph.D., ed. GPO Stock #017-024-01474-5 $22 120 BIOAVAILABILITY OF DRUGS TO THE BRAIN AND THE BLOOD-BRAIN BARRIER. Jerry Frankenheim, Ph.D., and Roger M. Brown, Ph.D., eds. 121 BUPRENORPHINE: AN ALTERNATIVE TREATMENT FOR OPIOID DEPENDENCE. Jack D. Blaine, Ph.D., ed. 122 RESEARCH METHODS IN WORKPLACE SETTINGS. Helen Axel, M.A., and Dennis J. Crouch, M.B.A., eds. 123 ACUTE COCAINE INTOXICATION: CURRENT METHODS OF TREATMENT. Heinz Sorer, Ph.D., ed. 124 NEUROBIOLOGICAL APPROACHES TO BRAIN-BEHAVIOR INTERACTION. Roger M. Brown, Ph.D., and Joseph Frascella, Ph.D., eds. 125 ACTIVATION OF IMMEDIATE EARLY GENES OF ABUSE. Reinhard Grzanna, Ph.D., and Roger M. Brown, Ph.D., eds. 126 MOLECULAR APPROACHES TO DRUG ABUSE RESEARCH VOLUME II: STRUCTURE, FUNCTION, AND EXPRESSION. Theresa N.H. Lee, Ph.D., ed.