key: cord-0001746-isyjyqjj authors: Cleveland, R. H.; Schluchter, Mark; Wood, Beverly P.; Berdon, Walter E.; Boechat, M. Ines; Easley, Kirk A.; Meziane, Moulay; Mellins, Robert B.; Norton, Karen I.; Singleton, Edward; Trautwein, Lynn title: Chest radiographic data acquisition and quality assurance in multicenter studies date: 1997 journal: Pediatr Radiol DOI: 10.1007/s002470050262 sha: 705e54c09a078bafb517d4bb0953b6735c7b9ab9 doc_id: 1746 cord_uid: isyjyqjj

Background. Multicenter studies rely on data derived from different institutions. Forms can be designed to standardize the reporting process, allowing reliable comparison of data. Objective. The purpose of this report is to provide a standardized method, developed as part of a multicenter study of vertically transmitted HIV, for assessing chest radiographic results. Materials and methods. Eight hundred and five infants and children were studied at five centers; 3057 chest radiographs were scored. Data were entered using a forced-choice, graded response for 12 findings. Quality assurance measures and inter-rater agreement statistics are reported. Results. The form used for reporting chest radiographic results is presented. Inter-rater agreement was moderate to high for most findings, with the best correlation reported for the presence of bronchovascular markings and/or reticular densities addressed as a composite question (kappa = 0.71). The presence of nodular densities (kappa = 0.56) and parenchymal consolidation (kappa = 0.57) had moderate agreement. Agreement for lung volume was low. Conclusion. The current tool, developed for use in the pediatric population, is applicable to any study involving the assessment of pediatric chest radiographs for a large population, whether at one or many centers.

ic interpretation and reporting among radiologists representing five centers. A radiology subcommittee of coinvestigators developed a standardized reporting form. This form required specific responses concerning observations on chest radiographs, similar to the forms used in the adult classification of pneumoconiosis [2–4] and in assessing the severity of adult respiratory distress syndrome (ARDS) [5]. Standardized methods also exist for evaluating interstitial lung disease in adults [6, 7] and for predicting oxygen dependency in premature infants [8]. This report describes the first standardized interpretation and reporting method for chest radiographs developed for use in children (other than premature newborns) and summarizes the results of the quality assurance studies used to judge inter-rater agreement. This standardized method should be applicable to other multicenter studies in which changes in chest radiographic findings are anticipated.
Five pediatric centers in the United States are involved in the collaborative assessment: Baylor College of Medicine, Children's Hospital of Boston/Harvard Medical School/Boston University School of Medicine, Mount Sinai School of Medicine of New York, Presbyterian Hospital of New York City/Columbia University, and University of California, Los Angeles School of Medicine/Children's Hospital Los Angeles/University of Southern California School of Medicine. The data have been collected on infants and children with, or at risk of, vertically transmitted (mother to child) HIV infection. The design of the P2C2 HIV Study has been previously reported [1].

Two hundred and five infants (older than 28 days) and children with clinically diagnosed vertically transmitted HIV infection constituted group I, and 600 live-born infants born to HIV-infected mothers constituted group II. Through January 1997, 93 of the 600 group II children were subsequently found to be HIV infected and were categorized into group II A. Four hundred and sixty-three of the 600 did not become infected with HIV and constitute group II B. Forty-four infants died or were lost to follow-up before their HIV status was determined. A subset of 216 of the group II B children was selected randomly for long-term follow-up.

Follow-up consisted of regularly scheduled chest radiographs, performed in group II at ages 3, 12, and 18 months and annually thereafter in group II A, and at yearly intervals in group I, as well as intercurrent examinations of acutely ill children in both groups I and II. Recruitment of study subjects began in May 1990 and continued through January 1994, with follow-up continuing through January 1997. At the time of this report, 3057 chest radiographs had been performed on the children enrolled in this study.
Specific radiographic criteria were included for assessment: (1) lung volume, (2) the presence of nodular densities, (3) parenchymal reticular densities, (4) parenchymal consolidation, (5) cystic lesions, (6) pleural effusions, (7) pneumothorax, (8) hilar adenopathy, (9) heart size, (10) osseous changes, (11) additional abnormalities. A separate score for bronchovascular (BV) markings was added to the form after initial assessment of results (Fig. 1). Films were scored as normal or abnormal. Lung volume was scored as normal, low, or increased. If a film was abnormal, specific observations were scored as follows. For BV markings, nodular densities, reticular densities, and parenchymal consolidation, scoring was based on a three-point scale: (1) absent or normal; (2) equivocal, undecided or ill-defined; (3) definitely present or increased. For the categories “ill-defined”, “definitely present”, and “increased”, a sub-classification evaluated location, profusion, and/or size. Pleural effusions were listed as absent or present, with location and size recorded. Adenopathy was recorded as present or absent, with location indicated. Cysts, pneumothorax, osseous changes and “other abnormalities” were recorded as absent or present. Heart size was assessed as normal or enlarged (Fig. 1).

A review process was undertaken to establish quality assurance for inter-rater reliability. A stratified random sample of ten radiographs was selected from each of the clinical centers. Of these ten radiographs, seven were randomly chosen from those initially interpreted by the center's radiologists as abnormal; the remainder were a random sample of normal films, for a maximum of ten films per center. In the initial three evaluations of quality assurance, a film was adjudged abnormal if it had any abnormality at all, as indicated by an answer of “abnormal” for question 3 on the evaluation questionnaire (Fig. 1). Two other study radiologists reviewed the same sample of films.
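The per-center sample selection described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function name, the record layout (`film_id`, `abnormal`), and the cap of three normal films per center are assumptions made for the example.

```python
import random


def select_qa_sample(films, n_total=10, n_abnormal=7, seed=None):
    """Draw a stratified random QA sample for one center.

    films: list of dicts with hypothetical keys 'film_id' and 'abnormal'
    (True if the center's radiologist originally read the film as abnormal).
    Assumption: at most n_total - n_abnormal normal films are drawn, so the
    sample may be smaller than n_total when a stratum is short of films.
    """
    rng = random.Random(seed)
    abnormal = [f for f in films if f["abnormal"]]
    normal = [f for f in films if not f["abnormal"]]

    # Stratum 1: films originally read as abnormal.
    picked_abnormal = rng.sample(abnormal, min(n_abnormal, len(abnormal)))
    # Stratum 2: films originally read as normal.
    n_normal = min(n_total - n_abnormal, len(normal))
    picked_normal = rng.sample(normal, n_normal)
    return picked_abnormal + picked_normal


# Hypothetical center with 12 abnormal and 8 normal films.
center_films = [{"film_id": i, "abnormal": i < 12} for i in range(20)]
qa_sample = select_qa_sample(center_films, seed=1)
```

Sampling within each stratum separately keeps the ratio of abnormal to normal films fixed by design rather than left to chance, which matters when abnormal films are uncommon at a center.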
The reviewers did not have access to the original interpretation or to the HIV status of the child. All of the sample films were re-interpreted by a radiologist at the coordinating center, who was also blinded to the status of the patient. The results of the quality assurance reading were summarized and reviewed.

After the initial three rounds of quality assurance, the results indicated that some of the interpreting radiologists were including the observation of increased BV markings and bronchial wall thickening as a component of reticular densities, while others were applying the observation of reticular densities to a finer, more diffuse, interstitial abnormality. Thus, an observation of increased BV markings could have been included in the reticular densities category or in the “other abnormalities” category. Thereafter, a new category was established for BV markings (Fig. 2), scored separately from reticular densities (Fig. 3). All radiographs initially interpreted as having either increased BV markings or reticular densities were reevaluated using the revised form. A fourth round of quality assurance was undertaken to examine inter-rater reliability for BV markings and reticular densities. This review included 39 films (12 originally read by the individual centers' radiologists as normal, 8 originally read as having increased BV markings, 7 as having reticular densities present, and 12 as having both increased BV markings and reticular densities). The films were selected randomly and re-read in a blinded fashion by two study radiologists at clinical centers as well as by the radiologist at the coordinating center.

Inter-rater reliability was summarized using a kappa statistic developed for use with multiple readers [9]. The kappa statistic is a chance-corrected measure of agreement which equals 0 if agreement among readers is equal to what would be expected based on chance alone, and 1 if there is perfect agreement among raters.
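A multiple-reader kappa of this kind can be computed as sketched below. The sketch assumes the statistic in [9] is the Fleiss multi-rater kappa and that every film is read by the same number of readers; the function name and the example counts are hypothetical and are not study data.

```python
def fleiss_kappa(counts):
    """Chance-corrected agreement among several readers (Fleiss-style kappa).

    counts[i][j] is the number of readers who assigned film i to category j.
    Every film must have been read by the same number of readers (>= 2).
    """
    n_films = len(counts)
    n_readers = sum(counts[0])
    if n_readers < 2 or any(sum(row) != n_readers for row in counts):
        raise ValueError("each film needs the same number (>= 2) of readings")
    n_categories = len(counts[0])

    # Share of all readings that fall in each category.
    p_category = [
        sum(row[j] for row in counts) / (n_films * n_readers)
        for j in range(n_categories)
    ]
    # Per-film agreement: fraction of reader pairs that chose the same category.
    p_film = [
        (sum(c * c for c in row) - n_readers) / (n_readers * (n_readers - 1))
        for row in counts
    ]
    p_observed = sum(p_film) / n_films
    p_chance = sum(p * p for p in p_category)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical example: 4 films, 3 readers, categories (absent, present).
perfect = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
split = fleiss_kappa([[2, 1], [1, 2], [2, 1], [1, 2]])
```

In the first example every reader agrees on every film, so kappa is 1; in the second the readers agree less often than chance would predict, so kappa is negative.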
It has been suggested [10] that values of kappa less than 0.40 represent poor agreement, values of 0.40–0.75 represent fair to good agreement, and values above 0.75 represent excellent agreement. Kappa statistics were not calculated if the chance agreement was greater than 90 %.

Results of the first three rounds of quality assurance are summarized in Table 1. The kappa statistics combining data from the three rounds were above 0.40 for all questions except lung volume (kappa = 0.21) and enlarged heart (kappa = 0.23). Review of the results for reticular densities led to the discovery of differential recording of BV markings versus reticular densities among the study radiologists. Results of the fourth round of quality assurance, focusing on BV markings and reticular densities (Table 2), found that the kappa statistic was low for reticular densities (kappa = 0.34 for absent vs present) and moderate for BV markings (kappa = 0.49 for absent vs undecided/increased). However, while radiologists varied in their interpretation of specific abnormalities as BV markings or reticular densities, agreement was better when considering the composite question of whether increased BV markings and/or reticular densities are present or absent (kappa = 0.71). Table 3 gives results for findings with low prevalence. Since chance agreement was greater than 90 % for these observations, kappa statistics were not calculated.

Fig. 1 The chest radiograph data collection form used by the P2C2 HIV study. Questions 4 and 5 on bronchovascular markings were added after the third round of quality assurance

Fig. 2 a–c Bronchovascular markings were defined as bronchial wall thickening separate from linear reticular interstitial prominence. In this child, they manifest as parallel bronchial walls seen in the long axis (tram tracks, open arrows), and rings (closed arrow) representing thickened bronchial walls seen on end. a PA projection; b lateral projection; c coned down image of posterior lung bases from the lateral projection

Reliable quantification of abnormalities recognized on plain chest radiographs has been successfully performed in the adult population by the use of data entry forms similar to that used in this study. These studies [2–8] have shown that subjective opinions can be rendered more reliable by using a forced-choice decision-making process (usually based on a sliding scale of severity or certainty). Use of the kappa statistic allows quantification of the relationship between chance and observed agreement by expressing the radiologists' agreement beyond chance as a proportion of the maximum possible agreement beyond chance, which is the difference between maximal agreement (100 %) and chance agreement. The kappa statistic of 0.54 (95 % confidence interval 0.47–0.61) for overall findings suggests moderate agreement among the radiologists.

This study shows a moderate to high inter-rater correlation for several observations made on chest radiographs of infants and small children (Tables 1, 2). Although radiologists were able to agree with moderate inter-rater reliability on most issues, BV markings and reticular densities, when assessed individually, showed poor to moderate inter-rater agreement. However, when the two were considered together, there was good agreement (Table 2). This, in part, may have been because of a lack of agreement among the radiologists as to the criteria for clearly differentiating between the two observations. The kappa statistics for nodular densities and parenchymal consolidation, considered as “absent” vs. “present”, were 0.56 and 0.57 respectively, both suggesting moderate agreement among the radiologists (Table 1). However, the strength of agreement for lung volume (normal vs. low vs. increased) was low (kappa = 0.21).
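The definition of kappa given above, agreement beyond chance divided by the maximum possible agreement beyond chance, can be checked numerically. The sketch below uses the observed and chance agreements reported in this paper for enlarged heart (90.5 % and 87.7 %) and the interpretation bands suggested in [10]; the function names and the second, lower chance agreement of 70 % are hypothetical and serve only as illustration.

```python
def kappa_from_agreement(observed, chance):
    """Kappa = (observed - chance) / (1 - chance), with proportions in [0, 1]."""
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance agreement must be in [0, 1)")
    return (observed - chance) / (1.0 - chance)


def interpret_kappa(kappa):
    """Bands cited in the text from [10]: <0.40, 0.40-0.75, >0.75."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "excellent"


# Enlarged heart: observed 90.5 %, chance 87.7 % (values reported in the paper).
enlarged_heart = kappa_from_agreement(0.905, 0.877)

# Hypothetical: same observed agreement, but chance agreement of only 70 %.
lower_chance = kappa_from_agreement(0.905, 0.70)
```

With chance agreement already at 87.7 %, only about 12 percentage points of improvement over chance are available, so even 90.5 % observed agreement yields a kappa of about 0.23. The same observed agreement against a lower chance agreement gives a substantially higher kappa.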
Since the prevalence of children with an enlarged heart as read by the original reader was only 6.9 %, and the observed and chance agreements were both high (90.5 % and 87.7 % respectively) (Table 1), the low kappa statistic of 0.23 may be misleading. It is well recognized that the statistic depends upon the proportion of children (prevalence) in each category. If more children with an enlarged heart had been included in the study but the observed agreement remained the same, the kappa statistic would be greater because the chance agreement would be lower. Among group I children, 194 had at least one cardiac echo study and at least one radiograph. Of these, 38 infants had a z-score for echo-determined end-diastolic dimension (EDD) of greater than 2.0; 15 (39 %) also had notation of an enlarged heart on chest radiograph. Of the remaining 156 children who never had an EDD z-score > 2.0, 119 (76 %) never had an enlarged heart on radiography (S. E. Lipshultz, personal communication).

This tool, developed for assessment of chest radiographs in the P2C2 HIV study, is applicable to any study involving the examination of pediatric chest radiographs for a large population, whether at one or many centers.

The following is a partial listing of the individuals and institutions participating in the P2C2 HIV Study. A full list of participants is provided in ref. 1.

National Heart, Lung and Blood Institute: Hannah Peavy, M.D. (Project Officer)
Children's Hospital UCLA School of Medicine/Children's Hospital

1. The pediatric pulmonary and cardiovascular complications of vertically transmitted HIV infection study: design and methods
2. Guidelines for the use of ILO international classification of radiographs of pneumoconioses
3. The 1980 ILO classification of radiography of the pneumoconioses
4. A preliminary statistical report of x-ray findings in black lung applicants for the state of West Virginia
5. A radiographic score for clinical use in the adult respiratory distress syndrome
6. Diffuse infiltrative lung disease: a new scheme for description
7. Radiologic assessment of pulmonary arterial pressure and blood volume in chronic, diffuse, interstitial pulmonary diseases
8. A chest radiograph scoring system to predict chronic oxygen dependency in low birth weight infants
9. Measuring nominal scale agreement among many raters
10. The measurement of observer agreement for categorical data