UIUCDCS-R-72-497
COO-2118-0029

Parallel Image-Processing for Automated Cervical Smear Analysis

By John S. Read

October 1972

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

This work supported in part by Contract AT(11-1)-2118 with the U.S. Atomic Energy Commission.

ACKNOWLEDGMENT

I wish to acknowledge the assistance and support given to me by several people while I was working on this thesis. Professor B. H. McCormick, my adviser, provided endless patience as well as good advice. My wife and daughters gave up many months of family-type activities. S. N. Jayaramamurthy, Val Tareski, and Peter Raulefs wrote programs to implement Michalski's and McCormick's ideas used in the texture analysis section. S. N. Jayaramamurthy also collaborated in a paper which was incorporated in this thesis. Mrs. Judy Arter and Mrs. Patti Welch typed the several revisions of the text. I also wish to express gratitude to Dr. Ben T. Williams and Dr. Fernando Toledo, who supplied encouragement and indispensable information about cytology, and to give special thanks to Mr. Lawrence Brady, Cytotechnologist, who gave up his Thursday evenings for several months to teach me some of the rudiments of his art.

TABLE OF CONTENTS

I. INTRODUCTION 1
   A. Image Processing in Automated Cytology 1
   B. The Subject and Scope of this Thesis 2
II. CERVICAL SMEARS 7
   A. Purpose and Use of Cervical Cell Sample 7
   B. Image Characteristics of the Cell Sample 8
      1. Variations in Epithelial Cell Morphology Due to Differentiation 8
      2. Model of the Cell Sample 9
   C. Characterizations of Malignant Cells 16
      1. Morphological Characterizations 16
      2. Non-Morphological Characterizations 20
III. REVIEW OF CYTOLOGY AUTOMATION 24
   A. Applications 24
   B. Specimen Transport 24
      1. Microscope Slides 24
      2. Fluid Transport 26
      3. Linear Deposition 28
      4. Other Systems 28
   C. Measurement and Analysis 28
      1. Morphological Analysis 28
      2. Non-Morphological Analysis 42
IV. EQUIPMENT AND PROCESSES USED 45
   A. Programmable Scanners 45
   B. Show-and-Tell 48
   C. Parallel Image Processing and PAX II 48
   D. Texture Recognition using Varivalued Logic 50
V. CYTOLOGICAL IMAGE PROCESSING EXPERIMENTS 61
   A. Blob Detection 61
   B. Cell Detection 68
   C. Texture Processing 71
VI. CONCLUSIONS 88
   A. Blob and Cell Detection 88
   B. Texture Analysis 89
   C. Recommendations for Further Work 92
LIST OF REFERENCES 94
APPENDIX 114

LIST OF FIGURES

1. Cross-section of Stratified Squamous Epithelium (left) and Views of Representative Cells 10
2. Photomicrograph of Epithelial Cells and White Blood Cells 11
3. Cell Structure as Delineated by Papanicolaou Stain 12
4. Single Squamous Cell Status 13
5. Squamous Cell Interactions 14
6. Photomicrograph of Cervical Smear Showing Cell Interactions 15
7. Block Diagram of Equipment Used 46
8. One-dimensional Texture-processing Example
   (a) One-dimensional "Textures," Quantized to Four Gray Levels 53
   (b) Receiver Operating Characteristic Corresponding to Table 5 56
   (c) Generalized Logic Diagram with Interval Covering of T¹ against T⁰ 56
9.
Input Images (a) Photographic Image: Blobs 65 (b) Digitized Image: Blobs 65 (c) Photographic Image: Cells 72 (d) Digitized Image: Cells 72 10. Operations (a) "BLOB" Operation, Three Gray Levels. IRADUB = 2, IRADLB = 1, IDROP = 2 73 (b) "RADAR" Operation 7^ VI 1 Figure Page 11. Blob Detector Processing of Figure 9("b) (a) Large Blobs Detected 76 (b ) Blob Detector Output 76 12. Cell Detector Processing of Figure 9(d) (a) Smoothed Image (KPOW = 2) 77 (b) Magnitude of Differences in Y-Direction 77 (c) Blobs Detected 78 (d) Cytoplasmic Borders in the X-Direction Detected . 78 (e) RADAR "Beams" 79 (f) Cells Detected 79 13. Texture Analysis (a) Photomicrograph of Drying Artifact 8l (b) Photomicrograph of Cell Nucleus Showing Chromatin Texture (Center) 8l (c) Test Set of Texture Samples 8k (d) Test Set with Hit Counts 85 (e) Test Set Classified 86 (f) Receiver Operating Characteristic for Chromatin vs. Artifact 87 (g) Receiver Operating Characteristic with Tapered Quantization of Samples 87 (h) Hit Count Distribution 91 viii LIST OF TABLES Table Page 1. Some Parameters Characterizing the Present Cervical Smear Screening System 8 2. Measurements on Abnormal Cells by Reagen and Wied. . . 19 3. Automated Cytology Applications 25 1+. Flow Systems — Non-Morphological Measurements hk 5. Statistics Derived from the One-dimensional Texture of Figure 8(a) using a 1x3 Template to Define Events . 53 6. Blob-detector Performance as a Function of Small-blob Parameter Values 67 I . INTRODUCTION A. Image Processing in Automated Cytology Images of biological cells are being increasingly used in quanti- tative studies of cell properties. New techniques have become available which elucidate subtle cell structures and functions. Procedures such as stoichiometric staining, fluorescent or radioactive tagging with antibodies or chemical precursors, and autoradiography produce images whose optical properties can be used to analyze complex biological events. However, conversion of optical properties to numeric form and subsequent interpretation become problems, especially when large cell populations are involved. Information in biological images is typically rather "noisy," owing to the heterogeneity and variation inherent in such materials. Simple automatic approaches to data extraction are sensitive to this noise. Frequently, morphological considerations are needed to limit the domain of measurement to particular structures to avoid spurious signals [97, 120, 191 » 26l]. Data conversion and inter- pretation by humans, on the other hand, can cope with the noise problem, but are not cost-effective for some applications of interest, even when augmented by interactive computation facilities. What is needed, then, is the development of cost-effective image- processing algorithms which can cope with morphological criteria at the level of complexity present in biological materials. These algorithms can produce some measurements directly, or can be used in conjunction with other analytical instruments. For example, an image-processing algorithm could be used to locate a cell nucleus, whereupon an electron micro probe could be automatically directed to perform a chemical analysis, Conversely, image-processing could be used to automatically apply quantitative adaptations of human-oriented morphological criteria to 2 the output of existing high-speed cell analyzers. Flow and electro- static particle transport systems have the capacity to handle, analyze and even sort large numbers of cells, often as many as 100,000 per minute. 
(See section III.B.2.) These devices typically apply measure- ments, such as light absorption at a particular wavelength, to an entire cell at once with no attempt to resolve intracellular structure. Some applications would benefit from the ability to rapidly filter large quantities of cells using high-speed whole-object measurements followed by automatic morphological analysis of the residue. B. The Subject and Scope of this Thesis In this thesis a particular application of digital image processing in cytology is examined: automated analysis of the well-known "Pap" smear. Pap smears are samples of epithelial (skin) cells from the uterine cervix and are used to detect cancer of the uterus while it is in an early stage and relatively easy to cure. Development of an auto- matic device to screen these samples has been recognized as of great potential value, almost from the time the test was first devised by Papanicolaou. However, this has proven to be a formidable task, as demon- strated by unsuccessful development projects undertaken by such organi- zations as the National Cancer Institute, IBM Corporation, and Vickers, Ltd. Accordingly, the goals of this thesis are prudently restricted to something less than a complete implementation. Actual implementation of a system to analyze cervical smears would require a careful analysis of the options available in staining, specimen preparation, transport and sensing to determine the most effective combination. It seems likely that much deeper insights into the nature of cancer cells will be forthcoming in the next few years. One would expect that this new knowledge will have a significant effect on the technological direction. 3 As is discussed in some detail in the next chapter, a cervical smear consists of a mixed population of cells spread on a microscope slide and stained. The cells have a tendency to clump together, fold, and otherwise present a very heterogenous and confusing picture to an observer, whether human or machine (see Figure 2). This clumping and overlapping will frequently be a problem whenever loose cells are to be examined, even if special preparative techniques are used to disperse them. The fact is that many cell types are supposed to stick together, and treatment of the sample to defeat this may result in the loss of important information. In the present work, the prime objective was to examine the effectiveness of a parallel digital image processor in the analysis of cervical smear imagery. A wide range of activities could support this purpose. In order to define the problem adequately for machine imple- mentation, and also to avoid expending excessive time on a complex biomedical problem, it was decided to accept as correct certain con- clusions drawn in the course of development of the Cytoanalyzer project, an early attempt to automate the screening of cervical smears by image processing. The Cytoanalyzer project is discussed more thoroughly in chapter III. The pertinent conclusions, paraphrased from references [206] and [207] are as follows: (1) Normal cells shed from the cervix and vagina exhibit a functional relationship between size and optical density of the nucleus that approximates e(n) = Q/D(n)**2 where e(n) is nuclear optical density and D(n) is nuclear diameter suggesting that the nucleus contains a constant quantity, Q, of dye-binding material in normal cells throughout various stages of differentiation. 
(2) There is a continuous spectrum of change toward increased optical density of the nucleus and increased nuclear size (relative to overall cell size) as one progresses from normal cells to cancer cells. (3) Smears classified as "being associated with cancer show the presence of a second population of aberrant cells superimposed on the normal cell population. These conclusions were based on the use of a standard Papanicolaou- prepared smear, and were qualified by the statement: "The application of these methods to an automatic instrument implies the condition that the instrument make no errors in measurement or cell recognition." [206, p. 468] As is described in Chapter III, a prototype instrument incor- porating these principles was built which was intended to measure nucleus optical density and nucleus diameter of epithelial cells. Due to limi- tations in the image processing technology then available, it proved to be impossible to meet the conditions of the qualifying statement. The prototype could not distinguish enlarged, dark, epithelial nuclei from clumps of white "blood cells nor distinguish white blood cells from certain normal epithelial cell nuclei. Thus, the validity of the con- clusions reached in the design study could not really be tested in a clinical environment because of the presence of this biological image "noise." The experiments reported in this thesis were motivated by the assumption that improved image processing technology, as represented by a parallel digital image processing device (described briefly in Chapter IV) will permit reliable analysis of much more complex patterns of cell images. The emphasis here is not on the discovery and extrac- tion of parameters for distinguishing a malignant cell from a non- malignant cell, but rather on the construction of algorithms for a parallel processor which permit the machine to rapidly make sense out 5 of the mess of cells and debris in the microscope field so that sub- sequent measurements (whatever they might be) are made on the correct objects: epithelial cell nuclei, for example, rather than clumps of white blood cells, cytoplasmic folds or other locally similar- appearing phenomena. The interest in algorithms for a parallel image processor stems from the belief that for this and similar applications, parallel digital image processing has the greatest likelihood of being sufficiently fast, flexible and cost effective to perform the critical initial steps of object location and identification. The remainder of this thesis consists of two parts. The first, Chapters II and III, contains background information about the application and a review of the rather extensive literature on automated cytology. The second part, Chapters IV-VI, describes some experiments in applying parallel image-processing to some of the cervical image difficulties mentioned above. The objective was to write and test programs for hardware like the Pattern Articulation Unit of Illiac III, where the programs could cope with three particular aspects of the analysis of cervical images: (l) Rapid filtering of images of minimal photometric and spatial resolution to detect dark blob-like regions representing potential malignant cell nuclei . The Cytoanalyzer study mentioned above provides the justifica- tion for searching for large, symmetric dark regions. The constraint to use images of low resolution is in keeping with the realities of scanner per- formance in situations where high speed is required. 
The filtering procedure is designed to distinguish between dark blobs caused by symmetric clumps of leukocytes and blobs caused by other objects, including malignant cells.

(2) Detection and counting of normal, well-differentiated epithelial cells. This is again a filtering operation to be done on a low-resolution image, and is useful in cervical smear analysis as an indication of cell sample adequacy and as an input to a stopping rule.

(3) Textural discrimination between cell nuclei and blobs caused by drying artifacts. After the blob filter isolates areas of potential interest, blobs caused by spurious phenomena such as drying artifact can be eliminated by texture or spectral analysis on a higher-resolution image. Data rate requirements are much reduced by rescanning at higher resolution only those areas which require it to clear up ambiguities.

While there are many other aspects of cervical cell image analysis which could be of interest, it is felt that these three are particularly compelling, since previous attempts to handle them were unsuccessful because of the state of image processing technology. By capitalizing on the recently available ability to make better use of two-dimensional information at high speeds, it is felt that these difficulties are solvable.

II. CERVICAL SMEARS

A. Purpose and Use of Cervical Cell Sample

The epithelial cells of the body cover and line organs, providing protection in some cases and secreting or absorbing vital fluids in others. Since epithelial cells are subject to various stresses, they are renewed and sloughed off (exfoliated) continuously [63, pp. 189-218]. In some cases, exfoliated cells are carried by fluids to locations in the body where they can be easily sampled (sputum, urine). These phenomena provide a painless and low-cost way of obtaining a pseudo-biopsy of tissues which may otherwise be accessible only by surgery. The epithelium of the uterine cervix, being relatively accessible, can be sampled more directly by scraping. Under the present, manual system the cells are spread on a microscope slide, fixed, and stained according to procedures described by G. N. Papanicolaou in the early 1940's [156-158, 244]. An appropriately trained person determines whether the sample contains tumor cells or cells from "pre-cancerous" conditions by looking for certain features of individual cells, and by looking at the overall pattern of cells on the slide. This task is commonly broken into two parts: a preliminary screening to eliminate indisputably normal slides, and then a close scrutiny of the remaining slides by a pathologist. The screening phase is usually delegated to a cytotechnologist, a technician who has undergone at least six months of formal training in cytotechnology followed by another half-year of on-the-job training. Some parameters which describe the effectiveness of and the load on this system are presented in Table 1, which is a compilation of information from several sources.

Table 1. Some Parameters Characterizing the Present Cervical Smear Screening System

Number of deaths due to uterine cancer (US): 14 thousand/year [1]
Number of slides processed (US): 15 million/year [279]
Throughput capacity per screener: 10 slides/hour
Time to screen one slide: 2-5 minutes [165]
Cost per slide: $1.00-$3.00 [165]
Number of cytotechnologists (US): 4500 [68]
Percent incidence, various abnormalities:
   Dysplasia: .5-.8 [173]
   Carcinoma in situ: .3-.4 [173]
   Invasive cancer: .1 [173]
Probability of missed positive: .1 [47, 110]

While the primary purpose of taking Pap smears is to detect cervical cancer at a stage when it is easily controlled, other information is frequently obtained from the cell sample: The quantity and types of cells present indicate whether the cell sample is adequate; for example, a sample consisting entirely of white blood cells was probably not made correctly. Various micro-organisms such as yeasts, bacteria and trichomonads may be identified. The degree of differentiation of the epithelial cells present gives an accurate indication of the status of hormonal activity; this is useful in estimating the effect of hormone therapy or detecting ovarian cancer in post-menopausal patients.

B. Image Characteristics of the Cell Sample

1. Variations in Epithelial Cell Morphology Due to Differentiation

Figure 1 shows on the left a cross-section through epithelial tissue like that present on the outer (vaginal) part of the uterine cervix. New cells are continuously being generated in the bottom (basal) cell layer. In normal epithelium, as the cells differentiate, they become more and more flattened (squamous), and the nucleus shrinks. At the top (superficial) stratum, the cells are relatively easy to dislodge and can be harvested for examination by non-traumatic scraping or wiping. On the right in Figure 1 are top and side views of representative cells from various strata. Figure 2 is a photomicrograph of a Pap smear. Intermediate and parabasal cells are seen. The small dark blobs scattered throughout the field are white blood cells (leukocytes).

2. Model of the Cell Sample

As mentioned in the Introduction, the appearance of the cell sample in standard preparations is heterogeneous and confusing. To help sort this out, a model as depicted in Figures 3-5 was contrived. This model applies only to mixtures of white blood cells (leukocytes) and fairly well-differentiated squamous epithelial cells, and shows some of the situations which occur. In this model, an idealized epithelial cell is postulated. Since the Papanicolaou staining process does not elucidate a living cell's true complexity, the cell can be regarded as having the parts shown in Figure 3 [63, 64]. Two classes of transformations act upon these cells to affect the image appearance. Various biological and mechanical forces can cause changes in the appearance of individual cells, as shown in Figure 4. In addition, interactions between cells produce the classes of cell images shown in Figure 5. Combinations of individual cell changes and cell interaction effects can produce complex images such as Figure 6.
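To make the model concrete, the following Python fragment sketches one way the idealized cell and the two classes of transformations could be represented in an analysis program. It is an illustration only: the thesis gives no such representation, and the specific status and interaction categories named here are placeholders rather than the actual categories of Figures 4 and 5.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List

class CellChange(Enum):
    """Single-cell appearance changes (cf. Figure 4); categories are illustrative only."""
    NONE = auto()
    FOLDED = auto()
    DEGENERATE = auto()

class Interaction(Enum):
    """Cell-to-cell image interactions (cf. Figure 5); categories are illustrative only."""
    ISOLATED = auto()
    TOUCHING = auto()
    OVERLAPPING = auto()
    CLUMPED = auto()

@dataclass
class IdealizedCell:
    """An idealized squamous epithelial cell as delineated by the Papanicolaou stain (cf. Figure 3)."""
    nucleus_diameter_um: float
    nucleus_optical_density: float
    cytoplasm_diameter_um: float
    change: CellChange = CellChange.NONE

@dataclass
class ImageObject:
    """One connected object in the smear image: one or more interacting cells plus any leukocytes."""
    cells: List[IdealizedCell] = field(default_factory=list)
    leukocyte_count: int = 0
    interaction: Interaction = Interaction.ISOLATED
```

The point of such a representation is simply that what the scanner sees is an ImageObject, while the diagnostic measurements of interest apply to an IdealizedCell; the image-processing task is to recover the latter from the former.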
[Figures 1-6 and Table 2 (measurements on abnormal cells by Reagan and Wied, reported as means and standard deviations, including chromatin texture and cell arrangement) appear here in the original. The scanned text of Section II.C, Characterizations of Malignant Cells, subsection 1, Morphological Characterizations, is not recoverable; the text resumes below in mid-sentence.]

...greater than .5 and nucleus diameter greater than 10 microns. Performance with these features was shown to be very sensitive to sampling technique, with scraping producing the aforementioned excellent performance and vaginal aspiration resulting in over 65% false positives for 5% false negatives. Vaginal aspiration is known to be very poor at harvesting cells from squamous cell cancer, especially carcinoma in situ [1, p. 57]. Actual performance realized by the Cytoanalyzer is discussed in the next chapter.

Quantification of chromatin texture analysis has been somewhat more difficult, since there are no universally understood measurements of texture. The very subjective texture characterizations of Table 2 illustrate this. Some progress has been made toward this goal, however, via experiments with computer-oriented texture measurements [15, 125, 166, 172, 187]. Some of these algorithms are discussed in Chapter III. The parameters of cell-to-cell variability and cell arrangement have apparently not been subjected to objective analysis beyond that of Table 2.

2. Non-Morphological Characterizations

This section is included to provide context for the discussions of morphology and to provide background for the section on non-morphological measurements (III.C.2). No claim is made for completeness or currency. Much is being learned about the processes of malignancy which very probably will be applicable to automated cancer detection. Some of these insights may lead, for example, to development of more specific staining procedures which could vastly simplify an image-processing approach to cell analysis. Other criteria might lend themselves to non-morphological techniques such as automated chemical analysis or high-speed spectrophotometry.

Increased DNA Content

It has been shown [185] that cervical cell samples from women with cervical cancer typically have a population of cells with increased DNA content superimposed over the population of normal cells. An amount of DNA in a cell nucleus which corresponds to the amount found in a normal interphase (between divisions) body cell is called the diploid amount. The amount found in a replicating cell is, on the average, twice the diploid amount and is called the quadruploid amount. If a histogram is made showing number of cells vs.
DNA amount , then a mode normally occurs at the diploid amount for cervical cell samples. If for some reason replicating cells are included, a mode will occur at the quadruploid level. In samples of cancer cells, the histogram is spread out, with large numbers of cells containing aneuploid amounts of DNA, and with modes occurring at places other than at diploid and quadru- ploid levels, indicating the presence of abnormal stem lines or clones [27, 72, 2k"j], DNA has properties which facilitate deter- mination of its quantity photometrically. It absorbs ultraviolet light in proportion to its mass, especially if the wavelength of the light is in the vicinity of 260 nanometers. [267]. Also, DNA binds basic dye molecules. The mass of bound dye can be measured photometrically, permitting the DNA mass to be inferred [237, p. 2]. Immunological Indicators It is a theory that the transformation of a normal cell into a cancer cell includes changing the chemical structure of the external cell membrane to the point that the cancer cell has antigenic properties like those of a foreign protein entering the body tissues from outside. In normal circumstances, the antibody 22 response would dispose of the tumor cell. In cancer, however, the immune system is somehow suppressed, and the tumor goes unchallenged. [2l8, p. 595]. Presence of tumor-specific antigen has been demon- strated in hamster cells transformed by a virus into cancer cells. [218, p. 6ll], The antigen can be detected by the fluorescent antibody technique in which antibodies are tagged with fluorescent dye molecules. [25^]. When the antibodies attach to the antigenic tumor cells, the fluorescent dye permits identification by fluores- cence microscopy. [92]. Conversely, tumor cells in humans have been detected by measuring the absence of isoantigens. Isoantigens are antigens which cause antibody responses when cells are trans- ferred among individuals of the same species, as in the case of the A, B and blood types. Davidsohn, et. al. [38] claim to be able to demonstrate a progressive loss of isoantigen as a pre- cursor to the spread of a tumor to noncontiguous tissue. The degree of loss of isoantigen is held to parallel the degree of loss of normal cell differentiation. Some related characteristics of cancer cells are also due to the alteration of the normal chemi- cal composition of the cell's external membrane. For example, it has been hypothesized [218, p. 591] that animal cells stop multi- plying when they touch one another (contact inhibition). Some cancer cells do not have this property, and continue to reproduce even when piled up, as demonstrated in cell cultures. Also, normal animal cells show "selective stickiness" i.e., they prefer to adhere to cells of their own type. Again, cancer cells do not show this, providing more evidence for the existence of changes in the cell membrane. This change can also be reflected in an altered electri- cal charge on the cell surface which can be measured by electrophoresis 23 [2^9]. It was also discovered that the surface changes result in a tendency for cancer cells to be precipitated by a certain glyco- protein derived from wheat germ [2^9; 218, p. 596]. Unfortunately, wheat germ agglutinin also appears to precipitate normal parabasal and some benign though atypical cells. [2U9]. Presence of Abnormal Cell Products Another theory supposes that cancer can cause changes in the chromosomes of a cell. [218, p. 597]. 
If so, it would be reasonable to expect to find changes in the proteins coded by the chromosomes and in the biochemical processes controlled by those proteins. These phenomena have been observed and proposed as cancer-cell characterizers . One such discovery was that many types of cancer cells excrete more lactic acid than do similar normal cells. This was determined to be due to an unexplained change in the (unknown) mechanism which regulates glucose consump- tion. [218, p. 59^]. It is also possible that production of abnormal enzymes can be detected by biochemical assay using an automatic analyzer [2U8] or by development of specific staining procedures [250]. 2k III. REVIEW OF CYTOLOGY AUTOMATION A. Applications Research and development in automating the measurement of micro- scopic objects has at least a quarter-century's history. Early impetus to this work was given "by a desire to automate the extraction of size distributions of a variety of micron-sized particles ranging from red blood cells to coal dust, since statistically significant estimates of these were and are very costly and difficult to obtain manually [217]. Recently, much work has been engendered by a desire to develop instru- mentation for clinical use in the cytology laboratory, and for making automated, quantitative photometric studies of cell populations. A wide range of purposes can be discerned, as suggested by Table 3, a somewhat arbitrary sampling of work done in the field of cytology automation. In the next sections, several cytology automation projects are discussed in terms of the techniques used to solve two problems which seem to be fundamental: presentation of the cells to the machine for analysis, and implementation of measuring and analysis strategies. B. Specimen Transport The method by which cell samples are prepared and presented to the sensor system is critical in determining feasible approaches to the analysis problem. Three fundamental approaches have been tried in the past: (l) microscope slides, (2) fluid transport, and (3) linear deposition. 1 . Microscope Slides Microscope slides are the traditional method of carrying cells for presentation to human microscopists and to a variety of instruments. Initial attempts to develop high-speed scanning devices used slide transport [129 , 127, 115, 87, 199-203, 208]. However, this approach 25 Table 3. Automated Cytology Applications Some Representative Bibliographic References Detection of Individual Red Blood Cell Abnormalities Green [60] Red Blood Cell Count and Size Distribution Abnormalities Cell Sorting Brecher [28] Neuron Counting Fulwyler feU ] Hulett [jS 1 Kamentsky [87] Microbe Colony Monitoring Dudley [19] Lipkin [107] Mansberg [115] Detection of White Cell Abnormalities Ingram and Preston [l^+] Glaser [57] Mansberg [112] Chromosome Karyotyping Butler [277] Castleman [276] Ledley [102] Mendelsohn [132] Neurath [278 1 Rutovitz [260] Wald [216] White Cell Differential Count Kamentsky [155] Prewitt and Mendelsohn [l66] Preston [163] Technicon Corporation Young [2U2] Live/Dead Cell Count Kamentsky [87] Exfoliative Cytology Husain/IMANCO [26l] Ishiyama [76] Kamentsky [9^] Nuclear Research Associates [62] Tetronics Corporation [66] Tolles [191] Vickers Corporation [135 J Wied [227] DNA Distribution Van Dilla [211] 26 turned out to be infeasible because the sensing and analysis methods of the time could not distinguish between objects of interest and arti- facts caused by clumping and overlapping of the cell sample while main- taining sufficient speed. 
Slide transport has been used successfully in instruments where manual location and centering of individual cells is permissible as in microspectrophotometric cell constituent analysis [7, 8, 27, 31, 185-187, 226, 228-232, 235, 237, 252], or in research environments where morphologic parameters for cell recognition are under investigation independent of speed requirements [ 13-18, 20, 60 , 107, 125, 1^7, l6l, l66, l68, 179, 227, 233, 242] , or in those cases where the cell sample tends not to present severe clump or overlap problems [75, 102, 112, l63, 2l6]. However, there is motivation to use micro- scope slide transport in automatic cell analysis systems because (l) it greatly simplified interfacing machine results with traditional morpho- logic taxonomies, especially in the case of. screening applications where positive samples need to be re-examined by a human. If the sample is disrupted by the transport technique, much valuable information about cell-to-cell relationships is lost. If a second sample must be taken in each case to permit human interpretation, intricate legal questions arise because of the possibility of missed positives in the second specimen. (2) In addition, a capability for interpreting slide- carried cells would permit the use of large numbers of existing labelled samples now stored in archives. For these reasons, attempts are being made to improve image-processing technology so that more complex cell images can be handled [123, l6h 9 255, 256]. 2. Fluid Transport Many of the problems in cell location, isolation and measurement are greatly simplified if the cells can be placed in a fluid (usually liquid) suspension and transported past the 27 sensors in an orderly, predictable manner. To implement this, the cell suspension has been transported via capillary tubes or flow cells, sometimes with special laminar flow conditions. The latter was initially described by Cros land-Taylor [37], and is usually referred to as a Sheathflow or Cros land-Taylor system [lkk 9 211]. A capillary containing the sample suspension is enclosed in a larger tube in which a flow of clean fluid is maintained. As the sample suspension emerges from its capillary, it is surrounded by a sheath of clean fluid. The composite is drawn through a constricting nozzle which focuses the central stream down to a fine filament, perhaps only 10 urn in diameter. Clogging problems are reduced, since the constriction is done by the fluid sheath. Where clogging is not an acute problem, or if a close positional tolerance is not required, then tubes or channels of diameter on the order of 100 ym can be used. Systems have been developed in which the cell population is physically separated into fractions on the basis of the measurements made. To do this, the solute stream can be broken into droplets by periodic vibration of the fluid column. A charge is applied to the droplets containing cells of interest, enabling them to be electro- statically deflected into a separating container. This technique was adapted by Fulwyler [5k] from an ink-writing oscillograph of Sweet [262]. Another method of cell sorting is used by Kamentsky in his Rapid Cell Spectrophotometer [101]. A fluid switch causes selected cells to be pulsed into a side channel and saved on a filter, an approach motivated by a goal to select the one-hundred most diagnostic cells in a cervical cell sample for viewing under a single microscope field as a means of solving some of the re-screening problems in the use of flow transport systems. 28 3. 
Linear Deposition This approach is similar to the fluid transport systems in that the cell sample is "broken up and suspended in a fluid. However, instead of carrying the fluid past the transducers, the fluid is laid down in a narrow track on a plastic film or glass substrate using a device similar to a drafting pen. The track is subsequently carried past the transducers. This procedure has some of the advantages of both the microscope slide and fluid transport systems. The cells are flattened and static in shape, so conventional morphologic interpretation can be made relatively easily by machine or human. On the other hand, the object isolation problem is somewhat reduced. Track-laying systems have been developed by Vickers [135] and Tetronics [39, 66] Corporations in Great Britain for cervical cell screening machines. Both of these utilize a plastic film of about one meter in length for a cell sample, and lay a line of cellular "ink" approximately 1000 (Vickers) or 500 (Tetronics) ym wide. A scanner built by Mansberg for fluorescence measurements [ll6] included a long flexible glass ribbon to hold the specimen. h. Other Systems Mansberg also described a record-player like scanner in which a fluorescing sample on a membrane was scanned in a spiral pattern. Other specially designed transport systems for Petri dishes, etc., have also been considered [112] . C. Measurement and Analysis 1. Morphological Analysis Morphology is defined by Webster as "the features comprised in the form and structure of an organism or any of its parts." Since form and structure are most commonly sensed visually, mechanized morphological measurements are typically made on some sort of simulation of a retinal representation of an image of the object. Morphology can be measured in other ways, such as by 29 measuring diffraction effects, but these involve fundamentally different techniques, and are discussed briefly in Section III.C.2. To form the retinal representation, light energy transmitted by or reflected from the object is measured either by a retina-like array of sensors, or more commonly, a single sensor time-shared by scanning and sampling in a systematic pattern. This process generates a picture , defined by Rosenfeld [l8l] as a non-negative function of two variables which is non-zero only in a bounded region of some standard size and shape, e.g. a square. Two fundamental approaches have been used in making and manipulating picture representations, which can be named the one-dimensi on al and two-dimensional approaches. In a one-dimensional approach, the picture function is processed by sampling in a TV-like raster scan. This produces a time-varying video signal which can be analyzed by digital or analogue signal processing techniques transplanted from well-known engineering practice. A limited amount of two-dimensional information can be processed by the use of a shift register or delay- line buffer memory of, say, the previous few scan lines so that measure- ments of the same object on succeeding scan lines can be associated. This approach has been used in many high-speed microscope image analysis systems so far because fairly high processing rates can be achieved at relatively low cost. Processing occurs in real-time with the scan, which can occur at television rates. However, applications have been limited to those where a rudimentary analysis will work, e.g. where pre-set thresholds can be used to determine when an object boundary has been crossed. 
Two-dimensional approaches do not constrain analysis to a parti- cular spatial sequence. However, storage and accessibility of the image 30 become problems, as does processing time, since conventional number or character computers have a one-dimensional topology poorly adapted to picture analysis. Alternative architectures have been proposed [82, 106 , 123, 257, 258], a^a a discussion of one of these is contained in the next chapter. The rest of this section on morphological analysis considers various morphology -based systems which have been described for use in automatic cytology. Discussion will frequently be in terms of three processes: Object identification and isolation which involves locating cells and distinguishing them from other objects which might also be present, and also making object sub-segmentations; Shape analysis which involves parameters of object contours, however established; and Texture analysis , which is based on brightness measurements independent (except on a very local level) of shape information. There is some interaction of these processes, e.g., texture analysis can be used to establish a contour for making shape measurements, or can be used in object location. This breakdown of morphological analysis into shape and texture analysis is related to a definition of image by Huang, et.al. [259]: "We consider an image as the sum of three components: the low- frequency part, the edges and the textures." Texture is defined, therefore, as what is left over when the edges and low- frequency parts are subtracted out. Shape information is contained in the edge and low- frequency components. In the Cytoanalyzer project, an attempt was made to develop a clinical instrument for cervical cancer pre-screening using scanning techniques transplanted and extended from particle counting and sizing technology [2k, 25, 188, 199-205]. A one-dimensional approach was used, with hard-wired logic analyzing a signal generated by a Nipkow-disk image-plane scanner. Early design objectives [218] were to use Papani- colaou-prepared samples in order that the cytologist's re-examination 31 would be easier. A preliminary version included object-location circuitry which would find cells not overlapped by other cells by examining the sequence of signal levels as the scan crossed a cyto- plasmic border, traversed the cytoplasm, then the denser nucleus, followed by more cytoplasm and finally dropping off the other border. The final implementation, [2U], however, used a highly modified pre- paration, [36], which eliminated cytoplasmic staining and was intended to disperse the cells evenly on the slide. The image was scanned with a two-um aperture in an extended raster of 100 ym x 5 cm. Each time the aperture traversed a chord of a nucleus, a video pulse resulted. The duration was proportional to the width of the chord and the maximum amplitude was proportional to optical density. These chords were summed, resulting in a measurement of nuclear area and integrated density, the parameters determined to be discriminating for cancer cells by the study discussed in II.C.l. Cell nucleus identification was accomplished by rejecting chord pulses preceded or followed by absorbing material. Since the cytoplasm was unstained, a true cell nucleus was supposed not to have this characteristic. Some rudimentary shape measurements were used to eliminate non-nuclei. However, a residual noise count of 10-50 abnormal counts was still experienced on a normal smear. 
Each nucleus was classified and counted as normal, abnormal, or indifferent. When the normal count reached 10,000, the machine stopped and the ratio of abnormal to normal counts (A/N ratio) was computed. If this exceeded a threshold, the sample was declared positive. In clinical trials using aspiration-collected samples and a proto- type Cytoanalyzer , a false negative rate of 10.3 per cent was achieved for a A/N ratio threshold which correctly identified 36.6 per cent of negatives. The false negative rate happens to be close to that estimated 32 as common and acceptable for human screeners (see Table l), and the overall performance was not very much worse than that predicted for the vaginal aspiration samples [207]. However, these results caused the Cytoanalyzer prototype to be deemed unusable for pre-screening for two reasons [l9l]: (l) It was felt that the 10.3 per cent false negative rate was too high for prescreening, since the subsequent false negative rate of the human screener would increase the error to an intolerable level. It was felt that lowering the A/N ratio threshold to improve the false negative rate would force the false positive rate so high that the machine would not reduce work loads to a significant extent. (2) Some of the counts registered were out of agreement with experi- mental data. An investigation led to the conclusion that the presence of large numbers of leukocytes overwhelmed both the normal and abnormal count categories because single leukocytes could not be distinguished from superficial epithelial cell nuclei and clumps of leukocytes could not be distinguished from enlarged, abnormal nuclei. Furthermore, indi- vidual abnormal cells were called indifferent or normal 82% of the time because of the placement of the decision boundaries. An approach very similar to the Cytoanalyzer was taken in the "Automatic Cytoscreener" described by Ishiyama [ 76 ] , although the objective was to develop a screening machine rather than a pre-screener. This device used glass slide transport, a CRT flying-spot light source, and one-dimensional image analysis with wired-in logic. The abnormal/ normal decision for a slide was made on the basis of the frequency distribution of nuclear diameters. Nucleus identification capability seems to be absent, and no performance data at all on cervical material is given. 33 In a pioneering study, Previtt and Mendelsohn [l66] analyzed images of leukocytes using algorithms which have since seen use by several others [60, 163, 2^2]. An off-line CRT flying spot scanner, CYDAC, was used to generate two-dimensional representations of cell images which were later analyzed with a general-purpose digital computer. However, neither the object isolation nor the texture analysis described used spatial information directly. All processing used only the infor- mation contained in the optical density frequency distribution (gray value histogram) of the image. This seems to have been possible at least partly because the images contained only the cells of interest and were very low in noise content in terms both of random scanner noise and interference from other similar objects. Object location was con- fined to separating the smoothed histogram into three regions: back- ground, cytoplasm and nucleus. This separation was accomplished by assuming that density value changes rapidly as the border between, say, cytoplasm and background is crossed. 
Therefore, these border density values will occur comparatively infrequently in the picture, and will show up in the frequency histogram as minima. The local minimum of lowest optical density value is taken to represent the threshold dis- tinguishing background and cytoplasm picture elements and the minimum of highest density defines the nuclear boundary. These thresholds can be used to segment either the digitized picture itself or the density histogram derived from the picture. Rudimentary shape and texture features such as cytoplasm area, average nucleus density, density dis- tribution skewness and several others are extracted from the density histogram, and classification occurs on the basis of techniques of standard statistical decision theory. The initial study reported in [166] only classified four of the five leukocyte types (basophil omitted) 3^ and a training set of only 22 cells was used. There was no report of trials using an unknown set. Further experiments were reported in [167]. In contrast to the methodological and research orientation of the Prewitt-Mendelsohn project, the Perkin-Elmer Corporation's Cellscan/ GLOPR system [7^-75, l6l-l6U] was intended to lead eventually to "a practical system for general use" [163]. The initial purpose [l6l] was to demonstrate the feasibility of semi- automatic ally scanning "blood smears to locate very rarely-occurring binucleate lymphocytes, which are evidence of low-level radiation damage. The approach used is also in contrast to the Prewitt-Mendelsohn procedure in that optical density information is eliminated as quickly as possible and processing is done on a binary picture. On the other hand, later versions of Cellscan utilize a histogram technique to derive the binary pictures, segmenting the multiple density input image into cytoplasm/red cell and leukocyte nucleus picture elements. These planes are processed by a special purpose hardware image processor, the Golay Logic Processor (GLOPR) [258, l6U]. The two-dimensional GLOPR operations can extract features for shape and texture analysis using local homogeneous operations very similar in spirit if not in implementation to PAU operations of the Illiac III (see next chapter). Texture analysis of the nucleus fine structure is accomplished by an iterative thinning algorithm which strips ones from the borders of connected components of the binary pictures. At each iteration, the number of remaining isolated ones are counted. A record of these counts as a function of iteration number is a texture feature which indicates the number and size distribution of granules in the nucleus [75]. Several shape measurements can also be made by using similar combinations of marking, propagation, and counting processes, for example to measure area vs. perimeter, or to locate and size concavities and inclusions. A late version of Cells can /GLOPR 35 includes an automatic vibrating-mirror scanner with a hardware object locating capability. Apparently, the latter is done by one- dimensional wired-in signal analysis using simple criteria for identifying objects likely to be white cells. All versions of Cells can have used microscope slide transport and standard staining procedures. Preparation has included spinning the slide to produce a monolayer of well-dispersed cells. The system can apparently do a differential leukocyte count as accurately as a human scanner [26U]. 
However, two factors mitigate against its acceptance for clinical use: (l) As currently implemented, it is much too slow (overnight to do one slide); and (2) Technicon Corporation, the IBM of clinical lab equipment, recently introduced Hemalog-D, a flow system with non-morphological analysis, which can do the differential count at realistic rates. The use of spectral (color) information has been investigated by Young [2U2J at MIT as a means of object isolation in color photomicro- graphs of blood smears. The objective is to classify each picture element as background, red cell or white cell. A color transparency is scanned with a broad-spectrum CRT-generated flying-spot. Two dichroic mirrors separate the transmitted light into red, blue, and green com- ponents which are each sensed by a photomultiplier and digitized to eight bits. The color data is encoded as two numbers, r = R/(R + G + B) and g=G/(R+G+B) where R, G, and B are the three digitized PMT outputs. The vector (r,g), called a chromaticity pair, is used as a feature vector for a maximum-likelihood classifier in making the red- cell/white-cell discrimination. A Prewitt-type histogram approach is also used to separate background points from cell points. Husain [26l] conducted a study to determine the best color filter to use in a Quantimet monochromatic image processor (see below) to 36 establish a brightness threshold capable of reliably separating nucleus from cytoplasm and malignant from non-malignant nucleus. His conclusion was that "density alone cannot work satisfactorily" because of an "unacceptable degree of overlap in some cases." Husain conducted a further study which generally confirmed the earlier Cytoanalyzer parameter analysis of Tolles , et.al. Green used the histogram method to isolate and segment scanned monochromatic photomicrograph images of red blood cells in an attempt to develop quantitative morphologic measures equivalent to hematologists ' evaluations [60]. Green used noisy images of much reduced resolution as compared to Prewitt-Mendelsohn' s , and found that noise and quantization errors required a more complicated method of locating peaks and valleys in the histogram. A global histogram segmentation was followed by local histogramming and segmenting of the red cells. Object recognition was by fairly simple area and perimeter vs. area criteria. A large number of shape and texture features were extracted, including area, total optical density, and eccentricity. The primary objective was to extract these quantitative shape measurements , rather than to arrive at a diagnostic decision. The Vickers trace-laying system mentioned in Section III.B.3. was intended to be used with a very simple optical device for locating dark blobs of size greater than 12 microns in cervical cell samples [135]. A fair amount of work was done to show that this would be an adequate parameter for pre-screening [175]. However, this claim seems to have met with massive skepticism [22U, 169 ] . A more sophisticated texture and shape analysis was proposed by McMaster [125] , with the 12-micron criterion to be used for quickly locating objects of interest [251], Low noise, high resolution pictures digitized from photomicrographs of 37 individual, isolated cells were used. Five parameters were extracted: average transmission, average transmission difference in adjacent picture elements, average difference in successive centroid-nuclear border radii, average radius, and ratio of maximum to minimum radius. 
A product, P, of weighted functions of these parameters was formed, with the weights computed by maximizing the difference of the extreme values of P for a small training set of 18 normal basal cells and 11 malignant cells. No effort was made to try the classifier on any cells not in the training set. Similar measurements were automatically extracted from photomicro- graphs of cervical smears by a system of Rosenberg and Ledeen [179] . Object location and identification was accomplished by a raster search stopped by a preset threshold density, followed by size and shape analysis Measurements were extracted (average nuclear radius, average deviation of radii, nuclear area, average nuclear density) and conditional proba- bility distributions computed for 100 normal and 100 malignant cells (pre-identified) . Means and standard deviations agreed closely with Reagen and Wied (see Section II.C.l.). However, conditional distributions published in [179] show considerable overlap between normal and malig- nant cells, suggesting that the features extracted are not very good for cancer detection. This is rather interesting, considering that these measurements supposedly reflect criteria used by human screeners. Also, Rosenberg and Ledeen fail to mention whether the cell samples were from more than one person. No classification of unknowns was attempted using these conditional distributions. A series of instruments for sizing and counting of various micro- scopic objects was described by Mansberg [ 111-116] . All used a one- dimensional approach, and assume that objects are fairly predictable in 38 density, well separated and not too complex in shape. A prototype scanner was described which was intended to scan an entire sectioned human brain (6000 whole-section slides) in 1200 hours to attempt to establish a quantitative correspondence between brain lesioning and neuron depopulation. To do this, it is necessary to distinguish glial cells from neurons. Since shape analysis is difficult with one-dimensional processing, this object identification problem was attacked by trying to defocus the spot enough to eliminate detection of the smaller glial cells. However, problems were encountered in controlling the spot size variation, and at last notice (196U), the glial cell/neuron discrimina- tion problem had not been solved [113]. Mansberg also described a system for counting fluorescing biological objects stained with the fluorescent antibody technique [ll6] (see the previous chapter). Another scanner for making fluorescence measurements on slide- transported smears was proposed in 1951 by Mellors , Papanicolaou, et.al. [127-130] for automating cervical cancer detection. The Quantimet, a vidicon scanning instrument with one-dimensional analysis was developed by Image Analysing Computers, Ltd. for counting and sizing microscopic inclusions in metallurgical specimens. The same device was used for counting goblet cells in specially-stained sections of lungs of rats exposed to sulfur dioxide. However, the machine counts were not directly comparable to manual counts , since the Quantimet could 39 not distinguish stained goblet cells from random blobs of stain [120]. A more sophisticated version of Quantimet was used in some experiments to determine its usefulness in cervical cancer screening [255, 26l], Slide transport and standard Papanicolaou staining was used, although there was a desire to couple the Quantimet to the Tetronics trace-laying machine [26l]. 
Newer versions of the Quantimet include an ability to extract several morphological features such as integrated density, perimeter, or presence of second phases (nuclei). Both morphological and non-morphological analysis occurs in the Cytoscreener of Nuclear Research Associates [62] in which a programmed Ultraviolet CRT flying-spot scanner generates a raster on cells transported by a laminar flow system. The system measures total UV absorption as a measure of DNA content, and also analyzes nuclear size, nuclear density, cell symmetry and nuclear-cytoplasmic ratio using "a pattern-recognition computing system" not further described. The Cytoscreener also has an object-location capability in that it can recognize and ignore fragmented or clumped cells or debris. All this occurs at a rate of approximately 5000 cells per minute, which is rather slow, since a cell sample may contain 100,000 cells or more. Performance of the Cytoscreener in a clinical trial using cervical cell samples, was fairly impressive. One hundred specimens were used, of which 2U were known to be from cancer patients, and 76 were known to be from normal patients. The specimens were analyzed by conventional cytology and also by the Cytoscreener. The false negative rate was 8.3% for the Cytoscreener and 12.5% for the humans. However, only 22 of 76 normals were screened out by the machine, partly because 31 of the 76 were rejected as having insufficient cellular material for processing. (Compare with Rapid Cell Spectrophotometer performance, next section.) ko George Wied and his associates have conducted extensive investi- gation into image parameters to distinguish malignant and normal cells. According to Wied [363], "there are two roots to the application of pattern recognition principles to biological cells. One has its origin in development of methods and instruments for quantitative cytochemistry. . . The other root... is found in commercial and academic interests for automating clinical microscope screening procedures." Wied has developed a cell recognition system (TICAS) very strongly based in quantitative cytochemistry. Cells are located manually and scanned with a very slow scanning microphotometer (17-20 msec per sample point minimum). Three objectives are enunciated [238]: (l) Description and discrimination of cells which are known to be biologically different but which are difficult to distinguish by standard techniques, (2) Pro- viding computerized access to images of cells accompanied by expert diagnostic opinions, (3) Computer-aided instruction in cyto- and histo- pathology. A system of remote access to TICAS via phone lines has been proposed [238] and now implemented, wherein an image generated by a Zeiss Cytoscan is sent to a central PDP-10 for assessment. A large number of reports on investigations along these lines have been published [226-233, 238, 13-18, 20, 265]. 
To show that machine analysis could make discriminations difficult or impossible for humans observing standard cytologic preparations, Wied and his colleagues selected several difficult clinical problems: (1) discrimination among uterine glandular cancer cells, normal glandular cells and histiocytes [226, 229], (2) discrimination among uterine glandular cancer cells, cells from a possibly precancerous condition of the uterus (hyperplasia), and normal uterine glandular cells [232], (3) discrimination between similar-appearing normal cells from different parts of the uterus (endocervix and endometrium) [231, 228], and (4) discrimination between normal and leukemic white blood cells (lymphocytes) [230, 15]. Also, studies were done in discrimination of cytochemically and morphologically identical tissue culture cells from human embryonic lung (HEL) and human epidermoid tumor (HEP) [227, 13, 14].

A variety of methodologies were tried. The usual histogram of optical density values received considerable massaging, being used both as a source of features (integrated or average optical density, sum of the five highest density values with non-zero frequencies, sum of frequencies of specific density values) for constructing linear discriminant functions, and as a means of calculating threshold values for segmenting the gray-value image. In the latter technique, a composite optical density histogram derived from the entire training set population for one cell type is re-partitioned into unequal bin widths so that each bin has approximately the same frequency of occurrence (see Section VI.B). This is conceptually the same as recoding the image using a maximum-entropy gray-level quantization, thresholding, and applying simple shape-recognition procedures to selected binary isodensity discriminants. Wied and his colleagues have experimented with a large number of features and a wide variety of classification techniques. Recent development of system programming to automate the search for good features and classifiers should be a significant aid to exploration. However, published results do not appear to include investigation of the crucial question of performance of features and classifiers on images which were not contained in the training set used to design them. This is typically the point at which pattern recognition systems have a tendency to fall apart. Also, the imagery used in Wied's studies was of a particularly high quality in terms of low noise, high resolution and accuracy. Whether Wied's methods can be used under less favorable circumstances is questionable, although it should be pointed out that Wied's stated objectives do not include high-speed, fully automatic cell sample analysis.

2. Non-Morphological Analysis

Several systems have been developed using fluid-flow cell transport. These systems capitalize on the ability to control cell positions in a flow channel and typically make whole-object (zero-resolution) non-morphological measurements. The Rapid Cell Spectrophotometer (RCS) of Kamentsky/IBM [84-91, 94] is a general-purpose instrument which has been applied to cervical cancer screening [94], blood cell differential counting [155], cell viability assay [87, 126], and exploratory studies to establish photometric features which could distinguish populations of functionally different cells [89, 73]. The RCS is capable of measuring 500 cells per second and has been equipped with a fluid-switch cell sorter.
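As an aside on the TICAS histogram re-partitioning described above (the same equal-frequency, or maximum-entropy, recoding reappears as "tapered" quantization in Section VI.B), the idea reduces to placing the bin edges at quantiles of the composite gray-value histogram. The fragment below is an illustrative sketch in Python, not code from TICAS or from this study; the function name and its arguments are the editor's assumptions.

    import numpy as np

    def equal_frequency_quantize(image, n_levels=4):
        # Choose bin edges at the empirical quantiles of the gray-value
        # histogram so that each of the n_levels output values occurs with
        # roughly equal frequency (a maximum-entropy recoding).
        interior = np.linspace(0.0, 1.0, n_levels + 1)[1:-1]
        edges = np.quantile(image.ravel(), interior)
        return np.digitize(image, edges)   # output values 0 .. n_levels-1

Applied to an image originally quantized with equal-width ranges, this recoding devotes more output levels to the densely populated part of the histogram, which is what gives the resulting segmentation thresholds their equal-occupancy property.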
In the cervical cell application, the RCS was subjected to a clinical trial in 1965 [94]. Performance for cervical swab specimens was not spectacular: 15% false negatives with 32% false positives, and for vaginal wash specimens 50% false negatives with 25% false positives. In addition, problems with sample preparation caused only 45% of the cervical swab and 70% of the vaginal wash specimens to be usable. This performance was apparently found to be unsatisfactory for clinical use.

The Coulter counter is a highly successful fluid transport system which has enjoyed extensive clinical and research use for establishing size distributions for a large variety of objects [253]. No optical measurements are made; the objects to be measured are pumped in an electrically conductive fluid medium through a small aperture in a non-conducting plate. A current also flows through the aperture. Passage of an object causes a change in resistance through the aperture, generating an electric voltage pulse proportional to the volume of the object. This device was used [97] in an unsuccessful [68] experiment in cervical cancer screening. The Coulter counter has also been used as the front end of other fluid-flow systems, since the Coulter pulse can be used to control a variety of events. An electronic cell sorter built by Fulwyler [54] combines a Coulter counter with an electrostatic droplet deflection system like the one used in the Inktronic printer, and can sort 1000 cells per second on the basis of cell volume. A stream of cell-containing fluid is broken into droplets by applying a vibration to the nozzle forming the stream. The Coulter volume signal is used to apply a proportional electrical charge to the droplet containing the cell. The droplets then pass through an electrostatic field which deflects the cells according to the volume-proportional charge. The Automatic Multiparameter Analyzer for Cells (AMAC) proposed by Leif [105] is also built around a Coulter-effect synchronizer, having, however, a more elaborate sensor system with optical measurements available at several wavelengths. A system described by Hulett et al. [73] utilizes the droplet cell sorter of Fulwyler, but substitutes a fluorescence measurement for the Coulter-effect volume signal used to charge the droplet. A Crosland-Taylor laminar flow system by Van Dilla et al. [211] uses a laser excitation source and a multichannel pulse-height analyzer to produce a frequency distribution histogram of fluorescent light emission per cell at a rate of up to 100,000 cells per minute. This instrument was used in a study to establish the time-course of DNA synthesis in a population of mouse fibroblasts. These permutations and combinations of sensors, applications, and particle transport methods are summarized in Table 4.

Table 4. Flow Systems - Non-Morphological Measurements

Fulwyler [54, 262]
  Application: Sort by volume while retaining cell viability, to determine the relation of volume to functional state.
  Measurement/Performance: Coulter volume; data recorded with a multichannel pulse-height analyzer; 30,000-60,000 cells/min; 96% viability.
  Transport/Sorting: Electrostatic deflection of a charged-droplet stream.

Van Dilla [211]
  Application: Measure fluorescence due to DNA content in Chinese hamster ovary tissue-culture cells; determine time of DNA synthesis and duration of cell phases; improved statistical significance of large cell sample size over studies using scanning microspectrophotometers.
  Measurement/Performance: Fluorescent emission (Feulgen) with argon-ion laser source at 488 nm; data recorded with a multichannel pulse-height analyzer; 10,000-100,000 cells/min.
  Transport/Sorting: Crosland-Taylor laminar-flow system; no sorting.

Rapid Cell Spectrophotometer [84-91, 94, 126, 155]
  Application: Cervical cancer screening; live/dead cell assay; differential white blood cell count; population studies.
  Measurement/Performance: Scattering of white light; absorption at various UV wavelengths; 60,000 cells/min.
  Transport/Sorting: Capillary tube transport with fluid-switch sorting.

Mullaney [143-145]
  Application: Derive volume spectra of large cell populations; increased accuracy over the Coulter counter for fixed cells.
  Measurement/Performance: Volume measured by narrow-angle light scattering (primarily diffraction); laser light source at 632.8 nm; data stored in a multichannel pulse-height analyzer; 10,000-100,000 cells/min.
  Transport/Sorting: Crosland-Taylor laminar-flow transport; no sorting.

Hulett [73]
  Application: Separate mixed cell populations on the basis of fluorochromasia developed by enzyme action on FDA; cell viability is retained.
  Measurement/Performance: Mercury-arc excitation source; a light pulse from a fluorescent cell activates the charging pulse.
  Transport/Sorting: Electrostatic deflection of a charged-droplet stream (see Fulwyler).

Technicon Corp. Hemalog-D
  Application: Differential white blood cell count (white cells classified in five types).
  Measurement/Performance: Measures fluorescence and absorption at various wavelengths.
  Transport/Sorting: Capillary tube transport; no cell sorting.
IV. EQUIPMENT AND PROCESSES USED

This chapter describes some facilities which were used in this study. Figure 7 is a block diagram showing the interconnection of some of the hardware used to acquire and process the images. Digitized images from programmable film and microscope scanners were loaded into the Illiac III core memory under control of an interim software system, Show-and-Tell [171], which provides operator control of image selection. In addition, Show-and-Tell provides communication via a high-speed data link with an on-line image analysis package, PAX II, running on the IBM 360/75. PAX II is a parallel-image-processing language developed by the University of Maryland from the original simulator of the Illiac III Pattern Articulation Unit [80]. In the experiments in texture analysis, texture samples obtained interactively using Show-and-Tell were fed into programs developed by Jayaramamurthy, Tareski and Raulefs to implement the varivalued logic approach to pattern recognition.

A. Programmable Scanners

A key aspect of this study is the use of a flexible programmable scanner in which the parameters of the image acquired are under computer control. The scanners used are the Illiac III's film scanner and microscope scanner. Both are of the flying-spot type, in which the light source is a small (ca. 1 mil) spot of light generated by a cathode ray tube. In the microscope scanner, light from the spot is projected down through the microscope ocular and is focused by the objective to a smaller spot on the specimen. A photomultiplier tube (PMT) detects the light transmitted by the specimen. Another PMT independently examines the CRT spot brightness.
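The second PMT serves as a reference: because both signals share the same spot-brightness factor, their ratio depends only on the specimen. The analog processing is described in the next paragraph; the fragment below is merely a digital caricature of the same idea (editor's illustration in Python; the function and its arguments are assumptions, not part of the Illiac III system).

    import numpy as np

    def normalized_sample(specimen_pmt, reference_pmt, bits=4):
        # Dividing the specimen-beam signal by the spot-monitor signal
        # cancels the common CRT spot-brightness factor; the result is
        # then quantized (the scanner currently digitizes to four bits).
        transmission = np.asarray(specimen_pmt, float) / np.asarray(reference_pmt, float)
        levels = 2 ** bits
        return np.clip(np.round(transmission * (levels - 1)).astype(int), 0, levels - 1)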
These two PMT outputs are processed by analog techniques to yield a signal whose amplitude is not significantly influenced by variations in the intensity of the CRT spot. This signal is digitized, currently to four bits, and the resulting numbers are loaded into Illiac III core at a rate of approximately 500,000 per second. By deflecting the spot in a regular, TV-like raster, a digital representation of the image is formed in the Illiac III memory.

[Figure 7. Block diagram of equipment used: the PDP-8 subsystem (teletype, LINCtape controller and drives, 8K-word PDP-8/i memory, interfaces), the Exchange Net, the IBM 360/75 subsystem with a 2701 parallel data adapter and two 128K-byte Fabritek core memories, the Illiac III Pattern Articulation Unit and core machine/I-O processor, the scanner-monitor-video controllers with stage and focus motor controllers, the flying-spot microscope scanner and the 46-mm, 35-mm and 70-mm film scanners with their monitors, and the video subsystem (video switching network, automated and manual video microscopes, large-format video camera, and video monitors); paths are marked as carrying data and/or control, or control only.]

Following the notation of Rosenfeld [181], this representation will be indicated by (a_ij), 1 <= i <= m, 1 <= j <= n, where m and n are the horizontal and vertical dimensions (in picture elements) of the conceptual array storing the picture. The resolution, proportions and placement of the sampling raster are under computer control. Up to a 256 x 256 element image representation can be contained in the memory as currently configured. The film scanner works in a similar fashion, except that the raster and spot size conform to the format of a 46-mm film frame.

A monitor system permits viewing the digitized images, either in real time with the scanner or by retrieving the picture element values from memory. The adjustment of display parameters (magnification, location on the screen) to improve the interpretability of the monitor display is also under computer control.

Since the specimen or film is available on-line, it is not necessary to read in the entire image at one time. Instead, the image source can be treated as a read-only memory of very large capacity and accessed only as needed. The flexibility of the scanner permits a quick coarse-scan look at large areas of the specimen, followed by high-resolution interpretation of areas of interest. The microscope stage motion and focus are also under control of the PDP-8/e Show-and-Tell system, via digital stepping motors.

B. Show-and-Tell

Show-and-Tell is an interactive programming system designed to permit on-line development and testing of scanning, preprocessing and feature extraction programs.
The early version used in these experiments was intended to provide control of image acquisition, display, and real-time communication with an IBM 360, which would then provide image analysis via PAX II (see the next section) [171].

In a typical development session, a programmer codes the processing program as an IBM 360 FORTRAN subroutine, with calls to PAX II as required. The subroutine can include calls to Show-and-Tell to type messages to the programmer, read data typed by the programmer, display or scan images, and transfer pictures to and from the 360. This program is submitted through the OS/360 batch system. When it begins execution, the programmer is informed, and he may begin testing his program by executing a Show-and-Tell CALL statement. Intermediate results can be displayed on the monitor, and various parameter values tested. Other commands can cause images to be saved on 360 tape or disk for later use as data for off-line testing.

C. Parallel Image Processing and PAX II

Over the past twenty years or so, a paradigm has evolved for reducing image data: a series of local operations converts the input image into an image or set of images from which the desired data can be relatively easily obtained. This approach apparently originated with Selfridge and Dineen [272, 273] and has been followed by many others, e.g. [258, 257, 270]. For a digital input picture (a_ij), a local operation generates a new picture (b_ij), where the value of each output picture element b_ij depends on the picture elements in some relatively small neighborhood of the corresponding a_ij. Such an operation can be performed simultaneously for all the elements in a picture, since the output at each point depends only on the original values of the neighbors. This definition follows that of Rosenfeld, who has investigated the alternative case where each new value depends also on the new values of some of the neighbors [247]. In that case the local operation must be implemented serially. Rosenfeld showed that parallel local operations can in principle do anything serial local operations can do and vice versa, although with efficiency tradeoffs.

If the local operations are homogeneous, i.e., the same function rule is applied at each location, then it becomes economically feasible to build a hardware processing array with a shared control to efficiently implement the operations. The hardware array processor can be expected to perform parallel local operations with a throughput improvement over a conventional organization on the order of pq:1, where p and q are the dimensions of the array. This was in fact realized in the Pattern Articulation Unit of the Illiac III, which consists of 1,024 identical processors (stalactites) arranged in a 32 x 32 array [123]. Each processor can communicate directly with as many as eight (rectangular topology) or six (hexagonal topology) nearest neighbors. Local operations involving more distant neighbors can be performed by shifting. The fundamental instruction set, discussed in detail in the Illiac III system manuals [275], includes four basic classes of instructions: (1) forming logical functions of the contents of each stalactite and its nearest neighbors (including simple arithmetic functions); (2) shifting; (3) propagation; and (4) image loading, marking and readout of results. These same instructions were simulated on a conventional computer (an IBM 7090) in the original PAX picture processing software system.
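To make the idea of a homogeneous parallel local operation concrete, the sketch below computes, for every picture element at once, the OR of that element and its four rectangular nearest neighbors, using whole-plane shifts in the spirit of the PAU instruction classes just listed. It is the editor's illustration in modern Python/NumPy, not PAX II or PAU code; the function names are assumptions.

    import numpy as np

    def shift(plane, dy, dx):
        # Whole-plane shift with zero fill, the software analogue of a
        # single PAU shift instruction.
        out = np.zeros_like(plane)
        h, w = plane.shape
        out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
            plane[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
        return out

    def or_of_neighbors(plane):
        # Homogeneous local operation on a binary (0/1) plane: each output
        # bit is the OR of the element and its N, S, E, W neighbors; every
        # element is processed "simultaneously" by operating on whole planes.
        result = plane.copy()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            result |= shift(plane, dy, dx)
        return result

On a p x q array machine the four shifts and ORs take a fixed number of instructions regardless of picture size, which is the source of the pq:1 throughput estimate mentioned above.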
In that simulation, by treating the accumulator as one planar row of the PAU, a modicum of parallelism was achieved. Workers at the University of Maryland extended and improved the PAX system and embedded it in FORTRAN IV [80]. The resulting PAX II system therefore combines the numerical processing convenience of FORTRAN with a set of FORTRAN-callable subroutines to implement the parallel instructions. A data management system for storing and accessing the picture arrays is also provided.

D. Texture Recognition using Varivalued Logic

Varivalued logic, introduced by McCormick and Michalski [136, 137], is an extension and generalization of the binary-valued logic applied to switching theory. In this thesis, algorithms developed to support the theory of varivalued logic were used to automatically generate local operations capable of discriminating between two textures of cytological interest.

As mentioned above, a local neighborhood of an element in a digital picture is some subset of the nearby picture elements. For example, in a rectangular sampling array, each picture element and its nearest neighbors to the east, west, south, southeast, and southwest form a 3 x 2 local neighborhood. An m x n local neighborhood can be represented as an mn-dimensional vector, e.g.,

    (x_1, x_2, ..., x_mn),

and can be regarded as an event in an mn-dimensional sample space. If the digital picture was quantized to h gray levels, then the sample space contains h^mn distinct events.

In the procedure described below, a training set of digitized samples of both textures is obtained. Using methods of statistical decision theory, each different event (local neighborhood) occurring in the training set is assigned to one or the other of the textures. Events in the sample space that did not occur in the training set are assigned to a DON'T CARE class. Having done this, one of the textures is regarded as a "true" set, and an analogy can be drawn between the events assigned to the true class and the minterms of the disjunctive normal form of a switching function. This analogy is pursued, and a suitably modified minimization procedure discovers an entity called an "interval cover" that defines a simplified categorizer preserving the assignment of events to texture classes, much as a prime implicant cover defines a simplified switching circuit preserving some desired truth table.

The details of the procedure are presented by means of a simple one-dimensional "texture" example. The purpose is to communicate a general understanding of the concepts used in this thesis; a precise, formal exposition by the originators of the theory is contained in [136] and [137].

(1) Defining the categorizer by means of signal detection theory: The purpose of this phase of the procedure is to assign each event in the training set to one or the other of the texture classes so that an unambiguous "true" set is defined for input to the interval covering process. Statistical decision theory provides a systematic way of doing this so that certain objectives are fulfilled optimally.

Let E1 and E0 be the sets of different events obtained from the texture samples T1 and T0, respectively, and let

    n1(e_k) = the number of occurrences of the event e_k in T1 (the number of "hits"),
    n0(e_k) = the number of occurrences of the event e_k in T0 (the number of "false alarms"),
    nT1 = the number of events in T1,
    nT0 = the number of events in T0.
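This bookkeeping, together with the likelihood ratio LR(e) = P(e|T1)/P(e|T0) defined formally below, amounts to counting sliding-window events in the two training textures. The sketch that follows is the editor's illustration in Python; it is not the varivalued logic software of Jayaramamurthy, Tareski and Raulefs used in this study, and the function names are assumptions.

    from collections import Counter

    def event_counts(texture, width=3):
        # Count occurrences of each 1-by-width event (local neighborhood)
        # in a quantized one-dimensional texture given as a list of
        # gray-level integers.
        return Counter(tuple(texture[i:i + width])
                       for i in range(len(texture) - width + 1))

    def likelihood_ratios(t1, t0, width=3):
        # P(e|T1) / P(e|T0) for every event seen in either training texture.
        # Events absent from T0 get an infinite ratio; events absent from
        # both textures never appear here and remain DON'T CAREs.
        n1, n0 = event_counts(t1, width), event_counts(t0, width)
        total1, total0 = sum(n1.values()), sum(n0.values())
        lr = {}
        for e in set(n1) | set(n0):
            p1 = n1.get(e, 0) / total1
            p0 = n0.get(e, 0) / total0
            lr[e] = float('inf') if p0 == 0 else p1 / p0
        return lr

Thresholding these ratios then produces the partition into "true," "false," and DON'T CARE events that the interval-covering step described below works from.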
In the one-dimensional example of Figure 8(a), the events are 1 x 3 local neighborhoods; the "textures" have been quantized to four levels, so the three-dimensional sample space contains 64 possible events. The nine events of E1 and the seven events of E0 are listed in the first columns of Table 5.

[Figure 8(a). One-dimensional "textures," quantized to four gray levels.]

[Table 5. Statistics derived from the one-dimensional textures of Figure 8(a) using a 1 x 3 template to define events. For each event e occurring in the training textures, the table lists n1(e), P(e|T1), n0(e), P(e|T0), and LR(e); the fourteen distinct events are (1,2,3), (2,3,2), (2,1,2), (1,0,2), (0,2,3), (3,2,0), (2,0,2), (3,2,1), (2,1,0), (1,3,2), (1,0,1), (0,1,2), (1,2,1) and (0,1,3).]

This information can be used to effect a disjoint partition of E1 ∪ E0 that is optimal in the sense that certain decision objectives are satisfied as well as they can be, given the inherent separability or non-separability of the sample data. For example, it may be assumed that all misclassifications are equally costly and that one simply wants to minimize the number of errors. It can easily be demonstrated [59] that the sum of the probabilities of the two types of errors (saying that an event e is from T1 when it is really from T0, a false alarm, and vice versa, a miss) is minimized when an event is categorized in T1 if its likelihood ratio (LR) is greater than 1. The likelihood ratio of an event e_k is defined as

    LR(e_k) = P(e_k|T1) / P(e_k|T0),

where P(e_k|T1) is the probability of e_k occurring in T1, i.e., the probability of e_k conditional on T1, which equals n1(e_k)/nT1; P(e_k|T0) is defined similarly.

It can be seen by looking at Table 5 that this decision goal is realized in the one-dimensional example. If only those events with LR > 1 are called T1, then four misclassifications (three false alarms and one miss) result. If any other subset is classed T1, more errors occur. Other decision goals can be realized by using a different likelihood ratio threshold [59]. For generality the threshold will simply be designated β.

With this background, we are ready to partially define the categorizer Ψ_R on the basis of the training set information. Let

    E = the event space,
    F1β = {e | e ∈ E1 ∪ E0 and LR(e) > β},
    F0β = {e | e ∈ E1 ∪ E0 and LR(e) ≤ β},
    F* = {e | e ∈ E - (E1 ∪ E0)}.

Then define Ψ_R by its acceptance set R, i.e., Ψ_R(e) = 1 iff e ∈ R, where

    F1β ⊆ R ⊆ F1β ∪ F*    and    R ∩ F0β = ∅.

Note that the determination of which events in F* are in R has not been made at this point; these represent DON'T CARE events that are assigned as described in (2) below.

The receiver operating characteristic (ROC) curve is a useful device for observing and predicting the behavior of these categorizers. To make the curve, each event e ∈ E1 ∪ E0 is regarded as a two-component vector with x = P(e|T0) and y = P(e|T1). An ordering can be imposed on these vectors by sorting them in descending order of the likelihood ratios of the events. The curve is generated by placing the tail of the first vector at the origin and then concatenating the rest in order. For the one-dimensional example, the graph shown in Figure 8(b) is the result. The ROC displays several useful items of information in an easy-to-see form. For one thing, the training-set performance of a categorizer for each value of β is shown directly, since for each threshold the y coordinate is equal to the sum of P(e|T1) over {e | LR(e) > β} and the x coordinate is equal to the sum of P(e|T0) over the same set of events.
[Figure 8(b). Receiver operating characteristic corresponding to Table 5; p(hit) is plotted against p(false alarm), with the operating points for several likelihood-ratio thresholds marked.]

[Figure 8(c). Generalized logic diagram with the interval covering of T1 against T0, consisting of three intervals L1, L2 and L3.]

The point on the ROC corresponding to a given value of β is easy to find, since it is the tail of the vector with slope β. (Note that β has only a finite number of values with different performance effects.) The ROC also provides a measure of the inherent separability of the textures in the training set. The area under the curve is equal to 0.5 if the textures are nondistinguishable (all events occur with equal probability in both textures) and is equal to 1.0 if the textures are perfectly distinguishable (all events occur in one or the other texture but not both).

(2) Implementing the categorizer by means of varivalued logic: In principle, the local categorizer described in the preceding section could be implemented by just looking up input events in a table of events and likelihood ratios. However, for real textures and useful neighborhood sizes this process would be hopelessly slow. Also, no categorization would be performed for events not in the training set. By applying some concepts from switching theory, equivalent but much more efficient categorizers can be generated. This is accomplished by a technique analogous to switching-theoretic procedures for minimization of the disjunctive normal form of a switching function. If Table 5 is viewed as a truth table where events in F1β are true and the others false, then the disjunctive normal form can be expressed as the OR of predicates L_j(e), where L_j has output true when the input is the particular event e_j from F1β and output false otherwise. McCormick and Michalski have developed a generalization of switching theory [137] that permits the transplantation of much of the minimization machinery already in existence. In particular, Michalski's Aq algorithm for the generation of quasi-minimal covers can be used. To explain the method it is necessary to introduce a few items of notation from [137]: E is the event space as before, i.e., the set of all events (x_1, x_2, ..., x_n), 0 ...

... IDROP = 1.

Table 6. Blob-Detector Performance as a Function of Small-Blob Parameter Values

    INTOL  IDROP  False alarms  Misses  Total errors  Percent errors
      5      1          0          47        47             36
      5      2          3          42        45             35
      5      3         17          32        49             38
      6      1          0          42        42             33
      6      2         13          31        44             34
      6      3         33          22        55             43
      7      1          6          31        37             29
      7      2         29          15        44             34
      7      3         52           2        54             42
      8      1         38          25        63             49
      8      2         63           2        65             50
      8      3         69           0        69             54

B. Cell Detection

As described in Chapter II, images of superficial cells can become extremely complex. It is therefore necessary to determine precisely the data to be extracted, so that unnecessary processing is avoided. In the present work, it was decided to try to locate fairly well-preserved superficial cells which might be touching or overlapping each other, but not badly crumpled or contained in thick cell masses. The cells to be detected would be the same ones a human microscopist could readily count. It was felt that it should not be necessary to perform detailed image segmentation to accomplish this, and in fact, that this would be impossible to do in the low-resolution images contemplated.
Other workers [177, 106] have described algorithms to handle touching or overlapping cells, but usually postulate an idealized input image in which edges are unambiguously determined and noise-free. As can be seen in Figure 9, this is far removed from reality, and the algorithms would be useless if applied to the output of any conceivable preprocessing scheme operating on real images in real time. Precise delineation of edges is not required for the present purpose, since no measurements are taken. The presence of a cell in a digitized image is established by applying a crude structural model of a flattened epithelial cell and reporting a hit where there is sufficient match, an approach similar to that taken in subroutine BLOB. In this case, the model must be somewhat more complex so as to account for the greater range of possible configurations of the complete cells.

Processing is based on the model of the cell image described previously in Chapter II.B.2, in which the cells of interest are composed of a region of fairly consistent optical density (the cytoplasm) with a darker blob (the nucleus) more or less in the center, as in Figure 3. Cell detection proceeds by attempting to find blobs which are approximately centered between pairs of "step-down" edges. A step-down edge is an edge oriented orthogonal to a line radiating from a blob point, where the gray value changes in the negative direction as the line crosses the edge. In principle, this process is insensitive to overlapping of cytoplasms, since the requirement for equally distant negative-going edges will associate edge pairs with the correct nucleus, unless either (1) the cells are nearly coincident, in which case the distance between the nuclei may be less than the tolerance established for the equal-distance criterion, or (2) the cells have off-center nuclei. In case (2), disambiguation may still be possible where the nucleus is centered in at least one direction.

The present implementation of this procedure takes the form of five PAX II subroutines: AVG, DIRDIF, RIDGE, RADAR and BLOB. AVG performs a digital low-pass filtering operation which is intended to decrease the effect of small detail and noise on the edge computation. AVG uses an efficient parallel algorithm described by Rosenfeld and Thurston [182], in which the gray value of each picture element is replaced by the average gray value in an n x n square neighborhood of the element, where n is a power of 2. AVG does this computation in parallel using arithmetic of n*2+4 bits precision. The optimal size of n is a function of the "conspicuousness" (Rosenfeld's term) of the edge to be detected. For this application, n was established by trial and error using the interactive facilities of Show-and-Tell. Subroutines DIRDIF and RIDGE cooperate to reduce the blurred image to a binary plane with ones marking the locations of edges orthogonal to a given direction. DIRDIF performs a directional differencing operation on the AVG output by subtracting it from a shifted copy, where the shift is in the direction of interest and has magnitude n (the size of the side of the averaging area). RIDGE detects ridges (local maxima occurring in chains) in the difference picture and eliminates spurious maxima not aligned orthogonally to the given direction. The sign of the difference is available for use in establishing whether an edge is negative-going with respect to the given direction.
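The first two steps of this pipeline are easy to sketch. The fragment below is the editor's illustration in Python/NumPy of the ideas behind AVG and DIRDIF, not the PAX II subroutines themselves; np.roll wraps around at the picture borders, whereas the PAU shifts fill with zeros, and the fixed-point precision bookkeeping of AVG is omitted.

    import numpy as np

    def avg_power_of_two(image, n):
        # n x n neighborhood sum built by repeated shift-and-add, doubling
        # the summed extent at each step (the Rosenfeld-Thurston scheme);
        # n must be a power of two.  The block is anchored at the element
        # rather than centered on it; a final shift by n/2 would center it.
        acc = image.astype(np.int64)
        step = 1
        while step < n:
            acc = acc + np.roll(acc, -step, axis=0)
            acc = acc + np.roll(acc, -step, axis=1)
            step *= 2
        return acc // (n * n)              # average over the n x n block

    def directional_difference(smoothed, n, axis=0):
        # DIRDIF-style differencing: subtract the smoothed image from a
        # copy shifted by n in the chosen direction; the sign of the result
        # indicates whether an edge is negative-going in that direction.
        return np.roll(smoothed, n, axis=axis) - smoothed

Ridge detection (RIDGE) then reduces the difference picture to a binary edge plane by keeping only chained local maxima oriented orthogonally to the chosen direction.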
Subroutine BLOB is pressed into service again to detect the leukocyte-sized cell nuclei used as reference points for the cell-finding operation. In order to determine which of the blob points are centered between negative-going edges, another subroutine, RADAR, is used. RADAR causes each blob point to become a sort of radar transmitter. The edge points for one sign and direction are the targets. Using CONNECT, the propagation operation of the PAU, a spreading beam of ones is propagated in a given direction (see Figures 10(b) and 12(e)). Edge points caught in the beam are identified by ANDing the beam with the edge-point plane at each stage of propagation. These edge points are caused to reflect a signal back to the blob "transmitter" along the beam, again by using CONNECT. A plane containing the blob points which received return signals at each propagation stage is stored. RADAR is then re-applied in the opposite direction, with edges of opposite sign as targets.

The output of RADAR applied in a pair of directions is thus two stacks of planes containing coded range information. Distance is represented by the position of a plane in the stack, and the on bits in a plane indicate which blob points received echoes at that distance. If successive pairs of planes, one from each stack, are ANDed, the result is a plane having a bit on for each blob found to be approximately centered between a pair of step-down edges, and the number of edge pairs discovered for each blob can be used as a measure of the degree of "ill-formedness" of the cells. Cells detected at an ill-formedness index of two are shown in Figure 12(f).

[Figure 9. Input images.]

[Figure 10(a). "BLOB" operation, three gray levels; IRADUB = 2, IRADLB = 1, IDROP = 2.]

[Figure 10(b). "RADAR" operation: starting points, target points, and the direction of propagation; target points "in range" of some starting point, and starting points "in range" of some target point, are marked.]

[Figure 11.]

[Figure 12. Panel (e) shows the RADAR "beams" and panel (f) the cells detected.]

C. Texture Processing

Drying artifact can cause large, dark blobs in cervical smears, which must be distinguished from cell nuclei to avoid excessive false positive reports. Drying artifact occurs when cells are not fixed promptly upon being spread on the slide, and it looks like the large masses featured in Figure 13(a). Chromatin is the result of staining the genetic material in the cell nucleus. When the chromatin shows a texture consisting of large dark clumps, it is said to show an active chromatin texture. This condition can occur in nonmalignant cells, but is much more pronounced in the case of cancer. In Figure 13(b), cell nuclei with active chromatin are approximately centered in the photograph. Note that the drying artifact is a somewhat glassy material containing refractile and light-absorbing areas, while the chromatin textures are mostly composed of light-absorbing areas and are in most cases somewhat less contrasty than the artifact.
To test the applicability of the varivalued logic approach (Section IV.D) to this problem, sample textures were acquired by selecting 14 chromatin and 13 artifact samples from 5 different Pap smears. In some cases the same cell was sampled more than once. An attempt was made to sample textures that appeared to the eye to contain some local texture information and to avoid texture regions that would obviously need contextual data for discrimination. For example, the chromatin in some cell nuclei is so condensed as to present an opaque dark blob; it would make no sense to try to separate these on the basis of local texture. The 27 samples contained 32 x 32 picture elements each, with each sample covering an area of perhaps 100 µm² on the microscope slide. The gray values were quantized initially to 16 equally spaced gray levels. This was subsequently reduced to 4 equally spaced gray levels to reduce processing costs. Nine each of the chromatin and artifact texture samples were designated the training set, and the rest were set aside as an unknown or test set. The chromatin texture is considered to be T1 and the artifact is T0.

[Figure 13(a). Photomicrograph of drying artifact. Figure 13(b). Photomicrograph of a cell nucleus showing chromatin texture.]

A 3 x 2 neighborhood was used, so there was a total of 30 x 31 x 18 = 16,740 events in the training set. Conditional probabilities and likelihood ratios were computed, and the ROC curve labeled D in Figure 13(f) was generated as in the one-dimensional example. According to the ROC, if the decision rule "decide chromatin if LR(e) > 1" is used, one can expect that when a piece of chromatin texture is presented to the categorizer, 67 percent of the events will be labeled "chromatin," and when artifact is presented, 54 percent of the events will be incorrectly labeled chromatin.

This information is then used to set a threshold to classify texture regions, in this case the patches of texture in the samples. (Note that we were previously classifying local neighborhoods; now we are classifying regions in a digitized picture.) The training set events were labeled using the LR(e) > 1 decision rule, and the chromatin-labeled events (hits) were counted for each patch of texture. The sample patches were then classified as T1 (chromatin) if there were more than 650 hits in a patch, and T0 otherwise. There was not complete separation: one-third of the chromatin texture patches were called artifact and 11 percent of the artifact patches were called chromatin, a misclassification rate of about 22 percent on the training set.

Likelihood ratios computed on the training set were used to assign the training set events to T1 or T0, using the LR > 1 criterion. These events were fed to the Aq algorithm, and an interval covering which required 31 intervals was generated. When this was applied as a local operation to the unknown texture sample set (Figure 13(c)), the binary result was as shown in Figure 13(d). Black spots in Figure 13(d) indicate that the neighborhood in the vicinity of the spot was categorized as T1 by the local operation. These spots were counted for each texture patch, as shown in Figure 13(d), and the 650 threshold was used to classify the samples. Figure 13(e) shows how the program classified each sample. In the case of the unknown set, only one error was made.
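The region-level decision just described, counting the T1-labeled events in a 32 x 32 patch and comparing the count with the 650-hit threshold, can be sketched as follows. This is the editor's illustration in Python; categorize_event is a hypothetical stand-in for the interval-cover categorizer, and the function and argument names are assumptions.

    import numpy as np

    def classify_patch(patch, categorize_event, neighborhood=(3, 2), threshold=650):
        # Slide the local categorizer over a quantized patch, count the
        # neighborhoods it labels T1 ("hits"), and call the whole patch
        # chromatin (T1) if the hit count exceeds the threshold.
        h, w = neighborhood
        rows, cols = patch.shape
        hits = 0
        for i in range(rows - h + 1):
            for j in range(cols - w + 1):
                if categorize_event(patch[i:i + h, j:j + w]):
                    hits += 1
        return ("T1 (chromatin)" if hits > threshold else "T0 (artifact)"), hits

For a 32 x 32 patch and a 3 x 2 neighborhood this scans 30 x 31 = 930 positions per patch, consistent with the 16,740 training-set events quoted above for eighteen patches.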
[Figure 13(c). Test set of texture samples; the top row and the first two in the second row are T1.]

[Figure 13(d). Test set with hit counts.]

[Figure 13(e). Test set classified.]

[Figure 13(f). Receiver operating characteristic for chromatin vs. artifact. Curve A: 1 x 1 neighborhood; B: 1 x 2; C: 2 x 2; D: 3 x 2. Triangles indicate performance with the LR(e) > 1 rule.]

[Figure 13(g). Receiver operating characteristic with tapered quantization of samples.]

VI. CONCLUSIONS

A. Blob and Cell Detection

The performance of the blob and cell detectors indicates that this capability for more complex shape analysis has the potential to overcome the immediate problems which have prevented development of a Cytoanalyzer-like instrument. The images used in testing the programs were of approximately the level of complexity one would expect in a mechanically dispersed cell sample, such as the one developed for the Cytoanalyzer. Therefore, it is reasonable to predict that a processor incorporating the algorithms described above could examine a dispersed cell sample, counting well-differentiated epithelial cells and detecting enlarged blobs while ignoring white cell clumps, two processes which were essential to the Cytoanalyzer's planned mode of operation but which proved to be technically infeasible.

If the Cytoanalyzer-format slide is used (1 cm x 5 cm), approximately 12,500 fields of 200 µm x 200 µm would have to be processed. To do this in four minutes (approximately the time required by human screeners), the processing rate would have to approach fifty frames per second. This frame rate is well within the realm of possibility for available television cameras, analog-to-digital converters, and memories at the resolution and signal-to-noise capacities required by the blob-filtering and cell-detecting algorithms discussed here. Processing each frame with the algorithms described required 991 fundamental parallel array operations (Boolean operations between planes, unit shifts of planes) and 484 more complex array operations (addition, subtraction and comparison of stacks of planes interpreted as arrays of binary numbers). To estimate time requirements, it is assumed that each complex operation takes six times as long as a fundamental operation; this rule of thumb reflects the observation that integrated-circuit arithmetic/logic units take about six times as long to perform an operation as simple gates do. Using this information, and assuming that scanning can be completely overlapped with processing so that all 20 ms per frame are available, it can be calculated that 5.1 µs are available for each fundamental operation. This execution time is clearly attainable within the present state of the art of circuit technology.

B. Texture Analysis

In view of the success of recent studies in identifying cancer cells through texture features [125, 18?], and also in view of the large weight commonly placed on chromatin texture by diagnosticians [52], it seems clear that the addition of a texture-analyzing capability would greatly enhance the effectiveness of an image-processing approach to automated cancer cell detection. The experiments reported here demonstrated that texture information permits a greater degree of discrimination between chromatin and artifact than is obtainable from intensity data alone. However, a degree of ambiguity is not resolved by the present procedure. Figure 13(h) shows the distribution of hit count values (as calculated in Section V.C) for the sample texture patches in the training set and in the unknown set, a total of 27 observations.
While there are too few samples to provide a very reliable picture of the actual distributions, it seems clear that there are two populations and that they overlap. There are several parameters that can be adjusted to try to improve the separation: the sampling resolution of the initial scan, the scanning beam wavelength, the local neighborhood size and configuration, and the quantization scheme. For example, Figure 13(f) shows the result of changing the local neighborhood configuration from 1 x 1 to 3 x 2. The hit probability (for the LR > 1 criterion) goes from approximately 55 percent at 50 percent false alarms to 67 percent at 54 percent false alarms, i.e., the hit rate increased 12 percent while the false alarm rate increased 4 percent.

The same 16-level texture patches were used in the generation of Figure 13(g); however, an alternative quantization scheme was applied. Instead of quantizing into four gray-value ranges of equal width, a "tapered" quantization was used [181]. With this method, the gray-value ranges are adjusted so that approximately the same number of picture elements will have each of the (quantized) values; to put it another way, each gray value will occur with equal probability in the quantized picture. This has the effect of increasing detail in large areas of low-amplitude, high-frequency modulation. The quantization ranges were set once, using a composite gray-value histogram derived from all training set samples. As can be seen by looking at the coordinates of point D in Figure 13(g) for the LR > 1 decision rule and a 3 x 2 local neighborhood, a 58 percent hit rate can be achieved at the expense of 38 percent false alarms, a separation of 23 percent, which is an improvement over the 13 percent separation observed with the equal-range-width quantizing of Figure 13(f). By systematically adjusting parameters in this way and monitoring the expected performance via the ROC, it is possible to arrive at an interval cover with optimal effectiveness for this application.

[Figure 13(h). Distribution of hit count values for the sample texture patches.]

APPENDIX

[PAX II FORTRAN subroutine listings.]

SUBROUTINE RIDGE(/S4DIFF/, /POUT/, IDIR, PAVG): subroutine to find edge points in a filtered picture.

SUBROUTINE BLOB(IPHITS, /IPI/, /NPI/, /IPO/, /IRADLB/, /IRADUB/, /INTOL/, /IDROP/):
    IPHITS - stack for accumulating hit counts
    IPI    - input stack
    NPI    - number of planes in input stack
    IPO    - output plane
    IRADLB - search radius lower bound
    IRADUB - search radius upper bound
    INTOL  - minimum number of radii which must succeed
    IDROP  - drop in gray levels to cause radius search success

AEC-427, U.S. Atomic Energy Commission: University-Type Contractor's Recommendation for Disposition of Scientific and Technical Document
AEC Report No.: COO-2118-0029; UIUCDCS-R-72-497
Title: Parallel Image-Processing for Automated Cytology
Type of Document: scientific and technical report
Recommended Announcement and Distribution: AEC's normal announcement and distribution procedures may be followed
Submitted by: John S. Read, Research Programmer, Department of Computer Science, University of Illinois, Urbana, Illinois 61801, October 1972

BIBLIOGRAPHIC DATA SHEET
Report No.: UIUCDCS-R-72-497
Title and Subtitle: Image Processing for Automated Cytology
Author: John Stevenson Read
Performing Organization Name and Address: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
Sponsoring Organization Name and Address: U.S. Atomic Energy Commission
Report Date: June 1972
Project/Task/Work Unit No.: ILLIAC III
Contract/Grant No.: AT(11-1)-2118
Period Covered: Feb.-June 1972
Abstract: The prime objective of this work was to examine the effectiveness of a parallel digital image processor in the analysis of cervical (Pap) smear imagery. This task has historically been shown to be very difficult to do by machine, partly because clumping and overlapping of cells has presented difficulties exceeding the capacity of cost-effective image-processing technology. The ILLIAC III's special parallel image processor promises a capability to analyze much more complex patterns for a given cost than has been possible in the past. This thesis contains background information on cervical smears, an extensive review of automated cytology, and reports on three experiments: (1) detecting enlarged cell nuclei in the presence of clumps of leukocytes, (2) detecting superficial epithelial cells, and (3) discriminating two important textures occurring in cervical preparations. The last experiment utilized the varivalued logic approach to pattern recognition developed by McCormick and Michalski.
Key Words and Document Analysis, Descriptors: digital image processing; automated cytology; parallel processors; biological cells; automated cancer screening; ILLIAC III; Pattern Articulation Unit; varivalued logic; texture analysis; image segmentation
Availability Statement: Release unlimited
Security Class (This Report): UNCLASSIFIED
Security Class (This Page): UNCLASSIFIED
Form NTIS-35 (10-70)