key: cord-0052656-gp5nf3dy authors: Kemp Bohan, Phillip M.; Chick, Robert C.; Hickerson, Annelies T.; Messersmith, Lynn M.; Williams, Grant M.; Cindass, Jessica L.; Lombardo, Jamie; Collins, Ryan; Brady, Robert O.; Hale, Diane F.; Peoples, George E.; Vreeland, Timothy J.; Clifton, Guy T. title: Correlation of tumor microenvironment from biopsy and resection specimens in untreated colorectal cancer patients: a surprising lack of agreement date: 2020-11-12 journal: Cancer Immunol Immunother DOI: 10.1007/s00262-020-02784-5 sha: 3352dc553f74ae95f93bc69b12c0a85c5182774d doc_id: 52656 cord_uid: gp5nf3dy

BACKGROUND: Colorectal cancer (CRC) tumor microenvironment (TME) characteristics, such as tumor infiltrating lymphocyte (TIL) densities and PD-L1 status, are predictive of recurrence, disease-free survival, and overall survival. In many malignancies, TME characteristics are also predictive of response to immunotherapy. As window of opportunity studies using neoadjuvant immunotherapy become more common and treatment guidelines incorporate TME features, accurate assessment of the pre-treatment TME using the biopsy specimen is critical. However, no study has thoroughly evaluated the correlation between the TMEs of the biopsy and resection specimens.

METHODS: We conducted a retrospective analysis of patients with stage I–III CRC with matched biopsy and resection specimens. CD3+, CD4+, CD8+, and FoxP3+ lymphocyte populations at the center of tumor (CT) and invasive margin (IM) and tumor PD-L1 status in the biopsy and resection specimens were evaluated. TIL populations were compared using Mann–Whitney U tests or Student’s t tests and correlated using Pearson r.

RESULTS: CD3+ and CD4+ densities were significantly higher in the CT of the biopsy relative to the resection specimen. Comparing biopsy and resection specimens, no TIL population at either the CT or IM had a correlation coefficient > 0.5.
Determining PD-L1 status based on biopsy tissue resulted in a sensitivity of 37.1%, specificity of 81.4%, and accuracy of 61.5%.

CONCLUSIONS: These findings demonstrate significant discordance between the TME of the biopsy and resection specimens. Caution should be used when basing treatment decisions on pre-treatment endoscopic biopsy findings and when interpreting changes in the TME between pre-treatment biopsy and resection specimens after neoadjuvant therapy.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s00262-020-02784-5) contains supplementary material, which is available to authorized users.

Treatment of localized colorectal cancer (CRC) is based upon the tumor, nodes, metastasis (TNM) staging system, with surgery alone recommended for stage I and most stage II disease and adjuvant chemotherapy recommended for high-risk stage II and stage III disease [1]. However, the TNM staging system offers only a rough assessment of tumor biology. The high-risk stage II disease classification includes patients with clinical findings of obstruction and perforation [1] and excludes patients with microsatellite instability (MSI)-high tumors, a subtle distinction that represents a major shift in cancer care as it acknowledges the prognostic importance of tumor biology [2, 3]. This shift towards incorporating tumor biology into staging and treatment algorithms is further evidenced by the recent FDA approval of the anti-PD-1 antibody, pembrolizumab, for treatment of any MSI-high metastatic or unresectable tumor, regardless of histology or origin [4]. This unprecedented approval for a therapy across all tumor types based on a single biologic factor highlights the importance of accurately understanding tumor biology.
The incorporation of MSI status into staging and treatment recommendations [1] highlights recent interest in the tumor microenvironment (TME), a factor that influences response to immunotherapy and serves as a marker of overall tumor biology and prognosis [5, 6]. In CRC, the immunoscore, a measure of CD3+ and CD8+ tumor infiltrating lymphocyte (TIL) densities at the center of tumor (CT) and invasive margin (IM), has been shown to predict recurrence independent of, and more accurately than, T or N stage [7, 8]. Additionally, a greater degree of inflammation, higher CD8+ lymphocyte density, and higher FoxP3+ lymphocyte density have also been associated with improved survival [9-11]. Finally, higher levels of programmed cell death-ligand 1 (PD-L1, a component of the PD-1/PD-L1 checkpoint) within the TME have been correlated with higher CD8+ densities, improved disease-free survival, and improved overall survival [12]. Given the predictive value of the TME, it seems likely that future staging and treatment guidelines will continue to incorporate these features.

As TME characterization and MSI status become increasingly important in cancer care, it is vital that providers accurately define these factors in a pre-therapy setting before the final resection specimen is subjected to thorough pathologic analysis. This pre-therapy assessment is used to identify patients who would benefit from neoadjuvant therapy or be eligible for an immunotherapy window of opportunity trial, which involves administering an immunomodulatory agent over a limited period of time between diagnosis on biopsy and final surgical resection. Mischaracterization of the preintervention TME could lead to inappropriate enrollment in a trial and incorrect appraisal of the effect of the trial agent on the TME.
Despite the importance of TME evaluation from pretreatment biopsies, little is known regarding the accuracy of the biopsy TME and the correlation between the TME from biopsy and surgical specimens. The purpose of our study was to thoroughly compare the TME of CRC (stages I-III) sampled on endoscopic biopsy to that of the definitive surgical resection specimen in patients without a history of neoadjuvant chemo- or immunotherapy. We sought to determine the density of CD3+, CD4+, CD8+, and FoxP3+ lymphocyte populations in both the CT and IM and the correlation of these densities between biopsy and resection specimens. Additionally, we sought to assess the accuracy of an endoscopic biopsy at determining the PD-L1 status of the tumor.

This study was conducted after approval by our institutional review board. Patients with pathologic stage I-III CRC diagnosed on endoscopic biopsy who underwent resection with curative intent from the years 2006-2016 at our center were included. Cases were identified through the electronic pathology database at our institution (CoPath Plus, Cerner, Kansas City, MO) using the terms "colonic adenocarcinoma" and "colorectal adenocarcinoma" under the category "final diagnosis". To meet inclusion criteria, formalin-fixed paraffin-embedded (FFPE) tissue blocks from both the initial biopsy and the resection specimen of the same malignancy had to be on site at our facility and needed to have at least 2 mm² of tumor remaining in the block. Cases were excluded if one of the procedures had been performed at an outside institution, blocks were unavailable, or less than 2 mm² of tumor remained in the blocks. Patients with stage IV disease at time of diagnosis, in situ disease only on final resection specimen, or who underwent neoadjuvant chemotherapy prior to resection were also excluded. Tumors were staged using the TNM classification from the American Joint Committee on Cancer (AJCC) 7th edition.
Demographic, tumor-specific, and treatment data were all collected from retrospective chart review.

FFPE tissue blocks from the endoscopic biopsies and surgical resections were visually examined for tumor quantity. Blocks that contained the most representative tumor material were examined histologically (hematoxylin and eosin) to ensure that adequate tumor was available and that the selected blocks contained representative samples of the invasive margin and center of tumor. Each block was cut at 4 μm and immunohistochemically stained for FoxP3, CD3, CD4, CD8, and PD-L1 (CAL10, Biocare Medical, Pacheco, CA). The immunohistochemically stained slides were digitally captured at 400× magnification using the Aperio ScanScope AT Turbo (Leica Biosystems, Buffalo Grove, IL) digital imager system. De-identified images were uploaded into eSlide Manager (version 12.3.2.5030) so that all interpreting pathologists were blinded to any patient information corresponding to a given slide.

Image analysis was performed using Aperio ImageScope (version 12.0.1.5012) and an in-house analysis algorithm. The algorithm was tuned to detect nuclear and cytoplasmic positivity while excluding larger tumor nuclei in an attempt to interpret only lymphocytes. To validate this algorithm, images were captured on the ScanScope and representative fields were manually counted by three different pathologists and compared to the result obtained using the developed algorithm. After validation, slides were scanned into the eSlide Manager server and 1 mm² areas were manually selected from both the tumor center and the invasive margin (Fig. 1). CD3+, CD8+, and FoxP3+ immunostained lymphocyte counts were reported as positive cells/mm². PD-L1-stained slides were read manually by a pathologist and reported as positive/negative.
The PD-L1 stain was interpreted per manufacturer guidelines for interpretation in non-small cell lung cancer (the only FDA-approved indication for this test at the time of this investigation; Fig. 2) [13]. Slides containing tissue from the CT, IM, and tumor surface were all examined for PD-L1 staining. If greater than or equal to 1% of the tumor cells were positive for the stain, the specimen was considered positive. If the stain was dark and continuous around the cell membrane, the specimen was considered high positive. If the stain was faint or patchy around the membrane, it was considered low positive.

All photomicrographs were captured using the Aperio ScanScope AT Turbo (Leica Biosystems, Buffalo Grove, IL) at 400× magnification using a 20× FN 26.5 lens with a 2× doubler. Images were processed using the Aperio ImageScope software, version 12.

SPSS v. 24 (IBM, Armonk, NY) was used for all statistical analyses. Continuous variables were assessed for normality using Shapiro-Wilk tests and reported as either mean and standard deviation (SD) or median and interquartile range (IQR) as appropriate. Continuous variables were compared either with Student's t tests (parametric) or Mann-Whitney U tests (nonparametric). Correlations between continuous variables were performed using a Pearson correlation (r). Statistical significance was set at p < 0.05.

Matched endoscopic biopsy and resection specimens from 78 patients with sufficient tissue remaining were identified and included. Clinical and pathologic descriptors of those patients are described in Table 1. Median age was 61 years, and the majority of patients were male (56.4%). The most frequent tumor and node classifications were T3 (53%) and N0 (54%), respectively. The most frequent overall pathologic stage was stage III (46.2%). Tumors were most commonly in the right colon (consisting of all colon supplied by the superior mesenteric artery).
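The Pearson correlation named in the statistical methods can be illustrated with a minimal sketch. The study itself used SPSS; the pure-Python function below is only an illustration of the formula, and the function name `pearson_r` and the five paired density values are hypothetical, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its own mean
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical CD8+ densities (cells/mm^2) for five matched patients;
# these numbers are made up for illustration only.
biopsy_cd8 = [120.0, 340.0, 95.0, 410.0, 220.0]
resection_cd8 = [150.0, 210.0, 180.0, 330.0, 90.0]

r = pearson_r(biopsy_cd8, resection_cd8)
print(f"Pearson r (biopsy vs. resection) = {r:.3f}")
```

Each patient contributes one (biopsy, resection) pair, so the coefficient measures how well the biopsy density tracks the resection density across patients rather than whether the two group medians differ.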
MSI testing was performed in 52.6% of patients, of whom 85.4% (35/41) were reported as low probability. Lymphovascular and perineural invasion were present in 30.8% and 11.5% of cases, respectively.

The TME was assessed at four sites: the CT and IM of the biopsy specimen (CT-B and IM-B), and the CT and IM of the resection specimen (CT-R and IM-R). Comparison of the TME of CT-B and CT-R (Table 2) demonstrated significantly larger populations of CD3+ and CD4+ lymphocytes in the CT-B relative to the CT-R specimen (p < 0.001 and p = 0.004, respectively). The CD3+ lymphocyte population was larger in the IM-B relative to the IM-R (p = 0.001). There was no difference in FoxP3+ or CD8+ lymphocyte populations at either the CT or the IM when comparing biopsy and resection specimens.

Two sets of correlations using TME populations were performed (Table 3). First, CT and IM TME populations were correlated in the biopsy and resection specimens separately (Supplemental Figures S1 and S2). There were moderate correlations between FoxP3+ and CD8+ lymphocyte populations at CT and IM in the biopsy (CT-B to IM-B, r = 0.700 and r = 0.617, respectively) and resection specimens (CT-R and IM-R, r = 0.673 and r = 0.621, respectively). Second, biopsy and resection specimen TME populations were correlated at the CT and IM separately (Supplemental Figures S3 and S4). No lymphocyte population in either the CT or IM had a Pearson r > 0.5 when comparing the biopsy and resection. CD3+ and CD8+ lymphocyte populations at the IM and CT moderately correlated (r values between 0.394 and 0.444) and CD4+ and FoxP3+ lymphocyte populations at either site weakly correlated (r values all < 0.250) between biopsy and resection specimens.

Biopsy and resection specimens were then divided into groups based on PD-L1 status (Table 4). Of 78 patients, 21 (26.9%) had PD-L1+ biopsies, 35 (44.9%) had PD-L1+ resection specimens, and 13 (16.7%) had both biopsy and resection specimens stain as PD-L1+.
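The three counts just given (21 PD-L1+ biopsies, 35 PD-L1+ resections, 13 positive on both, out of 78 patients) are enough to reconstruct the full 2×2 table of biopsy against resection status. The sketch below is illustrative arithmetic only, treating the biopsy as the test and the resection as the reference; the function names are hypothetical and not part of the study's analysis.

```python
def two_by_two(n_total, n_test_pos, n_ref_pos, n_both_pos):
    """Rebuild (TP, FP, FN, TN) from marginal counts and the overlap."""
    tp = n_both_pos               # positive on biopsy and resection
    fp = n_test_pos - tp          # positive on biopsy only
    fn = n_ref_pos - tp           # positive on resection only
    tn = n_total - tp - fp - fn   # negative on both
    return tp, fp, fn, tn


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 summaries, plus false negatives as a share of all cases."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "false_negative_fraction": fn / total,
    }


# Counts reported in the text: 78 patients, 21 PD-L1+ biopsies,
# 35 PD-L1+ resection specimens, 13 positive on both.
tp, fp, fn, tn = two_by_two(78, 21, 35, 13)
metrics = diagnostic_metrics(tp, fp, fn, tn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Run on the reported counts, this gives TP = 13, FP = 8, FN = 22, TN = 35, and reproduces the sensitivity (37.1%), specificity (81.4%), and accuracy (61.5%) stated in the abstract. It also shows that a figure of 28.2% for false negatives corresponds to 22 of all 78 patients, rather than to the share of resection-positive tumors missed on biopsy (22 of 35).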
Patients with a PD-L1+ biopsy specimen had a greater number of CD3+ and CD4+ cells at CT-B and CD3+ cells at IM-B relative to PD-L1-negative biopsy specimens. Patients with PD-L1+ resection specimens had a greater number of FoxP3+, CD4+ and CD8+ cells at the CT-R, and FoxP3+ and CD3+ cells at IM-R (all p < 0.05) relative to PD-L1-negative resection specimens.

Given the difference in PD-L1 status between biopsy and resection specimens, the accuracy of the biopsy at predicting final specimen PD-L1 status was then assessed (Table 5). Biopsy specimens were considered the test and resection specimens were considered the gold standard. Only 16.7% of specimens were true positives and 45% of specimens were true negatives, with a false-negative rate of 28.2%. The overall accuracy of the biopsy at correctly identifying PD-L1 status of the final tumor was 61.5% (95% CI 49.8-72.3%).

This study investigated the degree of correlation amongst TMEs as measured at the CT and IM in CRC biopsy and resection specimens. Our data demonstrate that there is concordance between the IM and CT FoxP3+ and CD8+ lymphocyte populations within individual biopsy and resection specimens. However, there were only moderate correlations between TIL populations (all r < 0.5) at any location between the biopsy and resection specimens. PD-L1+ specimens consistently had higher TIL populations at CT-R and IM-R, but the overall accuracy of the biopsy at predicting resection PD-L1 status was only 61.5%, with a false-negative rate of 28.2%.

Only recently has immunotherapy expanded to CRC, and only two studies to date have compared the TME of endoscopic biopsies to those of surgical specimens in CRC. Koelzer et al. [14] compared CD8+ and CD45RO lymphocyte populations in pre-operative biopsies and resection specimens from 130 patients with stage I-III CRC. Higher CD8+ lymphocyte infiltration in the biopsy specimen was found to be independently predictive of improved overall survival (p < 0.01).
Lower CD8+ lymphocyte densities on biopsy were also predictive of higher T stage and positive nodal status. Yet when matched biopsy and resection specimens were correlated, only a moderate correlation existed for CD8+ lymphocytes (r = 0.42) and a weak correlation existed for CD45RO lymphocytes (r = 0.16). The authors did not comment on any relationship between TIL populations found in the resection specimen and survival, which would have been useful to determine the clinical significance of this weaker correlation between biopsy and resection specimens and the prognostic significance of the biopsy relative to the resection specimen. Similarly, Park et al. [15] compared CD3+ lymphocytes, CD8+ lymphocytes, and tumor stroma percentage (TSP, an assessment of the degree of intratumoral stroma at the deepest point of tumor invasion) in matched biopsy and resection specimens of 115 patients with stage I-III CRC. No correlation coefficients were calculated, but incorrect characterization of TIL density occurred at frequencies of between 26.1 and 41.1%, depending on the type and location of the various T cells. The group also noted a difference in high- and low-density TSP when comparing biopsy and resection specimens (p = 0.001). The authors concluded that the biopsy specimen provides a representative assessment of the TME, despite the statistically significant difference in lymphocyte populations and tumor stroma between biopsy and resection specimens.

Cumulatively, these data suggest a lack of correlation in certain TIL populations between biopsy and final surgical resection. While comparing Pearson r correlations and sensitivity/specificity calculations across studies is difficult, as each of these can be independently influenced by a number of factors inherent to each data set, our study now supports a growing body of literature that questions the reliability of a biopsy at accurately representing the resection [14].
Additionally, the accuracy with which biopsy intraepithelial CD3+ lymphocyte density predicted density in the resection specimen, as reported by Park et al., is similarly lacking (73%) [15].

While our current study corroborates the initial data from previous studies, we have included a more exhaustive examination of the TME of both biopsy and resected specimens, including additional TIL populations (FoxP3+ and CD4+). We found no strong correlation (all r < 0.5) between biopsy and resection specimens with regard to any of the examined TIL lines. This lack of correlation is concerning as such an assessment is becoming crucial for both deciding patient treatment and determining the efficacy of therapies.

Our study also examined PD-L1 positivity and its correlation between biopsy and resection specimens. We found significant discordance of PD-L1 status between biopsy and resection specimens, with 38.5% of patients misclassified based on the endoscopic biopsy. This inaccuracy is driven by a low sensitivity (37.1%) and thus a high false-negative rate (28.2%), which could clinically result in the withholding of potentially effective immunotherapy from these patients. A possible reason for this low accuracy is a high degree of intratumoral PD-L1 heterogeneity. PD-L1 heterogeneity is well documented in lung adenocarcinoma [16, 17] and breast cancer [18, 19], and it would be reasonable to hypothesize that such variability might also be present in CRC. This heterogeneity likely applies not just to PD-L1, but to different TIL phenotypes and other aspects of the TME. Understanding TME heterogeneity is particularly important because individual TIL populations [10, 20-23] as well as PD-L1/PD-1 staining [24, 25] have all been shown to have prognostic importance. However, these conclusions are based on studies that only analyzed the TME of surgical resection specimens.
As researchers attempt to apply such analyses of the TME in the neoadjuvant setting, an inadequate analysis on biopsy secondary to TME heterogeneity may result in the misclassification of patients. In addition, comparing biopsy and resection specimens separated by chemo- or immunotherapy and drawing conclusions on the effect of therapy on the TME may lead to false conclusions, as no reliable correlation between biopsy and resection TME has been published. We must, therefore, use caution when making important treatment decisions based on any biopsy specimen alone, and must similarly be cautious when judging the efficacy of a therapy based on relative changes in the TME calculated by comparing a pre-treatment biopsy to a post-treatment biopsy or resection specimen.

Future studies must focus on the development of an adequate means of comparing biopsy and resection specimens. While no studies to date have addressed this problem, we would advocate for thorough mapping of the TME using multiple samples throughout the tumor, with initial analysis aimed at understanding the implications of differences throughout the TME. More extensive sampling is also necessary both before and after treatment in window of opportunity trials to more definitively characterize the effects of any therapy on the TME. Similarly, additional studies are needed to fully understand the clinical implications of PD-L1 heterogeneity and whether reviewing a greater number of slides to better characterize a patient's PD-L1 expression status can improve our ability to predict response to targeted therapies.

There are a number of limitations to our study. First, this was a retrospective study constrained by data accessible in the medical record and remaining tissue available for analysis. Lack of sufficient biopsy tissue remaining after original diagnosis certainly limited the power of this study as patients without available tissue were appropriately excluded.
Second, a more thorough analysis would have addressed how individual TIL populations, and particularly those from the biopsy specimen, correlated with clinical outcome. Our data set is limited by a lack of long-term follow-up data for a number of patients, precluding our ability to perform such an analysis accurately. Finally, we did not compare multiple CT or IM TMEs from the same specimen to evaluate for spatial heterogeneity at that particular location of the TME. We would expect that a certain degree of heterogeneity would exist even within the CT or the IM, but determining if a specific location more consistently represents the TME would be of great clinical relevance. This question should be investigated further in future studies.

In conclusion, our results demonstrate a weak correlation between the TME assessed on biopsy and the TME present in the final resection specimen. In particular, the accuracy of endoscopic biopsy in assessing tumor PD-L1 status was only 61.5%. Until a more accurate means of assessing the TME using biopsy alone is available, we would use caution when excluding patients from specific therapies based solely on the findings of endoscopic biopsy and when interpreting results of window of opportunity studies in colon cancer based on comparisons between biopsy and resection specimens.

Author contributions PKB: investigation, formal analysis, and writing-original draft; RC: investigation, writing-review and editing; AH, LM, GW, JC, and JL: investigation, data curation, formal analysis, and writing-review and editing; RC: formal analysis and writing-review and editing; ROB: conceptualization, supervision, formal analysis, and writing-review and editing; DFH: formal analysis and writing-review and editing; GEP: conceptualization, formal analysis, and writing-review and editing; TJV: formal analysis and writing-review and editing; GTC: conceptualization, supervision, formal analysis, and writing-review and editing.
All authors approve of the final version of the manuscript to be submitted for publication and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Funding There were no internal or external sources of funding for this study.

The datasets generated and analyzed during the current study are not publicly available to protect patient confidentiality. De-identified datasets can be made available upon request.

Conflict of interest The authors declare no potential conflicts of interest.

This was a retrospective study using prior biological specimens obtained for clinical purposes and chart review. After conferring with our Institutional Review Board, a waiver of informed consent was obtained.

Ethical approval The retrospective study was conducted with the approval of our Institutional Review Board. As no contact with patients was required to complete this study, a waiver of informed consent was obtained from our Institutional Review Board.
National Comprehensive Cancer Network Clinical Practice Guidelines in Oncology: Colon Cancer
Microsatellite instability and adjuvant chemotherapy in stage II colon cancer
High-risk stage II colon cancer: not all risks are created equal
FDA approval summary: pembrolizumab for the treatment of microsatellite instability-high solid tumors
Predictive biomarkers for checkpoint inhibitor-based immunotherapy
Relative contribution of clinicopathological variables, genomic markers, transcriptomic subtyping and microenvironment features for outcome prediction in stage II/III colorectal cancer
Fridman WH, Pages F (2006) Type, density, and location of immune cells within human colorectal tumors predict clinical outcome
Back to the future: routine morphological assessment of the tumour microenvironment is prognostic in stage II/III colon cancer in a large population-based study
Tumour-infiltrating T-cell subsets, molecular changes in colorectal cancer, and prognosis: cohort study and literature review
Prognostic value of tumor-infiltrating FoxP3+ regulatory T cells in cancers: a systematic review and meta-analysis
Clinical significance of programmed death 1 ligand-1 (CD274/PD-L1) and intra-tumoral CD8+ T-cell infiltration in stage II
Oncology M A Pathologists Guide to PD-L1 Testing in NSCLC
CD8/CD45RO T-cell infiltration in endoscopic biopsies of colorectal cancer predicts nodal metastasis and survival
Preoperative, biopsy-based assessment of the tumour microenvironment in patients with primary operable colorectal cancer
Heterogeneity of PD-L1 expression in lung mixed adenocarcinomas and adenosquamous carcinomas
Heterogeneity of PD-L1 expression in non-small cell lung cancer: Implications for specimen sampling in predicting treatment response
PD-L1 expression and intratumoral heterogeneity across breast cancer subtypes and stages: an assessment of 245 primary and 40 metastatic tumors
Heterogeneity of PD-L1 expression in primary tumors and paired lymph node metastases of triple negative breast cancer
Prognostic impact of FoxP3+ regulatory T cells in relation to CD8+ T lymphocyte density in human colon carcinomas
Clinical impact of tumor-infiltrating lymphocytes for survival in stage II colon cancer
Prognostic utility of immunoprofiling in colon cancer: results from a prospective, multicenter nodal ultrastaging trial
Intraepithelial effector (CD3+)/regulatory (FoxP3+) T-cell ratio predicts a clinical outcome of human colon carcinoma
Stromal PD-1/PD-L1 expression predicts outcome in colon cancer patients
Programmed cell death ligand 1 expression is an independent prognostic factor in colorectal cancer

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations