title: Microarray Analysis of Gene Expression of Cancer to Guide the Use of Chemotherapeutics
authors: Wang, Tzu-Hao; Chao, Angel
date: 2007-09-30
journal: Taiwanese Journal of Obstetrics and Gynecology
DOI: 10.1016/s1028-4559(08)60024-8

Summary
The beauty of microarray analysis of gene expression (MAGE) is that it can be used to discover genes that were previously thought to be unrelated to a physiologic or pathologic event. During the period from 1999 to 2007, applications of MAGE in cancer investigation shifted from molecular profiling to identifying previously undiscovered cancer types, predicting outcomes of cancer patients, revealing metastasis signatures of solid tumors, and finally guiding the use of therapeutics. The roles of cancer genomic signatures have evolved through three phases. In the first phase, genomic signatures were described in stored cancer specimens and dubbed molecular portraits of cancer. When gene expression profiles were carefully correlated with sufficient clinical information on cancer patients, new subgroups of cancers with distinct outcomes were revealed. In the second phase, validation of cancer signatures was emphasized and commonly performed with independent groups of cancer specimens or independent data sets. In the third phase, cancer genomic signatures have been further expanded beyond depicting the molecular portrait of cancer to predicting patient outcomes and guiding the use of cancer therapeutics. Cancer genomic signatures have become an essential part of a new generation of cancer clinical trials. It is advocated that, in future clinical trials of cancer therapy, the cancer specimens of each participant be tested for currently available predictive genomic signatures, so that the most effective treatment with the fewest adverse effects can be identified for each patient. Participants can then be triaged to an appropriate study group.
A DNA microarray is an orderly arrangement of DNA on a solid support, providing a medium for matching known and unknown DNA samples [1]. The types of DNA microarrays and relevant methodologies are reviewed by Chao et al [2]. In this article, we briefly review current advances in microarray analysis of gene expression (MAGE), focusing on recent reports of the MicroArray Quality Control (MAQC) project and the shift of MAGE usage from molecular cancer profiling to clinical cancer therapeutics. Gene expression profiling has generated, and continues to generate, extensive information on the molecular mechanisms of cellular function in particular tissues during physiologic or pathologic events. Microarray analysis is a high-throughput platform for gene expression profiling. The beauty of MAGE is that it usually reveals genes that were previously not linked to a given physiologic or pathologic event. For instance, we have gained insight into the host response to SARS infection [3], the tumor biology of various cervical cancer types [4], molecular mechanisms in paclitaxel treatment of ovarian cancer [5], and the intrinsic differences among mesenchymal stem cells derived from distinct origins [6]. When DNA microarrays were used to analyze similar tissues, the gene expression profiles obtained in different studies used to be notoriously variable [7], sometimes even conflicting. Possible causes of the discrepancy include different assay platforms using different sequences to represent a particular gene, non-uniform coverage of gene sets, distinct data-filtering strategies, varying statistical stringency, and data complexity and variability [8-10]. Which genes are identified as differentially expressed in a studied condition often depends on the criteria set by the investigators.
Therefore, concerns have been raised about the reliability of microarray results because of these varied and often conflicting reports [11, 12]. To address this concern, a collaborative effort led by the United States Food and Drug Administration, involving 137 scientists from 51 organizations representing academia, industry, and the US government, has completed the MAQC project [13-18]. In this project, identical specimens were aliquoted and distributed to participating laboratories for analysis on different microarray platforms, including those manufactured by Applied Biosystems, Affymetrix, Agilent, and GE Healthcare. To validate the quantitative capability of microarrays, microarray results were compared with real-time quantitative polymerase chain reaction (PCR). Affymetrix gene expression results and TaqMan real-time quantitative PCR results showed good linearity (r² = 0.95) [15]. Ranking genes by fold change after filtering with a p-value cutoff of < 0.05 has recently been shown to select signature gene lists reproducibly from results obtained on different microarray platforms [18]. These selection criteria have been shown to be more reproducible than ranking by t-test p value or by significance analysis of microarrays [18]. We have applied this method to select signature gene expression profiles with ease: after filtering at p < 0.05, we ranked genes by fold change and chose the top 25 upregulated genes in each group of mesenchymal stem cells derived from amniotic fluid, amniotic membrane, cord blood, and adult bone marrow [6]. Collectively, because of the remarkable improvement of microarray technology and the aforementioned critical evaluations, most microarray researchers now recognize that well-designed and carefully conducted microarray experiments yield reliable and consistent results.
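The selection rule described above (filter by p < 0.05, rank the survivors by fold change, keep the top 25 upregulated genes) can be sketched as follows. This is a minimal illustration, not the code used in [6] or [18]: the input format (a mapping from gene name to a precomputed fold change and p value), the function names, and the use of a ratio of mean linear intensities as the fold change are assumptions, and how the p values are computed is left to the analyst.

```python
from typing import Dict, List, Sequence, Tuple


def fold_change(group: Sequence[float], reference: Sequence[float]) -> float:
    """Ratio of mean expression in `group` to mean expression in `reference`.
    Assumes linear (not log-transformed) intensities."""
    mean_group = sum(group) / len(group)
    mean_ref = sum(reference) / len(reference)
    return mean_group / mean_ref


def select_signature(
    stats: Dict[str, Tuple[float, float]],
    p_cutoff: float = 0.05,
    top_n: int = 25,
) -> List[str]:
    """Return up to `top_n` upregulated genes ranked by fold change.

    `stats` maps a (hypothetical) gene name to (fold_change, p_value)."""
    # Step 1: keep only genes that pass the p-value filter and are upregulated.
    passing = [
        (gene, fc) for gene, (fc, p) in stats.items() if p < p_cutoff and fc > 1.0
    ]
    # Step 2: rank by fold change, largest first; ties broken by name for determinism.
    passing.sort(key=lambda item: (-item[1], item[0]))
    # Step 3: keep the top N genes.
    return [gene for gene, _ in passing[:top_n]]
```

For example, with `{"A": (3.0, 0.01), "B": (5.0, 0.20), "C": (2.0, 0.04), "D": (0.5, 0.001)}` and `top_n=2`, the function returns `["A", "C"]`: B has the largest change but fails the p-value filter, and D is significant but downregulated. Filtering first and ranking second is what distinguishes this rule from ranking by p value alone.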
Even 2 years before the MAQC project was published, the clinical and biologic findings derived from microarrays were regarded as "remarkably robust, with a high level of quantitative precision" [19, 20]. The recent MAQC results further demonstrate that microarray gene expression analysis is suitable on its own for quantitative comparison [17]. Nevertheless, we should not ignore potential flaws. The encouraging results of the MAQC project establish only that microarray technology is robust, not that it is foolproof. As a commentary in the November 17, 2006 issue of Cell put it, "You can learn to do PCR well in a month. But with microarrays, it can take years." [13]. To evaluate how MAGE can help make a diagnosis or choose a therapy, researchers use one set of patients to identify a gene expression pattern, called a genetic signature, that corresponds to a clinical endpoint such as 5-year survival, response to a treatment, or drug-induced side effects. The power of microarray technology lies in its ability to use changes in multiple genes as a pattern of gene expression rather than thresholds for individual markers [19]. The genetic signature is then validated on other groups of patients [13]. During this process, investigators must understand how to minimize expression noise and bias through effective study design. Expression noise can be defined as gene expression variation that does not correlate with the biology or behavior being studied; it is introduced by the technology itself and/or during tissue processing [21]. Bias is not inherent to microarray analysis but is easily introduced by faulty experimental design [21]. A series of sophisticated analytical strategies to address these problems has been discussed [19, 22, 23], as summarized in Table 1.
An unsupervised analysis does not use any a priori class definition; it simply seeks to determine what structure is inherent in the data [19]. A commonly used example of unsupervised analysis is hierarchical clustering, which lets the data define their own patterns by grouping genes with the most similar expression profiles [24]. A supervised analysis is more likely to reveal putative associations between genes and the cytogenetic class, but it may bias the outcome by forcing a model onto the data, i.e. the risk of "overfitting" [19]. To extract robust profiles from multiple data sets, a meta-analysis was performed on 40 independent data sets derived from more than 3,700 array experiments, identifying 36 cancer signatures that were activated in cancer relative to the normal tissue from which the cancer arose [25]. A meta-analysis of these signatures further identified 67 genes that were activated in 12 or more signatures, suggesting a common transcriptional program pervading most types of cancer [22]. In functional enrichment analysis, external functional information has been used to interpret and summarize large cancer signatures [22]. Databases of external functional information include Gene Ontology (www.geneontology.org) [26], the Kyoto Encyclopedia of Genes and Genomes (www.genome.jp/kegg/), Biocarta (www.biocarta.com), and GenMAPP (www.genmapp.org). Commercially maintained integrative databases and software include MetaCore by GeneGo (www.genego.com/metacore.php) and Ingenuity Systems (www.ingenuity.com). We recently used the MetaCore suite to analyze the signature profiles of mesenchymal stem cells of various origins and gained insight into the biologic processes of each group [6].
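The hierarchical clustering mentioned above can be illustrated with a small, pure-Python sketch of agglomerative clustering of gene profiles. It assumes average linkage with 1 - Pearson r as the distance, which is one common choice; the source does not specify the linkage or metric used in [24] or [34]. Instead of drawing a dendrogram, the hypothetical function `average_linkage` keeps merging the closest clusters until a requested number remains.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, FrozenSet, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles
    (raises ZeroDivisionError if either profile is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(
    profiles: Dict[str, Sequence[float]], n_clusters: int
) -> List[FrozenSet[str]]:
    """Agglomerative clustering of genes, merging until n_clusters remain."""
    names = list(profiles)
    # Distance between two genes: 1 - Pearson r (0 = same shape, 2 = opposite shape).
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters: List[FrozenSet[str]] = [frozenset([name]) for name in names]

    def linkage(c1: FrozenSet[str], c2: FrozenSet[str]) -> float:
        # Average linkage: mean of all gene-to-gene distances across the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

Because correlation distance ignores scale, a gene whose profile is twice another's falls into the same cluster; this is why correlation-based distances are popular for grouping co-regulated genes.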
The goal of transcriptional network analysis is to reduce a complex cancer signature to a small number of activated transcriptional programs that may shed light on neoplastic mechanisms and point to potential targets of therapeutic intervention [27]. In addition to the aforementioned functional enrichment analysis, in which many of the downstream effectors are transcription factors, chromatin immunoprecipitation coupled with promoter microarrays (ChIP-chip assays) allows genome-wide identification of transcription factor-binding sites [28, 29]. With hundreds of consensus transcription factor-binding sequences already defined by sequence-based methods, a large-scale integrative analysis of binding-site profiles and cancer signature expression profiles is feasible [22]. Analysis of expression modules, in which functional pathways are treated as gene modules, was proposed to extend the investigation of cancer gene expression from individual genes to biologic processes [23]. When this concept of higher-level modules was applied to examine the joint behavior of differentially expressed genes in diabetic muscle, a significant change in the whole set of genes was noted, even though the expression of individual genes was not significantly different [30]. Segal et al used this module-level analysis to obtain a global view of the shared and unique molecular modules underlying human cancer [31]. They demonstrated that activation or repression of some modules (e.g. cell cycle) was shared across multiple tumor types and could be regarded as a general feature of tumorigenesis, whereas others (e.g. growth-regulatory modules) were more specific to the tissue origin or progression of particular tumors [23].
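Why a module can shift as a whole even when no single gene does can be illustrated with a minimal gene-set score tested by permuting sample labels. This is a simplified stand-in, not the exact procedure of [30] or [31]: the score (the average, over the module's genes, of the difference in group means), the two-sided permutation p value, and all names and parameters are assumptions made for illustration.

```python
import random
from typing import Dict, Sequence


def set_score(
    expr: Dict[str, Sequence[float]], genes: Sequence[str], labels: Sequence[int]
) -> float:
    """Average over the module's genes of (mean in group 1 - mean in group 0)."""
    total = 0.0
    for gene in genes:
        values = expr[gene]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        total += sum(g1) / len(g1) - sum(g0) / len(g0)
    return total / len(genes)


def module_permutation_p(
    expr: Dict[str, Sequence[float]],
    genes: Sequence[str],
    labels: Sequence[int],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Two-sided permutation p value for the module score.

    Whole samples are shuffled between groups, which keeps the correlation
    between genes in the module intact."""
    rng = random.Random(seed)
    observed = abs(set_score(expr, genes, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(expr, genes, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Averaging over many genes means that small, consistent shifts accumulate in the module score while unrelated noise tends to cancel, which is the intuition behind the diabetic-muscle finding described above.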
Applications of MAGE in clinical cancer investigation have shifted from molecular profiling in 1999 [32, 33] to identifying previously undiscovered subgroups of particular types of cancer [34], predicting outcomes of cancer patients in 2002 [35, 36], revealing a metastasis signature of solid tumors [37], and guiding the use of therapeutics in 2006 [38], as summarized in Table 2. The use of MAGE to guide cancer therapeutics has also been examined by meta-analysis in diffuse large B-cell lymphoma [39]. In a leukemia data set of 38 bone marrow samples (27 acute lymphoblastic leukemia and 11 acute myeloblastic leukemia), Golub et al tested whether gene expression monitoring by DNA microarray could be used to assign tumors to known classes (class prediction) [32]. Using a supervised learning classification algorithm, Golub et al first constructed class predictors and then evaluated them by cross-validation on the same collection of specimens with known classes. Their results suggested a general strategy for discovering and predicting cancer classes, which later proved useful in predicting outcomes in patients with other tumor types [36]. As a proof of principle, Perou et al used cDNA microarrays to identify genes differentially expressed between in vitro cultured human mammary epithelial cells and breast tumor specimens [40]. Their results supported the feasibility and usefulness of this systematic approach to studying variation in gene expression patterns in human cancers as a means to dissect and classify solid tumors [40]. Subsequently, 64 surgical specimens of human breast tumors from 42 patients were analyzed for gene expression profiles [33]. The investigators identified a set of co-expressed genes whose variation in mRNA levels could be related to specific features of physiologic variation, and thus proposed molecular portraits of cancer based on gene expression profiles [33].
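Golub et al [32] assigned samples by weighted voting of informative genes. A minimal sketch of that idea: each gene's weight is the signal-to-noise statistic (mu1 - mu2)/(sigma1 + sigma2) computed on the training samples, each selected gene votes a_g(x_g - b_g), where b_g is the midpoint of the two class means, and the sign of the summed votes gives the class. Leave-one-out cross-validation mirrors evaluation on specimens with known classes. Selecting genes simply by the largest |SNR|, using sample standard deviations, and omitting the prediction-strength threshold of the original method are simplifications; the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

# One entry per selected gene: (gene index, weight a_g, decision boundary b_g).
Model = List[Tuple[int, float, float]]


def train(samples: List[Sequence[float]], labels: List[int], k: int) -> Model:
    """Select the k genes with the largest |signal-to-noise| and store their votes.
    labels: 1 for one class (e.g. ALL), 0 for the other (e.g. AML)."""
    model = []
    for g in range(len(samples[0])):
        c1 = [s[g] for s, lab in zip(samples, labels) if lab == 1]
        c0 = [s[g] for s, lab in zip(samples, labels) if lab == 0]
        # Signal-to-noise (mu1 - mu2) / (sigma1 + sigma2); needs >= 2 samples per class.
        snr = (mean(c1) - mean(c0)) / (stdev(c1) + stdev(c0))
        model.append((g, snr, (mean(c1) + mean(c0)) / 2))
    model.sort(key=lambda entry: -abs(entry[1]))
    return model[:k]


def predict(model: Model, sample: Sequence[float]) -> int:
    """Weighted vote: each gene votes a_g * (x_g - b_g); the sign of the sum decides."""
    total = sum(a * (sample[g] - b) for g, a, b in model)
    return 1 if total > 0 else 0


def loocv_accuracy(samples: List[Sequence[float]], labels: List[int], k: int) -> float:
    """Leave-one-out cross-validation; informative genes are re-selected in every
    fold so the held-out sample never influences gene selection."""
    correct = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], k)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)
```

Re-selecting genes inside each fold matters: selecting them once on all samples and then cross-validating would leak information from the held-out sample and inflate the apparent accuracy.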
Diffuse large B-cell lymphoma (DLBCL) is a disease in which further subgrouping by histology is difficult because of inter- and intra-pathologist irreproducibility [41]. Using hierarchical clustering of MAGE profiles, 40 DLBCL specimens could be divided into two distinct groups: 19 cases of germinal center B-like DLBCL and 21 cases of activated B-like DLBCL [34]. Patients with germinal center B-like DLBCL had significantly better overall survival than those with activated B-like DLBCL. The molecular classification of tumors based on gene expression profiles thus proved able to identify previously undetected and clinically significant subtypes of cancer [34]. Even at the same stage of disease, breast cancer is notorious for its unpredictable response to chemotherapy and variable overall outcome. Chemotherapy [42] or hormonal therapy [43] reduces the risk of distant metastasis by about one third. However, because there is no accurate triage strategy for determining which patients should or should not undergo adjuvant therapy, many patients who would never have developed metastasis have undergone unnecessary adjuvant therapy. To develop patient-tailored therapy strategies for breast cancer, van't Veer et al used supervised classification to analyze DNA microarray data on primary breast tumors of 117 young patients [35]. They identified a poor-prognosis signature that predicted a short interval to distant metastasis. These results therefore provide a strategy for selecting patients who would benefit from adjuvant therapy. In search of a molecular signature of metastasis, Ramaswamy et al compared gene expression patterns between primary tumors and metastases [37]. They identified a gene expression signature that distinguished primary from metastatic adenocarcinomas.
However, the authors found that a subset of primary tumors resembled metastatic tumors, and they confirmed this finding by applying the expression signature to data on 279 primary solid tumors of diverse types. Notably, Ramaswamy et al analyzed whole tumor tissues, including surrounding stromal cells, rather than pure cancer cell populations that could be isolated by laser capture microdissection. In the 17-gene metastasis signature identified in that study, two collagen genes and one lamin gene were upregulated, suggesting that malignancy is a product of the tumor-host microenvironment [44]. Two articles have addressed the prognostic usefulness of gene expression profiles in acute myeloid leukemia (AML) [19, 20]. These studies also demonstrated the ability of microarray technology to use somewhat imprecise patterns of gene expression rather than exact thresholds for individual markers [19]. Currently, many prestigious journals, such as Science and the Nature journals, require authors, as a prerequisite for publishing microarray-derived research articles, to deposit their microarray data in a form compliant with Minimum Information About a Microarray Experiment (MIAME) in one of two public repositories: Gene Expression Omnibus at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress at the European Bioinformatics Institute (http://www.ebi.ac.uk/arrayexpress). As a result, many large-scale microarray data sets have become available for re-analysis by researchers worldwide. For instance, multiple important papers on breast cancer gene expression profiles [35, 48-50] have been derived from the same comprehensive microarray data set with sufficient clinical information [51]. Using several microarray data sets [34, 36, 52], Lossos et al performed a meta-analysis of DLBCL and selected 36 genes for further analysis by real-time quantitative PCR on independent lymphoma samples from 66 patients [39].
The six genes with the strongest predictive power were LMO2, BCL6, FN1, CCND2, SCYA3, and BCL2. After testing these genes in two additional independent microarray data sets, Lossos et al concluded that measuring the expression of these six genes is sufficient to predict overall survival in DLBCL [39]. The use of microarray gene expression profiles to predict outcomes of cancer patients has been validated in the aforementioned studies. However, even when the same group of samples was analyzed, distinct prognostic profiles have been derived for outcome prediction [35, 48-51]. To resolve this paradox, Fan et al compared the predictive power of the gene sets on the same group of specimens by applying five gene expression-based models: intrinsic subtypes, the 70-gene profile, wound response, recurrence score, and the two-gene ratio [53]. By performing Kaplan-Meier survival analyses of 295 patients with breast cancer, Fan et al found that four of the five models showed significant agreement in outcome prediction for individual patients. The only exception was the two-gene ratio model, which did not yield a reliable prediction [53]. A likely explanation for the surprising concordance among the four models is that a large group of genes behaved differently and was related to the biologic phenotypes of cancer and, ultimately, to the patients' outcomes; each of the four models might have used only some of these genes to construct the signature profiles used for predicting outcomes. To dissect oncogenic pathway signatures in human cancer, Bild et al used adenoviral vectors to express various oncogenic activities, such as Myc, Ras, E2F3, Src, and β-catenin, in otherwise quiescent cells.
Applying this strategy, they were able to isolate specifically the downstream events defined by activation or deregulation of that single oncogenic pathway and to analyze these events with Affymetrix oligonucleotide microarrays [54]. In clinical samples of lung, breast, and ovarian cancers, Bild et al combined signature-based predictions across several pathways and identified coordinated patterns of pathway deregulation that distinguish between specific cancers and tumor subtypes [54]. Using pathway-specific inhibitors, namely the Ras pathway inhibitors farnesyltransferase inhibitor (L-744832) and farnesylthiosalicylic acid, and the Src pathway inhibitor SU6656, Bild et al could predict the drug sensitivity of tested breast cancer cell lines according to their pathway deregulation. In summary, predictions of pathway deregulation in cancer cells can also predict sensitivity to therapeutic agents that target components of the pathway. These results pave the way for using oncogenic pathway signatures to guide the use of targeted therapeutics. Because of the enormous complexity of cancer and a frequent inability to properly guide the use of available therapeutics, chemotherapy for solid tumors often achieves only marginal success. Most people with advanced solid tumors will relapse and die of their disease [55]. Furthermore, oncologists have always faced the challenge of matching the right therapeutic regimen with the right individual, balancing relative benefit against risk to achieve the best outcome.
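Survival comparisons between signature-defined groups, such as the DLBCL subgroups [34] or the breast cancer predictors Fan et al compared by Kaplan-Meier analysis [53], rest on the Kaplan-Meier (product-limit) estimator. A minimal sketch follows, assuming each patient contributes a follow-up time and an event flag (1 = event observed, such as death or distant metastasis; 0 = censored). To compare two groups, one would estimate a curve per group; a formal test such as the log-rank test is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time at which at least one event occurred.

    events[i] = 1 if the event was observed at times[i], 0 if censored there.
    Patients censored at time t are still counted as at risk at t (usual convention)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve
```

For follow-up times [1, 2, 2, 3, 4, 5] with events [1, 1, 0, 1, 0, 1], the estimated survival drops to 5/6, 2/3, 4/9, and finally 0 at times 1, 2, 3, and 5; the patient censored at time 4 shrinks the risk set without lowering the curve.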
With the goal of using genomic signatures to guide the use of chemotherapeutics, Potti et al systematically extracted gene expression profiles from the following microarray data sets: the NCI-60 cancer cell lines [56], an additional 30 cancer cell lines [57], the authors' previously reported breast, ovarian, and lung cancer specimens [54], and their newly analyzed 13 ovarian cancer cell lines and 119 advanced (FIGO stage III or IV) serous epithelial ovarian cancers [38]. Potti et al demonstrated patterns of predicted sensitivity of three human solid cancers (breast, lung, ovary) to seven common chemotherapeutic drugs (5-fluorouracil, paclitaxel, docetaxel, adriamycin, topotecan, cyclophosphamide, etoposide). To evaluate how individual signatures predict response to a combination of drugs, Potti et al also analyzed 51 breast cancer patients in a neoadjuvant treatment study that used a combination of paclitaxel, 5-fluorouracil, adriamycin, and cyclophosphamide (TFAC). The predicted response, based on a combined probability of sensitivity built from the individual chemosensitivity predictions, distinguished responders from nonresponders with statistical significance [38]. As summarized in the Figure, the roles of cancer genomic signatures have evolved through three phases. In the first phase, genomic signatures were described in stored cancer specimens and dubbed molecular portraits of cancer [33]. When gene expression profiles were carefully correlated with sufficient clinical information on cancer patients, new subgroups of cancers with distinct outcomes were revealed [34]. In the second phase, validation of cancer signatures was emphasized and commonly performed with independent groups of cancer specimens or independent data sets [22, 25, 35, 39]. In the third phase, cancer genomic signatures have been expanded beyond depicting the molecular portrait of cancer to predicting patient outcomes [45, 46], including metastasis [37].
In the third phase, it has become the rule that all prognostic genomic signatures be validated in additional data sets. Potti et al further demonstrated the role of cancer genomic signatures as a guide to the use of cancer therapeutics [38]. A new generation of cancer clinical trials was proposed, in which the cancer specimens of each participant are tested for currently available predictive genomic signatures so that the most effective treatment with the fewest adverse effects can be identified for each patient. Participants can then be triaged to an appropriate study group (Figure). Indeed, successful examples have been reported in patients with early-stage non-small cell lung cancer by Potti et al [59] and in patients with advanced-stage ovarian cancer by Dressman et al [60]. A common argument against microarray results is that a transcriptome does not necessarily reflect the corresponding proteome, the collection of proteins that executes most cellular functions. Indeed, mRNA expression is only a coarse surrogate for protein activation levels. For many genes, however, mRNA expression is a useful surrogate [23]. As documented in many studies, strong signals of differential expression are typically reflected later at the protein level, which can be validated by protein assays such as enzyme-linked immunosorbent assays [5] or immunohistochemistry [4]. At the current pace of development of genomic technology, the clinical trend towards personalized medicine seems almost certain (Figure). Many biomedical researchers and clinicians predict that microarray technology will be incorporated into hospital clinical laboratories in the near future.
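The text does not state how Potti et al combined the individual drug predictions for TFAC, nor how trial participants would be triaged. Purely as a hedged illustration, the sketch below assumes that the per-drug probabilities of sensitivity (of the kind produced by signatures such as those in [38]) are independent, multiplies them to obtain a regimen-level probability, and assigns the participant to the study arm with the highest value above a hypothetical threshold. The drug probabilities, arm names, and the 0.5 threshold are made up; `math.prod` requires Python 3.8 or later.

```python
from math import prod
from typing import Dict, Optional, Sequence


def combined_sensitivity(drug_probs: Dict[str, float], regimen: Sequence[str]) -> float:
    """Probability that a tumor is sensitive to every drug in `regimen`,
    ASSUMING the individual predictions are independent (product rule)."""
    return prod(drug_probs[drug] for drug in regimen)


def triage(
    drug_probs: Dict[str, float],
    regimens: Dict[str, Sequence[str]],
    threshold: float = 0.5,  # hypothetical cut-off
) -> Optional[str]:
    """Assign the participant to the arm whose regimen has the highest combined
    predicted sensitivity, or return None if no arm reaches `threshold`."""
    scores = {arm: combined_sensitivity(drug_probs, drugs) for arm, drugs in regimens.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

With made-up probabilities of 0.9 (paclitaxel), 0.8 (5-fluorouracil), 0.9 (adriamycin), and 0.95 (cyclophosphamide), the combined TFAC probability is 0.9 × 0.8 × 0.9 × 0.95 = 0.6156. The independence assumption is the main weakness of this sketch, since sensitivities to drugs sharing a mechanism are likely correlated.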
As extrapolated from the results of the studies discussed in this article, a "focused array" measuring the expression of the 50 to 100 genes in the signature profile for a selected disease would help clinicians predict a patient's response to a drug, triage patients into chemo-responsive and chemo-resistant groups, and evaluate a panel of risk factors that may result in comorbidity. To become practical, microarray technology would need to address a range of quality-control issues. Every aspect of the process should be so robust that it can be considered foolproof. It is predicted that microarray developers will meet these challenges within the next 10 years [13].

Figure. Three phases of genomic signatures in cancer therapy and a new generation of clinical trials. This schema is adapted from Herbst and Lippman [58] and Potti et al [38].

Table 2
Milestone | Investigators
Molecular classification of cancer using supervised machine learning | Golub et al
Molecular profiling of breast cancer | Perou et al
Identification of subgroups of diffuse large B-cell lymphoma with different outcomes | Alizadeh et al
Prediction of clinical outcomes of breast cancer | van't Veer et al
Identification of a metastasis signature that reflected contributions of both the tumor and the host environment | Ramaswamy et al [37]
2004
Identification of prognostic profiles of adult acute myeloid leukemia | Bullinger et al
Using independent samples of lymphoma to test a meta-analysis-derived signature | Lossos et al

References
[1] Establishment of cDNA microarray analysis at the Genomic Medicine Research Core Laboratory (GMRCL) of Chang Gung Memorial Hospital.
[2] Overview of microarray analysis of gene expression and its applications to cervical cancer investigation.
[3] Molecular signature of clinical severity in recovering patients with severe acute respiratory syndrome coronavirus (SARS-CoV).
[4] Molecular characterization of adenocarcinoma and squamous carcinoma of the uterine cervix using microarray analysis of gene expression.
[5] Paclitaxel (Taxol) upregulates expression of functional interleukin-6 in human ovarian cancer cells through multiple signaling pathways.
[6] Functional network analysis on the transcriptomes of mesenchymal stem cells derived from amniotic fluid, amniotic membrane, cord blood, and bone marrow.
[7] Evaluation of gene expression measurements from commercial microarray platforms.
[8] Comprehensive comparison of six microarray technologies.
[9] Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements.
[10] A study of interlab and inter-platform agreement of DNA microarray data.
[11] Comment on "'Stemness': transcriptional profiling of embryonic and adult stem cells" and "a stem cell molecular signature".
[12] Getting the noise out of gene arrays.
[13] Arrays of hope.
[14] Evaluation of external RNA controls for the assessment of microarray performance.
[15] Evaluation of DNA microarray results with quantitative gene expression platforms.
[16] Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project.
[17] MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.
[18] Rat toxicogenomic study reveals analytical consistency across microarray platforms.
[19] Microarrays and clinical investigations.
[20] Gene-expression profiling in acute myeloid leukemia.
[21] Noise and bias in microarray analysis of tumor specimens.
[22] Integrative analysis of the cancer transcriptome.
[23] From signatures to models: understanding cancer using microarrays.
[24] Cluster analysis and display of genome-wide expression patterns.
[25] Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression.
[26] Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource.
[27] Mining for regulatory programs in the cancer transcriptome.
[28] Isolating human transcription factor targets by coupling chromatin immunoprecipitation and CpG island microarray analysis.
[29] Control of pancreas and liver gene expression by HNF transcription factors.
[30] PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes.
[31] A module map showing conditional activity of expression modules in cancer.
[32] Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
[33] Molecular portraits of human breast tumours.
[34] Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.
[35] Gene expression profiling predicts clinical outcome of breast cancer.
[36] Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning.
[37] A molecular signature of metastasis in primary solid tumors.
[38] Genomic signatures to guide the use of chemotherapeutics.
[39] Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes.
[40] Distinctive gene expression patterns in human mammary epithelial cells and breast cancers.
[41] A clinical evaluation of the International Lymphoma Study Group classification of non-Hodgkin's lymphoma. The Non-Hodgkin's Lymphoma Classification Project.
[42] Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group.
[43] Tamoxifen for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group.
[44] Cancer's deadly signature.
[45] Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia.
[46] Prognostically useful gene-expression profiles in acute myeloid leukemia.
[47] Prediction of cancer outcome with microarrays: a multiple random validation strategy.
[48] Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications.
[49] A gene-expression signature as a predictor of survival in breast cancer.
[50] scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival.
[51] Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds.
[52] Lymphoma/Leukemia Molecular Profiling Project. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma.
[53] Concordance among gene-expression-based predictors for breast cancer.
[54] Oncogenic pathway signatures in human cancers as a guide to targeted therapies.
[55] Twenty-two years of phase III trials for patients with advanced non-small-cell lung cancer: sobering results.
[56] Chemosensitivity prediction by transcriptional profiling.
[57] Gene expression profiling of 30 cancer cell lines predicts resistance towards 11 anticancer drugs at clinically achieved concentrations.
[58] Molecular signatures of lung cancer-toward personalized therapy.
[59] A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer.
[60] An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer.

The authors would like to thank Dr Yun-Shien Lee