key: cord-0001755-4h5u4b4m authors: Mias, George I.; Snyder, Michael title: Personal genomes, quantitative dynamic omics and personalized medicine date: 2013-01-15 journal: Quantitative Biology DOI: 10.1007/s40484-013-0005-3 sha: b67376db380379b48690b43ab52ae56c4ec3cfc3 doc_id: 1755 cord_uid: 4h5u4b4m The rapid technological developments following the Human Genome Project have made possible the availability of personalized genomes. As the focus now shifts from characterizing genomes to making personalized disease associations, in combination with the availability of other omics technologies, the next big push will be not only to obtain a personalized genome, but to quantitatively follow other omics. This will include transcriptomes, proteomes, metabolomes, antibodyomes, and new emerging technologies, enabling the profiling of thousands of molecular components in individuals. Furthermore, omics profiling performed longitudinally can probe the temporal patterns associated with both molecular changes and associated physiological health and disease states. Such data necessitates the development of computational methodology to not only handle and descriptively assess such data, but also construct quantitative biological models. Here we describe the availability of personal genomes and developing omics technologies that can be brought together for personalized implementations and how these novel integrated approaches may effectively provide a precise personalized medicine that focuses on not only characterization and treatment but ultimately the prevention of disease. With the advent of high-throughput technologies genomic science has experienced great leaps, rapidly expanding its domain beyond the characterization of short genomic reads in the early days of sequencing to the possibility of obtaining personalized genomes, once considered the holy grail of genomic methodology and technology development. 
The value of personalized genomic analysis, and of evaluating variant associations to disease, is becoming more apparent, even spurring direct-to-consumer implementations. Further developments in the last few years now lead to a more ambitious goal: the longitudinal monitoring of multiple omics components in individuals and the characterization of the molecular changes associated with disease onset in individuals, at an unprecedented level. In this review we describe technological and methodological developments in personal genomics, and the new promise of multiple omics profiling, including transcriptomes, proteomes, metabolomes, autoantibodyomes and so forth (sample omics analysis workflows are shown in Figures 1-4). We then discuss a framework for how such data may be integrated with a view towards the application of a personalized, precise and preventive medicine, and describe an implementation of this approach. The technological developments and methodology allow for inroads into the future of quantitative personal medicine, which we can now plan carefully by taking into account not only the scientific developments that need to be implemented, but also the social implications coupled to ethical and legal considerations.

In 2001 the completion of the Human Genome Project (HGP) was effectively announced with the publication of the first complete human genome sequence. The HGP came at a hefty $2.7 billion cost using the best technology of the time, making it seem prohibitive to expect personal genome sequences shortly thereafter. Yet the immense technological advancement, spurred by the motivation of the National Institutes of Health (NIH) and the National Human Genome Research Institute (NHGRI) to bring down genomic costs, led to an unprecedented growth in technology and methodology, enabling the drop in sequencing costs (http://www.genome.gov/sequencingcosts) to continue at a rate beyond the most optimistic projections of 2001 (< $4000 currently). While initially the human genome was a combination of multiple individual genomic data [1] [2] [3], the developments by 2008 had allowed the determination of individual genomic makeup [4] [5] [6] [7]. It is now possible to personalize Whole Genome Sequencing (WGS), and the dwindling sequencing costs promise affordability for all in the near future [8]. These developments encouraged efforts to characterize disease on a genomic level, towards the application of an all-encompassing genomic medicine at the molecular level. The initial goals were the characterization of populations for large studies, now shifting to the individual.

The HGP relied on technology using Sanger-based capillary sequencing [1] with an estimated production of 115 kbp per day [9]. The NHGRI spurred progress through the $1000 genome program (http://www.genome.gov/11008124-al-4), leading to the industry development of multiple massively parallel [10] sequencing platforms (e.g., Roche/454, based on pyrosequencing [11] [12] [13]; Life Technologies SOLiD [14] [15] [16]; Illumina [5, 6]; Complete Genomics, based on DNA nanoball sequencing [17]; Helicos Biosciences [18]; and recently single molecule real-time technology [19, 20] by Pacific Biosciences). These next generation sequencing platforms are now being supplemented by what has been termed third-generation sequencing [21], including nanopore technologies such as those announced early in 2012 by Oxford Nanopore Technologies [22]. The technological developments and competition resulted in a drastic and continuing drop in sequencing cost and processing times, and exponential increases in the number of reads produced. An alternative to sequencing the whole genome has been whole exome sequencing (WES) [23].
This technology aims to study the exonic regions of the genome (~2%-3%), which are associated with several Mendelian disorders. It offers a lower cost option (e.g., Illumina, Agilent, and NimbleGen platforms; see Clark et al. for a comparison of the latter two [24]) and has received immense attention, including the Exome Sequencing Project (ESP) (see the Exome Variant Server at http://evs.gs.washington.edu/EVS/), supported by the National Heart, Lung and Blood Institute (NHLBI).

Concurrently with the technological developments, our understanding of the human genome has grown immensely since the publication of the reference genome in 2003. The aim was to determine the precise role of each base in the genome and identify genomic variants (Figure 1). Several collaborative large-scale efforts pursued such investigations. The International HapMap Consortium [25, 26] tried to identify common population variants and led to the development of public databases, such as dbSNP [27] (http://www.ncbi.nlm.nih.gov/SNP/), which catalogues Single Nucleotide Polymorphisms (SNPs), defined as occurring in >1% of the population to differentiate them from Single Nucleotide Variants (SNVs). This has revealed great genomic variation both in global populations [28, 29] and populations of admixed ancestry [30] [31] [32] [33]. Typically the technologies involve the assignment of reads to the reference genome to determine the structure of the underlying sequence, including variation (Figure 1). Beyond nucleotide variation, other genomic differences have been investigated, including small insertions and deletions (indels), copy number variations (CNVs) indicating varying numbers of segments, and longer chromosomal segments that contribute to Structural Variation (SVs); SVs are defined for segments of chromosomes larger than 1000 bp (Figure 1A).
Such efforts have been based on microarray methodology [34] [35] [36] [37] and even higher resolution in structural variants may be achieved with other methods [38] [39] [40] [41]. Structural variants have been made publicly available in the database of Genomic Structural Variation (dbVAR; http://www.ncbi.nlm.nih.gov/dbvar/). Furthermore, functional elements have been extensively catalogued by the Encyclopedia of DNA Elements consortium (ENCODE; http://genome.gov/encode~10 production projects), with funding from the NHGRI. ENCODE data, including regulatory elements and RNA and protein level elements, have now been released and the project has received widespread attention [42] [43] [44] [45]. The ENCODE project aims at a biochemical genomic characterization, with a thorough mapping of transcribed regions, transcription factor binding sites, open chromatin signatures, chromatin modification and DNA methylation. Such extensive data still need to be annotated [46] and interpreted in terms of biological significance, mechanisms and connections to phenotype, and will likely prove invaluable in our interpretation of personalized genomic differences. Though initially limited by the number of complete genomic sequences, such data are now continuously updated and expanded by information from other projects such as the 1000 Genomes Project [47] as discussed

Figure 1. Genomic variants. (A) Variation in the human genome. The personal genomic code can differ from the published reference genome. Basic examples of variation are shown for single or few base variants (e.g., point mutations, insertions and deletions), or on a larger scale for structural variants (>1000 bp, e.g., large insertions, deletions, inversions, tandem repeats, translocations). (B) Sample variant analysis workflow.
In a genomic variant analysis, for example, after sample preparation and sequencing the raw files can be passed through quality control (e.g., using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and removing PCR artifacts using tools such as Picard (http://picard.sourceforge.net)). Reads are mapped to the genome and variants are assessed, e.g., mapping with one of several algorithms, including ELAND II (Illumina), SOAP [221], MAQ, Burrows-Wheeler Aligner (BWA) [222] and Novoalign by Novocraft Technologies (http://www.novocraft.com). Read re-alignment can be performed, e.g., using the Genome Analysis Toolkit (GATK) [223] or HugeSeq [211], to call variants, including implementations with Sequence Alignment Map format Tools (SAMtools) [224], followed by annotation using Annovar [225], and SIFT [226] and Polyphen [227] for determining variant effects on proteomic translation [228]. Furthermore, the structural variants can be determined using a variety of methods. For example, the paired-end mapping method considers how paired-end reads map to the reference to assign deletions and insertions, from reads whose mapped span is longer or shorter than the average span, and inversions, from the position and relative orientations of the ends of reads [39, 40]. The read depth method makes it possible to identify proportional genomic copy number variation. In the approach of Abyzov et al. [229] the read depth, considered as an image, is analyzed using image processing techniques, viz. mean-shift theory [230]. Programs such as Pindel [231] and BreakSeq [232] use split-read analysis to determine breakpoints of insertions and deletions. DELLY [233] by Rausch et al. takes into account both paired-end and split-read methods for determining structural variants. Many packages for analysis are available through the Bioconductor [234] project as implemented in the freely available R statistical analysis platform (http://www.R-project.org).
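The paired-end mapping idea in the caption above can be illustrated with a minimal Python sketch. It is not the method of any cited tool: the record layout `(span, strand1, strand2)`, the function name `classify_read_pairs`, the use of the median and median absolute deviation in place of the average span, and the cutoff `k` are all assumptions made for illustration. Published callers additionally cluster supporting pairs and model mapping quality.

```python
from statistics import median

def classify_read_pairs(pairs, k=5.0):
    """Label read pairs as concordant, deletion, insertion or inversion.

    pairs: list of (mapped_span_bp, strand_first_end, strand_second_end).
    This simplified record is hypothetical; real alignments carry more detail.
    """
    # Estimate the typical span from properly oriented pairs only.
    # Median and MAD are used (an assumption) because they resist outliers.
    spans = [span for span, s1, s2 in pairs if s1 != s2]
    center = median(spans)
    mad = median([abs(s - center) for s in spans])
    tolerance = k * 1.4826 * mad  # 1.4826 scales MAD to a normal-equivalent SD

    calls = []
    for span, s1, s2 in pairs:
        if s1 == s2:
            # Both ends on the same strand suggests an inverted segment.
            calls.append("inversion")
        elif span > center + tolerance:
            # Ends map farther apart on the reference than expected:
            # the sample is missing sequence present in the reference.
            calls.append("deletion")
        elif span < center - tolerance:
            # Ends map closer together than expected:
            # the sample carries sequence absent from the reference.
            calls.append("insertion")
        else:
            calls.append("concordant")
    return calls

example_pairs = [
    (300, "+", "-"), (310, "+", "-"), (290, "+", "-"),
    (305, "+", "-"), (295, "+", "-"),
    (2300, "+", "-"),  # span much longer than typical
    (40, "+", "-"),    # span much shorter than typical
    (300, "+", "+"),   # unexpected relative orientation
]
example_calls = classify_read_pairs(example_pairs)
```

A single discordant pair is weak evidence on its own; in practice a candidate variant would be reported only when several independent pairs support the same breakpoints.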
below, which has allowed us to have a better view of the great variability in each individual genome (~3-4 × 10^6 SNPs, >200000 SVs of varying sizes, ~1500 SVs >2 kbp), with much of the variation considered rare (1%-5%). Genome-Wide Association Studies (GWAS) try to associate the common variants to disease, by combining the now readily available extensive variant information and allelic variability with linkage disequilibrium (a description of the correlation patterns between proximal variants). The NHGRI provides a publicly available catalogue of published GWAS (http://www.genome.gov/gwastudies) [48]. Meeting the early expectations of finding common traits and genomic features unique to diseases has proven more complicated, as the genomic variability turns out to be higher than expected and additionally the genetic variants need further validation. Use of WGS and WES has been successful in the identification of somatic mutations. Mendelian disorders, including neurological disorders, and cancer have been characterized using WES [49] [50] [51] [52] [53] [54] [55] [56] [57] [58], including some recent single-cell studies [59, 60]. Genomics may help classify cancer subtypes and guide possible treatment, and such research is at the center of WGS, with projects such as the Cancer Genome Atlas [61] (http://cancergenome.nih.gov/) and the International Cancer Genome Consortium (http://www.icgc.org). Additionally, cancer specific public databases are already available [62], including a cancer cell line encyclopedia [63], and genome characterization has been carried out, for example in ovarian cancer [61], melanoma [64], lymphocytic leukemia [65], breast cancer [66] [67] [68] [69] and acute myeloid leukemia (AML) [70, 71]. One of the goals of personalized genome interpretation is the evaluation of disease risk factors based on an individual's variant and allelic distribution composition.
Such information may be compared to similar individuals with known disease associations to assess whether an individual shows increased or decreased risk compared to the control group. A combination of known SNPs and personalized variants has been found to be effective [72] [73] [74] [75] and has been used in clinical studies; more recently, a seminal study by Ashley et al. [76] evaluated disease risk for a patient with a family history of vascular disease. Personalized evaluation of potential drug responses can be based on the effects of variants [77, 78], including drug selection, sensitivity and dosage estimation, e.g., for cardiovascular drugs [79] and schizophrenia related medications [80]. For example, PharmGKB (http://www.pharmgkb.org) provides a curated database of pharmacogenomics information [81, 82], exploring the impact of genomic variation on drug responses as these relate to expressed genes and associated pathways and disorders. Future applications are to include a precise drug dosage for an individual, avoiding trial and error methods and providing more effective treatment. The evaluation of personalized risk based on genomes is now appearing in direct-to-consumer services. Companies like 23andMe and deCODEme (and previously Navigenics) offer to assess individual genotypes and provide disease-based interpretation services based on Mendelian disorder evaluation, including pharmacogenomic responses. These are mostly based on SNP evaluation, and the tests, though limited in scope, do offer interpretation attractive to many consumers.

Presently thousands of genomes have been completely sequenced. One of the first large scale projects has been the 1000 Genomes Project [47], which has made its data publicly available and has encouraged the development of streamlined bioinformatics tools to analyze the variation in the individual genomes (Figure 1). This project aims to combine data from 2500 individuals from multiple populations, at a 4× coverage.
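As a brief aside on the coverage figure quoted above, mean sequencing coverage is commonly estimated as (number of reads × read length) / genome size. The sketch below works through that arithmetic in Python. The genome size of 3.1 Gbp and the read length of 100 bp are illustrative assumptions, not values from the text, and the uncovered-fraction estimate assumes reads land uniformly at random (the Lander-Waterman idealization), which real libraries only approximate.

```python
import math

def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of reads covering each base: C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

def fraction_uncovered(coverage):
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)

# Illustrative values only (assumptions, not taken from the review).
GENOME_SIZE_BP = 3_100_000_000
READ_LENGTH_BP = 100

n_reads_4x = reads_needed(4, READ_LENGTH_BP, GENOME_SIZE_BP)
coverage_check = mean_coverage(n_reads_4x, READ_LENGTH_BP, GENOME_SIZE_BP)
missed_at_4x = fraction_uncovered(4)
```

Under these assumptions, low coverage per individual leaves a small fraction of bases unobserved in any one genome, which is one reason low-coverage designs rely on combining evidence across many individuals.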
Another grand scale effort, driven by George Church's group at Harvard University, is the Personal Genome Project (PGP) [83] [84] [85]. The project has been recruiting individuals who can share their medical and other information together with genomic information online (http://www.personalgenomes.org). The volunteers share full DNA sequences and RNA and protein profile information, in addition to extensive phenotype information including medical records and environmental considerations, with all the data made publicly available, and with plans to expand to 100000 individuals [86]. One of the rather unique features of the PGP is that it differs from traditional studies in how participants consent. The data are to be open and publicly available without restrictions, not only for the initial scope of the study, but also for follow-up or additional investigations. The scope is participatory, with the volunteers for the project interacting directly with the researchers. To address informed consent, participants pass a basic genetic literacy exam and must understand the project's scope. Additionally, they provide complete medical history, immunization and medications history, which becomes part of the publicly available subject information. The access to the individual's data in the project can be either private to the participant and researchers only or completely public, depending on the participant's choice. The availability of extensive patient and omic information will be invaluable to researchers in developing robust analysis models for characterizing genomes and disease, and the PGP, with its publicly open structure, will be at the forefront of such efforts.

Though the genetic code in DNA is almost identical across the cells of an individual (besides cellular variation), different cells have different gene expression, corresponding to the kind of cell, developmental stage and physiological state.
The collection of the transcripts in a cell (e.g., mRNA, non-coding RNA and small RNAs), the transcriptome, is essential in our understanding of cell function, and response to disease. Considerations must include start and end sites of genes, and coding, alternative splicing and post-transcriptional modifications. Initially inroads were made using high-density oligo microarrays, and in-house custom made microarrays [87] , with high-density arrays having resolutions up to 100 bp [88] [89] [90] [91] . While relatively inexpensive, these methods suffered from relying on prior knowledge of the genome, and faced technical issues such as background and saturation effects [92] . Hybridization interactions between probe sets in short oligo microarrays lead to spurious correlations [92, 93] . The development of RNA sequencing (RNA-Seq) brought higher coverage, better precision and quantitation, and higher resolution and sensitivity, bringing RNA-Seq technology and transcriptomics on par with genomic sequencing [94] [95] [96] [97] [98] . RNA-Seq considers reads that correspond to millions of transcriptomic fragments that are mapped to the reference genome, to provide information on transcripts that may not be in the existing genomic annotation, allowing the search for novel transcripts, and even identification of SNPs and other variants, while showing remarkable reproducibility ( Figure 2 ). Transcriptome profiling has included looking at cancers [99] [100] [101] , including breast cancer [102] , gastrointestinal tumors [103] and prostate cancer [104] . Gene expression was expected to correlate with protein levels in a cell and it was thought that methods such as RNA-Seq would be enough to ascertain the proteomic expression corresponding to gene expression. 
Proteins are expected to be closer to phenotype, as they participate in every aspect of cellular biology, but their expression levels are difficult to quantitate, partly because of translational control in cells, possible degradation and sampling issues [105] [106] [107]. The development of electrospray ionization brought mass spectrometry (MS) to the field of proteomics and the possible identification of thousands of molecules based on mass [108] [109] [110] [111] [112]. This has enabled not only the cataloguing of proteins, but also querying post-translational modifications [113, 114]. As the techniques matured, liquid chromatography tandem mass spectrometry (LC-MS/MS) has become standard, and novel instruments (e.g., Velos family [115] by Thermo Scientific; quadrupole time-of-flight mass spectrometers (QTOFs) by Agilent) allow unprecedented precision to enable the development of methods to identify thousands of proteins (~4000-6000 over 2 days), and quantitate protein levels [73, 116] (Figure 3).

Figure 2. RNA-Seq analysis. In RNA-Seq analysis, short reads can be assembled and then mapped to the reference genome (with tools such as Illumina's ELAND, MAQ and BWA [222], Bowtie [235] [236] [237], SOAP [221], and others). A recent protocol by Trapnell et al. [238] describes in detail the use of dedicated RNA-Seq programs from the Tuxedo suite, such as TopHat [239], Cufflinks [240, 241] and an R implementation called cummeRbund, available as a Bioconductor package (an alternative is to run these directly or using GenePattern [242, 243], which also includes possible reconstruction by Scripture [244]). Other programs such as DESeq, another package in Bioconductor, can also help test for differential expression [245]. The numerous available analyses are now publicly discussed online, in a forum (http://SEQanswers.com/) that covers many other examples and all aspects of the mapping process [246].
One set of methods uses stable isotopic labeling by amino acids in cell culture (SILAC) to label cells with light and heavy isotopes of amino acids, providing double spectral peaks in MS for identification and quantitation [117] [118] [119] [120]; this method is now supplemented by 'spike-in'/'super' SILAC, which has been used to measure biopsy tumor proteomes [121]. Another possibility is to use isobaric tags for relative and absolute quantitation (iTRAQ) [122, 123] or tandem mass tag (TMT) labeling [73, 124, 125], and other methods, including spiking in peptides for absolute quantitation. Finally, it is possible to employ label-free methods for quantitation, which do not rely on tags, including signal integration methods and MS spectral counting [126] [127] [128] [129] [130] [131]. In comparison to whole transcriptome profiling, the numbers of proteins identified in proteome profiling tend to be lower, particularly since low peptide levels cannot be amplified (cf. polymerase chain reaction methods used in sequencing). Additionally, the current bottom-up (shotgun) proteomics methodology uses digestion with endopeptidases such as trypsin to obtain peptides of small enough mass to be identified by MS/MS, resulting in many fragments that cannot be identified in MS; this may possibly be alleviated by top-down approaches that do not employ a digestion step [132] [133] [134] [135] [136]. However, proteomics provides insights that are missing from transcriptomic analysis, especially given the low correlations between differential expression at the protein and transcript levels [73, [137] [138] [139] [140] [141] [142]. Multiple proteomes have been quantitatively profiled, including characterization of ovarian cancer [143], an integrated approach that combines transcriptome and proteome information in a human cancer cell line by Nagaraj et al. 
[144], integrative gastric cancer characterization and effects of post-translational modifications [145], and looking for biomarkers in other cancers [146, 147].

Figure 3. In quantitative proteomics using mass spectrometry, typical approaches employ trypsin digestion coupled with tagging methods; methods that are not label-free include the use of isotopic labeling (SILAC) or isobaric tagging (iTRAQ, TMT). One typical bottom-up-approach setup uses a combination of high affinity liquid chromatography coupled with two rounds of mass spectrometry (LC-MS/MS) to fractionate peptides for identification and obtain their mass spectra. Raw files may be analyzed using vendor software or converted to open formats (such as .mzXML, .mzData or the current standard .mzML [247] [248] [249], e.g., using MSConvert [250]). The mass spectra can be mapped to known proteins using a protein library, or less frequently de novo assembled, using an array of programs (e.g., X!Tandem [251], SEQUEST [252], Mascot [253], Open Mass Spectrometry Search Algorithm (OMSSA) [254], Proteome Discoverer by Thermo Scientific, or MassHunter Workstation by Agilent). Quality control includes estimation of false discovery rates (FDR), often using a reverse database search [105, 255, 256]. Quantitation can be carried out to estimate relative levels of proteins in different samples (employing standardization and normalization of average sample ratios to a unit mean). Finally, annotation is made using databases such as UniProt or NCBI. Some of the analysis can be performed using suites and programs such as PEAKS [257], the Trans-Proteomic Pipeline (TPP) [258] [259] [260] [261], multiple tools from ProteoWizard [250], OpenMS [262] [263] [264] or the complete vendor solutions Proteome Discoverer and MassHunter Workstation mentioned above. Multiple other programs for mass spectrometry are available (e.g., see http://www.msutils.org).

In addition to developments in proteomics, MS has encouraged the study of small molecules. The behavior of small molecules in cells, though difficult to track, provides insight into many common disorders. The set of all cellular small molecules is collectively called the metabolome. Metabolic processes are vital in biological pathways and a systems analysis of molecular cell complexity might lead to biomarker discovery, and possibly disease risk assessment, diagnosis and treatment [148]. Similar to proteomics, metabolomics can employ mass spectrometry to identify compounds [149] (Figure 4) and cataloguing is under way, with thousands of metabolites identified by structure, mass and occasionally associated biological processes [150] [151] [152] [153] [154] [155] [156] [157] [158] [159] [160] [161]. The identification of compounds can be based on MS/MS application and use of known compound spectra, or via use of standards against which mass spectra are compared. The profiling of metabolic components on an individualized basis can provide insights into pharmacogenomics and personalized medications, in addition to potential biomarkers, for example cholesterol levels and coronary artery disease [162, 163]. The metabolomics of cancer has been extensively studied [164] [165] [166], Type 2 Diabetes has been investigated [167], and in vivo interactions with proteins are being evaluated [168].

Genomes, transcriptomes and metabolomes have received widespread attention and currently offer the most quantitative data, provided by robust and comprehensive omics technologies, in terms of both experimental and computational methodology. However, multiple other omics are available, and their number is increasing, with a few notable technologies mentioned below:

Autoantibodyomes: In addition to profiling of proteins directly, the reactivity of proteins to autoantibodies may be profiled on a large scale.
Spotted protein arrays [169] [170] [171] [172] [173] have been implemented to study, for example, effects in cancer [174], immune response [175] and recently diabetes [176]. Another approach is the Nucleic Acid Programmable Protein Array (NAPPA), constructed by spotting plasmid DNA that codes for the proteins so that they are effectively expressed on the array, and used for immunoprofiling [177, 178]. Furthermore, functional peptide arrays have also been constructed [179, 180]. Complementary technologies such as bead-based immunoassays are also being actively developed, such as the Luminex xMAP assay [181].

Microbiomes: Omics profiling could also include mapping of the personal microbiome, the complete set of microbes in an individual (e.g., found mainly on the skin or in the gut, conjunctiva, saliva and mucosa), possibly using a combined omics approach to look at genetic makeup and metabolic components [182] [183] [184] [185] [186] [187]. The human microbiota (http://www.human-microbiome.org) have been associated with obesity [188] and diabetes [189, 190] and have also been suspected to play an active role in the development of immunity [191]. The dynamic monitoring of microbiome-related changes can help identify the specific microbiota involved in disease responses, and elucidate microbiome-host interactions and how the individual variability in components impacts developmental and metabolic processes.

Methylomes: In addition to genomics, epigenomic information, such as probing the methylome, i.e., identifying all genomic sites of cytosine methylation [192, 193], might provide information about differentiation and regulation of gene expression. Methylation analysis and data interpretation can be challenging [194, 195] but methods are improving as more data become available.
Methylome analysis has now been carried out in blood components [196], stem cells [197] and ovarian cancer [61], and it might prove invaluable in assessing epigenomic effects on individual development and health.

Figure 4. In metabolomics analysis, chromatography columns are used for purification and preparation of samples coupled to mass spectrometry (gas chromatography (GC) or liquid chromatography (LC)-MS); standards for specific compounds may also be used in parallel for positive identification. Raw files may be analyzed using vendor software or converted to open formats (such as .mzXML, .mzData or the current standard .mzML [247] [248] [249], e.g., using MSConvert). The spectral data may be aligned for retention time and mass intensity calibration, e.g., using XCMS [265] [266] [267], SIEVE by Thermo Scientific, Matlab toolboxes by MathWorks, MassHunterProfiler by Agilent, or MzMine [268, 269]. After quality control and statistical analysis, masses of interest can be annotated using databases, e.g., Metlin [155, 156], KEGG [151], MetaCyc [153, 270, 271] and Reactome [157] [158] [159] [160] [161].

The developments of the many different omics technologies outlined above have given us tremendous insight into the human genome and associations to diseases, especially with the rise of the personal genome. The NHGRI, recognizing the importance of these developments and the directions necessary to enhance health care, outlined in 2011 a vision for the future of personalized medicine [198] encompassing five domains of development: understanding the structure of genomes, understanding their biology, improving our understanding of the biology of disease, advancing medicine and improving the effectiveness of healthcare. The aim had been set as a shift towards personalized medicine within two decades, but the availability of the technology and constantly decreasing costs have made pilot investigations of personalized medicine a current possibility [73].
Genetic variation has proven adequate for understanding group differences in disorders, but a truly personalized implementation needs to consider an individual. Clinicians are already considering molecular markers in their evaluation of patients, and particularly cancer [199] [200] [201] [202] [203] . The typical clinical diagnosis involves the observation of symptoms traditionally confirmed utilizing a small set of molecular markers. In diseases that share a common set of symptoms, some rare, such diagnosis is often complicated and prolonged, especially for heterogeneous disorders that need additional information to enable classification and subsequent specific treatments. Genetic and environmental factors create additional variability in disease severity, progression and treatment responses. Thus, traditional assays together with the aforementioned current omics technologies, that allow monitoring of thousands of molecular components, will facilitate and accelerate differential diagnostics and sub-classification through utilizing a more complete set of disease markers. A personalized approach will result in better targeting of diseases, introduce higher precision through measurement of larger sets of molecular components and ideally implemented at an early age to assess disease risk and have a preventative rather than retrospective treatment focus. A personal approach is by its nature an n = 1 study, which helps eliminate variation between individuals that are treated as a group, but still requires some verification and establishment of a baseline for comparison. As such, the profiling of healthy physiological states in a longitudinal approach may provide such a basis, if multiple time points with similar physiological state makeup are sampled. Multiple omics can supply multiple supporting datasets at each time point, with each complementary technology providing additional supporting information for a baseline establishment. 
This introduces the concept of complete omics monitoring of individuals over time, making personalized medicine a more dynamic proposition. The dynamic changes of molecular components may be associated with the individual's changing physiological states, and mapped onto pathways to identify the onset and progression of disease, including possible preventive measures. In our suggested implementation, termed integrative Personal Omics Profiling (iPOP), which we followed in the study discussed below [73], we integrate the omics components discussed above in a longitudinal approach with three essential steps (Figure 5): I) Risk estimation: As discussed above, the personal and common genomic variants determined in an individual genome can be associated with disease [76], with pharmacogenomic evaluation to determine possible drug response. Whole genome sequencing at an early age, possibly at birth, can provide a list of disorders with possible increased risk and lead to taking preventive measures. This may be done in combination with a complete medical and family history, as implemented, for example, in the PGP, and in conjunction with classical clinical risk factor profiling. II) Dynamic profiling of multiple omics: Starting with a healthy or 'steady state' baseline, by monitoring changes in the molecular components over multiple time points, drastic or gradual changes in physiological states might be assessed and the dynamic onset of disease profiled, and possibly prevented. Such profiling may be done on blood components, which are currently easily obtainable in the clinic. The individual blood components are excellent reflectors of the generalized physiological state of an individual, as the blood circulates and receives inputs from multiple tissues throughout the body.
The components may be processed to track multiple omics, such as transcriptome, proteome, metabolome and autoantibodyome, etc., which as mentioned offer complementary information, especially given the modest correlation observed between transcriptomic and proteomic components [137] [138] [139] [140] [141] [142]. A recent study of profiles of tumors changing over time also employed an integrative approach on genomic and transcriptomic components [204]. Implementing this monitoring on healthy individuals will allow the monitoring of disease onset and physiological changes from various healthy, disease and recovery states, and following thousands of molecular component levels and responses at corresponding physiological states. III) Data integration and biological impact assessment: The multiple omics data can be analyzed individually to characterize their temporal response profile. This may be done using standard statistical time-series analysis, extensively used in all quantitative disciplines, such as physics, economics and finance, as discussed by Bar-Joseph et al. [205]. The dynamic signature of the signals for each molecular component can be studied for autocorrelation, periodicity or spiky behavior, corresponding to causal changes or abnormal physiological state conditions resulting from the onset of disease, infections, or environmental effects. The different classes of temporal response can be checked for biological pathway and gene ontology enrichment [151], [157] [158] [159] [160] [161], [206] [207] [208] [209] [210], and corresponding disease associations in comparison to a database of other longitudinal profiles (coupled to complete electronic records of omic and medical histories). Such a database is a necessary and powerful resource towards the realization of personalized medicine based on omics data profiling.

Figure 5. iPOP for personalized medicine. The framework described in the text employs multi-omics analyses (see above and Figures 1-4) that may be implemented for individuals. In step I) Risk estimation for disease is carried out using whole genome sequencing to perform variant analysis coupled to medical history, environmental considerations and pharmacogenomics evaluations. In step II) Dynamic profiling of multiple omics using an array of technologies follows multiple omics longitudinally in a subject as they progress through their different physiological states, including healthy, disease, and recovery states. Thus thousands of molecular components are collected over time for III) Data integration and biological impact assessment, using temporal patterns to obtain matched omics information, correlate and classify responses, compare against pathway databases and visualize components, e.g., current pathway tools include DAVID [206, 272], KEGG [151], Reactome [157] [158] [159] [160] [161], Ingenuity Pathway Analysis (IPA); networks can be visualized using Cytoscape [207], various R packages through Bioconductor [234], Matlab by MathWorks and several others. The future iPOP implementations may be gathered into a curated database of iPOP-disease associations that may help in categorizing an omics dynamic response to a catalogued physiological state and disease onset, with potential diagnostic capabilities.

To show the feasibility and practical applicability of iPOP we profiled a healthy individual, aged 54, over a period of initially 14 (now 33) months [73]. This initial time series covered healthy states, and two viral states, including a human rhinovirus (HRV) infection at the initiation of the study and a respiratory syncytial virus (RSV) infection 289 days later.
The iPOP used blood samples to extract omic components from peripheral blood mononuclear cells (PBMCs) and serum, which were analyzed to obtain a complete DNA, RNA, protein, metabolite and autoantibody profile. Initially a complete medical exam was performed with standard clinical tests before time-point profiling began. In a first step, WGS with two platforms was carried out (Complete Genomics and Illumina, at 150- and 120-fold coverage respectively) and WES with three platforms (Nimblegen, Illumina and Agilent), which helped identify a large number of variants (> 3 × 10^6 SNPs; > 2 × 10^5 indels; > 2000 SVs). Using multiple platforms allowed us to determine high-confidence and novel variants (using HugeSeq [211]). Evaluation of genetic disease risks based on variants was carried out, both by looking for known disease associations using dbSNP and the Online Mendelian Inheritance in Man (OMIM, http://omim.org/) database and by using the RiskOGram algorithm [76], which integrates information from multiple alleles to assess risk against a similarly matched data cohort. This revealed significantly increased risk for various disorders, including open angle glaucoma, dyslipidemia, coronary artery disease, basal cell carcinoma, type 2 diabetes (T2D), age related macular degeneration and psoriasis. This encouraged the subject to follow up on these disorders, and also start monitoring glucose and glycated hemoglobin (HbA1c) levels, which surprisingly increased beyond normal levels following the RSV infection, and the subject was diagnosed with T2D by his physician 369 days into the study. Related to T2D, pharmacogenomic considerations revealed a possibly favorable (glucose lowering) response to the diabetic drugs rosiglitazone and metformin, should treatment become necessary.
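As an illustration of how information from multiple alleles might be combined into a single risk estimate, the sketch below uses a common Bayesian scheme: a pre-test probability is converted to odds, multiplied by one likelihood ratio per genotype (treated as independent), and converted back to a probability. This is a simplified assumption for illustration only; the text above does not specify the RiskOGram computation at this level of detail, and the function name and all numbers are hypothetical.

```python
from math import prod  # available in Python 3.8+


def post_test_probability(pre_test_probability, likelihood_ratios):
    """Combine a pre-test probability with per-genotype likelihood ratios.

    Assumes the genotypes contribute independently, which is a simplification.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * prod(likelihood_ratios)
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical example: a 20% pre-test probability and three variants,
# two raising the risk (ratios above 1) and one lowering it (ratio below 1).
risk = post_test_probability(0.20, [1.5, 1.2, 0.9])
```

Working in odds keeps each variant's contribution multiplicative, so variants can be added or removed from the estimate without recomputing the others.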
Furthermore, the autoantibodyome profiling of the subject (Invitrogen ProtoArrays profiling of 9483 protein reactivities to Immunoglobulin G (IgG)) revealed increased reactivity in multiple proteins, including DOK6 (related to insulin receptors), and GOSR1, BTK and ASPA, previously reported to show high reactivity by Winer et al. in insulin resistant patients [176]. The subject initiated and still maintains a strict dietary and exercise regimen supplemented with low doses of acetylsalicylic acid, which helped him control his glucose and HbA1c levels, which after a considerable time period (~months) have now returned to normal levels. In addition, a range of omics were profiled over time for up to 20 different timepoints over the span of the study, including high coverage transcriptome (RNA-Seq of PBMCs, 2.67 billion reads mapped to 19714 isoforms corresponding to 12659 genes), proteome (MS of PBMCs, identifying a total of 6280 proteins; 3731 consistently across most timepoints) and metabolome (MS of serum, profiling 6862 and 4228 metabolites during periods of HRV and RSV infections respectively, with 20% identified based on mass and retention times alone). The dynamic transcriptome, proteome and metabolome profiles were analyzed in a novel integrated framework based on spectral analysis of the time series. This allowed the identification of temporal patterns in the combined data, corresponding to biological processes that varied with physiological state changes, including the onset of T2D seen in multiple omics components, and common signatures of HRV and RSV infections. While several gene associations to pathways were known, multiple genes showed similar patterns that had not been reported before and merit further investigation.
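To make the notion of spectral analysis of an unevenly sampled time series concrete, the following is a minimal sketch of the classical Lomb-Scargle periodogram, a standard choice when samples are not equally spaced. It is offered only as an illustration of the general technique; the text above does not state which spectral method the study's framework used, and the sampling times, signal and frequency grid below are simulated assumptions.

```python
import math


def lomb_scargle_power(times, values, frequency):
    """Classical Lomb-Scargle power at one frequency (cycles per time unit).

    The frequency must be positive; degenerate sampling that makes a
    denominator zero is not handled in this sketch.
    """
    mean_value = sum(values) / len(values)
    centered = [v - mean_value for v in values]
    omega = 2.0 * math.pi * frequency
    # The offset tau makes the sine and cosine terms orthogonal on these samples.
    tau = math.atan2(sum(math.sin(2.0 * omega * t) for t in times),
                     sum(math.cos(2.0 * omega * t) for t in times)) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    cos_part = (sum(y * c for y, c in zip(centered, cos_terms)) ** 2
                / sum(c * c for c in cos_terms))
    sin_part = (sum(y * s for y, s in zip(centered, sin_terms)) ** 2
                / sum(s * s for s in sin_terms))
    return 0.5 * (cos_part + sin_part)


# Simulated, unevenly spaced sampling times and a signal with frequency 0.1.
times = [1.7 * i + 0.3 * ((7 * i) % 5) for i in range(30)]
values = [math.sin(2.0 * math.pi * 0.1 * t) for t in times]

# Scan a grid of candidate frequencies and keep the strongest one.
grid = [k / 100 for k in range(2, 31)]
best_frequency = max(grid, key=lambda f: lomb_scargle_power(times, values, f))
```

Components whose strongest frequencies or overall spectra are similar could then be grouped together, independently of how they are annotated.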
The iPOP study discussed above revealed the complexities and characteristics of personal genomes, transcriptomes, proteomes and metabolomes and showed the feasibility of personalized longitudinal profiling that can provide actionable health information. Multiple omics data integration still presents a formidable challenge and merits further development. Each omics technology produces different kinds of data, including multiple formats (e.g., data files range from simple text and extensible markup, e.g., .xml, to vendor closed-source formats). Additionally, each omics set requires its own quality control analysis, further confounded by different error and noise levels associated with the different technologies. As each of the data sets also presents different signal and noise distributions, uniform normalization approaches across omics are challenging, especially if considering multimodal dynamic data. Furthermore, the amounts of information per omics set can vary, e.g., ~5000 proteins, ~20000 transcript isoforms, ~6000-10000 metabolites, ~9000 autoantibody-protein reactivities and so forth. Hence, gene-centric approaches, which integrate data corresponding to, associated or interacting with the same genes, will not always work, as the different components may not match. The integration of information per component is made more difficult by multiple existing gene and protein annotations, often resulting in a many-to-many map in the gene-protein integration, and by correspondingly lacking metabolite-protein/gene annotations and associations. Finally, considering dynamic datasets also results in multiple instances where time points might be missing data for some of the molecular components (especially evident in mass spectrometry and shotgun proteomics, where proteins are identified through different peptides).
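The many-to-many annotation problem described above can be made concrete with a small sketch. It assumes a hypothetical list of (gene, protein) annotation pairs and separates the pairs that map one-to-one from those that are ambiguous; the identifiers and function names are invented for illustration and do not come from any real annotation resource.

```python
from collections import defaultdict


def build_maps(pairs):
    """Index annotation pairs in both directions."""
    gene_to_proteins = defaultdict(set)
    protein_to_genes = defaultdict(set)
    for gene, protein in pairs:
        gene_to_proteins[gene].add(protein)
        protein_to_genes[protein].add(gene)
    return gene_to_proteins, protein_to_genes


def one_to_one_pairs(pairs):
    """Keep only pairs where the gene and the protein each have a single partner."""
    gene_to_proteins, protein_to_genes = build_maps(pairs)
    return sorted((gene, protein) for gene, protein in set(pairs)
                  if len(gene_to_proteins[gene]) == 1
                  and len(protein_to_genes[protein]) == 1)


# Hypothetical annotations: GENE_A maps to two proteins, and PROT_2 is
# shared by two genes, so only the GENE_C pair is unambiguous.
annotations = [
    ("GENE_A", "PROT_1"),
    ("GENE_A", "PROT_2"),
    ("GENE_B", "PROT_2"),
    ("GENE_C", "PROT_3"),
]
unambiguous = one_to_one_pairs(annotations)
```

In this toy case three of the four annotated pairs would be lost to a strictly gene-centric merge, which is one reason integration based on shared temporal behavior can be attractive.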
These complications of omics data integration necessitate that each individual omics data set be analyzed independently up to normalization, and then integrated with the other information. New integrative methodology has to account for such different normalizations and missing data, and also for integration that is not gene-based, but rather incorporates time-series analyses, as was carried out, for example, in the iPOP study [73]. Classification of changes by temporal response, and possibly interaction data, leads to an interpretation of components based on shared similar dynamics and avoids some of the issues of insufficient annotations and missing information. Such an interpretation lends itself to a clinical setting where dynamic changes are associated with varying personalized physiological states, and may be adopted by the medical community. To facilitate the wide adoption of the methods into personalized medicine, the integrated data analysis will require optimization of current computational tools to rapidly and efficiently handle as well as visualize the multiple omics data. As a first step, the amount of computation time for different analyses must be reduced from days (in the case of mapping sequence data and quantitative proteomics in the current omics analyses presented above) to hours or less to have immediate relevance to active medical examinations. Secondly, better visualizations of omics data are also necessary, as multidimensional information is difficult to collate, present, and interpret (many efforts are addressing this, e.g., Circos plots, which allow multiple sequence information to be displayed together, are now widely adopted [212]). Incorporating such information with clinical data and phenotypes presents a new challenge, requiring browsers that combine temporal information with multi-dimensional omics sets.
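The strategy described above, normalizing each omics set independently and then comparing components by their dynamics while tolerating missing time points, can be sketched as follows. This is a minimal illustration under simple assumptions (z-score normalization per series and Pearson correlation over the time points observed in both series); it is not the study's method, and the series and names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(series):
    """Z-score one series on its own scale, leaving missing points (None) in place."""
    observed = [v for v in series if v is not None]
    mu, sigma = mean(observed), pstdev(observed)
    if sigma == 0:
        raise ValueError("series is constant and cannot be standardized")
    return [None if v is None else (v - mu) / sigma for v in series]


def pairwise_correlation(a, b, min_points=3):
    """Pearson correlation using only time points observed in both series."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(shared) < min_points:
        return None  # too little overlap to compare the dynamics
    xs, ys = zip(*shared)
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in shared)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return covariance / sqrt(var_x * var_y)


# Hypothetical levels of three components measured on different scales,
# with None marking time points where a component was not detected.
transcript = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [10.0, None, 30.0, 40.0, 50.0]
metabolite = [5.0, 4.0, None, 2.0, 1.0]

r_transcript_protein = pairwise_correlation(standardize(transcript), standardize(protein))
r_transcript_metabolite = pairwise_correlation(standardize(transcript), standardize(metabolite))
```

Because each series is scaled on its own, components from different technologies can be compared without forcing a common normalization, and no gene-level match between them is required.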
We believe network analysis [213] [214] [215] [216] [217] presents an excellent visualization and integration possibility, allowing the combination of multiple levels of dynamically changing networks that will include cellular information, component and corresponding disease temporal progressions, as well as medical assay data in a modularized approach. The computational analyses and visualization of omics data integration also reveal the known need to manage large amounts of data [218, 219], in terms of processing power as well as storage capacity and maintaining easy accessibility, especially for the practicing clinician, with the recent advent of cloud computing providing one possible solution. Finally, the combination of omics data with medical records presents another challenge, with privacy and ethical issues that must be considered. Such improvements and standardization of approaches will help make the analysis available in a clinical setting and to an increasingly larger set of patients, while encouraging the early adoption of the integrated approaches by the scientific community towards personalized medicine applications. As technology improves we expect to see advancements in each omics implementation discussed above. In terms of sequencing, continual improvements in depth and read length will allow unambiguous precise sequence mapping and additionally the querying of lower gene expression, coupled to higher accuracy in variant calling. With sequencing times becoming faster (e.g., whole genome sequencing in ~5-30 hours depending on platform at deep, ~100× coverage), and hardware more compact, eventually such technology will be available in the clinic, enabling the incorporation of all genomic, transcriptomic, microbiomic and autoantibodyomic profiling as parts of regular medical examinations.
Correspondingly, mass spectrometry improvements (including table-top hardware now available) will bring better mass accuracy and higher sensitivity, allowing increases in the number of proteins identified and better quantitation, which can already be implemented in a clinical setting. The MS improvements in combination with better metabolite cataloguing will also improve the identification of small molecules. The protocol and methodology advancements will reduce the volume of patient sample needed for iPOP (from ~80 mL to drops of blood), making it feasible to probe the omics on a more regular basis for each patient, even providing home kits to send in self-collected samples (akin to what is already implemented to some degree by companies, e.g., 23andMe, that collect saliva samples for phenotyping). The technological and methodological advancements will allow for effective iPOP implementations with multiple patients, but it will still take some time to evaluate what constitutes actionable information and which components will be most informative. Once these relevant components are identified, monitoring technologies can be further developed to help possible clinical implementations. This will certainly be facilitated by multiple iPOP studies providing the necessary aggregated information. However, clinical and psychological concerns need to be addressed, with the possible impact on patient health being of paramount importance in a medical process in which the patient is actively participating [220]. Such active participation requires training the public and health professionals to understand genomic information, and how this omics knowledge impacts their health and that of their families. Genetic counseling is a necessity, and the number of trained genetic counselors is steadily increasing. Informed consent will be necessary, but this requires an understanding of basic genomic terms that are not apparent to non-experts.
To facilitate this, school curriculum adjustments will probably be needed to enable early education of the public.

The emergence of quantitative Personal Omics, including genomes, transcriptomes, proteomes, metabolomes and other omics, allows us to now combine them to yield personalized actionable health care information. Such research is at the forefront of medical science, and may help the characterization of disorders and the implementation of precise personal medicine aimed towards prevention rather than treatment. Careful forward planning, coupled to the continuing interest and participation of the public, government agencies and researchers, assures that the development of personalized omics will proceed beyond possible hurdles into a novel approach for 21st century health care implementations.

Initial sequencing and analysis of the human genome The sequence of the human genome Finishing the euchromatic sequence of the human genome The diploid genome sequence of an Asian individual Accurate whole human genome sequencing using reversible terminator chemistry The complete genome of an individual by massively parallel DNA sequencing The diploid genome sequence of an individual human Personal genome sequencing: current approaches and challenges A decade's perspective on DNA sequencing technology Massively parallel sequencing: the next big thing in genetic medicine A sequencing method based on real-time pyrophosphate Real-time DNA sequencing using detection of pyrophosphate release The history of pyrosequencing New frontiers in plant functional genomics using next generation sequencing technologies NGSQC: cross-platform quality analysis pipeline for deep sequencing data Next Generation Genome Sequencing: Towards Personalized Medicine Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays Sequence information can be obtained from single DNA molecules Real-time DNA sequencing from single polymerase molecules Real-time DNA sequencing
from single polymerase molecules A window into third-generation sequencing Nanopore genome sequencer makes its debut Performance comparison of exome DNA sequencing technologies A haplotype map of the human genome A second generation human haplotype map of over 3.1 million SNPs ) dbSNP: the NCBI database of genetic variation Genetics. Harvesting medical information from the human family tree ) Genotype, haplotype and copy-number variation in worldwide human populations Genes mirror geography within Europe Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation Development of a panel of genome-wide ancestry informative markers to study admixture throughout the Americas Genome-wide patterns of population structure and admixture in West Africans and African Americans Global variation in copy number in the human genome Origins and functional impact of copy number variation in the human genome Genome structural variation discovery and genotyping Genome-wide mapping of copy number variation in humans: comparative analysis of high resolution array platforms Paired-end mapping reveals extensive structural variation in the human genome BreakDancer: an algorithm for high-resolution mapping of genomic structural variation PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data Characterizing complex structural variation in germline and somatic genomes An integrated encyclopedia of DNA elements in the human genome Architecture of the human regulatory network derived from ENCODE data Genomics: ENCODE explained The making of ENCODE: lessons for big-data projects Annotation of functional variation in personal genomes using RegulomeDB A map of human genome variation from population-scale sequencing Potential etiologic and functional implications of genome-wide association loci for human diseases and traits Exome sequencing 
identifies ACAD9 mutations as a cause of complex I deficiency A de novo paradigm for mental retardation Exome sequencing reveals VCP mutations as a cause of familial ALS Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations Exome sequencing identifies the cause of a mendelian disorder Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome Exome sequencing, ANGPTL3 mutations, and familial combined hypolipidemia De novo mutations revealed by whole-exome sequencing are strongly associated with autism Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1 Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm Integrated genomic analyses of ovarian carcinoma The Roche Cancer Genome Database 2.0 The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity A comprehensive catalogue of somatic mutations from a human cancer genome Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia Whole-genome analysis informs breast cancer response to aromatase inhibition Genome remodelling in a basal-like breast cancer metastasis and xenograft Identification of high-confidence somatic mutations in whole genome sequence of formalin-fixed breast cancer specimens A whole-genome massively parallel sequencing analysis of BRCA1 mutant oestrogen receptor-negative and -positive breast cancers DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome Identification of a novel TP53 cancer susceptibility mutation through whole-genome sequencing of a patient with therapyrelated AML Phased whole-genome genetic risk in a family quartet using a major allele reference sequence Personal omics profiling 
reveals dynamic molecular and medical phenotypes Analysis of genetic inheritance in a family quartet by wholegenome sequencing Whole-genome sequencing for optimized patient management Clinical assessment incorporating a personal genome Individualization of drug therapy: history, present state, and opportunities for the future Moving towards individualized medicine with pharmacogenomics Pharmacogenetics of chronic cardiovascular drugs: applications and implications Pharmacogenomics: a path to predictive medicine for schizophrenia From pharmacogenomic knowledge acquisition to clinical applications: the PharmGKB as a clinical pharmacogenomic biomarker resource Personal genomes in progress: from the human genome project to the personal genome project A public resource facilitating clinical use of genomes The personal genome project Genomics: personal genome project Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution Global identification of human transcribed sequences with genome tiling arrays Empirical analysis of transcriptional activity in the Arabidopsis genome A high-resolution map of transcription in the yeast genome Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations Toward a universal microarray: prediction of gene expression through nearestneighbor probe sequence identification RNA-Seq: a revolutionary tool for transcriptomics Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution The transcriptional landscape of the yeast genome defined by RNA sequencing Mapping and quantifying mammalian transcriptomes by RNA-Seq RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays Transcriptome sequencing to detect gene fusions in cancer Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells 
Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing The clonal and mutational evolution spectrum of primary triple-negative breast cancers Alternatively spliced NKp30 isoforms affect the prognosis of gastrointestinal stromal tumors Alternative splicing and biological heterogeneity in prostate cancer Correlation between protein and mRNA abundance in yeast Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation The biological impact of mass-spectrometry-based proteomics Mass spectrometry-based proteomics Quantitative proteome analysis: methods and applications A mass spectrometric journey into protein and proteome research Proteomics of organelles and large cellular structures Quantitative, high-resolution proteomics for data-driven systems biology Proteomic analysis of posttranslational modifications Existing bioinformatics tools for the quantitation of post-translational modifications Mass spectrometry-based proteomics using Q Exactive, a highperformance benchtop quadrupole Orbitrap mass spectrometer Mass spectrometry-based proteomics turns quantitative Stable isotope labeling by amino acids in cell culture for quantitative proteomics A practical recipe for stable isotope labeling by amino acids in cell culture (SILAC) Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics Use of stable isotope labeling by amino acids in cell culture as a spike-in standard in quantitative proteomics 8-plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer's disease Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents Tandem mass tags: 
a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS Relative quantification of proteins in human cerebrospinal fluids by MS/MS using 6-plex isobaric tags Mass spectrometry and protein analysis Quantitative shotgun proteomics using a protease with broad specificity and normalized spectral abundance factors SuperHirn -a novel tool for high resolution LC-MS-based peptide/ protein profiling A platform for accurate mass and time analyses of mass spectrometry data Role of spectral counting in quantitative proteomics A model for random sampling and estimation of relative protein abundance in shotgun proteomics Relationship between serum concentrations of saturated fatty acids and unsaturated fatty acids and the homeostasis model insulin resistance index in Japanese patients with type 2 diabetes mellitus Controlling the false discovery rate with constraints: the Newman-Keuls test revisited A proteomics approach to understanding protein ubiquitination Robust detection of periodic time series measured from biological systems Quantitative analysis of complex protein mixtures using isotope-coded affinity tags Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data Defining the transcriptome and proteome in three functionally different human cell lines Global survey of organ and organelle protein expression in mouse: combined proteomic and transcriptomic profiling Correlations between RNA and protein expression profiles in 23 human cell lines Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts Use of proteomic patterns in serum to identify ovarian cancer Deep proteome and transcriptome mapping of a human 
We would like to thank the Stanford Genetics Department and the NIH for support through grant P50HG02357. GIM would also like to thank the NIH for support through training grant T32HG000044. We also thank Drs. Rui Chen, Jennifer Li Pook Than and Hogune Im for useful discussions.