Submitted 13 November 2018
Accepted 12 May 2019
Published 10 June 2019

Corresponding author
Andreas Gogol-Döring,
andreas.gogol-doering@mni.thm.de

Academic editor
James Procter

Additional Information and
Declarations can be found on
page 10

DOI 10.7717/peerj-cs.198

Copyright
2019 Menzel et al.

Distributed under
Creative Commons CC-BY 4.0

OPEN ACCESS

Enhort: a platform for deep analysis of
genomic positions
Michael Menzel, Peter Koch, Stefan Glasenhardt and Andreas Gogol-Döring
MNI, Technische Hochschule Mittelhessen—University of Applied Sciences, Giessen, Hessen, Germany

ABSTRACT
The rise of high-throughput methods in genomic research greatly expanded our
knowledge about the functionality of the genome. At the same time, the amount
of available genomic position data increased massively, e.g., through genome-wide
profiling of protein binding, virus integration or DNA methylation. However, there
is no specialized software to investigate integration site profiles of virus integration or
transcription factor binding sites by correlating the sites with the diversity of available
genomic annotations. Here we present Enhort, a user-friendly software tool for relating
large sets of genomic positions to a variety of annotations. It functions as a statistics
based genome browser, not focused on a single locus but analyzing many genomic
positions simultaneously. Enhort provides comprehensive yet easy-to-use methods for
statistical analysis, visualization, and the adjustment of background models according
to experimental conditions and scientific questions. Enhort is publicly available online
at enhort.mni.thm.de and published under GNU General Public License.

Subjects Bioinformatics, Computational Biology
Keywords Virology, Data analysis, Genome annotation, Next-generation sequencing, Integration
profiling

INTRODUCTION
Some viruses like HIV (Craigie & Bushman, 2012) and AAV (Deyle & Russell, 2009) are
able to copy their genomic sequence into the genome of an infected cell. This can have
severe impact on host cell stability as the integration may hit and disable a gene or a
regulatory region. The investigation of characteristics and underlying driving factors for
virus integration is not only relevant for virology and infectious diseases research but
also for approaches in gene therapy that apply virus-derived vectors and transposons to
deliver functional DNA fragments into host cells (Riviere, Dunbar & Sadelain, 2012; Li et
al., 2015). Each gene delivery system has its own mechanisms for genomic integration and
preferences for choosing integration sites, hence different systems may have different risks
for causing undesired side effects.

Next Generation Sequencing (NGS) facilitates the genome-wide profiling of integration
sites, as they are collected e.g., in investigations of protein binding, virus/transposon
integration or DNA methylation. Integration sites are available from databases like the
Retrovirus Integration Database (Shao et al., 2016) and are regularly created for novel
targeted vectors. Typically, the identified sites are related to a variety of genomic features
and any integration preferences are determined by a comparison of actual integration
sites to a set of random control sites (Gogol-Döring et al., 2016). A proper background

How to cite this article Menzel M, Koch P, Glasenhardt S, Gogol-Döring A. 2019. Enhort: a platform for deep analysis of genomic
positions. PeerJ Comput. Sci. 5:e198 http://doi.org/10.7717/peerj-cs.198

model should mimic all known biases of the signal data originating from experimental
or laboratory conditions. If, for example, a profiling method is only capable of detecting
integration events that are close to certain enzyme restriction sites then the control sites
should also be selected accordingly.

Several tools have been published that are capable of processing genomic positions and
annotations, like the Genomic HyperBrowser (Sandve et al., 2013). Genome browsers
like the UCSC Genome Browser (Kent et al., 2002), IGV (Robinson et al., 2011) or
Artemis (Carver et al., 2011) are designed for inspecting single genomic locations.
Custom-written scripts (Cook et al., 2014) and libraries like PyBedTools (Janovitz et al.,
2014; Dale, Pedersen & Quinlan, 2011) are also commonly used for the analysis of genomic
positions. Once written, these scripts can be reused to conduct a specific set of analyses
on recurring data. However, they are limited by the available functionality because each
function has to be newly developed. Additionally, comparability across laboratories suffers
from varying functionality and different implementations of background models. There is as
yet no specialized tool for the analysis of genomic positions that combines instant analysis
with user-defined, adaptable background models that mimic known biases.

In this paper we present Enhort, a user-friendly web-platform for deep analysis of large
sets of genomic positions. Our aim is to accelerate and simplify the data analysis process as
well as to standardize it in order to increase reproducibility. Enhort is capable of adjusting
background sites used for comparison by user selected covariates. This includes annotation
tracks like restriction sites or chromatin accessibility, gene expression tracks and sequence
motifs. With covariates it is possible to adjust the background sites selection in a way that
they match the investigated sites for a specific track. The adaptation rules out the effects
of this annotation for the background. This feature can be used to adjust for experimental
bias as well as specific questions. Figure 1 shows the schematic process of data gathering
and the usage of Enhort in the workflow of analyzing genomic positions.

METHODS
Integration sites of viruses are gathered by sequencing infected cells and preprocessing as
shown in Fig. 1. These sites are uploaded to Enhort and are intersected with each annotation
file to compute fold-change enrichment and χ2 test in comparison to control sites, yielding
a measure for effect strength and significance of each annotation respectively. Figure 2 shows
the schematic analysis pathway for sites uploaded by a user. Statistical analysis depends
on the Apache Commons Math library (https://commons.apache.org/proper/commons-
math/) and uses Bonferroni correction for multiple hypothesis testing. The libraries
plotly.js (https://plot.ly/javascript/) and Circos (http://circos.ca/) are used for visualization.
The results are sorted according to their relevance and presented in conjunction with
appropriate figures. Example results for a virus can be seen in Fig. 3A. The software has
been designed in a way that analysis results are almost immediately available after upload.
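
The per-track statistics described above can be illustrated with a minimal Python sketch. It is not Enhort's implementation (the text states that Enhort relies on the Apache Commons Math library); the function name `enrichment` and its parameters are hypothetical. The sketch computes the fold change, a Pearson χ2 statistic on a 2x2 contingency table without continuity correction, and a Bonferroni-corrected p value, assuming that every cell of the table has a non-zero expected count.

```python
import math


def enrichment(sites_in, sites_total, ctrl_in, ctrl_total, n_tests=1):
    """Fold change, chi-square statistic and Bonferroni-corrected p value.

    Hypothetical helper for one annotation track; requires ctrl_in > 0.
    """
    # Fold change: fraction of user sites inside the annotation divided by
    # the fraction of control sites inside the annotation.
    fold = (sites_in / sites_total) / (ctrl_in / ctrl_total)

    # 2x2 contingency table: rows = user/control, columns = inside/outside.
    table = [[sites_in, sites_total - sites_in],
             [ctrl_in, ctrl_total - ctrl_in]]
    grand = sites_total + ctrl_total
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Bonferroni correction: multiply by the number of tested tracks, cap at 1.
    return fold, chi2, min(1.0, p_value * n_tests)
```

For example, `enrichment(200, 1000, 100, 1000, n_tests=10)` describes a track hit by 20% of the user sites and 10% of the control sites when ten tracks are tested.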

In many cases a background model consisting of random sites is not sufficient for an
adequate analysis. Some protocols, for example, can only detect integration events that

Menzel et al. (2019), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.198 2/13

[Figure 1: schematic workflow. Wet lab: virus, virus DNA, sequencing, reads. Preprocessing: mapping of reads to the genomic sequence, yielding sites. Enhort: sets of sites (also from other data sources) and genomic annotations enter background model generation, statistical analysis and output generation, producing results (hits per chromosome).]

Figure 1 Overview of preparatory work and data gathering for analysis in Enhort. Reads containing
viral integration sites are identified and sequenced in the wet lab and mapped to a reference genome.
Identified insertion sites are converted to a BED file for use in Enhort. Together with genomic
annotations from public databases, these sites are analyzed in Enhort to generate an analysis of the
given integration sites.

Full-size DOI: 10.7717/peerjcs.198/fig-1

occurred in close proximity to a restriction site of a specific enzyme, like EcoRI, which
cuts inside of GAATTC hexamers (Pingoud & Jeltsch, 2001). Background models should be
adapted to mimic the actual integration pattern with regard to any known technical bias.
In this case, the control sites should also be selected to be near restriction sites. This can
be achieved in Enhort by setting the appropriate genome annotation as a covariate. When
selecting the track that contains all possible genomic positions of GAATTC hexamers as
covariate, Enhort will generate a set of control sites having exactly the same distribution of
distances to the enzyme restriction sites as the actual virus integration sites.
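
The idea of distance-matched control sites can be sketched as follows. This is an illustrative assumption about one way to obtain such a control set, not Enhort's actual algorithm: for every user site, the sketch records the distance to the nearest restriction site and then places one control at exactly that distance from a randomly chosen restriction site, rejecting candidates whose nearest restriction site is closer. The function names are hypothetical, coordinates are integers on a single chromosome, and the chromosome end is not checked.

```python
import bisect
import random


def nearest_distance(pos, sorted_marks):
    """Absolute distance from pos to the closest entry of sorted_marks."""
    i = bisect.bisect_left(sorted_marks, pos)
    neighbours = sorted_marks[max(0, i - 1):i + 1]
    return min(abs(pos - m) for m in neighbours)


def distance_matched_controls(sites, marks, rng, max_tries=1000):
    """One control per user site with the same distance to the nearest mark."""
    marks = sorted(marks)
    controls = []
    for site in sites:
        d = nearest_distance(site, marks)
        for _ in range(max_tries):
            # Place a candidate d bases up- or downstream of a random mark.
            candidate = rng.choice(marks) + rng.choice((-d, d))
            # Accept only if no other mark is closer than d.
            if candidate >= 0 and nearest_distance(candidate, marks) == d:
                controls.append(candidate)
                break
        else:
            raise RuntimeError("no position found at distance %d" % d)
    return controls
```

Because every control inherits the distance of one user site, the two sets have identical distance distributions by construction; drawing several controls per user site would keep the distribution while enlarging the control set.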



[Figure 2: flowchart. Main flow: sites are uploaded by a user; the user selects covariates; background sites are created (1); user sites are tested against the background (2); tables and figures are shown; results are exported. (1) Input: site count, covariates, genome version. For each combination: the length of all integration intervals is determined, random positions are set between 0 and the combined interval length, and the random positions are spread according to interval locations on the genome; positions for each combination are merged. (2) Input: user sites, background sites, annotation tracks from databases. For each track: the number of sites inside intervals is counted for user and background sites, a contingency table is built, and a χ2 test, correction for multiple testing and fold change are computed.]

Figure 2 Flowchart of the procedure of analysis performed by Enhort. Blue boxes show the steps to
create a background model based on multiple covariates. Random positions have to be set for each
combination of covariates. Green boxes show the steps to test the user sites against the background
sites. The results are returned as a table and converted into figures for the user.

Full-size DOI: 10.7717/peerjcs.198/fig-2
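
The background-generation steps named in the flowchart (determine the combined interval length, draw random positions between 0 and that length, spread them according to the interval locations) can be sketched in Python. The function name `sample_in_intervals` and the half-open `(chromosome, start, end)` tuples are assumptions made for illustration; the sketch is not taken from the Enhort source code.

```python
import bisect
import itertools
import random


def sample_in_intervals(intervals, count, rng):
    """Draw `count` positions uniformly from half-open (chrom, start, end) intervals."""
    lengths = [end - start for _, start, end in intervals]
    # Cumulative end offset of every interval on the concatenated axis.
    cumulative = list(itertools.accumulate(lengths))
    total = cumulative[-1]  # combined interval length
    sites = []
    for _ in range(count):
        # Random position between 0 and the combined interval length.
        offset = rng.randrange(total)
        # Index of the interval that contains this offset.
        idx = bisect.bisect_right(cumulative, offset)
        chrom, start, _ = intervals[idx]
        previous = cumulative[idx - 1] if idx else 0
        # Map the offset back to a genomic coordinate.
        sites.append((chrom, start + offset - previous))
    return sites
```

Concatenating the intervals makes every base inside them equally likely, so longer intervals receive proportionally more control sites without any rejection step.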

Covariates help to adapt the background model both to technical circumstances, for
example restriction sites, and to biological preferences such as motifs or genetic
features whose bias should be eliminated. Covariates can also be used to identify dependent
or separate weak integration preferences that are masked by stronger effects, as shown in
Fig. 3B. There, MLV integration sites are compared to two different control sets, a random
and an adapted background, to identify the actual integration preferences, e.g., for the
histone mark H3K4me3, which is a known preference of MLV (Gogol-Döring et al., 2016).
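
How a background can be matched to the user sites for one or more inside/outside covariates may be sketched as a quota calculation. The sketch assumes, as a simplification, that each covariate is a list of half-open `(start, end)` intervals on one chromosome; the names `inside` and `combination_quotas` are hypothetical. For every combination of covariates it counts the user sites and scales that count to the requested number of control sites, so that control and user sites end up with the same frequency relative to each covariate.

```python
from collections import Counter


def inside(pos, intervals):
    """True if pos lies in any half-open (start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def combination_quotas(sites, covariates, n_controls):
    """Number of control sites to draw for each inside/outside combination."""
    names = sorted(covariates)
    # One key per user site: inside/outside flag for every covariate.
    keys = [tuple(inside(p, covariates[name]) for name in names) for p in sites]
    counts = Counter(keys)
    # Scale the user-site counts to the requested number of control sites.
    exact = {k: c * n_controls / len(sites) for k, c in counts.items()}
    quotas = {k: int(v) for k, v in exact.items()}
    # Largest-remainder rounding so that the quotas sum to n_controls.
    leftover = n_controls - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quotas[k], reverse=True)
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    return quotas
```

Each quota would then be filled with random positions drawn from the genomic regions belonging to that combination, and the per-combination positions merged into one control set.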

For the validity of statistical testing it is usually indispensable to normalize the
background model relative to multiple covariates. For that purpose, Enhort supports
the selection of multiple covariates simultaneously in order to further investigate the
integration site characteristics. For example, Enhort may create a control set that considers
chromatin accessibility, restriction site distance as well as several histone modifications
simultaneously. This functionality is needed to build background models for sites that
are influenced by multiple factors, e.g., biological and technical biases. Additional
features are listed below:
1. Statistical analysis for annotation tracks:
    (a) Fold change
    (b) χ2 test
    (c) Kolmogorov–Smirnov test
2. Hotspot analysis (Fig. 4C)
3. Position-dependent enrichment (Fig. 4A)
4. Background models based on:
    (a) Inside and outside of annotations
    (b) Distance to annotations
    (c) Scored annotations
    (d) Sequence logo
5. Upload background sites
6. Comparing effects of different background models
7. Batch analysis of multiple integration sets
8. Heatmaps to compare integration sets (Fig. 4B)
9. Custom annotation tracks
10. Blend annotation tracks
11. Export results as R code and CSV files

Figure 3 Output view example, generated by Enhort when analyzing Murine Leukemia Virus (MLV)
integration sites in CD4+ T cells (Roth, Malani & Bushman, 2011). (A) The results are presented in a
table containing for each annotation the p value, effect size and a visual representation of the integration.
The annotations are ranked by effect strength. (B) Effect of covariate selection. The upper diagram
contains integration frequencies of MLV compared to random sites for a selection of annotations. This virus
is known for preferentially integrating near transcription start sites (TSS) and H3K4me3 histone marks
(LaFave et al., 2014). The lower diagram shows the same data after selecting H3K4me3 as covariate. The
adapted background model is generated in a way that control sites and MLV integration sites have the
same frequency relative to H3K4me3. This also changed the control site frequencies for other annotations:
MLV integration is no longer enriched but depleted in CpG islands when compared to the adapted
background model.

Full-size DOI: 10.7717/peerjcs.198/fig-3

Enhort is separated into a lightweight, web-based user interface and a high performance
back-end server attached to a SQLite database storing meta-information about the
annotations fetched from DeepBlue (Albrecht et al., 2016). Results from Enhort are
available within seconds, as seen in Table 1, which lists the run times for different input
sizes. Our application currently offers 1,402 annotation tracks from 97 cell



Table 1 Analysis execution times for different usual site counts, annotation tracks from hg19 and
covariate counts. (Back-end server: SuperMicro SuperServer 4048B-TRFT 4x Intel Xeon E7-8867v3 with
2048GB DDR3 ECC LR).

                          Execution time (ms)
Track count        23        23        23     1,127     1,127     1,127
Covariate count     0         2         5         0         2         5
Site count
150k              877     1,188     4,668     8,538    10,436    12,540
125k              717     1,103     5,628     5,509     7,552     7,975
100k              749       817     5,085     3,724     4,672     8,673
75k               624       571     4,019     4,905     4,397     9,633
50k               470       555     5,455     4,736     5,844    10,451
25k               308       351     4,628     3,246     3,091     8,111

lines and tissues for human genome assemblies hg19 and hg38, downloaded from the
UCSC Genome Browser (Fujita et al., 2011), ENCODE (ENCODE Project Consortium, 2004),
ChIP-Atlas (http://chip-atlas.org), BLUEPRINT Epigenome (Adams et al., 2012) and
Roadmap Epigenomics (Roadmap Epigenomics Consortium et al., 2015) using the DeepBlue
Epigenomic Data Server (Albrecht et al., 2016).

RESULTS AND DISCUSSION
Literature review
We reviewed the relevance of Enhort for contemporary research by systematically searching
PubMed, Google Scholar, and several review articles for publications concerning the analysis
of genomic integration sites. The publications include virus integration site analysis for
HIV, MLV, HRP-2, SIV, foamy virus, HPV, AAV and transposons such as piggyBac,
LINE-1, Alu and Sleeping Beauty. In total we identified 59 relevant publications. Details on
the reviewed publications and the methodological analysis are available in Table S1. Of these
publications, 19 used completely random control sites and only six used adapted control sites.
The data analyses presented in 37 (63%) publications could have been entirely performed
with our tool. Six further publications used at least some methods provided by Enhort. We
assume that the authors would have saved considerable effort in writing custom analysis
scripts if they had had the opportunity to use Enhort.

Data re-analysis
To further demonstrate the capabilities of Enhort, we re-analyzed integration sites of the piggyBac
transposon (PB) published by Gogol-Döring et al. (2016). Results from Wilson,
Coates & George (2007) are used for comparison. PB integration characteristics show a
preference for genes, exons, introns, highly expressed genes, DNase I hypersensitive sites,
H3K4me3 and open chromatin structures (Wilson, Coates & George, 2007; Li et al., 2013).
We uploaded the PB integration sites to Enhort, selected all relevant tracks and finally
exported the results. Figure 5A shows the log fold changes for a selection of annotations
for PB against a random background in grey. Figure 5B shows the sequence logos for the



Figure 4 Additional plots generated by Enhort. (A) Circos plot (Krzywinski et al., 2009) of
position-dependent enrichment over all chromosomes for MLV for the most significant tracks. (B) Heatmap
for a set of three integration data sets against various annotations. The values are log2-fold changes of
the numbers of integration vs control sites falling into a given annotation. Star symbols mark statistically
significant changes. The same background sites are used for the comparisons. The background sites are
adapted to integrate only inside the sequence contigs. (C) Integration hotspots across the genome for MLV.
The color intensity of the thin bars shows the integration ratio inside the respective genomic region.

Full-size DOI: 10.7717/peerjcs.198/fig-4

PB integration sites and the random background. The barplots were created using the
R-export feature of Enhort.

The key feature of the PB integration preference is the TTAA motif in which all
integrations occur. To precisely analyze the preferences of PB integration, the background
model has to be adapted to replicate the TTAA motif preference. This can be achieved
using Enhort by creating a set of pseudo-random control sites that are located only inside
a TTAA sequence. To achieve this, we simply selected the sequence logo as a covariate.
Enhort takes genomic positions from a pre-sampled set of positions where each position
has a probability based on the similarity between the surrounding sequence and the TTAA
sequence. The results are shown in Fig. 5C, where the background sites and PB show a
similar motif after the motif is added as a covariate using Enhort. The motif adaptation also
changes the observed integration characteristics seen in Fig. 5A. The relatively decreased
integration of PB into coding exons is changed to a significant preference, because CpG
islands are less likely to be hit by a site from the adapted background model, as TTAA
occurs relatively less frequently in CpG islands. The same applies to DNase cluster regions,
TSS and exons, where integration appears more strongly enriched than against a random
background. Only a small change for the enrichment in introns and genes is visible. Overall



Figure 5 Analysis of PB integration sites. (A) Log fold changes of PB integration sites in relation to
several annotations against a random and an adapted background model. Changing the background model
to adapt the TTAA motif changes the observation of several integration preferences. (B) The PB motif and
random sites motif, corresponding with the random background bars in (A). (C) Motif of the random
sites after adaptation to the PB motif using Enhort.

Full-size DOI: 10.7717/peerjcs.198/fig-5

Table 2 Fold changes and integration ratios of Wilson, Coates & George (2007) in comparison to Enhort for two PB integration site sets.

                        Fold change              PB (%)                   Random (%)
Annotation track        Enhort   Wilson et al.   Enhort   Wilson et al.   Enhort   Wilson et al.
RefSeq genes            1.32     1.46            63.08    48.8            47.93    33.2
TSS (±5 kb)             2.14     3.00            20.8     16.2            9.7      5.4
CpG islands (±1 kb)     5.52     2.00            12.99    3.8             2.35     1.9
CpG islands (±5 kb)     2.82     0.96            22.85    7.7             8.09     8.3
Repeats:
  LINE                  0.71     0.76            7.72     12.7            10.90    16.7
  SINE                  0.50     0.54            3.8      6.0             7.64     11.1
  LTR                   0.56     1.84            2.79     6.8             5.0      3.7
  DNA                   1.61     1.18            1.87     4.0             1.61     3.4

this indicates that besides the TTAA preference of PB there are additional mechanisms that
alter the integration preferences. Using the background adaptation feature of Enhort it would
be possible to test different hypotheses against the data and build a model that explains the
integration preferences.
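
The motif-based background described above, in which each pre-sampled position is drawn with a probability that depends on the similarity between its surrounding sequence and TTAA, can be sketched as weighted sampling. The similarity measure (fraction of matching bases), the `sharpness` exponent and both function names are assumptions for illustration only; the text does not specify how Enhort scores similarity.

```python
import random


def motif_similarity(sequence, pos, motif):
    """Fraction of bases in the window starting at pos that match the motif."""
    window = sequence[pos:pos + len(motif)]
    if len(window) < len(motif):
        return 0.0  # window runs past the end of the sequence
    return sum(a == b for a, b in zip(window, motif)) / len(motif)


def motif_weighted_sites(sequence, candidates, motif, count, rng, sharpness=8):
    """Sample `count` candidate positions with motif-dependent probability.

    `sharpness` (an assumed parameter) raises the similarity to a power so
    that near-perfect matches dominate. At least one candidate must have a
    non-zero similarity, otherwise random.choices raises ValueError.
    """
    weights = [motif_similarity(sequence, p, motif) ** sharpness
               for p in candidates]
    return rng.choices(candidates, weights=weights, k=count)
```

With a large exponent almost all control sites fall on exact TTAA occurrences, which mirrors a background that integrates only inside the motif; a smaller exponent would tolerate partial matches.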

To further review the analytic capabilities of our software, the integration counts of
PB sites are compared to published results from Wilson, Coates & George (2007). The
comparison can be seen in Table 2. An increased integration of PB into RefSeq genes,
inside the 5kb-TSS window, as well as a preference for CpG islands is observable for both
analyses.

Wu et al. (2003) published a study on MLV and HIV stating that MLV favors TSS
regions, whereas HIV does not display a strong preference towards TSS regions. The



Table 3 Comparison between fold changes of Wu et al. (2003) and Enhort over different annotations
on the same integration sites.

                        Wu et al.         Enhort
Annotation              HIV     MLV       HIV     MLV      HIVa    MLVa     HIVb    MLVb
RefSeq genes            2.58*   1.5*      1.7*    1.4*     1       1        1       1
Housekeeping genes      –       –         3.7*    1.36     2.22*   1.12     2.05*   1.04
CpG islands (±1 kb)     1       8*        0.41    6.24*    0.35    6.17*    0.31    4.09*
TSS (±5 kb)             2.5*    4.7*      1.34    2.3*     1.14    2.02*    1       1
H4K20me1                –       –         1.71*   1.56*    1.34*   1.52*    1.36*   1.42*
H3K4me2                 –       –         1.23    21.7*    1.48    21.29*   1.09    15.2*
H3K27ac                 –       –         0.9     24.52*   1.01    22.79*   0.83    20.12*

Notes.
* P < 0.002.
a With RefSeq genes as covariate.
b With RefSeq genes and TSS (±5 kb) as covariates.

available integration sites were uploaded to Enhort and analyzed using the batch tool
with a random 10,000 site background model. The results from Enhort show an integration
pattern similar to that stated in Wu et al. (2003) (Table 3), except for CpG islands for HIV,
where Wu et al. found near-random integration and we found decreased integration.

For further review, HIV and MLV integration sites were uploaded independently to
Enhort, and RefSeq genes added as covariate. This background model had only a little effect
on MLV as the preference for TSS and CpG islands only changed slightly, indicating that
the preference for TSS is not due to a preference for RefSeq genes. For the HIV integration
sites the housekeeping genes, which are a known preference of HIV (Craigie & Bushman,
2012), are still statistically significant against this background model.

Finally, RefSeq genes and TSS (±5 kb) were both used as covariates together, showing
that the integration ratio of MLV into CpG islands with a (±1 kb) window decreases
slightly. This shows that the integration into the CpG islands is probably not a side effect
of the preference for TSS or genes. The combined background model with RefSeq genes
and TSS does not have any influence on the HIV fold changes compared to the previous
background model.

The creation of each background model and comparing the results was possible using
built-in features of Enhort. We further added histone modifications to the analysis
showing that H4K20me1 is significantly enriched for both integration sets and does not
change significantly for the different background models. This indicates that the histone
modification preference is an additional effect, only slightly influenced by the preference
for genes and TSS. H3K4me2 and H3K27ac are known preferences of MLV (De Ravin
et al., 2014) and show a high fold change for all background models. With the available
database it would be easy to add numerous additional annotations for comparison.

We have shown that Enhort is capable of reproducing integration site analysis with less
effort and additionally offers easy-to-use mechanisms to create more sophisticated analysis
using adaptable background models. The exact annotation files were not available for
comparison, so it was not possible to produce the exact numbers. However, Enhort uses



the same calculation principle. With the same annotations and sites the results by Enhort
would be the same as in the referenced publications.

CONCLUSION
In this publication we present Enhort, a fast and easy-to-use analysis platform for genomic
positions. Based on a comprehensive library of genomic annotations, Enhort provides a
wide range of methods to analyze large sets of sites. In contrast to multi-purpose software
such as Bioconductor, Enhort enables scientists to analyze data without programming
effort or extensive manual work.

Our literature review shows that Enhort is able to perform most of the analyses commonly
used in the investigation of integration sites. The re-analysis of Wilson, Coates & George
(2007) and Wu et al. (2003) demonstrates that Enhort is able to reproduce analyses from
literature with little effort. It was not possible to reproduce the exact values, because
the version of the annotation was not recorded in the publications. However, more
detailed insights can be made using adaptable background models. This was shown in the
comparison of HIV and MLV from Wu et al. against different control sites.

Most publications use very simple background models for statistical analysis of
integration data and could potentially be improved using better background models.
Enhort provides methods to easily create more sophisticated background models for
improving both the accuracy and the range of possible analyses. Complex background
models can be used to identify weak effects and segregate driving factors for integration,
find a minimal set of annotations to mimic integration characteristics, as well as to
eliminate technical biases. In conclusion, this shows that Enhort will be a valuable tool
for further analyses of genomic positions, no matter if these positions are derived from
virus integration, sequence motifs, enzyme restrictions, histone modifications, or protein
binding.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was supported by the Hessen State Ministry for Higher Education, Research and
the Arts. The funders had no role in study design, data collection and analysis, decision to
publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
Hessen State Ministry for Higher Education, Research and the Arts.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Michael Menzel conceived and designed the experiments, performed the experiments,
analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or



tables, performed the computation work, authored or reviewed drafts of the paper,
approved the final draft.

• Peter Koch contributed reagents/materials/analysis tools, performed the computation
work.

• Stefan Glasenhardt contributed reagents/materials/analysis tools, performed the
computation work.

• Andreas Gogol-Döring prepared figures and/or tables, authored or reviewed drafts of
the paper, approved the final draft.

Data Availability
The following information was supplied regarding data availability:

The source code and build instructions are available at https://git.thm.de/mmnz21/Enhort.

Supplemental Information
Supplemental information for this article can be found online at
http://dx.doi.org/10.7717/peerj-cs.198#supplemental-information.

REFERENCES
Adams D, Altucci L, Antonarakis S, Ballesteros J, Beck S, Bird A, Bock C, Boehm
B, Campo E, Caricasole A, Dahl F, Dermitzakis E, Enver T, Esteller M, Estivill
X, Ferguson-Smith A, Fitzgibbon J, Flicek P, Schacht C, Willcocks S. 2012.
BLUEPRINT to decode the epigenetic signature written in blood. Nature Biotechnology
30(3):224–226 DOI 10.1038/nbt.2153.

Albrecht F, List M, Bock C, Lengauer T. 2016. DeepBlue epigenomic data server:
programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids
Research 44(W1):W581–W586 DOI 10.1093/nar/gkw211.

Carver T, Harris SR, Berriman M, Parkhill J, McQuillan JA. 2011. Artemis: an integrated
platform for visualization and analysis of high-throughput sequence-based experimental
data. Bioinformatics 28(4):464–469 DOI 10.1093/bioinformatics/btr703.

Cook LB, Melamed A, Niederer H, Valganon M, Laydon D, Foroni L, Taylor
GP, Matsuoka M, Bangham CRM. 2014. The role of HTLV-1 clonality, proviral
structure, and genomic integration site in adult T-cell leukemia/lymphoma. Blood
123(25):3925–3931 DOI 10.1182/blood-2014-02-553602.

Craigie R, Bushman FD. 2012. HIV DNA integration. Cold Spring Harbor Perspectives in
Medicine 2(7):Article 006890 DOI 10.1101/cshperspect.a006890.

Dale RK, Pedersen BS, Quinlan AR. 2011. Pybedtools: a flexible Python library for
manipulating genomic datasets and annotations. Bioinformatics 27(24):3423–3424
DOI 10.1093/bioinformatics/btr539.

De Ravin SS, Su L, Theobald N, Choi U, Macpherson JL, Poidinger M, Symonds G,
Pond SM, Ferris AL, Hughes SH, Malech HL, Wu X. 2014. Enhancers are major targets
for murine leukemia virus vector integration. Journal of Virology 88(8):4504–4513
DOI 10.1128/JVI.00011-14.



Deyle DR, Russell DW. 2009. Adeno-associated virus vector integration. Current Opinion
in Molecular Therapeutics 11(4):442–447.

ENCODE Project Consortium. 2004. The ENCODE (ENCyclopedia of DNA elements)
project. Science 306(5696):636–640 DOI 10.1126/science.1105136.

Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M,
Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte
RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR,
Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ. 2011. The
UCSC Genome Browser database: update 2011. Nucleic Acids Research 39(suppl
1):D876–D882 DOI 10.1093/nar/gkq963.

Gogol-Döring A, Ammar I, Gupta S, Bunse M, Miskey C, Chen W, Uckert W, Schulz
TF, Izsvák Z, Ivics Z. 2016. Genome-wide profiling reveals remarkable parallels
between insertion site selection properties of the MLV retrovirus and the piggyBac
transposon in primary human CD4+ T cells. Molecular Therapy 24(3):592–606
DOI 10.1038/mt.2016.11.

Janovitz T, Oliveira T, Sadelain M, Falck-Pedersen E. 2014. Highly divergent integration
profile of adeno-associated virus serotype 5 revealed by high-throughput sequencing.
Journal of Virology 88(5):2481–2488 DOI 10.1128/JVI.03419-13.

Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D.
2002. The human genome browser at UCSC. Genome Research 12(6):996–1006
DOI 10.1101/gr.229102.

Krzywinski MI, Schein JE, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra
MA. 2009. Circos: an information aesthetic for comparative genomics. Genome
Research 19(9):1639–1645.

LaFave MC, Varshney GK, Gildea DE, Wolfsberg TG, Baxevanis AD, Burgess SM. 2014.
MLV integration site selection is driven by strong enhancers and active promoters.
Nucleic Acids Research 42(7):4257–4269 DOI 10.1093/nar/gkt1399.

Li MA, Pettitt SJ, Eckert S, Ning Z, Rice S, Cadianos J, Yusa K, Conte N, Bradley A.
2013. The piggyBac transposon displays local and distant reintegration preferences
and can cause mutations at noncanonical integration sites. Molecular and Cellular
Biology 33(7):1317–1330 DOI 10.1128/MCB.00670-12.

Li L, Zhang D, Li P, Damaser M, Zhang Y. 2015. Virus integration and genome influence
in approaches to stem cell based therapy for andro-urology. Advanced Drug Delivery
Reviews 82–83:12–21 DOI 10.1016/j.addr.2014.10.012.

Pingoud A, Jeltsch A. 2001. Structure and function of type II restriction endonucleases.
Nucleic Acids Research 29(18):3705–3727 DOI 10.1093/nar/29.18.3705.

Riviere I, Dunbar CE, Sadelain M. 2012. Hematopoietic stem cell engineering at a
crossroads. Blood 119(5):1107–1116 DOI 10.1182/blood-2011-09-349993.

Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M,
Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V,
Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton
ML, Wu Y-C, Pfenning AR, Wang X, Claussnitzer M, Liu Y, Coarfa C, Harris
RA, Shoresh N, Epstein CB, Gjoneska E, Leung D, Xie W, Hawkins RD, Lister R,
Hong C, Gascard P, Mungall AJ, Moore R, Chuah E, Tam A, Canfield TK, Hansen
RS, Kaul R, Sabo PJ, Bansal MS, Carles A, Dixon JR, Farh K-H, Feizi S, Karlic R,
Kim A-R, Kulkarni A, Li D, Lowdon R, Elliott G, Mercer TR, Neph SJ, Onuchic V,
Polak P, Rajagopal N, Ray P, Sallari RC, Siebenthall KT, Sinnott-Armstrong NA,
Stevens M, Thurman RE, Wu J, Zhang B, Zhou X, Beaudet AE, Boyer LA, De Jager
PL, Farnham PJ, Fisher SJ, Haussler D, Jones SJM, Li W, Marra MA, McManus
MT, Sunyaev S, Thomson JA, Tlsty TD, Tsai L-H, Wang W, Waterland RA,
Zhang MQ, Chadwick LH, Bernstein BE, Costello JF, Ecker JR, Hirst M, Meissner
A, Milosavljevic A, Ren B, Stamatoyannopoulos JA, Wang T, Kellis M. 2015.
Integrative analysis of 111 reference human epigenomes. Nature 518(7539):317–330
DOI 10.1038/nature14248.

Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G,
Mesirov JP. 2011. Integrative genomics viewer. Nature Biotechnology 29(1):24–26
DOI 10.1038/nbt.1754.

Roth SL, Malani N, Bushman FD. 2011. Gammaretroviral integration into nucleosomal
target DNA in vivo. Journal of Virology 85(14):7393–7401 DOI 10.1128/JVI.00635-11.

Sandve GK, Gundersen S, Johansen M, Glad I, Gunathasan K, Holden L, Holden M,
Liestøl K, Nygård S, Nygaard V, Paulsen J, Rydbeck H, Trengereid K, Clancy T,
Drabløs F, Ferkingstad E, Kalaš M, Lien T, Rye MB, Frigessi A, Hovig E. 2013. The
Genomic HyperBrowser: an analysis web server for genome-scale data. Nucleic Acids
Research 41(W1):W133–W141 DOI 10.1093/nar/gkt342.

Shao W, Shan J, Kearney MF, Wu X, Maldarelli F, Mellors JW, Luke B, Coffin JM,
Hughes SH. 2016. Retrovirus Integration Database (RID): a public database
for retroviral insertion sites into host genomes. Retrovirology 13(1):Article 47
DOI 10.1186/s12977-016-0277-6.

Wilson MH, Coates CJ, George AL. 2007. PiggyBac transposon-mediated gene transfer
in human cells. Molecular Therapy 15(1):139–145 DOI 10.1038/sj.mt.6300028.

Wu X, Li Y, Crise B, Burgess SM. 2003. Transcription start regions in the human
genome are favored targets for MLV integration. Science 300(5626):1749–1751
DOI 10.1126/science.1083413.