Latent Embeddings of Point Process Excitations
Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan
2020-05-05

When specific events seem to spur others in their wake, marked Hawkes processes enable us to reckon with their statistics. The underdetermined empirical nature of these event-triggering mechanisms hinders estimation in the multivariate setting. Spatiotemporal applications alleviate this obstacle by allowing relationships to depend only on relative distances in real Euclidean space; we employ the framework as a vessel for embedding arbitrary event types in a new latent space. By performing synthetic experiments on short records as well as an investigation into options markets and pathogens, we demonstrate that learning the embedding alongside a point process model uncovers the coherent, rather than spurious, interactions.

The propagation of disease [1], news topics [2], crime patterns [3, 4], neuronal firings [5], and market trade-level activity [6, 7] naturally suit the form of diachronic point processes with an underlying causal-interaction network. Understanding their intrinsic dynamics is of paramount scientific and strategic value: a particular series of discrete options trades may inform an observer on the fluctuating dispositions of market agents; similarly, temporal news publication patterns may betray an ensuing shift in the public zeitgeist. The spread of a novel pathogen, notably the COVID-19 virus, through disjointed pockets of the globe hints at how it proliferates, and how that might be averted [8]. Practically estimating the $n \times n$ possible excitations between each dyad (pair) of event types is untenable without succinct and interpretable parametrizations. How can one possibly disentangle the contributions of hundreds of options trades within each minute, in a myriad of different strike prices and expiration dates, to the Poisson intensity of a particular type of trade? We must envision the right kind of bias for our model. Spatiotemporal domains [9, 3, 10] exploit physical constraints on interaction locality. In essence, the influence one event bears on another is governed solely by their relative distance.

Formulation. Consider a record of $N$ event occurrences $(k_i, t_i)$, $i = 1, 2, \dots, N$, with $n$ marked types $k_i \in \{1, 2, \dots, n\}$ at times $t_i \in [0, T)$. We have reason to believe that events with certain marks excite future events of either the same or another type. Multiple such interactions may be present, and we only want to identify those that are warranted by the observed record. A compact latent representation would induce a strong prior on the continuum of allowable interactions. We start with preliminaries. The multivariate intensity function $\lambda(k, t)$, conditional on the events in $[0, t)$, dictates the instantaneous Poisson frequency. A particular interval $[t, t + dt)$ would expect to witness $\lambda(k, t)\,dt$ instances of event type $k$. We decompose this intensity [24] into self- and cross-excitations and an intrinsic background rate, not knowing a priori which event triggered which:

$$\lambda(k, t) = \mu(k) + \sum_{i:\, t_i < t} h(k, k_i, t - t_i). \qquad (1)$$

One way to encourage inductive skepticism about apparent interactions in the estimated response function $h(k, l, \tau)$ is to constrain its structure. Suppose there exists a latent Euclidean geometry that adequately captures the interactions between event types.
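To make Eq. 1 concrete, the following is a minimal sketch (not the authors' code) of the conditional intensity; the names `events`, `mu`, and `h` are illustrative placeholders for the record, the background rates, and a generic response function.

```python
def intensity(k, t, events, mu, h):
    """Conditional intensity lambda(k, t) from Eq. 1: background plus excitations from past events.

    events: list of (k_i, t_i) pairs observed before time t
    mu:     background rate per event type, indexed as mu[k]
    h:      response function h(k, l, tau)
    """
    rate = mu[k]
    for k_i, t_i in events:
        if t_i < t:  # only strictly past events contribute
            rate += h(k, k_i, t - t_i)
    return rate
```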
We infer distinct embeddings for receiving, $x_k \in X \subset \mathbb{R}^m$, versus influencing, $y_l \in Y \subset \mathbb{R}^m$, because otherwise we would constrain our model to symmetrical bidirectional interactions, which is not sufficiently expressive. Induced by a mapping on the marks, $X$ and $Y$ constitute the discrete embedding of a latent manifold wherein the influence some event type $l$ exerts on $k$ is characterized by $\|y_l - x_k\|_2$. Each response is the result of a dyadic interaction between an event $(k_i, t_i)$ in the past and the potential occurrence of event type $k$ at the present time $t$. The chosen parametric family for the causal kernel bank entails Gaussian proximities in space and exponential clocks:

$$g_r(x, y) = (2\pi\beta_r^2)^{-m/2} \exp\!\left(-\frac{\|y - x\|_2^2}{2\beta_r^2}\right), \qquad (2)$$
$$f_r(\tau) = \delta_r e^{-\delta_r \tau}. \qquad (3)$$

Eqs. 2 & 3 form the spatial and temporal basis of the $r = 1, 2, \dots, R$ kernels comprising the response function, tentatively produced here and fully expanded in Eq. 8. Later on we introduce the ingredients $\xi(l)$ and $\gamma_r$ for granularity in the magnitudes. A generalized Poisson process yields a clean log-likelihood function, written as follows:

$$\log L = \sum_{i=1}^{N} \log \lambda(k_i, t_i) - \sum_{k=1}^{n} \int_0^T \lambda(k, t)\, dt. \qquad (4)$$

However, direct optimization on the basis of its gradient has proven unwieldy in anything other than deep general models. Even worse, its current form is not amenable to analysis. Suppose we happened upon the expected branching structure [12] of the realized point process. In other words, we introduced latent variables $[p_{ijr}]$, of dimension $N \times N \times R$, holding expectation estimates of $P_{ijr} \in \{0, 1\}$, indicating whether it was the event instance $i$ that triggered instance $j$, and attributing responsibility to kernel basis $r$. Knowledge of the untenable true line of causation endows us with the so-called complete-data log-likelihood termed $\log L_c$ [10], an expectation of the joint log-probability density of the record and the latent variables $[P_{ijr}]$ in terms of their probabilities $[p_{ijr}]$. By abuse of notation let $h_r(\cdots)$ denote the $r$th response kernel:

$$\log L_c = \sum_{j=1}^{N} \Big[\, p_{bj} \log \mu(k_j) + \sum_{i, r} p_{ijr} \log h_r(k_j, k_i, t_j - t_i) \Big] - \sum_{k=1}^{n} \int_0^T \lambda(k, t)\, dt. \qquad (5)$$

Note that, in the above form, what was previously a logarithm of summations (see Eqs. 4 and 1) is replaced by a weighted sum of decoupled logarithms. The probability that the event instance was due to the background white Poisson process is $p_{bj}$; for every $j$, $\sum_{i, r} p_{ijr} + p_{bj} = 1$. Concretely, given a model $\lambda(k, t)$, the allegation of causality $i \to j$ via $r$ is the ratio of that particular contribution to the overall intensity:

$$p_{ijr} = \frac{h_r(k_j, k_i, t_j - t_i)}{\lambda(k_j, t_j)}, \qquad p_{bj} = \frac{\mu(k_j)}{\lambda(k_j, t_j)}. \qquad (6)$$

The right-hand term in Eq. 5 simplifies vastly if one assumes that $\forall y_l \in Y\!:\ \sum_{x_k \in X} g(x_k, y_l) = 1$, and that the temporal kernels integrate to approximately one over the observation window. The latter approximation is tenable for large enough $T$; the former is not. Only through certain concessions may we gain confidence that the sum is roughly unit. First note that, by the Gaussian integral, $\int_{\mathbb{R}^m} g(x, y)\, dx = 1$. Veen & Schoenberg [9] arrived at the conclusion that this approximation holds arbitrarily well if the event occurrences in this putative space are distributed uniformly in $\mathbb{R}^m$. Our embedding scheme is usually not: events are clumped at the discrete locations of their types, and we are left with a coerced normalization of Eq. 2;

$$\tilde{g}_r(x_k, y_l) = \frac{g_r(x_k, y_l)}{\sum_{k'=1}^{n} g_r(x_{k'}, y_l)}. \qquad (7)$$

We still rely on the "pure" form in Eq. 2 for the sake of closed-form optimization, as detailed below; nevertheless the intervention in Eq. 7 prevents drifting towards degeneracy. The final touch is the introduction of one more kernel parameter, $\xi(l)$, to account for this additional restriction. The full response kernel is therefore Eq. 8,

$$h(k, l, \tau) = \xi(l) \sum_{r=1}^{R} \gamma_r\, g_r(x_k, y_l)\, \delta_r e^{-\delta_r \tau}. \qquad (8)$$

We eliminate redundancies by constraining the exertion coefficient $n^{-1} \sum_{l=1}^{n} \xi(l) = 1$ and scaling the basis coefficients $\gamma_r$ appropriately.
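As an illustration of how Eqs. 2, 3, 6, and 8 fit together, here is a hedged sketch of the embedded response kernel and the E-step responsibilities. The names (`X`, `Y`, `gamma`, `beta2`, `delta`, `xi`, `mu`) are ours, the responsibilities are collapsed over the kernel index $r$ for brevity, and the code follows our reading of the equations above rather than a published implementation.

```python
import numpy as np

def g(x_k, y_l, beta2, m):
    """Gaussian proximity of Eq. 2 between a reception point x_k and an influence point y_l."""
    d2 = float(np.sum((y_l - x_k) ** 2))
    return np.exp(-d2 / (2.0 * beta2)) / (2.0 * np.pi * beta2) ** (m / 2.0)

def response(k, l, tau, X, Y, gamma, beta2, delta, xi):
    """Full response h(k, l, tau) of Eq. 8, summed over the R kernel bases."""
    m = X.shape[1]
    return xi[l] * sum(gamma[r] * g(X[k], Y[l], beta2[r], m) * delta[r] * np.exp(-delta[r] * tau)
                       for r in range(len(gamma)))

def e_step(j, marks, times, X, Y, gamma, beta2, delta, xi, mu):
    """Responsibilities of Eq. 6 for event j: p_ij (summed over r) and the background p_bj.

    Assumes the record (marks, times) is sorted by time, so every i < j lies in the past.
    """
    k_j, t_j = marks[j], times[j]
    contrib = np.array([response(k_j, marks[i], t_j - times[i], X, Y, gamma, beta2, delta, xi)
                        for i in range(j)])
    lam = mu[k_j] + contrib.sum()
    return contrib / lam, mu[k_j] / lam
```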
Furnished with the causality estimates in Eq. 6 (the "Expectation" step), we perform projected gradient ascent by setting partial derivatives of the complete-data log-likelihood with respect to each kernel parameter to zero (the "Maximization" step). Eventually the causalities are aggregated in special ways to form coefficient estimates. Omitting the domains of summation over $i$ and over $(j, i)$, taken to be $\{1, 2, \dots, N\}$ and $\{1, 2, \dots, N\} \times \{1, 2, \dots, N\}$ respectively, the solutions unfold as so. At times, it is necessary to preserve focus on the acceptable time horizons for a particular domain, in bias against "degenerate" ones. A $\mathrm{Gamma}(\alpha_\delta, \beta_\delta)$ prior on the decay rate $\delta_r$ admits the maximum a posteriori in Eq. 9, which trivially becomes uninformative at the assignment $(\alpha_\delta, \beta_\delta) = (1, 0)$. One is typically interested in the half-life $(\log 2 / \delta_r)$, the prior of which, being the reciprocal of the aforementioned gamma variable, is characterized by the aptly named inverse-gamma distribution with expectation $\beta_\delta / (\alpha_\delta - 1)$. Preserving the mean while increasing both parameters strengthens the prior. The influence matrix $\Phi = [\phi(k, l)]$ is composed of the kernels with time integrated out, i.e. $\phi(k, l) = \int_0^\infty h(k, l, \tau)\, d\tau$. There is evidence that this quantity encodes the causal network structure [25, 26]. Pursuant to the above maximization step, one may alternatively estimate all of these $n^2$ degrees of freedom.

First Principles (FP). We invoke the direct relation between the likelihood function and the embeddings. Our approach alternates between updating the reception embedding and the influence embedding, à la Gibbs sampling. Observe the partial gradient with respect to the vector $y$ in Eq. 14; this behemoth is difficult to solve analytically. Recall, however, our prior simplifying assumption that $\forall y, r\!:\ \sum_{x \in X} g_r(x, y) = 1$, also enforced a posteriori by means of Eq. 7. The latter portion of Eq. 14 contains the form $\sum_x (x - y)\, g_r(x, y)$, equivalent to taking a quantized "expectation" of a Gaussian variable subtracted by its own mean (see Eq. 2). Hence a viable approximation to a locally optimal $y$ embedding point stems from neglecting the contribution of the entire second part of Eq. 14, allowing us to garner an intuitive formula: evidently each influence point $y \in Y$ is attracted to the reception points $\{x \in X\}$ that appear to receive excitatory influence from it. Unfortunately there is no analogue approximation for the reception points themselves that could manifest by a similar sensible trick. We produce the derivation and submit to regular gradient ascent with learning rate $\epsilon$, and specifically an update rule along the average log-likelihood for a consistent strategy across different record lengths. To gain intuition on the selection of $\epsilon$, we looked into entropic impact as a heuristic. The beautiful findings in [27] allowed us to reason about the contribution of a change in an embedding point to the differential entropy of a doubly stochastic point process, which admitted a simple rule of thumb of setting the learning rate proportional to $n/N$, with the constant pertaining to domain idiosyncrasies. Maintaining this ratio ameliorated convergence in §3.1.

Diffusion Maps (DM). Could we posit a diffusion process across event types? Random-walk methods yield approximate manifold embeddings, proven helpful in deep representations [28, 29, 30]. Construed as graph affinities, the influences $\Phi$ guide a Markovian random walk of which diffusion maps [31, 32, 33] may be approximated via spectral decomposition.
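The closed-form M-step updates referenced above are not reproduced in this text, but their flavor can be sketched under our own reconstruction: each influence point moves to a responsibility-weighted average of the reception points it appears to excite, and the decay rate takes a Gamma-prior MAP form that reduces to the maximum-likelihood estimate at $(\alpha_\delta, \beta_\delta) = (1, 0)$. Treat the exact formulas below as assumptions rather than the paper's equations.

```python
import numpy as np

def update_influence_point(l, p, marks, X):
    """Approximate update for y_l: responsibility-weighted mean of the reception points
    of events that type-l occurrences are credited with triggering.
    p[i, j] is the causal responsibility (summed over kernel bases r) of event i for event j."""
    weights, targets = [], []
    N = len(marks)
    for i in range(N):
        if marks[i] != l:
            continue
        for j in range(i + 1, N):
            weights.append(p[i, j])
            targets.append(X[marks[j]])
    weights = np.array(weights)
    if weights.size == 0 or weights.sum() == 0:
        return None  # no excitations attributed to type l
    return np.average(np.array(targets), axis=0, weights=weights)

def update_decay_map(p_r, dt, alpha_d=1.0, beta_d=0.0):
    """Assumed MAP update for delta_r under a Gamma(alpha_d, beta_d) prior (cf. Eq. 9).
    p_r[i, j] are responsibilities for kernel r; dt[i, j] = t_j - t_i for i < j, else 0."""
    return (alpha_d - 1.0 + p_r.sum()) / (beta_d + (p_r * dt).sum())

def learning_rate(n_types, n_events, c=1.0):
    """Heuristic epsilon proportional to n/N for the reception-point gradient step."""
    return c * n_types / n_events
```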
We found that asymmetrical diffusion-maps embeddings (in the style of [34]) serve as an adequate initial condition but are not always conducive to stable learning in conjunction with our dynamic kernel basis. We term the model learned entirely this way DM, and for brevity we relegate its review to Appendix A.

Baseline (B). Did the reduction in degrees of freedom lend its hand to a more generalizable model of the point process? In order to motivate the reason for having an embedding at all, besides the gains in interpretability, we pitted the techniques FP and DM against the following: estimating the full-rank matrix entries $\phi(\cdot, \cdot)$ directly [11].

We guessed adequate initial conditions for the EM procedure with a fixed empirical protocol. The surmised influence matrix (Eq. 19) came by summing up correlations between event types; notice that it remains unscaled. The initial $\hat\delta$ was computed as the naive reciprocal of the mean inter-arrival time between successive events. We justify this construction on the basis of Eq. 9, which forms a weighted average over said arrival times to garner an optimal estimate for $\delta_r^{-1}$. We feed the result of Eq. 19 into the diffusion-maps algorithm in order to obtain our initial embeddings $(\hat{x}, \hat{y}) \in \hat{X} \times \hat{Y}$. For every $r$, $\hat\beta_r^2$ is initialized at the mean dyadic squared distance in the embedding; $\hat\delta_r = (r\,\bar{t})^{-1}$, in which variety is injected to nudge the kernels apart; $\hat\gamma_r = R^{-1}$; and finally, for every $x$, $\hat\mu(x) = T^{-1} n^{-1} N$.

We intended to stress-test the learning algorithm under small record sizes and numerous event types; we thereby contrived a number of scenarios with known ground-truth parameters sampled randomly. The underlying models had a single kernel ($R = 1$) in the response function and conformed to the prescriptions FP, DM, and B from §2.2. Each sampled model was simulated with the thinning algorithm (see e.g. [35] and their supplementary material) in order to generate a time-series record of specified length $N$. Reception and influence points were realized uniformly from a unit square ($m = 2$). Spatial bandwidths $\beta^2$ were granted a gamma distribution with shape $\alpha = 1/n$ and unit scale. Decay rates $\delta$ were standard log-normal, as were backgrounds $\mu(k)$, though scaled by $1/n$. Stability [24] was ensured by setting $\gamma = 1/n$, constraining the Frobenius norm $\|\Phi\|_F = 1$ that upper-bounds the $L_2$-induced norm, which itself upper-bounds the spectral radius of the influences $\rho(\Phi)$, the real criterion.

In line with Goodhart's Law [36], different facets of the model apparatus were scrutinized. First, we sought to determine whether the baseline tends to reach high in-sample likelihoods yet abysmal out-of-sample likelihoods, including extreme outliers. Specifically, not only are statistics on each likelihood important but also the relationship between the two. We thought to convey it through a total least squares [37] slope and centroid, followed by its root mean-square error (RMSE), in the first four outcome columns of Table 1. The centroid gives empirical means for train and test $\log L$, and the slope how much the test varies with the train. Did our models recover the chain of causation? We opened up the empirical $[p_{ijr}]$ estimates and computed their Hellinger distance [38] from those stipulated by the ground truth. For each "to be caused" event $j$, the quantities $(i, r) \mapsto p_{ijr}$ are framable as a discrete probability distribution, the empirical construction of which makes it poorly suited numerically for the more widespread KL-divergence measure, unlike the Hellinger distance.
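For the synthetic records described above, a simulation routine in the spirit of the thinning algorithm (the paper points to [35] and its supplementary material) might look like the sketch below. It relies on the intensity being non-increasing between events, which holds for the exponential clocks of Eq. 3, and it reuses the illustrative `intensity` helper sketched after Eq. 1; the routine is ours, not the authors'.

```python
import numpy as np

def simulate_thinning(mu, h, n_types, T, seed=0):
    """Ogata-style thinning for a marked Hawkes process with monotonically decaying kernels."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        # The current total intensity upper-bounds the future intensity until the next
        # event, because every kernel decays in tau. The tiny offset ensures an event
        # accepted exactly at time t is counted as "past" by the strict inequality.
        lam_bar = sum(intensity(k, t + 1e-12, events, mu, h) for k in range(n_types))
        t += rng.exponential(1.0 / lam_bar)          # candidate arrival time
        if t >= T:
            break
        lam_k = np.array([intensity(k, t, events, mu, h) for k in range(n_types)])
        if rng.uniform() * lam_bar <= lam_k.sum():   # accept with probability lambda(t)/lam_bar
            k = int(rng.choice(n_types, p=lam_k / lam_k.sum()))
            events.append((k, t))
        # otherwise the candidate is thinned away and we continue from the new t
    return events
```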
"fit"), divergence from ground-truth causalities (col. "div."), and mean correlation difference between the model and Glove along with t-test significance markers (col. "emb. cor."). ***: P ≤ 0.01, **: P ≤ 0.05, *: P ≤ 0.1. Bold numerals indicate the sample rejected an Anderson-Darling test [39] for normality with significance ≤ 0.05. Comparison to GloVe. Our technique lives in a dual realm: that of vector embeddings for words and other sequential entities. We fed the ordered sequence of event-type occurrences into a typical GloVe scheme [40] with a forward-looking window of size three, three dimensions, and an asymmetric cooccurrence matrix in an attempt to recover a set of vectors that served as both influence and reception points. The final column of Table 1 displays a systematic evaluation against GloVe's embeddings with reference to the ground-truth geometry. Concretely, all pairwise distances in each setting (that of our learned model and the newfound GloVe embeddings) were correlated to those of the ground truth by Kendall's rank-based nonparametric statistic [41] . The gap between GloVe's estimated correlation and the model's under scrutiny, each in [−1, 1] and where a positive difference means the model correlated more with the ground truth, was collected in each trial and means along with a t-test significance [42] were reported in the last column of Table 1 . GloVe is nondeterministic, so we obtained the sample mean of ten embedding correlations per trial-a move that favors GloVe's results. From the outcomes, it is evident that DM models pick up the spatial coordinates more consistently-probably due to the regularity imposed by their normalized spectral decomposition. Epidemics. Consider disease in a social apparatus emerging as a diffusive point process. In 2019 a finely regularized variational approach to learning multivariate Hawkes processes from (a little more than) a handful of data [13] was demonstrated on a dataset of symptom incidences during the ∼2014-2015 ebola outbreak [43] . We gave the record precisely the same treatment the authors did, and obtained significantly higher commensurate likelihoods than their best case. In turn, they outperformed the cutting-edge approaches MLE-SGLP [14] and ADM4 [15] that regularize for sparsity. We also trained a model with an embedding fixed to the geographic coordinates of the 54 West African districts present in the dataset, assessing the spatial nature of the process. See Table 2 ; also, view Appendix B for a foray into the COVID-19 pandemic and a fruitful result on South Korea. Table 3 . The auxiliary accuracy metric is derived from categorical cross-entropy of the predicted event type at the time of an actual occurrence, rendered by the expression We visualized three-dimensional embeddings via their two principal components in Figure 3 . Figure 2 : Trades at discrete strike prices were resampled according to quantized log-Gaussian profiles with reference to moneyness at any given point in time. Standard deviations, in logarithmic space, were half the separation between the two densities' centers. They were kept "loose" for the sake of seamless translation even under abrupt fluctuations in the underlying stock price. Predictive ability. We display the epoch with the best training score in all our experiments. Most notable in Table 1 is how the DM formulation enjoys poorly determined systems, e.g. (300, 90) , but is typically outperformed on the basis of likelihoods by FP in better-posed situations like (900, 30). 
The baseline suffers in recovering the actual causalities (measured by divergences from ground truth) in contrast to our novel models FP and DM. The Ebola results in Table 2 depict superior results from all of our models relative to the state of the art. Whereas our EM "baseline" is best and the model with geographic ground truth performs marginally better than our FP, we count it as a positive that most of the information was retained. The strain afflicting the region during that period of time had an incubation period of about 8-12 days [44], suggesting a preference for longer half-lives. See our results pertaining to COVID-19 in Appendix B.

Interpretation of the market embeddings. Events belonging to each stock ticker tend to attract influencing points of the same color in Figure 3, with deviations hinting at their relative perceptions by the market. Efficient estimators are necessary in order to discern a lack of stationarity; in this case, transferring our Sep. 15 FP model onto the Sep. 18 test set yielded a $\log L$ improvement from 2.57 to 2.82, and vice versa gave a decrease from 2.72 to 2.45. Thus behavior largely persists. Further, the aggregate $\xi(l)$'s confirm the broad intuition that out-of-the-money trades move markets the most [45]. The quantile plots show that both FP and DM outperformed the baseline in statistically filtering out the white (i.e. serially independent) background events according to their estimated probabilities $p_{bj}$. A constant-intensity Poisson process would witness arrival times distributed exponentially [24].

Disentangling time scales. We unsuccessfully attempted to sway the estimators towards longer-term (on the order of seconds) behaviors by enlisting a prior on the exponential rate encouraging an expected half-life of one minute. Parsing minute-scale behaviors out of high-frequency trades is severely difficult; recall that most market makers dealing in options are automated. The gamma prior inflicts a cost without influencing the half-lives very much. It would be imperative to study higher-order interactions [46] if one were to investigate longer patterns through individual trades.

The fundamental notion driving the doubly stochastic process, first attributed to Cox [47], manifests in a variety of ways including the Latent Point Process Allocation [48] model. In fact, note the meteoric rise [49] of Bayesian approaches that now permeate serially dependent point processes. Zhang et al. [50] sample the branching structure in order to infer the Gaussian process (GP) that constitutes the influence function. GPs, usually accompanied by inducing points, sometimes directly modulate the intensity [51, 48, 52, 53, 54]. Linderman and Adams [55] took an approach that estimated a matrix very similar to our $[\phi(k, l)]$, relying on discretized binning and variational approximation. Salehi et al. [13] exploited a reparametrization trick akin to those in variational autoencoders in order to efficiently estimate the tensor of basis-function coefficients. Recent progress has been made in factorizing interactions with a direct focus on scalability [19], improving on prior work in low-rank processes [20]. Block models on observed interaction pairs also exist [21]. While these all achieve compact Hawkes processes, our methodology distinguishes itself by learning a Euclidean embedding with the semantics of a metric space, not a low-rank projection.
Notably, the Neural Hawkes process [35] and a hodgepodge of other techniques centered on neural networks [56, 57, 58, 59, 60, 61, 2, 62, 63, 64] have been established through the years. Our baseline estimator most closely resembles the one described in the work of Zhou et al. [11], whereas the spatiotemporal aspect is inspired by the likes of Schoenberg et al. [9, 3, 10]. Variational substitutes in the EM algorithm have also been explored [65]. A concurrent study to ours by Zhu et al. [66] parametrizes a heterogeneous kernel in real Euclidean space by deep neural networks.

We demonstrated the viability of estimating embeddings for events in an interpretable metric space tied to a self-exciting point process. The proposed expectation-maximization algorithm extracts parsimonious serial dependencies. Our framework paves the way for generalization, extension to more elaborate models, and consequent potential for societal impact in the future.

All of the real-world phenomena examined herein emerged from systems of social entities. The point processes are governed by either human interactions (e.g. in a pandemic) or dealings between agents thereof (e.g. in the options market, with the majority of trades automated on behalf of institutions). Data collection is costly for problems concerning human activities in "everyday" society, as peering into social media provides a skewed and limited perception. Other, more authentic streams of information concerning public opinion, physical interaction networks, and so forth are either inaccessible due to privacy issues or saturated with noise. We believe that our contribution broadens the scope of the kinds of problems that can be studied with analytical point-process methods. A myriad of applications in the social sciences have lagged in adopting the level of technical rigor that contemporary data science enables. A few of them, as displayed in the present study, benefit significantly from the efficient estimation of compact, interpretable representations for event interactions. It is difficult to name any negative ethical implications for this line of work. One necessary precaution with increasingly refined "causal" models is to avoid making any hasty conclusions about true causation, which cannot be proven solely from our Hawkes kernel estimates.

Appendix A: Diffusion maps. We review briefly the technique's application here; the curious reader is encouraged to peruse the theory presented in Coifman's seminal publications [33]. Casting the influence matrix as edge weights in a bipartite graph flowing between influence (col.) ↔ reception (row), we examine the diffusion process upon it [34]. We first normalize by density to our liking, per our selected value for the parameter $0 \le \alpha \le 1$, according to Eq. 20. Consider the row-stochastic version of $A$, named $B_R$. Its singular values multiplied by the left (orthonormal) singular vectors thereof supply manifold-embedding coordinates for the reception points, weighted by significance according to the singular values. Likewise, $B_I$ may be constructed as the row-stochastic transformation of $A^\top$ from the same Eq. 20, of which the resultant coordinates grant us the influence points. In each set of coordinates, we preserve only those corresponding to the highest $m$ singular values, except for the largest, which is constant by definition.
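A schematic rendition of this initialization could read as below, assuming a standard diffusion-maps density normalization in place of Eq. 20 (which the text does not spell out); the exact exponent placement and the handling of the two directions are our assumptions.

```python
import numpy as np

def diffusion_embedding(Phi, m=2, alpha=0.5):
    """Reception and influence coordinates from the influence matrix Phi (rows: reception,
    cols: influence), via density normalization, a row-stochastic walk matrix, and an SVD."""
    d_row = Phi.sum(axis=1, keepdims=True) ** alpha
    d_col = Phi.sum(axis=0, keepdims=True) ** alpha
    A = Phi / (d_row * d_col)                     # density-normalized affinities
    B_R = A / A.sum(axis=1, keepdims=True)        # row-stochastic walk over reception nodes
    U, s, _ = np.linalg.svd(B_R)
    X_recv = U[:, 1:m + 1] * s[1:m + 1]           # drop the trivial leading component
    B_I = A.T / A.T.sum(axis=1, keepdims=True)    # analogous walk from the transposed matrix
    Ui, si, _ = np.linalg.svd(B_I)
    Y_infl = Ui[:, 1:m + 1] * si[1:m + 1]
    return X_recv, Y_infl
```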
Appendix B: COVID-19. The COVID-19 pandemic caused by a coronavirus novel to humans has taken the world by surprise, forcing lockdowns across the globe and uniting humanity on a common front. Data scientists of course desire to contribute to this global effort in whichever way they can. One avenue is the study of the infections' spatiotemporal nature. Notwithstanding the regional distortions in reporting due to a wide array of factors, we expect diffusion dynamics to be reflected in the fatalities at the macroscopic level. Figure 4 catalogs our empirical results from the Johns Hopkins CSSE dataset [67]. Since the daily new confirmed cases were so numerous, we had to increase the temporal resolution artificially from days to hours: we interpolated on an exponential curve between successive days. The model we deployed as the baseline ("B") gave the best performance in this experiment. Identifying harbingers of the spatial progression of COVID-19 is of paramount importance for proactive policymaking. We sought to model the transient exponential-growth phase only; incorporating a self-limiting aspect remains the topic of future study. That instability, coupled with the evident non-planarity of the interaction graphs in Figure 4, contributed to the failure of our latent embedding scheme and the reversion to the baseline model.

Three contending factors make this particular task difficult. First, the instability of the infection process renders possible cross-regional influences ambiguous. Second, imperfections in the reporting protocol could have induced too much noise. And third, there may actually be no suitable low-dimensional Euclidean embedding. To test the third hypothesis, we experimented with the counties of relatively vast and rural states. We examined solely Ohio's fatality diffusion process, to no avail. To ameliorate perhaps the issues with recording fatalities instead of confirmed cases, as well as the explosive growth, we also peered into the April 20-May 20 period of new confirmed cases, which was slower because most states had instituted social controls. Both Ohio and Kentucky, which have numerous small counties, were scrutinized with FP and DM models to no avail.

South Korea. Questioning whether it was the quality of the reported confirmed cases at fault, we found the well-crafted dataset released by the Korean Centers for Disease Control & Prevention [22]. It details 3,385 incidences from the early outbreaks across the 155 regions of the country, each with at least one infection occurrence. We find it worth noting that the propagation is more controlled in South Korea relative to most other countries; this premise, along with diligent testing, appears to contribute to the embedding's identifiability. Our novel method FP with $\epsilon = 1$ attained a test $\log L$ of $-3.90$, in comparison to the baseline with $-4.68$ and the geographically spatial model with $-4.65$. The test set consisted of the last 30 days in the record, containing 631 incidences. It is further worth noting that the full-rank baseline registered a significantly higher training $\log L$ than our FP did, indicative of excess overfitting. See the resultant principal components of the three-dimensional embeddings in Figure 5 along with occurrence statistics in Figure 6. For one, we observe that the two urban hubs Seoul and Busan are not as far apart in the latent influence space as they are geographically.

Figure 5: Colors were interpolated by hue on the basis of physical proximity to Seoul (blue) versus Busan (red), the two major urban centers. Each location has a pair of an "x" and a dot, corresponding to receiving and influencing points respectively.
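The artificial resolution increase described above (daily counts spread over hours along an exponential curve between successive days) might be implemented roughly as in the sketch below; the way events are apportioned within each hour is our own simplification, not the paper's procedure.

```python
import numpy as np

def upsample_daily_counts(daily_counts):
    """Spread each day's count over 24 hours with exponentially growing (or shrinking)
    weights toward the next day's level; returns an array of hourly counts."""
    hourly = []
    for d in range(len(daily_counts) - 1):
        c0 = max(daily_counts[d], 1e-9)
        c1 = max(daily_counts[d + 1], 1e-9)
        growth = (c1 / c0) ** (1.0 / 24.0)           # per-hour exponential factor
        weights = np.array([growth ** h for h in range(24)])
        weights /= weights.sum()                     # each day keeps its original total
        hourly.extend(np.round(weights * daily_counts[d]).astype(int))
    return np.array(hourly)
```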
Each comparable multivariate Hawkes model (along with its inference procedure) that is detailed in the current state of the art [20, 15, 13] entails careful tuning that no existing software package appears to cover judiciously. We found it more productive to benchmark the proposed formulation on the basis of previously reported test-set average log-likelihoods, a standardized quantity. The current state of self-exciting point processes is fragmented such that there is no accepted benchmark dataset, or even domain. This aspect differs from the prevailing applications of deep learning like image recognition. As elaborated in the Related Work section, some methodologies rely on large sample records and retain the power to model any kind of interaction (in theory). Others sacrifice a degree of expressivity for salient and parsimonious interactions. Even within the latter camp one is faced with substantial tradeoffs pertaining to computational as well as statistical complexity. The number of event types to be accommodated could lie in the dozens or the hundreds, and likewise the length of the time series varies in the tens of thousands versus the hundreds of thousands or millions. Each scenario calls for different modeling choices, and yet the delineation is not at all clear. Exhaustive synthetic examples generated from true Hawkes processes serve to validate the correctness of the model alongside its estimation algorithms. Real phenomena never abide by the pure assumptions posed for a self-exciting point process, so a model must stand on its own merit within each distinct application domain. We showcased viability in two highly pertinent domains by examining epidemics and the options market.

References

Comparison and assessment of epidemic models
Hawkestopic: A joint model for network inference and topic modeling from text-based cascades
Marked point process hotspot maps for homicide and gun crime prediction in Chicago
Self-exciting point process modeling of crime
Spatio-temporal correlations and visual signaling in a complete neuronal population
State-dependent Hawkes processes and their application to limit order book modeling
General compound Hawkes processes in limit order books
When is a network epidemic hard to eliminate?
Estimation of space-time branching process models in seismology using an EM-type algorithm
Multivariate spatiotemporal Hawkes processes and network reconstruction
Learning triggering kernels for multi-dimensional Hawkes processes
Modelling dyadic interaction with Hawkes processes
Learning Hawkes processes from a handful of events
Learning Granger causality for Hawkes processes
Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes
weg2vec: Event embedding for temporal networks
Crime event embedding with unsupervised feature selection
Embedding temporal network via neighborhood formation
Learning multivariate Hawkes processes at scale
Multivariate Hawkes processes for large-scale inference
The block point process model for continuous-time event-based dynamic networks
DS4C: Data science for COVID-19 in South Korea
Spectra of some self-exciting and mutually exciting point processes
First- and second-order statistics characterization of Hawkes processes and non-parametric estimation
Uncovering causality from multivariate Hawkes integrated cumulants
Learning network of multivariate Hawkes processes: A time series approach
The entropy of a point process
Variational autoencoders with Riemannian Brownian motion priors
Diffusion variational autoencoders
Diffusion variational autoencoders
Data-driven probability concentration and sampling on manifold
Multivariate time-series analysis and diffusion maps
Diffusion maps
Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs
The neural Hawkes process: A neurally self-modulating multivariate point process
Inflation, Depression, and Economic Policy in the West, ch. Problems of Monetary Management: The U.K. Experience
An introduction to total least squares
Universal boosting variational inference
Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests
GloVe: Global vectors for word representation
Parameters behind "nonparametric" statistics: Kendall's tau, Somers' d and median differences
The probable error of a mean
Heterogeneities in the case fatality ratio in the West African Ebola outbreak
A review of epidemiological parameters from Ebola outbreaks to inform early public health decision-making
Option moneyness and price disagreements
General methodology for nonlinear modeling of neural systems with Poisson point-process inputs
Some statistical methods connected with series of events
Latent point process allocation
Mutually regressive point processes
Efficient non-parametric Bayesian Hawkes processes
Nonparametric regressive point processes based on conditional Gaussian processes
Structured variational inference in continuous Cox process models
Bayesian nonparametric Poisson-process allocation for time-sequence modeling
Scalable high-resolution forecasting of sparse spatiotemporal events with kernel methods
Scalable Bayesian inference for excitatory point process networks
Temporal network embedding with micro- and macro-dynamics
Deep random splines for point process intensity estimation of neural population data
Deep mixture point processes: Spatio-temporal event prediction with rich contextual information
Fully neural network based model for general temporal point processes
Generative sequential stochastic model for marked point processes
Geometric Hawkes processes with graph convolutional recurrent neural networks, Association for the Advancement of Artificial Intelligence
Recurrent marked temporal point processes: Embedding event history to vector
Neural jump stochastic differential equations
Latent self-exciting point process model for spatial-temporal networks
Interpretable generative neural spatio-temporal point processes
An interactive web-based dashboard to track COVID-19 in real time
Community structure in social and biological networks