title: Optimal time frequency analysis for biological data - pyBOAT
authors: Mönke, Gregor; Sorgenfrei, Frieda A.; Schmal, Christoph; Granada, Adrián E.
date: 2020-06-04
journal: bioRxiv
DOI: 10.1101/2020.04.29.067744

Methods for the quantification of rhythmic biological signals have been essential for the discovery of function and design of biological oscillators. Advances in live measurements have allowed recordings of unprecedented resolution, revealing a new world of complex heterogeneous oscillations with multiple noisy non-stationary features. However, our understanding of the underlying mechanisms regulating these oscillations has been lagging behind, partially due to the lack of simple tools to reliably quantify these complex non-stationary features. With this challenge in mind, we have developed pyBOAT, a Python-based, fully automatic, stand-alone software that integrates multiple steps of non-stationary oscillatory time series analysis into an easy-to-use graphical user interface. pyBOAT implements continuous wavelet analysis, which is specifically designed to reveal time-dependent features. In this work we illustrate the advantages of our tool by analyzing complex non-stationary time-series profiles. Our approach integrates data visualization, optimized sinc-filter detrending, amplitude envelope removal and a subsequent continuous-wavelet based time-frequency analysis. Finally, using analytical considerations and numerical simulations we discuss unexpected pitfalls in commonly used smoothing and detrending operations.

Oscillatory dynamics are ubiquitous in biological systems. From the transcriptional to the behavioral level, these oscillations can range from milliseconds in the case of neuronal firing patterns up to years for the seasonal growth of trees or the migration of birds (Goldbeter et al. [2012], Gwinner [2003], Rohde and Bhalerao [2007]). To gain biological insight from these rhythms, it is often necessary to implement time-series analysis methods to detect and accurately measure key features of the oscillatory signal. Computational methods that enable the analysis of periods, amplitudes and phases of rhythmic time series data have been essential to unravel function and design principles of biological clocks (Lauschke et al. [2013], Ono et al. [2017], Soroldoni et al. [2014]). Here we present pyBOAT, a framework and software package with a focus on usability and generality of such analysis.

Many time series analysis methods readily available for the practitioner rely on the assumption of stationary oscillatory features, i.e. that oscillation properties such as the period remain stable over time. A plethora of methods based on the assumption of stationarity have been proposed, which can be divided into those working in the frequency domain, such as Fast Fourier transforms (FFT) or Lomb-Scargle periodograms (Lomb [1976], Ruf [1999]), and those working in the time domain, such as autocorrelations (Westermark et al. [2009]), peak picking (Abraham et al. [2018]) or harmonic regressions (Edwards et al. [2010], Halberg et al. [1967], Naitoh et al. [1985], Straume et al. [1991]). In low-noise systems with robust and stable oscillations, these stationary methods suffice to reliably characterize oscillatory signals. Recordings of biological oscillations, however, frequently exhibit noisy and time-dependent features such as a drifting period, fluctuating amplitude and trend.
Animal vocalization (Fitch et al. [2002]), temporal changes in the activatory pathways of somitogenesis (Tsiairis and Aulehla [2016]) or reversible and irreversible labilities of properties in the circadian system due to aging or environmental factors (Pittendrigh and Daan [1974], Scheer et al. [2007]) are typical examples where systematic, often non-linear changes in oscillation periods occur. In such cases, the assumption of stationarity is unclear and often not valid, hence the need for non-stationary methods that capture time-dependent oscillatory features.

Recently, the biological data analysis community has developed tools that implement powerful methods tailored to specific steps of time-series analysis such as rhythmicity detection (Hughes et al. [2010], Thaben and Westermark [2014]), de-noising and detrending, and the characterization of non-stationary oscillatory components (Leise [2013], Price et al. [2008]). To extract time-dependent features of non-stationary oscillatory signals, methods can be broadly divided into those that rely on operations using a moving time window (e.g. the wavelet transform) and those that embed the whole time series into a phase space representation (e.g. the Hilbert transform). These two families are complementary, having application-specific advantages and disadvantages, and in many cases both are able to provide equivalent information about the signal (Quiroga et al. [2002]). Due to the inherent robustness in handling noisy oscillatory data and its interpretability advantages, we implemented a continuous-wavelet-transform approach at the core of pyBOAT.

As a software package, pyBOAT combines multiple steps in the analysis of oscillatory time series in an easy-to-use graphical user interface that requires no prior programming knowledge. With only two user-defined parameters, pyBOAT is able to proceed without further intervention with optimized detrending, amplitude envelope removal, spectral analysis, detection of the main oscillatory components (ridge detection), oscillatory parameter readout and visualization plots (Figure 1A). pyBOAT is developed under an open-source license, is freely available for download and can be installed on multiple operating systems.

In the first section of this work we lay out the mathematical foundations at the core of pyBOAT. In the subsequent section 3 we describe artifacts generated by the widely used smoothing and detrending techniques and how they are resolved within pyBOAT. In section 4 we describe the theory behind spectral readouts in the special case of complex amplitude envelopes. We finalize this manuscript with a short description of the user interface and software capabilities.

(Figure 1 caption, partial: ... shown together with a sweeping signal whose instantaneous period coincides with the Morlet of scale s = 1 exactly at τ_2. Bottom panel: result of the convolution of the sliding Morlet Ψ_{1,τ}(t) with the signal f(t); the power quickly decreases away from τ_2. The curve corresponds to one row in the wavelet power spectrum of panel D. C) Synthetic signal with periods sweeping from T_1 = 30 s (f_1 ≈ 0.033 Hz) to T_2 = 70 s (f_2 ≈ 0.014 Hz). D) The wavelet power spectrum shows time-resolved (instantaneous) periods.)

In this section we aim to lay down the basic principles of wavelet analysis as employed in our signal analysis tool, while the more mathematical subtleties are moved to the Appendix. The classic approach to frequency analysis of periodic signals is the well-known Fourier analysis.
Its working principle is the decomposition of a signal f(t) into sines and cosines, known as basis functions. These harmonic components have no localization in time but are sharply localized in frequency: each harmonic component carries exactly one frequency, effective everywhere in time. Thus, straightforward Fourier analysis underperforms in cases of time-dependent oscillatory features, such as when the period of the oscillation changes in time (Figure 1C). The goal behind wavelets is to reach an optimal compromise between time and frequency localization (Gabor [1946]). Gabor introduced a Gaussian-modulated harmonic component, also known as the Morlet wavelet:

Ψ(t) = π^{-1/4} e^{i ω_0 t} e^{-t^2/2}.

The basis functions for time-frequency analysis are then generated from this mother wavelet by scaling and translation:

Ψ_{s,τ}(t) = s^{-1/2} Ψ((t − τ)/s).

Varying the time localization τ slides the wavelet left and right on the time axis. The scale s changes the center frequency of the Morlet wavelet according to ω_center(s) = ω_0/s (see also Appendix equation (8)). Higher scales therefore generate wavelets with lower center frequency. The Gaussian envelope suppresses the harmonic component with frequency ω_center farther away from τ, thereby localizing the wavelet in time (Figure 1B, top panel). This frequency ω_center(s) is conventionally taken as the Fourier-equivalent (or pseudo-) frequency of a Morlet wavelet with scale s. It is noteworthy that wavelets are in general not as sharply localized in frequency as their harmonic counterparts (Figure S1). This is a trade-off imposed by the uncertainty principle to gain localization in time (Gröchenig [2013]).

The wavelet transform of a signal f(t) is given by the following integral expression:

W_f(s, τ) = ∫ f(t) Ψ̄_{s,τ}(t) dt = (f ∗ Ψ̄_s)(τ).

For a fixed scale, this equation has the form of a convolution, as denoted by the '∗' operator, where Ψ̄ denotes the complex conjugate of Ψ. For an intuitive understanding it is helpful to consider the above expression as the cross-correlation between the signal and the wavelet of scale s (or center frequency ω_center(s)). The translation variable τ slides the wavelet along the signal. Since the wavelet decays quickly away from τ, only the instantaneous correlation of the wavelet with center frequency ω_center and the signal around τ contributes significantly to the integral (Figure 1B, middle and lower panel). By using an array of wavelets with different frequencies (or periods), this allows one to scan for multiple frequencies in the signal in a time-resolved manner. The result of the transform, W: f(t) ↦ W_f(τ, ω), is a complex-valued function of two variables, frequency ω and time localization τ. In the following, we implicitly convert scales to frequency via the corresponding center frequencies ω_center(s) of the Morlet wavelets. To obtain a physically meaningful quantity, one defines the wavelet power spectrum:

P^W_f(τ, ω) = |W_f(τ, ω)|^2 / σ^2_f.

We adopted the normalization with the variance of the signal σ^2_f from Torrence and Compo [1998], as it allows for a natural statistical interpretation of the wavelet power. By stacking the transformations in frequency order, one constructs a two-dimensional time-frequency representation of the signal, where the power itself is usually color coded (Figure 1D), using a dense set of frequencies to approximate the continuous wavelet transform. It is important to note that the time-averaged wavelet power spectrum is an unbiased estimator of the true Fourier power spectrum P^F_f of a signal f (Percival [1995]). This allows one to directly compare Fourier power spectra to the wavelet power.
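To make the above concrete, here is a minimal, self-contained Python sketch of a variance-normalized Morlet wavelet power spectrum. It is a pedagogical sketch, not pyBOAT's implementation: the choice ω_0 = 2π, the truncation of the wavelet at four envelope widths, the array names and the toy chirp are illustrative assumptions, and the chi-square based white-noise level anticipates the discussion in the next section.

```python
import numpy as np
from scipy.stats import chi2

def morlet(t, w0=2 * np.pi):
    """Morlet mother wavelet: pi^(-1/4) * exp(i*w0*t) * exp(-t^2/2)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def wavelet_power(signal, dt, periods, w0=2 * np.pi):
    """Variance-normalized Morlet wavelet power; rows correspond to periods, columns to time."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    sigma2 = x.var()
    power = np.zeros((len(periods), len(x)))
    for i, T in enumerate(periods):
        s = w0 * T / (2 * np.pi)                   # scale with center frequency w0/s = 2*pi/T
        t = np.arange(-4 * s, 4 * s + dt, dt)      # sample the wavelet over +-4 envelope widths
        psi = np.sqrt(dt / s) * morlet(t / s, w0)  # discrete normalization in the style of Torrence & Compo
        # cross-correlating with the conjugate Morlet equals convolving with psi itself
        W = np.convolve(x, psi, mode="same")
        power[i] = np.abs(W) ** 2 / sigma2
    return power

# toy example: a noisy chirp sampled every 2 s, scanned for periods between 20 s and 80 s
dt, N = 2.0, 500
tvec = np.arange(N) * dt
chirp = np.cos(2 * np.pi * (tvec / 30 + (1 / 70 - 1 / 30) * tvec ** 2 / (2 * tvec[-1])))
noisy = chirp + 0.5 * np.random.randn(N)
periods = np.linspace(20, 80, 120)
P = wavelet_power(noisy, dt, periods)

# 95% white-noise confidence level (mean white-noise power is ~1, see the following section)
conf95 = chi2.ppf(0.95, df=2) / 2
print("fraction of spectrum above the 95% white-noise level:", (P > conf95).mean())
```

With this normalization, pure white noise has a mean power of about one, so regions of the spectrum well above the printed 95% level point to genuine oscillatory content rather than random fluctuations.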
White noise is the simplest noise process which may serve as a null hypothesis. Normalized by its variance, white noise has a flat mean Fourier power of one for all frequencies. Hence, a variance-normalized wavelet power of one also corresponds to the mean expected power for white noise: P^F_WN(ω) = 1 (Figure 2C). This serves as a universal unit to compare different empirical power spectra. Extending these arguments to random fluctuations of the Fourier spectrum allows for the calculation of confidence intervals on wavelet spectra. If a background spectrum P^F_0(ω) is available, the confidence power levels can be easily calculated as:

C_q(ω) = (χ^2_2(q)/2) · P^F_0(ω).

Assuming normality for the distribution of the complex Fourier components of the background spectrum, one can derive that the power itself is chi-square distributed (Chatfield [1995]). Thus, picking a desired confidence q (e.g. χ^2_2(95%) ≈ 6) sets the scaling factor χ^2_2(q)/2 ≈ 3 for the background spectrum. Only wavelet powers greater than this confidence level are then considered to indicate oscillations with the chosen confidence. The interested reader may find more details in section 4 of Torrence and Compo [1998]. A wavelet power of C = 3 corresponds to the 95% confidence interval in the case of white noise (Figure 2B), which is frequency independent. For the practitioner, this should be considered the absolute minimum power required to report 'oscillations' in a signal. It should be noted that, especially for biological time series, white noise is often a poor choice for a null model because of correlations that are also present in non-oscillatory signals (see also supplementary information A.3). A possible solution is to estimate the background spectrum from the data itself; this, however, is beyond the scope of this work.

3 Optimal Filtering - Do's and Don'ts

A biological recording can be decomposed into components of interest and those elements which blur and challenge their analysis, most commonly noise and trends. Various techniques for smoothing and detrending have been developed to deal with these issues. Often overlooked is the fact that both smoothing and detrending operations can introduce spectral biases, i.e. attenuation and amplification of certain frequencies. In this section we lay out a mathematical framework to understand and compare the effects of these two operations, showing examples of the potential pitfalls and at the same time providing a practical guide to avoid these issues. Finally, we discuss how pyBOAT minimizes most of these common artifacts.

The operation which removes the fast, high-frequency (low-period) components of a signal is colloquially called smoothing. This is most commonly done as a sliding time-window operation (convolution). In general terms, we can refer to a window function w(t) such that the smoothed signal is given as:

f_S(t) = (f ∗ w)(t).

By employing the convolution theorem, it turns out that the Fourier transformation of the smoothed signal is simply given by the product of the individual Fourier transforms. It follows that the Fourier power spectrum of the smoothed signal reads as:

P^F_{f_S}(ω) = (σ^2_f / σ^2_{f_S}) |ŵ(ω)|^2 P^F_f(ω).

Applying a few steps of Fourier algebra shows that the original power spectrum P^F_f gets modified by the low-pass response of the window function |ŵ|^2, scaled by the ratio of variances σ^2_f/σ^2_{f_S}. Also without resorting to mathematical formulas, smoothing and its effect on the time-frequency analysis can be easily grasped visually.
A broad class of filtering methods falls into the category of convolutional filtering, meaning that some operation is applied to the data in a sliding window, as is the case e.g. for moving-average, LOESS or Savitzky-Golay filtering (Savitzky and Golay [1964]). The moving-average filter is a widely used smoothing technique, defined simply by a box-shaped window that slides in the time domain. In Figure 2 we summarize the spurious effects that this filter can have on noisy biological signals. White noise, commonly used as a descriptor for fluctuations in biological systems, is a random signal with no dominant period. The lack of a dominant period can be seen from a raw white noise signal (Figure 2A) and more clearly from the almost flat landscape of the power spectrum (Figure 2B). Applying a moving-average filter of size five times the signal's sampling interval (M = 5∆t) to the raw white noise signal leads to a smoothed noise signal (Figure 2C) that now has multiple dominant periods, as seen by the emergence of high-power islands in Figure 2D. Comparing the original spectrum (Figure 2B) with the spectrum of the smoothed white noise (Figure 2D), it becomes evident that smoothing introduces a strong increase in wavelet power for longer periods. In other words, smoothing perturbs the original signal by creating multiple high-power islands of long periods, also referred to as spurious oscillations.

To better capture the statistics behind these smoothing-induced spurious oscillations, it is best to look at the time-averaged wavelet spectrum. Figure 2E shows the mean expected power after smoothing white noise with a moving-average filter. A zone of attenuated small periods becomes visible, the sloppy stop-band for periods around 7∆t. These are the fast, high-frequency components which get removed from the signal. However, for larger periods the wavelet power gets amplified up to 5-fold. It is this gain, given by σ^2_f/σ^2_{f_S}, which leads to spurious results in the analysis. As stated before, variance-normalized white noise has a mean power of 1 for all frequencies or periods (P^F_WN(ω) = 1). This allows the use of a straightforward numerical method to estimate a filter response |ŵ(ω)|^2, namely applying the smoothing operation to simulated white noise and time-averaging the wavelet spectra. This Monte Carlo approach works for every (also non-convolutional) smoothing method. Results for the Savitzky-Golay filter applied to white noise signals can be found in Supplementary Figure S2. Convolutional filters will in general produce more gain, and hence more spurious oscillations, with increasing window sizes in the time domain.

If smoothing, even with a rather small window (M = 5∆t), already potentially introduces false-positive oscillations, what does that mean for practical time-frequency analysis? For wavelet analysis the answer is plain and clear: smoothing is simply not needed at all. A close inspection of the unaltered white noise wavelet spectrum shown in Figure 2B reveals the same structures for higher periods as the spectrum of the smoothed signal (Figure 2D). The big difference is that even though these random apparent oscillations get picked up by the wavelets, their low power directly indicates their low significance. As wavelet analysis (see previous section) is based on convolutions, it already has power-preserving smoothing built in. As an illustration, we show in Figure 2F a raw noisy signal with lengthening period (noisy chirp) and the corresponding power spectrum (Figure 2F, lower panel). Without any smoothing, the main periodic signal can be clearly identified in the power spectrum. Thus, wavelet analysis does not require smoothing for the detection of oscillations in very noisy signals.
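The Monte Carlo recipe just described (apply the smoothing to simulated white noise and average the resulting spectra) can be sketched in a few lines. To keep the example self-contained, it averages Fourier periodograms over white-noise realizations instead of time-averaging wavelet spectra; this substitution, the window size M = 5 and all variable names are illustrative assumptions, not pyBOAT code.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(seed=1)
M, N, n_real = 5, 2048, 500              # window size (samples), signal length, realizations
box = np.ones(M) / M                      # moving-average (boxcar) window

p_raw_sum = p_smooth_sum = 0.0
for _ in range(n_real):
    wn = rng.standard_normal(N)                        # unit-variance white noise
    sm = np.convolve(wn, box, mode="same")             # smoothed white noise
    freqs, p_raw = periodogram(wn / wn.std(), fs=1.0)  # variance-normalized spectra,
    freqs, p_sm = periodogram(sm / sm.std(), fs=1.0)   # as in the text
    p_raw_sum = p_raw_sum + p_raw
    p_smooth_sum = p_smooth_sum + p_sm

response = p_smooth_sum[1:] / p_raw_sum[1:]   # empirical gain per frequency (skip f = 0)
periods = 1.0 / freqs[1:]                     # periods in units of the sampling interval
print("max gain %.1f-fold at period %.0f dt" % (response.max(), periods[np.argmax(response)]))
print("min gain %.2f at period %.1f dt" % (response.min(), periods[np.argmin(response)]))
```

For M = 5 the estimated gain approaches the roughly 5-fold amplification of long periods quoted above, while the short periods around the stop-band are attenuated.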
For all other spectral analysis methods which rely on explicit smoothing, the characteristics of the background noise and the signal-to-noise ratio are crucial to avoid detecting spurious oscillations. Both of these quantities are usually not readily available a priori or in practice.

Complementary to smoothing, an operation which removes the slow, low-frequency components of a signal is generally called detrending. Strong trends can dominate a signal by effectively carrying most of the variance and power. There are at least two broad classes of detrending techniques: parametric fitting and convolution based. Both aim to estimate the trend as a function over time, to be subtracted from the original signal. A parametric fit is always the best choice if the deterministic processes leading to the trend are known and well understood. An example is the so-called photobleaching encountered in time-lapse fluorescence imaging experiments; here an exponential trend can often be well fitted to the data based on first-principle considerations (Song et al. [1995]). However, there are often other slow processes, like cell viability or cells drifting in and out of focus, which usually can't be readily described parametrically. For all these cases, convolutional detrending with a window function w(t) is a good option and can be written as:

f_D(t) = f(t) − (f ∗ w)(t).

The trend here is nothing less than the smoothed original signal, i.e. f(t) ∗ w(t). However, the window should be chosen such that the signal itself falls into the stop-band of the low-pass filter, with the aim to not capture and subtract any signal components. Using basic algebra in the frequency domain, we obtain an expression relating the window w(t) to the power spectrum of the original signal f(t):

P^F_{f_D}(ω) = (σ^2_f / σ^2_{f_D}) (1 − |ŵ(ω)|)^2 P^F_f(ω).

As in the case of smoothing, the so-called high-pass response of the window function is given by (1 − |ŵ|)^2 and scaled with the ratio of variances σ^2_f/σ^2_{f_D}. In strong contrast to smoothing, there is no overall gain in power in the range of periods passing through the filter (called the passband). There is, however, in the case of moving average and other time-domain filters (see also Figure S2), no simple passband region. Instead, there are rippling artifacts in the frequency domain, meaning that some periods get amplified up to 150% and others attenuated by up to 25%.

To showcase why this can be problematic, we constructed a synthetic chirp signal sweeping through a range of periods T_1 − T_2, this time, however, modified by a linear and an oscillatory trend (Figure 3A). The oscillatory component of the trend was chosen for clarity with a specific time scale given by its period T_trend, which is three times the longest period found in the chirp signal. Strongly depending on the specific window size chosen for the moving-average filter, there are various effects on both the time and the frequency domain (shaded area in Figure 3B), such as the introduction of amplitude envelopes and/or incomplete trend removal (Figure 3C). A larger window size is better for reducing the effect of the ripples inside the passband. However, the filter decay (roll-off) towards larger periods then becomes very slow. That in turn means that trends cannot be fully eliminated. Smaller window sizes perform better in detrending, but their passband can be dominated by ripples (see also supplementary Figure S3). In practice, sticking to filters originally designed for the time domain without having oscillatory signals in mind can easily lead to biased results of a time-frequency analysis.
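As a minimal illustration of convolution-based detrending, the following sketch estimates the trend of a noisy chirp with a superimposed slow oscillatory and linear trend by low-pass filtering, and then subtracts it. Two interchangeable window choices are shown: a boxcar (moving average), which suffers from the passband ripples discussed above, and a windowed-sinc kernel built with scipy.signal.firwin as a stand-in for the sinc filter introduced in the next section. The parameter values are arbitrary, edge effects from zero-padding are ignored, and none of this is claimed to be pyBOAT's exact implementation.

```python
import numpy as np
from scipy.signal import firwin

# synthetic test case: chirp (periods 30..70 s) + slow oscillatory trend + linear drift
dt = 2.0                                   # sampling interval in seconds
t = np.arange(0, 1000, dt)
f1, f2 = 1 / 30, 1 / 70
chirp = np.cos(2 * np.pi * (f1 * t + (f2 - f1) * t ** 2 / (2 * t[-1])))
trend = 2.5 * np.sin(2 * np.pi * t / 210) + 0.003 * t   # T_trend = 3 x longest chirp period
signal = chirp + trend + 0.2 * np.random.randn(len(t))

def detrend(sig, window):
    """Convolutional detrending: subtract the low-pass filtered signal (the trend estimate)."""
    trend_est = np.convolve(sig, window, mode="same")
    return sig - trend_est, trend_est

# (a) moving-average window spanning ~110 s
M = int(110 / dt)
box = np.ones(M) / M

# (b) windowed-sinc low-pass with a cut-off period of 110 s (cut-off frequency 1/110 Hz)
taps = firwin(numtaps=2 * M + 1, cutoff=1 / 110, window="blackman", fs=1 / dt)

detr_box, _ = detrend(signal, box)
detr_sinc, _ = detrend(signal, taps)
# rough diagnostic: how much of the non-chirp content (trend + noise) is left over
print("residual std, boxcar:", round(np.std(detr_box - chirp), 2))
print("residual std, windowed sinc:", round(np.std(detr_sinc - chirp), 2))
```

The design choice here mirrors the argument in the text: both filters are applied as a convolution in the time domain, but only their frequency responses differ, which is what decides whether ripples leak into the passband.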
However, given the moderate gains of the detrending filter response, there is a much smaller chance to mistakenly detect spurious oscillations compared to the case of smoothing.

The sinc filter, also known as the optimal filter in the frequency domain, has a response with a constant value of one in the passband. In other words, frequencies which pass through are neither amplified nor attenuated. Accordingly, this filter should also be constantly zero in the stop-band, i.e. for the frequencies (or periods) which should be filtered out. This optimal low-pass response can be formulated in the frequency domain simply as:

ŵ_sinc(ω) = 1 for |ω| ≤ ω_c, and ŵ_sinc(ω) = 0 for |ω| > ω_c.

Here ω_c is the cut-off frequency, an infinitely sharp transition band dividing the frequency range into pass- and stop-band. It is effectively a box in the frequency domain (dashed lines in Figure 3D). Note that the optimal high-pass or detrending response simply and exactly swaps the pass- and stop-band. In the time domain, obtained via the inverse Fourier transform, this can be written as:

w_sinc(t) = sin(ω_c t) / (π t).

This function is known as the sinc function, and hence the name sinc filter. An alternative name used in electrical engineering is brick-wall filter. In practice, this optimal filter has a nonzero roll-off, as shown for two different cut-off periods (T_c = 2π/ω_c) in Figure 3D. The sinc function mathematically requires the signal to be of infinite length; therefore, every practical implementation uses windowed sinc filters (Smith et al. [1997]; see also supplementary information S3 about possible implementations). Strikingly, there are still no ripples or other artifacts in the frequency response of the windowed sinc filter, and hence also the 'real-world' version allows for a bias-free time-frequency analysis. As shown in Figure 3E, the original signal can be exactly recovered via detrending. To showcase the performance of the sinc filter, we numerically compared it against two other common methods, the Hodrick-Prescott and the moving-average filter (Figure 3F). The stop- and passband separation of the sinc filter is clearly the best, although the Hodrick-Prescott filter with a parameterization as given by Ravn and Uhlig [2002] also gives acceptable results (see also supplementary Figure S5). The moving average is generally inadvisable, due to its amplification right at the start of the passband.

In addition to its advantages in practical signal analysis, the sinc filter also allows one to analytically calculate the gains from filtering pure noise (see also supplementary information A.3). The gain, and therefore the probability to detect spurious oscillations, introduced by smoothing is typically much larger compared to detrending. However, if a lot of the noise energy is concentrated in the slow, low-frequency bands, detrending alone with small cut-off periods can also yield substantial gains (see Figure S6 and Figure S7). Importantly, when using the sinc filter, the background spectrum of the noise will always be uniformly scaled by a constant factor in the passband. There is no mixing of attenuation and amplification as for time-domain filters like the moving average (Figure 3B and C). If the spectrum of the noise can be estimated, or an empirical background spectrum is available, the theory presented in A.3 allows one to directly calculate the correct confidence intervals.

The extraction of the instantaneous period, amplitude and phase of a signal's main oscillatory component is of prime interest for the practitioner. In this section we show how to obtain these important features using wavelet transforms as implemented in pyBOAT.
From the perspective of wavelet power spectra, such main oscillatory components are characterized by concentrated and time-connected regions of high power. Wavelet ridges are a means to trace these regions on the time-period plane. For the vast majority of practical applications, a simple maximum ridge extraction is sufficient. This maximum ridge can be defined as:

r(t_k) = argmax_ω P^W_f(t_k, ω),  k = 1, ..., n,

with n being the number of sample points in the signal. Thus, the ridge r(t_k) maps every time point t_k to a row of the power spectrum, and therewith to a specific instantaneous period T_k (Figure 4C and D). Evaluating the power spectrum along a ridge gives a time series of powers: P^W_f(t_k, r(t_k)). Setting a power threshold value is recommended to avoid evaluating the ridge in regions of the spectrum where the noise dominates (in Figure 4C the threshold is set to 5); a minimal code sketch of this readout is given after the application examples below. As an alternative to simple maximum ridge detection, more elaborate strategies for ridge extraction have been proposed (Carmona et al. [1995]).

A problem often encountered when dealing with biological data is a general time-dependent amplitude envelope (Figure 4A). Under our wavelet approach, the power spectrum is normalized with the overall variance of the signal. Consequently, regions with low signal amplitudes but robust oscillations are nevertheless represented with very low power, blurring them into the spectral floor (Figure 4C). This leads to the impractical situation where even a noise-free signal with an amplitude decay will show very low power at the end (Figures 4E, F and S8), defeating its statistical purpose. A practical solution in this case is to estimate an amplitude envelope and subsequently normalize the signal with this envelope (Figure 4A and B). We specifically show here non-periodic envelopes, estimated by a sliding window (see also Methods). After normalization, lower amplitudes are no longer penalized and an effective power-thresholding of the ridge is possible (Figure 4D and F).

A limitation of convolutional methods, including wavelet-based approaches, is the presence of edge effects. At the edges of the signal, the wavelets only partially overlap with the signal, leading to a so-called cone of influence (COI) (Figure 4C and D). Even though the periods are still close to the actual values, phases and especially the power should not be trusted inside the COI (see discussion and supplementary Figure S9). Once the trace of consecutive wavelet power maxima has been determined and thresholded, evaluating the transform along it yields the instantaneous envelope amplitude, normalized amplitude and phases (see Figure 4E, F and G).

Applications

After introducing the different time series analysis steps using synthetic data for clarity, in this paragraph we discuss examples of pyBOAT applications to real data. To showcase the versatility of our approach, we chose datasets obtained from different scientific fields. In Figure 5A we display COVID-19 infections in Italy as reported by the European Centre for Disease Prevention and Control (ECDC). A sinc-filter trend identification with a cut-off period of 14 days reveals a steep increase in newly reported infections at the beginning of March and a steady decline after the beginning of April. Subtracting this non-linear trend clearly exposes oscillations with a stable period of one week (Figure 5B, and see Supplementary Figure S10A for the power spectrum analysis). Similar findings were recently reported in Ricon-Becker et al. [2020].
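Picking up the ridge readout described before the application examples, the following is a minimal sketch of maximum-ridge extraction with power thresholding. The arrays power, transform and periods are assumed to come from a wavelet analysis such as the earlier sketch (here replaced by random placeholders so the snippet runs), the threshold value is arbitrary, and the plain |W| amplitude readout only stands in for pyBOAT's own amplitude normalization.

```python
import numpy as np

def max_ridge(power, transform, periods, threshold=5.0):
    """Maximum ridge: for every time point pick the period row with the highest power,
    then mask time points whose ridge power falls below the threshold."""
    ridge_idx = np.argmax(power, axis=0)          # row index per time point
    cols = np.arange(power.shape[1])
    ridge_power = power[ridge_idx, cols]          # P(t_k, r(t_k))
    mask = ridge_power >= threshold               # ignore noise-dominated regions
    return {
        "period": np.where(mask, periods[ridge_idx], np.nan),
        "power": np.where(mask, ridge_power, np.nan),
        "phase": np.where(mask, np.angle(transform[ridge_idx, cols]), np.nan),
        "amplitude": np.where(mask, np.abs(transform[ridge_idx, cols]), np.nan),
    }

# toy usage with random placeholders standing in for a real wavelet transform
periods = np.linspace(20, 80, 100)
transform = np.random.randn(100, 500) + 1j * np.random.randn(100, 500)
power = np.abs(transform) ** 2
out = max_ridge(power, transform, periods, threshold=5.0)
print("median ridge period above threshold:", np.nanmedian(out["period"]))
```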
The signals shown in Figure 5C display cycles in hare-lynx population sizes, as inferred from the yearly number of pelts trapped by the Hudson's Bay Company. The data have been taken from Odum and Barrett [1971]. The corresponding power spectra are shown in the supplement (Figure S10B) and reveal a fairly stable 10-year periodicity. After extracting the instantaneous phases with pyBOAT, we calculated the time-dependent phase differences as shown in Figure 5B. Interestingly, the phase difference slowly varies between being almost perfectly out of phase and being in phase for a few years around 1885.

The next example signal shows a single-cell trajectory of a U2OS cell carrying a Geminin-CFP fluorescent reporter (Granada et al. [2020]). Geminin is a cell cycle reporter, accumulating in the G2 phase and then steeply declining during mitosis. Applying pyBOAT to these non-sinusoidal oscillations reveals the cell-cycle length over time (Figure 5F), showing a slowing down of cell cycle progression for this cell. Ensemble dynamics for a control and a cisplatin-treated population are shown in Supplementary Figure S10C.

The final example data set is taken from Mönke et al. [2017]; here, populations of MCF7 cells were treated with different dosages of the DNA-damaging agent NCS. This in turn elicits a dose-dependent and heterogeneous p53 response, tracked in the individual cells for each condition (Figure 5G). pyBOAT also features readouts of the ensemble dynamics: Figure 5H shows the time-dependent period distribution in each population, and Figure 5I the phase coherence over time. The latter is calculated as R(t) = |(1/N) Σ_j e^{i φ_j(t)}|. It ranges from zero to one and is a classical measure of synchronicity in an ensemble of oscillators (Kuramoto [1984]). The strongly stimulated cells (400 ng NCS) show stable oscillations with a period of around 240 min and remain more phase coherent after an initial drop in synchronicity. The medium stimulated cells (100 ng NCS) start to slow down on average already after the first pulse; both the spread of the period distribution and the low phase coherence indicate a much more heterogeneous response. Two individual cells and their wavelet analysis are shown in Supplementary Figure S10D.

Graphical interface

The extraction of period, amplitude and phase is the final step of our proposed analysis workflow, which is outlined in the screen captures of Figure 6. The user interface is separated into several sections. First, the 'DataViewer' allows one to visualize the individual raw signals, the trend determined by the sinc filter, the detrended time series and the amplitude envelope. Once satisfactory parameters have been found, the actual wavelet transform together with the ridge is shown in the 'Wavelet Spectrum' window. After ridge extraction, it is possible to plot the instantaneous observables in a 'Readout' window. Each plot produced from the interface can be panned, zoomed in and saved separately if needed. Once the fidelity of the analysis has been checked for individual signals, it is also possible to run the entire analysis as a 'batch process' for all imported signals. One aggregated result which we found quite useful is to determine the 'rhythmicity' of a population by creating a histogram of the time-averaged powers of the individual ridges. A classification of signals into 'oscillatory' and 'non-oscillatory' based on this distribution, e.g. by using classical thresholds (Otsu [1979]), is a potential application.
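Both ensemble readouts mentioned above are straightforward to sketch. Below, phases is a hypothetical array of instantaneous phases (signals × time points) and ridge_powers a hypothetical set of time-averaged ridge powers; the Otsu-style split is written in plain NumPy to keep the sketch dependency-light, and none of this reflects pyBOAT's internal API.

```python
import numpy as np

def phase_coherence(phases):
    """Phase coherence R(t) = |<exp(i*phi_j(t))>_j|, one value per time point."""
    return np.abs(np.exp(1j * phases).mean(axis=0))

def otsu_threshold(values, nbins=64):
    """Otsu's method: histogram threshold maximizing the between-class variance."""
    hist, edges = np.histogram(values, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w = hist / hist.sum()
    best_t, best_var = centers[0], -1.0
    for k in range(1, nbins):
        w0, w1 = w[:k].sum(), w[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (w[:k] * centers[:k]).sum() / w0
        mu1 = (w[k:] * centers[k:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, centers[k]
    return best_t

# hypothetical ensemble: 50 signals, 200 time points, phases slowly drifting apart
rng = np.random.default_rng(2)
phases = np.cumsum(0.3 + 0.05 * rng.standard_normal((50, 200)), axis=1)
R = phase_coherence(phases)                      # starts near 1, decays as phases diffuse

# classify signals as 'oscillatory' vs 'non-oscillatory' from time-averaged ridge powers
ridge_powers = np.concatenate([rng.normal(2, 0.5, 30), rng.normal(15, 3, 20)])
thr = otsu_threshold(ridge_powers)
print("coherence at start / end: %.2f / %.2f" % (R[0], R[-1]))
print("Otsu threshold %.1f -> %d oscillatory signals" % (thr, int((ridge_powers > thr).sum())))
```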
Examples of the provided ensemble readouts are shown in Figure 5H and I and in Supplementary Figure S10C. Finally, pyBOAT also features a synthetic signal generator, allowing one to quickly explore its capabilities even without a suitable dataset at hand. A synthetic signal can be composed of up to two different chirps, and AR1 noise and an exponential envelope can be added to simulate possible challenges often present in real data (see also Material and Methods and Supplementary Figure S11). Installation and user guidelines of pyBOAT can be found in the GitHub repository.

(Figure 6 caption, partial: ... gets detrended with a cut-off period of 90 h, and an amplitude envelope is estimated via a window of size 50 h. See labels and main text for further explanations. The example trajectory displays a circadian rhythm of 24 h and is taken from the data set published in Abel et al. [2016].)

Recordings of biological oscillatory signals can be conceptualized as an aggregate of multiple components: those coming from the underlying system of interest, and additional confounding factors such as noise, modulations and trends that can disguise the underlying oscillations. In cases of variable period with noisy amplitude modulation and non-stationary trends, the detection and analysis of oscillatory processes is a non-trivial endeavour. Here we introduced pyBOAT, a novel software package that uses a statistically rigorous method to handle non-stationary rhythmic data. pyBOAT integrates pre- and post-processing steps without making a priori assumptions about the sources of noise and the periodicity of the underlying oscillations. We showed how the signal processing steps of smoothing, detrending, amplitude envelope removal, signal detection and spectral analysis can be resolved by our hands-off, standalone software (Figures 1 and 5).

Artifacts introduced by the time series analysis methods themselves are a common problem that inadvertently disturbs the results of the time-frequency analysis of periodic components (Wilden et al. [1998]). Here we first analyzed the effects of data smoothing on a rhythmic noisy signal and showed how common smoothing approaches disturb the original recordings by introducing non-linear attenuations and gains to the signal (Figures 2, S6 and S7). These gains easily lead to spurious oscillations that were not present in the original raw data. These artifacts have long been characterized for the commonly used moving-average smoothing method and are known as the Slutzky-Yule effect (Slutzky [1937]). Using an analytical framework, we describe the smoothing process as a filter operation in the frequency domain. This allows us to quantify and directly compare the effects of diverse smoothing methods by means of response curves. Importantly, we show here how any choice of smoothing unavoidably transforms the original signal in a non-trivial manner. One potential reason for the prevalence of this problem is that practitioners often implement a smoothing algorithm without quantitatively comparing the spectral components before versus after smoothing. pyBOAT avoids this problem by implementing a wavelet-based approach that per se evades the need to smooth the signal.

Another source of artifacts are detrending operations. Thus, we next studied the spectral effects that signal detrending has on rhythmic components. Our analytical and numerical approaches allowed us to compare the spectral effects of different detrending methods in terms of their response curves (see Figure 3).
Our results show that detrending also introduces non-trivial boosts and attenuations to the oscillatory components of the signal, strongly depending on the background noise (Figures S6 and S7). In general there is no universal approach, and optimally a detrending model is based on information about the sources generating the trend. In cases without prior information to formulate a parametric detrending in the time domain, we suggest that the safest method is the convolution-based sinc filter, as it is an "ideal" (step-function) filter in the frequency domain (Figures 3C and S3). Furthermore, we compared the performance of the sinc filter with two other commonly applied methods to remove non-linear trends in data (Figure 3F), namely the moving-average (Díez-Noguera [2013]) and the Hodrick-Prescott (Myung et al. [2012], Schmal et al. [2018], St. John and Doyle [2015]) filter.

In addition to smoothing and detrending, amplitude normalization by means of amplitude envelope removal is another commonly used data processing step that pyBOAT is able to perform. Here we further show that, for decaying signals, amplitude normalization ensures that the main oscillatory component of interest can be properly identified in the power spectrum (Figure 4A to D). This main component is identified by a ridge-tracking approach that can then be used to extract instantaneous signal parameters such as amplitudes, power and phase (Figure 4E to G).

Rhythmic time series can be categorized into those showing stationary oscillatory properties and the non-stationary ones where periods, amplitudes and phases change over time. Many currently available tools for the analysis of biological rhythms rely on methods aimed at stationary oscillatory data, using either a standalone software environment such as BRASS (Edwards et al. [2010], Locke et al. [2005]), ChronoStar (Klemz et al. [2017]) and CIRCADA (Cenek et al. [2020]), or an online interface such as BioDare (Zielinski et al. [2014]). Continuous wavelet analysis allows one to reveal non-stationary period, amplitude and phase dynamics and to identify multiple frequency components across different scales within a single oscillatory signal (Leise [2013], Leise et al. [2012], Rojas et al. [2019]) and is thus complementary to approaches that are designed to analyze stationary data. In contrast to the R-based waveclock package (Price et al. [2008]), pyBOAT can be operated as a standalone software tool that requires no prior programming knowledge, as it can be fully operated using its graphical user interface (GUI). An integrated batch processing option allows the analysis of large data sets within a few "clicks". For the programming-interested user, pyBOAT can also be easily scripted without using the GUI, making it simple to integrate into individual analysis pipelines. pyBOAT also distinguishes itself from other wavelet-based packages (e.g. Harang et al. [2012]) by adding a robust sinc filter-based detrending and a statistically rigorous framework, supporting the interpretation of results by statistical confidence considerations. pyBOAT is not specifically designed to analyze oscillations in high-throughput "omics" data. For that purpose, specialized algorithms such as ARSER (Yang and Su [2010]), JTK_CYCLE (Hughes et al. [2010]), MetaCycle (Wu et al. [2016]) or RAIN (Thaben and Westermark [2014]) are more appropriate.
Its analysis reveals basic oscillatory properties such as the time-dependent (instantaneous) rhythmicity, period, amplitude and phase, but it is not aimed at more specific statistical tests such as, e.g., tests for differential rhythmicity as implemented in DODR (Thaben and Westermark [2016]). The continuous-wavelet analysis underlying pyBOAT requires equidistant time series sampling with no gaps. Methods such as Lomb-Scargle periodograms or harmonic regressions are more robust with respect to, or even specifically designed for, unevenly sampled data (Lomb [1976], Ruf [1999]). While beyond the scope of this manuscript, it will be interesting in future work to integrate the ability to analyze unevenly sampled data into the pyBOAT software, either by the imputation of missing values (e.g. by linear interpolation) or by the usage of wavelet functions specifically designed for this purpose (Thiebaut and Roques [2005]).

pyBOAT is a fast, easy-to-use and statistically robust analysis routine designed to complement existing methods and advance efficient time series analysis in biological rhythms research. In order to make it publicly available, pyBOAT is free and open-source, multi-platform software based on the popular Python (Van Rossum and Drake [2009]) programming language. It can be downloaded using the following link: https://github.com/tensionhead/pyBOAT, and is available on the Anaconda distribution (via the conda-forge channel).

Software

pyBOAT is written in the Python programming language (Van Rossum and Drake [2009]). It makes extensive use of Python's core scientific libraries numpy and scipy (Virtanen et al. [2020]) for the numerics. Additionally, we use matplotlib (Hunter [2007]) for visualization and pandas (McKinney [2010]) for data management. pyBOAT is released under the open-source GPL-3.0 license, and its code is freely available from https://github.com/tensionhead/pyBOAT. The readme in this repository contains further information and installation instructions. pyBOAT is also hosted on the popular Anaconda distribution, as part of the conda-forge community (https://conda-forge.org/).

To estimate the amplitude envelope in the time domain, we employ a moving window of size L and determine the minimum and maximum of the signal inside the window for each time point t. The amplitude at that time point is then given by a(t) = (max(t) − min(t))/2. This works very well for envelopes with no periodic components, like an exponential decay. However, this simple method is not suited for oscillatory amplitude modulations. It is also recommended to sinc-detrend the signal before estimating the amplitude envelope. Note that L should always be larger than the maximal expected period in the signal, as otherwise the signal itself gets distorted.

A noisy chirp signal can be written as f(t_i) = a cos(φ(t_i)) + d x(t_i), where φ(t_i) is the instantaneous phase and the x(t_i) are samples from a stationary stochastic process (the background noise). The increments of the t_i are given by the sampling interval: t_{i+1} − t_i = ∆t, with i = 0, 1, ..., N samples. Starting from a linear sweep through angular frequencies, ω(0) = ω_1 and ω(t_N) = ω_2, we have ω(t) = (ω_2 − ω_1)/t_N · t + ω_1. The instantaneous phase is then given by

φ(t) = ∫_0^t ω(t') dt' = (ω_2 − ω_1)/(2 t_N) · t^2 + ω_1 t.

Sampling N times from a Gaussian distribution with standard deviation equal to one corresponds to Gaussian white noise ξ(t_i). With x(t_i) = ξ(t_i), the signal-to-noise ratio (SNR) then is a^2/d^2.
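The synthetic signals used throughout can be generated directly from the construction above. The sketch below implements the linear angular-frequency sweep and adds either Gaussian white noise or, optionally, the AR1 background noise described just below; the default parameter values are arbitrary examples and the function name is hypothetical.

```python
import numpy as np

def noisy_chirp(T1, T2, dt, n_samples, a=1.0, d=0.5, alpha=None, seed=0):
    """Chirp with a linear angular-frequency sweep from 2*pi/T1 to 2*pi/T2 plus noise.
    alpha=None gives Gaussian white noise; 0 < alpha < 1 gives AR1 background noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) * dt
    w1, w2 = 2 * np.pi / T1, 2 * np.pi / T2
    phi = (w2 - w1) / (2 * t[-1]) * t ** 2 + w1 * t      # instantaneous phase of the sweep
    xi = rng.standard_normal(n_samples)                  # unit-variance Gaussian samples
    if alpha is None:
        noise = xi
    else:
        noise = np.empty(n_samples)                      # AR1: x_i = alpha * x_{i-1} + xi_i
        noise[0] = xi[0]
        for i in range(1, n_samples):
            noise[i] = alpha * noise[i - 1] + xi[i]
    return t, a * np.cos(phi) + d * noise

# e.g. periods sweeping from 30 s to 70 s, sampled every 2 s, SNR a^2/d^2 = 4
t, sig = noisy_chirp(T1=30, T2=70, dt=2.0, n_samples=500, a=1.0, d=0.5)
t, sig_ar1 = noisy_chirp(T1=30, T2=70, dt=2.0, n_samples=500, alpha=0.7)
print(sig[:5])
```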
A realization of an AR1 process can be simulated by a simple generative procedure: the initial x(t_0) is a sample from the standard normal distribution. Then the next sample is given by x(t_i) = α x(t_{i−1}) + ξ(t_i), with α < 1. Simulating pink noise is less straightforward, and we use the Python package colorednoise from https://pypi.org/project/colorednoise for the simulations. Its implementation is based on Timmer and Koenig [1995].

References

Functional network inference of the suprachiasmatic nucleus
Quantitative analysis of circadian single cell oscillations in response to temperature
Identification of chirps with continuous wavelet transform
Circada: Shiny apps for exploration of experimental and synthetic circadian time series with an educational emphasis
Problem solving: a statistician's guide
Methods for serial analysis of long time series in the study of biological rhythms
Quantitative analysis of regulatory flexibility under changing environmental conditions
Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production
Theory of communication. Part 1: The analysis of information
Systems biology of cellular rhythms
The effects of proliferation status and cell cycle phase on the responses of single cells to chemotherapy
Foundations of time-frequency analysis
Circannual rhythms in birds
Circadian System Phase, An Aspect of Temporal Morphology; Procedures and Illustrative Examples
Wavos: a Matlab toolkit for wavelet analysis and visualization of oscillatory systems
JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets
Matplotlib: a 2D graphics environment
Hilbert transformer and time delay: statistical comparison in the presence of Gaussian noise
Reciprocal regulation of carbon monoxide metabolism and the circadian clock
Chemical turbulence
Scaling of embryonic patterning based on phase-gradient encoding
Wavelet analysis of circadian and ultradian behavioral rhythms
Persistent cell-autonomous circadian oscillations in fibroblasts revealed by six-week single-cell imaging of PER2::LUC bioluminescence
Extension of a genetic network model by iterative experimentation and mathematical analysis
Least-squares frequency analysis of unequally spaced data
Data structures for statistical computing in Python
Excitability in the p53 network mediates robust signaling with tunable activation thresholds in single cells
Online period estimation and determination of rhythmicity in circadian data, using the BioDare data infrastructure
Period coding of Bmal1 oscillators in the suprachiasmatic nucleus
Circadian rhythms determined by cosine curve fitting: analysis of continuous work and sleep-loss data
Fundamentals of ecology
Dissociation of Per1 and Bmal1 circadian rhythms in the suprachiasmatic nucleus in parallel with behavioral outputs
A threshold selection method from gray-level histograms
On estimation of the wavelet variance
Circadian oscillations in rodents: a systematic increase of their frequency with age
WAVECLOCK: wavelet analysis of circadian oscillation
Performance of different synchronization measures in real data: a case study on electroencephalographic signals
On adjusting the Hodrick-Prescott filter for the frequency of observations
A seven-day cycle in COVID-19 infection and mortality rates: are inter-generational social interactions on the weekends killing susceptible people? (medRxiv)
Plant dormancy in the perennial context
Beyond spikes: multiscale computational analysis of in vivo long-term recordings in the cockroach circadian clock
The Lomb-Scargle periodogram in biological rhythm research: analysis of incomplete and unequally spaced time-series
Smoothing and differentiation of data by simplified least squares procedures
Plasticity of the intrinsic period of the human circadian timing system
Measuring relative coupling strength in circadian systems
The summation of random causes as the source of cyclic processes
The scientist and engineer's guide to digital signal processing
Photobleaching kinetics of fluorescein in quantitative fluorescence microscopy
A Doppler effect in embryonic pattern formation
Quantifying stochastic noise in cultured circadian reporter cells
Least squares analysis of fluorescence data
Detecting rhythms in time series with RAIN
Differential rhythmicity: detecting altered rhythmicity in biological data
Time-scale and time-frequency analyses of irregularly sampled astronomical time series
On generating power law noise
A practical guide to wavelet analysis
Self-organization of embryonic genetic oscillators into spatiotemporal wave patterns
Python 3 Reference Manual. CreateSpace
Quantification of circadian rhythms in single cells
Subharmonics, biphonation, and deterministic chaos in mammal vocalization
MetaCycle: an integrated R package to evaluate periodicity in large scale data
Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation
Strengths and limitations of period estimation methods for circadian data

Acknowledgements

We gratefully thank Bharath Ananthasubramaniam, Hanspeter Herzel, Pedro Pablo Rojas and Shaon Chakrabarti for fruitful discussions and comments on the manuscript. We further thank Jelle Scholtalbers and the GBCS unit at the EMBL in Heidelberg for technical support. We thank members of the Aulehla and Leptin labs for comments, support and helpful advice.