1 Introduction

Cryptocurrency refers to electronic or virtual currency used for online transactions and asset transfers, secured through cryptographic techniques that regulate the creation of new units and safeguard transactions, ensuring secure transfers [3, 18]. Unlike conventional currencies, cryptocurrencies operate on a decentralized framework without central banking systems. Due to their uncontrolled and untraceable nature, the cryptocurrency market has experienced nearly exponential growth [4, 12, 14].

As cryptocurrencies gain recognition as a new electronic option for currency exchange, they have become increasingly important in emerging economies and the global financial landscape [11]. They are now integral to numerous financial transactions, making them a promising investment avenue. However, the market is characterized by high volatility, sharp price fluctuations, and considerable randomness. Predicting cryptocurrency trends is a formidable challenge in time-series forecasting due to the multitude of unpredictable variables, pronounced volatility in pricing, the lack of stationarity, and behavior that is close to a random process [6, 10].

Researchers have proposed various market rules to guide investors in their decisions, using historical price movements and trading volume in the form of technical analysis. This analysis helps forecast the continuation or reversal of market trends by identifying patterns in past price data. Nevertheless, manual trading remains difficult due to market uncertainty, emotional influences on traders [16], and the sheer number of indicators and other financial data [1]. Algorithmic trading, incorporating machine learning techniques, offers a solution. Deep learning techniques, particularly Convolutional Neural Networks (CNN) and long short-term memory (LSTM) layers, have proven effective for financial forecasting in the cryptocurrency market. LSTM layers efficiently capture sequential patterns in both long- and short-term dependencies, whereas convolutional layers serve to eliminate noise from intricate time-series datasets and extract valuable patterns [9].

The literature presents a variety of models aimed at predicting the dynamics of financial markets, often achieving impressive results in price prediction. For instance, Selvin et al. compare classical econometric methods with cutting-edge deep learning techniques, concluding that the latter are superior [15]. Similarly, Zhanhong He et al. study different architectures for predicting gold prices, identifying a combination of CNN, LSTM, and attention mechanisms as the most effective [5]. Zhuorui Zhang et al. develop a robust price prediction regressor using memory and convolutional layers that consider the interrelations between cryptocurrencies [19]. In another study, Ioannis E. Livieris et al. create a CNN-LSTM based neural network to predict gold prices, which shows promising results in regression but falls short in trend classification, with accuracy close to randomness [9]. Livieris et al. also propose a non-sequential CNN-LSTM neural network for price regression and trend classification, again finding trend classification results near randomness [8]. Yanhui Liang et al. take a different approach by decomposing gold prices into various frequencies before feeding them into a CNN-LSTM structure, though they do not test the model in trend classification [7]. Sima Siami-Namini et al. compare Bidirectional Long Short-Term Memory (BiLSTM) and LSTM networks for price prediction, showing that BiLSTM achieves lower errors, though they likewise do not test the model in trend classification [17]. Iromie K. Samarasekara et al. focus on risk management rather than trading, developing a dynamic Stop-Loss tool that uses a CNN-LSTM trend classifier to feed a price regressor; their study showed good results compared to other risk-management tools [13]. Faraz et al. enhance an LSTM price regressor with an LSTM autoencoder, achieving good results in price prediction but not testing it in trend classification [2]. Despite these successes in price prediction, trend prediction often yields poor results.
To address this gap, the current study proposes a novel trend classification method using a robust BiLSTM-CNN model and suggests linear regression for better sample labeling in trend slope regression and classification tasks.

To effectively manage the inherent noise in financial markets, reduce input dimensionality, and eliminate outliers, an encoding structure is applied to the features. This compression process, handled by BiLSTM layers, forces market noise filtering, which is essential when dealing with financial data. Once encoded, the features are fed into a BiLSTM-CNN neural network. The BiLSTM layers track short- and long-term dependencies in both directions of the time series, while the CNN layers capture significant patterns. The model is then trained using labels produced by linear regression over past and future price candles. Trends are identified by their slope, with an ascending slope indicating a bull market and a descending one signaling a bear market. This approach enhances accuracy in identifying and predicting market trends. The point-by-point contributions are listed below:

  • A novel autoencoder architecture to filter market noise based on BiLSTM.

  • A BiLSTM-CNN architecture to handle trend prediction, covering both market classification and slope regression tasks.

  • A novel trend labeling system based on linear regression slope.

The rest of this paper is organized as follows: Sect. 2 reviews related work on financial forecasting, covering both price and trend prediction. Section 3 presents the proposed model. Section 4 discusses the results. Section 5 concludes the paper, followed by references.

2 Related Work

This section aims to discuss several models and strategies proposed by various researchers. These proposals cover a range of architectures as well as methods for pre-processing the data and selecting the variables that feed each model's input layer.

Sreelekshmy Selvin et al. [15] conducted a comparative analysis of three machine learning approaches: Recurrent Neural Network (RNN), Long Short-Term Memory, and Convolutional Neural Network, focusing on their performance in predicting financial market prices. They employ a sliding-window methodology with a window size of 100 min, in which 90 min of historical data are used to forecast the prices of the subsequent 10 min. The models are evaluated using the Mean Absolute Percentage Error (MAPE). Their study revealed that CNN outperformed RNN and LSTM. The superior performance of CNN is attributed to its ability to adapt to the dynamic nature of the market by prioritizing current information over past patterns, which may not always be indicative of future trends. Unlike RNN and LSTM, which rely heavily on historical data, CNN’s emphasis on current information enables it to identify sudden shifts in market trends more effectively. This study underscores the efficacy of CNN in financial forecasting, particularly in scenarios characterized by rapid market changes and volatility.

In the research conducted by Zhanhong He et al. [5], a novel deep learning model for predicting gold prices is introduced, integrating Long Short-Term Memory and Convolutional Neural Networks with an attention mechanism. Their study reveals that arranging the model with the LSTM layer preceding the CNN layer yields superior results compared to the reverse order. Additionally, incorporating an attention mechanism after the LSTM layer and before the CNN layer further enhances predictive accuracy. The author attributes this improvement to the prevention of information loss in the attributes, which would occur if CNN were placed before LSTM. Furthermore, the performance evaluation of the LSTM-CNN hybrid model against CNN and LSTM individually demonstrates a reduction in error, highlighting the efficacy of the combined approach. The author proposes that employing bidirectional LSTM (BiLSTM) could potentially yield even more favorable outcomes, indicating avenues for future research and refinement in forecasting gold prices.

Introducing a new approach to cryptocurrency price prediction, the work of Zhuorui Zhang et al. [19] emphasizes volatility and strong intercorrelations among different cryptocurrencies. Their predictive system comprises three key modules: the attentive memory module, leveraging a Gated Recurrent Unit (GRU) layer and self-attention mechanism to capture both short- and long-term dependencies; the channel weighting module, explicitly modeling interdependencies between cryptocurrency channels to account for their correlated nature; and the convolutional and pooling module, identifying attribute forms crucial for prediction. While the attentive memory module effectively captures dependencies, it struggles to assess the true importance of temporally delayed instances, addressed by the self-attention mechanism, which selects significant information across various time intervals. By explicitly modeling interdependencies between cryptocurrency channels, the channel weighting module proves essential, given the inherent correlations among cryptocurrency prices. The convolutional module further enhances predictive accuracy by identifying attribute forms. Evaluation metrics, including average absolute percentage error and accuracy, demonstrate superior performance of the proposed model over traditional methods like CNN, LSTM, or GRU, indicating its effectiveness in cryptocurrency price prediction.

In the research conducted by Ioannis E. Livieris et al. [9], the focus lies on leveraging convolutional neural networks and long short-term memory neural networks for gold price and movement prediction. The proposed model capitalizes on CNN’s capability to extract crucial information and learn internal representations from time series data, complemented by LSTM’s proficiency in capturing both long- and short-term dependencies. While the study yielded highly accurate results for price prediction with minimal errors, the performance for trend prediction was noted to be average. This observation suggests that while the model effectively captures and predicts the price movements of gold, it may encounter challenges in accurately forecasting the directional trends over time. Further refinements or alternative approaches may be necessary to enhance the model’s efficacy in trend prediction, potentially through additional feature engineering, model adjustments, or the incorporation of complementary techniques.

In another study by Ioannis E. Livieris et al. [8], an advanced non-sequential neural network architecture is proposed to predict cryptocurrency prices. This model capitalizes on the intricate interrelations between different cryptocurrencies by using data from multiple cryptocurrencies as inputs. The research focuses on the three leading cryptocurrencies by market capitalization: Bitcoin, Ethereum, and Ripple. The model is applied to both price regression and trend classification tasks, with its performance evaluated using a range of metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R squared (\(R^2\)) for regression tasks, as well as Accuracy, Geometric Mean, Sensitivity, and Specificity for classification tasks. While the model achieved relatively low errors in regression, its accuracy in trend classification was unfortunately close to randomness.

Yanhui Liang et al. [7] also concentrate on predicting gold prices through a multi-step approach. Initially, they decompose gold prices into distinct frequency components utilizing the Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) algorithm. Subsequently, each of these decomposed components undergoes neural network layers such as CNN and LSTM networks to refine the prediction. The study reveals that decomposing prices into various frequencies significantly enhances prediction accuracy, outperforming alternative decomposition methods. This approach underscores the importance of considering diverse frequency components in price prediction models, highlighting its potential to improve forecasting accuracy in financial markets, particularly for assets like gold characterized by complex and multifaceted price dynamics.

Sima Siami-Namini et al. [17] conducted a comparative study on the performance of LSTM and BiLSTM in price prediction tasks. While BiLSTM is known to excel in tasks like predicting the next word in a sentence, its superiority in time series prediction was uncertain. The study revealed that BiLSTM outperformed LSTM by 37.78% in MSE for price prediction tasks. However, BiLSTM requires more effort to train compared to LSTM. The key advantage of BiLSTM in financial data analysis is its bidirectional training, allowing the model to process information both forward and backward, enhancing its predictive capabilities.

Iromie K. Samarasekara et al. [13] deviate from conventional approaches in financial forecasting by focusing their research on stop-loss dynamics. While traditional studies center on developing price prediction systems to aid buying and selling decisions, this work directs attention toward creating a prediction system to optimize stop-loss placement. The authors employ convolutional neural networks and long short-term memory networks to forecast future trends, utilizing these predictions as inputs for a subsequent price prediction system. Subsequently, both the future trend predictor and price predictor are fed into a regression neural network, also leveraging CNN and LSTM architectures, tasked with determining optimal stop-loss positions. This neural network is trained according to a specific algorithm. Results demonstrate that losses incurred using the proposed strategy are substantially lower than losses experienced with traditional strategies employing fixed stop-loss and trailing stop mechanisms.

Faraz et al. [2] employed an autoencoder in their study to compress information extracted from technical indicators. Both the encoder and decoder components of the autoencoder utilized Long Short-Term Memory cells. Following the encoding process and reduction in input dimensionality, the data was then fed into a neural network consisting of LSTM cells for price prediction. Comparative analysis with the Generative Adversarial Networks (GAN) model demonstrated a reduction in error, indicating the superior performance of the LSTM-based autoencoder approach. This suggests the effectiveness of leveraging LSTM cells within the autoencoder framework for information compression and subsequent price prediction tasks, showcasing its potential for enhancing accuracy in financial forecasting models.

Fig. 1. Pipeline of the model, showing the flow of market information carried out by technical indicators from the preprocessing stage, passing through encoding, and reaching the forecasting neural network.

Fig. 2. Architectural scheme of the neural network used for market trend classification. The model uses three BiLSTM layers, one convolutional layer, and two dense layers.

Fig. 3. Architectural scheme of the neural network used for trend slope regression. The model uses a prior convolutional layer, two BiLSTM layers, one convolutional layer, and two dense layers.

3 Proposed Method

In this section, we present the proposed models and the main concepts behind trend classification and trend slope regression, the two aims of this study.

Figure 1 illustrates the pipeline of the trend prediction models. First, the technical indicators—which contain valuable trend information—go through the normalization process. Then, the normalized data passes through an autoencoder to reduce the dimension of the features, filter out some inherent market noise, and remove outliers. Finally, the data is sent to a BiLSTM-CNN neural network to forecast the trends. In this step, linear regression is used to train the neural net, both for the trend classifier and for the trend slope regressor.

A set of candles is used in the trend forecasters’ training. A candle (candlestick) is a chart element that displays four different prices for a given period: the opening, closing, minimum, and maximum prices. Half of this candle set is composed of past candles, and the other half of future candles. The closing prices of this set of candles are then used to perform a linear regression. For the trend classifier, a positive linear regression slope indicates a bull market, while a negative slope indicates a bear market. For the trend slope regressor, the linear regression slope provides an approximation of the trend’s strength.
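As a concrete sketch of this labeling scheme, the slope of a least-squares line fitted to the window's closing prices can serve directly as the regression target, and its sign as the classification label. The function below is illustrative, not the authors' exact implementation:

```python
import numpy as np

def label_trend(closes, n_candles):
    """Fit a linear regression over a window of closing prices and
    return (slope, label): label 1 = bull (positive slope), 0 = bear.

    `closes` holds n_candles closing prices, half past and half
    future candles relative to the point being labeled. The name and
    signature are illustrative assumptions."""
    t = np.arange(n_candles)
    slope, _intercept = np.polyfit(t, closes[:n_candles], deg=1)
    return slope, int(slope > 0)

# Example: a rising window of 6 closes yields a bull label.
slope, label = label_trend(np.array([100., 101., 103., 102., 104., 106.]), 6)
```

Because the window is half future candles, such labels can only be computed on historical data for training; at inference time the model predicts the label from past information alone.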

The trend classifier’s architecture is depicted in Fig. 2. It is made up of three BiLSTM layers with dropout to prevent overfitting, a convolutional layer, a Max Pooling layer, and two dense layers. BiLSTM layers have the advantage of capturing temporal dependencies both from past to future and from future to past. Additionally, because they come before the CNN layer, they ensure that information crucial to classification is not improperly pruned. The design of the slope regressor is depicted in Fig. 3: a prior convolutional layer, two BiLSTM layers, another convolutional layer, a Max Pooling layer, and two dense layers make up this structure.
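A minimal Keras sketch of an architecture in the spirit of Fig. 2 is given below; the layer widths, dropout rate, and input shape are illustrative assumptions, not the paper's reported hyperparameters:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_trend_classifier(timesteps=32, n_features=8):
    """Sketch of a Fig. 2-style trend classifier (sizes are assumptions)."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        # Three BiLSTM layers with dropout to limit overfitting.
        layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.2)),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.2)),
        layers.Bidirectional(layers.LSTM(32, return_sequences=True, dropout=0.2)),
        # Convolution + max pooling extract local patterns.
        layers.Conv1D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # bull (1) vs. bear (0)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Placing the BiLSTM stack before the convolution mirrors the ordering rationale above: the recurrent layers see the full sequence before any pooling can discard information.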

3.1 Normalization

Neural networks are sensitive to the scale of the input data, so learning may be dominated by features with wider numerical ranges if normalization is not applied. This could lead to the model skewing in favor of some features and possibly ignoring others. In that regard, normalization was used to stabilize the training process by scaling the data to a common range, \([0,1]\), as given in Eq. 1. This accelerates the convergence of the gradient descent technique and increases its efficiency. Additionally, the uniform scaling of normalization helps to achieve better and more balanced weight adjustments during training, which improves the model’s performance and generalization on unseen data. It also prevents numerical instability.

$$\begin{aligned} Norm(x) = \frac{x - \min (X)}{\max (X) - \min (X)} \end{aligned}$$
(1)

In Eq. 1, x represents the value of a sample for a particular feature, \(\min (X)\) represents the minimum value that feature takes across the dataset, whereas \(\max (X)\) represents its maximum value.
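Equation 1 can be applied column-wise to a feature matrix, as in this straightforward NumPy sketch:

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise min-max scaling of a (samples, features) array,
    applying Eq. 1 feature by feature."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    maxs = X.max(axis=0)
    return (X - mins) / (maxs - mins)

X = np.array([[1., 10.], [2., 20.], [3., 30.]])
X_norm = min_max_normalize(X)  # each column now spans [0, 1]
```

In practice the minima and maxima should be computed on the training split only and reused for the test split, so that no future information leaks into training.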

3.2 Autoencoder

An autoencoder is a kind of artificial neural network designed to automatically learn an effective coding of input data without supervision. It employs two components to learn a compressed representation of the data: an encoder that codifies the information using the function \(h = f(x)\), and a decoder that tries to reconstruct the encoder’s input at its output, \(x' = g(h)\).

The autoencoder was built with BiLSTM layers, with the goals of reducing dimensionality and filtering out outliers and market noise. The encoder and decoder components contain an equal number of BiLSTM layers, each with a built-in dropout mechanism that keeps the model from overfitting.

3.3 Convolutional Neural Networks

Convolutional neural networks (CNNs) are engineered to autonomously discern and internalize intricate patterns from single- or multi-dimensional datasets. The distinctive feature of CNNs lies in their convolutional layer, where convolutional operations are applied, enabling the identification of patterns such as edges and shapes. The application of CNNs in financial time series analysis offers significant potential for generating valuable insights into market forecasting, identifying investment opportunities, and recognizing price patterns. These networks excel in handling complex sequential data and extracting pertinent information, rendering them indispensable tools in financial analytics. Unlike densely connected neural networks, CNNs leverage weight sharing in convolutional layers, enhancing scalability and enabling the network to learn specific types of patterns more efficiently. This characteristic contributes to the adaptability and effectiveness of CNNs in capturing and interpreting intricate patterns inherent in financial data, empowering traders and analysts with enhanced decision-making capabilities in dynamic market environments.

$$\begin{aligned} (f * g)(x) = \sum _{k=-\infty }^{\infty } f(k)g(x-k) \end{aligned}$$
(2)

Equation 2 shows the convolution operation applied in one dimension, where f is the input function, g is the convolution filter (kernel), and \((*)\) denotes the convolution operation.
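A direct, if naive, implementation of the finite-support form of Eq. 2 makes the operation concrete (NumPy's `np.convolve` computes the same result):

```python
import numpy as np

def conv1d(f, g):
    """Discrete 1-D convolution: out[x] = sum_k f(k) g(x - k),
    the finite-support form of Eq. 2."""
    out = np.zeros(len(f) + len(g) - 1)
    for x in range(len(out)):
        for k in range(len(f)):
            if 0 <= x - k < len(g):
                out[x] += f[k] * g[x - k]
    return out

signal = np.array([1., 2., 3., 4.])
kernel = np.array([0.5, 0.5])       # simple smoothing filter
smoothed = conv1d(signal, kernel)   # same values as np.convolve(signal, kernel)
```

In a convolutional layer the kernel values are not fixed like this smoothing filter; they are learned during training, which is what allows the layer to discover price patterns rather than merely smooth the series.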

3.4 LSTM Neural Networks

Long Short-Term Memory networks (LSTMs) represent a specialized variant of recurrent neural networks (RNNs) designed specifically for processing sequential information. LSTMs incorporate three essential gates for managing and processing sequential data. The first, termed the “forget gate,” regulates the extent to which information is discarded from the long-term memory, enabling the network to forget irrelevant or outdated information. The second, known as the “input gate,” receives new data and assesses its significance in the context of previous information, determining what information to retain and what to discard. This gate plays a crucial role in updating the network’s internal state based on incoming data. Finally, the third, the “output gate,” determines which information stored in the long-term memory should be utilized to predict the current output, effectively filtering and selecting relevant memory components for the current context. These three gates interact dynamically, allowing LSTMs to capture and preserve long-term temporal dependencies within sequential data (of which financial price series are an example) while mitigating issues such as vanishing or exploding gradients, which are common in traditional RNN architectures.

$$\begin{aligned} i_t = \sigma _g (W_i[x_t,h_{t-1}] + b_i) \end{aligned}$$
(3)
$$\begin{aligned} f_t = \sigma _g(W_f[x_t,h_{t-1}] + b_f) \end{aligned}$$
(4)
$$\begin{aligned} o_t = \sigma _g(W_o[x_t,h_{t-1}] + b_o) \end{aligned}$$
(5)
$$\begin{aligned} c_t = f_t \odot c_{t-1} + i_t \odot \sigma _c(W_c[x_t,h_{t-1}]+b_c) \end{aligned}$$
(6)
$$\begin{aligned} h_t = o_t \odot \sigma _c(c_t) \end{aligned}$$
(7)

In the equations above, \(i_t\), \(f_t\) and \(o_t\) represent, respectively, the input, forget and output gates at time step t; W and b represent, respectively, the weights and biases of the LSTM cell (note that each gate has its own weights and biases); \(x_t\) is the input of the cell, and \(h_{t-1}\) the hidden state at the previous time step. The operation \([y, z]\) represents concatenation. \(\sigma _g\) is the sigmoid function, whereas \(\sigma _c\) is the hyperbolic tangent function. Thus, the memory \(c_t\) is formed by taking the value of the previous memory \(c_{t-1}\) weighted by the forget gate \(f_t\), added to the new information given by \(\sigma _c(W_c[x_t,h_{t-1}]+b_c)\) weighted by the input gate \(i_t\). The symbol \(\odot \) represents the Hadamard (element-wise) product. Finally, the current hidden state is formed by the memory \(c_t\) transformed by the function \(\sigma _c\) and weighted by the output gate \(o_t\).
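Equations 3 to 7 translate almost line by line into code. The sketch below performs a single LSTM time step with randomly initialized weights; the dictionary layout and names are illustrative, not a library API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. 3-7. W and b hold the
    per-gate weights W_i, W_f, W_o, W_c and their biases."""
    z = np.concatenate([x_t, h_prev])            # [x_t, h_{t-1}]
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate   (Eq. 3)
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate  (Eq. 4)
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate  (Eq. 5)
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ z + b["c"])   # Eq. 6
    h_t = o_t * np.tanh(c_t)                     # hidden state (Eq. 7)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in + n_hid)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Note how each gate in the code uses its own weight matrix and bias, matching the per-gate parameters in the equations; element-wise `*` plays the role of \(\odot \).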

4 Experiments and Results

This section assesses the effectiveness of the proposed market trend classifier and trend slope regressor models under various parameter values. Accuracy, precision, and recall are used to assess the market trend classifier’s performance, while the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) are used to evaluate the trend slope regressor.

Table 1. Metrics for the Market Trend Classifier
Table 2. Metrics for the Trend Slope Regressor

In this study, Bitcoin data was used to train and test the trend predictors. Bitcoin was chosen because it is the first and most widely recognized cryptocurrency and holds significant representation in the market, often serving as the benchmark for other digital assets. It commands the largest market capitalization, influencing market trends and investor sentiment. Bitcoin’s dominance also reflects its role as a store of value and a leading indicator of the cryptocurrency sector’s health and maturity. That said, the dataset consists of the last 30,000 candles of the BTCUSDT pair (Bitcoin priced in US dollars) in the period between January 01, 2022, and January 01, 2024. The opening, high, low, close, and volume of each candle were used to calculate technical indicators.

The technical indicators were: Exponential Moving Average (EMA), Weighted Moving Average (WMA), Moving Average Convergence Divergence (MACD), Momentum (MOM), Commodity Channel Index (CCI), Detrended Price Oscillator (DPO), Relative Strength Index (RSI), Average True Range (ATR), Stochastic %K, and Stochastic %D.
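As an example of how such indicators are derived from the candle data, the EMA below uses the conventional smoothing factor \(\alpha = 2/(n+1)\) for an n-period average; the other indicators follow similar recurrences over the open, high, low, close, and volume series:

```python
import numpy as np

def ema(prices, period):
    """Exponential Moving Average with smoothing factor
    alpha = 2 / (period + 1), seeded with the first price."""
    alpha = 2.0 / (period + 1)
    out = np.empty(len(prices))
    out[0] = prices[0]
    for t in range(1, len(prices)):
        out[t] = alpha * prices[t] + (1 - alpha) * out[t - 1]
    return out

closes = np.array([100., 102., 101., 105., 107.])
ema3 = ema(closes, period=3)  # [100., 101., 101., 103., 105.]
```

In practice a technical-analysis library (e.g. TA-Lib) would typically be used instead of hand-rolled recurrences; this sketch only illustrates the computation.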

The results for training and testing of the market trend classifier are shown in Table 1. The number of candles taken for the linear regression calculation was varied from two to twelve, where half of the candles are future candles and half are past candles. The greater the number of candles, the more consolidated the trend. It is clear from the data that market forecasting is possible, given that if the market were entirely random, the accuracy of all models would be near 50%. Furthermore, as shown in the table, the models trained with more candles perform better than the models trained with fewer candles, which suggests that market forecasting responds better to more consolidated trends.

The results for training and testing of the trend slope regressor are shown in Table 2 and in Fig. 4. Visually, the graphs show, as in the case of the market trend classifier, that the greater the number of candles used in the linear regression, the more closely the curves of true and predicted values match. This phenomenon can be explained by the higher volatility of a linear regression calculated with fewer candles. Nevertheless, Table 2 shows larger errors as the number of candles increases. The smallest errors come from the case in which only 2 candles were used for the linear regression calculation, yet the figures clearly show that this was the worst scenario. This occurs because the model can get stuck in local minima during the gradient descent stage of training, delivering small error values even though the overall predicted curve does not match the real curve.

Fig. 4. Trend slope over time, for the trend slope predictor built with (a) 2 candles, (b) 4 candles, (c) 6 candles, (d) 8 candles, (e) 10 candles, and (f) 12 candles. The curves in purple show the true values of the slope, whereas the curves in blue show the predicted values. The slopes are shown for train and test samples (Color figure online)

5 Conclusion

This paper presents a novel strategy for training financial predictors using linear regression, focusing on cryptocurrencies, especially Bitcoin, due to its market significance and trend forecasting complexities. The proposed architecture employs Bidirectional Long Short-Term Memory (BiLSTM) and convolutional layers to capture spatial and temporal patterns in the data. Unlike traditional methods, this approach uses linear regression over a variable set of 2 to 12 candles (half future and half past candles) to better represent market trends.

The trend classifier shows promising potential, achieving significantly higher accuracy than the 50% random benchmark. Increasing the number of candles in the linear regression set enhances accuracy by providing more consolidated trends, which are easier to forecast. Consolidated trends exhibit stable patterns, while less consolidated ones are more volatile and random. Thus, using more candles in the linear regression set leads to more reliable predictions by effectively capturing market dynamics. The trend slope regressor shows discrepancies between predicted trend slopes and error metrics, with better matches for more consolidated trends.

Future research could explore a broader set of indicators and the maximum number of candles predictors can handle before accuracy degrades. Such advancements could enhance the model’s predictive capabilities. The trend classification and slope regression techniques in this study offer valuable tools for investors, enabling precise identification of trend reversals and continuations, potentially leading to greater profits.