1 Introduction

A stock market is where companies can raise funds for their business by selling fractional shares. Many individuals buy and sell these shares regularly to generate profits, a strategy commonly referred to as day trading.

Day trading involves purchasing and selling shares on the same day to profit from price differences. Traders who use this strategy often rely on statistical models and candlestick patterns to predict future prices, even though the market may be unpredictable [6].

In contrast to traders who use statistical models or candlesticks, some investors perform fundamental analysis. These individuals base their actions on various factors within the company, such as its profit, market value, return on equity, and other indicators.

According to Vijh et al. [24], the most challenging aspect of predicting stock prices is the influence of external factors, such as the global economy and political conditions. Despite this challenge, previous works in this field often employ classical algorithms, such as Linear Regression, Random Walk Theory [11], Moving Average Convergence/Divergence [2], Autoregressive Moving Average [22], and Autoregressive Integrated Moving Average [3] to predict stock prices.

Traditional machine learning techniques, such as Support Vector Machines (SVMs) [14] and Random Forests [19], can be used to enhance stock market prediction. Additionally, many researchers are exploring artificial neural networks (ANNs) to predict stock prices and automate trading for passive income. For example, Lu et al. [17] and Singh and Srivastava [21] utilize ANNs for this purpose. Currently, certain types of ANNs, such as Recurrent Neural Networks (RNNs) [8], are showing promising results. These networks analyze current data and use previous data to make a single prediction.

In [9], the authors demonstrated that deep learning models, specifically multi-layer perceptron, recurrent (RNN), and convolutional networks, can be more effective than traditional statistical methods, like ARIMA, for stock prediction. The authors trained the models on data from a single company listed on India’s National Stock Exchange (NSE) and evaluated the approach on five stocks listed on the New York Stock Exchange (NYSE).

Many works in the literature explore machine learning using sequential models for forecasting stock prices [8, 9], often incorporating filters such as moving averages to smooth time series data. Some studies employ convolutional networks that learn to process signals via filters, yet these networks tend to neglect temporal information. This observation steered us towards a research question: Can the integration of convolutional and recurrent LSTM blocks within a single architecture bolster the results?

Simultaneously, this work aims to probe a further research question: What is the impact of training and testing with different data distributions? A majority of the existing studies prioritize predicting stock trends from a more uniform data distribution, often concentrating on the same stock market. To heighten the level of challenge, we ran experiments to predict from different distributions: specifically, training with data drawn from the Indian stock market and testing the predictions using data from the New York Stock Exchange.

Our experiments indicate a substantial improvement and strongly suggest that pre-processing data through Convolutional Neural Networks (CNNs) benefits LSTM blocks for the task in question. This approach underscores the potential efficacy of incorporating convolutional layers in sequence prediction tasks, thereby contributing to the broader conversation around stock market prediction methodologies.

This manuscript is organized as follows. Section 2 reviews similar papers by other researchers and their reported results. Section 3 presents the data and models used and how they were built. Section 4 addresses the experiments performed and compares the results obtained with the literature results of Hiransha et al. [9]. Finally, conclusions are presented in Sect. 5.

2 Related Works

In this section, we present several works that use machine learning and deep learning for stock prediction, along with the kind of data used in each.

Stock prediction is a challenging and popular research area that, if successful, can yield profitable results. Jiang [12] presents several papers on this topic, where most utilize standard data (open, close, high, low, and volume) to train models and compare their performance with real-market results. For example, Vijh et al. [24] collected ten years of data on five USA-based companies using Yahoo Finance and trained a low-complexity ANN model, which achieved an RMSE of 0.42 and a MAPE of 0.77% in the best result.

In addition to predicting stock prices, models can be built to determine when to buy or sell stocks, as demonstrated by Santuci et al. [20], who compared SVM and Random Forest models to predict such events. In the best case, their work yielded a profit of 17.74%.

Another related topic is the accuracy of different models in predicting stock prices, demonstrated by Huang et al. [10] using two models: a Feed-forward Neural Network and an Adaptive Neural Fuzzy Inference System.

Pang et al. [18] used two different types of LSTM to make predictions based on Shanghai A-share data, achieving an MSE of 0.017 in the best test. Many external factors can influence the stock price, and Jin et al. [13] utilized an external indicator called sentiment to indicate how investors feel about the economy or a specific stock. Das et al. [4] also used sentiment data extracted from tweets on Twitter to predict stock prices. These papers show that the market responds to people’s actions, and using this information can better represent the market than numbers.

Agrawal et al. [1] utilized Stock Technical Indicators to predict not the stock price but the trend of the price, i.e., whether the price will increase or decrease. Their work achieved an accuracy of 63.59% in the best case.

Despite using different datasets, the works presented in this section share the objective of stock price prediction. They were selected due to their use of traditional machine learning and of different neural network architectures applied to the stock price prediction context.

3 Proposed Methodology

This section describes how the research was conducted and defines the specific models. Subsection 3.1 describes the data used and how it was collected. Subsection 3.2 explains the baseline technique used for comparison as well as the proposed architecture, which combines deep learning architectures. The metrics used to evaluate the proposed approaches are described in Subsect. 3.3.

3.1 Dataset

The first step is data collection by downloading the stock information from the website infomoney.com. The collected raw data contains:

  • Date: The day to which the data refer. This field is a string with a three-letter month abbreviation, followed by the day and year.

  • High: The highest share price on a specific day. This data is a float number.

  • Close: The share price when the stock market closes on a specific day. This data is a float number too.

  • Open: The share price when the stock market opens on a specific day. This also is a float number.

  • Low: The lowest share price on a specific day. This is also a float number.

  • Volume: The number of transactions that happened on that day. This field is an integer number, but volumes in the thousands are represented as the number divided by 1,000 followed by the letter “K”, and volumes of one million or more as the number divided by 1,000,000 followed by the letter “M”.

  • Change: The difference in price, in percentage, on a specific day. This field was not used by the models but is presented here because it was collected along with the others.

This data is a time series, so previous samples directly influence the following ones. The close price rarely moves from a low value to a high one (or vice versa) without the other fields also changing drastically.

Based on Hiransha et al. [9], we chose Tata Motors Ltd. (ticker TAMO) stock as training data, used Axis Bank (AXBK), Maruti (MRTI), and HCL (HCLT) for testing, and took the close price as the value to be predicted. Table 1 shows the datasets used in this work and the periods collected. The data was collected respecting the date range used in [9] in order to compare the results directly.

Table 1. Data collected of the stocks.

The second step consists of preprocessing and cleaning the data, since the raw data is not in the format a machine learning model requires. The objective is to delete null entries and convert the fields to float numbers. We removed the “date” and “change” columns as unnecessary: the “date” column is essentially an index, and the “change” column is a linear combination of the other fields. The letters “K” and “M” in the volume field are expanded to “000” and “000000”, respectively; for example, the string “14K” becomes 14,000 and “2M” becomes 2,000,000. Finally, before training, frames with a window of several days are mounted for training and testing according to the needs of each machine learning model.
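The volume expansion and windowing steps above can be sketched as follows. This is an illustrative helper under the conventions described in the text; the function names `parse_volume` and `make_windows` are our own, not from the original implementation.

```python
import numpy as np

def parse_volume(raw):
    """Expand volume strings such as '14K' or '2M' into plain integers,
    as described in the preprocessing step."""
    raw = raw.strip().upper()
    if raw.endswith("K"):
        return int(float(raw[:-1]) * 1_000)
    if raw.endswith("M"):
        return int(float(raw[:-1]) * 1_000_000)
    return int(float(raw))

def make_windows(series, window):
    """Mount frames: each sample covers `window` consecutive days and the
    target is the value of the day immediately after the window."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)
```

For instance, `parse_volume("14K")` yields 14,000, and `make_windows` applied to a series of 10 days with a window of 3 produces 7 training frames.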

Finally, the data needs to be normalized due to the large differences in feature scales. By normalizing the data, we ensure that all features have the same impact on training. The normalization strategy adopted was min-max scaling, which can be defined as:

$$\begin{aligned} x_{norm} = \frac{(x - x_{min})}{x_{max} -x_{min}}, \end{aligned}$$
(1)

where x is a data value, \(x_{max}\) is the largest value in the dataset, \(x_{min}\) is the smallest value in the dataset, and \(x_{norm}\) is the normalized value that replaces x.
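Equation (1) can be implemented directly; a minimal NumPy sketch (equivalent to scikit-learn's `MinMaxScaler`, applied column-wise) is shown below. The function name is ours.

```python
import numpy as np

def min_max_normalize(x):
    """Apply Eq. (1) column-wise: scale each feature to the [0, 1] range."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

data = np.array([[1.0, 10.0],
                 [3.0, 20.0],
                 [2.0, 15.0]])
norm = min_max_normalize(data)
```

In practice the scaling parameters should be computed on the training data only and reused for the test data, so that no information leaks from the test set.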

3.2 Model Definition and Training

Two strategies are evaluated: one based on a traditional machine learning technique and another based on deep learning architectures.

The first model is based on Support Vector Regression (SVR) [16]. SVR works like the SVM (Support Vector Machine), but instead of classifying a point according to its side of the dividing hyperplane, it uses the distance between the point and the hyperplane to produce a continuous output. In this work, the SVR model uses information from a single day to predict the close price of the next day.
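A minimal sketch of this baseline with scikit-learn is shown below. The kernel and regularization parameter follow the configuration reported in Sect. 4; the arrays are random placeholders standing in for one day of normalized features (open, high, low, close, volume) and the next day's close price.

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder data: 200 days, 5 features per day, one target per day.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.random(200)

# RBF kernel with C = 1, as described in the experiments section.
model = SVR(kernel="rbf", C=1.0)
model.fit(X, y)
pred = model.predict(X[-1:])  # predicted close price for the next day
```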

In trading, recent data is just as important as the current data, since the current stock value depends on the previous ones. RNNs work well here because they use both the current data and previous data, so the input is a frame spanning several days, including the day right before the one to be predicted. One example of an RNN is the long short-term memory (LSTM) architecture [7]. We ran a sequence of empirical tests with different numbers of layers to compare LSTM architectures and find the best fit for the problem at hand. We fixed 75 cells in each layer, a dropout of 20% (to avoid overfitting [23]), and a final dense layer with one node for the regression.
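A Keras sketch of this family of models is given below, assuming the 12-day window and five input features described elsewhere in the paper; the builder function name is ours, and the exact number of layers is the quantity varied in the empirical study.

```python
import tensorflow as tf

def build_lstm(n_layers, window=12, n_features=5, units=75, dropout=0.2):
    """Stack `n_layers` LSTM layers of 75 cells each, interleaved with
    20% dropout, ending in a single-node dense layer for regression."""
    model = tf.keras.Sequential()
    for i in range(n_layers):
        kwargs = {"input_shape": (window, n_features)} if i == 0 else {}
        # All but the last LSTM layer must emit the full sequence so the
        # next LSTM layer receives one vector per time step.
        model.add(tf.keras.layers.LSTM(
            units, return_sequences=(i < n_layers - 1), **kwargs))
        model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Dense(1))
    return model

model = build_lstm(n_layers=6)  # the best depth found in Sect. 4.2
```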

The LSTM model learns the time series directly. To extract more features from the time series, a Convolutional Neural Network (CNN) [15] is applied: instead of learning from the raw data, the LSTM learns from the more complex features produced by the convolutional layers, a strategy observed in other works that use 1D signals [5]. The CNN-LSTM model was constructed from the best-performing LSTM model by adding a one-dimensional convolutional layer with 32 filters and ReLU activation, followed by a max-pooling layer of size 4. The values for the CNN and max-pooling layers were determined empirically.
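A sketch of this combined architecture in Keras follows. The filter count, kernel size of 8 (from the Fig. 8 caption), pooling size, and LSTM stack follow the text; the `same` padding is our assumption so that the 12-day window is preserved before pooling.

```python
import tensorflow as tf

# Convolutional front-end: 32 filters, kernel size 8, ReLU, then
# max-pooling of size 4 (12 time steps -> 3 after pooling).
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, kernel_size=8, activation="relu",
                           padding="same", input_shape=(12, 5)),
    tf.keras.layers.MaxPooling1D(pool_size=4),
])

# Best-performing LSTM stack from Sect. 4.2: six layers of 75 units,
# each followed by 20% dropout, then a single-node regression head.
for i in range(6):
    model.add(tf.keras.layers.LSTM(75, return_sequences=(i < 5)))
    model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(1))
```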

3.3 Evaluation Metrics

To evaluate the effectiveness of the proposed approaches, we used three evaluation metrics, all available in the scikit-learn library. The first is the root mean square error (RMSE), which squares the errors (making the metric positive and emphasizing large outlier errors), averages them, and takes the square root to return to the original scale. The metric is defined as

$$\begin{aligned} RMSE = \sqrt{\frac{\sum {(y_i - y_p)^2}}{n}}, \end{aligned}$$
(2)

where \(y_i\) represents the true value, \(y_p\) is the value predicted by the model, and n is the number of predictions made in the tests.

The second is the mean absolute error (MAE), the mean of the absolute errors, represented by the formula

$$\begin{aligned} MAE = \frac{\sum |(y_i - y_p)|}{n}, \end{aligned}$$
(3)

in which \(y_i\), \(y_p\) and n are the same as defined for the RMSE metric.

The last metric used was the mean absolute percentage error (MAPE), represented by the formula

$$\begin{aligned} MAPE = \frac{\sum {|y_i - y_p|/|y_i|}}{n} \times 100\% \end{aligned}$$
(4)

in which \(y_i\), \(y_p\) and n are the same as defined for the RMSE and MAE metrics.
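The three metrics can be computed with scikit-learn as sketched below, on toy true/predicted close prices of our own invention. Note that scikit-learn's `mean_absolute_percentage_error` divides by the true values and returns a fraction, so we multiply by 100 to report a percentage as in Eq. (4).

```python
import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error)

# Toy true and predicted close prices for illustration only.
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 101.0, 103.0, 104.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))          # Eq. (2)
mae = mean_absolute_error(y_true, y_pred)                   # Eq. (3)
mape = mean_absolute_percentage_error(y_true, y_pred) * 100  # Eq. (4), in %
```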

4 Experiments and Results

This section presents, explains, and discusses the results. All training and testing were performed on a computer with 128 GB RAM, a Titan X with 12 GB VRAM, and an Intel(R) Core(TM) i7-5820K CPU @ 3.30 GHz. The frameworks used were TensorFlow 2 and scikit-learn 1.1 for Python 3.

We aim to answer the two research questions of this work: (1) Can integrating convolutional and recurrent LSTM blocks within a single architecture enhance prediction accuracy? and (2) What is the impact of training and testing with disparate data distributions? To answer the second question, we first propose baseline results using the SVR method. To answer the first question, we first evaluate a proper LSTM architecture and then add a CNN layer.

All the models were trained for 10,000 epochs. The SVR model has an RBF kernel and a regularization parameter of 1. The LSTM and CNN-LSTM models were trained with a batch size of 2,048, the Adam optimizer, and mean squared error as the loss. The models were trained with the TAMO dataset and tested using the HCLT, MRTI, and AXBK datasets, following the pattern of the work carried out by Hiransha et al. [9].
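The deep-learning training setup above can be sketched in Keras as follows. This is a minimal runnable illustration, not the actual experiment: the data is a random placeholder, the model is deliberately tiny, and we run only 2 epochs with a small batch, whereas the real runs used 10,000 epochs and a batch size of 2,048.

```python
import numpy as np
import tensorflow as tf

# Placeholder windows: 64 samples of 12 days x 5 features, one target each.
X_train = np.random.random((64, 12, 5)).astype("float32")
y_train = np.random.random((64, 1)).astype("float32")

# Tiny stand-in model; the real architectures are defined in Sect. 3.2.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, input_shape=(12, 5)),
    tf.keras.layers.Dense(1),
])

# Adam optimizer and MSE loss, as in the text.
model.compile(optimizer="adam", loss="mean_squared_error")
history = model.fit(X_train, y_train, epochs=2, batch_size=32, verbose=0)
```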

4.1 SVR Model

The first model evaluated was the SVR. In all three datasets tested, there is a gap between the predicted close price and the real one, as can be seen in Figs. 1, 2 and 3 and in the errors measured in Table 2.

Fig. 1.
figure 1

Predictions made by SVR model for the Axis Bank dataset. Time in days.

Fig. 2.
figure 2

Predictions made by SVR model for the HCL Technologies dataset. Time in days.

Table 2. Metrics of result in SVR model

It is possible to observe that the predicted close prices from the SVR form a smoother line than the real ones. As the SVR only uses today's data to predict tomorrow, information from the previous days is not used, and the model cannot follow the real close price. The greatest gap is observed in the HCL Technologies dataset, which has a larger variance since its range of values is greater.

Fig. 3.
figure 3

Predictions made by SVR model for the Maruti dataset. Time in days.

4.2 LSTM Model

We conducted an empirical study to define the LSTM architecture that achieves the lowest MAPE on the TAMO dataset. Different models were evaluated by varying the number of layers, and the model with the best results on the TAMO data was chosen.

As shown in Table 3, the model with the best result has six layers. All these layers were intercalated with dropout layers of 20% to minimize the chance of overfitting. The best LSTM architecture can be seen in Fig. 4.

Fig. 4.
figure 4

LSTM architecture with six layers of 75 units each.

As seen in Figs. 5, 6 and 7, the LSTM model predicts stock values much closer to the real ones than the SVR model. This is reflected in Table 4, where the reported errors are below those in Table 2. A MAPE below three shows that the predicted close price deviates from the real one by less than 3% on average.

Fig. 5.
figure 5

Predictions made by LSTM model for the Axis Bank dataset. Time in days.

Fig. 6.
figure 6

Predictions made by LSTM model for the HCL Technologies dataset. Time in days.

Table 3. Reported metrics of an empirical study with LSTM models. The model with the smallest MAPE, RMSE, and MAE is the one with six layers.
Fig. 7.
figure 7

Predictions made by LSTM model for the Maruti dataset. Time in days.

Table 4. Metrics of result in LSTM model

Compared to the SVR model, the LSTM model predicts close prices near the real ones. One of the reasons is the number of days used by each model: while the SVR uses only one day, the LSTM uses 12 days. As with the SVR model, HCL Technologies presented the largest gap. The LSTM model can follow the real close-price curve when the variance is small; when the variance is high, however, the model cannot follow the real curve and exhibits a smoother behavior.

4.3 CNN-LSTM Model

The last model built combines CNN and LSTM. It has two 1D convolutional layers and was defined through empirical tests, stacking the convolutional layers before the six-layer LSTM with the dropout defined in the experiments of Table 3. The CNN-LSTM architecture is shown in Fig. 8.

Fig. 8.
figure 8

CNN-LSTM architecture with a convolutional layer with 32 filters, kernel size equal to 8, and ReLU activation, followed by six LSTM layers with 75 units each.

As can be seen in Figs. 9, 10 and 11, the line representing the prediction can hardly be distinguished from the real one, which means that the predicted stock price is very close to the actual value. Some parts of the plots show peaks, but these do not affect the overall result. The errors can be seen in Table 5.

Table 5. Metrics of result in CNN-LSTM model

As Table 5 shows, a smaller MAPE is obtained than the one reported by Hiransha et al. [9]: while Hiransha et al. [9] reached a MAPE of 4.88, the proposed CNN-LSTM model reached 1.83.

As can be seen in Fig. 12, on a large scale all the models follow the trend of the stock market, but in a zoomed view of the predictions the models are not accurate. We improved the curve accuracy by adding the CNN to the LSTM model, which forced the curve to stay smoother, following the close price. Therefore, the model with the CNN succeeds in extracting features from the data and improving the results.

Fig. 9.
figure 9

Predictions made by CNN-LSTM model in Axis Bank dataset. Time in days.

Fig. 10.
figure 10

Predictions made by CNN-LSTM model in HCL Technologies dataset. Time in days.

4.4 Results Discussion

As seen in Table 6, the proposed approach outperforms the baseline model of Hiransha et al. [9]. The results suggest that it may be possible to build a model that precisely predicts prices in the near future. Despite the reported results, however, it is important to observe that the market fluctuates with external factors, which makes relying on the model quite risky.

Fig. 11.
figure 11

Predictions made by CNN-LSTM model in Maruti dataset. Time in days.

Fig. 12.
figure 12

Prediction made by all models with a zoomed view of the last 100 days predicted in MRTI dataset.

Table 6. MAPE of all models trained. AXBK = Axis Bank; HCLT = HCL; MRTI = Maruti.

As Table 6 shows, the CNN-LSTM model achieves the best results. The smaller errors are likely due to the larger number of days used by the CNN-LSTM and to the data processing performed by the convolutional layers. Gathering more data with different prices and fluctuations could yield a more general model.

Comparing the much higher error of the SVR with those of the LSTM and CNN-LSTM, it is clear how using only one day of data can be detrimental to the algorithm's performance: long-term behavior is not taken into account, which tends to produce results inconsistent with the expected ones.

5 Conclusion and Future Works

In this work, we analyzed how models behave over long and short periods when predicting close stock prices, a challenging task. Three different architectures were evaluated: a traditional machine learning algorithm (SVR) and two deep learning architectures (LSTM and CNN-LSTM).

As can be seen, adding a CNN layer to the LSTM model increased the performance by 0.75% in the best case. This finding answers the primary research question addressed in this study. The results also address the second research question, since the models could generalize to a dataset different from the one they were trained on.

As the input is a time series, the LSTM model outperforms the SVR. Another observation is that convolutional layers can extract features from tabular data and improve the results when added to the LSTM model.

In future work, two research paths to reduce the error will be explored: (1) an ensemble model combining the responses of several models, such as CNN, RNN, LSTM, and transformers, and (2) gathering external information about the companies to predict the close price, such as sentiment analysis of news about the companies and other negative or positive aspects of each company.