key: cord-0987990-apxd1wru
authors: Mudassir, Mohammed; Bennbaia, Shada; Unal, Devrim; Hammoudeh, Mohammad
title: Time-series forecasting of Bitcoin prices using high-dimensional features: a machine learning approach
date: 2020-07-04
journal: Neural Comput Appl
DOI: 10.1007/s00521-020-05129-6
sha: bb8dd6d45462671d99b345ba5744c6f1859f01b1
doc_id: 987990
cord_uid: apxd1wru

Bitcoin is a decentralized cryptocurrency, a type of digital asset that provides the basis for peer-to-peer financial transactions based on blockchain technology. One of the main problems with decentralized cryptocurrencies is price volatility, which indicates the need for studying the underlying price model. Moreover, Bitcoin prices exhibit non-stationary behavior, where the statistical distribution of data changes over time. This paper demonstrates high-performance machine learning-based classification and regression models for predicting Bitcoin price movements and prices in the short and medium terms. In previous works, machine learning-based classification has been studied only for a one-day time frame, while this work goes beyond that by using machine learning-based models for one, seven, thirty and ninety days. The developed models are feasible and have high performance, with the classification models scoring up to 65% accuracy for the next-day forecast and from 62 to 64% accuracy for the seventh- to ninetieth-day forecasts. For the daily price forecast, the error percentage is as low as 1.44%, while it varies from 2.88 to 4.10% for horizons of seven to ninety days. These results indicate that the presented models outperform the existing models in the literature.

Digital transformation is the most serious disruption now taking place in all economies and financial systems. The economies and financial systems of the world are becoming digital at an unprecedented pace.
According to a recent report, the size of the digital economy in 2025 is estimated to be 25% (23 trillion USD), consisting of tangible and intangible digital assets [1]. The most recent technology for establishing and spending digital assets is distributed ledger technology (DLT), and its best-known application is the cryptocurrency named Bitcoin [2]. Following these developments, blockchain technology has found its place at the intersection of Fintech and next-generation networks [3]. An important issue with non-tangible digital assets, and especially cryptocurrencies, is price volatility. The price of Bitcoin (BTC) for the period of April 1, 2013, to December 31, 2019, can be seen in Fig. 1. BTC prices exhibited extreme volatility in this period. The price increased by 1900% in 2017 and subsequently lost 72% of its value in 2018 [4]. Prior to 2013, the popular interest in BTC, its usage in virtual transactions and its prices were low. That period is not considered in our models. Although BTC prices exhibit extraordinary volatility, BTC as a digital asset is quite resilient, as it can regain its value after significant drops, even when market uncertainty is high, such as during the COVID-19 pandemic [5].

Despite its rapidly changing nature, the price of BTC has been the subject of various research efforts on price forecasting. A number of studies have discussed whether BTC prices are predictable using technical indicators and demonstrated the existence of significant return predictability [6, 7]. Other recent studies such as [8, 9] and [10] have applied various machine learning-related methods to end-of-day price forecasting and price increase/decrease forecasting. Ref. [9] reported a maximum accuracy of up to 63% for forecasting the increase or decrease of prices. Ref. [10] reported a 98% success rate for the daily price forecast.
However, the time periods of these studies have been limited by the available data: up to April 1, 2017 in [10] and up to March 5, 2018 in [9]. We believe that a current study is needed considering the volume of the BTC price movements that occurred after these dates. Secondly, the cited works focus on end-of-day closing price forecasts and price increase/decrease forecasting for next-day prices. In our study, we address mid-term price forecasts and increase/decrease forecasting for horizons ranging from 7 to 90 days, as well as daily closing price forecasts and price increase/decrease forecasting for the short term (end-of-day and next day). In addition, this is the first study that takes into consideration all the price indicators up to December 31, 2019, and provides highly accurate end-of-day, short-term (7 days) and mid-term (30 and 90 days) BTC price forecasts using machine learning. Our performance results are better than those in the latest literature for daily closing price forecasts and price increase/decrease forecasting. Additionally, we present high-performance neural-network-based models for medium-term (7, 30 and 90 days) BTC price forecasts and price increase/decrease forecasting.

When Bitcoin began to get worldwide attention at the end of 2013, it witnessed a significant fluctuation in its value and number of transactions [11]. A strand of literature has examined the predictability of BTC returns through various parameters such as social media attention [12, 13] and BTC-related historical technical indicators [14]. One group studied the period from September 4, 2014, to August 31, 2018, by capturing the number of times the term "Bitcoin" was tweeted. The results showed that the number of tweets on Twitter can influence BTC trading volume for the following day [15].
Moreover, [16] studied the influence of user comments in online platforms on price fluctuations and the number of transactions of cryptocurrencies and found that BTC is particularly correlated with the number of positive comments on social media. They reported an accuracy of 79% along with a Granger causality test, which implies that user opinions are useful for predicting price fluctuations.

According to [17], there are three different types of model-based approaches to time-series forecasting. The first approach, pure models, only uses the historical data on the variable to be predicted. Examples of pure time-series forecast models are Autoregressive Integrated Moving Average (ARIMA) [18] and Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) [19]. Ref. [20] presents an ARIMA-based time-series forecast for next-day BTC prices. However, we have not yet seen a study based on GARCH. Pure time-series models are more appropriate for univariate and stationary time-series data. In this paper, we focus on machine learning with higher-level features rather than the traditional models for the following reasons. First of all, BTC prices are highly volatile and non-stationary. We demonstrate that BTC prices are non-stationary in the next section. Secondly, there are a large number of features in the data, and the proposed machine learning methodology handles autocorrelation, seasonality and trend effects, while the training process of pure time-series models requires manual tuning to address these effects.

The second approach, explanatory models, uses a function of predictor variables to predict the target variable at a future time. Model-based time-series forecast approaches have the disadvantage of making a prior assumption about data distributions. For example, [20] and [21] are based on a log-transformation of the BTC prices.
Similarly, [21] used daily BTC data from September 2011 up to August 2017 to conduct an empirical study on modeling and predicting the BTC price that compares the Bayesian neural network (BNN) with other linear and nonlinear benchmark models. They found that the BNN performs well in predicting the log-transformed BTC price, explaining the high volatility of the BTC price. However, the above-mentioned studies used log-transformed prices for reporting performance metrics, which is misleading, as such values tend to be lower than performance metrics computed using real prices. We analyzed this by calculating the performance metrics using log-normalized values and comparing them against the non-log-normalized ones for our own results, and found that although the log-normalized price forecast reports a much lower MAPE value, the actual error may be up to 10 times higher. Since cryptocurrency prices are nonlinear and non-stationary, the assumptions on data distributions may have adverse effects on the forecast performance. Non-stationary time series exhibit evolving statistical distributions over time, which results in a changing dependency between the input and output variables.

Machine learning-based approaches utilize the inherent nonlinear and non-stationary aspects of the data. They can also take advantage of the explanatory features by taking into consideration the underlying factors affecting the predicted variable. There are several research studies on modeling and forecasting the price of BTC using machine learning. Ref. [22] used a Bayesian regression method that utilizes the latent source model developed by [23] to predict the price variation using BTC historical data. Ref. [24] used machine learning and feature engineering to investigate how the BTC network features can influence BTC price movements. They obtained a classification accuracy of 55%.
Ref. [9] used an artificial neural network (ANN) to achieve a classification accuracy of 65%. Furthermore, [25] predicted the BTC price using a Bayesian-optimized recurrent neural network (RNN) and long short-term memory (LSTM). The classification accuracy they achieved was 52% using LSTM, with an RMSE of 8%. They also reported that in forecasting, the nonlinear deep learning models performed better than ARIMA. Ref. [10] employed ANN and SVM algorithms in regression models to predict the minimum, maximum and closing BTC prices and reported that the SVM algorithm performed best, with a MAPE of 1.58%. One of the latest studies in predicting BTC daily prices is by [8], which used high-dimensional features with different machine learning algorithms such as SVM, LSTM and random forest. For next-day price forecasts from July 2017 to January 2018, the highest accuracy of 65.3% was achieved by SVM.

In this study, we focus on the time-series forecast of BTC prices using machine learning. A time series is a set of data values with respect to successive moments in time. Time-series forecasting is the forecast of future behavior by analyzing time-series data. The objective is to estimate the value of a target variable $x$ at a future time point, $x[t+\tau] = f(x[t], x[t-1], \ldots, x[t-n])$, $\tau > 0$, where $\tau$ is the forecast horizon. We consider end-of-day, 7, 30 and 90 days as the forecast horizons. Figure 2 gives an overview of the main ideas used in this paper. The ML-based time-series forecast method starts with the construction of a dataset. This is followed by the training of ML models and forecasts based on these models for different forecast horizons. Time-series forecasting of cryptocurrency prices involves underlying interdependencies that are hard to understand and model. For example, there are statistical factors such as variance and standard deviation that change over time. Those interdependencies show up as technical indicators, which are explained in Sect. 3.1.
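To make the forecasting formulation above concrete, the following minimal sketch turns a single series into supervised pairs: each input holds the values x[t-n], ..., x[t] and each target is x[t+τ]. It is only an illustration of the general formulation, not the pipeline used in this paper, which relies on selected explanatory features. The function name `make_supervised` and the toy prices are assumptions introduced here.

```python
def make_supervised(series, n_lags, horizon):
    """Build (inputs, targets) so that inputs[k] = [x[t-n_lags], ..., x[t]]
    and targets[k] = x[t+horizon] for every t where both exist."""
    if n_lags < 0 or horizon <= 0:
        raise ValueError("n_lags must be >= 0 and horizon must be > 0")
    inputs, targets = [], []
    # t is the index of the most recent observation available to the model.
    for t in range(n_lags, len(series) - horizon):
        inputs.append(series[t - n_lags : t + 1])
        targets.append(series[t + horizon])
    return inputs, targets


# Toy closing prices (hypothetical values, for illustration only).
prices = [100, 102, 101, 105, 107, 110, 108]
X, y = make_supervised(prices, n_lags=2, horizon=1)
```

Increasing `horizon` to 7, 30 or 90 would correspond to the longer forecast horizons considered here; the number of usable pairs shrinks by the same amount.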
In our study, open data sources have been utilized for gathering the BTC price technical indicators. In the data pre-processing step, data are gathered, cleaned and scaled/normalized. The collected BTC data are processed and divided into three intervals. Feature selection is used to identify relevant features. Furthermore, based on the third interval, the datasets for nth-day forecasts are created. We produce multiple datasets for different forecast horizons, for three different time periods, and perform feature selection separately for each dataset. Feature selection is the most important step in ML for time-series forecasting and is explained in Fig. 5 and Sect. 3.2.1. High-ranking features are extracted from each of these datasets using the random forest (RF) method and then pruned based on the variance inflation factor (VIF) and Pearson cross-correlation. The candidate features are explanatory features based on different statistical data about the operation of the blockchain itself, as well as technical market indicators. The datasets are split into training and validation sets. The ML classification and regression models are trained on the training split and validated on the holdout split.

The main difference between ML-based approaches and model-based methods for time series is the training phase. ML methods extract high-dimensional statistical trends and underlying features from the training data, allowing them to predict the outcome in previously unseen cases. ML for time-series forecasting can be used for classification and regression. We use the following ML models for classification and regression: artificial neural network (ANN), stacked artificial neural network (SANN), support vector machines (SVM) and long short-term memory (LSTM). The classification is applied as follows: if the BTC daily closing price satisfies $P_{BTC}[t+1] - P_{BTC}[t] \geq 0$, then $y[t] = +1$, and if $P_{BTC}[t+1] - P_{BTC}[t] < 0$, then $y[t] = 0$, where $y[t]$ is a target variable for the categories of increasing and decreasing price. The regression models are used to predict BTC prices at forecast horizons of end-of-day, 7, 30 and 90 days. Figure 3 depicts the detailed steps of the ML-based methodology used in this paper.

BTC prices prove to be a non-stationary time series based on the augmented Dickey-Fuller (ADF) unit root test, with an ADF statistic of $-1.6188$, which is above the critical value of $-3.433$ at the 1% significance level ($p = 0.47$). For higher-order autoregressive processes given by (1), the ADF test checks for non-stationarity by testing the null hypothesis, $H_0: \delta = 0$, against the alternative, $H_1: \delta < 0$. Failing to reject the null hypothesis ($p > 0.05$) indicates that the time series has a unit root and a trend.

$$\Delta y_t = \alpha + \zeta t + \delta y_{t-1} + \sum_{i} \beta_i \, \Delta y_{t-i} + \varepsilon_t \tag{1}$$

where $\Delta$ is the finite difference operator, the variable of interest is $y_t$, $\alpha$ is a constant, $\zeta$ is the coefficient of the deterministic trend, $y_{t-1}$ is the lagged series, $\delta$ is the coefficient of the lagged series, $\beta_i$ is the coefficient of the $i$th lag, and $\varepsilon_t$ is the residual error. Non-stationary time-series data exhibit varying statistical properties, as shown in Fig. 4, which gives box-and-whisker plots of three consecutive segments of the BTC price time series. The time series is divided into 3 segments, each consisting of 822 days. Each segment has a different mean, standard deviation, maximum and minimum price, as shown in Table 1.

BTC features and price data are available online and freely accessible. The data for this study were collected from https://bitinfocharts.com using a web scraper.

In this study, three data intervals were considered for comparison with the state of the art given by [10] and [9]. In the first interval, data between April 1, 2013, and July 19, 2016, were considered. The second interval consists of data from April 1, 2013, to April 1, 2017.
The third and largest interval contains data from April 1, 2013, to December 31, 2019. This interval has not been previously studied in the literature.

In pre-processing, missing values were imputed using linear interpolation wherever possible. Otherwise, the most frequently occurring value within the feature was used for imputation. For all the regression models, the dataset was shuffled and split into two sets: a training set and a validation set. 20% of the data were held out for validation, and 80% of the data were used for training. Fivefold cross-validation was applied to the training set for training the stacked artificial neural network. For all the classification models, the dataset was split chronologically into two sets: training and validation. The first 80% of the data were assigned to the training set, and the last 20% were kept for validation.

For training the ANN, stacked ANN and LSTM models, the features were scaled using robust scaling followed by min-max scaling. The robust scaling method uses the median and the interquartile range to scale the data. With min-max scaling, the features are mapped to the range between 0 and 1 while preserving the relative magnitude of the outliers. The scaling parameters are fit using the training set and then applied to both the training and validation sets. For training SVM, standard scaling was applied to the features, as it improved the model performance compared to the other scaling methods on our data.

For nth-day price forecasts, the train-test split is the same except that the price column is shifted upward (or equivalently, backward in the time domain) by the number of days required. For instance, for 7th-day forecasts, the price column is shifted upward by 7 days. This enables the regression models to learn the relation between the features and future prices.
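The pre-processing steps described above can be sketched with the Python standard library as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the helper names (`shift_target`, `direction_labels`, `chronological_split`, `fit_robust_minmax`) and the toy prices are hypothetical, and the interquartile range is computed with `statistics.quantiles(..., method="inclusive")`, which may differ slightly from the quantile convention the authors used.

```python
from statistics import median, quantiles


def shift_target(features, prices, n_days):
    """Pair the features of day t with the price of day t + n_days."""
    if not 0 < n_days < len(prices):
        raise ValueError("n_days must be between 1 and len(prices) - 1")
    # The last n_days feature rows have no future price yet, so they are dropped.
    return features[:-n_days], prices[n_days:]


def direction_labels(prices, n_days):
    """Label 1 if the price n_days ahead is >= the current price, else 0."""
    return [1 if prices[t + n_days] - prices[t] >= 0 else 0
            for t in range(len(prices) - n_days)]


def chronological_split(rows, train_fraction=0.8):
    """Keep the first part for training and the rest for validation (no shuffling)."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_robust_minmax(train_values):
    """Fit robust scaling (median, IQR) followed by min-max scaling on training data only."""
    q1, _, q3 = quantiles(train_values, n=4, method="inclusive")
    center = median(train_values)
    iqr = (q3 - q1) or 1.0  # guard against a constant column
    robust = [(v - center) / iqr for v in train_values]
    low = min(robust)
    span = (max(robust) - low) or 1.0

    def transform(values):
        # Reuses the parameters fitted on the training set.
        return [((v - center) / iqr - low) / span for v in values]

    return transform


# Toy data (hypothetical values, for illustration only).
prices = [10.0, 12.0, 11.0, 11.0, 9.0, 13.0, 14.0, 15.0, 16.0, 17.0]
days = list(range(len(prices)))

X7, y7 = shift_target(days, prices, n_days=7)    # 7th-day regression targets
labels = direction_labels(prices, n_days=1)      # next-day increase/decrease
train, valid = chronological_split(prices)       # first 80% / last 20%
scale = fit_robust_minmax(train)
train_scaled, valid_scaled = scale(train), scale(valid)
```

Because the scaler is fitted on the training split only, validation values that exceed the training maximum are mapped above 1, which is expected and avoids leaking validation statistics into training.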
For classification models, the price is converted to a categorical value by assigning a value of 1 if the price increases or remains the same compared to the previous day, and a value of 0 otherwise. For forecasting the nth-day price, the same technique is used. For instance, for 30th-day price forecasting, the 30th-day price is compared to today's price and the category 0 or 1 is assigned as appropriate.

The effect of price outliers on the performance of the models was studied, and removing them resulted in improved performance. The Isolation Forest method [26] was used for this purpose. This is an unsupervised method for detecting outliers based on decision forests. It is built on the assumptions that outliers tend to be few in number and have properties unlike the bulk of the data. For instance, a randomly occurring, unusually large spike in BTC price data can be considered an outlier. Removing about 10% of the outliers increased model performance for most of the ML models. A few models performed well despite the outliers.

Feature selection, which is a crucial part of data pre-processing, is necessary to improve model performance. The features were extracted and pruned iteratively using a number of different approaches. Firstly, feature importance was determined using an ensemble method based on random decision forests. Secondly, the reduced feature set was checked for multi-collinearity and cross-correlations. The variance inflation factor (VIF) and Pearson correlation were used for these steps. The features in the resulting subset were of relatively high importance, with low cross-correlation values and no multi-collinearity. The feature selection is repeated for each of the three intervals. When forecasting for the nth day, the feature selection process is repeated to create a new subset of features that are better suited for the period of interest.
For instance, the features that can forecast 7-day-ahead price movements fail to forecast 90-day-ahead prices reasonably. Furthermore, the feature selection process is required for classification models in each interval after encoding the targets into categories such as increase, 1, or decrease, 0. Figure 5 shows the feature selection process.

Random forest is an ensemble machine learning method based on decision trees that can be applied to both regression and classification problems. Unlike a single decision tree, a random forest can use hundreds of trees to make forecasts, giving better results. It does not require extensive training and is useful for relatively small datasets and for quick evaluation. The features that contributed to the forecast results are given importance scores, which can be inspected for tuning the results, for example by keeping or removing those features. Since random forest does not consider multi-collinearity and cross-correlations, other methods need to be used to check for those issues.

VIF is used for measuring the collinearity in a multiple regression model. It compares the difference between a model with multiple features and the same model with a lone feature. This indicates the variability that occurs in the model due to having a feature that correlates with another feature present in the model. While $\mathrm{VIF} \leq 10$ can be accepted, some suggest using $\mathrm{VIF} \leq 5$.

Table 3 shows some of the features that are common to several forecast periods based on the feature importance determined by random forest. To elaborate on the feature nomenclature, take the feature label median_transaction_fee30trxUSD for instance. This is the 30-day triple moving exponential smoothing of the median transaction fee of BTC given in terms of the USD exchange rate at that time.

An alternative to manual feature selection is dimensionality reduction by principal component analysis (PCA).
In this way, all the features are transformed into a new set of components through matrix manipulations. These new features are linearly independent. A few datasets were prepared using PCA components that capture 95% of the variance in the original dataset.

We modeled the Bitcoin prices using different machine learning regression and classification models.

An artificial neural network is a machine learning model that consists of an input layer, an output layer and one or more hidden layers. ANNs are universal function approximators [27] and are widely used in machine learning for forecasting and classification. The ANN model is trained on the training split with hyperparameter tuning for optimal performance. Satisfactory results were obtained using the configuration shown in Table 4 for Interval II. For this model and most other ANNs, the stochastic gradient-based optimizer Adam [28] was used, as it performed better than other gradient-based optimizers on our dataset. The number of hidden layers, the number of neurons per hidden layer, the learning rate, the number of epochs and the batch size were tuned empirically to obtain optimum results. The loss function logcosh was used, as it is less affected by sparsely distributed large forecast errors than the commonly used mean squared error. The rectified linear unit (ReLU) [29] was used as the activation function, as it is more robust to the vanishing gradient problem.

Multiple ANNs can be used to create an ML model by a technique called stacking. The stacked ANN (SANN) consists of 5 individual ANN models that are used to train a larger ANN. The individual models were trained using the training split with fivefold cross-validation, with each model trained on a separate fold. As ANNs are stochastic, each trained model has different weights, enabling it to learn its respective fold well. The final ANN learns from these different models, thereby outperforming any individual model over the whole training set.
Figure 6 shows the stacked architecture of the SANN regression model. In this figure, the train split is divided into five folds. A separate ANN is trained on each of these folds. The output of these ANNs is fed to the final ANN. The final ANN uses the test split to compare the outputs from the smaller ANNs and uses the best output as its input to make forecasts. Although it uses the test split in deciding which output to choose from the smaller ANNs, it does not learn from the test split. The SANN model is different from the ANN model, which is trained on the whole train split. The SANN model does not directly learn from the whole train split but rather trains on the outputs of the individual smaller ANNs.

As a supervised machine learning algorithm, SVM is used for both classification and regression problems. SVM is based on the idea of separating the data points in the training split using hyperplanes such that the distance of separation is maximum. The support vectors are the points closest to the hyperplane that are used for calculating its position. SVM kernels can be linear or nonlinear; nonlinear kernels include the radial basis function (RBF), hyperbolic tangent and polynomial kernels. For small datasets, SVM can yield forecasts with low error rates without requiring extensive training time. For computing the SVM, either of the objective functions based on the L1 SVM (2) or the L2 SVM (3) has to be minimized subject to the condition given by (4). The Gaussian RBF kernel is given by (5).

$$\min_{w,\, b,\, \zeta} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i} \zeta_i \tag{2}$$

$$\min_{w,\, b,\, \zeta} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i} \zeta_i^2 \tag{3}$$

where the slack variable is $\zeta_i$, the penalty is $C$, and $w$ is the normal to the hyperplane.

$$y_i \left( w^{\top} \phi(x_i) - b \right) \geq 1 - \zeta_i, \qquad \zeta_i \geq 0 \tag{4}$$

where $x_i$ and $y_i$ are data points, and $\phi(x_i)$ is the data transformation. The offset of the hyperplane from the origin along the normal of the hyperplane, $w$, is given by $b / \lVert w \rVert$.

$$K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right) \tag{5}$$

where $\lVert x_i - x_j \rVert^2$ is the square of the Euclidean distance between the features $x_i$ and $x_j$, and $\sigma$ is a free parameter.
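As a small numerical illustration of the Gaussian RBF kernel referred to in (5), the sketch below evaluates the kernel in its commonly used form, exp(-‖x_i - x_j‖² / (2σ²)), for a few toy points. The function names `rbf_kernel` and `gram_matrix`, the value of `sigma` and the points themselves are assumptions made for this example; they are not taken from the study.

```python
import math


def rbf_kernel(x_i, x_j, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, sigma=1.0):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


# Toy feature vectors (hypothetical values, for illustration only).
points = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
K = gram_matrix(points, sigma=5.0)
```

The kernel equals 1 for identical points and decays toward 0 as the points move apart, with `sigma` controlling how quickly. Some libraries, such as scikit-learn, parameterize the same kernel with `gamma`, which corresponds to 1 / (2σ²).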
The long short-term memory (LSTM) network is a type of recurrent neural network that can learn both long- and short-term dependencies. This deep learning model is particularly useful for modeling and forecasting time-series data. Since the daily Bitcoin price and its features are time-series data, LSTM can be used for making price forecasts and forecasting the rise or fall of BTC prices. An LSTM block is analogous to the neuron in the ANN. It has three gates represented by sigmoid functions: the forget ($f$), input ($i$) and output ($o$) gates. In the LSTM block, $c_{t-1}$ is the memory or cell state from the previous block, $h_{t-1}$ is the previous block output, $x_t$ is the vector input, $c_t$ is the cell state or memory of the present block, and $h_t$ is the output of the current block. At the $\times$ junction, the Hadamard product is performed element-wise, and likewise at the $+$ junction the summation is done element-wise. The LSTM gate and cell state equations are given by (6) to (11).

$$f_t = \sigma_g\left( W_f x_t + U_f h_{t-1} + b_f \right) \tag{6}$$

where $f_t$ is the activation vector of the forget gate, $W$ and $U$ are the weight matrices, $b$ is the bias vector, and $\sigma_g$ is the sigmoid function.

$$i_t = \sigma_g\left( W_i x_t + U_i h_{t-1} + b_i \right) \tag{7}$$

where $i_t$ is the activation vector of the input or update gate.

$$o_t = \sigma_g\left( W_o x_t + U_o h_{t-1} + b_o \right) \tag{8}$$

where $o_t$ is the activation vector of the output gate.

$$\tilde{c}_t = \sigma_h\left( W_c x_t + U_c h_{t-1} + b_c \right) \tag{9}$$

where the activation vector of the cell input is given by $\tilde{c}_t$ and $\sigma_h$ is the hyperbolic tangent function.

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \tag{10}$$

where $c_t$ is the cell state or memory vector.

$$h_t = o_t \circ \sigma_h(c_t) \tag{11}$$

where $h_t$ is the output vector of the LSTM block or the hidden state vector, and $\circ$ denotes the Hadamard product.

In this section, we present the results of the machine learning-based regression and classification. To evaluate the performance of regression models, the following metrics are used: mean absolute error (MAE) (12), root mean squared error (RMSE) (13) and mean absolute percentage error (MAPE) (14). A model with low MAE, MAPE and RMSE is desirable. In the context of BTC price, for example, an MAE of 5 means that the predicted price is, on average, within ± USD 5 of the actual price. MAPE quantifies the error in terms of percentage.
For example, a MAPE of 3% can mean either USD 3 or USD 30 depending on whether the actual price is USD 100 or USD 1000, respectively. RMSE indicates the spread of the forecast errors. A model that occasionally predicts erratic values will have a higher RMSE value, although it may still have a lower MAE or MAPE. Thus, the models should be evaluated with respect to all three metrics.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{12}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{13}$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{14}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of forecasts.

The results of the regression models for the three intervals are given in Table 5. In Interval I, from April 2013 to July 2016, the BTC prices did not experience much volatility, as shown in Fig. 1. Overall, all four types of ML models showed robust performance in Intervals I and II, and satisfactory performance in Interval III, albeit with relatively higher errors.

Table 6 summarizes the forecasts of the regression models for the nth-day BTC price considering Interval III, from April 2013 to December 2019. The bar chart in Fig. 7 shows the performance of the ML models in terms of MAPE. SANN reports the lowest MAPE for nth-day price forecasts, except for end-of-day closing price forecasts, where SVM gives a lower error rate. However, this should be evaluated considering the fluctuations of the models as shown in Figs. 8 and 9, where LSTM clearly outperforms all other models. For the 7th-day price forecast, the ANN model reported the lowest RMSE of 31.78. The highest MAPE and MAE are reported by the LSTM model. Lastly, for the 90th-day forecast horizon, SANN performed the best, with MAPE, RMSE and MAE of 4.10%, 140.00 and 72.23, respectively. LSTM reported the highest error rate, with a MAPE of 5.41%. Generally, the SANN model reported the lowest errors for this forecast horizon, followed by ANN, SVM and LSTM, respectively. However, when considering the model fluctuation as shown in Figs. 8 and 9, LSTM performs the best, followed by SVM. ANN and SANN have similar patterns; however, SANN has high fluctuations.
Consequently, even though SANN reported lower mean errors, it is the lowest performing model when considering the variability of its forecasts.

The regression models perform better than baseline price estimates calculated by moving averages and technical indicators. Table 7 shows the MAPE obtained by using moving averages against the MAPE of the ML models. For the end-of-day closing price and the short-term horizon of 7 days, the baseline estimate is competitive and comparable to some of our ML models. However, for medium-term horizon forecasts of 30 to 90 days, all developed ML models outperform the baseline.

The classification models require different performance metrics for evaluation. The metrics are accuracy (15), F1-score (16), area under the curve (AUC) and the receiver operating characteristic (ROC) curve. The ROC curve is plotted with recall (18) along the y-axis and the false positive rate, that is, one minus the specificity (19), along the x-axis. All these metrics are computed from the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) shown in the confusion matrix in Table 8. Accuracy is the most commonly reported classification metric and is easily interpreted: a higher accuracy means a superior model. However, when the reported classes are imbalanced [30], such as a dataset with more days of decreased price than increased ones, metrics such as the F1-score may provide further insight. A higher F1-score indicates that the model achieves both high precision (17) and high recall (18). The AUC score indicates how good the model is at distinguishing between the two classes. An AUC of 0.5 means no discrimination between classes, and thus, the closer the AUC is to one, the better the classification performance [31].

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16}$$

where precision and recall are given by (17) and (18), respectively.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{17}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{18}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{19}$$

Table 9 summarizes the results of the classification models for the three intervals.
These datasets are different from the regression datasets as they include a different set and number of features.

The results of the increase/decrease forecasting of the nth-day BTC price are given in Table 10. Figure 10 shows the classification accuracies for the different types of ML models. For 7th-day forecasts, SVM performs the best, with an accuracy of 62% and an AUC of 0.60. ANN performs the poorest, with an accuracy of 51%. LSTM performs better than ANN, with 55% accuracy and an AUC of 0.56. For the 30th day, SANN has the highest accuracy of 62%, with an AUC of 0.61. SVM, LSTM and ANN have similar accuracy, F1 and AUC scores. Finally, for 90th-day forecasts, the LSTM model reports the highest accuracy of 64%, with an AUC of 0.66. The ANN model improves to 62% accuracy. SANN comes in third with 60% accuracy.

The LSTM model performed best for forecasting the 90th-day increase/decrease. SANN performed best for next-day forecasts across all intervals as well as for 30th-day forecasts. SVM performed best for 7th-day forecasts. ANN had similar performance in the next-day forecasts of Intervals I, II and III but improved in the 90th-day forecast. Overall, the LSTM model is the best performing one based on Figs. 8 and 9 and the overall results from Table 10.

We applied principal component analysis (PCA) for dimensionality reduction for the purpose of measuring its effects. Based on Interval III, the PCA components that capture 95% of the variance in the original data (PCA$_{95}$) were used for predicting the BTC price and forecasting the increase/decrease. However, the performance of the regression models was subpar compared to the other models reported in Table 5. SVM resulted in a MAPE of $\geq 30\%$, ANN in a MAPE of $\geq 22\%$, SANN in a MAPE of $\geq 17\%$, and LSTM in a MAPE of $\geq 41\%$. In classification, the LSTM and ANN models reported accuracy scores below 50%. However, SANN reported an accuracy of 61%, and an F1-score and AUC of 0.61. SVM reported 54% accuracy, with an F1-score of 0.57 and an AUC of 0.54.
Thus, while none of the regression models performed well using PCA$_{95}$, the classification metrics showed that SANN and SVM are quite comparable to the models reported in Table 9.

Modeling the BTC price consists of two components: the rise and fall of the price, and the actual price. In this paper, we have shown that the latter can be forecast with very low error rates. However, the former is still an open challenge for all researchers. As noted in the literature, researchers have used internal and external factors to classify the increase/decrease of the BTC price. BTC prices are stochastic, and no given set of features can provide a complete forecast. Nevertheless, researchers have had varying degrees of success in modeling BTC prices based on different kinds of feature sets. In this paper, we have included features that are directly associated with the blockchain. For instance, if many miners are interested in mining BTC, the hashrate and difficulty will be high. Likewise, if many people are using it for transactions, then related features such as active addresses and the number of transactions will be high. All these features can also be considered time-series features. The technical indicators are simple mathematical tools that convert these rapidly changing raw features into smoother time-series features that can be used to make baseline estimates. Combining the technical

The feature selection process has to be robust to find the most useful features. The selected features differ across the various intervals and forecast horizons, and they are not unique to one particular interval or horizon. The feature selection process presented is systematic and can be used to arrive at good selections, as evidenced by the performance of the models. Alternatively, we experimented with dimensionality reduction using PCA. The regression models based on PCA did not perform as well as the models trained on selected features.
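To make the PCA experiment concrete, the following minimal sketch keeps the fewest principal components whose cumulative explained variance reaches 95%. It is an illustration under stated assumptions rather than the pipeline used for the reported results: it relies on NumPy's SVD, the function name `pca_keep_variance` and the synthetic data are hypothetical, and the features are assumed to be already on comparable scales (any scaling step would have to be applied beforehand).

```python
import numpy as np


def pca_keep_variance(X, target=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches `target`.

    Returns the projected data and the explained-variance ratio of each
    retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # SVD of the centered data: rows of Vt are the principal directions,
    # and the squared singular values are proportional to the variances.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    cum = np.cumsum(ratio)
    k = int(np.searchsorted(cum, target)) + 1    # smallest k with cum[k-1] >= target
    k = min(k, len(s))                           # guard against rounding at target = 1.0
    return Xc @ Vt[:k].T, ratio[:k]


# Synthetic example (not BTC data): four features, two of them nearly collinear
# and one with negligible variance, so fewer than four components are needed.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
X_demo = np.column_stack([
    a,
    2 * a + 0.01 * rng.normal(size=200),
    b,
    0.05 * rng.normal(size=200),
])
Z_demo, ratio_demo = pca_keep_variance(X_demo, target=0.95)
print("components kept:", Z_demo.shape[1])
print("explained variance retained:", round(float(ratio_demo.sum()), 4))
```

The retained components are linear mixtures of all original features, which illustrates why individual high-importance features cannot be read off directly from the PCA output.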
The reason our approach works well is that the techniques used in our feature selection process address issues such as multicollinearity and cross-correlation, in addition to providing feature importance. Although PCA has a similar effect of producing new variables that are linearly independent, our use of random-forest feature importance allows us to identify individual features with high importance, which is not possible with PCA.

The four ML models used are of different natures and have different strengths and weaknesses. SVM is easy to train, and its training is not stochastic: for a given dataset and the same parameters, it always produces the same results. It is fast and can be used for small datasets. The better performance of SVM compared with ANN in some instances can be attributed to the size of the dataset, since ANN generally performs well with large datasets containing millions of data points. LSTM is designed to remember trends in the data, such as past behavior. Its best performance in the 90th-day forecast can be attributed to this design. SANN is a stacked model consisting of sub-models made from smaller ANN models. Considering the different performance metrics and the fluctuations of the forecasts, LSTM performed the best overall, followed by SVM. The SANN and ANN models can follow the BTC time series, but with more fluctuations. LSTM also performed the best in classification. This aligns with the results from [25].

In this paper, we address short-term to mid-term BTC price forecasts using ML models. It is the first study that takes into consideration all the price indicators up to December 31, 2019, and provides highly accurate end-of-day, short-term (7 days) and mid-term (30 and 90 days) BTC price forecasts using machine learning. Four types of ML models have been used: ANN, SANN, SVM and LSTM. The LSTM showed the best overall performance.
All the developed models are satisfactory and perform well, with the classification models scoring up to 65% accuracy for the next-day forecast and from 62% to 64% accuracy for the seventh- to ninetieth-day forecasts. For the daily price forecast, the MAPE is as low as 1.44%, while it varies from 2.88% to 4.10% for horizons of seven to ninety days. The performance evaluation results show an improvement over the latest literature in daily closing price forecasting and price increase/decrease forecasting. The results are satisfactory and show potential for further applications in different areas such as financial technology, blockchain and AI development.

Fig. 10 Accuracy of the classification models for nth-day forecast in Interval III

Our results show that it is possible to forecast the actual BTC price with very low error rates, while it is much harder to forecast its rise and fall. The classification model performance scores presented are the best in the literature. Having said that, classification models for Bitcoin need to be studied further. As further work, hourly BTC prices and technical indicators may be utilized, as well as ensemble models that combine different types of models for making forecasts. Further work that can build on this paper is to investigate the use of artificial intelligence for modeling the price of cryptocurrencies as a basis for measuring the risk factor in the financial usage of blockchain technology. This model could also be useful in detecting fraudulent activities and anomalous behavior. When the actual behavior (price) deviates significantly from the modeled behavior, this may indicate the effect of external factors such as major global events, as well as fraudulent activities such as artificial pumps and dumps.
While price modeling and forecasting is not the only tool for detecting such external factors, one possible application of such models is the detection and prevention of fraudulent activities. Our future research will focus on such application areas. Using external data inputs related to global events and global financial risk, a combination of machine learning-based price models and anomaly detection methods may be utilized to assess and predict the stability of cryptocurrencies.

Conflict of interest The authors declare that they have no conflict of interest.

Availability of data and material Data available at: https://github.com/heliphix/btc_data

1. Digital spillover: measuring the true impact of the digital economy
2. Mastering blockchain: distributed ledger technology, and smart contracts explained. Packt Publishing
3. Policy specification and verification for blockchain and smart contracts in 5G networks
4. Bitcoin hits a new record high, but stops short of \$20,000 | Fortune
5. Bitcoin beats coronavirus blues
6. Are Bitcoin returns predictable? Evidence from technical indicators
7. Predicting Bitcoin returns using high-dimensional technical indicators
8. Bitcoin price prediction using machine learning: an approach to sample dimension engineering
9. Non-fundamental, non-parametric Bitcoin forecasting
10. Predicting the direction, maximum, minimum and closing prices of daily Bitcoin exchange rate using machine learning techniques
11. What causes the attention of Bitcoin?
12. Media attention and Bitcoin prices
13. Cryptocurrency price prediction using tweet volumes and sentiment analysis
14. Predicting Bitcoin returns using high-dimensional technical indicators
15. Does Twitter predict Bitcoin?
16. Predicting fluctuations in cryptocurrency transactions based on user comments and replies
17. Forecasting: principles and practice
18. Autoregressive integrated moving average (ARIMA) model for forecasting cryptocurrency exchange rate in high volatility environment: a new insight of Bitcoin transaction
19. A GARCH forecasting model to predict day-ahead electricity prices
20. Next-day Bitcoin price forecast
21. An empirical study on modeling and prediction of Bitcoin prices with Bayesian neural networks based on blockchain information
22. Bayesian regression and Bitcoin
23. A latent source model for nonparametric time series classification
24. Using the Bitcoin transaction graph to predict the price of Bitcoin
25. Predicting the price of Bitcoin using machine learning
26. Isolation-based anomaly detection
27. Approximation by superpositions of a sigmoidal function
28. Adam: a method for stochastic optimization
29. Approximating continuous functions by ReLU nets of minimal width
30. A review on classification of imbalanced data for wireless sensor networks
31. Receiver operating characteristic curve in diagnostic test assessment