Abstract
The running time complexity is a crucial measure of the computational efficiency of a program or algorithm. Depending on the complexity class of a problem, it can be considered intractable: a program that solves it will consume so many resources for a sufficiently large input that executing it becomes unfeasible. As a consequence of Alan Turing’s halting problem, it is impossible to write a program capable of determining the execution time of an arbitrary program and, therefore, of classifying it according to its complexity class. Despite this limitation, an approximate running time value can help development teams evaluate the efficiency of their code. Furthermore, integrated development environments (IDEs) could show real-time efficiency indicators to programmers. Recent research efforts have used artificial intelligence techniques to estimate complexity from code characteristics (e.g., number of nested loops and number of conditional tests). However, there are no databases that relate code characteristics to complexity classes considered inefficient (e.g., \(O(c^n)\) and \(O(n!)\)), which limits current research results. This research compared three machine learning approaches (i.e., Random Forest, eXtreme Gradient Boosting, and Artificial Neural Networks) regarding their accuracy in predicting the efficiency and complexity class of Java program codes. We trained each model using a dataset that merges data from the literature with 394 program codes, and their respective complexity classes, crawled from a publicly available website. Results show that Random Forest achieved the best accuracy: 90.17% when predicting codes’ efficiency and 89.84% when estimating complexity classes.
1 Introduction
Computer programming is a primary task in the software development lifecycle. It essentially involves translating algorithms into functions, methods, and procedures that perform tasks which, in conjunction, solve a specific problem. Before programming begins, analysts should determine which algorithms apply to the situation at hand. Such analysis requires technical knowledge in many areas, from business rules to computational resource efficiency. Regarding the latter, a relevant discipline is algorithm analysis, one result of which is a mathematical function that expresses the upper-bound running time of a specific algorithm, a.k.a. its running time complexity [10].
The complexity class of an algorithm or code directly impacts its resource usage efficiency. Such an impact on the software’s future performance highlights the need for precise analysis. Also, depending on its asymptotic behavior, the analysis may indicate that the program takes so long to run that the problem is considered intractable [6]. On the other hand, beginner developers may rely on inefficient solutions to coding problems without understanding the impacts on running time complexity. Determining the running time complexity is arduous and requires a deep understanding of code behavior. An analyst must consider multiple variables when analyzing a coded function, including the number of recursive calls, the size of iteration loops, and the depth of nested loops. Also, analysts must consider how these variables relate and how the computer program uses computational resources. Consequently, even experts can misevaluate the running-time function, motivating the search for tools and frameworks to aid the complexity analysis process.
A consequence of Alan Turing’s proof of the undecidability of the halting problem is that no program can determine the running time complexity of arbitrary code [12]. Thus, recent research has applied machine learning techniques and mathematical models to estimate running-time complexity functions [9, 18]. With such approximate values, developers could get real-time feedback on the efficiency of their code. However, a relevant barrier that hinders the evolution and adoption of such models is the lack of datasets that correlate code characteristics with the respective runtime complexity class.
This research aims to fill this gap by developing a learning method to predict the runtime complexity of program code. To achieve this objective, we built a representative dataset with program codes, their related characteristics and complexities, and compared machine learning approaches according to their accuracy in predicting efficiency (whether the code is efficient or not) and the runtime complexity of program codes. The contributions of this paper are twofold: first, it makes available to the machine learning community a dataset containing 394 Java code files grouped into eight distinct complexity classes, each code having 16 metadata fields; second, it shows that a Random Forest model trained with a dataset that merges the published work of Sikka et al. [18] and web-crawled data can achieve an accuracy of 90.17% when predicting code efficiency and 89.84% when predicting complexity classes.
The remainder of this text is organized as follows: Sect. 2 discusses the related work and current limitations. Then, Sect. 3 describes the methodology followed to build the datasets and develop the machine learning models. Next, Sect. 4 presents the complexity prediction results obtained with the trained models, and Sect. 5 discusses the limitations of the classifications. Finally, Sect. 6 presents this work’s conclusions and future work.
2 Related Work
Given the limitations of writing a computer program to determine the running time of program source codes, a few works appear in the literature to address this problem. However, the recent advances in artificial intelligence propelled the development of models that estimate code complexity.
The seminal work of Hutter et al. [9] assesses machine learning approaches to predict the performance of algorithms. The results show that the proposed approaches based on Random Forests and approximate Gaussian processes best predict the performance of parameterized algorithms used to solve NP-hard problems, with prediction correlation coefficients above 0.9. Although the research of Hutter et al. provided relevant results, our focus differs from theirs because we wish to estimate the runtime complexity in terms of asymptotic order.
The research of Sikka et al. [18] addressed the runtime complexity estimation problem using machine learning models. The paper has two main contributions. First, it publishes the Code Runtime Complexity Dataset (CoRCoD), composed of 932 code files belonging to 5 different complexity classes, namely constant, logarithmic, linear, linearithmic, and quadratic time. Second, it shows that the Random Forest model achieved the best results for predicting code complexity, with an accuracy of 71.84% using code features as attributes and 83.57% when Abstract Syntax Tree (AST) embeddings were used in training. Although the paper contributes a significant step for future research, we argue that the dataset has no entries for inefficient codes. Thus, we address this problem by crawling a public website to publish a more comprehensive dataset; we also evaluated more recent models, such as eXtreme Gradient Boosting Trees (XGBT) and Artificial Neural Networks (ANN).
The efforts published by Sepideh Seifzadeh in a public blog [17] appear to be the results of ongoing research at IBM. The publication has two significant highlights: first, the developed models use the CodeNet dataset, with around 14M code samples for roughly 4K programming problems; second, the authors trained an ANN and a Light Gradient Boosting Machine (LGBM) using both code features and a code graph representation to predict six classes of runtime complexity: near-constant, linear, log-linear, polynomial, exponential, and factorial. The preliminary results show up to \(80\%\) accuracy, which supports machine learning for addressing the problem of runtime complexity estimation. Despite the promising results, the problem remains open, with room for improvement. Moreover, we accessed the CodeNet data used in the results and did not find tags that classify codes according to complexity classes; we suppose the authors manually tagged the dataset, but this is not clear in the published article.
3 Material and Methods
This section presents the materials and methodology used to compare approaches to estimate code running time complexity. Section 3.1 explains the steps taken to consolidate a representative dataset for training machine learning models, including a discussion of dataset balancing. Section 3.2 explains how we extracted attributes from codes and the assumptions leveraged for complexity classification, and also discusses the process used to select features for the machine learning models. Finally, Sect. 3.3 presents the machine learning models used to predict the codes’ complexity and the process used for comparisons. All the source code we used in this paper is available on GitHub (see Footnote 1).
3.1 Dataset Consolidation
The first step of this research project is dataset consolidation. This work relies on three datasets. The first is the reference dataset published in the work of Sikka et al. [18], which consists of 931 Java code files, each with 14 metadata fields and its respective complexity. Considering that this dataset does not contain codes with intractable complexity, this work builds a second one with data from a publicly available website, which we call the crawled dataset (394 entries). The third, the merged dataset (1325 entries), results from merging the reference and crawled datasets.
We developed a web crawler to extract information from the platform geeksforgeeks.org, which contains a specific section discussing the fundamentals of algorithms (see Footnote 2). That section covers several algorithms, their respective codes, and complexities. The crawler is a Python script that uses the selenium library to scrape the website and extract codes and complexity information, and it runs as follows. First, it accesses the main page, which contains a list of topics related to the fundamentals of algorithms. Then, the crawler runs a login process and computes the list of links to be visited on the page. Next, it accesses each page in the link list to verify whether it contains content relevant to complexity prediction; in other words, the crawler checks whether the page has algorithms’ source codes and runtime complexity information.
Considering that the focus of this research is to extend the reference dataset, the crawler always selects the Java programming language. Also, the page should provide text describing the time complexity of the code somewhere near it. As soon as the crawler identifies that a page contains relevant content, it copies each code and searches for the nearest complexity information. It is essential to highlight that many pages have multiple program codes; thus, we save each one and associate it with the runtime complexity information closest to the point where the program code is exhibited. This nearest-text assumption involves a risk of misclassification, as the pages on the website do not follow a consistent layout; however, we understand that this is the best effort for automatically building a dataset.
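The nearest-text association described above can be sketched as follows. The block structure, the `associate_complexities` name, and the regular expression are illustrative assumptions, not the crawler’s actual code:

```python
import re

def associate_complexities(page_blocks):
    """Given page content as an ordered list of (kind, text) blocks
    ('code' or 'text'), pair each code block with the complexity
    expression found in the nearest surrounding text block.
    At equal distance, the block after the code is preferred."""
    complexity_re = re.compile(r"O\([^)]+\)")
    pairs = []
    for i, (kind, text) in enumerate(page_blocks):
        if kind != "code":
            continue
        label = None
        # Scan outward from the code block: distance 1, 2, 3, ...
        for dist in range(1, len(page_blocks)):
            for j in (i + dist, i - dist):
                if 0 <= j < len(page_blocks) and page_blocks[j][0] == "text":
                    match = complexity_re.search(page_blocks[j][1])
                    if match:
                        label = match.group(0)
                        break
            if label:
                break
        pairs.append((text, label))
    return pairs
```

The heuristic naturally handles pages with multiple codes: each code block picks up its own closest complexity string.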
The crawling process resulted in a dataset with 394 program codes and 78 distinct complexity classes. Many classes extracted from the Web pages do not follow the general form of asymptotic classes; for example, the classes O(n|n|) (algorithm: compute the sum of digits in all numbers from 1 to n) and \(O(m^2k + k^3 \log n)\) (algorithm: count ways to reach the nth stair) do not match any general case. For those cases, we manually evaluated each entry and assigned it to the closest dominant class.
To produce a feasible scenario, instead of 78 complexity classes, we reduced the classification scope to eight categories: constant, logarithmic (including double logarithmic and polylogarithmic), sublinear (fractional power), linear, linearithmic, quadratic, polynomial, and exponential (we also consider factorial as exponential). In addition, we split the program codes into two major categories: efficient (constant, logarithmic, sublinear, linear, and linearithmic) and inefficient (quadratic, polynomial, and exponential). We consider as polynomial time the asymptotic functions equal to or greater than \(O(n^3)\). We understand this is a questionable assumption since, according to [10], polynomial-time functions have the form \(n^c\) for every c greater than 1. However, we argue that algorithms that run in \(O(n^3)\) are much more inefficient than those that run in \(O(n^2)\) time; thus, treating \(O(n^3)\) as distinct from \(O(n^2)\) is quite reasonable. Figure 1 depicts the resultant distribution of complexity classes in the crawled and reference datasets.
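A minimal sketch of this grouping, assuming hypothetical label strings (the dataset’s actual label spellings may differ):

```python
# Map raw complexity labels to the paper's eight categories.
# The label strings are illustrative assumptions.
CLASS_OF = {
    "O(1)": "constant",
    "O(log n)": "logarithmic",
    "O(log log n)": "logarithmic",   # double logarithmic folded in
    "O(sqrt(n))": "sublinear",       # fractional powers
    "O(n)": "linear",
    "O(n log n)": "linearithmic",
    "O(n^2)": "quadratic",
    "O(n^3)": "polynomial",          # n^3 and above
    "O(2^n)": "exponential",
    "O(n!)": "exponential",          # factorial folded into exponential
}

# The paper's binary efficiency split over the eight categories.
EFFICIENT = {"constant", "logarithmic", "sublinear", "linear", "linearithmic"}

def efficiency(label):
    """Map a raw complexity label to the binary efficiency target."""
    return "efficient" if CLASS_OF[label] in EFFICIENT else "inefficient"
```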
As shown in Fig. 1b, the reference dataset does not contain exponential-time or polynomial-time codes, which severely reduces the size of the inefficient class. Also, as one can notice, both datasets are imbalanced: in the crawled dataset (Fig. 1a), the smallest class (sublinear) contains five entries while the largest class (linear) contains 125 entries; in the reference dataset, the smallest class (logarithmic) contains 55 entries while the largest class (also linear) contains 383 entries. Such imbalance also impacts the class distribution in the merged dataset (Fig. 1c), making the linear and quadratic classes more representative. Regarding efficiency, the distribution in each dataset is as follows: 63.70% of codes are from efficient classes in the crawled dataset, 78.51% in the reference dataset, and 74.18% in the merged dataset.
Considering that an imbalanced dataset can impact the abstraction capability of machine learning models [14], and that our dataset can be considered small compared to the current state of the art, we rely on the SMOTE tool to balance the data in each dataset. SMOTE (Synthetic Minority Oversampling Technique) [4] is a well-known algorithm for imbalanced classification problems. The general idea is to artificially generate new samples of the minority class using the nearest neighbors of existing cases: each synthetic sample is generated between a randomly chosen minority sample and one of its randomly selected k nearest neighbors, and the procedure repeats until the minority class has a size equal to or close to that of the majority. After applying SMOTE, all resultant complexity classes hold a similar number of occurrences.
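The interpolation step at SMOTE’s core can be sketched in a few lines; this is a didactic stand-in, and real experiments should use a maintained implementation such as imbalanced-learn’s SMOTE:

```python
import math
import random

def smote(minority, k=5, n_new=1, seed=42):
    """Generate n_new synthetic samples by interpolating a random
    minority point toward one of its k nearest minority neighbors.
    `minority` is a list of numeric feature tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbors of x among the other minority samples
        neighbors = sorted((p for p in minority if p is not x),
                           key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi)
                               for xi, ni in zip(x, nb)))
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority points, SMOTE never duplicates an entry verbatim, which is the property the Discussion section relies on.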
3.2 Features Extraction and Selection
We wrote the code feature extraction program in Python (the crawler.py file in the repository) and relied on the javalang library to extract the features described in Table 1 for each Java code in the datasets. The second column of the table indicates whether the former work of Sikka et al. [18] relies on the feature for prediction. For rows marked “no*”, the feature was only used in [18] for the manual classification of algorithms, not for the machine learning models. In the case of recursive calls, we measured the number of recursive calls in the code instead of a boolean indicating whether a recursive call exists.
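As a rough illustration of the kind of counting involved, the sketch below approximates a few of the Table 1 features with regular expressions. The paper’s extractor uses the javalang AST instead, which is more reliable because regexes also match keywords inside strings and comments; the feature names here are illustrative assumptions:

```python
import re

def extract_features(java_source, method_name=None):
    """Count a few code features with regular expressions as a rough
    stand-in for an AST-based extractor."""
    features = {
        "num_loops": len(re.findall(r"\b(for|while)\s*\(", java_source)),
        "num_ifs": len(re.findall(r"\bif\s*\(", java_source)),
        "num_switches": len(re.findall(r"\bswitch\s*\(", java_source)),
        "num_variables": len(re.findall(
            r"\b(?:int|long|double|float|boolean|char|String)\s+\w+",
            java_source)),
    }
    if method_name:
        # Number of recursive calls: call sites minus the declaration.
        calls = len(re.findall(r"\b%s\s*\(" % re.escape(method_name),
                               java_source))
        features["num_recursive_calls"] = max(calls - 1, 0)
    return features
```

Note that, as the paper does, recursion is measured as a count of call sites rather than a boolean flag.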
Once we established the set of complexity classes and balanced the datasets, the next step was to perform feature selection to determine the most relevant characteristics for distinguishing codes according to their complexity classes. Multiple approaches exist for determining the most relevant features of Random Forest models. According to Speiser et al. [19], for datasets with many predictors, the methods implemented in the R packages varSelRF and Boruta [11] are preferable due to computational efficiency. Thus, in this work, we rely on the Boruta package to define which code attributes are most relevant for predicting the complexity classes and the efficiency of a given code (see the classificators.R file in the repository). Boruta is a feature selection algorithm that relies on Random Forest to output a variable importance measure (VIM); the method’s rationale consists of progressively eliminating irrelevant features by comparing the original attributes’ importance with the importance achievable at random, until the test is stable. Table 2 depicts the features selected by the Boruta process.
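To illustrate the shadow-feature mechanics behind Boruta, the sketch below runs a single selection round using a simple difference-of-class-means score as a stand-in for the Random Forest VIM; Boruta proper iterates this comparison with RF importances over many runs until the decision is stable:

```python
import random
import statistics as st

def boruta_round(X, y, seed=0):
    """One illustrative Boruta round: build a shuffled 'shadow' copy of
    every feature, score real and shadow features, and keep the real
    features whose score beats the best shadow score.
    X is a list of feature tuples; y holds binary labels (0/1)."""
    rng = random.Random(seed)

    def score(col):
        # Stand-in importance: |difference of class means|.
        pos = [col[i] for i, label in enumerate(y) if label == 1]
        neg = [col[i] for i, label in enumerate(y) if label == 0]
        return abs(st.mean(pos) - st.mean(neg))

    real, shadow = [], []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        real.append(score(col))
        shuffled = col[:]
        rng.shuffle(shuffled)  # shadow: same values, alignment destroyed
        shadow.append(score(shuffled))
    threshold = max(shadow)
    return [j for j in range(len(X[0])) if real[j] > threshold]
```

The shadow copies carry no signal by construction, so any real feature that cannot outscore them is deemed no better than random.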
An analysis of the feature selection results in Table 2 permits some relevant considerations: (i) the efficiency estimation requires fewer features than the complexity class estimation; (ii) the crawled dataset also uses fewer features for predicting both efficiency and complexity classes; and (iii) the number of switches (attribute number 3) in codes is irrelevant for the predictions.
3.3 Model Training and Validation
The last step of this research consists of comparing supervised machine learning models according to their ability to predict the running-time complexity class of program code. To this end, we first use the merged dataset to find out which method has the best accuracy in predicting both efficiency (a two-class prediction problem) and complexity class (a multiclass prediction). Considering the results of [18], which found that Random Forest had the best accuracy among eight classification algorithms, we included eXtreme Gradient Boosting Trees (XGBT) and ANNs in the comparisons because the previous work did not consider those approaches. We implemented Random Forest both in R and in Python (files classificators.R and classificators.py, respectively) and the other models only in Python.
XGBT is an effective and scalable tree-boosting system that combines novel sparsity-aware algorithms and a weighted quantile sketch for approximate tree learning [5]. This system is an optimized version of the gradient boosting machine algorithm (GBDT), created by Friedman [7], which uses decision trees for classification. The traditional GBDT approach only deals with the first derivative in learning; XGBoost improves the loss function with a Taylor expansion, reducing modeling complexity and the likelihood of model overfitting [3].
Regarding ANNs, each implementation requires several configuration parameters, which we discuss in the following. We normalized the data using the StandardScaler function from scikit-learn [13] so that the features have a mean of 0 and a standard deviation of 1; such scaling is a typical pre-processing step to improve model performance [20]. We then established the hyperparameters based on the best random search results, an approach proven more efficient for hyperparameter selection than trials on a grid [1]. When the corresponding dropout rate exceeds zero, a Dropout layer is added after the dense layer to prevent overfitting. The same applies to the batch normalization flag: when it is true, a BatchNormalization layer is added after the dense layer to normalize the inputs and improve the convergence of the model [8]. Lastly, the model is optimized with Adam, as defined by the random search, and trained for 500 epochs with a batch size of 64.
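A random search of the kind described above amounts to sampling independent hyperparameter configurations from a search space [1]; the space below is an illustrative assumption, as the actual ranges are not listed in the text:

```python
import random

# Illustrative search space; the real ranges used in the paper's
# random search are not stated in the text.
SPACE = {
    "units": [32, 64, 128, 256],
    "dropout_rate": [0.0, 0.2, 0.5],
    "batch_norm": [True, False],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def sample_configs(n_trials, seed=0):
    """Draw independent configurations: each trial picks one value
    per dimension, uniformly at random."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in SPACE.items()}
            for _ in range(n_trials)]
```

Each sampled configuration would then be trained and scored, keeping the best-performing one, as the paper does before fixing the Adam optimizer, 500 epochs, and batch size 64.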
We used the train_test_split function from scikit-learn [13] to split the balanced dataset into train and test data. To prevent overfitting issues, we chose to keep most of the function parameters at their defaults; we altered only the stratify parameter. When different from None, the samples are stratified through StratifiedKFold so that each set contains approximately the same percentage of samples of each target class as the complete set. Stratification has been found to improve upon standard cross-validation in terms of both bias and variance. The standard method of randomly distributing multi-label training samples can create issues when test subsets lack even a single positive example of a rare label, which, in turn, can lead to calculation problems for various multi-label evaluation measures [16].
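The effect of the stratify parameter can be sketched as follows, assuming a hypothetical `stratified_split` helper; scikit-learn’s implementation additionally handles edge cases such as very small classes:

```python
import random
from collections import defaultdict

def stratified_split(y, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx) so that every class keeps roughly
    the same proportion in both splits, mirroring the behavior of
    train_test_split(..., stratify=y)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test_idx.extend(idxs[:n_test])  # per-class test share
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx
```

By splitting each class separately, even a rare complexity class contributes examples to both the training and the test set.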
After we established that Random Forest is the machine learning approach that provides the best results (see Sect. 4), we conducted a systematic process for learning and predicting complexity classes:
1. We used the crawled dataset, training the models with a 70% sample and testing with the remaining 30% of the data;
2. We compared our results to Sikka et al. [18] by running a cross-validation process with 70% of the reference dataset and validating with 30% of their data;
3. We evaluated the generalization capability of the crawled dataset by training the ML model with 70% of the crawled dataset and testing with 100% of the reference dataset; and
4. We evaluated in depth the confusion matrix of the Random Forest model, using 70% of the merged dataset for training and 30% for testing.
4 Results
This section presents the results of predicting program code efficiency and complexity classes based on code attributes. First, we compare the prediction accuracy of each model on the merged dataset to select the appropriate approach for our in-depth systematic evaluation. Table 3 depicts these comparison results; the accuracy values show that Random Forest is the best among the evaluated methods, predicting codes’ efficiency with 90.17% accuracy and the complexity class with 89.84%. The XGBT provided an accuracy close to Random Forest’s, and, given the nondeterministic behavior of ANNs, their accuracy varied within a range that was also below Random Forest’s. Such results follow the findings of [18], which pointed out Random Forest as the algorithm with the best accuracy for predicting complexity classes (71.84%). Thus, we consider Random Forest for our systematic evaluation process.
As we established Random Forest as the reference ML approach, we now assess its accuracy in predicting the efficiency of codes using our systematic evaluation process. Table 4 contains the prediction results for the efficiency of the available algorithm codes. Two main conclusions arise from this analysis: i) when the training and test data come from the same dataset, the model achieves accuracies above 80% (80.13% for the crawled dataset, 93.85% for the reference dataset, and 90.17% for the merged dataset); ii) when training and test data come from distinct datasets, accuracy drops to 83.57% (model trained with 70% of the crawled dataset and tested with 100% of the reference dataset). Although this fall in accuracy may seem a poor result, considering that the data come from distinct sources, the crawled dataset generalizes well with respect to efficiency classification. Also, by analyzing the F1-score results (above 80%), we can conclude that all the Random Forest models have a good balance between precision and recall, which means they fit well to distinguish the efficiency of source codes.
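For reference, accuracy and the F1-score mentioned above derive from confusion-matrix counts as follows (the counts in the test are illustrative, not the paper’s):

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from the four counts of a
    binary confusion matrix (e.g., efficient vs. inefficient)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall, so a high value
    # requires both to be high -- the "good balance" referred to above.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```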
After assessing the Random Forest models’ ability to classify codes regarding their efficiency, we retrained each model to predict complexity classes. As with the efficiency classification, we leveraged our four-step systematic process to compare the models. Table 5 depicts these results. An accuracy analysis shows that the Random Forest model properly predicts the complexity class of source codes from the crawled dataset, with an accuracy of 80%.
The results in Table 5 also show that the reference- and merged-based models achieved an accuracy superior to 88%, surpassing the findings of Sikka et al. [18], which presented an accuracy of 71.84% for predicting all classes. We claim that this improvement is explained by both the balancing and the feature selection processes we leveraged. However, when the predictions ran on the reference dataset, the model trained with the crawled data had a much lower accuracy of only 44.04%. This indicates that the model does not generalize well enough to distinguish complexity classes when it relies only on the crawled dataset for training. It is worth mentioning that the trained model does not include any of the data in the reference dataset. To understand the misclassifications, we analyzed the confusion matrix of the crawled-based model’s predictions on the reference dataset (Table 6).
The results depicted in Table 6 show that 112 predictions pointed to classes that do not even exist in the dataset (sublinear, polynomial, and exponential time), representing 12.03% of the total data, most of them inefficient classes (polynomial and exponential time). Such a result demonstrates that the crawled dataset does not have enough representativeness to allow machine learning models to generalize complexity classes. Also, we observed that the best predictions occurred for the linear and linearithmic categories (65.01% and 74% accuracy), and the worst case occurred in the logarithmic class (9.09% accuracy), which indicates that the characteristics of these classes require a more profound study. For such an analysis, we assessed the mean decrease accuracy measure (MDA) [2] for each feature to understand its importance in predicting the complexity class in each model. Figure 2 presents these MDA results.
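MDA is commonly computed as permutation importance: shuffle one feature at a time and measure how much accuracy drops. A minimal sketch, assuming any model object with a `predict` method and feature rows stored as tuples:

```python
import random

def mean_decrease_accuracy(model, X, y, seed=0):
    """Permutation-importance sketch of MDA [2]. X is a list of feature
    tuples; `model` is any object with predict(rows) -> labels."""
    rng = random.Random(seed)

    def accuracy(rows):
        preds = model.predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)  # destroy this feature's relation to the labels
        permuted = [row[:j] + (col[i],) + row[j + 1:]
                    for i, row in enumerate(X)]
        importances.append(baseline - accuracy(permuted))
    return importances
```

A large drop means the model leans heavily on that feature; a near-zero drop marks a feature the model barely uses.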
The analysis of the top-3 MDA features depicted in Fig. 2 allows us to conclude that the three most relevant features for code complexity classification in all models are the number of variables (num_vari), the number of statements (num_state), and the number of loops (num_loof). The depth of nested loops and the number of recursive calls appear among the top-5 relevant features in only two of the three models, which is surprising, as these metrics serve to compute recurrence relations in code complexity analysis. Another relevant aspect of the MDA analysis is that the second most relevant feature varies significantly from one model to another, which also hinders the abstraction capability of the model based on the crawled dataset. Considering that the crawled-based and reference-based models share the top-3 relevant features, we now assess the frequency density functions of these features in both datasets to understand why the models misclassify the linear, linearithmic, and logarithmic classes (Fig. 3).
Two main considerations arise from the analysis of Fig. 3: first, the density function is different in the two datasets, both in terms of the number of occurrences and the behavior of the distribution; second, the linear and linearithmic classes have density distribution functions with similar behavior for the three evaluated metrics, independently of the dataset. While the first consideration explains why the crawled-based model cannot properly identify the classes of the reference dataset, the second justifies the lack of distinction between classes.
To have a clearer view of the limitations of the merged-based model, we also depict the confusion matrix of its predictions. The results in Table 7 show accurate predictions for most classes, with accuracies superior to 90%; the discrepant case occurred in the linear class, with 25 predictions pointing to faster time classes (i.e., constant, logarithmic, and sublinear) and 29 predictions to slower classes (i.e., linearithmic, quadratic, polynomial, and exponential). To understand this lack of accuracy, we plotted the frequency density functions of the num_vari and num_state features in the merged dataset (Fig. 4).
The density functions depicted in Fig. 4 show that misclassifications mainly occurred because the numbers of variables and statements in codes from the mispointed classes behave similarly, making them hard to distinguish. One could argue that we should remove these attributes from the models; however, even though these features cause misclassifications in the linear class, they are relevant to the overall model classification.
5 Discussion
Although this paper shows promising results in estimating code runtime complexity, outperforming the state of the art, we point out the following limitations, which will be addressed in future research:
- The Web crawler may not collect the correct complexity of each code. The dataset used for training and estimation resulted from a Web scraping process. The collection process may introduce biased information, as the crawler ran automatically. However, given the lack of datasets containing inefficient codes, we understand that the potential bias does not influence the comparison results, which show that Random Forest achieved the best results. We will provide an in-depth analysis of the collected data in future work, including manual verification.
- Trustworthiness of geeksforgeeks.org. One can argue that the information provided by geeksforgeeks.org may not be trustworthy. However, considering the lack of publicly available datasets containing inefficient codes and their respective complexity classes, we understand that the provided information is the best effort toward a comprehensive dataset containing the most relevant complexity classes.
- Small dataset for machine learning purposes. The merged dataset used for training the machine learning models contains 1325 entries, which is small compared to today’s standards. However, considering the lack of comprehensive datasets and the absence of tags classifying the codes published in the CodeNet dataset, we consider our results a relevant step toward the use of AI to predict code complexity.
- The dataset balancing process may cause overfitting. Santos et al. [15] discuss the impacts of balancing datasets before running the cross-validation process; they demonstrate that oversampling techniques may generate replicated entries in test and training sets, which leads to a lack of generalization in ML models caused by overfitting. However, the SMOTE method generates synthetic entries based on a proximity function without duplicating any entry. Thus, we argue that the absence of duplication reduces the overfitting issues.
6 Concluding Remarks
In this paper, we investigated the use of machine learning models to predict the runtime complexity class of computer program codes. Considering that the reference dataset published by Sikka et al. [18] does not contain most of the inefficient classes from the literature, we built a second one with data from a publicly available website. Next, we compared machine learning models in predicting program codes’ efficiency and complexity classes.
Results show that Random Forest is the best approach, predicting code efficiency with accuracies above 80% and classifying the runtime complexity with an accuracy superior to 81% for a model trained with the merged dataset and 87.4% when the model is trained with the reference dataset. Such results outperform those found by Sikka et al. [18], who reported an accuracy of 71.84% using a Random Forest model. We argue that the feature selection process and the balanced datasets supported the accuracy enhancement. However, the Random Forest model’s accuracy drops to less than 50% when we train it with the crawled dataset and predict the complexity classes of the reference dataset. We claim that this occurred for two primary reasons: first, we trained the model on a dataset that has more classes than the reference dataset; second, most of the classification errors occurred in the logarithmic and linearithmic categories, which we demonstrated to have similar behavior regarding the top-3 most important features.
In future work, we aim to develop a complete complexity prediction framework containing a learning component deployed on the cloud and an extension for software IDEs. Research challenges related to the framework include, but are not limited to: i) studying the applicability of other machine learning models to predict computer programs’ efficiency and complexity class, including natural language processing and deep learning frameworks; and ii) building an even more comprehensive, continuously updated dataset of program codes and complexity classes, mainly in the polynomial and exponential classes. In addition, we aim to manually tag the dataset published by the CodeNet project, which will benefit future research and improve the accuracy of predictors.
References
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(2) (2012)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Chang, Y.C., Chang, K.H., Wu, G.J.: Application of extreme gradient boosting trees in the construction of credit risk assessment models for financial institutions. Appl. Soft Comput. 73, 914–920 (2018)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press (2022)
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001)
Garbin, C., Zhu, X., Marques, O.: Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimed. Tools Appl. 79, 12777–12815 (2020)
Hutter, F., Xu, L., Hoos, H.H., Leyton-Brown, K.: Algorithm runtime prediction: methods & evaluation. Artif. Intell. 206, 79–111 (2014)
Kleinberg, J., Tardos, E.: Algorithm design. Pearson Education India (2006)
Kursa, M.B., Rudnicki, W.R.: Feature selection with the boruta package. J. Stat. Softw. 36, 1–13 (2010)
Lucas, S.: The origins of the halting problem. J. Logical Algebraic Methods Programming 121, 100687 (2021)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ramyachitra, D., Manikandan, P.: Imbalanced dataset classification and solutions: a review. Int. J. Comput. Bus. Res. (IJCBR) 5(4), 1–29 (2014)
Santos, M.S., Soares, J.P., Abreu, P.H., Araujo, H., Santos, J.: Cross-validation for imbalanced datasets: avoiding overoptimistic and overfitting approaches [research frontier]. IEEE Comput. Intell. Mag. 13(4), 59–76 (2018)
Sechidis, K., Tsoumakas, G., Vlahavas, I.: On the stratification of multi-label data. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS (LNAI), vol. 6913, pp. 145–158. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23808-6_10
Seifzadeh, S.: AI for code: predict code complexity using IBM’s CodeNet dataset, October 2021. https://community.ibm.com/community/user/ai-datascience/blogs/sepideh-seifzadeh1/2021/10/05/ai-for-code-predict-code-complexity-using-ibms-cod
Sikka, J., Satya, K., Kumar, Y., Uppal, S., Shah, R.R., Zimmermann, R.: Learning based methods for code runtime complexity prediction. In: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P., Ferro, N., Silva, M.J., Martins, F. (eds.) ECIR 2020. LNCS, vol. 12035, pp. 313–325. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_21
Speiser, J.L., Miller, M.E., Tooze, J., Ip, E.: A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 134, 93–101 (2019)
Thara, D., PremaSudha, B., Xiong, F.: Auto-detection of epileptic seizure events using deep neural network with different feature scaling techniques. Pattern Recogn. Lett. 128, 544–550 (2019)
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Pfitscher, R.J., Rodenbusch, G.B., Dias, A., Vieira, P., Fouto, N.M.M.D. (2023). Estimating Code Running Time Complexity with Machine Learning. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_27
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2