key: cord-277650-llsgpyrm authors: Damiati, Safa A. title: Digital Pharmaceutical Sciences date: 2020-07-26 journal: AAPS PharmSciTech DOI: 10.1208/s12249-020-01747-4 sha: doc_id: 277650 cord_uid: llsgpyrm

Artificial intelligence (AI), and machine learning in particular, has gained significant interest in many fields, including pharmaceutical sciences. The enormous growth of data from several sources, the recent advances in various analytical tools, and the continuous developments in machine learning algorithms have resulted in a rapid increase in new machine learning applications in different areas of pharmaceutical sciences. This review summarizes the past, present, and potential future impacts of machine learning technologies on different areas of pharmaceutical sciences, including drug design and discovery, preformulation, and formulation. The machine learning methods commonly used in pharmaceutical sciences are discussed, with a specific emphasis on artificial neural networks due to their capability to model the nonlinear relationships that are commonly encountered in pharmaceutical research. The use of AI and machine learning technologies for common day-to-day pharma needs, as well as industrial and regulatory insights, is reviewed. The potential of digital technologies based on machine learning, beyond their traditional uses, to deliver more efficient, faster, and more economical solutions in pharmaceutical sciences is also discussed.

There has been a remarkable increase in the amount of data, including pharmaceutical data, that is generated each day. The term "big data" has gained increasing interest in various research areas. In addition, data-driven companies currently show how various industries are able to profit from the massive generation of data. Several definitions have been proposed for the term "big data." One of the widely recognized definitions is the "4 Vs" definition.
The definition was first proposed by Douglas Laney and encompasses "3 Vs" which consist of volume, velocity, and variety (1, 2) . This definition was later extended by IBM to include the fourth "V" for veracity (3) . However, the reported definitions of "big data" usually lack consistency and quantification. Because of its potential value, data has been considered as the new oil (4, 5) . Textbooks and publications, social media, user-generated content, electronic health records, genomics, sensor networks, and many other types of data all form "big data" and contribute to its diversity and complexity. The remarkable increase in the amount of data can be attributed to advancements in data storage and innovative technologies (6) . Almost 2.5 million new scientific papers are published annually (7) . In addition, there were more than 15,000 PubMed-reported publications on "pharmaceutical sciences" in 2019 only (8) . Thus, "big data" in pharmaceutical sciences can be viewed as both a challenge and an opportunity. The evolution of artificial intelligence (AI), particularly machine learning technologies in which computers can "learn" and perform tasks, has improved the potential of using big data in pharmaceutical sciences. The scope of this review is specific to machine learning because, among all AI branches, machine learning is the most currently used AI technology in the field of pharmaceutical sciences. Other AI fields, such as natural language processing (NLP), expert systems, and robotics, are becoming very popular in many healthcare settings, such as in the diagnosis of diseases, patient monitoring, and robotic surgeries (9, 10) . These methods, however, have not yet received as much attention as machine learning in pharmaceutical sciences settings. 
The aim of this review is to summarize the past, present, and potential future impacts of machine learning on different areas of pharmaceutical sciences, including drug design and discovery, preformulation, and formulation. This review covers different machine learning algorithms that are commonly implemented in different areas of pharmaceutical sciences, with a special emphasis on the use of artificial neural networks (ANNs). Notably, compared with other machine learning methods, ANNs have displayed superior performance in various pharmaceutical settings, as will be discussed in the following sections. Despite its long history, as will be discussed below, there is still no standard definition of AI. However, mimicking human intelligence using computer systems is the basic concept of AI. The physiology and function of neurons in the brain inspired Warren McCulloch and Walter Pitts (1943) to propose a computational model of artificial neurons. Similar to human neurons, artificial neurons are characterized by being "on" or "off" in response to sufficient stimulation from neighboring neurons (11) . The term "artificial intelligence" was officially introduced by John McCarthy at the Dartmouth conference in the summer of 1956 (12) . Since then, AI has had cycles of success as well as so-called "AI winters" (13) . Recently, AI has significantly advanced and gained increasing interest in a wide range of fields, including healthcare (14) , engineering (15) , and transportation (16) . This increased focus on AI applications has been fueled by the growing availability of big data in healthcare and the rapid advancement of numerous analytical techniques (10) . Machine learning is a popular AI technique ( Fig. 1 ) whereby computers can accurately adapt or modify their actions (e.g., making predictions). Machine learning algorithms can be classified into two major categories: supervised learning and unsupervised learning (17) . 
In supervised learning, the algorithm is given a set of training examples and generalizes from them to respond appropriately to new inputs. Training examples are input-output data that are provided in the dataset to be learned. Because the output data here are known to be the correct responses (or correct answers), they are termed "targets." The machine learning model ultimately aims to predict an output that is as close as possible to the target. Examples of supervised machine learning methods include regression analysis, support vector machines (SVMs), random forests (RF), and ANNs. Unsupervised learning is based on feature extraction methods in which no correct responses (targets) are provided (17), such as principal component analysis (PCA). Some methods that are mainly used for supervised learning, such as SVMs and ANNs, can also be applied to unsupervised learning (18).

Table I shows a comparison of several machine learning methods commonly used in pharmaceutical research. Linear regression, ANNs, K-nearest neighbor (KNN), SVM, decision trees (DT), and RF are common machine learning methods used in pharmaceutical sciences. PCA is an unsupervised dimensionality reduction technique that transforms unlabeled data to find a lower-dimensional set of axes (12). Although PCA may be considered a statistical technique for analyzing multidimensional data, it is usually incorporated as a preprocessing tool in machine learning (19).

In addition, other machine learning methods are used in pharmaceutical sciences, such as the fuzzy logic algorithm. In this method, reasoning with logical expressions is used to describe membership in fuzzy sets (12). This method has the advantages of eliminating the need for expert knowledge regarding the system, accounting for noise in the data, and producing easily interpretable predictions (20). The fuzzy logic algorithm has provided good prediction models for analyzing gene expression data (20).
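To make the notion of membership in fuzzy sets more concrete, the following is a minimal Python sketch. It is an illustration only and not the method of reference (20): the triangular membership shapes, their breakpoints, the set names (`low`, `medium`, `high`), and the example rule are all assumptions chosen for this sketch. The min, max, and complement operators are the standard fuzzy counterparts of AND, OR, and NOT.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0.0 to 1.0) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for a measurement normalized to the range [0, 1],
# for example a scaled gene expression level.
def low(x):
    return triangular(x, -0.5, 0.0, 0.5)


def medium(x):
    return triangular(x, 0.0, 0.5, 1.0)


def high(x):
    return triangular(x, 0.5, 1.0, 1.5)


# Standard fuzzy logic operators.
def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


def rule_strength(level_a, level_b):
    """Degree to which the illustrative rule 'A is high AND B is low' holds."""
    return fuzzy_and(high(level_a), low(level_b))


if __name__ == "__main__":
    # A value of 0.25 belongs partly to 'low' and partly to 'medium'.
    print(low(0.25), medium(0.25), high(0.25))
    print(rule_strength(0.75, 0.25))
```

Unlike a crisp threshold, a single value can belong to several sets to different degrees, which is what allows such rules to tolerate noisy measurements and remain easy to read.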
Additionally, the genetic algorithm (GA) is a population-based method commonly used as an optimization technique. This algorithm also offers the advantage of modeling nonlinear relationships. In pharmaceutical research, GA is mainly used in quantitative structure-activity relationship (QSAR) studies as a feature selection tool (21, 22). Several novel machine learning applications in pharmaceutical settings have recently emerged that use non-conventional machine learning techniques such as the light gradient boosting machine (lightGBM). This method offers numerous useful features compared with the classic machine learning methods, as shown in Table I. Another emerging machine learning technique is transfer learning, which is based on reusing a pre-trained model to build a new, improved model that addresses the intended target (23). A relatively large dataset for the original model is an important determinant of optimum transfer learning performance. Important recent progress in using transfer learning has been achieved in the field of pharmaceutical sciences (24), as will be discussed in a following section.

Moreover, machine learning models can be classified into two categories: parametric and nonparametric models (Table I). Parametric models summarize data with a fixed number of parameters (regardless of the number of training examples), whereas in nonparametric models the number of parameters is not fixed and can grow with the number of training examples (12). Table I summarizes the common parametric and nonparametric machine learning methods encountered in different drug research and development studies. Note that each of these machine learning methods may have further subtypes, and a general comparison among these models can be unfair. For example, although certain machine learning methods may require large datasets, a defined optimum dataset size is usually lacking.
The reader is encouraged to refer to the cited references for details. Additionally, no machine learning method is generally considered superior to all others, and each problem (classification or regression) should be addressed individually. ANNs are biologically inspired computational models that mimic the brain's ability to learn by example (Fig. 2) . Our brains consist of billions of processing units called neurons. These neurons are fully interconnected through an enormous number of synapses that connect one neuron to another (34) . A biological neuron consists of a cell body that contains a nucleus and controls cell activities, dendrites, which compose the fine threads among neurons and carry the information to the cell, and axons, which consist of one long thread that transports information to the next cell (34) . Similar to human neurons, ANNs consist of artificial neurons or processing elements (PEs) that are connected via coefficients (weights) (34) . A typical ANN (Fig. 2) consists of three main structural components: input, hidden, and output layers. The first layer of an artificial neuron is the input layer, which corresponds to the dendrites of the biological neuron and transfers information to the next layer. The following layer is the hidden layer, which is the middle layer between the input layer and the output layer. The hidden layer connects these two layers through certain coefficients (weights). Each hidden layer consists of a number of neurons (also called nodes). The choice of the number of neurons in the hidden layer of ANNs is generally achieved by a trial-and-error approach (35) . Although there is no definite number of neurons to be used, using too few neurons in the hidden layer may result in a reduction in the ANN learning ability, whereas too many neurons in the hidden layer may result in the memorization or overfitting of the training data, ultimately decreasing the generalization ability of the ANN. 
Thus, the network whose number of hidden neurons gives the highest squared correlation coefficient (r²) and the lowest error (i.e., the minimum difference between observed and predicted values) should be selected as the optimal ANN. The final layer of an ANN is the output layer, which consists of the outputs (targets). Moreover, by examining the magnitude of the ANN connection weights, ANNs can provide quantitative estimates of the relative importance of the input variables for the output in question (36). Figure 2 illustrates a schematic representation of a typical biological neuron (a) and an ANN (b).

The process of designing a neural network that can learn to ultimately solve a problem occurs through iterative use of examples with known answers (targets). This process is called learning or training. The learning/training process, as illustrated in Fig. 2, starts with receiving signals (inputs) from the input layer. These inputs are multiplied by connection weights and summed in the hidden layer. The results are then sent to the output layer through a transfer (activation) function. Several activation functions are available, including identity, logistic, tanh, and exponential functions (17, 37). The sigmoid (logistic) function is a commonly used activation function in pharmaceutical research. During neural network learning, a process called "error back-propagation" is usually implemented (38). In back-propagation, the weights are adjusted to minimize the error between the calculated (predicted) output and the observed (target) output.

ANNs are particularly powerful in modeling nonlinear relationships and can make highly accurate predictions due to their ability to analyze complex data primarily based on generalization and pattern recognition (39, 40). Nevertheless, some challenges can be encountered when using ANNs, such as trapping at local minima, controlling noise, and overfitting/underfitting.
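The training procedure described above (weighted sums in the hidden layer, a sigmoid transfer function, and error back-propagation) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in any cited study: the toy XOR dataset, the three hidden neurons, the learning rate of 0.5, the squared-error loss, full-batch gradient descent, and all function names are choices made for this sketch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Return the hidden activations and the network output for one input."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def mean_loss(data, params):
    """Mean of 0.5 * (predicted - target)^2 over the dataset."""
    return sum(0.5 * (forward(x, *params)[1] - t) ** 2
               for x, t in data) / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with full-batch back-propagation."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    n = len(data)
    for _ in range(epochs):
        # Accumulate the error gradients over all training examples.
        g_w_hidden = [[0.0] * n_in for _ in range(n_hidden)]
        g_b_hidden = [0.0] * n_hidden
        g_w_out = [0.0] * n_hidden
        g_b_out = 0.0
        for x, target in data:
            hidden, y = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Error signal at the output neuron (sigmoid derivative is y(1-y)).
            delta_out = (y - target) * y * (1.0 - y)
            g_b_out += delta_out
            for j in range(n_hidden):
                g_w_out[j] += delta_out * hidden[j]
                # Propagate the error back through the output weight.
                delta_h = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                g_b_hidden[j] += delta_h
                for i in range(n_in):
                    g_w_hidden[j][i] += delta_h * x[i]
        # Adjust the weights against the averaged gradient.
        b_out -= lr * g_b_out / n
        for j in range(n_hidden):
            w_out[j] -= lr * g_w_out[j] / n
            b_hidden[j] -= lr * g_b_hidden[j] / n
            for i in range(n_in):
                w_hidden[j][i] -= lr * g_w_hidden[j][i] / n
    return w_hidden, b_hidden, w_out, b_out


# Toy nonlinear problem (XOR): inputs paired with their known targets.
XOR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

if __name__ == "__main__":
    untrained = train(XOR_DATA, epochs=0)  # same seed, no weight updates
    trained = train(XOR_DATA)
    print("loss before training:", mean_loss(XOR_DATA, untrained))
    print("loss after training: ", mean_loss(XOR_DATA, trained))
```

Calling `train` with `epochs=0` returns the randomly initialized weights, which gives a baseline against which the error after training can be compared. Whether the network reaches a good solution or stalls depends on the initialization and learning rate.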
To avoid local minima and control noise, a time-invariant noise algorithm (TINA) can be implemented (41). In addition, there are various ways to overcome overfitting/underfitting problems. Splitting the data into training and validation sets can reduce overfitting (42), and stopping the training process at the right point can prevent both overfitting and underfitting (17). Figure 3 illustrates the optimum stopping point for ANN training.

Deep learning (DL) is a machine learning technique that is also a representation learning method (43). The state of the art in DL includes recent advances in neural networks. The major difference between conventional ANNs and DL is that DL includes larger numbers of hidden layers (usually more than three), and each layer comprises many more nodes. Therefore, DL uses multiple levels of representation that can ultimately learn very complex functions. Generally, DL requires very large training sets, which may limit the use of such methods. There are different types of neural network architectures in DL, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and fully connected feed-forward networks, which have been comprehensively discussed elsewhere (44). DL has become very popular and has gained interest in diverse areas of pharmaceutical research, such as pharmaceutical formulation development (45), drug discovery (46), and drug repurposing (47). The predictive and generalization performance of DL methods is generally better than that of other machine learning methods, such as SVMs and RFs (45). This can be attributed to improvements in algorithms and computers and to the availability of large datasets. Specific DL applications in pharmaceutical sciences are an interesting topic for future reviews.

Machine learning has been utilized in different pharmaceutical applications from the early stages of drug discovery to the late phases of drug development.
The following sections present three major areas of pharmaceutical sciences that have witnessed a considerable use of ANNs together with a number of other machine learning methods. These studies can be categorized into drug design and discovery, preformulation, and formulation studies. Drug discovery accounts for a significant share of the machine learning applications in pharmaceutical sciences, mainly due to the use of high-throughput screening, combinatorial chemistry, and computer-aided drug design (45, 48) . One of the early areas in which ANNs were applied is QSAR studies (49) (50) (51) . The QSAR approach correlates the physicochemical properties of a compound with the corresponding chemical or biological activities (52, 53) . The most commonly used physicochemical properties in QSAR studies include molecular weight, partition coefficient (logP), and hydrogen bonding capacity. Because QSAR studies usually involve complex and nonlinear characteristics, ANNs were among the best available QSAR modeling tools. Additionally, due to their usefulness and success, the importance of neural networks has continued to grow in drug discovery with the rapid rise of QSAR studies based on ANNs (54) . Table II summarizes several input-output data used to build various machine learning models in different QSAR studies. Preformulation is the stage of drug development in which the physicochemical properties of a drug substance are assessed. Determining the physicochemical properties of a drug substance is very important because it governs various parameters, such as its solubility, stability, interaction with excipients, and ultimately, bioavailability (62) . Determining the aqueous solubility of a new drug substance is an essential first step in preformulation. Any drug to be absorbed must possess a certain degree of water solubility. This is true for oral, parenteral, ophthalmic, topical, and other routes of administration. 
Various solubilization techniques are used to improve the aqueous solubilities of drug substances, such as using surfactant, complexation, salt formation, using hydrotropes, or forming cocrystals (36, 63, 64) . The in silico prediction of the aqueous solubility of drug substances has gained significant interest using different computational approaches, such as molecular dynamics simulations (65) and machine learning techniques (36) . For example, Damiati et al. (2017) developed a machine learning application using ANNs to predict the solubility enhancement effect of several hydrotrope molecules. The input data consisted of experimental data together with 10 physicochemical properties (used as descriptors) related to 10 hydrotrope molecules at different hydrotrope concentrations. The physicochemical properties included logP, melting point, and hydrogen bonding capacity. The developed ANN model was subsequently used to predict the solubility enhancement of another 16 potential hydrotrope molecules from an external dataset. The trained model was also able to identify new prospective hydrotropes for the drug molecule. In addition to providing accurate predictions, by determination of the connection weights, the developed ANN was able to provide a quantitative assessment of the relative importance of various physicochemical properties that are required for a good hydrotrope (36) . The reported use of ANNs in the prediction of solubility enhancements for drug substances and their successful use in other solubility applications in various research areas (66, 67) are encouraging for further exploration of their potential uses in more pharmaceutical preformulation research. Moreover, based on the pharmacokinetic profile of a drug substance, a suitable pharmaceutical formulation can be . Eight molecular descriptors have been used for 1104 approved drug molecules. 
Descriptors included molecular weight, hydrogen bond donor count, hydrogen bond acceptor count, rotatable bond count, topological polar surface area, heavy atom count, complexity, and covalently bonded unit count. The developed model showed good performance and generalization ability compared to other conventional machine learning techniques including partial least-squares regression (PLSR), SVM, ANNs, RF, and KNN (24). In preformulation studies, transfer learning is a promising machine learning approach for further exploration.

Another stage of drug development is the formulation of pure drug substances into drug products to be administered to patients. ANNs have gained significant interest in this area and have become the most popular machine learning tool in pharmaceutical formulation prediction (45). Table III summarizes numerous pharmaceutical studies performed over the past 20 years that utilized ANNs (as the only method used or as an approach that outperformed other machine learning methods) in the area of pharmaceutical formulation development. This table compares these studies from different machine learning aspects, including the diverse input-output data used, amount of data (dataset sizes), input variables, and purpose(s). Notably, a large number of these studies have utilized ANNs for the development and optimization of formulations and the prediction of formulation- and process-related factors associated with different parameters, such as drug dissolution and release. Additionally, the optimization of formulations (including the optimization of ingredients and/or operating conditions) using machine learning tools, particularly ANNs, has provided considerable success and displayed great promise for future applications that usually require fast and efficient manufacturing.

*The top-ranked machine learning methods in each of these studies demonstrated better predictive ability than the other machine learning methods tested. ANN, artificial neural network; SVM, support vector machine; DT, decision tree; RF, random forest; KNN, K-nearest neighbor; RBFNN, radial basis function neural network.

Recently, non-traditional machine learning techniques have been utilized in the development of in silico predictive models in pharmaceutical formulation. LightGBM has recently shown high predictive ability compared to conventional machine learning methods in pharmaceutical formulation research. Zhao et al. (2019) compared lightGBM, RF, and DL for the prediction of complexation free energy between cyclodextrins (CDs) and guest molecules with a dataset consisting of 3000 data points. Over 30 descriptors related to the guest molecule, CD, and experimental conditions were used in designing the machine learning models. LightGBM showed better prediction performance than the other models, including RF and DL (33). Gao et al. (2020) also implemented the lightGBM method to predict the complexation performance of 341 drug/phospholipid complex formulations described by over 40 molecular descriptors related to the properties of the drugs, solvents, and experimental conditions. Compared with other conventional machine learning techniques such as SVM and DT, the lightGBM model showed the best performance in predicting drug/phospholipid complexation (68). Also in 2020, He and co-workers used lightGBM to predict the particle size and polydispersity index (PDI) of nanocrystals prepared by different methods. The dataset consisted of 910 experimental size data points and 341 PDI data points obtained under various conditions and described by various API-, stabilizer-, and nanocrystal preparation-related descriptors.
The prediction performance of lightGBM was better than that obtained from several classic machine learning methods including deep neural networks (DNN), SVM, and DT for both size and PDI datasets (69) . In all these lightGBM studies, it has been proved that lightGBM is a powerful and promising machine learning technique that can be further explored in the future for various pharmaceutical applications not only for its ability to provide accurate predictions but also due to its capability to provide an informative assessment of the importance of the input descriptors. In terms of applying AI and machine learning technologies in common day-to-day pharma needs, a number of aspects are to be considered including the benefits, risks, and efforts. The benefits of machine learning applications in pharmaceutical sciences are evident. This is true for both the classic machine learning tools such as ANNs as well as for the newly emerging tools such as lightGBM. Accelerating advances across the entire spectrum of the development of drug substances and drug products by dramatically reducing the timeline in unnecessary attempts is a substantial benefit of AI in pharmaceutical settings. This may not only allow for improving outcomes in less time, but it also can help to find more efficient solutions in order to sustain manufacturing efficiency and rapid throughput. In addition, depending upon the therapeutic class, the problem of high drug attrition rates (87) can be reduced. Thus, the high costs associated with drug research and development processes can be significantly reduced if performed in silico using data digitalization and reduced extensive laboratory testing. For instance, considering a real pharmaceutical problem in which substantial efforts are needed is the problem of low aqueous solubilities of drugs. It is estimated that approximately 90% of drug candidates in research and development pipelines are poorly water-soluble (88) . 
Considering that only small quantities (< 50 mg) of a drug substance exist in early preformulation (62), determining the baseline solubility and subsequently the optimum solubilization technique for each drug substance may require extensive screening and laboratory work, as well as substantial resources. If well-trained and well-validated machine learning models can be incorporated in such settings, only drug candidates that show positive results in silico may then undergo laboratory testing. Thus, successful drug candidates can ultimately reach the intended patient in less time and with less material waste.

An important advantage of machine learning is that it places no restrictions on the type of data that can be modeled. Binary, multiclass, and continuous data can all be modeled and analyzed by machine learning. Moreover, machine learning models may be used individually or in combination. Compared with traditional statistical models, a number of machine learning technologies (e.g., ANNs) offer the advantage of modeling complex and nonlinear relationships that are frequently encountered in pharmaceutical sciences. Traditional models are usually used to draw inferences about relationships in the data, whereas machine learning models are designed to model complex relationships and can ultimately produce accurate predictions. For example, the nature of the solubilization effect using hydrotropes is complex and nonlinear and does not follow a constant pattern (36).
Traditional statistical tools would not be able to provide accurate predictions for the solubilizing effect of these systems, whereas machine learning models not only were able to produce highly accurate predictions, but also proved to be powerful tools that can provide useful insights into the relative importance of the different input features in determining the outputs by interrogation of the connection weights. In addition, the machine learning approach provided valuable insights that eventually led to the identification of new prospective solubilizing compounds (36).

The quality of data is one of the challenges that must be considered when using AI and machine learning in the pharmaceutical sciences. Quality encompasses the consistency, reliability, accuracy, availability, and accessibility of the data. The dataset size should also be considered. Small datasets can be modeled using simple machine learning methods; if the dataset is too large and complex to be modeled using simple machine learning methods, advanced ANN models based on the DL approach can offer a potential solution. Other challenges that must also be considered include the training/learning time, underfitting, and overfitting. The risk of applying unreliable machine learning models can therefore be reduced if these challenges are appropriately considered and well-trained and well-validated machine learning models are carefully designed. Hence, digitalizing pharmaceutical data using AI may require domain (80) experience and the ability to train algorithms; each machine learning method should be implemented "task specifically."

The pharmaceutical industry would greatly benefit from the use of AI and machine learning, due to its wide range of applications as discussed in this review. From proof of concept to product evaluation and marketing, AI can be applied to nearly every aspect of drug development.
With the long-standing figures of an average of $2.6 billion and over 10 years to develop new medicines (89) , AI can offer a substantial investment to hasten and improve this process. In the last 10 years, there is a remarkable growing number of pharmaceutical companies and startups using AI in drug research and development. A number of pharma companies either collaborated with or acquired AI technologies such as Novartis and Pfizer with IBM Watson (90). Mak and Pichika (2019) provided a comprehensive list of AI and pharmaceutical companies and the corresponding collaboration areas in drug development such as drug repurposing, personalized medicine, and drug discovery (91) . Other areas where pharma companies have been actively investigating in AI applications include process automation, robotic manufacturing, and targeted marketing (90) . Investing in data management and AI power can sustain manufacturing efficiency and rapid throughput of data digitalization which is powered by advancing algorithms as well as the availability of the diverse, complex, and large amount of data. AI may, therefore, improve decision-making and eventually create new and better medicines. Nonetheless, it has been reported that AI has not yet influenced the pharma industry significantly due to several reasons/challenges suggested by Henstock (2019) including data management (e.g., managing diversity and large amount of data), finding solutions for a large number of problems, insufficient skillsets, shifting towards alternatives to traditional scientific approaches, and lack of investments. To overcome these challenges, the author also suggested internal investment in data management and AI talent (90) . Mary and co-workers in 2019 conducted a survey-based study to clarify and understand the adoption and effect of AI in pharmaceutical and biotechnology companies. 
Across 217 organizations, a number of important AI activities have been identified including the use of AI for patient selection and recruitment for clinical trials, in addition to identification of medicinal products data gathering. The major reasons identified for not utilizing AI technology include a lack of skilled staff; safety, regulatory, and compliance concerns; and budget constraints (92).

In terms of regulatory and recommendation insights, the Food and Drug Administration (FDA) recently published a discussion paper, "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) - Discussion Paper and Request for Feedback," which discusses the current approach of subjecting AI/ML-driven software as a medical device to a premarket review in order to ensure safety and effectiveness. Several types of changes/modifications that may have an impact on users of this software (including patients, healthcare professionals, and others) have been reviewed. Examples include modifications related to changes in the inputs, training with new datasets, and changes in the AI/ML architecture. To ensure lifecycle safety and effectiveness from premarket development to postmarket performance, the FDA also proposed a total product lifecycle (TPLC) regulatory approach for AI/ML-based SaMD to enable evaluation and monitoring of a software product (93).

The growing success of machine learning technologies, particularly ANNs, in many pharmaceutical settings shows great potential for the development of machine learning applications beyond the traditional ones. This trend has already begun in areas such as drug and gene delivery. Therapeutic agents are often transported into the cell using special transporter systems such as cell-penetrating peptides (CPPs).
The efficiency of CPPs is usually investigated and screened through extensive laboratory work, a screening that has recently been performed successfully in silico using ANNs. The developed CPP/ANN model provided highly accurate predictions and informative assessments for 13 different input features (94). Drug repurposing can also benefit greatly from these technologies (95). At present, although the first AI-designed drug has not yet reached the market, there is an ongoing race to find a treatment for the current COVID-19 pandemic. AI plays an important role in these efforts by identifying potential molecules that could be used as anti-COVID-19 drugs. For example, BenevolentAI (96, 97) reported the use of machine learning to identify drugs for COVID-19 for which clinical trials are already underway.
Digitalizing pharmaceutical sciences is a very promising area in which numerous AI and machine learning technologies can be explored and effectively employed. The growing success of machine learning technologies in many pharmaceutical settings shows great potential for the development of beyond-traditional AI applications. In practice, the choice of machine learning method may depend on various factors, including the type of data and the size of the dataset, and can therefore be considered task-specific. With a sufficient amount of carefully curated data, building high-value applications using advancing AI algorithms may become a common practice that has the potential to solve many challenges in drug research and development. It is likely that AI will usher in a new era of digital pharmaceutical sciences with efficient, fast, and economical solutions.
Competing Interests The author declares that they have no competing interests.
3D data management: controlling data volume, velocity and variety
Big data: a survey
Big data clustering with varied density based on MapReduce
Data is the new oil. Towards user-centric transport in
Smart data for digital humanities
The rise of "big data" on cloud computing: review and open research issues
The STM report: an overview of scientific and scholarly journal publishing
Database resources of the national center for biotechnology information
Our initial experience with robotic hepato-biliary-pancreatic surgery
Artificial intelligence in healthcare: past, present and future
A logical calculus of the ideas immanent in nervous activity
Artificial intelligence: a modern approach
Long-term trends in the public perception of artificial intelligence
A guide to deep learning in healthcare
The evolution of citation graphs in artificial intelligence research
The key technology toward the self-driving car
Marsland S Machine learning: an algorithmic perspective
Machine learning in chemoinformatics and drug discovery
Simultaneous quantitative analysis of mebendazole polymorphs A-C in powder mixtures by DRIFTS spectroscopy and ANN modeling
A fuzzy logic approach to analyzing gene expression data
MaNGA: a novel multi-niche multi-objective genetic algorithm for QSAR modelling
Exploration of nitroimidazoles as radiosensitizers: application of multilayered feature selection approach in QSAR modeling
Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT
An integrated transfer learning and multitask learning approach for pharmacokinetic parameter prediction
Sample sizes when using multiple linear regression for prediction
A simple noise-tolerant abstraction algorithm for fast k-nn classification
Comparison of decision tree methods for finding active objects
Evaluating random forests for survival analysis using prediction error curves
Decision trees do not generalize to new variations
Independent principal component analysis for biologically meaningful dimension reduction of large biological data sets
Random forests and decision trees
Machine learning techniques and drug design
Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques
Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research
A two phase method for determining the number of neurons in the hidden layer of a 3-layer neural network
Application of machine learning in prediction of hydrotrope-enhanced solubilisation of indomethacin
Help documentations
Learning representations by back-propagating errors
Artificial neural network in drug delivery and pharmaceutical research
What are artificial neural networks?
Event-dependent control of noise enhances learning in neural networks
Artificial neural network prediction of aerosol deposition in human lungs
Deep learning
The rise of deep learning in drug discovery
Deep learning for in vitro prediction of pharmaceutical formulations
Deep neural nets as a method for quantitative structure-activity relationships
Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data
Advancing drug discovery via artificial intelligence
Neural networks applied to structure-activity relationships
Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks
Basic operating characteristics of neural networks when applied to structure-activity studies
Molecular design of flotation collectors: a recent progress
Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients
Artificial neural networks and genetic algorithms in QSAR
Quantitative structure-activity relationship (QSAR) analysis of tumor-specificity of 1, 2, 3, 4-tetrahydroisoquinoline derivatives
In silico SAR studies of HIV-1 inhibitors
QSAR modelling using combined simple competitive learning networks and RBF neural networks
Non-linear quantitative structure-activity relationships modelling, mechanistic study and in-silico design of flavonoids as potent antioxidants
Quantitative structure-activity relationship (QSAR) study of carcinogenicity of polycyclic aromatic hydrocarbons (PAHs) in atmospheric particulate matter by random forest (RF)
Quantitative structure-activity relationship to predict the anti-malarial activity in a set of new imidazolopiperazines based on artificial neural networks
Development of quantitative structure-activity relationship models to predict potential nephrotoxic ingredients in traditional Chinese medicines
Essentials of pharmaceutical preformulation
Solubility advantage of amorphous drugs and pharmaceutical cocrystals
Techniques of solubilization of drugs
Molecular simulation as a computational pharmaceutics tool to predict drug solubility, solubilization processes and partitioning
Artificial neural network models for the prediction of CO2 solubility in aqueous amine solutions
Predictions of equilibrium solubility and mass transfer coefficient for CO2 absorption into aqueous solutions of 4-diethylamino-2-butanol using artificial neural networks. Petroleum
Predicting drug/phospholipid complexation by the lightGBM method
Can machine learning predict drug nanocrystals?
The use of artificial neural networks for the selection of the most appropriate formulation and processing variables in order to predict the in vitro dissolution of sustained release minitablets
Simultaneous determination of vitamins C, B6 and PP in pharmaceutics using differential pulse voltammetry with a glassy carbon electrode and multivariate calibration tools
Application of artificial intelligent tools to modeling of glucosamine preparation from exoskeleton of shrimp
Optimization of salbutamol sulfate dissolution from sustained release matrix formulations using an artificial neural network
Creation of a tablet database containing several active ingredients and prediction of their pharmaceutical characteristics based on ensemble artificial neural networks
Contribution of the physicochemical properties of active pharmaceutical ingredients to tablet properties identified by ensemble artificial neural networks and Kohonen's self-organizing maps
Optimization of controlled release nanoparticle formulation of verapamil hydrochloride using artificial neural networks with genetic algorithm and response surface methodology
Chitosan-tripolyphosphate nanoparticles: optimization of formulation parameters for improving process yield at a novel pH using artificial neural networks
A novel method for the production of core-shell microparticles by inverse gelation optimized with artificial intelligent tools
Comparison of response surface methodology and artificial neural network to optimize novel ophthalmic flexible nano-liposomes: characterization, evaluation, in vivo pharmacokinetics and molecular dynamics simulation
Comparative study for optimization of pharmaceutical self-emulsifying pre-concentrate by design of experiment and artificial neural network
Development of a new aprepitant liquisolid formulation with the aid of artificial neural networks and genetic programming
Application of artificial neural networks for process analytical technology-based dissolution testing
An artificial neural network approach to predict the effects of formulation and process variables on prednisone release from a multipartite system
Application of interpretable artificial neural networks to early monoclonal antibodies development
The application of machine learning algorithms in understanding the effect of core/shell technique on improving powder compactability
Predicting physical stability of solid dispersions by machine learning techniques
High drug attrition rates-where are we going wrong?
Insoluble drug delivery strategies: review of recent advances and business prospects
Alzheimer's & Dementia: Translational Research & Clinical Interventions
Artificial intelligence for pharma: time for internal investment
Artificial intelligence in drug development: present status and future prospects
A study on the application and use of artificial intelligence to support drug development
Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SAMD)-discussion paper and request for feedback Accessed 08
Novel machine learning application for prediction of membrane insertion potential of cell-penetrating peptides
Insights into computational drug repurposing for neurodegenerative disease
Baricitinib for COVID-19: a suitable treatment?-Authors' reply
COVID-19 and artificial intelligence: protecting health-care workers and curbing the spread