key: cord-0709316-b6xandfh
authors: Lee, Ming-Han; Ta, Giang Huong; Weng, Ching-Feng; Leong, Max K.
title: In Silico Prediction of Intestinal Permeability by Hierarchical Support Vector Regression
date: 2020-05-19
journal: Int J Mol Sci
DOI: 10.3390/ijms21103582
sha: ef2a8c7896c1a4095748c3c3fbb8edd5ccc04b46
doc_id: 709316
cord_uid: b6xandfh

The vast majority of marketed drugs are orally administrated. As such, drug absorption is one of the important drug metabolism and pharmacokinetics parameters that should be assessed in the process of drug discovery and development. A nonlinear quantitative structure–activity relationship (QSAR) model was constructed in this investigation using the novel machine learning-based hierarchical support vector regression (HSVR) scheme to render the extremely complicated relationships between descriptors and intestinal permeability that can take place through various passive diffusion and carrier-mediated active transport routes. The predictions by HSVR were found to be in good agreement with the observed values for the molecules in the training set (n = 53, r(2) = 0.93, [Formula: see text] = 0.84, RMSE = 0.17, s = 0.08), test set (n = 13, q(2) = 0.75–0.89, RMSE = 0.26, s = 0.14), and even outlier set (n = 8, q(2) = 0.78–0.92, RMSE = 0.19, s = 0.09). The built HSVR model consistently met the most stringent criteria when subjected to various statistical assessments. A mock test also assured the predictivity of HSVR. Consequently, this HSVR model can be adopted to facilitate drug discovery and development.

Oral administration is the predominant route for medication that can be manifested by the fact that ca. 56% of unique drugs approved by FDA in 2018 were orally administrated [1] . Accordingly, drug absorption is one of critical absorption, distribution, metabolism and excretion, and toxicity (ADME/Tox) factors that should be taken into consideration in the process of drug discovery and development as well as practical applications [2] . For instance, curcumin, which is the major constituent of the spice turmeric (Curcuma longa), has a great beneficial potential in treating cancer, diabetes, osteoarthritis, antianxiety, and even novel coronavirus disease 2019 [3, 4] and yet its practical clinical applications are very limited mainly due to its poor absorption [5] . Clinically, tuberculosis (TB) is one of the leading causes of death globally, especially for HIV/AIDS patients [6] , and the survival of extremely ill TB patients is diminished due to the poor absorption of anti-TB agents [7] .

Drug absorption mainly relies on solubility and intestinal permeability [8] , which is also termed as intestinal absorption [9] , since oral drugs must permeate the gastrointestinal barrier before they can be absorbed by the bodies [9] . In fact, solubility and permeability have been adopted by the biopharmaceutics drug disposition classification system (BDDCS), which suggests that the intestinal permeability rate is closely correlated with the extent of metabolism [10] Nevertheless, intestinal permeability is an extremely complicated process since drugs can pass through the intestinal epithelium to enter blood vessel by active transport as well as passive diffusion, as illustrated by Figure 3 of Dahlgren and Lennernäs [11] . Mechanistically, the active transport can be mediated by two superfamilies expressed in the intestine, namely the influx transporters of the solute carrier (SLC) family and the efflux transporters of the ATP-binding cassette (ABC) family, whereas the passive diffusion can take place through the transcellular and/or paracellular routes [12] .

In addition, the ABC transporters including P-glycoprotein (P-gp, MDR1, ABCB1), breast cancer resistance protein (BCRP, ABCG2), MRP2 (ABCC2), MRP3 (ABCC3), MRP4 (ABCC4), MRP5 (ABCC5), MRP6 (ABCC6), MRP7 (ABCC10), MRP8 (ABCC11), and MRP9 (ABCC12) [13] , and the SLC transporters involving peptide transporter 1 (PepT1, SLC15A1), concentrative nucleoside transporter 1 (CNT1, SLC28A1), concentrative nucleoside transporter 2 (CNT2 SLC28A2), equilibrative nucleoside transporter (ENT2, SLC29A2), organic cation transporters 1 (OCT1, SLC22A1), organic cation/carnitine transporter 1 (OCTN1, SLC22A4), organic cation/carnitine transporter 2 (OCTN2, SLC22A5), monocarboxylate Transporter 1 (MCT1, SLC16A1), organic anion transporting polypeptide 2B1 (OATP2B1, SLC02B1), serotonin transporter (SERT, SLC6A4), and apical sodium-dependent bile acid transporter; (ASBT. SLC10A2) [14] can be found in the intestine. Their expression levels can be different in varied segments of intestine [15, 16] .

Of various in vitro assay systems to measure intestinal permeability, human colorectal adenocarcinoma cells (Caco-2), Madin−Darby canine kidney cells (MDCK), and parallel artificial membrane permeability assay (PAMPA) are commonly used [9] , and they can be affected by factors such as cell line types and cultured conditions. The in situ single-pass intestinal perfusion (SPIP) model is the most prevalent assay [17] that normally measures effective permeability (P eff ) of the gastrointestinal (GI) tract segments, namely duodenum, jejunum, ileum, and colon, in human, rat, and mouse [18] . The parameter P eff , which is expressed as cm/s, can be calculated by

where Q is the perfusion buffer flow rate; C out and C in are the outlet and inlet solute concentrations, respectively; and A represents the surface area within the intestinal segment that can be computed by the radius of the intestinal segment (R) and the length of the perfusion intestinal segment (L) [19] ,

When compared with in vitro assays, in vivo tests provide a closer to real-life environment, but they are costly, time consuming, and sometimes inhumane, and are subjected to discrepancies by a number of factors such as individual differences in intestinal cell surface and epithelial cell integrity [20] , especially they are very sensitive to the animal species because of differences in physiological conditions [21] . More importantly, those factors can make substantial contribution to data inhomogeneity that, in turn, can create paramount obstacles to producing a sound quantitative theoretical model based on the data compiled from the public domain since only homogenous data can produce a good in silico model [22] .

In silico technologies have been seamlessly integrated into the drug discovery and development and they especially provide valuable advantages in ADME/Tox profiling due to their extremely fast throughput and low cost [23] . As such, it is plausible to expect an in silico model that can predict intestinal permeability is very useful. Nevertheless, no sound quantitative structure-activity relationship (QSAR) model has been published to date despite, even though some qualitative studies have been conducted. The scarcity in QSAR model can be plausibly attributed to the lack of consistent and homogenous data in the public domain and, more importantly, the extremely complex process of intestinal permeability (vide supra) since it can take place through various active transport and passive diffusion routes. More specifically, the SLC transporters can enhance the drug uptake into the intestine and hence increase drug absorption, whereas ABC proteins can elevate drug efflux out

Of all molecules selected in this study, 53 and 13 molecules were randomly assigned to the training set and test set, respectively, with a ca. 4:1 ratio. Figure 1 shows the projection of all molecules enrolled in this investigation in chemical space, spanned by the first three principal components (PCs), which characterized 97.7% of the total variable variance. It can be observed that the training samples and test samples showed similar distributions in the chemical space. Furthermore, the high degrees of biological and chemical similarity between both datasets can also be observed in Figure S1 , which displays the histograms of log P eff , molecular weight (MW), log D, log P, hydrogen-bond acceptor (HBA), hydrogen-bond donor (HBD), and polar surface area (PSA) in density form for the training and test molecules. Thus, the unbiased partition of data samples can be ascertained [31] .

It is not trivial to establish the AD of predictive models prior to model development to identify the outliers and exclude them from data collection [32] . There are various versions to define AD [33] . This study adopted the version based on the chemical similarity/dissimilarity using principal component analysis (PCA) to graphically assess the outliers [32] . Those designated outliers, conversely, are very dissimilar from the training samples, as manifested by the fact that they are totally situated outside the perimeter of the training set in the chemical space shown in Figure 1 . In fact, the differences between the outliers and the other samples can be realized by the fact that their surface areas are unanimously more than 600 Å 2 , whereas surface areas of the others are less than 600 Å 2 .

training set and test set, respectively, with a ca. 4:1 ratio. Figure 1 shows the projection of all molecules enrolled in this investigation in chemical space, spanned by the first three principal components (PCs), which characterized 97.7% of the total variable variance. It can be observed that the training samples and test samples showed similar distributions in the chemical space. Furthermore, the high degrees of biological and chemical similarity between both datasets can also be observed in Figure S1 , which displays the histograms of log Peff, molecular weight (MW), log D, log P, hydrogen-bond acceptor (HBA), hydrogen-bond donor (HBD), and polar surface area (PSA) in density form for the training and test molecules. Thus, the unbiased partition of data samples can be ascertained [31] . 

Various SVR models were built based on various descriptor combinations and runtime parameters, and three SVR models, denoted as SVR A, SVR B, and SVR C, were selected to construct the SVR ensemble, which, in turn, was subjected to regression by another SVR to generate the HSVR model. The optimal runtime parameters of SVR A, SVR B, SVR C, and HSVR, are listed in Table S2 .

These three SVR models, which unanimously selected four descriptors with different combinations (Table 1) , were adopted based on their individual performances on the molecules and statistical assessments in the training set and test set. Table S1 lists their predictive log P eff values. Tables 2 and 3 summarize the corresponding statistical assessments of these three SVR models in the training set and test set, respectively. (Table 2) . Furthermore, Figure 2 shows that most of the points predicted by SVR B mostly lie on or are closer to the regression line when compared with SVR A and SVR C. As such, SVR B generated the lowest ∆ Max (0.91), MAE (0.25), s (0.15), and RMSE (0.29) and the largest r 2 parameter (0.83), suggesting that SVR B executed better than SVR A and SVR C for the molecules in the training set. Nevertheless, SVR B produced not only the lowest q 2 CV value (0.02) but also the largest difference between r 2 and q 2 CV (0.81) when subjected to the leave-one-out cross-validation (Table 2) , signifying its high level of overtraining that, in turn, can severely limit its practical application. SVR A, SVR B, and SVR C unanimously gave rise to the miniature <r 2 s > values of 0.02 (Table 2 ) when subjected to the Y-scrambling, and their almost zero values of <r 2 s > apparently depict that there is little chance correlation associated with those SVR models [34] .

The predictions by SVR A, SVR B, and SVR C in the test set are in modest agreement with the experimental values, as shown in Figure 3 , which displays scatter the plot of observed versus the predicted log P eff values by SVR A, SVR B, SVR C, and HSVR for the molecules in the test set. Nevertheless, the mean absolute errors calculated by SVR A, SVR B, and SVR C increase from 0.33, 0.25, and 0.15 in the training set to 0.39, 0.26, and 0.24 in the test set, respectively ( Table 3 ). The other statistical assessments also indicate the performance deteriorations of these three models in the SVRE from the training set to the test set (Tables 2 and 3) . CV (0.02) but also the largest difference between r 2 and 2 CV q (0.81) when subjected to the leave-one-out cross-validation (Table 2) , signifying its high level of overtraining that, in turn, can severely limit its practical application. SVR A, SVR B, and SVR C unanimously gave rise to the miniature < 2 s r > values of 0.02 (Table 2 ) when subjected to the Y-scrambling, and their almost zero values of < 2 s r > apparently depict that there is little chance correlation associated with those SVR models [34] . The predictions by SVR A, SVR B, and SVR C in the test set are in modest agreement with the experimental values, as shown in Figure 3 , which displays scatter the plot of observed versus the predicted log Peff values by SVR A, SVR B, SVR C, and HSVR for the molecules in the test set. Nevertheless, the mean absolute errors calculated by SVR A, SVR B, and SVR C increase from 0.33, 0.25, and 0.15 in the training set to 0.39, 0.26, and 0.24 in the test set, respectively ( Table 3 ). The other statistical assessments also indicate the performance deteriorations of these three models in the SVRE from the training set to the test set (Tables 2 and 3) .

Furthermore, SVR A, SVR B, and SVR C yielded q 2 values of 0.40, 0.67, and 0.73 in the test set, which are smaller than their r 2 counterparts in the training set. Most notably, the difference between r 2 and q 2 evaluated by SVR A was 0.22, indicating the overtraining nature of SVR A that is also confirmed by its 2 , and HSVR (gray square) for the molecules in the test set. The solid line, dashed lines, and dotted lines correspond to the HSVR regression of the data, 95% confidence intervals for the HSVR regression, and 95% confidence interval for the prediction, respectively.

Significant performance decreases can be observed when those SVR models in the ensemble were applied to the molecules in the outlier set, as depicted by the statistical parameters listed in Table 4 . SVR A, SVR B, and SVR C, for instance, gave rise to 2 F1 q values of -0.10, 0.04, and 0.47, respectively, which differ greatly from their r 2 values in the training set (Table 2 ). In addition, they showed larger distances between points and the regression line in the outlier set when compared with their counterparts in the training set, as displayed in Figure 4 . Such substantial variations in performance can be realized by the fact most predictive models are vulnerable to the "mesa effect", which leads to a predictive model executing poorly when applied to extrapolated predictions [35] . Furthermore, SVR A, SVR B, and SVR C yielded q 2 values of 0.40, 0.67, and 0.73 in the test set, which are smaller than their r 2 counterparts in the training set. Most notably, the difference between r 2 and q 2 evaluated by SVR A was 0.22, indicating the overtraining nature of SVR A that is also confirmed by its q 2 F1 , q 2 F2 , q 2 F3 , and CCC.

Significant performance decreases can be observed when those SVR models in the ensemble were applied to the molecules in the outlier set, as depicted by the statistical parameters listed in Table 4 . SVR A, SVR B, and SVR C, for instance, gave rise to q 2 F1 values of −0.10, 0.04, and 0.47, respectively, which differ greatly from their r 2 values in the training set (Table 2 ). In addition, they showed larger distances between points and the regression line in the outlier set when compared with their counterparts in the training set, as displayed in Figure 4 . Such substantial variations in performance can be realized by the fact most predictive models are vulnerable to the "mesa effect", which leads to a predictive model executing poorly when applied to extrapolated predictions [35] . , and HSVR (gray square) for the molecules in the outlier set. The solid line, dashed lines, and dotted lines correspond to the HSVR regression of the data, 95% confidence intervals for the HSVR regression, and 95% confidence interval for the prediction, respectively.

The HSVR model was yielded by the regression of the SVR ensemble based on the predictions of all samples and statistical evaluations in the training set (Table S1 and Table 2 ) and its optimal runtime conditions are listed in Table S2 . HSVR generally executed better than SVR A, SVR B, and SVR C for those training samples, as illustrated by Figure 2 , in which it can be observed that the distances between the predictions by HSVR and regression line generally fall in the range between the largest ones and smallest ones produced by its SVR counterparts in the ensemble. Nevertheless, HSVR predicted better than all of the models in the SVRE in some cases as demonstrated by the prediction of compound 2 (aloin), in which SVR A, SVR B, SVR C, and HSVR yielded absolute residuals of 0.25, 0.20, 0.06, and 0.03, respectively. As such, HSVR statistically functioned better than SVR A, SVR B, and SVR C, as manifested by all parameters listed in Table 2 except MAE values, , and HSVR (gray square) for the molecules in the outlier set. The solid line, dashed lines, and dotted lines correspond to the HSVR regression of the data, 95% confidence intervals for the HSVR regression, and 95% confidence interval for the prediction, respectively.

The HSVR model was yielded by the regression of the SVR ensemble based on the predictions of all samples and statistical evaluations in the training set (Table S1 and Table 2 ) and its optimal runtime conditions are listed in Table S2 . HSVR generally executed better than SVR A, SVR B, and SVR C for those training samples, as illustrated by Figure 2 , in which it can be observed that the distances between the predictions by HSVR and regression line generally fall in the range between the largest ones and smallest ones produced by its SVR counterparts in the ensemble. Nevertheless, HSVR predicted better than all of the models in the SVRE in some cases as demonstrated by the prediction of compound 2 (aloin), in which SVR A, SVR B, SVR C, and HSVR yielded absolute residuals of 0.25, 0.20, 0.06, and 0.03, respectively. As such, HSVR statistically functioned better than SVR A, SVR B, and SVR C, as manifested by all parameters listed in Table 2 except MAE values, which were 0.15 and 0.16 for SVR C and HSVR, respectively. In addition, HSVR gave rise to the largest r 2 value (0.93) as compared with its counterparts in the SVRE. In addition, there is a little chance that HSVR was produced by chance correlation, as manifested by its nearly zero value of <r 2 s > (0.02) [34] . When applied to the molecules in the test set, slight performance decreases can be observed for HSVR. For instance, RMSE increased from 0.17 in the training set to 0.26 in the test set (Tables 2 and 3) . Nevertheless, the parameter ∆ Max declined from 0.50 in the training set to 0.47 in the test set. Figure 3 shows that HSVR performed better than SVR A, SVR B, and SVR C in the test set. The performance dominance of HSVR can be further confirmed by those statistical evaluations listed in Table 3 . For example, SVR A, SVR B, SVR C, and HSVR generated s values of 0.29, 0.24, 0.20, and 0.14, respectively. Similar observation that HSVR gave rise to smaller absolute residuals than its counterparts in the SVRE can also be noted in the test set. The absolute prediction deviation of compound 59, for instance, was 0.04 yielded by HSVR, whereas SVR A, SVR B, and SVR C gave rise to the absolute residuals of 0.35, 0.38, and 0.24, respectively. HSVR normally generated consistent and small errors in both training and test sets, as depicted by those parameters listed in Tables 2 and 3 , when compared with its SVR counterparts in the ensemble. Moreover, HSVR yielded the largest q 2 (0.81) in the test set and the smallest difference between r 2 and q 2 CV (0.09), suggesting that HSVR was well-trained or no overfitting effect was observed because it would otherwise generate a significant difference between r 2 and q 2 or between r 2 and q 2 CV . When applied to the outliers, HSVR even showed more pounced predominance, as indicated by those statistical parameters listed in Table 5 , from which it can be recognized that HSVR generated the largest q 2 values and smallest deviation-related parameters. The superiority of HSVR in the outlier set can be plausibly due to the broad applicability domain encompassed by HSVR as compared with its SVR counterparts in the ensemble and, more importantly, the more robust nature of HSVR makes it more practically useful in real applications [30] . 

X X X Equation (22) N/A a X X † Fulfilled; a Not available. Figure 5 illustrates the scatter plots of the residual vs. the log P eff values predicted by HSVR for the molecules in the training set, test set, and outlier set. It can be conceived that the residuals are approximately evenly allocated on both sides of x-axis along the range of predicted values in all datasets, suggesting that there is no significant systematic error associated with HSVR. The unbiased predictions can be further rendered by its almost negligible average residuals that were 0.00, −0.10, and 0.09 in the training set, test set, and outlier set, respectively (Table S1) . The derived HSVR model was further assessed by combining the most rigorous validation criteria collectively suggested by Golbraikh et al. [36] , Ojha et al. [37] , Roy et al. [38] , and Chirico and Gramatica [39] in the training set, test set, and outlier set (Equations (16)-(22)). The results are listed in Table 5 , from which it can be found that HSVR showed similar high levels of performance in those three datasets. More importantly, HSVR completely fulfilled all validation requirements, suggesting that this predictive model is highly accurate and predictive. The derived HSVR model was further assessed by combining the most rigorous validation criteria collectively suggested by Golbraikh et al. [36] , Ojha et al. [37] , Roy et al. [38] , and Chirico and Gramatica [39] in the training set, test set, and outlier set (Equations (16)- (22)). The results are listed in Table 5 , from which it can be found that HSVR showed similar high levels of performance in those three datasets. More importantly, HSVR completely fulfilled all validation requirements, suggesting that this predictive model is highly accurate and predictive.

Equation (16) X † X X Equation (17) X N/A N/A Equation (18) X X X Equation (19) X X X Equation (20) X X X

To imitate real world challenges, the derived HSVR model was checked by all marketed drugs assayed by Lennernäs [40] , of which seven were also included in this investigation, producing a good way to calibrate the challenging system. However, Lennernäs measured the P eff values by the SPIP in human jejunum in contrast to the compounds enrolled in this study, whose P eff values were obtained using the rat jejunum segment. Thus, those drugs assayed by Lennernäs are not suitable as the test set or second external set since their validation assessments (vide supra) are not applicable to these drugs. The subsequent relationship between both measured systems was initially instituted and checked based on those common seven drugs and the resulted scattered plot is displayed in Figure 6 . The results show that both systems were modestly correlated with each other with an r value of 0.80, suggesting that it is plausible to validate the derived HSVR model by those novel molecules assayed by Lennernäs, which is consistent with the fact that the rat SPIP P eff values can be useful to predict human intestinal permeability [17] .

using the rat jejunum segment. Thus, those drugs assayed by Lennernäs are not suitable as the test set or second external set since their validation assessments (vide supra) are not applicable to these drugs. The subsequent relationship between both measured systems was initially instituted and checked based on those common seven drugs and the resulted scattered plot is displayed in Figure 6 . The results show that both systems were modestly correlated with each other with an r value of 0.80, suggesting that it is plausible to validate the derived HSVR model by those novel molecules assayed by Lennernäs, which is consistent with the fact that the rat SPIP Peff values can be useful to predict human intestinal permeability [17] . Figure 7 displays the tested results of 11 novel drugs. It can be observed that the r value between experimental human log Peff and predicted rat log Peff was 0.79. The negligible difference between both numbers (0.80 vs. 0.79) suggests that HSVR can almost reproduce the experimental observations. Accordingly, this mock challenge by 11 marketed drugs apparently assured the predictivity of HSVR and it is plausible to adopt this HSVR model as a surrogate for preliminary estimation of human intestine permeability in the process of drug discovery and development. Figure 6 . Observed human log P eff versus observed rat log P eff for the molecules in the mock test. The solid line, dashed lines, and dotted lines correspond to the mock test regression of the observed data, 95% confidence interval for the mock test regression, and 95% confidence interval for the observation, respectively. Figure 7 displays the tested results of 11 novel drugs. It can be observed that the r value between experimental human log P eff and predicted rat log P eff was 0.79. The negligible difference between both numbers (0.80 vs. 0.79) suggests that HSVR can almost reproduce the experimental observations. Accordingly, this mock challenge by 11 marketed drugs apparently assured the predictivity of HSVR and it is plausible to adopt this HSVR model as a surrogate for preliminary estimation of human intestine permeability in the process of drug discovery and development. 

Numerous in silico models have been reported to predict intestinal permeability [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] . However, those published models were based on data assayed by different experimental conditions or different measured parameters, and some of them were qualitative models, making the direct comparison between HSVR and those models extremely difficult. In addition, intestine permeability is an extremely complicated process, which can take place through various active transport and passive diffusion routes (vide supra). As such, it is not uncommon to observe that various descriptor Figure 7 . Observed human log P eff versus the rat log P eff predicted by HSVR for the molecules in the mock test. The solid line, dashed lines, and dotted lines correspond to the HSVR regression of the data, 95% confidence interval for the HSVR regression, and 95% confidence interval for the prediction, respectively.

Numerous in silico models have been reported to predict intestinal permeability [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] . However, those published models were based on data assayed by different experimental conditions or different measured parameters, and some of them were qualitative models, making the direct comparison between HSVR and those models extremely difficult. In addition, intestine permeability is an extremely complicated process, which can take place through various active transport and passive diffusion routes (vide supra). As such, it is not uncommon to observe that various descriptor combinations associated with intestinal permeability have been identified. For example, Shultz proposed the significance of HBD, topological polar surface area (TPSA), and log P in intestinal permeability [56] , whereas Broccatelli et al. recognized the contributions of TPSA, MW, HBD, number of rotamers (n rot ), charge, and fraction ionized at pH 7.4 (cFI 7.4 ) to intestinal permeability [57] .

Drugs must pass through the hydrophobic mucus layer, which is adjacent to the intestinal wall, before they can penetrate through the intestinal epithelial cells [58] . As such, hydrophobicity is of critical importance in intestinal permeability and it can be represented by the n-octanol-water partition coefficient (log P) and the n-octanol-water partition coefficient at pH 6.5 (log D). Moreover, it was proposed by Balimane et al. that log P and log D should be adopted to predict the intestinal permeability since log P alone is not sufficient enough to accurately render this complicated process [9] . As such, both log P and log D were adopted by this study (Table 1) . However, the selection of both descriptors can plausibly lead to an overtrained model since the correlation coefficient between log P and log D was 0.73 for all molecules included in this study. This controversial issue can be eliminated by the fact that log D was adopted by SVR A and SVR B, whereas log P was selected by SVR C, depicting the fact that no single SVR model included both descriptors simultaneously. In fact, this dilemma of selecting both correlated descriptors to accurately predict intestinal permeability cannot be resolved by any other traditional linear or machine learning-based QSAR schemes but only by any ensemble-based scheme such as HSVR.

It has been observed that PSA is profoundly implicated in membrane permeability in passive diffusion [59] , which is completely consistent with the PAMPA study [1] as well as intestinal permeability [56] . In addition, permeability also relies on MW, as proposed in [13] . Nevertheless, neither PSA nor MW was adopted by any of the SVR models in the ensemble (Table 1) . Conversely, it is seemingly unusual to observe that the descriptor n N+O was selected by SVR A and yet has hitherto not been adopted by any reported study. These discrepancies can be realized by the fact that n N+O was modestly correlated with PSA and MW with r values of 0.88 and 0.71, respectively, for all molecules selected in this study. The empirical observation indicated that models with the selection of n N+O performed better than those with the selection of PSA or MW (data not shown), plausibly due to the descriptor-descriptor interaction [1] , suggesting that it is plausible to represent PSA or MW by n N+O . The negative correlation between n N+O and log P eff (−0.29) is also consistent with the fact that permeability can decrease with MW [60] .

It has been postulated that hydrogen bond, which can be characterized by HBA and HBD, plays a critical role in intestinal P-gp-mediated transport [61] and HBD makes substantial contributions to intestinal permeability when compared with its HBA counterpart [56] . Accordingly, HBD was adopted in this study. Nevertheless, the relationship between HBD and P eff is seemingly obscure, as illustrated by Figure 8 , which shows the average P eff for each histogram bin of HBD for all molecules included in this investigation. This peculiar relationship can be plausibly attributed to the fact that the substrates of PepT1 and P-gp, which are the most abundant SLC and ABC transporters, respectively, in jejunum [15] , can interact with their transporter proteins via HBD [25, 26] . The complexity can be further increased by taking into the account the fact that P-gp inhibitors, modulators, and substrates can interact with P-gp through HBD [26, 62, 63] . As such, HBD can simultaneously facilitate and hinder intestinal permeability, leading to a perplexing dependency, which, in turn, can create an unsurmountable hurdle for creating a predictive theoretical model regardless of traditional linear or machine learning-based schemes.

respectively, in jejunum [15] , can interact with their transporter proteins via HBD [25, 26] . The complexity can be further increased by taking into the account the fact that P-gp inhibitors, modulators, and substrates can interact with P-gp through HBD [26, 62, 63] . As such, HBD can simultaneously facilitate and hinder intestinal permeability, leading to a perplexing dependency, which, in turn, can create an unsurmountable hurdle for creating a predictive theoretical model regardless of traditional linear or machine learning-based schemes. Shadow-ν is a size-related descriptor which measures the ratio of largest to smallest dimension. It can be observed in Figure 9 , which displays the average Peff for each histogram bin of shadow-ν, that Peff generally increased with shadow-ν for all molecules selected in this investigation, suggesting that molecules with larger shadow-ν have higher permeability than their smaller counterparts. Shadow-ν is a size-related descriptor which measures the ratio of largest to smallest dimension. It can be observed in Figure 9 , which displays the average P eff for each histogram bin of shadow-ν, that P eff generally increased with shadow-ν for all molecules selected in this investigation, suggesting that molecules with larger shadow-ν have higher permeability than their smaller counterparts. It has been observed that molar refractivity (MR), which is possibly associated with molecular size, polarity, and/or polarizability [64] , is closely related to ligand-P-gp interactions [65, 66] . Nevertheless, little correlation manifested between MR and log Peff for all molecules enrolled in this study, with an insignificant r value of -0.12 (Table 1) . This incongruity can be resolved by the nonlinearity between MR and Peff, as demonstrated in Figure 10 , which illustrates the average Peff for each histogram bin of MR. It can be observed that Peff marginally increased with MR and substantially decreased afterwards, suggesting the nonlinear relationship between MR and Peff. Thus, linear It has been observed that molar refractivity (MR), which is possibly associated with molecular size, polarity, and/or polarizability [64] , is closely related to ligand-P-gp interactions [65, 66] . Nevertheless, little correlation manifested between MR and log P eff for all molecules enrolled in this study, with an insignificant r value of −0.12 (Table 1) . This incongruity can be resolved by the nonlinearity between MR and P eff , as demonstrated in Figure 10 , which illustrates the average P eff for each histogram bin of MR. It can be observed that P eff marginally increased with MR and substantially decreased afterwards, suggesting the nonlinear relationship between MR and P eff . Thus, linear models cannot properly render such a complicated relationship. It has been observed that molar refractivity (MR), which is possibly associated with molecular size, polarity, and/or polarizability [64] , is closely related to ligand-P-gp interactions [65, 66] . Nevertheless, little correlation manifested between MR and log Peff for all molecules enrolled in this study, with an insignificant r value of -0.12 (Table 1) . This incongruity can be resolved by the nonlinearity between MR and Peff, as demonstrated in Figure 10 , which illustrates the average Peff for each histogram bin of MR. It can be observed that Peff marginally increased with MR and substantially decreased afterwards, suggesting the nonlinear relationship between MR and Peff. Thus, linear models cannot properly render such a complicated relationship. The significance of the descriptor µ in intestinal permeability has been recognized [67] since µ can describe the solute-solute and solute-solvent dipole interactions [68] , as demonstrated in PAMPA permeability [1] , leading to nonlinear relationship between µ and permeability. In addition, it has been observed that ligands can interact with the efflux transporter P-gp and the influx transporter PepT1 through dipole interactions [69] [70] [71] , giving rise to the complex role played by µ in intestinal permeability.

It is of interest that most of descriptors selected in this study are associated with passive diffusion, which is consistent with the fact that passive diffusion is the major route for intestinal permeability for many administrated drugs [12] . Additionally, MR, shadow-ν, and n N+O , which was selected in place of MW in this study (vide supra), are closely linked to molecular size, and the molecular size is a determining barrier factor in intestinal permeability as postulated [72] [73] [74] .

CSA, which is also another characteristic associated with molecular size, has been implicated in membrane permeability [75] . Figure 11 exhibits the average P eff for each histogram bin of CSA. It can be observed that P eff did not show substantial variations with CSA initially, yet the P eff value greatly dropped once CSA was larger than 75, which is very similar to the trend observed for MR (Figure 10 ), suggesting that it is more difficult to penetrate the intestinal wall once the CSA values are larger than a threshold. Nevertheless, the empirical observation has indicated that HSVR with the selections of MR executed better than those with the selection of CSA (data not shown), presumably because of the descriptor-descriptor interaction [1] .

CSA, which is also another characteristic associated with molecular size, has been implicated in membrane permeability [75] . Figure 11 exhibits the average Peff for each histogram bin of CSA. It can be observed that Peff did not show substantial variations with CSA initially, yet the Peff value greatly dropped once CSA was larger than 75, which is very similar to the trend observed for MR ( Figure  10 ), suggesting that it is more difficult to penetrate the intestinal wall once the CSA values are larger than a threshold. Nevertheless, the empirical observation has indicated that HSVR with the selections of MR executed better than those with the selection of CSA (data not shown), presumably because of the descriptor-descriptor interaction [1] . It has been indicated that ion class is one of critical factors in physiological-based pharmacokinetic (PBPK) models and ADME/Tox properties that should be taken into account [20, 76] . Actually, it has been demonstrated that neutral compounds show higher passive diffusion [1] . Accordingly, all molecules enrolled in this investigation were subjected to ion class analysis. Figure  12 displays the box plot of the log Peff minimum, maximum, mean, median, the 25th percentile, and the 75th percentile for each ion class. The log Peff values of neutral compounds are statistically greater than the other ion classes, depicting that neutral compounds show higher intestinal permeability. It is possible to improve the compound's intestinal permeability of the other ion classes by chemical modification to produce neutral compounds when they show low intestinal permeability. It has been indicated that ion class is one of critical factors in physiological-based pharmacokinetic (PBPK) models and ADME/Tox properties that should be taken into account [20, 76] . Actually, it has been demonstrated that neutral compounds show higher passive diffusion [1] . Accordingly, all molecules enrolled in this investigation were subjected to ion class analysis. Figure 12 displays the box plot of the log P eff minimum, maximum, mean, median, the 25th percentile, and the 75th percentile for each ion class. The log P eff values of neutral compounds are statistically greater than the other ion classes, depicting that neutral compounds show higher intestinal permeability. It is possible to improve the compound's intestinal permeability of the other ion classes by chemical modification to produce neutral compounds when they show low intestinal permeability. Initially, massive attempts were made in this investigation to construct various 2-QSAR models by adopting numerous partial least squares (PLSs), but no acceptable models were yielded (data not shown) [29] . This challenge was due to little correlation between the selected descriptors and log Peff for those molecules selected in this study and the largest absolute maximum r was merely 0.29 between nN+O and log Peff (Table 1) , depicting the highly nonlinear relationship between them. More importantly, the substantial difference in 2-QSAR development between the passive diffusion, viz. the PAMPA system, and intestinal permeability can be greatly attributed to the significant and complex roles played by those active (influx and efflux) transporters. As such, it is extremely difficult to build a sound linear model to predict intestinal permeability. Conversely, the accurate and predictive HSVR model can considerably delineate such nonlinear dependence of log Peff on Figure 12 . Box plot of log P eff values for different ion classes, where the boxes indicate the distribution of log P eff from the 25th to the 75th percentile, the black and red lines represent the median and mean values, the whiskers depict the minimum and maximum values, and the asterisk denotes significant difference between neutral and the others (p < 0.05).

Initially, massive attempts were made in this investigation to construct various 2-QSAR models by adopting numerous partial least squares (PLSs), but no acceptable models were yielded (data not shown) [29] . This challenge was due to little correlation between the selected descriptors and log P eff for those molecules selected in this study and the largest absolute maximum r was merely 0.29 between n N+O and log P eff (Table 1) , depicting the highly nonlinear relationship between them. More importantly, the substantial difference in 2-QSAR development between the passive diffusion, viz. the PAMPA system, and intestinal permeability can be greatly attributed to the significant and complex roles played by those active (influx and efflux) transporters. As such, it is extremely difficult to build a sound linear model to predict intestinal permeability. Conversely, the accurate and predictive HSVR model can considerably delineate such nonlinear dependence of log P eff on descriptors.

A comprehensive literature search was carried out to retrieve in vivo permeability data from a variety of sources to construct quality data for this investigation. Of various assay systems, 74 compounds, which were measured by SPIP in rat jejunum with pH 6.5 phosphate buffered saline (PBS), were adopted from various sources . The mean value was taken to assure better consistency if there were more than one P eff values for a given compound within close range. Chemical structures without defined stereochemistry (e.g., racemates) were discarded from the collection. All molecules included in this study, IUPAC names, SMILES strings, CAS registry numbers, logarithm of observed P eff values, and references to the literature are listed in Table S1 .

All molecules selected in this study were subjected to full geometry optimization using the density functional theory (DFT) B3LYP method with the basis set 6-31G(d,p) by the Gaussian package (Gaussian, Wallingford, CT) since it has been demonstrated elsewhere that predictive models with the selection of DFT descriptors can perform better [29] . To mimic the assay conditions, the water solvent system was considered by the polarizable continuum model (PCM) [99, 100] . The minimum of optimized geometry on the potential energy surface was verified by force calculations in case no imaginary frequency was obtained. Furthermore, atomic charges were also determined by the molecular electrostatic potential-based method of Merz and Kollman [101] . The highest occupied molecular orbital energy (E HOMO ), lowest unoccupied molecular orbital energy (E LUMO ), free energy (∆G), molecular dipole (µ), and its maximum absolute components (|µ| Max ) of each molecule were also retrieved from the optimization calculations.

More than 200 one-, two-, and three-dimensional molecular descriptors, which can be categorized as electronic descriptors, spatial descriptors, structural descriptors, thermodynamic descriptors, topological descriptors, and E-state indices, were evaluated by the Discovery Studio package (BIOVIA, San Diego, CA, USA) and E-Dragon (available at the website: http://www.vcclab.org/lab/edragon/). Additionally, the cross-sectional area (CSA) was also computed using the method modified by Muehlbacher et al. since it was implicated in membrane permeability [102] . Molecules were further placed into four classes, namely neutral, zwitterion, acid, and base, by their pK a values. Specifically, molecules with only one pK a value are defined as neutrals, whereas those with more than one pK a value are designated as zwitterions, acids, and bases when their largest pK a values are larger than 7 and the smallest pK a values are smaller than 7; the largest and smallest pK a values are smaller than 7; and the largest and smallest pK a values are larger than 7, respectively.

Descriptor filtration was initiated by discarding those descriptors missing for at least one sample or barely displaying discrimination against most of samples. It was suggested by Topliss and Edwards that the probability of spurious correlations can be reduced by removing those descriptors with intercorrelation values of r 2 > 0.80 [103] . Accordingly, the Spearman's matrix was constructed and a more conservative threshold of r 2 0.64 was adopted. Descriptor normalization, which was designated to reduce the possibility of descriptors with broader ranges outweighing those with narrower ranges, was implemented by

where x ij and χ ij present the jth original and normalized descriptors of the ith sample, respectively;

x j is the mean value of the original jth descriptor; and n is the number of samples.

Descriptors are one of the critical factors influencing the performance of predictive models [104] . The initial descriptor selection was executed by genetic function approximation (GFA) using the QSAR module of Discovery Studio due to its effectiveness and efficiency [105] . Further selection was done by the recursive feature elimination (RFE) scheme, in which the model was repeatedly built by all but one of descriptors. Their contributions to the predictive performance were then evaluated, and the descriptor with the smallest contribution was discarded [106] .

It is of necessity to identify the outliers prior to the model development, which can be done by examining the molecular distribution in the chemical space [107] . As such, the chemical space was initially constructed based on principal components (PCs) using the Diverse Molecules/Principal Component Analysis module of Discovery Studio and the outliers were determined. The remaining compounds were randomly divided into the training set and test set with a ca. 4:1 ratio as proposed to build and validate the models, respectively [108] , using the Diverse Molecules/Library Analysis module of Discovery Studio. In addition, it was suggested by Golbraikh et al. that only high levels of chemical and biological similarity between the training samples and test samples can develop a sound model [36] . As such, the data distribution was carefully examined to guarantee the high levels of biological and chemical similarity in both datasets.

Hierarchical support vector regression (HSVR), which was originally proposed by Leong et al. [27] , was evolved from support vector machine (SVM), which was originally invented by Vapnik et al. [109] . Initially, SVM was devised for classification and then modified for regression, termed support vector regression (SVR) [110] . The most distinguished difference between SVR and HSVR is that the latter can simultaneously take into account the characteristics of a global model, viz. broader applicability domain (AD), and a local model, viz. higher predictivity, as compared with the former [28] . More importantly, it has been demonstrated by a number of studies that predictive models based on the HSVR scheme perform extremely well [1, [26] [27] [28] .

The principle and implementation of HSVR has been detailed elsewhere and the architecture of HSVR schemeis depicted in Figure 1 of Leong et al. [27] . Briefly, SVR models were developed by the svm-train module in LIBSVM (software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm) using those samples in the training set with various descriptor combinations and SVR run parameters. The derived models were validated by svm-predict using those samples in the test set. The regression functions, namely ε-SVR and γ-SVR, were tried, and radial basis function (RBF) was selected as the kernel function due to its simplicity and greater performance [111] . The runtime parameters including regression modes ε-SVR and ν-SVR, the corresponding ε and ν, cost C, and the kernel width γ were automatically run by the systemic grid search algorithm in a parallel fashion.

The principle of Occam's razor was applied to the construction of SVR ensemble (SVRE), which suggests that the fewer the ensemble members, the better [112] . More specifically, two SVR models were selected to develop an SVRE, which was further subjected to regression by another SVR to give rise to the final HSVR model. The construction of two-member SVREs was repeated until the HSVR model executed well. Otherwise, three-or even four-member ensembles would be built by adding one or more SVR models, respectively, in the case all two-member ensembles failed to show good performance.

The difference between the observed value (y i ) and the predicted value (ŷ i ), viz. the residual, was calculated

The root mean square error (RMSE) and the mean absolute error (MAE) for n samples in the dataset were computed

The developed models were further assessed by the correlation coefficients r 2 and q 2 in the training set and external set, respectively, by the following equation

where ŷ represents the average predicted value. Additionally, Ojha et al. [37] also proposed various modified versions of r 2 to gauge the model performance

r 2 m = r 2 m + r 2 m /2 (10)

where the correlation coefficient r 2 o and the slope k were derived from the regression line (predicted vs. observed values) without the intercept, whereas r 2 o was calculated from the regression line (observed vs. predicted values) without the intercept.

The developed model was further subjected to internal validation using the leave-one-out cross-validation scheme, producing the correlation coefficient q 2 CV . Furthermore, Y-scrambling or Y-randomization was applied by randomly permuting the log P eff values, viz. Y values, to refit the previously derived models while the descriptors remained unchanged. This procedure was repeated 25 times, as suggested in [34] , to give rise the average correlation coefficient r 2 s to ensure no chance correlation was associated with those derived models.

Additionally, the predictivity in the external dataset was evaluated by the correlation coefficients q 2 F1 , q 2 F2 , and q 2 F3 , and concordance correlation coefficient (CCC) using QSARINS [32, 113] 

where n TR and n EXT are the numbers of samples in the training set and external set, respectively; ŷ TR stands for the average predicted value in the training set; and y EXT and ŷ EXT represent the average observed and predicted values in the external set, respectively.

Finally, the predictivity of developed models were further assessed by combining the most stringent criteria collectively suggested by Golbraikh et al. [36] , Ojha et al. [37] , Roy et al. [38] , and Chirico and Gramatica [39] r 2 , q 2 CV , q 2 , q 2 Fn ≥ 0.70 (16) r 2 − q 2 CV < 0.10 (17) r 2 − r 2 o /r 2 < 0.10 and 0.85 ≤ k ≤ 1.15 (18)

r 2 m ≥ 0.65 (20) r 2 m ≥ 0.65 and ∆r 2 m < 0.20 (21) CCC ≥ 0.85 (22) where r in Equations (18)-(21) denotes r and q in the training set and external set, respectively. q Fn in Equation (16) symbolizes q 2 F1 , q 2 F2 , and q 2 F3 .

Intestinal permeability plays a pivotal role in systemic drug absorption that, in turn, can be of critical importance to drug efficacy. As such, intestinal permeability is one of the important drug metabolism and pharmacokinetics parameters that should be assessed in the process of drug discovery and development. An in silico model to predict the intestinal permeability can be of great value to drug discovery and development. Nevertheless, intestinal permeability is an extremely complex process that can take place through various routes, namely passive diffusion and carrier-mediated active transport. thus distinct descriptor combinations as well as various relationships are needed to depict these variations in different mechanisms. The novel machine learning-based HSVR scheme, which concurrently possesses the advantageous characteristics of a local model and a global model, namely larger coverage of applicability domain and higher degree of predictivity, respectively, was adopted in this investigation to construct a QSAR model to predict the intestinal permeability. The constructed HSVR showed great prediction accuracy for the molecules in the training set and test set, respectively, with superior predictivity and statistical significance. When subjected to a mock test by a group of molecules to mimic real challenges, the built HSVR model also performed equivalently well. In addition, the selected descriptors can render those interactions associated with passive diffusion and active transport. Accordingly, it can be asserted that this HSVR model can be utilized as an accurate and dependable predictive tool, even in the high throughput environment, to facilitate drug discovery and development by predicting the intestinal permeability of hit and lead compounds even when they are virtual. Additionally, the development of HSVR also paves the way to create more in silico models to predict oral absorption, drug stability in stomach, and bioavailability in the future.

The following are available online at http://www.mdpi.com/1422-0067/21/10/3582/s1, Table S1 . Selected compounds for this study; their names, SMILES strings, CAS numbers, and observed log P e values; their predicted values by SVR A, SVR B, HSVR, and PLS; data partitions; and references; Table S2 . Optimal runtime parameters for the SVR models; Figure Acknowledgments: Parts of calculations were performed at the National Center for High-Performance Computing, Taiwan. The authors are grateful to Paola Gramatica for providing the free license of QSARINS.

The authors declare no conflict of interest.

Quantitative structure-activity relationship HSVR Hierarchical support vector regression ADME/Tox Absorption, distribution, metabolism and excretion, and toxicity BDDCS Biopharmaceutics drug disposition classification system SLC Solute carrier ABC ATP-binding cassette Caco-2

Colorectal adenocarcinoma MDCK Madin-Darby canine kidney PAMPA Parallel artificial membrane permeability assay SPIP Single-pass intestinal perfusion

In Silico Prediction of PAMPA Effective Permeability Using a Two-QSAR Approach

Property-based design: Optimization of drug absorption and pharmacokinetics

Turmeric (Curcuma longa) and its major constituent (curcumin) as nontoxic and safe substances: Review

Potential inhibitor of COVID-19 main protease (M pro ) from several medicinal plant compounds by molecular docking study

Pharmacokinetics, pharmacodynamics, and pkpd modeling of curcumin in regulating antioxidant and epigenetic gene expression in healthy human volunteers

Costs of tuberculosis screening among inpatients with HIV

The critically ill patient with tuberculosis in intensive care: Clinical presentations, management and infection control

Prediction of oral Bioavailability in Rats: Transferring insights from in vitro correlations to (deep) machine learning models using in silico model outputs and chemical structure parameters

Current methodologies used for evaluation of intestinal permeability and absorption

Predicting drug disposition via application of BCS: Transport/absorption/elimination interplay and development of a biopharmaceutics drug disposition classification system

Intestinal permeability and drug absorption: Predictive experimental, computational and in vivo approaches

Coexistence of passive and carrier-mediated processes in drug transport

Computational Absorption Prediction

Regional DISTRIBUTION of solute carrier mRNA expression along the human intestinal tract

Variability in mRNA expression of ABCand SLC-transporters in human intestinal cells: Comparison between human segments and Caco-2 cells

Regional levels of drug transporters along the human intestinal tract: Co-expression of ABC and SLC transporters and comparison with Caco-2 cells

Predicting human intestinal permeability using single-pass intestinal perfusion in rat

Intestinal Perfusion Methods for Oral Drug Absorptions

Quantitative mechanistic studies in simultaneous fluid flow and intestinal absorption using steroids as model solutes

Predicting drug extraction in the human gut wall: Assessing contributions from drug metabolizing enzymes and transporter proteins using preclinical models

Oral delivery of highly lipophilic, poorly water-soluble drugs: Self-emulsifying drug delivery systems to improve oral absorption and enable high-dose toxicology studies of a cholesteryl ester transfer protein inhibitor in preclinical species

QSAR modeling: Where have you been? Where are you going to?

In silico tools for drug absorption prediction

Targeting intestinal transporters for optimizing oral drug absorption

Bioavailability through PepT1: The role of computer modelling in intelligent drug design

Theoretical prediction of the complex P-glycoprotein substrate efflux based on the novel hierarchical support vector regression scheme

Prediction of human cytochrome p450 2b6-substrate interactions using hierarchical support vector regression approach

Predicting mutagenicity of aromatic amines by various machine learning approaches

In silico prediction of the mutagenicity of nitroaromatic compounds using a novel two-QSAR approach

Robust estimates, residuals, and outlier detection with multiresponse data

Averaged shifted histogram

QSARINS-chem: Insubria datasets and new QSAR/QSPR models for environmental pollutants in QSARINS

How to judge whether QSAR/read-across predictions can be trusted: A novel approach for establishing a model's applicability domain

y-Randomization and its variants in QSPR/QSAR

Using neural networks: Hybrid expert networks

Rational selection of training and test sets for the development of validated QSAR models

Further exploring r m 2 metrics for validation of QSPR models

Comparative studies on some metrics for external validation of QSPR models

Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection

Regional intestinal drug permeation: Biopharmaceutics and drug development

Evaluation of human intestinal absorption data and subsequent derivation of a quantitative structure-activity relationship (QSAR) with the Abraham descriptors

Comparative QSAR studies on PAMPA/modified PAMPA for high throughput profiling of drug absorption potential with respect to Caco-2 cells and human intestinal absorption

Predictive human intestinal absorption QSAR models using bayesian regularized neural networks

Rate-limited steps of human oral absorption and QSAR studies

In silico adme modeling 3: Computational models to predict human intestinal absorption using sphere exclusion and kNN QSAR methods

Consensus hologram QSAR modeling for the prediction of human intestinal absorption

Predicting human intestinal absorption of diverse chemicals using ensemble learning based QSAR modeling approaches

The survey of the use of qsar methods to determine intestinal absorption and oral bioavailability during drug design

Exploration of linear modelling techniques and their combination with multivariate adaptive regression splines to predict gastro-intestinal absorption of drugs

Ionization-specific analysis of human intestinal absorption

Combinatorial QSAR modeling of human intestinal absorption

The impact of training set data distributions for modelling of passive intestinal absorption

SAR for gastro-intestinal absorption and blood-brain barrier permeation of pesticides

Predicting human intestinal absorption with modified random forest approach: A comprehensive evaluation of molecular representation, unbalanced data, and applicability domain issues

Prediction of human intestinal absorption of compounds using artificial intelligence techniques

Two decades under the influence of the rule of five and the changing properties of approved oral drugs

Aliagas, I. Predicting passive permeability of drug-like molecules from chemical structure: Where are we?

Effect of mucus layer on the transcellular absorption of lipophilic drugs in rat small intestine

Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties

Improving drug candidates by design: A focus on physicochemical properties as a means of improving compound disposition and safety

How hydrogen bonds impact P-glycoprotein transport and permeability

Toward a better pharmacophore description of p-glycoprotein modulators, based on macrocyclic diterpenes from euphorbia species

Prediction of promiscuous P-glycoprotein inhibition using a novel machine learning scheme

Classification analysis of P-glycoprotein substrate specificity

A Combined hansch/free-wilson approach as predictive tool in qsar studies on propafenone-type modulators of multidrug resistance

Synthesis and in vitro multidrug resistance modulating activity of a series of dihydrobenzopyrans and tetrahydroquinolines

Theoretically-derived molecular descriptors important in human intestinal absorption

Recent advances in computational prediction of drug absorption and permeability in drug discovery

Structure-activity relationships for xenobiotic transport substrates and inhibitory ligands of P-glycoprotein

Synthesis and DFT study on hantzsch reaction to produce asymmetrical compounds of 1,4-dihydropyridine derivatives for P-glycoprotein inhibition as anticancer agent

Crystal structure of a prokaryotic homologue of the mammalian oligopeptide-proton symporters, PepT1 and PepT2

Studies with novel Pdr5p substrates demonstrate a strong size dependence for xenobiotic efflux

Functional role of P-glycoprotein in limiting intestinal absorption of drugs: Contribution of passive permeability to P-glycoprotein mediated efflux transport

Effect of size and charge on the passive diffusion of peptides across Caco-2 cell monolayers via the paracellular pathway

In silico prediction of blood-brain barrier permeation using the calculated molecular cross-sectional area as main parameter

Impact of ion class and time on oral drug molecular properties

Correlation between epithelial cell permeability of cephalexin and expression of intestinal oligopeptide transporter

Grapefruit juice and its constituents augment colchicine intestinal absorption: Potential hazardous interaction and the role of P-glycoprotein

Prediction of passive transport of 10 natural products by liposome-based fluorescence technique

Comparison between permeability coefficients in rat and human jejunum

Simultaneous determination of procaspase activating compound 1 and permeability markers in intestinal perfusion samples and application to a rat intestinal absorption study

Comparison of the permeability of metoprolol and labetalol in rat, mouse, and Caco-2 cells: Use as a reference standard for BCS classification

Interaction of dipeptide prodrugs of saquinavir with multidrug resistance protein-2 (MRP-2): Evasion of MRP-2 mediated efflux

In situ intestinal permeability and in vivo absorption characteristics of olmesartan medoxomil in self-microemulsifying drug delivery system

The suitability of an in situ perfusion model for permeability determinations: Utility for BCS class I biowaiver requests

Rat jejunal permeability and metabolism of µ-selective tetrapeptides in gastrointestinal fluids from humans and rats

Concentration-and region-dependent intestinal permeability of fluvastatin in the Rat

Surface activity and concentration dependent intestinal permeability in the rat

Absorption and metabolism of flavonoids in the caco-2 cell culture model and a perused rat intestinal model

In vitro and in vivo studies of lipid-based nanocarriers for oral N3-o-toluyl-fluorouracil delivery

In-situ intestinal rat perfusions for human Fabs prediction and BCS permeability class determination: Investigation of the single-pass vs. the doluisio experimental approaches

Site of drug absorption after oral administration: Assessment of membrane permeability and luminal concentration of drugs in each segment of gastrointestinal tract

Determination of site of absorption of propranolol in rat gut using in situ single-pass intestinal perfusion

Intestinal permeability study of minoxidil: Assessment of minoxidil as a high permeability reference drug for biopharmaceutics classification

Intestinal permeability of lamivudine using single pass intestinal perfusion

Paclitaxel disposition studies using P-Gp inhibtor & inducer by single pass intestinal perfusion in rats

metabolism and transport of d-Asp (OBzl)-Ala-a model prodrug with affinity for the oligopeptide transporter

Comparison of intestinal permeabilities determined in multiple in vitro and in situ models: Relationship to absorption in humans

Electrostatic interaction of a solute with a continuum. A direct utilizaion of AB initio molecular potentials for the prevision of solvent effects

Remarks on the use of the apparent surface charges (ASC) methods in solvation problems: Iterative versus matrix-inversion procedures and the renormalization of the apparent charges

Atomic charges derived from semiempirical methods

Qualitative prediction of blood-brain barrier permeability on a large and refined dataset

Chance factors in studies of quantitative structure-activity relationships

The great descriptor melting pot: Mixing descriptors for the common good of QSAR models

Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships

Gene selection for cancer classification using support vector machines

Robust PCA and classification in biosciences

The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models

Support-vector networks

Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing

Support Vector Machines, Neural Networks, and Fuzzy Logic Models

How not to develop a quantitative structure-activity or structure-property relationship (QSAR/QSPR)

A new software for the development, analysis, and validation of QSAR MLR models

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license