title: A possibilistic analogue to Bayes estimation with fuzzy data and its application in machine learning
authors: Arefi, Mohsen; Viertl, Reinhard; Taheri, S. Mahmoud
date: 2022-04-20  journal: Soft Computing  DOI: 10.1007/s00500-022-07021-y

A Bayesian approach in a possibilistic context, for the case in which the available data for the underlying statistical model are fuzzy, is developed. The problem of point estimation with fuzzy data is studied within the introduced possibilistic Bayesian approach. For point estimation, we present one method that does not use a loss function and one that does. For point estimation with a loss function, we first define a risk function based on the possibilistic posterior distribution, and the unknown parameter is then estimated by minimizing this risk function. Briefly, the present work extends previous works in two directions: first, the underlying model is assumed to be probabilistic rather than possibilistic; second, the problem of Bayes estimation is developed both without and with a loss function. The applicability of the proposed approach to concept learning is then investigated. In particular, a naive possibilistic Bayes classifier is introduced and applied to some real-world concept learning problems.

Bayesian inference for parametric statistical models is based on two assumptions: (1) the parameter of interest θ of the underlying model f(x|θ) is of a stochastic nature and has a probabilistic prior distribution π(θ); (2) the available data related to the random variable X are precise. The first item is the central point of the Bayesian paradigm, but treating θ as a random variable with a probabilistic prior distribution is a matter of debate between frequentist and Bayesian statisticians. Concerning item (2), in many situations the available data are vague (non-precise) rather than crisp (precise); see Zimmermann (2000). In real-world problems, we often face situations in which the parameter of interest has a possibilistic nature and/or the available data are fuzzy. For instance, consider a medical study of the rate of people affected by the COVID-19 virus in a certain population. Suppose that, based on experience, this rate is determined to be "about 0.15" (expressed by a fuzzy number whose spreads encode the ambiguity). Moreover, it is quite plausible that some medical tests for the presence or absence of the virus yield uncertain results: for example, "with high possibility" she/he has the virus, the possibility of having the virus is "about 0.8", and so on. In such a study, which is very common in practical problems, we need a possibilistic version of the Bayes formula based on fuzzy information. In this study, we propose a possibilistic version of the Bayesian approach in which the underlying model is probabilistic, but the prior information about θ is formulated as a possibility distribution (called the possibilistic prior distribution) rather than a probabilistic one. In addition, we assume that the data available for the random variable X are presented as fuzzy numbers rather than as crisp numbers.
To this end, the likelihood function based on such a fuzzy-valued random sample is defined, and the extended likelihood function is then combined with the possibilistic prior distribution to obtain a possibilistic version of the posterior distribution. Such a possibilistic posterior distribution is based on a T-norm, which makes it flexible with respect to different situations. In this regard, the problem of point estimation, both without and with a loss function, is presented as well. We also present a possibilistic predictive distribution for predicting future fuzzy values of the underlying variable. Moreover, we introduce a new Bayes classifier which is based on a possibilistic prior (rather than a probabilistic one) and fuzzy data (rather than crisp data). The novelty of the current work compared to previous works is therefore: 1) considering the underlying model to be probabilistic rather than possibilistic; 2) studying the problem of Bayes estimation both without and with a loss function, in a decision-theoretic framework; 3) introducing the possibilistic predictive distribution function, in a probabilistic-possibilistic context; 4) introducing the naive possibilistic Bayes classifier for crisp and fuzzy data. The proposed approach can be applied in machine learning, where Bayesian procedures provide methods for combining prior knowledge with the available data (see, e.g., Cui et al. 2013; He et al. 2014; Jiang et al. 2014; Subrahmanya and Shin 2013; Wang et al. 2018; You et al. 2019).

The topic of Bayesian inference in vague (non-precise) environments and/or in expert systems has been studied by several authors. Let us review some recent works on this topic. Lapointe and Bobée (2000) studied a possibilistic posterior distribution based on a possibilistic prior distribution and a possibilistic statistical model. Arefi and Taheri (2016) extended Lapointe and Bobée's approach to the case in which the available data are fuzzy; they also studied the applicability of their approach in the field of concept learning. Taheri and Behboodian (2001, 2006) extended the Bayes approach to testing fuzzy hypotheses for both crisp (exact) and fuzzy (non-exact) data (see also Torabi and Behboodian (2007)). Several approaches to point estimation with fuzzy random samples were investigated by Akbari and Khanjari Sadegh (2012). Osoba et al. (2011) considered the problem of Bayesian inference using certain fuzzy priors. Bacani and Barros (2017), Bardakhchyan (2017), Hareter and Viertl (2004), Mandal and Ranadive (2019), Osoba et al. (2012), Viertl (2011), Viertl and Hareter (2004), and Zhang and Chi (2008) investigated other aspects of Bayes approaches in imprecise (fuzzy/vague) environments.

This paper is organized as follows: In Sect. 2, we recall some basic concepts of possibility theory. Two new concepts, the possibilistic prior distribution and the possibilistic posterior distribution, are introduced and investigated in Sect. 3. In Sect. 4, we study the problem of parameter estimation in the possibilistic Bayes paradigm without considering a loss function. In Sect. 5, using a loss function, the posterior risk function is first defined, and the problem of parameter estimation is then developed based on this function. The concept of possibilistic predictive distributions for future values of fuzzy data is introduced in Sect. 6. Some applications of the proposed model in machine learning are explained in Sect. 7. In Sect. 8, the proposed approach is compared with some other approaches. A brief conclusion is provided in Sect. 9.
The Bayesian approach in statistics is fundamentally based on considering the parameter of interest θ (related to the statistical model f(x|θ)) as a random variable with a probabilistic prior distribution π(θ). However, in many problems we have imprecise (not necessarily stochastic) information about θ. In these cases, it is reasonable to consider θ as a possibilistic variable with vague (fuzzy) prior information. Below, we recall two basic definitions of possibility theory, which we will need in the present article. The reader is referred to Dubois and Prade (1988), Klir and Folger (1988), Krätschmer (2004), and Zadeh (1968) for more details.

Definition 1 A possibility measure Π on a measurable space (Ω, B) is defined to be a function Π : B → [0, 1] that satisfies the following axioms:
(i) Π(∅) = 0 and Π(Ω) = 1;
(ii) $\Pi\big(\bigcup_{i \in I} A_i\big) = \sup_{i \in I} \Pi(A_i)$ for every family {A_i, i ∈ I} ⊆ B.
Also, (Ω, B, Π) is said to be a possibility space.

Definition 2 A function π* : Ω → [0, 1] is called a possibility function on (Ω, B) if $\Pi(A) = \sup_{x \in A} \pi^*(x)$ for every A ∈ B. In the special case A = {x}, we have π*(x) = Π({x}).

The possibility measure and the possibility function are comparable to the probability measure P(·) and the probability density function, respectively.

Remark 1 Note that if f(·) is a probability density function on (Ω, B), then we have $P(A) = \int_A f(x)\, d\nu(x)$, whereas for a possibility function $\Pi(A) = \sup_{x \in A} \pi^*(x)$.

Remark 2 Note that the possibility measure Π(·) and the probability measure P(·) are set functions, but the possibility function π*(·) and the probability function f(·) are real-valued functions defined on Ω.

In this section, we extend the concept of the likelihood function to fuzzy data. Moreover, we define the possibilistic posterior distribution based on a possibilistic prior distribution, when the observations of the underlying model are fuzzy. In the following, we assume that (Ω, B) is a measurable space, in which Ω is the sample space, and that (Ω, B, P) is a probability space, where P is a probability measure on (Ω, B).

Definition 3 Let (Ω, B, P) be a probability space. Let $\tilde{X} = (\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_n)$ be a fuzzy-valued random sample of size n from X, associated with the probability density function (PDF) (or probability mass function) f(·|θ), i.e., a sequence of fuzzy numbers regarded as fuzzy realizations of the original random variable X. Then the likelihood function based on such a fuzzy-valued random sample is defined by

$l(\theta; \tilde{X}) = \prod_{i=1}^{n} \int_{\chi} \tilde{X}_i(x)\, f(x|\theta)\, d\nu(x), \qquad (1)$

where f(x|θ) is the Radon-Nikodym derivative of P with respect to ν (a σ-finite measure). The measure ν is usually the counting measure or the Lebesgue measure, and χ is the support of the random variable X.

Remark 3 It should be mentioned that when the available data are crisp numbers x_1, x_2, ..., x_n, the above definition reduces to the ordinary definition of the likelihood function, i.e., $l(\theta; x) = \prod_{i=1}^{n} f(x_i|\theta)$. Note that the above extension follows Zadeh's definition (Zadeh 1968) of the probability of fuzzy events.

Definition 4 Let f(x|θ) be a statistical model with unknown parameter θ ∈ Θ. Suppose that the information about θ is formulated as a possibility function π*(θ). This possibility function is called the possibilistic prior distribution for θ.

Definition 5 Consider the fuzzy-valued random sample $\tilde{X} = (\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_n)$ with the likelihood function l(θ; X̃). Suppose that the parameter θ has a possibilistic prior distribution π*(θ). The possibilistic posterior distribution, under a T-norm T(·,·), is defined by

$\pi^*(\theta|\tilde{X}) = \frac{T\big(l(\theta;\tilde{X}),\, \pi^*(\theta)\big)}{m(\tilde{X})}, \qquad (2)$

where $m(\tilde{X}) = \sup_{\theta} T\big(l(\theta;\tilde{X}),\, \pi^*(\theta)\big)$ is called the marginal function (it guarantees a normal posterior distribution, i.e., there exists θ such that π*(θ|X̃) = 1).
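To make Definitions 3 and 5 concrete, the following Python sketch (ours, not part of the paper) evaluates the fuzzy-data likelihood of Eq. (1) and the T-norm posterior of Eq. (2) on grids. The triangular membership shape, the grid-based trapezoidal integration, and the product T-norm default are illustrative assumptions, not the paper's prescriptions; the minimum T-norm can be passed as np.minimum.

```python
import numpy as np

def triangular(a, b, c):
    """Membership function of a triangular fuzzy number with support [a, c]
    and core {b} (an assumed, illustrative shape)."""
    def mu(x):
        x = np.asarray(x, dtype=float)
        return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)
    return mu

def fuzzy_likelihood(theta, fuzzy_obs, pdf, x_grid):
    """Eq. (1): l(theta; X~) = prod_i integral X~_i(x) f(x|theta) dnu(x),
    i.e., Zadeh's probability of a fuzzy event for each observation,
    approximated here by the trapezoidal rule on x_grid."""
    like = np.ones_like(theta, dtype=float)
    for mu in fuzzy_obs:
        weights = mu(x_grid)                         # membership values X~_i(x)
        dens = pdf(x_grid[None, :], theta[:, None])  # f(x|theta) on the grid
        like *= np.trapz(weights[None, :] * dens, x_grid, axis=1)
    return like

def possibilistic_posterior(theta, prior_poss, like, tnorm=np.multiply):
    """Eq. (2): pi*(theta|X~) = T(l(theta;X~), pi*(theta)) / m(X~), where the
    marginal m(X~) = sup_theta T(l, pi*) rescales the supremum to one."""
    num = tnorm(like, prior_poss)
    return num / num.max()
```

Dividing by the maximum plays exactly the role of the marginal function m(X̃): whatever T-norm is chosen, the resulting posterior is a normal possibility function.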
A flowchart of the procedure for calculating the possibilistic posterior distribution is given in Fig. 1. It is remarkable that in the above discussion we have two kinds of uncertainty: the probabilistic one, which is due to the random variable, and the possibilistic one, which arises from the possibilistic prior. The difficulty is to combine these two different kinds of uncertainty to update the information about the unknown parameter θ. Note that although there have been some attempts to define conditional possibilities (e.g., Coletti and Vantaggi (2009), De Baets et al. (1999), Ferracuti and Vantaggi (2006), Hisdal (1978), Kramosil (1998), and Nguyen (1978)), to the best knowledge of the authors there has been no previous work on the problem of updating possibilistic information using probabilistic fuzzy data. The above definition is, in some sense, consistent with some definitions of conditional possibility. First, note that the marginal function m(X̃) is analogous to the corresponding relation for probabilities in probability theory, in which summation and product are used instead of sup and T(·,·), respectively (see Eq. (33) in Nguyen (1978), Eq. (2.8) in Hisdal (1978), and Eq. (6) in Kramosil (1998)). A second point concerns the use of a T-norm in the numerator of Eq. (2); in this regard, see the discussion in Sect. 3 of De Baets et al. (1999) and Sects. 3 and 4 of Coletti and Vantaggi (2009).

Example 1 The data in Table 1 (centers of the fuzzy numbers) show the lifetimes (in 1000 km) of front disk brake pads on a randomly selected set of 40 cars of the same model that were monitored by a dealer network (see Lawless 2003, p. 337). Suppose that the lifetime of a front disk brake pad has an exponential distribution with density function

$f(x|\theta) = \frac{1}{\theta}\, e^{-x/\theta}, \quad x > 0,$

where θ is the mean lifetime of the front disk brake pad. An expert believes that the value of θ lies in the interval [40, 50] with possibility one. Moreover, he/she believes it is possible that θ is smaller than 40, but never below 30, and bigger than 50, but never above 60. We use a trapezoidal fuzzy number to model the possibilistic prior distribution based on the expert opinion:

$\pi^*(\theta) = \begin{cases} (\theta - 30)/10 & 30 \le \theta < 40, \\ 1 & 40 \le \theta \le 50, \\ (60 - \theta)/10 & 50 < \theta \le 60, \\ 0 & \text{otherwise.} \end{cases}$

In practice, measuring the lifetime of a brake pad may not yield an exact result: a pad may work perfectly over a certain period, then degrade for some time, and finally become unusable at a certain time. So such data may be reported as imprecise quantities. Assume that the lifetimes of the front disk brake pads are reported as the fuzzy numbers in Table 1; that is, the imprecision is formulated by fuzzy numbers $\tilde{x}_i$, i = 1, ..., 40, whose membership functions are centered at the values in Table 1 and vanish outside their spreads. The likelihood function based on such fuzzy data is calculated, according to Eq. (1), as

$l(\theta; \tilde{x}) = \prod_{i=1}^{40} \int_0^{\infty} \tilde{x}_i(x)\, \frac{1}{\theta}\, e^{-x/\theta}\, dx.$

Hence, the possibilistic posterior distribution based on the product T-norm, T(a, b) = a·b, is obtained as (see Fig. 2)

$\pi^*(\theta|\tilde{x}) = \frac{l(\theta;\tilde{x})\, \pi^*(\theta)}{\sup_{\theta}\{l(\theta;\tilde{x})\, \pi^*(\theta)\}}.$
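Continuing the sketch above, the lines below show how an Example-1-style posterior would be computed numerically. The fuzzy lifetimes of Table 1 are not reproduced in this text, so two placeholder triangular observations stand in for the forty $\tilde{x}_i$, and the resulting numbers are illustrative only.

```python
import numpy as np
# Reuses triangular, fuzzy_likelihood and possibilistic_posterior from the
# previous sketch; the observations below are placeholders for Table 1.

exp_pdf = lambda x, th: np.exp(-x / th) / th      # f(x|theta) = e^{-x/theta}/theta
theta = np.linspace(20.0, 80.0, 1200)             # grid over the parameter space
x_grid = np.linspace(0.0, 300.0, 3000)            # grid over lifetimes (1000 km)

obs = [triangular(45, 50, 55), triangular(28, 32, 35)]   # placeholder fuzzy data
# Trapezoidal prior Tra(30, 40, 50, 60) from the expert opinion:
prior = np.clip(np.minimum((theta - 30) / 10.0, (60 - theta) / 10.0), 0.0, 1.0)

like = fuzzy_likelihood(theta, obs, exp_pdf, x_grid)
post = possibilistic_posterior(theta, prior, like)       # product T-norm
theta_mode = theta[np.argmax(post)]                      # posterior mode
```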
Example 2 Consider a study for estimating the proportion θ of a kind of tree in a forest which is infected with a plague. We take a sample of 3 trees and examine each tree separately for the presence of the plague. Suppose that we have no precise mechanism for an exact distinction between the presence and absence of the plague, but we can identify the information with a fuzzy set on X = {0, 1}. The usual model for this problem starts from the Bernoulli experiment X associated with the presence of the plague (Bin(1, θ), 0 ≤ θ ≤ 1). Suppose that, based on some prior information, we know that θ is approximately 0.2; in this case, a suitable possibilistic distribution for modeling such prior information is a fuzzy number expressing "approximately 0.2" (see Fig. 3). Suppose that, based on a random sample of size n = 3, we observe the fuzzy data $\tilde{X}_1, \tilde{X}_2, \tilde{X}_3$, each a fuzzy subset of X = {0, 1}. By Eq. (1) with the counting measure, the likelihood function based on these fuzzy data is

$l(\theta; \tilde{X}) = \prod_{i=1}^{3} \big[\tilde{X}_i(0)(1-\theta) + \tilde{X}_i(1)\,\theta\big].$

Based on the product T-norm, the possibilistic posterior distribution is obtained as shown in Fig. 4; it is maximized at θ = 0.4340. Also, the marginal function for the observed data is obtained as $m(\tilde{X}) = \sup_{\theta}\{l(\theta;\tilde{X})\, \pi^*(\theta)\}$.

Definition 6 Consider a possibilistic posterior distribution based on an observed fuzzy random sample. Then $\hat{\theta} = d(\tilde{X})$, as a function of the fuzzy-valued random sample, is called the maximum possibilistic posterior estimation of θ if

$\pi^*(\hat{\theta}|\tilde{X}) = \sup_{\theta \in \Theta} \pi^*(\theta|\tilde{X}). \qquad (3)$

The above definition is similar to the definition of the maximum Bayesian likelihood estimator, defined as the posterior mode, in the probabilistic approach (see Robert (2001), p. 166). Besides Definition 6, we can use defuzzification methods (see, e.g., Ross (1995)) to obtain an estimate of θ based on the possibilistic posterior distribution.

Definition 7 Let DFF(·) be a defuzzification operator. The point estimation θ̂ based on the possibilistic posterior distribution is

$\hat{\theta} = DFF\big(\pi^*(\theta|\tilde{X})\big). \qquad (4)$

i) Consider the "Center of Gravity" defuzzification method. Based on this method, the point estimation is $\hat{\theta}_{COG} = \int_\Theta \theta\, \pi^*(\theta|\tilde{X})\, d\theta \big/ \int_\Theta \pi^*(\theta|\tilde{X})\, d\theta$.
ii) Consider the "Center of Area" defuzzification method. Based on this method, the point estimation $\hat{\theta}_{COA}$ is the point that splits the area under π*(·|X̃) into two equal parts, i.e., $\int_{-\infty}^{\hat{\theta}_{COA}} \pi^*(\theta|\tilde{X})\, d\theta = \int_{\hat{\theta}_{COA}}^{+\infty} \pi^*(\theta|\tilde{X})\, d\theta$.
iii) Consider the "Supremum of Center" defuzzification method. Based on this method, the point estimation is the midpoint of the set on which π*(·|X̃) attains its supremum.

Example 3 Consider the possibilistic posterior distribution in Example 1. The point estimations of θ based on the above methods are computed accordingly; for instance, the center of gravity estimation is $\hat{\theta}_{COG} = \frac{576.2920}{11.8144} = 48.7788$.

Example 4 Consider the possibilistic posterior distribution in Example 2. The point estimations of θ based on the above methods are computed in the same manner; in particular, the maximum possibilistic posterior estimation is the posterior mode θ̂ = 0.4340 obtained above.

In a decision-theoretic analysis, the quality of a decision function (here, a point estimation) is quantified by its risk function. In this section, we first define a risk function based on the possibilistic posterior distribution π*(θ|X̃), and the point estimation (decision function) d(X̃) is then obtained based on this risk function. Let Θ be the parameter space. Any function L : Θ × D → [0, ∞) is called a loss function, where D is the space of possible decisions (here, the space of all point estimations of θ).

Definition 8 The posterior risk function with fuzzy data X̃, for the estimation (decision function) d(X̃), under the probability density function (or probability mass function) f(x|θ) and the possibilistic prior distribution π*(θ), based on a loss function L(θ, d), is defined as

$R\big(d(\tilde{X})\big) = \int_\Theta L\big(\theta, d(\tilde{X})\big)\, \pi^*(\theta|\tilde{X})\, d\theta.$

Definition 9 The estimation $d_{PB}$, based on the loss function L(θ, d) and the possibilistic posterior distribution π*(θ|X̃) obtained from the fuzzy data, is called a possibilistic Bayes estimation if

$R(d_{PB}) = \inf_{d \in D} R(d),$

where D is the set of all estimations for θ.

Consider again the possibilistic posterior distribution in Example 2, based on the random sample of size n = 3. i) Under the quadratic loss function L(θ, d) = (θ − d)², the posterior risk function in terms of d is shown in Fig. 6; the possibilistic Bayes estimation of θ, for which the posterior risk function is minimized, is $d_{PB}$ = 0.4491. ii) Under the absolute loss function L(θ, d) = |θ − d|, the posterior risk function is also shown in Fig. 6; here, the possibilistic Bayes estimation of θ, for which the posterior risk function is minimized, is $d_{PB}$ = 0.4315.
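The following is a sketch of the three defuzzifiers of Definition 7 and of the risk minimization of Definitions 8 and 9, again on a grid; the discretized integrals and the grid search over d are our simplifications.

```python
import numpy as np

def center_of_gravity(theta, post):
    """Definition 7(i): ratio of integrals (cf. Example 3: 576.2920/11.8144)."""
    return np.trapz(theta * post, theta) / np.trapz(post, theta)

def center_of_area(theta, post):
    """Definition 7(ii): the point that splits the area under pi*(.|X~) in half."""
    cum = np.concatenate([[0.0], np.cumsum(np.diff(theta) * (post[1:] + post[:-1]) / 2)])
    return np.interp(cum[-1] / 2.0, cum, theta)

def supremum_of_center(theta, post, tol=1e-9):
    """Definition 7(iii): midpoint of the set where pi*(.|X~) is maximal."""
    core = theta[post >= post.max() - tol]
    return (core.min() + core.max()) / 2.0

def possibilistic_bayes(theta, post, loss):
    """Definition 9: d_PB minimizes R(d) = integral L(theta, d) pi*(theta|X~) dtheta,
    here approximated by searching d over the same grid as theta."""
    risks = [np.trapz(loss(theta, d) * post, theta) for d in theta]
    return theta[int(np.argmin(risks))]

# Quadratic and absolute losses, as in the example above:
# d_quad = possibilistic_bayes(theta, post, lambda t, d: (t - d) ** 2)
# d_abs  = possibilistic_bayes(theta, post, lambda t, d: np.abs(t - d))
```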
The future values of fuzzy data Y are described by a possibility function, called the possibilistic predictive distribution (ppd).

Definition 10 Let π*(θ|X̃) be the possibilistic posterior distribution based on the fuzzy data $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_n)$. Also, suppose that the random variable Y has the pdf f(·|θ). The possibilistic predictive distribution g(·|X̃) is defined as

$g(y|\tilde{X}) = \sup_{\theta \in \Theta} T\!\left(\frac{f(y|\theta)}{f(y|\hat{\theta}_y)},\ \pi^*(\theta|\tilde{X})\right),$

where $f(y|\hat{\theta}_y) = \sup_{\theta \in \Theta} f(y|\theta)$.

In the Bayesian approach, the predictive distribution for future values is introduced as

$g(y|x) = \int_\Theta f(y|\theta)\, \pi(\theta|x)\, d\theta,$

where π(θ|x) is the probabilistic posterior distribution based on the precise data x = (x_1, ..., x_n) and f(y|θ) is the probability density function of the random variable Y (see Robert (2001), p. 22). Some differences between the possibilistic predictive distribution g(y|X̃) and the predictive distribution g(y|x) are the following: (1) the predictive distribution g(y|x) is based on precise data, whereas the possibilistic predictive distribution g(y|X̃) is based on fuzzy data; (2) the predictive distribution g(y|x) is a probability distribution, whereas the possibilistic predictive distribution g(y|X̃) is a possibility distribution.

Example 7 Consider the possibilistic posterior distribution π*(θ|X̃) based on the product T-norm in Example 1. The possibilistic predictive distribution g(·|X̃) based on the product T-norm, T(a, b) = a·b, is obtained as shown in Fig. 7. This possibilistic predictive distribution can be expressed as "approximately 50." Table 2 shows some of its values: the new front disk brake pad lasts 50 (in 1000 km) with possibility 1, and it lasts less or more than 50 with smaller possibility (for example, a lifetime of 30 has possibility 0.90). A parallel computation based on the minimum T-norm (Example 8) yields the possibilistic predictive distribution shown in Fig. 9.
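The following sketch evaluates the possibilistic predictive distribution of Definition 10 on a grid, under our reading of the definition in which f(y|θ) is rescaled by its supremum over θ before the T-norm combination.

```python
import numpy as np

def possibilistic_predictive(y_grid, theta, post, pdf, tnorm=np.multiply):
    """Definition 10: g(y|X~) = sup_theta T( f(y|theta)/sup_theta f(y|theta),
    pi*(theta|X~) ), evaluated on grids."""
    dens = pdf(y_grid[:, None], theta[None, :])   # f(y|theta), shape (y, theta)
    norm = dens.max(axis=1, keepdims=True)        # sup over theta for each y
    norm[norm == 0.0] = 1.0                       # guard against zero density
    return tnorm(dens / norm, post[None, :]).max(axis=1)

# For the exponential model of Example 1 the normalizer is f(y|theta=y) = e^{-1}/y,
# and g(.|X~) peaks near the posterior mode -- "approximately 50" in Example 7:
# g = possibilistic_predictive(np.linspace(1.0, 150.0, 600), theta, post, exp_pdf)
```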
A common problem in machine learning is the concept learning problem, for which the Bayes classifier is a well-known method (see, e.g., He et al. (2014), Mitchell (1997), and Wang et al. (2014)). Any system that classifies new instances according to the following equation is called a Bayes optimal classifier or Bayes optimal learner (see Mitchell (1997)):

$v_{MAP} = \arg\max_{\theta_j \in V} \pi(\theta_j | a_1, a_2, \ldots, a_n),$

where V = {θ_1, ..., θ_k}, a_1, a_2, ..., a_n are the attribute values, and π(θ_j|·) is the probabilistic posterior function. Using the Bayes rule, we can rewrite the above expression as

$v_{MAP} = \arg\max_{\theta_j \in V} \frac{P(a_1, \ldots, a_n|\theta_j)\, \pi(\theta_j)}{P(a_1, \ldots, a_n)} = \arg\max_{\theta_j \in V} P(a_1, \ldots, a_n|\theta_j)\, \pi(\theta_j).$

In particular, the naive Bayes classifier assumes that the attribute values are conditionally independent given the target value. In this case, the so-called naive Bayes classifier is defined as

$v_{NB} = \arg\max_{\theta_j \in V} \pi(\theta_j) \prod_{i=1}^{n} P(a_i|\theta_j).$

Now, based on the results in Sect. 3, we investigate and develop the Bayes classifier in the possibilistic environment. In particular, we investigate the naive possibilistic Bayes classifier both for crisp information and for fuzzy information.

The possibilistic Bayesian approach to classifying a new crisp instance is to assign the target value of maximum possibilistic posterior ($v_{MAPP}$), given the crisp attribute values {a_1, a_2, ..., a_n}:

$v_{MAPP} = \arg\max_{\theta_j \in V} \pi^*(\theta_j | a_1, \ldots, a_n) = \arg\max_{\theta_j \in V} \frac{T\big(P(a_1, \ldots, a_n|\theta_j),\, \pi(\theta_j)\big)}{\sup_{\theta_j \in V} T\big(P(a_1, \ldots, a_n|\theta_j),\, \pi(\theta_j)\big)} = \arg\max_{\theta_j \in V} T\big(P(a_1, \ldots, a_n|\theta_j),\, \pi(\theta_j)\big),$

where V = {θ_1, ..., θ_k} and π(θ_j) is the possibilistic prior distribution. If the attribute values a_1, ..., a_n are conditionally independent given the target value, then the joint conditional probability factorizes as

$P(a_1, \ldots, a_n|\theta_j) = \prod_{i=1}^{n} P(a_i|\theta_j).$

The following example is a possibilistic version of the example in Mitchell (1997), p. 157, about a medical diagnosis; here, we investigate the learning task in a possibilistic rather than a probabilistic context.

Example 9 Consider a target attribute Medical Diagnosis with two values: (1) the patient has a particular form of cancer, and (2) the patient does not. The available data come from a laboratory test with two possible outcomes: ⊕ (the result of the test is positive) and ⊖ (the result of the test is negative). Formally, the set of target values is V = {θ_1, θ_2}, where θ_1 = cancer and θ_2 = nocancer. Based on the prior information, a patient has cancer and nocancer with possibilities π(θ_1) = 0.10 and π(θ_2) = 0.95, respectively. Here, the value π(θ_1) = 0.10 reflects the consistency of the patient's symptoms with cancer (note that these values are not related to randomness or relative frequency, which are usual in the probabilistic context). The test returns a correct positive result in only 98% of the cases in which the disease is actually present and a correct negative result in only 97% of the cases in which the disease is not present; in the other cases, the test returns the opposite result. The conditional probabilities are summarized as P(⊕|θ_1) = 0.98, P(⊖|θ_1) = 0.02, P(⊕|θ_2) = 0.03, P(⊖|θ_2) = 0.97. Now, suppose that the laboratory test of a new patient returns a positive result; our task is to diagnose the target value (whether the patient has cancer or not) based on the possibilistic posterior distribution. Here, the possibilistic posterior distribution is obtained as

$\pi^*(\theta_j | \oplus) = \frac{T\big(P(\oplus|\theta_j),\, \pi(\theta_j)\big)}{\sup_{\theta_j \in V} T\big(P(\oplus|\theta_j),\, \pi(\theta_j)\big)},$

and the results based on different T-norms are given in Table 3. Based on this information, $v_{MAPP}$ = θ_1. Hence, if the result of the test of a new patient is positive, then our best estimate is that he/she has cancer.

In this subsection, we extend the method of the previous subsection to the case in which the observed data are fuzzy rather than crisp; we use the possibilistic posterior distribution to classify a new fuzzy instance. Generally, suppose that a new fuzzy instance {ã_1, ã_2, ..., ã_n} is available and we want to classify it into the target space V = {θ_1, ..., θ_k}. We first calculate the possibilistic posterior distribution π*(θ_j | ã_1, ã_2, ..., ã_n) based on Definition 5, and the target value of maximum possibilistic posterior ($v_{MAPP}$) is then obtained as

$v_{MAPP} = \arg\max_{\theta_j \in V} \pi^*(\theta_j | \tilde{a}_1, \ldots, \tilde{a}_n) = \arg\max_{\theta_j \in V} T\big(l(\theta_j; \tilde{a}_1, \ldots, \tilde{a}_n),\, \pi^*(\theta_j)\big).$
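A minimal sketch of the naive possibilistic Bayes classifier follows; the function and variable names are ours. Since the normalizing supremum in the $v_{MAPP}$ formula does not affect the argmax, it is omitted. The numbers reproduce Example 9 with the minimum T-norm, and the helper at the end indicates how a fuzzy attribute value on a finite domain would enter via Definition 3.

```python
import numpy as np

def v_mapp(instance, targets, prior_poss, cond_prob, tnorm=min):
    """Crisp naive possibilistic Bayes classifier:
    argmax_j T( prod_i P(a_i|theta_j), pi(theta_j) )."""
    scores = {}
    for theta in targets:
        likelihood = np.prod([cond_prob[(a, theta)] for a in instance])
        scores[theta] = tnorm(likelihood, prior_poss[theta])
    return max(scores, key=scores.get), scores

def fuzzy_attr_prob(mu, cond_row):
    """For a fuzzy attribute value a~ on a finite domain, Definition 3 gives
    P(a~|theta) = sum_x a~(x) P(x|theta)."""
    return sum(mu[x] * p for x, p in cond_row.items())

# Example 9 (medical diagnosis) with the minimum T-norm:
targets = ["cancer", "nocancer"]
prior = {"cancer": 0.10, "nocancer": 0.95}
cond = {("+", "cancer"): 0.98, ("-", "cancer"): 0.02,
        ("+", "nocancer"): 0.03, ("-", "nocancer"): 0.97}
label, scores = v_mapp(["+"], targets, prior, cond)
# min(0.98, 0.10) = 0.10 for cancer vs min(0.03, 0.95) = 0.03 for nocancer,
# so a positive test is classified as cancer, matching v_MAPP = theta_1.
```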
The following example is a possibilistic version of the example in Mitchell (1997), p. 178, for the target concept "PlayTennis" (see also Arefi and Taheri (2016)).

Example 10 Consider the learning task represented by the training examples of Table 4. Here, the target attribute PlayTennis, which can take the values yes or no for different Saturday mornings, is to be predicted based on the other attributes of the morning in question. Formally, the set of target values is V = {θ_1, θ_2}, where θ_1 = Yes and θ_2 = No. Based on the prior information, the player selects θ_1 and θ_2 with possibilities π(θ_1) = 0.91 and π(θ_2) = 0.15, respectively. Let each day be described by four attribute values a_1, a_2, a_3, a_4. From the training data in Table 4, the conditional probabilities are obtained as in Table 5. Suppose that we observe fuzzy values ã_1, ã_2, ã_3, ã_4 for a new instance. Based on Definition 3, the likelihood function under ã_1, ã_2, ã_3, ã_4 is calculated as

$l(\theta; \tilde{a}) = P(\tilde{a}_1, \tilde{a}_2, \tilde{a}_3, \tilde{a}_4 | \theta),$

where ã = (ã_1, ã_2, ã_3, ã_4). Here, $v_{MAPP}(\tilde{a}_1, \tilde{a}_2, \tilde{a}_3, \tilde{a}_4)$ = Yes; therefore, the naive possibilistic Bayes classifier assigns the target value PlayTennis = Yes to the new instance.

Gil et al. (1985) studied the point estimation problem with fuzzy information. They defined a probabilistic posterior distribution as

$\pi(\theta|\tilde{X}) = \frac{P(\tilde{X}|\theta)\, \pi(\theta)}{\int_\Theta P(\tilde{X}|\theta)\, \pi(\theta)\, d\theta},$

where P(X̃|θ) is defined as in Definition 3 and π(θ) is a probabilistic prior distribution. Also, the posterior risk function based on the loss function L(θ, d) for estimating θ is defined as

$R\big(d(\tilde{X})\big) = \int_\Theta L\big(\theta, d(\tilde{X})\big)\, \pi(\theta|\tilde{X})\, d\theta.$

In contrast with Gil et al.'s method, it should be noted that: 1) in our method, the parameter θ has a possibilistic nature, whereas it has a probabilistic nature in Gil et al.'s; hence, in our method the posterior distribution is of a possibilistic-probabilistic nature, whereas it is of a purely probabilistic nature in Gil et al.'s; 2) we define the posterior distribution based on a T-norm, so the posterior distribution is flexible under different T-norms (a small numerical sketch contrasting the two normalizations is given below, after the comparison with Viertl's method).

Viertl (2011) proposed a Bayesian inference method for the situation of fuzzy prior information and fuzzy data. He supposed that a fuzzy sample x*_1, ..., x*_n consists of fuzzy numbers with membership functions ξ_1(·), ..., ξ_n(·). First, he combined the membership functions of the fuzzy data to obtain a membership function ζ(·, ..., ·) of the combined fuzzy sample x*. Then, based on the extension principle applied to the likelihood function l(·|x_1, ..., x_n), the characterizing (membership) function η(·) of the fuzzy value l(θ|x*_1, ..., x*_n), based on the fuzzy data x*_1, ..., x*_n, is given by

$\eta(y) = \sup\{\zeta(x_1, \ldots, x_n) : l(\theta|x_1, \ldots, x_n) = y\}$

for all y ∈ ℝ. In order to obtain the generalized Bayes theorem, he used the δ-level functions $\underline{\pi}_\delta(\cdot)$ and $\overline{\pi}_\delta(\cdot)$ of π*(·), and $\underline{l}_\delta(\cdot|x^*)$ and $\overline{l}_\delta(\cdot|x^*)$ of l(·|x*). Hence, a fuzzy posterior density π*(·|x*) is calculated by its δ-level functions $\underline{\pi}_\delta(\cdot|x^*)$ and $\overline{\pi}_\delta(\cdot|x^*)$ as

$\underline{\pi}_\delta(\theta|x^*) = \frac{\underline{\pi}_\delta(\theta)\, \underline{l}_\delta(\theta|x^*)}{\int_\Theta \overline{\pi}_\delta(\theta)\, \overline{l}_\delta(\theta|x^*)\, d\theta}, \qquad \overline{\pi}_\delta(\theta|x^*) = \frac{\overline{\pi}_\delta(\theta)\, \overline{l}_\delta(\theta|x^*)}{\int_\Theta \underline{\pi}_\delta(\theta)\, \underline{l}_\delta(\theta|x^*)\, d\theta}.$

Both the method proposed in this article and Viertl's method are based on the likelihood function, but our method has the following advantages: (1) in our method, the parameter θ has a possibilistic nature, whereas it has a probabilistic nature in Viertl's; hence, the posterior distribution based on our method is possibilistic-probabilistic in nature, whereas it is fuzzy-probabilistic in Viertl's method; (2) the posterior distribution proposed in the present paper is based on T-norms, and is therefore flexible under different T-norms; (3) in our method, we also investigate the case in which a loss function is considered in estimating the unknown parameter.
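To make the contrast concrete, the sketch below places Gil et al.'s probabilistic posterior next to the possibilistic posterior of Sect. 3 (repeated here so the snippet is self-contained): the former is normalized by an integral and is a probability density, the latter by a supremum and is a possibility function.

```python
import numpy as np

def gil_posterior(theta, prior_prob, like):
    """Gil et al. (1985): pi(theta|X~) = P(X~|theta) pi(theta) /
    integral P(X~|theta) pi(theta) dtheta -- integrates to one."""
    num = like * prior_prob
    return num / np.trapz(num, theta)

def possibilistic_posterior(theta, prior_poss, like, tnorm=np.multiply):
    """Sect. 3: T-norm combination normalized by its supremum -- sup equals one."""
    num = tnorm(like, prior_poss)
    return num / num.max()
```

When the same prior curve is used in both roles and the product T-norm is chosen, the two posteriors are proportional; yet they answer different questions, since areas under the first are probabilities while heights of the second are degrees of possibility.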
Arefi and Taheri (2016) studied a Bayesian approach for the case in which the available data are fuzzy. This approach is an extended version of Lapointe and Bobée's (2000) approach to fuzzy data. In this approach, based on a possibilistic model π(X̃|θ) of the fuzzy sample and a prior possibility distribution π(θ), the possibilistic posterior distribution is defined when the available data are fuzzy. The possibility of a fuzzy sample is first defined as

$\pi(\tilde{X}|\theta) = \mathrm{Poss}(\tilde{X}_1, \ldots, \tilde{X}_n | \theta) = T_{i=1}^{n}\, \sup_{x}\, T\big(\tilde{X}_i(x),\, \pi(x|\theta)\big),$

where T(·,·) is a T-norm. Then, based on a possibilistic prior distribution π(θ), the possibilistic posterior distribution is defined by

$\pi(\theta|\tilde{X}) = \psi_T\big(\pi(\tilde{X}),\, \pi(\tilde{X}, \theta)\big),$

where $\pi(\tilde{X}, \theta) = T(\pi(\theta), \pi(\tilde{X}|\theta))$, $\pi(\tilde{X}) = \sup_{\theta \in \Theta} \pi(\tilde{X}, \theta)$, and $\psi_T(a, c) = \sup\{x \in [0, 1] : T(a, x) \le c\}$ is a residuated implication operator (see also Lapointe and Bobée (2000)). The main differences are the following: (1) the approaches introduced in Lapointe and Bobée (2000) (with crisp data) and Arefi and Taheri (2016) (with fuzzy data) are of a fully possibilistic nature, i.e., both the model of the data and the prior distribution are possibilistic, whereas our method is a combined version of probability and possibility, i.e., the model of the data is probabilistic and the prior distribution is possibilistic; (2) in this paper, we also investigate the case in which a loss function is considered in estimating the unknown parameter.

In the Bayesian method for estimating an unknown parameter, it is often difficult to express the prior information in stochastic terms. In addition, the observed data are sometimes non-precise (fuzzy) rather than precise (crisp). In this paper, by introducing the concept of a likelihood function for fuzzy data from a probabilistic model, a possibilistic approach was described for dealing with such situations. The present approach employs a possibility distribution for modeling the prior information. Then, based on the possibilistic posterior distribution, some methods were proposed to estimate the unknown parameter of interest, with and without a loss function. The proposed approach was applied to a well-known problem in machine learning, the concept learning problem. In particular, the naive possibilistic Bayes classifier was introduced and applied to some real-world concept learning problems. In summary, the contributions of the paper are as follows: (1) introducing a new approach to combining a probabilistic model with possibilistic information to obtain a possibilistic posterior distribution function; (2) considering the observed data of the underlying model as fuzzy rather than crisp; (3) employing a decision-theoretic approach in which the problem of estimation was studied both without and with a loss function; in addition, the maximum possibilistic posterior estimation (as a counterpart of the maximum Bayesian likelihood estimation) was introduced; (4) introducing the possibilistic predictive distribution function, based on a probabilistic model, fuzzy observations, and possibilistic prior information; (5) investigating some applications of the proposed Bayesian approach to the problem of concept learning, by introducing the naive possibilistic Bayes classifier for crisp data and fuzzy data.
It should be mentioned that, in the proposed approach, choosing (or constructing) a suitable prior distribution is an important issue (see, e.g., Hill and Spall (1994), Chen et al. (2000), and Dubois (2006)), which is a potential subject of future research.

References
Akbari MG, Khanjari Sadegh M (2012) Estimators based on fuzzy random variables and their mathematical properties
Arefi M, Taheri SM (2016) Possibilistic Bayesian inference based on fuzzy data
Bacani F, Barros LC (2017) Application of prediction models using fuzzy sets: a Bayesian inspired approach
Coletti G, Vantaggi B (2009) T-conditional possibilities: coherence and inference
De Baets B, Tsiporkova E, Mesiar R (1999) Conditioning in possibility theory with strict order norms
Dubois D (2006) Possibility theory and statistical reasoning
Dubois D, Prade H (1988) Possibility theory: an approach to computerized processing of uncertainty
Ferracuti L, Vantaggi B (2006) Independence and conditional possibility for strictly monotone triangular norms
Gil MA et al. (1985) The fuzzy decision problem: an approach to the point estimation problem with fuzzy information
He et al. (2014) Bayesian classifiers based on probability density estimation and their applications to simultaneous fault diagnosis
Hill SD, Spall JC (1994) Sensitivity of a Bayesian analysis to the prior distribution
Hisdal E (1978) Conditional possibilities, independence and noninteraction
Jiang et al. (2014) Bayesian citation-KNN with distance weighting
Klir GJ, Folger TA (1988) Fuzzy sets, uncertainty and information
Kramosil I (1998) Alternative definitions of conditional possibilistic measures
Krätschmer V (2004) Probability theory in fuzzy sample spaces
Lapointe S, Bobée B (2000) Revision of possibility distributions: a Bayesian inference pattern
Lawless JF (2003) Statistical models and methods for lifetime data
Mandal P, Ranadive AS (2019) Multi-granulation fuzzy probabilistic rough sets and their corresponding three-way decisions over two universes
Mitchell TM (1997) Machine learning
New estimation method for the membership values in fuzzy sets
Nguyen HT (1978) On conditional possibility distributions
Osoba O, Mitaim S, Kosko B (2011) Bayesian inference with adaptive fuzzy priors and likelihoods
Osoba O, Mitaim S, Kosko B (2012) Triply fuzzy function approximation for hierarchical Bayesian inference
Probabilistic DEAR models
Robert CP (2001) The Bayesian choice
Ross TJ (1995) Fuzzy logic with engineering applications
Soft methodology and random information systems, Advances in Soft Computing
Taheri SM, Behboodian J (2001) A Bayesian approach to fuzzy hypotheses testing
Taheri SM, Behboodian J (2006) On Bayesian approach to testing fuzzy hypotheses with fuzzy data
Torabi H, Behboodian J (2007) Likelihood ratio tests for fuzzy hypotheses testing
Viertl R (2011) Statistical methods for fuzzy data
Viertl R, Hareter D (2004) Generalized Bayes' theorem for non-precise a-priori distribution
Wang et al. (2014) Non-naive Bayesian classifiers for classification problems with continuous attributes
Wang et al. (2018) NBWELM: naive Bayesian based weighted extreme learning machine
You et al. (2019) An effective Bayesian network parameters learning algorithm for autonomous mission decision-making under scarce data
Zadeh LA (1968) Probability measures of fuzzy events
Zhang and Chi (2008) A fuzzy support vector classifier based on Bayesian optimization
Zimmermann HJ (2000) An application-oriented view of modeling uncertainty

Acknowledgements The authors wish to express their thanks to the referee for valuable comments. The third author would like to acknowledge the financial support of the University of Tehran for this research under grant number 30005/1/05.
Funding The authors have not disclosed any funding.
Data availability The data used in the presented work are included within the paper.
Conflict of interest The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical approval This article does not contain any studies with human participants or animals.