key: cord-0073970-xhug4wst authors: Wheatley, David; Bayley, Tiffany; Araghi, Mojtaba title: Able Construction: A Spreadsheet Activity for Teaching Bayes’ Theorem date: 2022-01-24 journal: Oper DOI: 10.1007/s43069-021-00119-3 sha: fea6286798e209ec54c364a3f8280aa5bb65f047 doc_id: 73970 cord_uid: xhug4wst Using classroom activities to motivate the teaching and learning of Bayes’ theorem is not new. However, many of the textbook exercises and published simulations gloss over how the requisite probabilities are determined. In our case study, Able Construction is a fictional company hoping to exploit historical bidding data to inform its own bidding strategy on a municipal construction project. Unlike most other classroom activities, we challenge students to calculate the necessary probabilities directly from a given dataset. In our experience with implementing this case in introductory business analytics courses at the undergraduate- and graduate-level, we find that this spreadsheet activity gives students the opportunity to exercise their own judgement regarding data manipulation and definition of states of nature. This autonomy in analysis develops in students a deeper appreciation for practical skills required for possible analytics careers after graduation, and leads to engaging discussions of the applicability of Bayes’ theorem in practice. Foundational probability concepts, as taught in undergraduate statistics or business analytics courses, begin with sets, unions, and intersections, move next to union, joint, and conditional probabilities, and then to addition and multiplication laws. While a probability and statistics course will no doubt cover many more topics, such as expectation, variance, and discrete and continuous distributions, the introduction to probability typically culminates with Bayes' theorem as it encapsulates many of the previous concepts. In most undergraduate statistics and analytics textbooks [1] [2] [3] [4] [5] [6] [7] , Bayes' theorem exercises are presented as word problems with numerical information sprinkled throughout, leaving students to perform the necessary calculations to obtain posterior probabilities. To assist in learning, decision trees are used as a visual tool to organize information, as well as to indicate when new information is revealed and how it will impact prior estimates. To situate the usefulness of Bayes' rule, textbook examples are usually centered on efficiency of prediction reports, or sensitivity and specificity of medical tests. Where many students tend to struggle is in the initial steps of correctly identifying prior and conditional probabilities and then distinguishing their meanings from the posteriors that they calculate. Rouder and Morey [8] advocate for presenting Bayes' formula as a ratio where posterior belief relative to prior belief equals the conditional probability of data relative to the marginal probability of data. Those authors reason that this ratio format better represents how prior probabilities are updated and the role that statistical evidence plays on posterior likelihoods. Still, these concepts can be abstract for a novice Bayesian, so it is no surprise that several approaches have been developed to assist student learning and to make statistics tangible and (more) fun. Eadie et al. [9] and Bayley et al. [10] use forms of active and game-based learning to engage students in their quest for determining posterior probabilities. Chen [11] developed an interactive poker game to reinforce Bayes' rule, providing a more relatable context for students than traditional textbook problems. Similarly, a web simulator based on the disappearance of a nuclear submarine shows a practical application of Bayesian probability [12] . Johnson et al. [13] integrate Bayesian and Markov methods in their project life cycle management case study, emphasizing the role of these concepts in a marketing and brand loyalty context. As analytics creeps into nearly every job sector and as machine learning and artificial intelligence are becoming more commonplace, the access to vast amounts of data will be at employees' fingertips. While determining likelihoods directly from this data is possible, we rarely provide students with this opportunity before they enter the workplace [14] ; even in classroom activities, the origin of prior and conditional probabilities is still a mystery as students do not collect or analyze this raw data for themselves. We attempt to address this gap through our Bayesian updating case study that includes a reasonably sized dataset, suitable for analysis in Excel. Students are expected to model the problem with a decision tree and calculate all necessary likelihoods from historical data provided. No probabilities are given outright, but rather, we ask students to determine an appropriate way to organize and analyze the data in order to update prior beliefs. In this article, we will describe in more detail the case study and its learning objectives, followed by an overview of its classroom implementation. We conclude with a brief discussion regarding the applicability of Bayes' rule outside undergraduate academics and the importance of gaining analytical skills to prepare students for their careers. Municipal infrastructure projects such as roads, bridges, sidewalks, or sewers are usually contracted out through a tendering process [15] . Construction companies submit their "sealed bids" and the company submitting the lowest bid that meets the municipality's eligibility criteria wins the contract. Our case examines Able Construction, a company that is deciding upon a bidding strategy for a new bridge construction project. The cost to prepare a proposal and estimated construction costs for the upcoming project are given. The full case problem is detailed in Appendix 1. The case problem can be presented as a decision tree where Able Construction is aiming to balance its expected profit with its chance of winning the bid (Fig. 1) . However, in contrast to the common approach in teaching decision-making using decision trees, our case does not provide the corresponding probabilities explicitly. Instead, the case presents 310 past construction tenders on various project types, such as bridges, roads, and highways (Fig. 2) . For each project, we provide the estimated project cost, the amount of the winning bid (which could be from Able or a competitor), and whether Able won the tender (or did not bid at all). Able Construction can exploit this historical data to predict how stiff its competition will be for this new bridge project, helping to inform their bidding strategy. Furthermore, a market research firm with a known track record approaches Able Construction and offers to forecast the competitiveness of construction tenders for a fee. It is possible to estimate the winning bid distribution and estimate the chance of winning with a specific bid amount through regression analysis. However, to make this analysis suitable for an undergraduate-level introductory business analytics course, our case proposes an alternate approach and provides two specific bidding strategies based on target profit margins to choose from: (i) a conservative bidding strategy (C) with a 25% profit margin, which has a lower chance of winning but higher payoff; or (ii) an aggressive bidding strategy (A) with a 15% profit margin, which has a higher chance of winning but lower payoff. Students will use the historical data to estimate the probability of winning under each strategy (i.e., prior probabilities like P(C Win) or P(A Lose)), and then identify the strategy that provides the highest expected payoff. Students can also investigate factors that might affect these probabilities, including the type of construction project and the estimated project cost. To provide a venue to study marginal and posterior probability, the case provides the historical track record of a market research firm that predicts competitiveness of tenders for construction projects as High Competition or Low Competition (last column in Fig. 2 , abbreviated "High Comp" and "Low Comp" henceforth). Students first estimate the marginal probability of each prediction (i.e., P(High Comp) or P(Low Comp)), then revise the probability of winning under each bidding strategy conditioned on the prediction of the competitiveness of the project (i.e., posterior probabilities such as P(C Win|High Comp) or P(A Lose|Low Comp)) directly from the historical data. To further students' understanding of Bayesian inference, the instructor may ask students to calculate the historical accuracy of the market research firm prediction (i.e., conditional probabilities, such as P(High Comp|A Win)). Students can calculate corresponding prior, conditional, marginal, and posterior probabilities and verify that Bayes' rule holds. Finally, students should determine which of the two bidding strategies, conservative or aggressive, is a better choice by using their updated decision tree. This case is suitable for an undergraduate or graduate course in business decision models, analytics, or statistics. All three of the authors have taught this case in an undergraduate course on business decision models, where the topic of Bayesian updating is part of a module on decision-making under uncertainty. In this context, we recommend using this case in the 3rd or 4th 80-min class in a series on probability and decision trees. We include the suggested time allocation in the teaching note (Appendix 2). Earlier sessions can introduce decision-making with payoff tables, decision trees, and Bayes' rule. In a graduate course on business analytics, this case can stand alone to support a 3-hour class on decision-making with probability and trees. Our initial experience with teaching this case is that students who are not used to participating in quantitative cases rarely come to class with prepared calculations, or are not inclined to contribute their unchecked work. This naturally leads to an instructor-led exploration of the case, essentially a "chalk-and-talk" approach. While that approach can still be effective, it does not have the benefits of active learning approaches [16] [17] [18] [19] . To increase student engagement without sacrificing student learning, instructors could gamify portions of this case, as discussed in [10] . Breaking interim calculations down into more manageable pieces, allowing students unlimited attempts to achieve the desired result, and promoting teamwork is a gamified problem-based learning approach that adds variety to traditional lecture-based sessions. Either way, we recommend that instructors prepare an "entrance quiz" (available up to a week ahead of time and due the day before class) to encourage (or require) that students do some initial analysis on the data before coming to class. Simple numeric questions such as "What is the average winning profit margin as a percentage?" or "What percentage of the time has the consulting firm predicted 'High' competition?" will ensure students are familiar with the dataset and develop their knowledge of Excel syntax. Whether an entrance quiz is used or not, the instructor can demonstrate the key outcomes of the case. These include how to calculate the conditionals, priors, marginals, and posteriors in Excel using functions or pivot tables, and carefully explaining how to recognize each of the probabilities in the context of the case. The instructor can also illustrate how to use the conditionals and priors to calculate the marginal and posterior probabilities, demonstrating Bayes' rule in action: we like to do this part on a whiteboard, to distinguish the mathematical approach from the data-driven approach. Students may be concerned that the sum of the prior probabilities of winning with an aggressive bid and winning with a conservative bid is more than 1. This happens because the probability of winning with a conservative bid is a subset of winning with an aggressive one. Our teaching note (Appendix 2) identifies three states of nature (winning with both bidding policies, winning with only the aggressive policy, losing with both policies), although instructors could approach this as two sets of priors (winning and losing) for the two different strategies (conservative bid and aggressive bid). Either way, this provides the instructor with a great opportunity to discuss the importance of defining mutually exclusive "states of nature" independent of decision alternatives. As part of our Assessment of Learning process, we regularly review student grades on particular exam questions and use those grades as a proxy for how well the students have learned and applied key topics. Every year since at least 2014, we have included a question on decision trees and Bayesian updating on an exam in a second-year undergraduate course in business decision modelling. The question is different every year, but we carefully tune the difficulty and use a consistent marking rubric to try and maintain parity across years. We introduced the Able Construction teaching case in 2019. Student scores on the decision tree with Bayesian updating question are collected in Table 1 . The students scored very well on the relevant question in 2019, the first year that we used the teaching case. The question used in 2019 was similar to the question used in 2017, and we believe that some of the improvement in student scores can be attributed to the teaching method. Subjectively, we think the question used in 2018 was a little easier than the other 2 years, so we propose the comparison between 2017 and 2019 is the most relevant. We must acknowledge that these results are not from an experiment, and are subject to uncontrolled factors including: the students are not from the same cohort; the exam questions are different year to year; and marking standards cannot be strictly controlled year to year. Nevertheless, these preliminary results are in line with our hypothesis that most students benefit from this data-driven approach. We plan to formally test this data-driven case approach in a carefully designed classroom experiment that runs semester-long. For a given class session and topic, students in the control group will learn through traditional lecturing while the experiment group will be exposed to the data-driven case (or activity). At the end of the session, we will gather anonymized perceptions of student learning as well as student performance on assessments related to the session's topic. Similar to [10] , we will offer multiple activities throughout the term and alternate which participants are in the control and experiment groups so that by the end of the term, each student will have experienced at least one traditional lecture and one datadriven case. The course in which we use this case does not cover as much depth as an undergraduate Bayesian statistics course. Still, instead of just going through the mechanics of Bayesian updating, we posit that students benefit from having "a solid understanding of the entire statistical investigation process [and] not just conducting the data analysis step of that process" [20] . Working directly with source data can provide valuable insight for students to better appreciate the process of cleaning, analyzing, and interpreting statistical data, and demystify those prior and conditional probabilities that appear in textbook problems. There is a small risk that this case undermines the role of Bayes' rule in practice. Students might wonder why they need Bayes' rule if they have access to datasets like this one. The instructor can clarify that Bayes' rule is still useful in situations where the dataset is incomplete, or in evaluating claims made by third parties where access to individual records is not available. For example, medical tests reporting test specificity and sensitivity rates are presenting conditional probabilities. An individual interested in knowing their chance of actually having a disease after receiving a positive or negative test result needs to calculate the associated posterior probabilities. A recent New York Times article walks the reader through this process for evaluating COVID-19 test results [21] , emphasizing throughout the practicality of Bayes' theorem. Even if we do have complete records, it is important to understand the difference between conditional and posterior probabilities. In contrast to the Bayesian classroom activities we reviewed above, these probabilities are not explicitly given. With our case, students are taught how to identify and calculate them in a manageable dataset, then properly use the posteriors to update a decision tree. We believe these techniques are increasingly relevant as demand for data analytics skills grows throughout the business world. Michelle Duarte, CEO of Able Construction, just received confirmation of a municipal tender for a new bridge construction project. She now had to prepare a bidding strategy for the contract. Municipal and provincial governments that are funding new infrastructure projects (roads, bridges, sidewalks, sewers, etc.) rarely do the construction themselves. Instead, they contract out the work by asking private construction companies to bid on the project through a tendering process. The process is carried out by "sealed bid," where all bidders must submit project proposals with private bids by 4:00 p.m. on the day the tender closes. The lowest bid that meets the eligibility criteria laid out in the tender wins the contract, and the government pays the lowest bid amount to the winning construction firm. The government then expects the construction firm to complete the project on time and to specification. The winning construction firm will receive their bid as revenue when the project is completed. The construction firm is also responsible for all incurred costs during construction. For example, if the winning bid is $500,000 on a road project, the government will pay the winning construction company $500,000. The construction company might incur $400,000 in costs, and thus earn $100,000 in profit after completing the road. Able Construction (Able) is a privately owned construction firm that bids on municipal and provincial infrastructure projects, especially constructing new roads, highway expansions, and bridges. Over the past few years, Able had done well by bidding only for construction contracts at a good price. It was important to bid high enough on contracts not only to ensure Able could both cover costs and secure a good profit but to also bid low enough that they stood a good chance of winning the bid. The large size of infrastructure projects hinders new and smaller size companies to enter the competition. Therefore, Duarte believes she can maintain a good track record for Able by continuing to make intelligent bids that balance the expected profit and her chance of winning the bid. Since the bidding data on past tenders is disclosed to the public, Able maintained a file summarizing the last 310 construction tenders, the winning bid, and Able's result (if they have placed a bid). The next tender was for a bridge contract that was due in a few weeks. The Construction Manager estimated the cost for Able to complete the bridge project was $1,324,000. The first task was to prepare the proposal and bid for the next tender. Preparing a proposal and bid costs Able about $5000, regardless of the size of the bid. After discussing multiple scenarios with the Chief Financial Officer of Able, Duarte came up with two final options to choose from: (i) bid conservatively at 125% of the estimated cost of the project, with lower chance of winning but higher payoff; (ii) bid aggressively at 115% of the estimated cost of the project, with higher chance of winning but lower payoff. While contemplating her options, Duarte received an offer from AuctPred, a market research firm specialized in forecasting the competitiveness of construction tenders. AuctPred was requesting $10,000 to study the project, and provide exclusive consult to Able. To show their record of excellence, AuctPred has submitted a list of their predictions ("high" or "low" competition) for previous projects provided to different firms. Duarte is now wondering if she should consult AuctPred before finalizing her decision. Michelle Duarte, CEO of Able Construction, just received confirmation of a municipal tender for a new bridge construction project. She now has to prepare a bidding strategy for the contract. Concepts: calculating prior and posterior probabilities from historical data, Bayesian updating, decision analysis. Tools: Spreadsheets, Pivot tables. By the end of this case, students will be able to. Payoff Table Develop the payoff table using the given information: 4 Page 10 of 18 -Estimated project cost = $1,324,000 -Cost of $5000 to submit bid regardless of bid amount -Conservative bid is 125% of estimated cost, so you make 25% in revenue = $331,000 -Aggressive bid is 115% of estimated cost, so you make 15% in revenue = $198,600 The state of nature should be defined based on the best competitors' bid. Specifically, if the best competitor's offer is below 115% of the project cost, Able will not be able to win the bid under either conservative or aggressive bidding strategies. If the best competitors' offer is between 115 and 125% of project cost, only the aggressive bidding strategy can win, and if the best competitor's offer is above 125% of project cost, both bidding strategies win. The instructor can also discuss the case "only conservative wins" and why it is not considered in the table. Before trying to estimate prior probabilities, students should note that the winning bid amount is not necessarily equivalent to the best competitor's offer. Specifically, for the contracts won by Able, they do not have access to the runner-up amount. In other words, if Able has won using an aggressive bidding strategy, they cannot determine if a conservative bid could have won or not. Therefore, students should first clean the dataset by removing rows corresponding to contracts won by Able. This would be an excellent teaching opportunity for the instructor to discuss handling censored and/or missing data. To estimate prior probabilities from the past data, create the following columns in the data spreadsheet: Averaging over column H gives P(both lose) = 0.256, over column I gives P(only aggressive wins) = 0.432, and over column J gives P(both win) = 0.312. This will lead to a discussion on estimating prior probabilities of winning and losing. It is not surprising that the bidding strategy does impact the likelihood of winning or losing. Therefore, for a conservative bid, students should consider: While it is easier to calculate prior probabilities (and posteriors, as discussed in the next section) with this approach, students should be aware that, unlike the states of nature defined in Table 2 , these two events are not mutually exclusive. Figure 3 shows the decision tree with "No Market Test" and the expected value for the correct decision: Aggressive Bid, $142,758.40. Students can begin by drawing the decision tree with the market test. This will lead to a discussion regarding the probabilities we are missing: marginal and posterior probabilities. While there are many approaches for estimating these probabilities, two methods are detailed: For marginal probabilities, find the proportion of projects identified as having High (or Low) competition. Select data in columns A:J and Insert PivotTable in same worksheet. Organizing the PivotTable according to Fig. 4a will produce the output in Fig. 4b . Students can then compute proportion of high and low competition, as well as the posterior probabilities from the PivotTable. The instructor can ask students to verify Bayes' theorem by estimating posterior probabilities through the record of the market research firm prediction. Students have already estimated prior probabilities P(both lose), P(only A wins), P(both win) directly from the historical data. To calculate posterior probabilities using Bayes' rule, students just need to calculate conditional probabilities. • P(High comp| both lose) = COUNTIFS(F2:F251,"High",H2:H251,1)/COUNTIF (H2:H251,1) = 0.5781 • P(High comp| Only A wins) = COUNTIFS(F2:F251,"High",I2:I251,1)/COUNTIF (I2:I251,1) = 0.6204 • P(High comp| both win) = COUNTIFS(F2:F251,"High",J2:J251,1)/COUNTIF (J2:J251,1) = 0.1282 Following from above, the remaining conditional probabilities are P(Low comp | both lose) = 0.4219, P(Low comp | Only A wins) = 0.3796; P(Low comp| both win) = 0.8718. Using prior and conditional probabilities, they estimate the posterior ones. For "High" competition prediction, we have. Now we find the probabilities needed for the decision tree: • P(C wins | High comp) = P(both win | High comp) = 0.0877 • P(A wins | High comp) = P(both win | High comp) + P(only A wins | High comp) = 0.0877 + 0.5877 = 0.6754 • P(C wins | Low comp) = P(both win | Low comp) = 0.5 • P(A wins | Low comp) = P(both win | Low comp) + P(only A wins | Low comp) = 0.5 + 0.3015 = 0.8015 Which are identical to posterior probabilities estimated directly from historical data and verifies Bayes' theorem. Decision Tree for "Market Test" Figure 5 shows the decision tree with "Market Test" and its expected value of $146,201. If the competition is predicted to be Low, the best decision is to bid conservatively (EV = $160,500). If the competition is predicted to be High, the best decision is to bid aggressively (EV = $129,412). Note that the cost of the market test is not included here. We consider it next. To determine the highest cost Able is willing to pay for AuctPred's analysis, we compute the Expected Value of Sample Information: • EVSI = EV with Sample Info − EV without Sample Info = $146,201 − $142,578. 40 = $3,622.60 Since EVSI < $10,000 cost of AuctPred's market test, Able should not choose their analysis. Based on our analysis so far, the correct decision would be "No Market Test" → "Aggressive Bid," with an EV = 142,578.40. Since the next bid is for a bridge, the steps outlined in the decision tree analysis with and without the market test can be repeated to consider only "Bridge" projects and provide more specific probabilities. For example: • P(both lose) = AVERAGEIF(E2:E251,"Bridge",H2:H251) • P(both lose|High comp) = AVERAGEIFS(H2:H251,E2:E251,"Bridge",F2:F251, "High"). The same can be accomplished with PivotTable by adding the field "Type" into the "Rows" area ( Fig. 6a) to produce the output in Fig. 6b . The updated decision tree with probabilities filtered by "Bridge" is shown in Fig. 7 . We find that EVSI = $157,347.62 − $141,585.71 = $15,761.91. Now, it is worth getting the market test and to only bid aggressively if the competition prediction is High. Probability and Bayesian modeling Business statistics: for contemporary decision making (3rd, Canadian Business analytics Introduction to management science: a modeling and case studies approach with spreadsheets Doing Bayesian data analysis: a tutorial with Introduction to probability models Teaching Bayes' theorem: strength of evidence as predictive accuracy Introducing Bayesian analysis with m&m's®: an active-learning exercise for undergraduates Assessing a novel problem-based learning approach with game elements in a business analytics course Introduction to probability: Bison Hold'em Poker Game A web simulator to assist in the teaching of Bayes' theorem Teaching Bayesian and Markov methods in business analytics curricula: an integrated approach The challenges of teaching business analytics: finding real big data for business students Procurement in the construction industry: the impact and cost of alternative market and supply processes Problem-based learning: a review of literature on its outcomes and implementation issues Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom Effects of problem-based learning: a meta-analysis from the angle of assessment Does problem-based learning work? A meta-analysis of evaluative research Teaching an undergraduate course in Bayesian statistics: a panel discussion Thinking like an epidemiologist. The New York Times