Garg, Nikhil; Li, Hannah; Monachou, Faidra. Dropping Standardized Testing for Admissions Trades Off Information and Access. 2020-10-09.

Abstract. We study the role of information and access in capacity-constrained selection problems with fairness concerns. We develop a theoretical framework with testable implications that formalizes the trade-off between the (potentially positive) informational role of a feature and its (negative) exclusionary nature when members of different social groups have unequal access to this feature. Our framework finds a natural application to recent policy debates on dropping standardized testing in college admissions. Our primary takeaway is that the decision to drop a feature (such as test scores) cannot be made without the joint context of the information provided by other features and how the requirement affects the applicant pool composition. Dropping a feature may exacerbate disparities by decreasing the amount of information available for each applicant, especially those from non-traditional backgrounds. However, in the presence of access barriers to a feature, the interaction between the informational environment and the effect of access barriers on the applicant pool size becomes highly complex. In this case, we provide a threshold characterization of when removing a feature improves both academic merit and diversity. Finally, using application and transcript data from the University of Texas at Austin, we illustrate that there exist practical settings where dropping standardized testing improves all metrics, and others where it worsens them.

to colleges that require it.3 Supporters of testing argue that it is "a systematic means of collecting information," and so helps decision-making when used appropriately (Phelps, 2005).
Some supporters claim that tests actually help schools evaluate under-represented minorities; in the absence of standardized testing, "a capable student from a little-known school in the South Bronx may be more challenging to evaluate," further benefiting students from privileged - and historically familiar - backgrounds (Bellafante, 2020). A report released by the University of California further states that test scores are more accurate for minority groups, and "that consideration of test scores allows greater precision when selecting from [under-represented minority] populations" (University of California Standardized Testing Task Force, 2020). Other application components, such as recommendation letters (Dutt et al., 2016) and application essays (Alvero et al., 2021), may also be unreliable.4 A school that does not consider test scores must rely more heavily on these components. The competing claims from critics and supporters largely center around two issues: access and information. We capture these arguments in favor of and against dropping test scores and formalize the underlying trade-off. The model considers a Bayesian school that wishes to admit students based on their skill level, which we refer to as "academic merit," and that also values the "diversity" of the admitted class. The school admits students to meet a capacity constraint and tries to maximize the average academic merit of the accepted cohort. However, it has imperfect knowledge of the students' skills and instead must rely on noisy and potentially biased signals, one of which is the test score. The school can then choose whether or not to require the test score. We evaluate the school's admission policy in terms of the average academic merit, overall diversity, and individual fairness of the students admitted (i.e., how the policy affects students of different groups and skill levels). We focus on the trade-off between two effects. Differential informativeness.
Colleges often have better information - through, e.g., familiar letter writers and transcripts - on students from privileged backgrounds, and so can better estimate their true academic merit. Standardized testing reduces this measurement gap, and so in particular helps colleges identify well-qualified, non-traditional students. Disparate access and applicant pool composition. Some students - especially those from disadvantaged backgrounds - either do not take standardized tests or do not report their scores,5 due to cost and other access barriers. Without a test score, students cannot apply to a school with a test requirement, even if they are well-qualified. Dropping the requirement thus expands the applicant pool but also alters its composition at different rates across groups.

[3] After eliminating GRE requirements, UC Berkeley saw an 82% increase in the number of under-represented minority applicants to master's programs in the 2020-2021 cycle: "while overall graduate applications have increased 19 percent when compared to [the 2019-2020 cycle], the number of underrepresented minority (URM) doctoral applicants increased by 42 percent and URM applicants to academic master's programs increased by 82 percent" (Aycock, 2021).
[4] For example, letter writers use different language to describe women and other under-represented groups, giving weaker recommendations (Dutt et al., 2016), and application essays have a stronger correlation to reported household income than do SAT scores (Alvero et al., 2021) (although they are not necessarily differentially scored).
[5] A University of California report on testing states that under-represented students might be discouraged from applying based on their score, even if their score would be competitive (University of California Standardized Testing Task Force, 2020).

Contributions.
Given these effects, we study: Under what settings of informativeness and disparate access should standardized testing be dropped from admissions, if a college values both diversity and academic merit? Our technical contributions are as follows.

1. We introduce a Bayesian model with multiple application components that allows us to study the design of the information structure used in the application process. We formalize a trade-off between informativeness and access, two basic arguments in favor of and against the inclusion of a given feature, and show how the set of required features influences academic merit and diversity through these two competing effects. Our main technical insight is that differences in the total variance of features lead to information disparities across groups: even though the school manages to correct for the existing mean bias in the features of different groups, it is generally impossible to correct for variance. Our model leverages the statistical discrimination framework developed in Phelps (1972) and Arrow (1971), which we extend by considering multiple features and access disparities. Within this framework, we define two fairness notions: diversity and individual fairness. The former captures disparities at a group level. The latter quantifies disparities in individual opportunities, by measuring the difference in the admissions probability between two individuals of equal skill but different demographic characteristics.

2. We provide a testable framework for evaluating the different trade-offs that arise in these decisions. Using application and transcript data from the University of Texas at Austin, we demonstrate how an admissions committee could measure the trade-off in practice to better decide whether to drop its test score requirement. We show that there exist practical settings in which dropping testing improves all metrics, and others in which it worsens them.
Our primary takeaway for practice is that the decision to drop testing cannot be made without jointly considering the interaction between the information provided by other features relative to test scores and the rate at which dropping the test requirement affects the applicant pool composition. This interaction between information and access is complex. In the absence of access barriers to the test, the information loss incurred by dropping the test requirement always decreases academic merit, but has an ambiguous impact on diversity. However, in juxtaposition with unequal access barriers, the informational disparities can be amplified or reduced by the effect that the expanded access has on the applicant pool composition. We characterize the settings where dropping test scores introduces a trade-off between diversity and academic merit, and where it improves or worsens all objectives. More broadly, our work provides a Bayesian framework for predicting the effect of adding new features into the admissions process. Given some knowledge of the informativeness and access barriers associated with the new feature, the model can be used to reason about how the new feature would interact with the current set of features. We thus believe that our work provides a useful conceptual framework for studying emerging problems in fair decision-making and public policy. Our work broadly relates to the study of discrimination and admissions in the economics and fair machine learning communities. Economics of discrimination. Many works in economics study affirmative action in admissions (Abdulkadiroglu, 2005; Avery et al., 2006; Chade et al., 2014; Chan and Eyster, 2003; Epple et al., 2006; Fershtman and Pavan, 2020; Fu, 2006; Immorlica et al., 2019; Kamada and Kojima, 2019), and more broadly discrimination in markets (Coate and Loury, 1993; Fang and Moro, 2011; Foster and Vohra, 1992; Lang and Manove, 2011; Levin, 2009; Temnyalov, 2018).
In contrast to taste-based discrimination theories (Becker, 1957), statistical discrimination theory (Arrow, 1971; Phelps, 1972) shows that group differences can arise in equilibrium even if groups are ex ante identically skilled. In particular, in Phelps' seminal Gaussian framework (Phelps, 1972), employers receive a noisy signal about each worker's skill and are incentivized to use information about the worker's group membership to infer the true skill. This exact statistical discrimination approach is surprisingly rare in the admissions literature (exceptions include Emelianov et al. (2020); Kannan et al. (2019)). To the best of our knowledge, our work is the first to extend Phelps' model to multiple features with non-identical distributions and to incorporate access asymmetries for some feature. Fairness in machine learning and mechanism design. Recent machine learning work applies fairness notions to college admissions and related allocation problems (Cai et al., 2020; Emelianov et al., 2020; Faenza et al., 2020; Haghtalab et al., 2020; Hu et al., 2019; Immorlica et al., 2019; Kannan et al., 2019; Kleinberg and Mullainathan, 2019; Liu et al., 2020; Mouzannar et al., 2019). In a single-feature setting, several works analyze admissions or hiring decisions when evaluation of one group is noisier than another (Emelianov et al., 2020; Fershtman and Pavan, 2020; Temnyalov, 2018). Most related, Emelianov et al. (2020) study how differential variance of a single feature affects the admissions decisions of a school that greedily admits students with the highest test scores, without factoring in the differential variance; they show that affirmative action can improve both diversity and academic merit. In contrast, our work studies the impact of differential bias and variance when students have multiple features and schools can potentially drop a feature.
Another line of literature considers different types of barriers, including implicit bias (Faenza et al., 2020), downstream effects of school admissions on later employment (Kannan et al., 2019), and settings where only one group can take the test multiple times (Kannan et al., 2021). These barriers affect the treatment of applicants, but do not prevent students from even applying, as is our focus. Finally, a follow-up paper (Liu and Garg, 2021) builds upon our work to provide (im)possibility results under test-optional policies. We develop a model where the school can design its admissions procedure and, in particular, choose the information that it requires applicants to submit. We consider a continuum of students and a single school. A unit mass of students is applying to college. Each student belongs to a group g ∈ {A, B}, and the mass of students in group B is π. Each student has a latent (unobserved) skill level q, normally distributed according to N(μ, σ²) identically for each group, as well as a set of observed features θ = (θ_1, . . . , θ_K). Each θ_k is a noisy function of q, i.e., θ_k = q + ε_k, k = 1, . . . , K, with Gaussian noise ε_k ∼ N(μ_gk, σ²_gk). The distribution of the noise ε_k is feature- and group-dependent, but each ε_k is drawn independently across features and students. Features represent application components like recommendation letters, grades, and test scores. Students differ in their access to the features: only a fraction γ_g of group g ∈ {A, B} has access to the full set of features S_full = {1, . . . , K}, i.e., θ = (θ_1, . . . , θ_K); the remainder only has access to the subset S_sub = {1, . . . , K − 1}. Whether a student has access to all features is independent of skill q and conditionally independent of the feature values given group membership. We assume that when a student does not have access to feature K, they cannot apply to a school that requires it.
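To make the model concrete, the sampling process can be sketched as follows. This is a minimal simulation under illustrative parameter values; the dictionaries `noise` and `gamma` and the function name `sample_student` are our own containers, not notation from the paper.

```python
import random

def sample_student(group, noise, gamma, mu=0.0, sigma=1.0):
    """Draw one student from the model: latent skill q ~ N(mu, sigma^2),
    features theta_k = q + eps_k with eps_k ~ N(mu_gk, sigma_gk), and a
    Bernoulli(gamma_g) indicator for access to the last feature (the test).

    noise[group] lists one (mu_gk, sigma_gk) pair per feature k.
    gamma[group] is the fraction of the group with test access.
    """
    q = random.gauss(mu, sigma)
    theta = [q + random.gauss(m_gk, s_gk) for m_gk, s_gk in noise[group]]
    has_test = random.random() < gamma[group]
    return q, theta, has_test
```

A quick sanity check: with zero noise the observed features coincide with the latent skill, and with full access (gamma = 1) every student can take the test.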
We now turn to the question of interest: the design of the admissions policy. The school admits a mass C of students. The school's admissions procedure consists of a feature requirement policy, skill estimation, and then selection given estimates. The feature requirement policy choice is whether to require the full set of features or the subset. If the school requires the full set, then students without full access cannot apply. If it only requires the subset, then it observes only that subset for each student. Then, given a student's features θ, the school estimates a perceived skill q̂ of their true skill q. The school is Bayesian, knows the distribution of q and the (group-dependent) distributions of ε_k, and is group-aware: it can use the student's group membership in constructing its estimate. The resulting Bayesian estimate, q̂(θ, g) = E[q | θ, g], is the 'best' one can do given the available information. After estimating the skill level of each applicant, the school selects the mass C of students with the highest skill estimates q̂. This selection process induces a threshold q̂*_S such that applicants with perceived skill above the threshold are admitted. (In Section 4.3 we also study selection policies utilizing affirmative action, where the school uses potentially group-dependent thresholds.) Holding the estimation and selection policies fixed (except in Section 4.3), the admissions policy is determined by the feature requirement decision. Let P_S denote the admissions policy requiring feature set S. We evaluate a policy P using three metrics on the admitted class. Let Y ∈ {0, 1} denote the admission decision for a given student; Y = 1 means that the student is admitted. Academic merit E[q | Y = 1, P], the expected skill level of accepted students. We also use group-specific measures, E[q | Y = 1, g, P]. Diversity level τ(P), the fraction of admitted students that are of group B.
Policy P satisfies group fairness if and only if the admission fraction matches the population, i.e., τ(P) = π. Individual fairness gap I(q; P), the difference in admissions probability between two students of identical true skill q, one belonging to group A and the other to group B:

I(q; P) = Pr[Y = 1 | q, A, P] − Pr[Y = 1 | q, B, P].

Policy P satisfies individual fairness if and only if the gap is 0 for all skill levels q. We characterize these three metrics as they depend on the policy P and the model parameters, as well as how they trade off with one another. While our model and results are more general, our exposition primarily considers undergraduate college admissions in the United States and the debate over dropping standardized testing as our main running example. We focus on how policies differentially affect privileged (group A) versus disadvantaged (group B) students. We refer to the potentially inaccessible last feature, k = K, as the student's score on a common standardized exam like the SAT or ACT, and assume that more privileged students have access to testing; as Hyman (2016) notes, many well-qualified disadvantaged students do not have access to standardized tests and so cannot apply to schools that require them. On the other hand, as the University of California Standardized Testing Task Force (2020) and Bellafante (2020) posit, without testing it may be especially difficult to evaluate students from non-traditional backgrounds, as colleges instead rely on transcripts and recommendations from familiar (privileged) high schools. This aspect could be captured - as we do in our simulations - by considering the first K − 1 features as substantially more informative for group A (σ_Ak < σ_Bk), with a smaller informativeness discrepancy for the test score. The model's focus differs from feature bias as traditionally understood, i.e., a feature that systematically under-values one group; e.g., weaker letters of recommendation for under-represented students.
In our model, the school fully corrects for such bias (cancelling out μ_gk); in practice, schools interpret signals in context, for example benchmarking how many AP courses are offered by a student's school. In contrast, differential informativeness (a function of σ_gk) and disparate access (γ_g) are harder to correct at admissions time. The former represents an information-theoretic limit to identifying the most qualified students, and the latter prevents some students from even applying. As we show, these effects cannot be completely mitigated even using affirmative action,9 which is particularly insufficient for identifying qualified disadvantaged students. Without loss of generality, we suppose that the features are less informative for group B than they are for group A. Specifically, unequal precisions between groups means Σ_{k∈S} σ_Ak^{-2} > Σ_{k∈S} σ_Bk^{-2}, and equal precision means Σ_{k∈S} σ_Ak^{-2} = Σ_{k∈S} σ_Bk^{-2}. In settings with barriers, we assume that group A also has more access to the test, i.e., γ_A ≥ γ_B.10 Finally, the school is selective and has capacity C < 1/2. These assumptions are for exposition; our model's tractability allows us to solve analogously for the omitted cases.

3 Intuition: The role of differential informativeness

We begin our analysis in Section 3.1 by deriving how a Bayesian optimal school estimates the students' skill level. Then, we preview our main results, illustrating how the relationship between the skill estimates and true skills of the applicant pool depends on the informativeness of features and the access barriers, with implications for how admissions differ by group. Our Bayesian school - with knowledge of the model parameters (feature noise means and variances) - observes each student's features and group membership and estimates their expected skill level, using properties of normal distributions. Repeating this process for all applicants induces the following distribution over the skill level estimates for each group.
Lemma 1 (Estimated skill). Consider a school that uses feature set S ⊆ {1, . . . , K} for each applicant. Then, the perceived skill of an applicant in group g ∈ {A, B} with feature values θ = (θ_k)_{k∈S} is:

q̂(θ, g) = (μ σ^{-2} + Σ_{k∈S} (θ_k − μ_gk) σ_gk^{-2}) / (σ^{-2} + Σ_{k∈S} σ_gk^{-2}).   (1)

[9] In Section 4.3, we study our policies under the following definition of affirmative action: a constraint on the fraction of students from each group. This approach is common in the literature (Fang and Moro, 2011) and a proxy of the practices adopted by universities. However, as shown by recent lawsuits against Harvard (Hartocollis, 2019) and Yale (Hartocollis, 2020), the legal framework around affirmative action is ambiguous and restrictive. Explicit, predetermined racial quotas are generally illegal; conversely, the University of Texas admits students using a high school-based quota system (The University of Texas, 2019).
[10] We further assume that, even in the presence of barriers, the market is over-demanded in the sense that the school cannot admit all applicants, i.e., C < (1 − π)γ_A + πγ_B.

Figure 1: The distribution of skill estimates q̂ at an aggregate level for each group, as it depends on the informativeness of the features. When application components are more precise for one group (group A, in green), the variance in the skill estimates of that group is higher - there is more signal for individuals to demonstrate that their skill differs from the mean. Then, more group A students have high skill estimates above the threshold q̂*_S, and thus more are admitted. This effect occurs even though the true skill q distribution is identical across groups. If dropping the test causes such differential informativeness, then doing so may worsen both fairness and academic merit (estimated skill of admitted students). Figure 2 illustrates how the differential informativeness interacts with disparate access, due to which dropping test scores may improve all objectives.
Further, the skill level estimates for students in group g are normally distributed:

q̂ | g, P_S ∼ N( μ, σ² · (Σ_{k∈S} σ_gk^{-2}) / (σ^{-2} + Σ_{k∈S} σ_gk^{-2}) ).   (2)

As Equation (1) shows, when the school estimates the skill level q̂(θ, g) of an individual and knows the skill and feature noise distributions, it perfectly cancels out the mean bias terms μ_gk such that they do not affect estimation.11 The school also re-weights each feature θ_k proportionately to the relative informativeness of this feature for group g: the less informative a feature is for a group (smaller precision σ_gk^{-2}), the less it contributes to estimates. Thus, due to informational differences in σ_gk^{-2} across groups, two students from different social groups with the same features θ are evaluated differently. However, even in this idealized scenario, the school cannot fully correct for the variance terms σ²_gk; two students with the same skill q but in different groups have different skill estimates in expectation. These individual estimation effects accumulate at the group level (Equation (2)) and drive our results on disparities. The school knows that q ∼ N(μ, σ²) is identically distributed across social groups. However, as illustrated in Figure 1, the distribution of its skill estimates q̂ | g, P_S can differ across groups. For each group, the skill estimates are regularized toward the mean skill level μ. The regularization strength depends on the total precision Σ_{k∈S} σ_gk^{-2}: the larger the total precision for a group (or the more informative its features), the higher the variance in the estimated skills for that group. In Figure 1, group A has larger total precision, and for any value q̂ > μ there is a larger mass of students from group A than B with estimated skill higher than q̂. When a college with capacity C < 1/2 admits students with the highest skill estimates, more students in group A are admitted.

[11] (University of California Standardized Testing Task Force, 2020): "test scores are considered in the context of comprehensive review, which in effect re-scales the scores to help mitigate between-group differences."

Figure 2: Figure 2a represents a world without access barriers and when the features are approximately equally informative across groups. Figure 2b illustrates the consequences of requiring a test when group B (in pink) has access barriers: fewer can apply and so fewer can be admitted. Figure 2c illustrates potential consequences of dropping the test: the school may be unable to distinguish among group B applicants, leading to worse estimates (rotated away from the diagonal) and fewer admitted.

Before proceeding to our main results, we can now illustrate our main insight regarding the trade-off between informativeness and applicant pool size. In Figure 2, each sub-figure shows, for one scenario, the joint distribution between true skill q and the corresponding skill estimates q̂ for each group - along with the respective marginal distributions. Since both groups have identical true skill distributions, the joint distributions would ideally be identical for the two groups (and perfectly aligned along the diagonal); a policy that thresholds on estimated skill would then lead to group B composing a proportion π of the admitted class. Consider the case where the potentially dropped feature (the "test score") is equally informative for both groups, whereas the remaining features are more informative for group A. Figure 2a illustrates the scenario where there are no access barriers to the test. Due to the differential informativeness induced by the other features, (slightly) more group A students are admitted: the college can better estimate their true skill, as illustrated by the group A joint distribution being closer to the diagonal.
Figure 2b illustrates the consequences of requiring test scores in the presence of unequal access levels (γ_A = 1 and γ_B = 2/3). Among those who apply, the college can estimate true skill as well as it could in Figure 2a. However, fewer group B students can apply, as indicated by the smaller marginal count histogram, and so fewer are admitted. Figure 2c illustrates a scenario where the school removes the test score. Estimates for both groups are worse, as reflected in the joint distributions being further from the perfect-estimation diagonal. However, skill estimates for group B students are especially degraded, as their other features may be less informative, and so they make up a smaller proportion of the admitted class. In this section, we apply the insights from Section 3 on feature informativeness and skill estimation to our college admissions setting. In Section 4.1, we focus solely on the effect of differences in informativeness, assuming no access barriers. We find that disparities arise with respect to all of our metrics of interest: academic merit, diversity, and individual fairness. In Section 4.2, we compare two admissions policies: with and without a certain feature (e.g., test scores). When students have full and equal access to testing, we demonstrate how removing information might further decrease both fairness and academic merit under reasonable conditions. However, when students have different levels of access to the test, there is a trade-off between the barriers imposed by a test and the potentially valuable information a test may contain. Requiring the test can give schools more information on the students who do apply, but removing the test can increase the pool of skilled applicants. We characterize the school's optimal policy to include or exclude the test, depending on the relative sizes of these two effects. Finally, in Section 4.3, we study the effect of affirmative action alongside the aforementioned policies.
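Returning to Lemma 1, Equation (1) is standard Gaussian posterior updating and can be checked numerically. Below is a minimal sketch; the function name `posterior_mean` and the argument layout are ours.

```python
def posterior_mean(theta, group_noise, mu=0.0, sigma=1.0):
    """Bayesian skill estimate of Equation (1): a precision-weighted average
    of the prior mean mu and the bias-corrected features (theta_k - mu_gk).

    theta: observed feature values for one student.
    group_noise: list of (mu_gk, sigma_gk) noise parameters for that group.
    """
    num = mu / sigma**2
    den = 1.0 / sigma**2
    for theta_k, (mu_gk, sigma_gk) in zip(theta, group_noise):
        prec = 1.0 / sigma_gk**2
        num += (theta_k - mu_gk) * prec  # the mean bias mu_gk cancels out
        den += prec
    return num / den
```

With a standard normal prior, a single feature reading 2.0 with unit noise yields the shrunken estimate 1.0; adding a known bias of +3 to the same feature leaves the estimate unchanged, illustrating that the school corrects mean bias but still shrinks noisy signals toward μ.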
Affirmative action by definition improves diversity and individual fairness. However, it is insufficient in mitigating the differences in informativeness and, in the absence of barriers, its effect on academic merit is heterogeneous: academic merit decreases further for group B and increases for group A. In general, our fairness notions are not achievable - even though we assume that both groups have the same true skill distributions. For differential access barriers, this result immediately follows from the definitions: high-skilled group B students who otherwise would be admitted can no longer apply. The effects of differential variance are more heterogeneous and are formalized in the result below. Recall from Section 2 that q̂*_S denotes the admission threshold of the school under policy P_S. Let also Φ denote the CDF of the standard normal distribution N(0, 1).

Proposition 1 (Metrics with a fixed policy). Suppose that a selective school uses admissions policy P_S. Group fairness and individual fairness fail except under equal precision. Given unequal precisions:
(i) Diversity level: Group B students are under-represented, i.e., τ(P_S) < π. Furthermore, a larger informativeness gap leads to decreased diversity: fix group B precision, Σ_{k∈S} σ_Bk^{-2}; then as group A precision increases, the diversity level τ(P_S) decreases.
(ii) Individual fairness: High-skilled group B students are hard to target, i.e., I(q; P_S) > 0 if and only if q exceeds a skill threshold. Increasing the informativeness gap increases the individual fairness gap for high-skilled students: fix group B precision, Σ_{k∈S} σ_Bk^{-2}; then as group A precision increases, I(q; P_S) increases for q > μ + σΦ^{-1}(1 − C).
(iii) Academic merit: The policy achieves worse academic merit for admitted students from group B than for admitted students from group A, i.e., E[q | Y = 1, B, P_S] < E[q | Y = 1, A, P_S].

This result suggests that, although the school's Bayesian-optimal decision-making process can eliminate bias from skill estimates (see Section 3), the informativeness gap - as quantified via the difference in total precision across groups - induces disparities in the admission outcomes even of ex-ante identical groups of students. As Figure 3 illustrates, and as we prove in Online Appendix C.3, with overall equal precision (the vertical line) both groups are admitted according to their population fractions (here, 1 − π = π = 0.5); however, all fairness metrics degrade as the gap in informativeness between the two groups increases. Access barriers (even if limited to one group) have a similarly negative effect, albeit for a different reason: high-skilled students who otherwise would be admitted cannot even apply, as they have not taken the test, cf. Hyman (2016). The errors in estimation due to unequal precision affect the academic merit of each admitted group. Part (iii) establishes that, under unequal precisions and no other disparities, students from group A admitted to selective colleges will be of higher true skill (on average) compared to the admitted students from group B, in contrast to existing theoretical results (cf. Faenza et al. (2020)). This discrepancy is due to the fact that the school fails to identify the high-skilled students from group B - part (ii) for individual fairness shows that high-skilled students in group B are less likely to be admitted than they would be if they were in group A. We note that although the individual fairness gap is positive for all sufficiently high-skilled students, the magnitude of this gap varies. In fact, for students at the end of the right tail of the true skill distribution, the individual fairness gap starts to decrease, since - despite the noise - their estimates are high enough for admission.
We prove this relationship in the following lemma.

Figure 3: How the admitted students' academic merit, fraction of each group, and individual fairness gap change with group B test score variance and test access, respectively. With equal precision and no barriers, groups are treated equitably. As the feature variance or barriers increase for group B, both the academic merit of admitted B students and the fairness metrics worsen. We considered π = 1 − π = 0.5; the full parameter set can be found in Appendix A.5.

Lemma 2. Consider policy P_S, and assume unequal precision. The individual fairness gap I(q; P_S) is decreasing in q for q > q_e, for a threshold q_e that depends on the model parameters. Furthermore, lim_{q→∞} I(q; P_S) = 0.

Intuitively, a very high-skilled student has a low probability of being incorrectly perceived as non-eligible for admission. This is because high q > q_e overall leads to higher values of the features θ, which in turn lead to a higher perceived skill q̂. Of course, the informational disadvantage of students in group B still has an effect, and so the individual fairness gap remains positive. These results on how a single policy performs as the model parameters change further hint at the difficulty in deciding whether to drop standardized testing. Doing so increases estimation variance (perhaps differentially, as Bellafante (2020) and the University of California Standardized Testing Task Force (2020) posit), worsening all metrics, but also reduces access barriers, improving all metrics. These effects interact to induce the overall effect. Our next section formalizes this interaction. In this subsection, we ask: under what conditions would ignoring a feature benefit the school and the applicants? We study this question by comparing the test-free policy P_sub to the test-based policy P_full in two different scenarios: Theorem 1 and Theorem 2 consider settings with and without barriers, respectively.
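The non-monotone shape of the gap in Lemma 2 can be seen in closed form for a one-feature instance: conditional on q, the estimate q̂ is Gaussian, so each group's admission probability is a normal tail probability. The sketch below uses our own illustrative choices for the admission threshold and the group precisions (prior q ~ N(0, 1)); it is not the paper's parameterization.

```python
from math import erf, sqrt

def std_normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def admit_prob(q, prec, t):
    """P(qhat > t | q) with prior q ~ N(0, 1) and one feature of noise
    precision prec: qhat | q ~ N(rho * q, rho^2 / prec), rho = prec/(prec+1)."""
    rho = prec / (prec + 1.0)
    return 1.0 - std_normal_cdf((t - rho * q) / (rho / sqrt(prec)))

def fairness_gap(q, t=0.53, prec_A=4.0, prec_B=1.0):
    """I(q; P_S): admission probability of an A student minus a B student
    of the same true skill q."""
    return admit_prob(q, prec_A, t) - admit_prob(q, prec_B, t)
```

The gap is positive and sizable at moderately high skill, then decreases and vanishes as q grows: very high-skilled students are admitted almost surely in both groups, matching lim_{q→∞} I(q; P_S) = 0.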
In Theorem 1 below, we first study the effect of dropping test scores on academic merit and diversity in the presence of barriers. Theorem 1 (Dropping tests with barriers). Consider policies P_full and P_sub and assume unequal precisions under P_full. In the presence of barriers, dropping test scores has the following implications: (i) Diversity level: Holding other parameters fixed, there exists a threshold γ̄ such that the diversity level improves under P_sub if and only if γ_B < γ̄. (ii) Academic merit: For each group g, holding other parameters fixed, there exists a threshold γ̄_g such that the academic merit of group g increases under P_sub if and only if γ_g < γ̄_g. Perhaps surprisingly, Theorem 1 establishes that the academic merit of the admitted class may improve after dropping the test score. Similarly, diversity may deteriorate after dropping test scores. More specifically, Theorem 1 offers a threshold characterization, where the thresholds γ̄_g and γ̄ are functions of both the access levels of the two groups and the variance parameters, with and without the test. We provide the full characterization of these two quantities in Online Appendix C.5. We also include additional illustrations of the effects of dropping the test, with changes in the variance and access parameters; in particular, Figure 8 illustrates that the decision boundary in terms of the effect on diversity and the total and per-group academic merit can be non-linear. At a high level, Theorem 1 implies that the decision to drop the test requirement is not just a matter of increasing access for the disadvantaged group. On the contrary, it depends on the complex interaction between the informational environment and the access levels of both groups. First, dropping test scores increases the applicant pool size but also affects its composition, at different rates for each group. Second, the information loss incurred by dropping the test may not necessarily benefit students in group B.
In particular, it is possible that the informational disadvantage faced by group B students may be exacerbated by the absence of test score information, even if test scores are noisier for group B than for group A. In this case, the negative informational effect for group B may not be counterbalanced sufficiently by the increase in the group's pool size, especially when both groups face unequal, yet relatively proportional, barriers. In addition to the ambiguous impact that dropping test scores can have on the diversity of the admitted class and the academic merit of each group, the decision to drop the test introduces some additional implicit trade-offs. For example, as part (ii) in Theorem 1 indirectly implies and Figure 8 illustrates, only one group (e.g., group B) may be positively affected by this policy change,12 even if the overall academic merit increases. Depending on the exact model parameters, this might be an inevitable consequence of dropping the test score. Nevertheless, it raises interesting and important fairness trade-offs for policy-makers. Our next result studies the role of information loss in more depth, focusing on just the effect of the variance parameters in a setting without access barriers. Theorem 2 (Dropping tests without barriers). Consider policies P_full and P_sub, and assume unequal precisions under P_full. (i) Diversity level: The diversity level improves after dropping feature K, τ(P_sub) > τ(P_full), if and only if condition (3) holds. (ii) Individual fairness: For each group g, there exist thresholds q̄_g such that the admission probability for students of skill q in group g decreases under P_sub if and only if q > q̄_g. Further, there exists a threshold q̄ ≥ max{q̄_A, q̄_B} such that the individual fairness gap increases for all q > q̄, but may decrease otherwise. (iii) Academic merit: Academic merit decreases for both groups g ∈ {A, B}. In the absence of barriers, dropping a feature has mixed effects on the diversity level and the individual fairness gap.
However, it always worsens academic merit for both groups: without test scores, the school has access to fewer information signals, and so skill estimates become noisier. The exact effect on diversity depends on both the total precision of the remaining K − 1 features and how much the test precisions σ_{A,K}^{-2}, σ_{B,K}^{-2} differ; condition (3) makes this comparison explicit. Similarly, dropping the test may worsen individual fairness. As part (ii) shows, the admission probability of students with sufficiently high true skill, in either group, decreases after removing the test.12 Furthermore, for sufficiently high-skilled students, the individual fairness gap increases after dropping test scores. This implication is independent of the actual effect on diversity; although the school may manage to improve diversity by dropping the test, the targeting of high-skilled students in both groups becomes less effective, leaving high-skilled students in group B disproportionately affected compared to their same-skilled peers in group A. Even without access barriers, the result establishes the importance of understanding features other than the test score: not just their biases (μ_gk, canceled out given full knowledge) but also their informativeness. More generally, our theoretical results illustrate that, even in a simple model, the debate over dropping standardized testing cannot be had without the particulars of the context: whether one cares about the overall academic merit of the admitted class or our fairness criteria, the effects depend on the relationships between access barriers, the information content of the test, and the information content of other application components.
12 As part (ii) in Proposition 2 shows, affirmative action has the same disproportionate effect across groups.
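The interaction between information loss and access barriers in Theorems 1 and 2 can be sketched with a Monte Carlo comparison of the two policies. All parameters below are assumed for illustration (they are not the paper's calibration): group B has a noisier non-test feature, a noisier test, and an access barrier γ_B < 1, while the school's capacity is fixed as a fraction of the total population.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed, illustrative parameters:
N = 100000                        # students per group
TAU2 = 1.0                        # prior variance of true skill q ~ N(0, 1)
SIG_BG = {"A": 1.0, "B": 4.0}     # noise variance of the non-test feature
SIG_TEST = {"A": 0.5, "B": 1.0}   # noise variance of the test score
GAMMA_B = 0.5                     # fraction of group B with test access
CAP = 0.2                         # capacity as a fraction of the population

def post_mean(q, sig2s):
    """Posterior mean of q from independent features theta_k = q + eps_k."""
    prec = 1.0 / TAU2 + sum(1.0 / s for s in sig2s)
    num = sum((q + rng.normal(0, np.sqrt(s), q.shape)) / s for s in sig2s)
    return num / prec

def run_policy(use_test):
    """Return (diversity, merit) of the admitted class under one policy."""
    qA = rng.normal(0, 1, N)
    qB = rng.normal(0, 1, N)
    if use_test:
        qB = qB[: int(GAMMA_B * N)]   # barrier: only some of group B can apply
        eA = post_mean(qA, [SIG_BG["A"], SIG_TEST["A"]])
        eB = post_mean(qB, [SIG_BG["B"], SIG_TEST["B"]])
    else:
        eA = post_mean(qA, [SIG_BG["A"]])
        eB = post_mean(qB, [SIG_BG["B"]])
    n_admit = int(CAP * 2 * N)        # capacity is fixed; pool size is not
    thr = np.quantile(np.concatenate([eA, eB]), 1 - n_admit / (len(eA) + len(eB)))
    admA, admB = qA[eA > thr], qB[eB > thr]
    diversity = len(admB) / (len(admA) + len(admB))
    merit = np.concatenate([admA, admB]).mean()
    return diversity, merit

div_full, merit_full = run_policy(use_test=True)
div_sub, merit_sub = run_policy(use_test=False)
```

With these particular parameters the barrier effect dominates for diversity while the information effect dominates for merit, so dropping the test raises diversity but lowers average admitted skill; shifting the assumed variances or γ_B moves the outcome across the thresholds of Theorem 1.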
In Section 5, we will further demonstrate this result empirically and show that there exist real-world settings on either side of the divide, where dropping test score requirements would improve or worsen all our metrics. Schools often have an additional lever in their choice of admissions policy: whether or not to use affirmative action. In this section, we study the outcomes when schools can decide whether to require standardized testing and whether to use affirmative action. The term affirmative action describes admissions policies that partially base their decisions on applicants' membership in social groups with legally protected characteristics (e.g., race/ethnicity or gender), to support both equal opportunity and the educational experiences diversity brings (Alon, 2015). These policies may thus use different admissions thresholds for different groups. As a stylized model of affirmative action, we introduce a constraint on the diversity level τ(P) achieved by a policy P, i.e., we consider admissions policies of the form P^τ_S, where τ ∈ (τ(P_S), π] is the target diversity level set by the school. Thus, the school still optimizes for academic merit but under the additional constraint that a fraction τ of admitted students belong to group B. To do so, the common admission decision threshold is now replaced by two group-dependent thresholds, q*_{A,S} and q*_{B,S}.13 Note that τ(P^τ_S) = τ; under affirmative action, diversity improves by definition, and group fairness holds when the target diversity level is set to τ = π.14 Affirmative action can be utilized on top of test-free or test-based policies. Whereas the testing policy determines the amount of information available in the estimation process, affirmative action changes the selection process given that information.
We find that although affirmative action increases diversity, it does not change the information that schools have on students, and as a result the school still cannot identify high-skilled students in group B as well as it can identify group A students. We show that with unequal precision, affirmative action improves the individual fairness gap but does not eliminate it, as disparities in the identification of the highest-skilled students remain. It further increases the gap in academic merit across social groups. Affirmative action alone cannot address the fundamental issue caused by variance in the features. As a result, we consider this decision as orthogonal. Proposition 2 (Affirmative action with a fixed testing policy). Fix the target diversity level τ(P_S) < τ ≤ π and assume unequal precisions. Then, (i) Individual fairness: In comparison to P_S, the individual fairness gap improves, i.e., I(q; P^τ_S) < I(q; P_S) for all q. However, group A students still have a higher probability of admission than same-skilled group B students, i.e., I(q; P^τ_S) > 0, if and only if a threshold condition (given in the online appendix) holds. Finally, there exist parameters such that I(q; P^τ_S) < 0 < I(q; P_S) for some q.
13 In Proposition 2, the assumptions that γ_A ≥ 2(1−τ)C/(1−π) and γ_B ≥ 2τC/π ensure that, even in the presence of barriers, admission to the school is over-demanded (in the sense that the school cannot admit all applicants) and selective (meaning that the admission thresholds satisfy q*_{g,S} ≥ μ).
14 Proposition 2 focuses only on diversity levels τ ∈ (τ(P_S), π]. The lower bound is reasonable since τ(P_S) is the diversity level achieved by a school optimizing solely for academic merit (Theorem 1). The upper bound achieves group fairness. Note that higher levels τ > π could have also been considered with similar results; however, higher values of τ may be infeasible for certain values of C and (1 − π) and are therefore omitted.
(ii) Academic merit: Policy P^τ_S always achieves worse academic merit for admitted group B students than for group A students. Furthermore, in comparison to P_S, the academic merit of admitted students decreases for group B, while it increases for group A. We now study how test-free and test-based policies with affirmative action compare in a setting with unequal barriers to test access. Recall that Theorem 1 shows (without affirmative action) that, conditional on the information environment, if there are substantial barriers to test access, removing the test requirement improves academic merit. The following theorem establishes the same result for a school using affirmative action. Let the function HR denote the hazard rate of the standard normal distribution. Proposition 3 (Dropping tests under affirmative action with barriers). Fix group g ∈ {A, B}, variances σ²_gk, and target diversity level τ. Let τ_A = 1 − τ and τ_B = τ. Dropping the test score requirement improves the academic merit of admitted students from group g if and only if γ_g ≤ γ̂_g, where γ̂_g is characterized in the online appendix. Fixing all other parameters, the threshold γ̂_g increases as the test variance σ_gK for group g increases. The threshold γ̂_g now depends only on the characteristics of group g and τ, in contrast to Theorem 1, where the threshold depends on the characteristics of both groups. The result further holds regardless of the economic inequality γ_A − γ_B between the two groups; under affirmative action with a fixed diversity level, the school conducts the selection process for the two groups separately. Finally, as expected, if the test has a higher variance for a certain group, then it is more beneficial for that group to drop the test. Comparing the policies. Figure 4 compares, for one parameter setting, our policies: with and without testing, and with and without affirmative action at various levels τ. In Figure 4a, the Pareto curves trace the trade-off between diversity and academic merit, for each testing policy.
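The group-dependent thresholds that implement the affirmative-action constraint can be sketched as follows. This is a minimal illustration with assumed parameters; `est_A` and `est_B` stand in for the perceived-skill estimates q̂, with group B's estimates less dispersed, as happens under unequal precision when noisier features are shrunk harder toward the prior.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed, illustrative parameters: equal group pools, capacity fraction,
# and a target diversity level tau for the admitted class.
N, CAP, TAU = 100000, 0.2, 0.5

def aa_thresholds(est_A, est_B, capacity, tau):
    """Group-dependent cutoffs (q*_A, q*_B) meeting the diversity target:
    fill capacity with the top students *within each group*."""
    n_admit = int(capacity * (len(est_A) + len(est_B)))
    n_B = int(tau * n_admit)          # seats filled by group B
    n_A = n_admit - n_B
    thr_A = np.sort(est_A)[-n_A]      # weakest admitted group-A estimate
    thr_B = np.sort(est_B)[-n_B]
    return thr_A, thr_B

# Group B's posterior estimates are shrunk harder toward the prior mean,
# so they are less dispersed than group A's (illustrative variances).
est_A = rng.normal(0.0, 1.0, N)
est_B = rng.normal(0.0, 0.6, N)
thr_A, thr_B = aa_thresholds(est_A, est_B, CAP, TAU)
```

By construction the admitted class hits the target τ exactly, and the cutoff for the informationally disadvantaged group is lower; but since the selection within group B still operates on noisy estimates, the identification problem discussed above persists.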
In this scenario, a constraint for group fairness (affirmative action at level τ = π = 1/2) does not substantially affect academic merit, while substantially improving both group and individual fairness. Furthermore, dropping tests has an ambiguous effect: it worsens diversity levels and academic merit, as well as the individual fairness gap in the case without affirmative action. However, it (slightly) improves the individual fairness gap with affirmative action. Figure 4 also includes group-unaware estimation policies, which ignore the social group that a student belongs to; in this case, estimating student skill levels requires calculating the posterior from a mixture of Normal distributions. Ignoring group attributes is an oft-proposed but often problematic policy proposal to combat bias (Corbett-Davies and Goel, 2018). Perhaps unsurprisingly, group-unaware estimation policies perform most poorly. Group-unaware estimation worsens both the average academic merit of the admitted class and the diversity level, compared to the policy with group-aware estimation. It also leads to large individual fairness gaps, especially for high-skilled students. More details can be found in Online Appendix A.1. Of course, all these effects depend on the parameter setting; we now turn to estimating these effects in data. Data. Our data are from the Texas Higher Education Opportunity Project (THEOP), a public dataset of applications and transcripts for universities in Texas (Tienda and Sullivan, 2011). We focus on data from the University of Texas at Austin, for students who applied in 1992-1997.
For each applicant, we observe their high school class rank (rounded to the nearest decile) and standardized test score (SAT, or ACT score translated to an equivalent SAT score); we also observe demographic features (gender, ethnicity, citizenship status), characteristics of their high school (relative economic privilege rounded to the nearest quartile, public/private status, and whether it is within Texas), and the major and college to which they are applying.16 We further observe admissions decisions and, for accepted students, whether they enrolled. Finally, for those who enrolled, we observe rich transcript data: their GPA, number of credit hours, and major/college for each enrolled semester. Simulation setup. We consider our applicant population as those who in reality enrolled at UT Austin (i.e., those for whom we observe college transcript data), and simulate a setting in which these applicants are further applying to a selective program, e.g., honors programs, scholarships, or college transfers. For each such individual, we use their cumulative college GPA (not counting their first year) to represent their true skill. Then, as features, we use their high school class rank, college (e.g., Engineering, Business), and, in some cases, their standardized test score and college first-year GPA. To form the two groups, we take the upper (group A) and lower (group B) halves of the high schools' economic privilege index. We then simulate our model as follows. We train models using OLS to predict college GPA from the features, using all the students.17 For each simulation, we sample an equal number of students from each group (π = 0.5) from the population, and assume that 2/3 of group B students (and all group A students) have access to the test.
Finally, we predict each student's college GPA using the features, either with or without the test score, in one of two cases regarding how informative the features other than the test score are: Low informativeness: class rank and college are used as features. High informativeness: class rank, college, and first-year GPA are used as features. The latter is far more predictive of final college GPA than the other features. Results. Table 1 shows the academic merit and diversity level of admitted students, for each informational case and with and without requiring the test. Results depend crucially on the informational environment. When the college has access to a high-quality signal on all students (first-year GPA), dropping test scores increases both academic merit (for both groups) and diversity; it allows more students to apply, without incurring a substantial informational loss. In contrast, in the low informativeness case, without test scores the school must rely on students' high school ranks, which are especially uninformative for group B; such students are then disproportionately rejected without test scores. Not using test scores especially hurts already disadvantaged students as a group, even when 1/3 of them are deemed to not have access to the test. In Appendix A.2, we show that these patterns are robust to the specific assumption on γ_B. These findings underscore our theoretical results: the consequences of dropping test scores depend crucially on the information content of other signals, and the decision to do so should (and can) be made in a data-driven manner. Limitations. There are several major limitations to interpreting our analysis. First, we only observe application data for those students who were able to submit an SAT or ACT score, and so we rely on prior research (e.g., Hyman (2016)) to calibrate a population of students who could have applied in a world without a test score requirement.
Second, GPA data is only available for students who are accepted to, and who subsequently enroll at, UT Austin. Thus, our data cannot help determine whether the test score is predictive of GPA success at admissions time.18 Rather, our analysis should be interpreted as ranking those already admitted, such as for internal scholarships, transfers to other universities, or admissions to honors programs and colleges. Third, low-income and minority students face many challenges and barriers during their college education, and so their final GPA is itself not reflective of their true skill or academic merit (Engle and Tinto, 2008). Fourth, we do not observe several features available to admissions committees (such as recommendation letters), and so in practice test scores likely provide less marginal informativeness than in our simulation.
17 Results using a random forest model are qualitatively similar. We train a separate model for each group.
18 In causal inference terms, being admitted is a collider between test scores and some unobserved true skill, which influences factors available to the admissions committee but not to us as researchers, e.g., recommendation letters. This issue is a common barrier to measuring the predictive power of standardized testing (Weissman, 2020).
Despite these limitations, however, we believe that our study demonstrates how an admissions committee with better data and the potential to carry out experiments could make an informed decision on whether to drop testing requirements. Our work has policy implications beyond the formalization of the trade-off between information access and barriers in a testable framework. In Section 4.3, we showed that affirmative action (admitting the top students within each group) can improve diversity and individual fairness.
However, it is insufficient to address the inequalities that arise due to differential informativeness and access barriers, as it neither helps schools identify the highest-performing students nor increases the applicant pool size. Colleges must further invest in better signals and in expanding their applicant pools. In the setting where test scores are found to be highly effective for skill estimation but also impose large barriers, our work further suggests the value of another option for increasing fairness in admissions: decreasing the access barriers. For example, several states have implemented policies to make the SAT and/or ACT mandatory for all public school students, while also reducing both financial and logistical barriers by paying the financial costs of test registration and offering the tests at more convenient times (Hyman, 2016). Further, in reality individual schools do not make the decision to keep or drop testing requirements in isolation, but rather must react to the decisions that other schools make. When one school changes its own admission policy, and thus the pool of students it admits, other schools that are competing for the same pool of students may be affected. In Online Appendix B, we extend our model to study admissions decisions in settings where multiple schools compete for students and provide preliminary results. We show how students' preferences now affect the characteristics of the student body and that schools may have differing diversity levels even when using the same admission policy. We also begin to investigate the effect of one school's policy change on both its own students and those of the remaining schools. Note that our theoretical results hold in a highly stylized setting where the school is Bayesian-optimal and knows the parameters of the model. Such an idealized scenario is, in practice, unattainable.
We show that even under this idealized setting, inequalities arise: the school cannot correct for the differential informativeness in the features. Our work thus presents an information-theoretic limit to how well schools can identify the most qualified students. Even if a school had full knowledge of each group's feature distributions (i.e., were able to perfectly evaluate students' skills in context), the school could not completely mitigate inequalities in admissions. Another assumption is that all distributions are normal, which allows us to study the effect of variance in a transparent and tractable way. This assumption is not limiting: our results can be extended to a more general class of distributions such that group A's skill estimates are a mean-preserving spread (Blackwell, 1953) of group B's skill estimates, though analytic characterizations of the thresholds, as we derive them, may no longer be possible. We also started from the fundamental assumption that the two groups of students are equally skilled; this approach is natural when the skill in our framework represents a student's 'potential.' Given that disparities arise even in this scenario with equal skill distributions, we expect inequality (in terms of the individual fairness gap) to further worsen in a setting where one group is characterized by lower skill on average. Furthermore, our notion of barriers is restrictive; additional factors such as differential access to test preparation services (Park and Becks, 2015) and family support19 (Espenshade and Radford, 2013; McDonough, 1997) may also constitute significant barriers for certain groups of students, though some of these may be captured in the noise bias term and corrected for by the school. However, our calibrated simulations in Section 5 show that our insights hold when these assumptions do not apply.
Overall, we believe that our work makes a novel modeling and conceptual contribution to the growing literature on fairness in decision-making systems. Our multi-feature take on the seminal model by Phelps (1972) naturally fits the study of fundamental questions related to fairness in operations, and can serve as a useful technical and conceptual framework to study emerging problems in fair algorithmic decision-making and public policy in education and beyond. More generally, our work suggests that the design of input features to machine learning tasks is an important challenge. In the main text, we primarily consider a "group-aware" estimation procedure, in which the school uses students' group membership in its estimation procedure (and thus is able to plug in group-specific noise biases and variances). We now briefly discuss "unaware" estimation, when it cannot do so. Ignoring group attributes is an oft-proposed but often problematic policy proposal to combat bias in machine learning tasks (Corbett-Davies and Goel, 2018), and so we evaluate its consequences. Ignoring group membership complicates the skill estimation challenge. When the feature distributions differ across groups but the school cannot observe the group of a student, the resulting estimated skill distribution is a mixture of normal distributions. The mixture weights depend on the noise means and variances of each group g. In contrast to the group-aware case, where the school manages to correct for the feature noise biases (but not variance), the biases now play an important role in each feature's implications. We derive this distribution below. However, we primarily study the effects through simulation in Figure 4. Unaware estimation derivation.
Conditional on the true skill level q, the features are still distributed according to a group-specific Normal distribution. But under group-unaware estimation, the school does not know or cannot use g, so the posterior is now a mixture of Normal distributions. Specifically, let f(q | θ) denote the pdf of the posterior distribution q | θ; similarly, we use the notation f(θ) and f(q | θ, g). By the law of total probability, f(q | θ) = ∑_{g∈{A,B}} w(θ, g) f(q | θ, g), where w(θ, g) = P(g | θ). Then, the posterior q | θ is distributed as a mixture of Normal distributions, where each Normal is as in the group-aware case: q | θ ∼ ∑_{g∈{A,B}} w(θ, g) N(q̂(θ, g), σ̂²(θ, g)). For the weights, Bayes' rule gives w(θ, g) ∝ P(g) f(θ | g), the prior group probability times the group-conditional (Normal) feature density; the same computation extends to K features. Derivation for equation (6). The algebra is identical for K = 1 and K = 2 features θ_1, θ_2, and the pattern continues for K features.
Figure 5: Additional information for the Low Information case, when the standardized test scores provide much more information to the school than do the other features. In this setting, keeping test scores is best for both objectives, except for extremely low γ_B.
The results in Table 1 in the main text are robust to our assumption on the fraction γ_B of group B students that have access to the test score: in these cases, the informativeness differences between the cases determine whether dropping test scores benefits the dual objectives. We now repeat our calibrated simulation analysis, using an alternate data source, to analyze the robustness of our methods and insights. Data. We use a rich cross-national dataset20 consisting of demographic information and test scores.21 We then simulate our model, as in Section 5. Results. Figure 7 contains the results.
20 We access a cross-country merged dataset (Cowgill et al., 2020b) compiled by Cowgill et al. (2020a).
21 Test score is derived as described in the release notes of Cowgill et al. (2020b).
22 We arbitrarily choose a threshold of $2000, but the result is robust to this choice.
As in the High Informativeness setting with the THEOP data,
we find that in this setting dropping the "test" feature substantially improves both diversity and academic merit, in comparison to requiring an inaccessible test.
Figure 8: Difference between test-based and test-free policies with respect to various objective functions, as the test score variance and test access for group B vary. The more negative (red) the difference, the more that dropping the test improves that metric compared to test-based policies. The simulation is with the budgets case, using parameters as given in Appendix A.5. The plot reads as follows: in Figure 8a, a difference of 0.6 means that the average academic merit with a test-based policy is 0.6 higher than that with a test-free policy.
Figure 8 supplements the results in Theorem 1 and Proposition 3, regarding the thresholds at which academic merit and diversity improve after dropping the test. In particular, it illustrates that for high enough test score variance or high enough barriers, dropping the test score improves the objectives. A further figure repeats Figure 2, except with test score precision varying together for both groups (σ²_A1 = σ²_B1 ∈ (0, 3)) and group B test access varying (γ_B ∈ (0, 1)).
In reality, individual schools do not make the decision to keep or drop testing requirements in isolation, but rather must react to the decisions that other schools make. When one school changes its own admission policy, and thus the pool of students it admits, other schools who are competing for the same pool of students may be affected. In this section, we describe how to extend the model described in Section 2 to study admissions decisions in settings where multiple schools compete for students. We provide some preliminary results to suggest possible directions to study in this setting. When there are multiple schools, a student who is admitted to more than one school can choose which school they prefer. Thus the students' preferences create a discrepancy between the set of students admitted to a school and the set of students who choose to attend a school.
We investigate the impact of interventions on the set of students who ultimately attend the school, in terms of both academic merit and fairness. We show that this competition between schools creates additional diversity concerns not present in the single-school case. Furthermore, when one school makes a change to its admissions policy, this change has downstream effects on other schools. The model is as defined in Section 2, with the following changes. A finite set of schools A_1, ..., A_N separately admit students. Each school A_j has capacity C_j for the mass of students who attend the school. The market is over-demanded: ∑_{j=1}^{N} C_j < 1. Each school can choose its own estimation and admissions policy. Let P(A_j) denote the admissions policy of school A_j, and let P denote the vector of policies for schools A_1, ..., A_N. Note that schools with different estimation policies may assign different perceived skills to a given student, and so the ranking of students according to perceived skill may not be consistent across schools. Students' preferences. Students have common preferences over schools, A_1 ≻ ... ≻ A_N. Students prefer any school over their outside option A_o (i.e., A_N ≻ A_o) and so they apply to all schools, unless they are prevented from doing so by testing barriers. We make the assumption of common preferences for simplicity, although the model can easily be extended so that students have heterogeneous preferences. Admission outcomes. As in the single-school case, each school chooses a selection policy, inducing an admissions threshold, and admits all students with perceived skill greater than the threshold. Then, each student attends the school that they most prefer, among the set of schools to which they were admitted. If the student was not admitted to any school, then they choose their outside option A_o. These outcomes correspond to a matching of students to schools.
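The cutoff structure this matching induces (formalized in Proposition 4 below) can be sketched numerically. The capacities and the perceived-skill distribution below are assumed for illustration: with common preferences, the best-ranked school skims the top of the pooled estimate distribution, the next school fills from what remains, and so on, yielding strictly descending cutoffs.

```python
import numpy as np

rng = np.random.default_rng(3)

def school_cutoffs(est, capacities):
    """Cutoffs q*_1 > q*_2 > ... for schools ranked best-first: school j
    fills its capacity from students not taken by higher-ranked schools."""
    order = np.sort(est)[::-1]             # perceived skills, descending
    cutoffs, filled = [], 0
    for c in capacities:
        filled += int(c * len(est))
        cutoffs.append(order[filled - 1])  # weakest student at school j
    return cutoffs

# Illustrative pooled perceived-skill estimates and school capacities.
est = rng.normal(0, 1, 100000)
cuts = school_cutoffs(est, [0.10, 0.15, 0.20])
```

Each school is thus matched to a band of perceived skill between consecutive cutoffs, which is what drives the tier-dependent diversity results that follow: whether a group is under-represented at a school depends on how much of that group's estimate distribution falls inside the school's band.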
Extending the notation from Section 2, for a given student let Y_j ∈ {0, 1} denote whether the student attends school A_j.23 There exists a unique set of thresholds, which depends on the entire vector of policies chosen by the schools, such that each school fills its capacity with its most preferred students not accepted to a higher-ranked school.24 First we consider the case where all schools employ the same admission policy P_S. In this case, we can characterize the stable matching of the system as follows. Proposition 4 (Equilibrium with identical policies). Suppose that students have a common ranking of the schools and that schools all see the same feature set S ⊆ {1, ..., K} and use a group-aware admissions policy where they admit on perceived quality q̂. Then, a stable matching is characterized by a set of cutoffs q*_1 > ... > q*_j > ... > q*_N such that school A_1 is matched to students with perceived quality q̂(θ, g, P_S) ≥ q*_1 and, for j ≥ 2, school A_j is matched to students with perceived skill q*_j ≤ q̂(θ, g, P_S) < q*_{j-1}. Let D_j denote the total capacity of schools A_1, ..., A_j, and let X_1 < X_2 denote the two quantile levels at which the perceived-skill densities of the two groups cross (see the proof below). Then: • High-tier schools: When 1 − D_j > X_2, group B students are under-represented. • Mid-tier schools: When X_1 < 1 − D_{j−1} < X_2 and X_1 < 1 − D_j < X_2, group B students are not under-represented. • Low-tier schools: When 1 − D_{j−1} < X_1, group B students are under-represented. Proof. Consider the distributions of q̂ | A, P_S and q̂ | B, P_S and the respective pdfs f_{q̂|A,P_S}, f_{q̂|B,P_S}. Both distributions are Normal with the same mean and potentially different variances. Assuming ∑_{k∈S} σ_Ak^{-2} ≠ ∑_{k∈S} σ_Bk^{-2}, then Var(q̂ | A, P_S) ≠ Var(q̂ | B, P_S), and f_{q̂|A,P_S} and f_{q̂|B,P_S} cross at exactly two points q̂ that solve f_{q̂|A,P_S}(q̂) = f_{q̂|B,P_S}(q̂). Denote the solutions q_1, q_2, with q_1 < q_2. Note that, for any interval [c, d], the mass of students in group g with q̂(θ, g) ∈ [c, d] is the integral of f_{q̂|g,P_S} over [c, d]. If ∑_{k∈S} σ_Ak^{-2} > ∑_{k∈S} σ_Bk^{-2}, then f_{q̂|A,P_S}(q̂) < f_{q̂|B,P_S}(q̂) for q̂ ∈ [q_1, q_2] and f_{q̂|A,P_S}(q̂) > f_{q̂|B,P_S}(q̂) otherwise.
Consider what we define as a high-tier school A_j: a school whose capacity together with the total capacity of all higher-ranked schools, D_j, satisfies 1 − D_j > X_2. Then this school admits only students with perceived quality higher than q_2. Since f_{q̂|A,P_S}(q̂) > f_{q̂|B,P_S}(q̂) on this interval, the proportion τ_j of matched students in group B is smaller than 1 − π; thus group B students are under-represented. The cases for the remaining intervals follow similarly. Figure 9 illustrates these results. Diversity levels change with tiers25 due to the joint effects of estimation differences across groups and competition between schools. Differential variances lead to different skill estimate distributions q̂ | g, P_S for the different groups. In the single-school setting, the difference leads to fairness concerns (Proposition 1): more students from the privileged group have perceived skill above the admissions threshold, even conditional on true quality. In the multiple-school setting, the set of students that ultimately attend a school A_j corresponds to a band of perceived skill q̂: the students admitted to school A_j but not to any higher-ranked school A_i ≻ A_j. Whether disadvantaged students are under-represented at a school then depends on the mass of each group with perceived skill in that band. Competition across schools leads to different diversity levels, even when all schools choose the same admissions policy. Now we consider a setting where one school deviates from the benchmark model and changes its admission policy. We study the resulting impact on the matching of schools and students. In this setting, one might expect the impact on the system to change depending on whether a low-ranked school or a high-ranked school makes the change. To isolate this effect, we assume that there are two schools A_1 and A_2, where students prefer A_1 ≻ A_2, and that school A_1 changes its policy to adopt a test-free policy.
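The direction of this deviation's effect can be sketched with a small Monte Carlo experiment before stating the formal result. All parameters below are assumed for illustration: students share a prior, estimates are posterior means from one or two noisy features, A_1 takes its top students, and A_2 fills its capacity from the rest.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed, illustrative parameters:
N, C1, C2 = 200000, 0.1, 0.1      # student mass; capacities of A_1 and A_2
TAU2 = 1.0
q = rng.normal(0.0, 1.0, N)       # true skills

def estimate(q, sig2s):
    """Posterior mean from features theta_k = q + eps_k (prior N(0, TAU2))."""
    prec = 1.0 / TAU2 + sum(1.0 / s for s in sig2s)
    num = sum((q + rng.normal(0, np.sqrt(s), q.shape)) / s for s in sig2s)
    return num / prec

est_full = estimate(q, [1.0, 0.5])   # non-test feature + test
est_sub = estimate(q, [1.0])         # non-test feature only

def merit_at_A2(est_1, est_2):
    """Mean true skill at A_2 when A_1 screens on est_1 and A_2 on est_2."""
    n1, n2 = int(C1 * N), int(C2 * N)
    at_A1 = np.argsort(est_1)[-n1:]              # A_1 takes its top students
    rest = np.setdiff1d(np.arange(N), at_A1)
    at_A2 = rest[np.argsort(est_2[rest])[-n2:]]  # A_2 fills from the rest
    return q[at_A2].mean()

baseline = merit_at_A2(est_full, est_full)    # both schools test-based
deviation = merit_at_A2(est_sub, est_full)    # A_1 goes test-free
```

In this sketch A_2's average admitted skill rises when A_1 drops the test: A_1's noisier screen misses some students with high full-information estimates, who then fall to A_2. This is the mechanism formalized in Proposition 6.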
Note that when schools observe a different set of features for a student, the schools may have different (though correlated) preferences over the students. In the following proposition, we show that the adoption of a test-free policy by the top-ranked school leads to a decrease in the average true skill of each social group's admitted students at that school, but also an increase in the average true skill of admitted students at the lower-ranked school. (The tiers are mutually exclusive but not necessarily exhaustive; group B students may be under-represented in a school not included in any of the three tiers.) Proposition 6. Assume C_1 + C_2 ≤ 0.5. When A_1 uses a test-free policy and A_2 uses a test-based policy, the academic merit of students of group g ∈ {A, B} decreases at school A_1 and increases at school A_2, compared to when both schools use a test-based policy. Proof. The result for school A_1 follows directly from part (iii) in Theorem 2. Let sub denote the feature set {1, . . . , K − 1}. For school A_2, from the distribution of q | q̃, g, P_S for the feature sets with K − 1 and K features and the fact that A_1 has the same capacity C_1 in both scenarios, it follows that, for any constant c > 0 and group g, the mass of students with perceived skill above c changes only through the variance of the estimate, since the two Normal distributions have the same mean and the variance is larger with K features than with K − 1. Since school A_2 has the same policy in both the baseline setting and this scenario, the last observation further implies that q̃*_{A_2}((P_sub, P_full)) ≥ q̃*_{A_2}((P_full, P_full)).
Consequently, A_2 sees an increase in the academic merit of admitted students in g, that is, E[q | q̃*_{A_1}((P_sub, P_full)) > q̃(θ, g) ≥ q̃*_{A_2}((P_sub, P_full))] ≥ E[q | q̃*_{A_1}((P_sub, P_full)) > q̃(θ, g) ≥ q̃*_{A_2}((P_full, P_full))] ≥ E[q | q̃*_{A_1}((P_full, P_full)) > q̃(θ, g) ≥ q̃*_{A_2}((P_full, P_full))], where the last inequality follows from the academic merit decrease that A_1 experiences for each group g. Note that the admission outcomes depend on the vector of policies chosen by both schools, and hence the expectation over true skill is conditioned on this vector. In this scenario, A_1 drops the test score and is less effective at identifying the top students. This change decreases the average skill level of the students who are admitted to (and thus, the students who attend) A_1. Thus, after A_1 drops the test score, there are high-skilled students that would have been admitted under a test-based policy that are no longer admitted to A_1. These high-skilled students will now attend A_2, if accepted, increasing the expected skill at A_2. This discussion illustrates how a change made by one school can have downstream effects on academic merit at other schools. These results are preliminary, but suggestive of possible applications of our model to study the multiple-school context. In this appendix, we provide and prove the full statement of each result appearing in the main text. Let Φ denote the CDF of N(0, 1) and HR(x) = φ(x)/(1 − Φ(x)) the hazard rate of X ∼ N(0, 1). Lemma C.4. The hazard rate HR(x), x ∈ R, has the following properties: (i) Its derivative equals dHR(x)/dx = HR(x)(HR(x) − x); (ii) It holds that HR(x) > x for all x > 0. Lemma C.5. Let a > 0. The function h(x) = (x/a) HR(a/x) is increasing in x > 0. Proof. Let y = a/x. We study the monotonicity of ĥ(y) = HR(y)/y. The derivative of ĥ(y) equals dĥ(y)/dy = (dHR(y)/dy · y − HR(y))/y². For any y > 0, it holds that dĥ(y)/dy < 0 if and only if dHR(y)/dy · y − HR(y) < 0.
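The cascade mechanism in Proposition 6 can be sketched by Monte Carlo under simplifying assumptions: a single group, two unit-noise features, and illustrative capacities (none of these numbers come from the paper):

```python
import random
from statistics import mean

# Monte Carlo sketch: prior q ~ N(0, 1); two unit-noise, zero-bias features;
# illustrative capacities C1 = C2 = 10% of the applicant pool.
random.seed(0)
n, c1, c2 = 100_000, 10_000, 10_000

students = []
for _ in range(n):
    q = random.gauss(0.0, 1.0)
    t1 = q + random.gauss(0.0, 1.0)
    t2 = q + random.gauss(0.0, 1.0)
    # Posterior means: (t1 + t2)/3 with both features, t1/2 test-free.
    students.append((q, (t1 + t2) / 3.0, t1 / 2.0))

def match(a1_key):
    # A1 admits its top c1 by its own score; A2 then fills c2 seats from the
    # remainder using the full-information score (index 1).
    ranked = sorted(students, key=lambda s: s[a1_key], reverse=True)
    at_a1, rest = ranked[:c1], ranked[c1:]
    at_a2 = sorted(rest, key=lambda s: s[1], reverse=True)[:c2]
    return mean(s[0] for s in at_a1), mean(s[0] for s in at_a2)

a1_full, a2_full = match(1)  # both schools test-based
a1_sub, a2_sub = match(2)    # A1 drops the test
assert a1_sub < a1_full      # merit falls at the school that dropped the test
assert a2_sub > a2_full      # and rises downstream at A2
```

The high-skilled students A_1 now misidentifies remain in the pool and are captured by A_2's more informative ranking.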
Using part (i) in Lemma C.4, we get that dHR(y)/dy · y − HR(y) = HR(y)(HR(y) − y)y − HR(y) = HR(y)(HR(y)y − y² − 1), which is negative for y > 0 if and only if HR(y)y − y² − 1 < 0 for all y > 0. By Theorem 2.3 in (Baricz, 2008), we know that HR(y) < (y + √(y² + 4))/2. Thus, using this inequality, we can bound the quantity HR(y)y − y² − 1 as follows: HR(y)y − y² − 1 < y²/2 + y√(y² + 4)/2 − y² − 1 = (y/2)(−y + √(y² + 4)) − 1, which is negative for any y ∈ R. Therefore, dĥ(y)/dy < 0 for all y > 0. Finally, since ĥ(y) is decreasing in y > 0 and y = a/x, a > 0, is decreasing in x > 0, it follows that h(x) = ĥ(a/x) is increasing in x > 0. Gaussian social learning with feature set S ⊆ {1, . . . , K}. Given that q ∼ N(µ, σ²), ε_{gk} ∼ N(µ_{gk}, σ²_{gk}), and the noise is drawn independently, each feature k ∈ S is also normally distributed conditional on q, i.e., θ_k | q, g ∼ N(q + µ_{gk}, σ²_{gk}). Then, we inductively find that q | θ, g ∼ N(q̃(θ, g), σ̃²(θ, g)), where q̃(θ, g) = (µσ^{-2} + Σ_{k∈S} (θ_k − µ_{gk})σ_{gk}^{-2})/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2}) and σ̃²(θ, g) = 1/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2}). (7) Perceived skill conditional on true skill. (7) gives us the skill estimate q̃ of a student conditional on features θ. Another useful distribution is the skill estimate conditional on the student's true skill q and group g, i.e., q̃ | q, g, P_S, which is also Gaussian. Indeed, observe that q̃(θ, g) in (7) is a linear combination of independent (conditional on q) Gaussian variables θ_k = q + ε_{gk}, k ∈ S. Lemma C.6. For group-aware estimation policies, the following properties hold: Proof. The proof follows immediately from simple algebra and is thus omitted. Distribution of skill estimates per group. We find the distribution q̃ | g, P_S, which we denote by F_{q̃|g,P_S}. Lemma C.7 (Lemma 1). Consider a school that uses feature set S ⊆ {1, . . . , K} for each applicant. For g ∈ {A, B}, the skill level estimates for students in group g are normally distributed. Proof. An application of Lemma C.1 for X = q̃ and M = (µσ^{-2} + q̃ Σ_{k∈S} σ_{gk}^{-2})/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2}) gives us the result.
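Lemmas C.4(ii) and C.5 can be sanity-checked numerically on a grid (the constant `a = 1.5` is an arbitrary illustrative choice; the lemmas hold for all a > 0):

```python
from statistics import NormalDist

# Numerical check of Lemma C.4(ii) and Lemma C.5 on a grid of x values.
N = NormalDist()

def HR(y):
    # Hazard rate of the standard normal: phi(y) / (1 - Phi(y)).
    return N.pdf(y) / (1.0 - N.cdf(y))

a = 1.5
def h(x):
    return (x / a) * HR(a / x)

xs = [0.5 + 0.1 * i for i in range(56)]        # x in [0.5, 6.0]
hs = [h(x) for x in xs]
assert all(u < v for u, v in zip(hs, hs[1:]))  # h is increasing (Lemma C.5)
assert all(HR(a / x) > a / x for x in xs)      # HR(y) > y for y > 0 (Lemma C.4)
```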
Analytically, the parameters of this distribution can be computed as follows: E[q̃ | g, P_S] = µ and Var(q̃ | g, P_S) = (Σ_{k∈S} σ_{gk}^{-2})/(σ^{-2}(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2})). Definition 1. Let X and Y be two random variables with support R and CDFs F and G, respectively. We say that X second-order stochastically dominates Y, X ⪰_SSD Y, if for every t ∈ R, ∫_{−∞}^{t} F(x) dx ≤ ∫_{−∞}^{t} G(x) dx. Corollary 2 (Second-order stochastic dominance). If Σ_{k∈S} σ_{Ak}^{-2} > Σ_{k∈S} σ_{Bk}^{-2}, then (q̃ | B, P_S) ⪰_SSD (q̃ | A, P_S) and q̃ | A, P_S is a mean-preserving spread of q̃ | B, P_S. Distribution of true skill conditional on skill estimate. To answer questions about the academic merit of the admitted student body, we need to be able to compute the expected value of q conditional on acceptance and the social group g of a student, i.e., E[q | Y = 1, g, P_S]. Thus, we first derive the conditional distribution q | q̃, g, P_S in the following lemma. Lemma C.8. Suppose that the school uses policy P_S. Then, the true skill level q of students in group g ∈ {A, B} conditional on the estimated skill level q̃ is normally distributed as follows: q | q̃, g, P_S ∼ N(q̃, 1/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2})). (9) Proof. We apply Lemma C.2 by using the transformation M = (µσ^{-2} + q̃ Σ_{k∈S} σ_{gk}^{-2})/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2}) and X = q̃. Then, by Lemma C.2 and the linear transformation, we get that q | q̃, g, P_S ∼ N(q̃, 1/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2})). Let q̃*_S denote the optimal decision threshold used by the school under policy P_S. Using the distribution F_{q̃|g,P_S}, it follows that threshold q̃*_S is the solution to the equation (1 − π)F_{q̃|A,P_S}(q̃*_S) + πF_{q̃|B,P_S}(q̃*_S) = 1 − C. (10) By Lemma 1, the Gaussian mixture of F_{q̃|A,P_S}, F_{q̃|B,P_S} with weights 1 − π, π has mean µ and variance (1 − π)Var(q̃ | A, P_S) + π Var(q̃ | B, P_S). Recall that for a Gaussian random variable X ∼ N(µ_0, σ²_0), it holds that (X − µ_0)/σ_0 ∼ N(0, 1). Thus, (10) can be equivalently written as (1 − π)Φ((q̃*_S − µ)/√Var(q̃ | A, P_S)) + πΦ((q̃*_S − µ)/√Var(q̃ | B, P_S)) = 1 − C. (11) We also introduce some additional definitions. Given any fixed value of Σ_{k∈S} σ_{Bk}^{-2}, the informativeness gap ∆ is defined as ∆ = Σ_{k∈S} σ_{Ak}^{-2} − Σ_{k∈S} σ_{Bk}^{-2}.
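The distributional claims in Lemma 1 and Lemma C.8 can be checked by Monte Carlo. A minimal sketch, assuming an illustrative prior N(0, 1) and three unit-noise, zero-bias features (total precision s = 3):

```python
import random
from statistics import mean, variance

# Monte Carlo check of the perceived-skill distribution q~ | g, P_S.
random.seed(1)
mu, sigma2, s, n = 0.0, 1.0, 3.0, 200_000

qt = []
for _ in range(n):
    q = random.gauss(mu, 1.0)
    obs = sum(q + random.gauss(0.0, 1.0) for _ in range(3))  # sum of features
    qt.append((mu / sigma2 + obs) / (1.0 / sigma2 + s))      # posterior mean q~

# Lemma 1: q~ | g ~ N(mu, sigma^2 * s / (sigma^{-2} + s)).
pred_var = sigma2 * s / (1.0 / sigma2 + s)
assert abs(mean(qt) - mu) < 0.01
assert abs(variance(qt) - pred_var) < 0.02
```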
Given all parameters except Σ_{k∈S} σ_{Ak}^{-2} fixed, let F_{q̃|g,P_S}(q̃; ∆) denote the CDF F_{q̃|g,P_S} parameterized by ∆ ≥ 0, and let q̃*_S(∆) and τ(P_S; ∆) denote the corresponding admission threshold and diversity level, respectively, for any ∆ ≥ 0 under the baseline policy P_S. We now provide the proof of Proposition 1. Note that the result below considers a general feature set S where the assumption on unequal precisions holds. Proposition 1 (Metrics with a fixed policy). Suppose that a selective school uses admissions policy P_S. Group fairness and individual fairness fail except for equal precision. Given unequal precisions: (i) Diversity level: Group B students are under-represented, i.e., τ(P_S) < π. Furthermore, a larger informativeness gap leads to decreased diversity: fix group B precision, Σ_{k∈S} σ_{Bk}^{-2}; then as group A precision increases, the diversity level τ(P_S) decreases. (ii) Individual fairness: High-skilled group B students are hard to target, i.e., I(q; P_S) > 0, if and only if q > q̃*_S − σ^{-2}(q̃*_S − µ)/√(Σ_{k∈S} σ_{Ak}^{-2} · Σ_{k∈S} σ_{Bk}^{-2}). Increasing the informativeness gap increases the individual fairness gap for high-skilled students: fix group B precision, Σ_{k∈S} σ_{Bk}^{-2}; then as group A precision increases, I(q; P_S) increases for q > µ + σΦ^{-1}(1 − C). (iii) Academic merit: The policy achieves worse academic merit for admitted students from group B than from group A. Proof of part (i). We break the proof into two steps. Step 1: We show that group fairness fails except for equal precision. Given unequal precisions, we further show that τ(P_S) < π. If Σ_{k∈S} σ_{Ak}^{-2} = Σ_{k∈S} σ_{Bk}^{-2}, then the two distributions F_{q̃|A,P_S}, F_{q̃|B,P_S} are identical, so it trivially holds that F_{q̃|A,P_S}(q̃*_S) = F_{q̃|B,P_S}(q̃*_S) = 1 − C. Consequently, group fairness is achieved. Next, assume that Σ_{k∈S} σ_{Ak}^{-2} > Σ_{k∈S} σ_{Bk}^{-2}. Then, by Lemma 1 and Corollary 2, (q̃ | B, P_S) ⪰_SSD (q̃ | A, P_S) and q̃ | A, P_S is a mean-preserving spread of q̃ | B, P_S. Thus, the CDFs F_{q̃|A,P_S} and F_{q̃|B,P_S} cross once at q̃ = µ.
Furthermore, F_{q̃|A,P_S}(q̃) < F_{q̃|B,P_S}(q̃) for q̃ > µ, and F_{q̃|A,P_S}(q̃) > F_{q̃|B,P_S}(q̃) for q̃ < µ. Since C < 0.5 = F_{q̃|A,P_S}(µ) = F_{q̃|B,P_S}(µ), then q̃*_S > µ. Therefore, F_{q̃|A,P_S}(q̃*_S) < F_{q̃|B,P_S}(q̃*_S), which due to (10) implies that 1 − F_{q̃|B,P_S}(q̃*_S) < C; thus τ(P_S) = π(1 − F_{q̃|B,P_S}(q̃*_S))/C < π. Step 2: We show that the marginal effect of ∆ on τ(P_S) is negative. Consider 0 ≤ ∆ < ∆′. Since F_{q̃|B,P_S}(q̃; ∆) depends only on Σ_{k∈S} σ_{Bk}^{-2}, it remains unchanged under both ∆, ∆′. Recall that the admission threshold is the solution to (11). As ∆ increases, Var(q̃ | A, P_S) increases while the group B term is unchanged; since q̃*_S(∆) > µ, the group A term of (11) decreases in ∆, so the solution q̃*_S(∆) is an increasing function of ∆. Thus, q̃*_S(∆′) > q̃*_S(∆). Therefore, given that the capacity remains constant at C, the diversity level decreases as ∆ increases, since τ(P_S; ∆′) = π(1 − F_{q̃|B,P_S}(q̃*_S(∆′)))/C < π(1 − F_{q̃|B,P_S}(q̃*_S(∆)))/C = τ(P_S; ∆). Proof of part (ii). We prove each claim in a separate step. Step 1: We show that I(q; P_S) > 0 if and only if q exceeds the threshold stated in part (ii). Recall that for a Gaussian variable X ∼ N(µ_0, σ²_0), it holds that (X − µ_0)/σ_0 ∼ N(0, 1). Thus, given policy P_S, the probability of admission for a student in group g equals P[Y = 1 | q, g, P_S] = Φ((E[q̃ | q, g, P_S] − q̃*_S)/√Var(q̃ | q, g, P_S)), (13) where E[q̃ | q, g, P_S] = (µσ^{-2} + q Σ_{k∈S} σ_{gk}^{-2})/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2}) and Var(q̃ | q, g, P_S) = (Σ_{k∈S} σ_{gk}^{-2})/(σ^{-2} + Σ_{k∈S} σ_{gk}^{-2})². Consequently, due to the monotonicity of Φ, it holds that I(q; P_S) > 0 if and only if (√(Σ_{k∈S} σ_{Ak}^{-2}) − √(Σ_{k∈S} σ_{Bk}^{-2}))[(q − q̃*_S) + σ^{-2}(q̃*_S − µ)/√(Σ_{k∈S} σ_{Ak}^{-2} · Σ_{k∈S} σ_{Bk}^{-2})] > 0. (14) Due to our assumption on unequal precisions, the last inequality further translates to q̃*_S − q < σ^{-2}(q̃*_S − µ)/√(Σ_{k∈S} σ_{Ak}^{-2} · Σ_{k∈S} σ_{Bk}^{-2}), where the RHS is always positive due to school selectivity, which implies that q̃*_S > µ. Thus, we conclude that I(q; P_S) > 0 if and only if (14) holds. Step 2: We show that individual fairness fails except for equal precisions. As an immediate corollary of the previous analysis in Step 1, observe that individual fairness fails unless the LHS in (14) equals 0 for all q; equivalently, individual fairness fails except for equal precision, i.e., Σ_{k∈S} σ_{Bk}^{-2} − Σ_{k∈S} σ_{Ak}^{-2} = 0. Step 3: Finally, we show that for q > µ + σΦ^{-1}(1 − C), I(q; P_S) increases as the informativeness gap increases. We begin with group B. By (8), the admission probability of group B students is given by (13) with the threshold q̃*_S(∆). By Step 2 of part (i), it further follows that q̃*_S(∆) is increasing in ∆.
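Step 2 of part (i) can be illustrated numerically by solving (11) with bisection as the gap ∆ grows, with assumed parameters (π, C, and the group B precision are illustrative):

```python
from math import sqrt
from statistics import NormalDist

# Numerical check of Proposition 1(i): fix group B's precision and grow the
# informativeness gap Delta; the diversity level tau falls below pi.
pi_B, C, sB = 0.3, 0.2, 1.0

def sd(s):
    return sqrt(s / (1.0 + s))  # sd of q~ | g with prior N(0, 1)

def diversity(sA):
    FA, FB = NormalDist(0.0, sd(sA)), NormalDist(0.0, sd(sB))
    lo, hi = -10.0, 10.0
    for _ in range(100):  # bisect (1 - pi) F_A(q*) + pi F_B(q*) = 1 - C
        mid_ = (lo + hi) / 2.0
        if (1 - pi_B) * FA.cdf(mid_) + pi_B * FB.cdf(mid_) < 1 - C:
            lo = mid_
        else:
            hi = mid_
    qstar = (lo + hi) / 2.0
    return pi_B * (1.0 - FB.cdf(qstar)) / C

taus = [diversity(sB + delta) for delta in (0.0, 1.0, 2.0, 4.0)]
assert abs(taus[0] - pi_B) < 1e-6                  # equal precision: tau = pi
assert all(u > v for u, v in zip(taus, taus[1:]))  # tau decreases in Delta
```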
Consequently, the above probability is decreasing in ∆, since Φ is an increasing function and all terms except for q̃*_S(∆) do not depend on ∆. Therefore, we conclude that the admission probability of group B students decreases for any q as ∆ increases. Next, for group A, note that students with q > µ + σΦ^{-1}(1 − C) are exactly those students in group A who, given perfectly observable skills q, would be admitted to the class; due to imperfect information, a group A student of true skill q > µ + σΦ^{-1}(1 − C) has a non-zero probability of being rejected. Next, observe that as ∆ increases, the total precision Σ_{k∈S} σ_{Ak}^{-2} of group A must increase. Consequently, the variance Var[q̃ | q, A, P_S] decreases; thus the estimates q̃ | q, A, P_S of all group A students (including those with true skill q > µ + σΦ^{-1}(1 − C)) become more precise. Combining this observation with the facts that the capacity C remains constant and the admission probability of group B students decreases, it follows that the probability that the top-skilled group A students with q > µ + σΦ^{-1}(1 − C) are rejected (either in favor of lower-skilled students in A or students in B) decreases as ∆ increases. Equivalently, their admission probability P[Y = 1 | q, A, P_S, ∆] increases as ∆ grows. Putting everything together, we conclude that, given q > µ + σΦ^{-1}(1 − C), the individual fairness gap I(q; P_S) increases as the informativeness gap ∆ increases. Proof of part (iii). We break the proof into the following steps. Step 1: We compute the expected value E[q̃ | q̃ ≥ q̃*_S, g, P_S] and show that E[q̃ | q̃ ≥ q̃*_S, A, P_S] ≥ E[q̃ | q̃ ≥ q̃*_S, B, P_S]. Applying Lemma C.3, we get that E[q̃ | q̃ ≥ q̃*_S, g, P_S] = µ + √Var(q̃ | g, P_S) · HR(t_g), where t_g = (q̃*_S − µ)/√Var(q̃ | g, P_S). Due to school selectivity, we have q̃*_S > µ. By Lemma C.5, the function x ↦ x · HR((q̃*_S − µ)/x) is increasing in x > 0 for q̃*_S > µ. Thus, by Corollary 1, we get that E[q̃ | q̃ ≥ q̃*_S, A, P_S] ≥ E[q̃ | q̃ ≥ q̃*_S, B, P_S]. Step 2: We compute the expected value E[q | q̃ ≥ q̃*_S, g, P_S]. Specifically, E[q | q̃ ≥ q̃*_S, g, P_S] = E[E[q | q̃, g, P_S] | q̃ ≥ q̃*_S, g, P_S] = E[q̃ | q̃ ≥ q̃*_S, g, P_S], where the last equality follows from Lemma C.8.
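Step 1's Mills-ratio representation is easy to evaluate directly. A sketch with assumed parameters (prior N(0, 1), threshold q* = 1, precisions 4 and 1), showing that the better-observed group's larger standard deviation of q̃ translates into higher conditional merit, as Lemma C.5 predicts:

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()

def HR(t):
    # Hazard rate of the standard normal.
    return N.pdf(t) / (1.0 - N.cdf(t))

mu, qstar = 0.0, 1.0

def merit(s):
    sd = sqrt(s / (1.0 + s))                # sd of q~ | g (prior N(0, 1))
    return mu + sd * HR((qstar - mu) / sd)  # E[q~ | q~ >= q*, g]

assert merit(4.0) > merit(1.0)  # group A (precision 4) beats group B (1)
```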
Step 3: We show that E[q | Y = 1, A, P_S] > E[q | Y = 1, B, P_S]. Given our assumptions on unequal precisions and school selectivity, the proof follows immediately from Steps 1 and 2: E[q | Y = 1, A, P_S] = E[q̃ | q̃ ≥ q̃*_S, A, P_S] > E[q̃ | q̃ ≥ q̃*_S, B, P_S] = E[q | Y = 1, B, P_S]. Explaining why the individual fairness gap decreases for high-skilled students. We include an observation about the individual fairness gap. Although the individual fairness gap is positive for sufficiently high-skilled students, the magnitude of this gap varies. For students at the end of the right tail of the true skill distribution, the individual fairness gap starts to decrease. This property can be graphically observed in Figure 4b. Lemma 2. Consider policy P_S, and assume unequal precision. The individual fairness gap I(q; P_S) is decreasing in q for q > q_e, where q_e is the larger root of the quadratic condition derived in the proof. Furthermore, lim_{q→∞} I(q; P_S) = 0. Proof. By (8), the individual fairness gap equals I(q; P_S) = P[Y = 1 | q, A, P_S] − P[Y = 1 | q, B, P_S] = Φ(z_A(q)) − Φ(z_B(q)), where z_g(q) = √(Σ_{k∈S} σ_{gk}^{-2})(q − q̃*_S) − σ^{-2}(q̃*_S − µ)/√(Σ_{k∈S} σ_{gk}^{-2}). Taking the derivative of I(q; P_S) with respect to q, we find that dI(q; P_S)/dq = √(Σ_{k∈S} σ_{Ak}^{-2}) φ(z_A(q)) − √(Σ_{k∈S} σ_{Bk}^{-2}) φ(z_B(q)). Thus, to prove that dI(q; P_S)/dq < 0, it suffices to show that √(Σ_{k∈S} σ_{Ak}^{-2}) φ(z_A(q)) < √(Σ_{k∈S} σ_{Bk}^{-2}) φ(z_B(q)). The above condition is equivalent to (z_A(q)² − z_B(q)²)/2 > (1/2) ln(Σ_{k∈S} σ_{Ak}^{-2}/Σ_{k∈S} σ_{Bk}^{-2}). Given our assumption on unequal precision, i.e., Σ_{k∈S} σ_{Bk}^{-2} < Σ_{k∈S} σ_{Ak}^{-2}, the left-hand side is a quadratic in q with positive leading coefficient, so the condition is satisfied for all q greater than its larger root q_e. Therefore, the individual fairness gap I(q; P_S) is decreasing in q for q > q_e, as desired. Furthermore, by the definition of I(q; P_S) and the fact that lim_{q′→∞} Φ(q′) = 1, we immediately get that lim_{q→∞} I(q; P_S) = 0. We are interested in comparing group-aware policies P_full and P_sub. By our previous result in Lemma 1, we get the following. Lemma C.9. The variance of q̃ | g, P_sub is lower than that of q̃ | g, P_full, but their means are both equal to µ. Proof. The proof follows trivially from the fact that the function h(x) = x/(σ^{-2} + x) is increasing in x > 0, and dropping feature K decreases the total precision x = Σ_{k∈S} σ_{gk}^{-2} for any g. Let q̃*_sub be the decision threshold of a school considering only features k = 1 to K − 1. By (10), q̃*_sub is the solution to the equation (1 − π)F_{q̃|A,P_sub}(q̃*_sub) + πF_{q̃|B,P_sub}(q̃*_sub) = 1 − C, whereas q̃*_full is the solution to (1 − π)F_{q̃|A,P_full}(q̃*_full) + πF_{q̃|B,P_full}(q̃*_full) = 1 − C. Lemma C.10.
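Lemma 2's tail behavior can be checked numerically with assumed parameters (prior N(0, 1), threshold q* = 1, precisions 4 > 1):

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()
mu, qstar = 0.0, 1.0

def p_admit(q, s):
    # Admission probability (13): P[q~ >= q* | q, g] for group precision s.
    m = (mu + q * s) / (1.0 + s)  # E[q~ | q, g]  (sigma^2 = 1)
    sd = sqrt(s) / (1.0 + s)      # sd of q~ | q, g
    return 1.0 - N.cdf((qstar - m) / sd)

def gap(q):
    # Individual fairness gap I(q; P_S) with precisions sA = 4, sB = 1.
    return p_admit(q, 4.0) - p_admit(q, 1.0)

qs = [2.0 + 0.25 * i for i in range(20)]
gaps = [gap(q) for q in qs]
assert all(g > 0 for g in gaps)                    # positive for high q
assert all(u > v for u, v in zip(gaps, gaps[1:]))  # decreasing past q_e
assert gap(50.0) < 1e-6                            # vanishes in the limit
```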
The admission threshold decreases after dropping feature k = K, i.e., q̃*_sub < q̃*_full. Theorem 2. (i) Diversity: Diversity improves after dropping the test score, i.e., τ(P_sub) > τ(P_full), if and only if condition (3) holds. (ii) Individual fairness: For each group g, there exist thresholds q_g such that the admission probability for students of skill q in group g decreases under P_sub if and only if q > q_g. Further, there exists a threshold q̄ ≥ max{q_A, q_B} such that the individual fairness gap increases for all q > q̄, but may decrease otherwise. (iii) Academic merit: Academic merit decreases for both groups g ∈ {A, B}. Proof of part (i). By the definition of the diversity level and Lemma 1, τ(P_sub) > τ(P_full) is equivalent to the condition 1 − F_{q̃|B,P_sub}(q̃*_sub) > 1 − F_{q̃|B,P_full}(q̃*_full). Replacing q̃*_full, q̃*_sub with their definitions as in (11), the above inequality becomes a comparison of standardized thresholds, which, due to the monotonicity of Φ, holds if and only if the corresponding arguments satisfy the reverse inequality. Using the substitution Σ_{k∈full} σ_{gk}^{-2} = Σ_{k∈sub} σ_{gk}^{-2} + σ_{gK}^{-2}, the last relation equivalently simplifies to (3). Proof of part (ii). We prove each claim in a separate step. Step 1: We show that, for group B, P(Y = 1 | q, B, P_full) < P(Y = 1 | q, B, P_sub) if and only if q < q_B. Step 2: For all q > q_g for both groups, one derives a sufficient condition for I(q; P_sub) > I(q; P_full); define q̄ = max{q̂, q_A, q_B}, where q̂ is the auxiliary threshold appearing in that condition. Then, by the previous conditions, we have I(q; P_sub) > I(q; P_full) for all q > q̄; thus the individual fairness gap increases. Furthermore, q̄ ≥ max{q_A, q_B} as required. Finally, if q_A < q_B, then for all q_A < q < q_B, P(Y = 1 | q, A, P_full) > P(Y = 1 | q, A, P_sub) but P(Y = 1 | q, B, P_full) < P(Y = 1 | q, B, P_sub) (by Step 1). Thus, I(q; P_full) > I(q; P_sub), and the gap may decrease in this intermediate range. Proof of part (iii). Since Var[q̃ | g, P_sub] < Var[q̃ | g, P_full] and, by Lemma C.10, q̃*_sub < q̃*_full, the Mills-ratio representation from part (iii) of Proposition 1, together with the monotonicity in Lemma C.5, implies that the academic merit of each group decreases. Step 3: We show that for any given group g and γ_g ∈ (0, 1], there exists a threshold γ̄_g ∈ (0, 1] such that the academic merit of group g improves if and only if γ_g < γ̄_g. Fix group A; the proof is analogous for group B.
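Lemmas C.9 and C.10 can be illustrated by solving (10) before and after a feature is dropped, under assumed parameters (each group loses one unit of precision when the test is removed; all numbers are illustrative):

```python
from math import sqrt
from statistics import NormalDist

# Dropping a unit-precision feature shrinks both perceived-skill variances,
# so the admission threshold solving (10) falls (Lemma C.10).
pi_B, C = 0.3, 0.2

def sd(s):
    return sqrt(s / (1.0 + s))  # sd of q~ | g with prior N(0, 1)

def threshold(sA, sB):
    FA, FB = NormalDist(0.0, sd(sA)), NormalDist(0.0, sd(sB))
    lo, hi = -10.0, 10.0
    for _ in range(100):  # bisect the mixture CDF equation (10)
        mid_ = (lo + hi) / 2.0
        if (1 - pi_B) * FA.cdf(mid_) + pi_B * FB.cdf(mid_) < 1 - C:
            lo = mid_
        else:
            hi = mid_
    return (lo + hi) / 2.0

q_full = threshold(5.0, 2.0)  # full feature set
q_sub = threshold(4.0, 1.0)   # test dropped: each precision falls by 1
assert q_sub < q_full         # Lemma C.10
```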
It suffices to show that (a) γ̄_A is the unique solution to β_A(γ_A, γ_B, ρ_A^full) = ∆_A(ξ_A, ρ_A^sub) and (b) γ̄_A ∈ (0, 1]. Conditional on the existence of γ̄_A, uniqueness in (a) follows immediately from the monotonicity of β_A shown in Step 2. Existence in turn can be shown as follows. In the absence of barriers, part (iii) in Theorem 2 guarantees that the academic merit of group g decreases after dropping test scores; thus β_A(1, γ_B, ρ_A^full) > ∆_A(ξ_A, ρ_A^sub). Furthermore, observe that for γ_A = 0, academic merit trivially improves from β_A(0, γ_B, ρ_A^full) = 0 to a positive value ∆_A(ξ_A, ρ_A^sub) > 0 after dropping test scores. Thus, by the continuity of β_A(γ_A, γ_B, ρ_A^full), such a γ̄_A exists. For part (b), continuity of β_A further implies that there must exist an interval [0, ε), ε > 0, such that β_A(γ_A, γ_B, ρ_A^full) < ∆_A(ξ_A, ρ_A^sub) for all γ_A ∈ [0, ε). Consequently, γ̄_A ≥ ε > 0. Proof of part (ii). Plugging (20) into the definition of diversity with and without test scores, respectively, it immediately follows that diversity improves if and only if η(1, 1, ρ_B^sub) > η(γ_A, γ_B, ρ_B^full). Step 1: Fix all parameters (including γ_A ∈ (0, 1]) except for γ_B ∈ (0, 1]. We show that diversity strictly increases as barriers decrease (γ_B increases), i.e., η(γ_A, γ_B′, ρ_B^full) > η(γ_A, γ_B, ρ_B^full) for γ_B′ > γ_B. By (20), the admission threshold increases as γ_B increases. Indeed, q̃*_sub is the solution to (1 − π)γ_A(1 − F_{q̃|A,P_sub}(q̃*_sub)) + πγ_B(1 − F_{q̃|B,P_sub}(q̃*_sub)) = C. Thus, as γ_B increases, the solution q̃*_sub must increase, since each 1 − F_{q̃|g,P_sub} is decreasing in its argument. Then, since the admission threshold q̃*_sub increases but the capacity C, barriers γ_A (thus the mass of students in group A who are eligible to apply), and the perceived skill distributions for both groups remain constant, it follows that a lower mass of students is admitted from group A.
As a result, the remaining capacity is filled with more students from group B, which in turn implies that diversity increases. Step 2: We show that, given all other parameters fixed including γ_A, there exists a threshold γ̄_B(γ_A) such that diversity increases after dropping the test if and only if γ_B < γ̄_B(γ_A). It suffices to show that (a) γ̄_B(γ_A) is the unique solution to η(1, 1, ρ_B^sub) = η(γ_A, γ̄_B(γ_A), ρ_B^full) and (b) γ̄_B(γ_A) ∈ (0, 1]. The proof follows as in Step 3 in part (i). We examine policies P^τ_S with affirmative action, meaning that the school sets a target diversity level τ(P^τ_S) = τ. Thus, the common threshold q̃*_S in (10) is replaced by two group-dependent thresholds, q̃*_{A,S} and q̃*_{B,S}: (1 − π)γ_A(1 − F_{q̃|A,P_S}(q̃*_{A,S})) = (1 − τ)C, πγ_B(1 − F_{q̃|B,P_S}(q̃*_{B,S})) = τC. (22) Note further that the distribution F_{q̃|g,P_S} ≡ F_{q̃|g,P^τ_S}, g ∈ {A, B}, remains unchanged under both admissions policies P^τ_S and P_S, as both share the same group-aware estimation policy and feature set S. Proposition 2 (Affirmative action with a fixed testing policy). Fix the target diversity level τ(P_S) < τ ≤ π and assume unequal precisions. Let also γ_B ≤ γ_A ≤ 1 such that γ_A ≥ 2(1 − τ)C/(1 − π), γ_B ≥ 2τC/π. Then, (i) Individual fairness: In comparison to P_S, the individual fairness gap improves, i.e., I(q; P^τ_S) < I(q; P_S) for all q. However, group A students still have a higher probability of admission than same-skilled group B students, i.e., I(q; P^τ_S) > 0, if and only if q exceeds a threshold analogous to that in (14). Finally, there exist parameters such that I(q; P^τ_S) < 0 < I(q; P_S) for some q. (ii) Academic merit: Policy P^τ_S always achieves worse academic merit for admitted group B students than for group A students. Furthermore, in comparison to P_S, the academic merit of admitted students decreases for group B, while it increases for group A. Proof of Part (i).
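The two thresholds in (22) can be solved in closed form via inverse CDFs. A sketch with assumed numbers chosen to satisfy the proposition's conditions on γ_A and γ_B (all parameters illustrative):

```python
from math import sqrt
from statistics import NormalDist

# Group-dependent affirmative-action thresholds from (22).
pi_B, C, tau = 0.3, 0.2, 0.25
gA, gB = 1.0, 0.8
assert gA >= 2 * (1 - tau) * C / (1 - pi_B)  # proposition's condition on gA
assert gB >= 2 * tau * C / pi_B              # and on gB

FA = NormalDist(0.0, sqrt(0.8))  # perceived-skill distributions (illustrative)
FB = NormalDist(0.0, sqrt(0.5))

q_A = FA.inv_cdf(1.0 - (1 - tau) * C / ((1 - pi_B) * gA))
q_B = FB.inv_cdf(1.0 - tau * C / (pi_B * gB))
assert q_B < q_A  # a target tau > tau(P_S) lowers group B's bar
```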
First, note that (22) gives us the two group-dependent thresholds. Since τ > τ(P_S) and γ_B ≤ γ_A ≤ 1, it follows that q̃*_{B,S} < q̃*_S < q̃*_{A,S}. Consequently, P[q̃ ≥ q̃*_{B,S} | q, B, P^τ_S] > P[q̃ ≥ q̃*_S | q, B, P_S] and P[q̃ ≥ q̃*_{A,S} | q, A, P^τ_S] < P[q̃ ≥ q̃*_S | q, A, P_S], since the distribution of q̃ | q, g remains the same under both P ∈ {P_S, P^τ_S}. Consequently, I(q; P^τ_S) < I(q; P_S). For the proof of the second statement in Part (i), we apply the argument used in Proposition 1, Part (ii). Thus, we get that I(q; P^τ_S) > 0 if and only if the analogous condition to (14) holds with the thresholds q̃*_{A,S} and q̃*_{B,S}. Similarly, for group A, it holds that P(Y = 1 | q, A, P_full) > P(Y = 1 | q, A, P_sub) if and only if q > q_A. Assume g = B; the proof for group A is analogous.
Replacing q̃*_S from (11) in (13), we find that, for policy P_S, the admission probability (conditional on true skill q and group g) can be written explicitly. Thus, the admission probability increases after dropping test scores if and only if the corresponding standardized argument increases, which is equivalent to q < q_B. Step 2: We show that there exists a threshold q̄ ≥ max{q_A, q_B} such that the individual fairness gap increases for all q > q̄; otherwise, it may decrease. Let q̂ denote the auxiliary threshold from the sufficient condition above. Next, consider only q > max{q̂, q_A, q_B}. Since Φ is monotone and convex in (−∞, 0] and q > q̄, and by Step 1 for any group g, it also holds that P(Y = 1 | q, g, P_full) > P(Y = 1 | q, g, P_sub). Furthermore, under P^τ_S, the condition in Part (ii) of Proposition 1 holds with equality for some q̄, which implies I(q̄; P^τ_S) < 0. However, for q = q̄, we also have that the corresponding condition under P_S holds strictly. To see why, observe that given the condition in (24), the monotonicity established in Proposition 1 further guarantees that I(q̄; P_S) > 0 for instance Ω. Finally, we have constructed a problem instance Ω such that I(q̄; P_S) > 0 > I(q̄; P^τ_S). Thus, such an instance exists. Proof of Part (ii). We use an argument similar to part (iii) in Proposition 1 (note that this part holds for any common threshold greater than µ, not only q̃*_S). Similarly to (16), we derive that for both g ∈ {A, B}, E[q | q̃ ≥ q̃*_{g,S}, g, P^τ_S] = E[q̃ | q̃ ≥ q̃*_{g,S}, g, P^τ_S]. By the same part (iii) in Proposition 1, replacing q̃*_S with threshold q̃*_{A,S} > µ implies that E[q | q̃ ≥ q̃*_{A,S}, A, P^τ_S] > E[q | q̃ ≥ q̃*_{A,S}, B, P^τ_S]. Next, the fact that E[q | q̃*_{A,S} > q̃ ≥ q̃*_{B,S}, B, P^τ_S] < E[q | q̃ ≥ q̃*_{A,S}, B, P^τ_S], together with the inequalities above, yields the first claim. Regarding the second statement of part (ii), recall that the distributions F_{q̃|g,P_S} and F_{q̃|g,P^τ_S} are identical. Since q̃*_{B,S}