key: cord-0669533-sx792mnp authors: Sailer, Noah; Schaan, Emmanuel; Ferraro, Simone title: Lower bias, lower noise CMB lensing with foreground-hardened estimators date: 2020-07-08 journal: nan DOI: nan sha: 47064f8cc9bf59a2ca20beac3d42c3e4841f2b80 doc_id: 669533 cord_uid: sx792mnp Extragalactic foregrounds in temperature maps of the Cosmic Microwave Background (CMB) severely limit the ability of standard estimators to reconstruct the weak lensing potential. These foregrounds are not fully removable by multi-frequency cleaning or masking and can lead to large biases if not properly accounted for. For foregrounds made of a number of unclustered point sources, an estimator for the source amplitude can be derived and deprojected, removing any bias to the lensing reconstruction. We show with simulations that all of the extragalactic foregrounds in temperature can be approximated by a collection of sources with identical profiles, and that a simple bias hardening technique is effective at reducing any bias to lensing, at a minimal noise cost. We compare the performance and bias to other methods such as"shear-only"reconstruction, and discuss how to jointly deproject any arbitrary number of foregrounds, each with an arbitrary profile. In particular, for a Simons Observatory-like experiment foreground-hardened estimators allow us to extend the maximum multipole used in the reconstruction, increasing the overall statistical power by $sim 50%$ over the standard quadratic estimator, both in auto and cross-correlation. We conclude that source hardening outperforms the standard lensing quadratic estimator both in auto and cross-correlation, and in terms of lensing signal-to-noise and foreground bias. CMB lensing [1, 2] is rapidly becoming one of the most powerful tools in a cosmologist's toolkit. With the ability to measure matter fluctuations to high redshift, no photometric redshift uncertainties and its sensitivity to crucial cosmological parameters, sub-percent precision measurements have the potential to greatly increase our understanding of the Universe. Current and upcoming wide-field CMB experiments such as AdvACT [3] , SPT-3G [4] and Simons Observatory [5] will heavily rely on reconstruction from CMB temperature fluctuations (as opposed to polarization). For future polarization-dominated experiments like CMB-S4 [6] and CMB-HD [7, 8] , temperature lensing will still contribute a non-negligible part of the total signal-to-noise. Temperature maps contain significant contamination from extragalactic foregrounds such as the thermal and kinematic Sunyaev-Zel'dovich effects (tSZ and kSZ), the Cosmic Infrared Background (CIB), and both radio and infrared point sources. Lensing reconstruction on contaminated maps creates significant biases in both the auto-and cross-correlation of the reconstructed field, which in turn can bias our inference of cosmology [9] [10] [11] [12] [13] [14] . These biases can be up to several tens of percent for upcoming surveys, thus severely limiting our ability to use CMB lensing for precision cosmology. Several techniques have been proposed and implemented to mitigate the impact of extragalactic foregrounds. Some are based on symmetries, such as the recently proposed shear-only reconstruction [13] (and related generalizations such as hybrid and general multipole estimators), while others make use of multi-frequency * nsailer@berkeley.edu † eschaan@lbl.gov ‡ sferraro@lbl.gov maps to reduce the impact of particular foregrounds [11, 15] . Here we revisit and extend a bias hardening technique first proposed in [10, 16] and applied to real data in [17] , and compare it to shear-only reconstruction and the hybrid estimator of [13] . We show that if we approximate extragalactic foregrounds as a collection of sources with identical profiles, an estimator for their amplitude can be obtained and hardened against in an optimal way to mitigate their impact on the reconstructed lensing convergence. With the use of realistic and correlated simulations of all of the foregrounds, we show that this approximation is excellent and leads to a very large suppression in bias, allowing for the use of considerably smaller scales in the reconstruction. The remainder of the paper is organized as follows: In Section II we review the general formalism for bias hardening against a single foreground, then generalize the procedure to an arbitrary number of foregrounds, each with arbitrary profiles. In Section III we specialize to the case where foregrounds are point sources, and in Section IV we numerically explore the biases and the performance of mitigation through bias hardening or shear-only estimators. We conclude with Section V. In this section, we first review the derivation of the standard lensing quadratic estimator (QE), and rephrase it in terms of the linear response of the map covariance to the lensing convergence. We build on this formalism to construct foreground estimators, and finally null the linear response of the lensing QE to these foregrounds, as in [10, 16] . The observed CMB temperature T is the sum of the lensed primary CMB T CMB and the foregrounds s, which will be approximated by a collection of sources in this work: T = T CMB + s. The lensed CMB temperature T CMB field is statistically invariant under translations. As a result, the off-diagonal covariance T CMB T CMB L− with L = 0 is zero. However, fixing the lensing convergence Fourier mode κ L breaks this statistical homogeneity, and produces non-zero off-diagonal covariances: where C 0 is the unlensed power spectrum. From this, we obtain an unbiased lensing estimator (to first order in κ) from the ratio T T L− /f κ ,L− . The standard lensing QE [18] is simply the inverse-variance weighted average of all these "building block" estimators, summing over at fixed L. This leads to the usual result: In order to generalize this construction to foreground quadratic estimators, we consider Eq. (1) and note that f κ ,L− can be seen as the linear response of the two point correlator T T L− at fixed κ L to the lensing convergence mode κ L : Furthermore, by multiplying both sides of Eq. (1) with κ −L , we see that this linear response can be expressed as the ratio of the following bispectrum and power spectrum: This final expression is our starting point to derive foreground quadratic estimators. Following the argument above, anytime a field s (e.g. a foreground, the mask, the beam) has a non-zero bispectrum s s L− s −L , we can define the following linear such that We can then define the "building block" foreground estimator, unbiased to first order in the foreground s, via T T L− /f s ,L− . Finally, we inverse-variance weight these estimators for all , at fixed L, to obtain the standard minimum variance quadratic estimator for s. In practice, evaluating this estimator requires knowledge of the foreground non-Gaussianity, through its bispectrum. We note however that no assumption about the foreground trispectrum is used. In this section we review the derivation of the biashardened (BH) lensing estimator of [10, 16] for a single foreground, relate the noise of the BH estimator to the noise of the standard minimum variance QE, and generalize the procedure for an arbitrary number of foregrounds. In the presence of a single foreground s, the off-diagonal covariances of the observed temperature map take the form to lowest order in κ and s. In this equation the average is taken at fixed κ L and s L , which will be implied from here on out for notational economy. Since both κ L and s L produce off-diagonal covariances, the standard quadratic estimators for κ and s acquire a bias. These biases can be written in the form [10] : where the response R L is defined to be We see that the bias toκ is proportional to s, and viceversa. This allows us to create new bias-hardened estimators, which are the unique linear combinations ofκ and s that null these biases By taking an average of Eq. (10), and inserting Eq. (8) on the right hand side, we see that bias-hardened estimators indeed satisfy κ BH L = κ L and ŝ BH L = s L . By construction, the weights of the standard QE are chosen to minimize its variance. Since the BH estimator's weights differ from the standard QE's, bias-hardening comes at a price in noise, which we quantify in this section. We start with Eq. (10), the definition of the BH estimator, from which we compute the variance (11) In this equation we've used the identity N κ L = N κ −L , and we've assumed that f s ,L− is even in both of its arguments, which forces both the noise of the source estimator and the response to be even functions of L. This assumption is valid when modeling the field s as a collection of individual sources with profiles u (since u = u − for any physical emission profile), as in section III. The cross-correlation ofκ andŝ is proportional to the response function, and is explicitly given by Combining Eq. (11) and (12) results in the simple expression We see that the noise cost for bias-hardening is completely determined by the determinant of the matrix in Eq. (10) . The value of this determinant is plotted in Fig. 1 for the case of point source foregrounds (f s ,L− = 1). Nearly identical calculations yield the noise of the bias-hardened source estimator and the noise cross-correlation of the two bias-hardened estimators. The results are 3. Simultaneously hardening against n foregrounds It is straightforward to generalize the bias-hardening procedure for n foreground fields s 1,L , · · · , s n,L . In this case the observed map has off-diagonal covariances Figure 1 . Bias-hardening against point source foregrounds. The solid curve is the determinant of the matrix in Eq. (10), which is the ratio of the bias-hardened estimator's noise to the standard QE's noise. It is close to unity on most scales, implying that the noise penalty for bias hardening is minimal. The dashed curve is the ratio of the SNR of the point source estimator to the relative bias to C κ L from the source trispectrum. This ratio is roughly 10 at all scales. Thus in the regime where the bias to the standard QE is significant, the source estimator reconstructs the sources with a high SNR, and the source-hardened estimator can remove the bias due to sources with little noise cost. The spike is due to a zero crossing in the response RL at L ∼ 2500. to lowest order in κ and the foreground fields. Just as before, this average is taken at fixed κ L , s 1,L , · · · , s n,L . First, we write down the standard minimum variance quadratic estimators for each of the fields: where X = κ, s 1 , · · · , s n . We then write down the (n + 1) × (n + 1) matrix that quantifies the biases to all the estimators, analogous to Eq. (8) . Finally, we invert this matrix to obtain the bias-hardened estimators where the generalized response is defined to be Just as in the previous section, the response R X,Y L is proportional to the cross-correlation of the two fields X and Y . When X = Y , the response is simply the inverse of the noise, which is manifested in Eq. (17) by the 1's on the diagonal. Writing the matrix M L as where A L is a n×n matrix and w L , v L are n-dimensional column vectors, we find that the noise of the BH lensing estimator is simply A derivation of this equation is given in Appendix C. In the previous section we reviewed how to bias-harden against any number of foregrounds. The only input to this procedure is the linear response f s ,L− of each foreground, which can be calculated if the foreground's bispectrum and power spectrum are known. To estimate these quantities we adopt a halo model, writing each foreground s(x) as a sum of individual sources with profiles u(x): where s i and x i are the flux and position of the i'th source, and the profile u(x) is assumed to be source independent. These sources are assumed to be a Poisson sampling of the matter density field. Because they trace matter on large scales, the source field s is correlated with the true lensing convergence field. As we show below, this correlation causes a bias in the CMB lensing cross-and auto-correlations. With this model, the linear response takes the form where in the second equality we assumed that the sources are sparse, such that the bispectrum and power spectrum are both shot noise dominated, and the source clustering is negligible. In this case the linear response simplifies to a separable function of and L − , and is proportional to the ratio of the third to the second moment of the individual source fluxes. We can choose the normalization of the source estimatorŝ to remove the constant factor s 3 i / s 2 i . Doing so makesŝ an estimator of s ≡ s s 2 i / s 3 i , which has linear response f s ,L− = u u L− /u L . This simplifies the expression of the source estimator tô which can be quickly evaluated using FFT. We emphasize that the weights and normalization of the BH estimator are left unchanged by this renormalization ofŝ. Therefore the bias-hardening procedure requires no knowledge of the source amplitude, and the normalization of the profile u is completely arbitrary. For this reason we will drop the primes from here-on-out, defining f s ,L− = u u L− /u L . Consider a single source component s. If s were Gaussian, it would be indistinguishable from map noise, and the source estimatorŝ would not be able to pick it out. Here, we show how the non-Gaussianity of the source map controls both the signal-to-noise (SNR) of the source estimator, and the bias from sources to CMB lensing. For simplicity let s be a Poisson distributed point source field (f s ,L− = u = 1). In this case the mean number of sourcesn determines the non-Gaussianity; asn goes to infinity, the statistics converges to that of a Gaussian. The point source estimator is an unbiased (to lowest order) estimator of s, so the SNR per Fourier mode is simply : Thus the SNR for the source estimator scales with the scale-independent source trispectrum T s : it is larger for low source densities which corresponds to a higher non-Gaussianity. As the number of sources increases, the field becomes increasingly Gaussian, and can no longer be distinguished from Gaussian noise in the map. The expression above is valid for mildly non-Gaussian source field, as is the case in reality. For the very highly non-Gaussian case, which is not relevant for the CMB, the source non-Gaussianity contributes also to the noise, leading to a saturation of the SNR to a finite value asn goes to zero. Even if the source field and lensing convergence were independent, point sources would cause a bias to the standard QE due to its trispectrum: (26) Here again, the source non-Gaussianity controls the bias to lensing via the 1/n scaling. From equations (25) and (26), we see that the ratio of the SNR of the source estimator to the relative bias to lensing from the source trispectrum is (N κ L N S L R 2 L ) −1 . This ratio cancels out the non-Gaussianity of the sources and is around 10 for most scales of interest, as shown in Fig 1. This means that subtracting the lensing bias due to the source trispectrum will always cause only a small loss in lensing SNR. Indeed, as shown in Fig 1 (solid line) , the lensing noise for the standard QE and the bias hardened estimators are very similar, meaning that subtracting the source bias causes a negligible increase in the lensing noise. This is consistent with the findings of [19] . Finally, the non-Gaussianity of the point sources also contributes an extra noise term in the lensing reconstruction, which we haven't included. However, this noise is only important when the bias to CMB lensing is also important. Thus, by making sure that the non-Gaussian point sources (and other foregrounds) do not significantly bias the lensing signal, we are making sure that their noise contribution is also small. In this section we quantify the biases for three different bias-hardening schemes, and compare these results to the standard QE and Shear estimators. We consider two source components: point-sources f PS ,L− = 1 and sources with tSZ-like profiles f tSZ ,L− = u u L− /u L . The profile u is calculated by taking the square root of the tSZ power spectrum, which is measured from the simulations. Our three different bias-hardening schemes harden against one or both of these source components. The Point Source Hardened (PSH) estimator hardenens against point sources; the Profile Hardened (PH) estimator hardens against a tSZ-like cluster profile, as described in App. D; and the Point source and Profile Hardened (PPH) estimator simultaneously hardens against both. We show the noise power spectra for these various estimators in Fig. 2 , and performed various checks of our code in App. F. The presence of a non-Gaussian foreground component s in the temperature map produces three bias terms to CMB lensing, referred to as primary, secondary and trispectrum terms [9, 10, 12, 13] . Indeed, if T = T CMB +s, then the quadratic estimator Q applied to T can be expanded bilinearly as: assuming the quadratic estimator is symmetric (or has been symmetrized in its arguments). In cross-correlation with a mass tracer g, this simply leads to the bispectrum bias: Hardening against a single source (PSH,PH) marginally increases the noise over QE, whereas hardening against multiple sources (PPH) results in a larger noise penalty. We also note that the noise cost of bias-hardening increases with the size of the source profile u , which is discussed further in Appendix E. For the auto-correlation, more terms arise: The primary and secondary bias terms are due to the non-Gaussianity of the foreground and its correlation with the true CMB lensing signal. They are integrals of the κ CMB ss bispectrum. In particular, predicting them requires knowing the redshift distribution of the foreground. On the other hand, the trispectrum bias is present whether or not the sources are correlated with CMB lensing, and only depend on the statistics (the trispectrum) of the projected 2d foreground map. In what follows, we compute these biases for the various quadratic lensing estimators of interest. We follow the approach of [13] , using the simulations from [20] . Here we explain which of these bias terms are nulled with bias-hardening. The only knowledge of foregrounds used in the bias hardening process is the shape of the foreground bispectrum and power spectrum, through f s . In particular, the procedure does not assume anything about the ssκ CMB bispectrum, which causes the primary and secondary biases, nor about the ssss trispectrum, which causes the trispectrum bias. Why would bias hardening help reducing foreground biases at all then? The answer is somewhat subtle. The bias hardening procedure nulls the linear response of the lensing estimator to the foreground s, i.e. the bispectrum Q BH [s, s] s . This is very similar but not identical to the primary bispectrum bias Q BH [s, s] κ CMB . The two bispectra are all the more similar as the foreground map and the true lensing map are highly correlated. Thus in principle, the primary bispectrum bias is only partially reduced by the bias-hardening. In practice though, as we show below with simulated foregrounds, this reduction is very large. As for the secondary bispectrum bias, there is no reason a priori why the bias hardening procedure would reduce it. Indeed, we show below that the various lensing estimators have a similar level of secondary bias. The case of the trispectrum bias is interesting. In general, if T 1, 2, 3 , 4 is the source trispectrum, the trispectrum bias to lensing is given by (30) If the lensing weights of the bias-hardened estimator are denoted as F BH ,L− , then the zero linear response to the foreground can be translated as: This does not null the trispectrum bias in general. However, it does in the case of Poisson sources with identical profiles. Indeed, in this case, the trispectrum T 1, 2, 3, 4 =n s 4 u 1 u 2 u 3 u 4 is separable, such that the trispectrum bias becomes: Since the response f s ,L− = u u L− /u L ∝ u u L− , we see that Eq. (31) implies that the trispectrum bias is exactly nulled. However, it is no longer generally true if the sources are clustered, or if there is a distribution of profile sizes or shapes. Because our profile hardening assumes a single fiducial source profile, the trispectrum bias is also not exactly nulled when the foreground sources have a range of profiles, as is the case for the tSZ. In summary, the PH estimator should significantly reduce the tSZ trispectrum bias, while the PSH estimator should exactly null the point source trispectrum. Both should largely reduce the primary bias, with no a priori expectation for the secondary bias. For bias hardening against more complex foregrounds, such as clustered sources, the effectiveness of this simple bias hardening is not guaranteed. However, if the statistical properties of the source distribution is known, it is nonetheless possible to extend our formalism to harden against non-linear clustering. This has been explored in the context of line intensity mapping in [21] , and we believe that a similar treatment could be helpful for mitigating biases to CMB lensing. To quantity the biases of each estimator for each of the extragalactic foregrounds, we use simulated maps [20] of the CIB, tSZ, kSZ, radio point sources, and κ CMB at 148 GHz. We rescale each of the foregrounds (0.38 for CIB, 0.7 for tSZ, 0.82 for kSZ, 1.1 for radio PS) to match [22] , then subtract the mean from each foreground, and finally mask each foreground using a matched filter for point sources with a flux cut of 5 mJy and mask patch radius 3'. The resulting power spectra are plotted in Appendix A. For CMB measurements, we consider an experiment similar to Simons Observatory [5] , with a white noise level of 6 µK-arcmin and a Gaussian beam with full-width at half maximum of 1.4 . As in [13] , we re-weight the halos from [20] to match the redshift distribution of the LSST gold sample (dn/dz ∝ (z/0.24) 2 e −z/0.24 /0.48) [23] and obtain a projected galaxy number density count δ g . Altogether, a set of these maps (foregrounds, κ CMB , and δ g ) allow us to calculate the biases to the cross-correlation of CMB lensing with galaxy number density counts C κδg L , as well as the primary, secondary, and trispectrum biases to the lensing auto-correlation C κκ L . We run the estimators on 81 flat square cutouts from these maps, obtaining the biases plotted in Fig. 3 For the cross-correlation of CMB lensing with galaxy counts (Fig. 3) , we find that the standard minimum variance quadratic estimator acquires a ∼ 10% bias from the CIB and tSZ for reasonable values of max,T . Hardening against a single source component with a tSZ-like profile (PH) effectively nulls the bias from both tSZ and kSZ. Hardening against just point sources (PSH) is complementary, effectively nulling the radio point source and CIB biases. When simultaneously hardening against both tSZ-like profiles and point sources (PPH), we're able to null the biases from all extragalactic foregrounds, creating a highly robust estimator of the cross-correlation. For the CMB lensing auto-correlation we again find that the standard QE acquires a ∼ 10% bias, primarily due to the CIB and tSZ. This is consistent with the results found in [10] . In Fig. 4 we see that the PH estimator nulls the primary bias from tSZ, and significantly reduces the remaining primary and trispectrum biases. Likewise the PSH estimator nulls the primary bias from point sources, with the standard QE. We find that hardening against point sources (PSH) nulls or dramatically reduces the bias from radio point sources and CIB, whereas hardening against a tSZ-like profile (PH) nulls or significantly reduces the bias from tSZ and kSZ. Hardening against both (PPH) effectively nulls the bias from all foregrounds, resulting in a bias that is roughly an order of magnitude smaller than the bias to the standard QE. while also significantly reducing remaining primary and trispectrum biases. The PPH estimator essentially nulls all primary biases, and reduces the trispectrum bias to a level comparable to Shear. As discussed in the previous section, there is no reason to expect a significant reduction to the secondary biases from bias-hardening. In practice we find that the secondary bias to bias-hardened estimators is slightly worse than (but comparable to) the bias to the standard QE. We note that the secondary bias is negative, whereas the bias from the trispectrum is positive. This results in a cancellation in the overall bias to the lensing power, allowing higher values of max,T than one might expect from looking at the secondary and trispectrum biases alone. As a figure of merit for the performance of each estimator, we use the signal-to-noise ratio (SNR) of the lensing amplitude A lens , defined for each L as C κ,m L /C κ L , where C κ,m L and C κ L are the measured and true lensing power spectra respectively. By inverse variance weighting the measurement of A lens for each L, we obtain an estimator for the lensing amplitude: Since A lens has a fiducial value of 1, the SNR and bias of A lens are (34) Similar definitions for the amplitude of the CMB-LSST cross-correlation can be made by replacing the lensing power C κ L with C κδg L and by setting σ 2 L = (C κ L +N κ L )(C δg L + N δg L ) + (C κδg L ) 2 in the equations above. We show the best SNR achievable in auto and cross-correlation for the various estimators, while maintaining the foreground bias below 1σ, in Tab. I. For the auto-correlation, we find that the Profile Hardened (PH) estimator achieves the highest SNR be- Relative Bias [σ] Figure 6 . Total signal-to-noise of the lensing amplitude (top) and the cross correlation with LSST (bottom). The black markers denote the highest max, T where the bias is less than 1σ. We find that the Profile Hardened (PH) estimator reconstructs the lensing amplitude with the highest signal to noise: a ∼ 50% improvement over the standard QE. We find that the Point Source Hardened (PSH) and Point source and Profile Hardened (PPH) reconstruct the cross with similar SNRs: a ∼ 50% improvement over the standard QE. The Hybrid estimator is a combination of Shear and QE, and is described in greater detail in Appendix B. We note that the color-scale saturates at 2σ, and that the bias to the standard QE can be > 10σ at max,T 3000. fore bias(A lens ) becomes larger than the noise σ = 1/SNR(A lens ), as shown in Fig. 6 . This is a ∼ 50% improvement over the standard QE. The PH estimator's 1σ bias threshold occurs at max,T ∼ 3700. From Fig. 5 , we see that this high value of max,T is due to a cancellation in the tSZ and CIB biases. A more conservative stopping point would be when any one of the foregrounds crosses the 1σ threshold, which would be around max,T ∼ 3200 for the PH estimator. We note that even at this lower max,T , the PH estimator still has the highest SNR and is still a ∼ 30% improvement over QE, which crosses the 1σ bias threshold at max,T ∼ 2400. For the cross-correlation, the PSH and PPH achieve similar SNRs at the 1σ bias threshold, improving upon QE by ∼ 40%. We note that the PPH has a stable bias across the entire range of realistic max,T 's, making it a conservative yet competitive option for reconstructing the cross-correlation. We conclude by noting that for an experiment with max,T ∼ 3000, the PH and PPH estimators reconstruct the auto-and cross-correlations respectively with the highest SNR while achieving a sub-percent level bias, as shown in Tab Table I . Comparison of the total signal-to-noise ratio on the amplitude of the CMB lensing auto-spectrum (first column) and cross-spectrum with LSST-like galaxies (second column). In all cases, the maximum temperature multipole max, T is selected to guarantee a foreground bias smaller than 1σ. Rel. bias to auto (cross) Table II . Relative bias (in both % and σ units) and SNR for each estimator at max,T = 3000. The PH and PPH estimators reconstruct the auto-and cross-correlations respectively with the highest SNR while achieving a sub-percent level bias. In this study we revisited foreground-hardened lensing quadratic estimators, described how to harden against arbitrary profiles, and extended the formalism to simultaneously harden against an arbitrary number of extragalactic foregrounds. We showed that the point source hardened estimator (PSH) reduces not only the bias from radio point sources, but also from the cosmic infrared background, the thermal and the kinematic Sunyaev-Zel'dovich effects. Motivated by the extended nature of tSZ clusters, we modify the point source hardening technique to deproject the lensing bias from uncorrelated clusters (PH estimator), or simultaneously point sources and extended clusters (PPH estimator). We propose a simple and sufficient method to approximate the input cluster profile from the data itself, by considering the observed tSZ power spectrum from the same data. The signal-to-noise cost from deprojecting the point source bias is small (∼ 15% for Point Source Harden-ing), and is more than compensated by enabling the use of higher multipoles: overall, the SNR on the lensing power spectrum is increased by ∼ 56% when using Profile Hardening (PH), while for cross-correlations, both PSH and PPH provide a ∼ 45% increase in SNR compared to QE. Part of the improvement of PH and PPH estimators comes from the increase in the maximum multipole that can be included in the lensing reconstruction. The PSH, PH and PPH estimators improve the lensing SNR over the shear and hybrid shear estimators of [13] , although at the cost of a slightly larger bias in cross-correlation. They also provide consistency checks for the standard quadratic estimator, having a smaller foreground bias for a given max in temperature. Overall, these estimators outperform the standard lensing quadratic estimator both in auto and cross-correlation, both in terms of the lensing SNR and in terms of the foreground bias. We recommend their use in the CMB lensing analyses of upcoming temperature-dominated CMB data such as the Simons Observatory. Future surveys that are dominated by polarization reconstruction such as CMB-S4 [6] , will still benefit from improved robustness, since temperature reconstruction will contribute a non-trivial fraction to the SNR. We note that a realistic analysis will require modeling of the higher order biases N (i) [19] that appear in lensing reconstruction and will likely be non-negligible for the next generation of surveys. Finally, foreground hardening can and should be implemented in conjunction with other methods of foreground mitigation, such as masking [9] and multi-frequency cleaning in one or both legs of the estimator. We shall report on the optimal combination of these methods in an upcoming paper. ] total lensed CMB detector noise CIB tSZ kSZ radio PS Figure 7 . Foreground power spectra at 148 GHz after being masked with a match-filter with a flux cut of 5 mJy and a mask patch radius of 3'. We assume a 6 µK-arcmin noise, and a 1.4' beam. The hybrid estimator is defined to beκ where the standard estimatorκ QE L is calculated using min,T ≤ ≤ 2000 and the shear estimatorκ S L is calculated using 2000 ≤ ≤ max,T . The noise of the hybrid estimator is simply assuming that shear and QE are uncorrelated. The bias B L to the lensing signal is defined by Assuming that the noises ofκ QE L andκ S L are uncorrelated, the bias to the Hybrid estimator is Here B QE L is the bias when the standard quadratic estimator is calculated using min,T ≤ ≤ 2000. We approximate B S L as the bias from Shear when calculated with min,T ≤ ≤ max,T . This approximation should be OK since the contribution from min,T ≤ ≤ 2000 is negligible. The bias to the hybrid estimator's cross-correlation with LSST is calculated by inverse noise weighting the biases to QE and Shear. Appendix C: Derivation of Eq. (20) We start with where A L is a n × n matrix and w L , v L are n-dimensional column vectors. By splitting the matrix into blocks, we can express its inverse and determinant as (C3) Using this notation, the bias-hardened lensing estimator takes the form whereŝ L ≡ (ŝ 1,L ,ŝ 2,L , · · · ,ŝ n,L ) T . The variance of the bias-hardened lensing estimator is simply from which we get the noise In the past two equations we've assumed that the noises and responses are even functions of L, which is true so long that the sources are modeled as halos with profiles u L . Let's focus on the second term of the RHS of the equation From this we find Recall that the j'th component of w L is just N κ L R κ,sj L . Plugging this result back into our expression for the noise gives If the tSZ foreground map is modeled as s = i s i u , then the tSZ power spectrum is proportional to |u | 2 . Recall that the bias-hardened estimator is insensitive to the normalization of the profile. Therefore we take the square root of the tSZ power spectrum (measured from the simulations) as our tSZ-like profile, which is plotted in Fig. 8 . In practice, determining the exact profile from the measured tSZ power spectrum may be challenging. However, we have checked that the residual biases from tSZ and other foregrounds vary slowly when the assumed profile is changed, such that a precise knowledge of the profile is not needed. This is encouraging, and suggests that this method will be useful in practice. Appendix E: Noise price of bias-hardening increases with profile size We found that the noise price for hardening against a tSZ-like profile is larger than hardening against point sources. To gain an intuition for why this is the case, we plot the noise when hardening against a single Gaussian profile u = e −σ 2 2 /2 for different values of σ in Fig. 9 . We find that the noise increases with σ; that is, the larger the profile the larger the cost in noise. As shown in Fig. 1 , in the regime where the response is small the noise cost goes as R 2 L . The response R L ∝ d 2 f s ,L− f κ ,L− /C tot C tot |L− | is larger when the linear response to sources f s ,L− = e σ 2 L 2 /2 e −σ 2 2 /2 e −σ 2 |L− | 2 /2 looks more like f κ ,L− . For point sources f s ,L− is flat, however, as the size of the profile gets larger f s ,L− looks slightly more similar to the linear response to lensing, making it more difficult for the source-hardened estimator to remove the sources, and results in a higher noise price. Figure 10 . Verification of the noises of each estimator. The auto-correlations (blue) of the estimatorsκ,ŝ, andκ PSH when run on a Gaussian random field with power spectrum C tot L . In black are the theory curves. Figure 11 . Cross-correlating the true convergence κ withκ,ŝ, andκ BH when run on a lensed CMB map. We see that both the standard QEκ and the point source hardened estimatorκ PSH recover the lensing signal C κ L when correlated with the true convergence. The cross correlation of the point source estimatorŝ with the true lensing convergence gives N s L C κ L RL, which is verified in the middle plot. The blue points are positive, whereas the red points are negative. The response has a zero crossing at L ∼ 2500. Figure 12 . Cross-correlating the true source map s withκ,ŝ, andκ PSH when run on the true point source map s. The crosscorrelation ofκ with s gives N κ L C s L RL, which is verified in the plot on the left.ŝ recovers the source signal C s L when cross correlated with the true source map. Since the point-source hardened estimator is designed to have zero response to point sources, the cross-correlation ofκ PSH and s should be zero, which is consistent with the result found in the rightmost plot. CMB-HD: An Ultra-Deep, High-Resolution Millimeter-Wave Survey Over Half the Sky CMB-HD: Astro2020 RFI Response Appendix A: Foreground power spectra Our processing of the Sehgal simulations follows [13] . After masking the sources detected in the foreground maps at Appendix D: tSZ-like profile Appendix F: Noise and cross-correlation checks As a simple pipeline check we verify our calculation of the noise and response when bias-hardening against point sources (PSH). These checks are shown in the plots below.