authors: Wang, Yunlong; Liu, Jiaying; Park, Homin; Schultz-McArdle, Jordan; Rosenthal, Stephanie; Kay, Judy; Lim, Brian Y. title: SalienTrack: providing salient information for semi-automated self-tracking feedback with model explanations date: 2021-09-21

Self-tracking can improve people's awareness of their unhealthy behaviors and support reflection to inform behavior change. Increasingly, new technologies make tracking easier, leading to large amounts of tracked data. However, much of that information is not salient for reflection and self-awareness. To tackle this burden for reflection, we created the SalienTrack framework, which aims to 1) identify salient tracking events, 2) select the salient details of those events, 3) explain why they are informative, and 4) present the details as manually elicited or automatically shown feedback. We implemented SalienTrack in the context of nutrition tracking. To do this, we first conducted a field study to collect photo-based mobile food tracking over 1-5 weeks. We then report how we used this data to train an explainable-AI model of salience. Finally, we created interfaces to present salient information and conducted a formative user study to gain insights into how SalienTrack could be integrated into an interface for reflection. Our key contributions are the SalienTrack framework, a demonstration of its implementation for semi-automated feedback in an important and challenging self-tracking context, and a discussion of the broader uses of the framework.

Self-tracking and personal informatics have helped people to manage many areas of their lives, such as their finances [36, 81], sustainability [6, 40], physical activity [28, 29], and diet [11, 25, 54, 58, 83]. Li et al. derived five stages of personal informatics and identified corresponding barriers [51]. Two critical stages are collection and reflection. Much research has focused on reducing the collection burden by automating data capture (e.g., with sensors [41, 64], deep learning [18, 54], and reinforcement learning [76]). Collection techniques now span manual, semi-automated, and automatic tracking [20]. In contrast, there has been little work on reducing the barriers to reflection. A few exceptions include identifying effective visualizations [21, 36], visualizing longitudinal data [22, 84], speeding up information querying [47, 60], providing auxiliary contextual data of tracked behavior [11], and using lightweight challenges [35]. This means people face challenges in making use of their copious tracking data. We address this problem in the context of photo-based food logging, an important domain in a world where obesity poses huge health problems. Food logging apps have been shown to help users gain awareness of their eating behaviors [30], identify unhealthy diets [11, 63], improve nutrition intake [76], and control diet-related chronic diseases [24, 65]. It is also a domain where people can readily collect a large amount of complex data covering their several eating episodes a day. This makes it challenging for people to reflect on the logged information. We tackle this problem by creating a new approach, which we call SalienTrack. Three key goals drove our design. The first and central goal is to create a system which presents just the salient information.
In the context of food tracking, this means that we need to identify the salient events (the subset of meals the user has photographed) and the salient information about them (aspects such as nutrients, e.g., fat, and the cooking method, e.g., frying). A second core goal of our approach is to build an explainable salience prediction model: this means that an interface can enable a user to scrutinize the system's reasoning about the salience of the information presented to them. The third goal for this work is to understand the right balance of user control in the interface for reflecting on salient events. Essentially, there is a spectrum, with one extreme being completely automatic generation of information about a meal, and the opposite extreme where users manually enter information for self-reflection [30]. Auto feedback would be less tedious for the user but may not support deep reflection. Hence, the feedback mode should be selected to balance engagement and self-reflection. We implemented SalienTrack with a machine learning and explainable AI-based approach to automatically provide concise, salient feedback based on the dimensions in the framework. To implement this, we pose two research questions: RQ1) What information is salient in feedback? RQ2) How to automatically provide salient information in feedback? We answer these questions in the two-phase pipeline shown in Figure 1. First, shown in the yellow box at the left, we conducted a field study of photo-based mobile food logging, where participants photographed their meals over 1-5 weeks, received Manual or Auto feedback at the end of each week, and rated the informativeness of the overall meal feedback and of specific information types (e.g., macronutrients, cooking method). Next, shown in the orange box, we analyzed participant responses to identify useful features for feature engineering. Then, the green box shows that we trained a Gradient Boosted Tree prediction model with the field study data to build an explainable model to predict salient meals. To understand the model predictions and provide more salient feedback, the blue box shows how we used two explainable-AI techniques (SHAP [59] and Anchors [77]) to determine feature importance and counterfactual rules, respectively. These inform which feedback information (features) is most useful for self-reflection, and why. Finally, the violet box is for our qualitative user study on a set of prototype interfaces with automatic, manual, and semi-automatic reflection modes. We make the following contributions: We first conducted a field study, where participants tracked their diet and reviewed feedback. The results were analyzed with statistical (see Appendix B.3) and thematic analysis. We then trained an informativeness prediction model on the collected data, and leveraged explainable AI techniques to develop more concise feedback to increase convenience and informativeness. The diversity of food and human eating behaviors complicates the self-tracking of diets and requires much information to be logged. From paper journals and questionnaires [5] to mobile logging using digital technologies and automated tracking using artificial intelligence, there are myriad methods for food tracking. Verbal and semantic information can be captured via highly scaffolded text-entry forms [2, 30] or speech inputs [60]. However, these are burdensome for users over time.
Conversely, the Ubicomp community has proposed many wearable sensing approaches [85] for more seamless tracking of eating behavior, such as wrist-worn [87], in-ear [9], or neck-worn devices [91]. However, these require custom hardware or atypical usage. In contrast, for this work, we leverage the familiar practice of photographing meals with commodity smartphones. While merely capturing photos aids recall and reflection reasonably well [30], advances in computer vision using deep learning have the potential to provide informative but less burdensome meal annotation. Several capabilities include automatically recognizing food dishes [68]; identifying food groups [79], ingredients [18], and cooking methods [17]; and estimating calories [34], portion sizes [39, 44], and drink healthiness [73]. This can be supplemented with other contextual information from smartphones, such as eating time [90] and location [63]. Although self-tracking need not be permanent [26], behavior change takes several weeks or months [82], so users need to be engaged for a moderate duration. We aim to sustain engagement by reducing the burden of reflection. Providing feedback at every meal is very tedious and may dull the user's sensitivity towards the information [30]. Aggregating the feedback to once per day [30], once per week [11], or even longer periods [84] can reduce the frequency of review and facilitate deeper reflection. In this work, we chose a weekly duration to balance burden and reflection. More adaptive methods to reduce frequency include using AI to recommend the most appropriate moments for feedback based on preferences and contextual cues [53, 76]. These predict based on outcomes such as step count [53] and calories consumed [76]. However, these relate to behavior change outcomes, which may be incidental or accidental [55]. Instead, we measured the perceived salience of each feedback instance and made predictions on it. Going beyond just increasing awareness, this considers how informative or useful the feedback is, rather than whether it was merely noticeable. This corresponds to the early stages of noticing and understanding described by Kocielnik et al. [50] and the dimensions of breakdown and inquiry by Baumer [8]. Beyond reducing feedback frequency to salient eating events, we aim to also select salient information about each meal to reduce information overload, given the diverse information available, such as nutrition information (calories, ingredients, etc.) [11, 47], context (e.g., events, places, and people [52]), and sensations [24, 30]. Finally, some feedback interfaces require interactions and annotations, e.g., typing messages vs. multiple choice. Methods to reduce this burden include using various visualizations [15, 21, 36] or search-accelerators [47]. In this work, we explored providing feedback as automatically shown text or manually elicited data entry to balance between convenient passive learning and more engaging active learning [71]. In the Figure 2a baseline, the interface presents automated and comprehensive feedback about a meal. In the Figure 2b baseline, the interface requires users to manually enter the many details shown. Both baselines, which show all macronutrients and cooking information for all meals, are excessive and repetitive and may cause users to become disengaged. The manual one may well be better for supporting reflection, but it is also more tedious and not sustainable.
Instead of either of these, we propose SalienTrack, which aims to automatically select a salient subset of information to feed back to users. For example, healthy meals eaten by a typically healthy user, or meals that are similar to recent ones, could be omitted from feedback (Figure 2d) to be less patronizing or nagging. Furthermore, for meals selected for feedback, only the more informative and salient aspects should be included to retain the user's limited attention (Figure 2c). Figure 3: Schema of a tracked activity along two dimensions: events (columns) and annotations (rows). Each square denotes an annotation a_j for an event e_i. Salient feedback should only include some events (e.g., e_1, e_2, e_4, e_7) and annotations (blue squares). We define salient feedback in self-tracking as feedback containing the most important subset of information of a tracked activity that the user finds informative. As illustrated in Figure 3, we formulate a tracked activity as comprising multiple events over time, and each event as annotated with multiple features. Salient feedback would only select a subset of annotations (blue cells in Figure 3) instead of including all. Inspired by the Intelligibility framework by Lim and Dey [56], we introduce the SalienTrack framework with question types to identify salient moments and annotations (Figure 4). First, we identify when the salient moments occur, i.e., which events are most informative. Only some events are chosen for explicit feedback, while others are quietly logged. For self-tracking, these events can be exercise activities, meals, financial transactions, etc. Salient events can be selected heuristically [50], or with machine learning techniques, such as reinforcement learning [53, 76] or supervised learning (our approach). Second, we limit which information items to include in the feedback by identifying which features are more informative. Instead of providing a full description of the event (inputs), which can be overwhelming, this provides a concise subset of the most salient information. For example, the user could focus on the protein level for a particular meal, instead of all macronutrients. Salient factors can be obtained from user feedback (via surveys or focus group discussions) or data analysis (statistical analysis of significance, or explainable AI techniques). We employ SHAP [59] to identify influential attributions, and select the features with the highest attribution for saliency. Note that this approach predicts which specific items to provide in feedback to promote informativeness, not the annotation values for the tracked event (e.g., predicting nutrients from food). Third, we explain why the chosen features are informative. For example, a meal may be selected for feedback because its fat content was >30g, which is high. These thresholds and rules can be obtained from domain expert specifications and literature, or through data mining methods (i.e., machine learning), which we employed with the Anchors [77] explanation method. Finally, we determine how to provide the feedback for each salient feature. We explored two approaches: showing the information (auto-inform), or asking users to estimate the values (manual-elicit). For the latter manual-elicit approach, the application would not show feedback even if it has a prediction of the factor values. This ironic approach follows current practices for manual self-reflection, and can foster deeper reflection than auto-inform [8]. The choice between Auto and Manual can be made by the application designer, or by a scoring function that compares the informativeness prediction confidence between both approaches, as sketched below.
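To make this how-dimension concrete, here is a minimal sketch of such a mode-scoring function, assuming each mode's model outputs an informativeness confidence and the designer sets preference weights; the function name, weights, and values are hypothetical illustrations, not our implementation.

```python
# Minimal sketch of a feedback-mode scoring function (hypothetical names).
# Given each mode's predicted informativeness confidence and a designer-set
# preference weight, pick the delivery mode for one salient event.

def choose_feedback_mode(conf_manual: float, conf_auto: float,
                         w_manual: float = 1.0, w_auto: float = 1.2) -> str:
    """Return 'manual' or 'auto' for a tracked event.

    conf_*: model confidence that feedback in that mode is informative (0-1).
    w_*:    preference weights, e.g., w_auto > w_manual to favor lower burden.
    """
    scores = {"manual": w_manual * conf_manual, "auto": w_auto * conf_auto}
    return max(scores, key=scores.get)

# Manual reflection is predicted slightly more informative (0.70 vs 0.65),
# but the burden-adjusted score still favors Auto.
print(choose_feedback_mode(conf_manual=0.70, conf_auto=0.65))  # -> 'auto'
```

Setting w_auto above w_manual biases the selection towards the less burdensome mode unless manual reflection is predicted to be distinctly more informative.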
We investigated this framework through stages in this work: 1) Dimensions of salient feedback (Section 3.2), 2) Measures of informative events and salient features (Section 4.2, Table 1), 3) Mechanisms for saliency selection (Section 5, Table 7), 4) Evidence to support the usefulness of the saliency dimensions (Section 7.1, Table 10). Figure 4: SalienTrack conceptual framework for salient feedback in self-tracking. This describes a chain of inquiry (blue text and arrows) for when to provide feedback, with which specific information items, with reasons why the user would learn from the event (as rules), and how to engage the user (with auto-generated information, or manually elicited self-reflection). Given annotation feature values (inputs) of an event, predict the informativeness of its feedback; Latin letters represent variable names, Greek letters represent values, and θ's represent rule thresholds. Attributions (blue bar chart) indicate how important each feature is towards informativeness; the most important features are indicated with longer dark blue bars. Counterfactual rules explain why the event feedback is predicted to be informative; only rules of important features are included. To support our goal to train a model to predict feedback informativeness and saliency, we need to collect training data and identify relevant features. We did this for the domain of food tracking. Specifically, we aim to understand when users find feedback informative or not, which meal-specific information or annotations they find more informative, why the feedback was (not) informative, and how this differs across feedback modes. Prior works on mobile food journaling [30, 54] focused on tracking meal details or providing feedback as an intervention or service. However, these did not explicitly measure the informativeness (or lack thereof) of the feedback in detail. Hence, we conducted a field study of mobile food tracking where participants logged their meals for at least one week (1-5 weeks) and reviewed their meals at the end of each week. We collected annotations for each meal and provided weekly feedback on all meals to situate users in the context of self-tracking, but focused on collecting data regarding the users' perceived informativeness of the feedback. This provides a labeled dataset of when feedback for a tracked event is informative, and informs which features could be useful for model training. We conducted the study with two feedback modes, Manual and Auto, to investigate and model how mode affects informativeness. Through data analysis and applying explainable AI, we then determine, for each meal, which information is salient and why. Next, we describe our method, apparatus, procedure, analyses, and results. We designed a food tracking pseudo-app for two tasks: meal capture and weekly feedback. As participants logged their daily meals for several weeks, we conducted weekly surveys to provide meal feedback and collect reports of what they learned from the feedback. While much HCI and Ubicomp research focuses on manually elicited feedback due to its support for rich reflection [20, 30, 58, 80], the burden of user review threatens sustained use. Instead, much AI research [17, 18, 54, 79] envisions automatic feedback without user data entry.
For generality, we conducted our data collection for Manual and Auto feedback modes. Among the different approaches for manual prompts (e.g., action plans [2], visual cues [36], meal enjoyment and context [30]), we chose to simply list nutrition information to align with the basic food journals used by dietitians [47]. This also enables feasible automatic inference for Auto feedback. During the week, users photograph each meal; no annotation or data entry is needed. At the end of the week, users upload the images to our server to process the weekly feedback. This introduced a burden of requiring participants to remember to upload their photos, but it was manageable, since we successfully collected photos from many users. We leveraged existing applications rather than developing our own app to reduce development overhead, ensure app familiarity, reduce survey burden and fatigue, and improve app usage and study compliance [38, 58]. In Manual mode, all information is blank or unselected and users have to fill it out (see Figure 14 in the Appendix); in Auto mode, all information is pre-filled and users can edit it (shown here). Similar to [11], users received weekly feedback on their meals for the past week. We scheduled the feedback to be once per week rather than daily to: 1) allow more immersive reflection across multiple meals, 2) reduce the reflection burden of reviewing feedback too frequently, and 3) enable feasible annotation for the Auto feedback. We implemented the app feedback with Google Forms, since it was sufficient to provide nutrition information and did not require maintaining our own database. Next, we describe the feedback information and interface (Figure 5). The feedback comprises four types of nutrition information: Food Groups, Cooking Methods, Ingredients, and Macronutrients. We derived the nutrition feedback in close consultation with a trained dietitian. Macronutrients (calories, carbohydrates, protein, fat, fiber) are the most fundamental nutrition information, but are unintuitive for lay people to assess [7]. Thus, we include more explicit nutrition information. Food groups (fruits, vegetables, grains, meat/fish/poultry, and dairy) are the most intuitive information about food that people can easily perceive [30, 45]. Ingredients provide more details about each meal. Cooking methods (baked, pan-fried, deep-fried, steamed, grilled, boiled, roasted) transform ingredients and affect their final calories and nutrients [16, 46, 86]. Providing these information types allows users to reflect at different granularities and depths. Though other information has been found to be useful (e.g., mood, post-meal satiety, social and physical contexts) [30, 61], for feasibility, we only include information about the food dish that can be inferred from photo-based recognition and food databases. All participants engaged with the same information, but had different interactions based on feedback mode. In Auto mode (Figure 5), users were shown the information to read or edit, e.g., how many calories were in the meal and what cooking methods were used. In Manual mode (Figure 14 in Appendix A.3), no information was provided and users answered questions to estimate the nutrition information, e.g., the fat level and cooking methods of the meal. To reduce burden, we chose questions with multiple-choice or short-text responses.
We focused on measuring the perceived informativeness of each meal feedback and of specific information types, to collect data to model salient feedback in terms of the four dimensions of the SalienTrack framework. Table 1 summarizes specific measures for each dimension: for Why, post-week text rationales for why specific meals and specific information types were informative; for How, data collection conducted between-subjects for the Manual and Auto feedback modes. We also measured other secondary effects: perceived ease of understanding and tediousness (7-point Likert) when reflecting on the feedback, and perceived accuracy (5-point Likert) to assess whether the users were likely reflecting on wrong information. We employed a remote recruitment and engagement approach to address several issues. First, the cuisine in our geographic location (non-western, non-United States) has limited nutrition data with which to prepare feedback. Second, the participants in our local culture are typically reticent. Hence, to widen our participant pool and align the participants' cuisine with online food nutrition information, we recruited US-based participants from Amazon Mechanical Turk (MTurk) and employed a remote engagement approach for longitudinal participation. This approach can also be used for conducting studies under social distancing requirements, such as during the Covid-19 pandemic [3, 88]. Other benefits include higher participant diversity and a larger initial sample size to mitigate participant attrition. Similar methods for remote recruiting have been proposed for experiments with difficult recruiting requirements, such as field testing smart home technologies [12, 13]. Figure 6 illustrates the participant recruitment process and study procedure. Participants were engaged through Amazon Mechanical Turk HITs for specific steps and were incentivized to return since each step was paid, but could drop out at any time. Participants were compensated $0.05, $0.70, $0.75, and $8.00 USD for the screening, pre-survey, photo upload, and weekly survey HITs, respectively. The participant started with a screening survey testing instruction comprehension and basic nutrition literacy (Appendix A.1), which was reviewed within 2 days; if she passed, she was allocated to the Manual or Auto feedback mode and invited to the pre-survey. The pre-survey (Appendix A.2) asked about the participant's demographics (age, gender, occupation, education, ethnicity, country of origin), attitudes towards healthy eating (i.e., self-assessment and motivation), weekly eating behavior (i.e., frequencies of eating specific food types and with specific cooking methods), and nutrition knowledge (adapted from [14]). The participant was instructed to photograph ≥2 meals/day, every day for 1 week. After 7 days, the participant uploaded her photos for feedback preparation. Research administrators checked the validity of the photos and, for Auto, annotated the food names and nutrition information. Although the Auto feedback was communicated as being produced by a smart system, because of the accuracy limitations of current deep learning models, we implemented this with a Wizard-of-Oz approach [43]. This is similar to using crowdsourced annotators in PlateMate, which had good accuracy [72]. Future methods can use better image classifiers and automatic database look-ups. Food names, food groups, and ingredients were manually identified based on the annotators' knowledge and experience. Two annotators had extensive experience with western foods eaten in the United States.
They trained the other annotators and clarified when the latter were uncertain. Annotation was performed by looking up the food name in the MyFitnessPal food analysis database (https://www.myfitnesspal.com/food/search). Since the database contains potentially inaccurate crowdsourced data, annotators reviewed multiple entries and chose the first reasonable one. Finally, the annotations and photos were uploaded to a web server and the participant was notified to complete the weekly survey. In the weekly survey, the participant was asked about her unhealthy and healthy meals, about her perceived informativeness of each type of nutrition information (overall, food groups, ingredients, cooking methods, and macronutrients), and about her user experience (ease of use and tediousness). As in the pre-survey, she was asked about her eating behavior and given the nutrition knowledge test to measure changes in knowledge. Finally, the participant indicated whether she would like to continue for another week or opt out. We recruited participants from Amazon Mechanical Turk (AMT) with high qualification (≥5000 completed HITs with >97% approval) based in the United States (US). We screened 416 participants and invited 162 to the pre-survey; 136 participants completed the pre-survey. We present our findings on the primary measure of perceived informativeness, and on secondary measures that supplement our understanding of the participants' experience. Our focus is on the user experience with meal feedback; we defer the supplementary analysis of background attitudes and food logging behavior to Appendix B.1 and B.2. To identify significant effects, we performed statistical analyses on user ratings, detailed in Appendix B.3. Participants found Manual feedback more tedious to use than Auto, especially in later weeks. They perceived the feedback as accurate (M=82.4% agreed). Not all feedback was informative (M=46.3%), suggesting the need to omit feedback sometimes. The perceived informativeness of Auto increased after the first week, but not for Manual. Further details are in Appendix B.4-B.6. Next, we describe the informativeness of specific information types. We analyzed the relative differences in reported informativeness from different feedback information to inform which aspects are most salient. Participants appreciated learning more about diet behaviors than nutrition knowledge (Figure 7), but there was no difference across Feedback Modes. Among nutrition knowledge types, participants learned more about food groups and ingredients than cooking methods and macronutrients (Figure 7a). Among diet behavior information, participants learned more about the diversity of foods eaten than about whether they were eating more or less healthily (variation) (Figure 7b). These results highlight the need to include diet behavioral and temporal features for salient feedback. Among nutrition knowledge information, we also note that it is least useful to only inform about macronutrients, which is what many food logging apps typically do. We performed a thematic analysis of participant rationales (n=692) in the weekly survey regarding what they found informative from the meal feedback. We used open coding [42] to derive categories and affinity diagramming [10] to consolidate categories into themes. Thematic coding was performed by one co-author with regular discussion among the co-authors. We then calculated inter-rater reliability on a random 15% subset of feedback coded independently by another co-author, obtaining a Krippendorff's alpha with MASI distance [75] of α=0.756, which indicates good agreement.
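For replication, this reliability statistic can be computed with NLTK's agreement module, which supports Krippendorff's alpha with arbitrary distance functions, including MASI distance for multi-label codes; the coders, items, and theme labels below are illustrative stand-ins, not our coded data.

```python
# Sketch of the inter-rater reliability computation using NLTK's AnnotationTask.
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import masi_distance

# Each record is (coder, item, set-of-theme-labels); frozensets let MASI
# compare partially overlapping multi-label codes.
data = [
    ("coder1", "resp01", frozenset({"nutrition_knowledge"})),
    ("coder2", "resp01", frozenset({"nutrition_knowledge"})),
    ("coder1", "resp02", frozenset({"meal_assessment", "valence"})),
    ("coder2", "resp02", frozenset({"meal_assessment"})),
    ("coder1", "resp03", frozenset({"diet_behavior"})),
    ("coder2", "resp03", frozenset({"context"})),
]

task = AnnotationTask(data=data, distance=masi_distance)
print(f"Krippendorff's alpha (MASI): {task.alpha():.3f}")
```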
We identified key themes in users' reflections based on what they learned from the meal feedback: cognitive space [50], valence of the meal [11], contextual information, post-activity sensations [30], and agency [4, 67]. These align with prior literature. We performed follow-up statistical analysis with linear mixed effects models to compare differences in users' reflections between conditions. We discuss each theme next. Participants reflected on three key types of information: nutrition knowledge, meal assessment, and diet behavior. This is similar to Kocielnik et al.'s description of a cognitive space with target and self domains [50]. Nutrition knowledge relates to stating factual nutrition information and values about meals (macronutrients, food groups, cooking methods, ingredients); e.g., participant PA11 with Auto feedback learned that "deep frying adds more calories than expected." Meal assessment judges whether the meal or its nutritional components (e.g., calories) were healthy, above or below expectations, or how one should change the meal; e.g., PM17 with Manual feedback assessed that her meal had "so many carbs, no green veggies, and boxed fish." Diet behavior describes a longer-term pattern across multiple meals by their averages, deviations (e.g., above/below, a lot), frequencies (e.g., often, never, seldom), and trends; e.g., PA22 learned that "there is much more variety in my dinners than in any other meal of the day." Participants often reflected on the feedback positively or negatively relative to healthiness. We identified other themes about the context surrounding the meal, namely contextual background, agency and control, and post-activity sensations and feelings. Participants described the background context of their tracked meals to justify why a meal was healthy or unhealthy. E.g., PM17 felt that "it was interesting to me that it was a relatively unhealthy week for me. It was around Father's Day so there were a few special meals peppered in (for my husband and my step-Dad), but my breakfasts were a bit heavier than normal so I should have had more cereal than the fatty stuff." Conversely, PM8 credited that "the waitress suggested fruit and it actually was a really nice addition to my lunch." Hence, participants sometimes attributed their diet choices to external factors. Additionally, many participants cooked their meals and proudly declared new cooking skills learned; e.g., PM12 "never grilled kabobs before so it was new to me." This provided more justification for how they had, or lacked, agency to eat healthily. Finally, participants recorded how they felt when they ate certain meals (sensation). E.g., PM6 reported that "not having enough protein, fat and fiber made my body sluggish and I was rather cranky." Some commented on how certain ingredients were tasty, e.g., PA12 learned that "pineapple tastes good when grilled"; or how certain ingredients were satiating, e.g., PM8 "used a tortilla instead of the hash browns. I was surprised that it kept me feeling fuller for longer, because of the fiber." These suggest their awareness of new incentives to eat such dishes more in the future.
Similar to [74], which found that photo-taking (i.e., tracking) behaviors differed with annotation automation, we found that users reflected differently based on whether the feedback was automatically shown or manually elicited. These reflections were mentioned at different frequencies across feedback modes. Contextual information was mostly written by participants with Manual feedback and almost never mentioned by participants with Auto feedback (only 9 mentions). Participants reflected most about their agency (or lack thereof), followed by contextual background, and sensation after the meal (Figure 9). Participants with Manual feedback reflected more about Agency (M=.079 vs. .010, p<.0001) and Context (M=.056 vs. .010, p=.0020) than participants with Auto feedback. In summary, answering RQ1 about what information is salient in feedback, we found that participants:
1. Perceived the meal feedback as accurate and found Auto feedback easier to use than Manual feedback.
2. Reported learning more about their diet behaviors than nutrition knowledge.
3. Reflected along the target-self cognitive space: nutrition knowledge, meal assessment, diet behaviors.
4. Reflected on different nutrition information types depending on positive or negative valence.
5. Reflected more about contextual information with Manual feedback than with Auto feedback.
These findings pose some implications for design, namely:
• Scaffold feedback with cognitive spaces (target domain, self-assessment, and long-term behavior awareness).
• Provide Auto and Manual feedback together for nutrition-specific informativeness and contextual reflection.
• Prioritize feedback for each nutrition information type to support positive or negative valence in reflection.
With the data and findings from the data collection study, we propose a technology and technique to provide salient feedback that balances reflection burden and informativeness. This answers our second research question (RQ2): how to automatically provide salient information in feedback. This involves three technical steps (Figure 1, right): informativeness prediction, explainable AI for the user interface (UI), and the proposed UI design. The machine learning approach has three parts (Figure 10): annotation prediction, informativeness prediction, and informativeness explanation. The first part is to train a convolutional neural network f_food for automatic recognition of the food from a meal photo x, including predicting or looking up nutrition information. We denote these nutrition annotations as ŷ. Modeling for this is well-established [54, 68] and we defer providing further detail. Note that SalienTrack is premised on automatically recognizing meal photos, though feedback may be conveyed as manual elicitation or automatic display. Second, we propose to predict an informativeness score ŝ by training a model f_inf on diet features z, based on aggregate and temporal variables extracted from the nutrition annotations ŷ with a heuristic preprocessing method M. This allows nutrition knowledge and diet behavior information to be encoded. The informativeness model predicts whether a user is likely to learn much from each specific photographed meal. If the informativeness score is high, then feedback should be provided for that meal; otherwise, feedback should be omitted. Third, for concise feedback, we use model explanations to only show salient features and rules. Our approach to exploiting explanations differs from typical uses of explainable AI. We are not proposing to explain the primary prediction task, i.e., why f_food predicted the dish name or calorie level.
Instead, we use explanations of f_inf that explain how the diet features z influenced the current informativeness score ŝ. The explanations first determine importance weights a for all annotation features, and rules r for some features, then filter the rules to only the most salient ones r*. Figure 10: Overview of the proposed SalienTrack system with modules for annotation prediction (yellow), informativeness prediction (green) from multi-meal diet features (yellow-green), and informativeness explanations (blue). We propose informativeness prediction as a key capability for providing salient feedback in self-tracking. The multi-line arrow indicates aggregate features extracted across multiple meals. The cross symbol refers to a masking operation with the importance weights a on the rules r. Our quantitative and qualitative analyses of the users' review of food logging feedback identified different information and aspects that users learned and reflected on. We aimed to extract data features spanning the different cognitive spaces of nutrition knowledge, meal assessment, and diet behavior from the meal annotation and survey data of our earlier study. The premise is that nutrition knowledge and meal assessment annotations can be automatically inferred with image-based classifiers [54, 68], though for our initial study, we depended on manual annotation by our previous participants and research annotators. We added features for the number of food groups and ingredients to capture the diversity of the meal. Diet behavior features calculated from the historical meal records include the mean, standard deviation, trend (slope of linear interpolation), change from previous meals, and maximum of each nutrition value across different time windows (the recent 2-4 meals, and the recent 2 meals of the same meal type). These features are consistent with meal annotations, participant reflections, and the literature on dietetics [62]. We excluded features about user demographics (age, gender) and study treatment (week) so that the trained models are generalizable beyond the previous user study. Altogether, this produces 580 data features, which can suffer from the curse of dimensionality. To reduce dimensionality, we selected features using recursive feature elimination [41] for tree-based models, and a mutual information-based univariate feature selection method [71] for other models. The final model was trained with 30 selected features including food habits, nutrition knowledge, and diet behaviors, as shown in Table 12 in Appendix C.1. We modeled informativeness prediction as a classification problem by binarizing the self-rated informativeness (rating > 0 or not). Our dataset had 1,545 instances (922 and 623 from Manual and Auto, respectively). The dataset is balanced, with 46.3% of instances labeled with high informativeness. We investigated 5 machine learning models. Logistic regression and decision tree were considered for interpretability, but they sacrifice model performance. Multi-layer perceptron (neural network), Random Forest, and Gradient-Boosted Trees (XGBoost) [19] were considered for accuracy, but are less interpretable due to their large number of model parameters. Informed by the different reflection behaviors of participants in the data collection study, we trained models separately on the Manual and Auto datasets to understand how features variously influence the informativeness for each feedback mode. Models were evaluated with 5-fold cross-validation.
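As a minimal sketch of this feature engineering and training step, assuming a chronological per-user meal log, the code below derives windowed diet-behavior features for a single toy nutrient column and cross-validates an XGBoost classifier; the column name, toy labels, window size, and 3-fold split (we used 5 folds on the real data) are illustrative, not the actual 580-feature pipeline.

```python
# Sketch of diet-behavior feature extraction and informativeness model training.
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def temporal_features(meals: pd.DataFrame, col: str, window: int = 4) -> pd.DataFrame:
    """Mean, std, max, change, and trend of `col` over the recent `window` meals."""
    roll = meals[col].rolling(window, min_periods=window)
    slope = roll.apply(  # trend = slope of a linear fit over the window
        lambda v: np.polyfit(np.arange(len(v)), v, 1)[0] if len(v) >= 2 else np.nan,
        raw=True)
    return pd.DataFrame({
        f"{col}_mean": roll.mean(),
        f"{col}_std": roll.std(),
        f"{col}_max": roll.max(),
        f"{col}_change": meals[col].diff(),  # change from the previous meal
        f"{col}_trend": slope,
    })

# Toy per-user meal log (chronological) and binarized informativeness labels.
meals = pd.DataFrame({"fat_g": [30, 12, 45, 22, 38, 15, 50, 28, 33, 18, 42, 25]})
labels = pd.Series([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

X = pd.concat([meals, temporal_features(meals, "fat_g")], axis=1).dropna()
y = labels.loc[X.index]

clf = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
print(cross_val_score(clf, X, y, cv=3, scoring="f1"))  # 3 folds for tiny toy data
```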
Table 3 summarizes the model evaluations, reporting various metrics to compare models. The interpretable models had poorer performance than the larger models. We selected XGBoost for SalienTrack, since it had the highest performance, with F1 scores of 0.74 and 0.84 for Manual and Auto, respectively. This indicates good prediction performance for binary classification. We investigated model explanations to 1) understand how the model made decisions regarding high or low informativeness, and 2) use them as a mechanism to provide salient feedback to end-users. We employed SHAP and Anchors explanations to see which features were important and why they affected informativeness, respectively. We describe how they are calculated and interpreted, and evaluate their correctness towards saliency. SHAP [59] calculates the attribution of each feature towards the model's inference for a specific instance. For each instance prediction, the attributions inform how important each feature is (magnitude) and whether it influences the decision towards informativeness (positive sign) or away from it (negative sign). Consider the example in Table 4, excluding features about prior habits, since they are less actionable. These SHAP attributions change for different instances (see the global visualization in Appendix C.3, Figure 19). Thus, saliency is dynamic across feedback instances. We evaluated explanation correctness to determine whether the most salient features most affected predictions. For an instance prediction, a feature that is more important will cause the prediction confidence to change more if the feature value is changed. We induce the change by perturbing salient features across the counterfactual rule threshold; e.g., for the instance in Table 4 predicted as informative, for the feature "Meal Cooking (Pan/Air Fried) : Mean[Prev-Current]" with value 3/4 and Anchor rule explanation >2/4, we would change its value to 2/4 to just violate this rule. Specifically, we create a counterfactual instance with only that feature value changed, have the model predict the instance's informativeness, and measure the decrease in the prediction confidence of informativeness. For an informative/uninformative prediction, we expect this perturbation to decrease/increase the confidence, indicating the correct influence. Mathematically, we calculate the Signed Prediction Confidence Change for the i-th feature as Δ_i = s (p − p_{¬i}), where s is 1 if the prediction is informative and −1 if it is uninformative, p is the original prediction confidence of informativeness for the instance, and p_{¬i} is the prediction confidence for the counterfactual instance. Figure 11 validates that, on average, features that were ranked as more important were more influential in determining whether the instance is predicted as informative or not.
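Continuing the toy model from the sketch above, the following illustrates both steps: ranking an instance's features by SHAP attribution magnitude, then computing the signed prediction confidence change for a single-feature perturbation; halving the feature value is an illustrative stand-in for crossing an Anchor rule threshold, which is not computed here.

```python
# Sketch: rank features by SHAP attribution for one instance, then measure the
# signed change Delta_i = s * (p - p_not_i). Continues clf, X, y from above.
import numpy as np
import shap

clf.fit(X, y)
phi = shap.TreeExplainer(clf).shap_values(X)   # per-instance attributions

i = 0                                          # instance under inspection
order = np.argsort(-np.abs(phi[i]))            # rank features by |attribution|
top_features = list(X.columns[order[:3]])
print("Top-3 salient features:", top_features)

def confidence(model, x_row):
    """Model confidence that the instance is informative (class 1)."""
    return model.predict_proba(x_row.to_frame().T)[0, 1]

def signed_confidence_change(model, x_row, feature, new_value):
    p = confidence(model, x_row)
    s = 1 if p >= 0.5 else -1                  # sign of the original prediction
    x_cf = x_row.copy()
    x_cf[feature] = new_value                  # counterfactual instance
    return s * (p - confidence(model, x_cf))

row = X.iloc[i]
f = top_features[0]
print(f"Delta for {f}: {signed_confidence_change(clf, row, f, row[f] * 0.5):+.3f}")
```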
Having trained two models for saliency prediction for Manual and Auto feedback, we further propose a third variant, Semi-Auto, that combines both feedback modes. This limits the burden of always requiring manual annotation by more often providing automatic feedback and occasionally providing manual feedback. Providing feedback with both the Manual and Auto feedback modes involves several steps. First, we calibrate the relative desire for Manual or Auto feedback using a preference weight w_m for each feedback mode m ∈ {Manual, Auto}. This depends on the application designer and user; e.g., w_Auto should be higher to prioritize lower burden. Then we select the top k features with the highest weighted SHAP attributions, taking the maximum across feedback modes, i.e., s_i = max(w_Manual a_{Manual,i}, w_Auto a_{Auto,i}). We chose k = 3. Each top feature is conveyed with its maximizing feedback mode, i.e., m_i = argmax_m (w_m a_{m,i}). Hence, for a tracked event selected for feedback, the three features selected for feedback are either shown automatically for plain reading, or require the user to manually estimate their values. We have developed the salient feedback model with an informativeness model prediction and explanation techniques to select salient instances, features, and rules for different feedback modes. Table 5 summarizes how the mechanisms implement the dimensions in the SalienTrack framework (Figure 4). When: treat events as instances with annotations as features, and predict informativeness as binary classification. Which: calculate SHAP attributions for the instance features, rank features by attribution magnitude towards the prediction, and select (filter) the top k features (k chosen by the application designer). Why: calculate Anchor rules to explain the criteria for the instance being predicted as informative or not. How: compare the prediction confidences of the Manual and Auto models and select the mode with the higher confidence. We formatively studied the usefulness of the SalienTrack feedback interface with a scenario-driven semi-structured interview study. This allowed us to verify positive aspects of SalienTrack and identify issues before further investment in engineering and field testing. We aimed to qualitatively examine the prospective usefulness of dynamically selecting fewer features in feedback, to understand whether users would prefer other information, and to explore their opinions regarding feedback modes (Manual, Auto, Semi-Auto). We compared the features in SalienTrack and varied Feedback Mode as the independent variable, with two baseline conditions (Baseline-Nutrition, Baseline-Historical) and three SalienTrack conditions (Manual, Auto, Semi-Auto). See Figure 12 for example screenshots. We prototyped static mock-ups of app screenshots in PowerPoint slides rather than an interactive prototype. This is akin to interviewing with paper prototypes, which elicits more open discussion from the participant, since the interfaces look less developed. All feedback interfaces share the same basic interface design comprising a list of features, with each feature value either automatically shown (e.g., "Low level of calories") or requiring manual entry in a simple form. For the long lists in the baseline interfaces, we grouped related features by categories (e.g., macronutrients, food groups, recent meals) to aid interpretability. For Auto feedback, users only need to read the values and do not need to estimate them. The Manual entry interface uses dropdown menus or checkboxes to limit choice overload and reduce user burden. More appealing and sophisticated visualizations could be used in future work (e.g., [36, 84]), but our focus was on the straightforward truncation of features selected for saliency. The feedback can also incorporate why the meal was salient by displaying rules with numeric quantities; for categorical values, rules are already implicit in just showing the selected value (equality). For simplicity in the formative study, we used categorical features. We examined three variants of SalienTrack with the Manual, Auto, and Semi-Auto modes.
We limited feedback to the top three salient features to limit information overload. Furthermore, if SalienTrack predicts low informativeness, then feedback is omitted from the set of meals to review. Since SalienTrack selects features based on nutrition information and historical behavioral characteristics (e.g., the average over the past 3 meals), we had two baseline conditions to investigate their base usefulness; unlike SalienTrack, features for these baselines were curated, fixed selections for each meal, not dynamically selected. Baseline-Nutrition feedback shows 8 items in four categories (macronutrients, food groups, cooking methods, ingredients). Baseline-Historical feedback shows the top 15 items selected based on the feature importances of the XGBoost model; we divided these into two categories (current, recent meals). The long list for Baseline-Historical feedback also allows us to investigate the impact of lengthy feedback, which may lead to information overload. All baseline feedback was in Auto mode, since requiring its manual entry would be obviously tedious. Although random subset selection could be considered a baseline to compare against SalienTrack selection, we did not include it, since it would be perceived as clearly arbitrary and less useful. The experiment procedure was as follows. We first briefed the participant about the scenario of photographing multiple meals over time and reviewing several images through a feedback app. The participant was then instructed on how each of the five Feedback Modes worked and what information they provided. Details of the briefing and tutorial are in Appendix D. After the briefings, we commenced the main experiment. We had selected new meal images that were canonical of our training dataset (western dishes typical of our data collection) and generated baseline and SalienTrack feedback. Participants were instructed to imagine a scenario where the user has eaten 7 consecutive meals and is reviewing the last 4 meals using a feedback app. In the main study, the participant first chose the 1 set of meals, out of 3 possible sets, that she was most familiar with to analyze. This was to maximize the familiarity and relevance of the meals to the participant's diet, and to mitigate issues when interviewing on scenario data. The participant viewed 4 trials in the main study. In the first trial, the participant was shown three prior meals eaten (only as photos) to contextualize the scenario, and the app feedback for a fourth meal. The subsequent three trials showed the next meal in sequence and incremented the recent meals by one as a sliding window. The feedback was shown for all 5 Feedback Modes in the order: Baseline-Nutrition, Baseline-Historical, SalienTrack Manual, Auto, Semi-Auto. We showed food images and app screenshots in a PowerPoint presentation, one screenshot at a time, and asked the participant to describe which information she found useful or could learn from, what other information she would prefer to learn, and what she found tedious. We provided clarification when questions were raised. After reviewing the four meals across all the trials, we asked the participant to rank and explain the informativeness and tediousness of each Feedback Mode, and to discuss any features she would like to have included or excluded. We recruited 10 participants through convenience sampling from people residing in the US, since they would be more familiar with the US-based foods of our dataset.
They were 3 male and 7 female, ages 26 to 35. We interviewed participants over Zoom and recorded the audio and screen interactions for subsequent analysis. Each interview session lasted about one hour, and each participant was compensated with a USD $15 Amazon gift card. Participants ranked the five Feedback Modes by informativeness and tediousness (Figure 11). As expected, Manual feedback was the most tedious and least informative, since users needed to know and enter the information themselves. Participants most often ranked Baseline-Historical feedback as the most informative, but also most often ranked it as the most tedious, because of its long list. Participants found Baseline-Historical information more useful than Baseline-Nutrition, because the latter only had information about the current meal and not previous ones. This shows that providing excessive feedback details for each meal is less appreciated than showing more meals. Participants found SalienTrack-Auto the least tedious, as expected, but found SalienTrack-SemiAuto more informative due to the complementary benefits of mixing both Auto and Manual feedback. The key take-away is that SalienTrack-SemiAuto balances reducing tediousness and improving informativeness. Next, we qualitatively analyzed the explanations participants gave for their opinions. We performed a thematic analysis of participants' utterances using open coding [42], guided by the SalienTrack framework dimensions, the themes from our data collection study, and our objective to examine usefulness across feedback modes. Thematic coding was performed by one co-author with regular discussion with a senior co-author. Participants liked the dynamic selection and concise feedback, and the variety of feedback modes, though they wanted more information on demand and feedback in the context of their goals. We report the most salient details. Participants reflected on the feedback based on cognitive space similarly to our previous participants in the data collection study (Section 4.5). Some participants focused on specific nutrition information instead of the full long list with many information types. P7 said, "I only care about the macronutrients, especially the calories and carbs, because I track my food mostly for losing weight." Similarly, P3 mentioned that "I would check the food groups in the current meal and recent meals to see if my meals were balanced." Conversely, some participants were also interested in diet behavior. P4 felt that "the average level of macronutrients is very helpful for me. Actually, I also want to know the calorie level of the whole week." Therefore, the categorical grouping provided in the baseline feedback was useful. Desire for feedback on all meals. With SalienTrack, some meals were omitted from feedback due to low predicted informativeness. However, participants were generally eager to see feedback for all meals and were curious to know why some meals were predicted to have low informativeness. P3 wanted to reap the fruits of his efforts and remarked that "I've taken the photos, so it makes no sense to provide no information to me at all. I can ignore the information if I don't have time, but it's good to have the information." P2 understood the benefit of not reviewing all meals, remarking "it's OK that the App says no feedback for this meal if there is really no interesting information."
P1's curiosity was piqued: "When the App says there is no interesting information of this meal for me, this makes me get interested in why the App thinks it's not interesting?" Therefore, SalienTrack should have the option to show low-informativeness meals on demand. We believe that interest in viewing all meals will wane over time, so a longitudinal study is needed to evaluate the usefulness of limiting salient moments (when). In addition to viewing factual information, participants wanted the feedback to be contextualized to their health goals [11] and to include action plans [2]. This agrees with our earlier findings on interpreting feedback by the valence of the meal (Section 4.5). P7 wanted to "categorize the information by positive and negative. Then I know what I am doing good, what I can improve." P4 felt that "some information is vague for me. I don't know what I am supposed to do given the information. I like clear suggestions." Therefore, SalienTrack could be combined with a healthiness prediction model to indicate meals and features that support or undermine healthiness, and with a recommender system for action plans. For example, when stating "this meal was deep fried", SalienTrack could contextualize that this was an "unhealthy cooking method with much fat", which is harmful towards a low-fat diet, and suggest "consider baking instead". We have answered our research questions: RQ1) What information is salient in feedback? RQ2) How to provide salient information in feedback? We summarize the evidence for salient feedback, discuss its implications for informativeness in self-tracking, how adding model explainability expands opportunities for the feedback experience, and how to generalize our saliency approach to other self-tracking activities. We examined the need for, provision of, and usefulness of salient feedback in the data collection, modeling, and formative studies. Table 6 summarizes our findings along the four dimensions of the SalienTrack framework. Therefore, to sustain engagement, salient feedback should be provided occasionally, with concise details, with rationales supportive of or antagonistic to the user's health goals, and with diverse feedback modes.

Table 6: Evidence to support the usefulness towards saliency for each dimension in the SalienTrack framework.
When: The data collection study found that informativeness ratings per meal varied across high and low ratings. The modeling study showed that a Gradient Boosted Tree can accurately predict informativeness (F1 score = 0.74, 0.84). The formative study found that participants initially wanted feedback for all meals, but agreed that seeing fewer is less tedious.
Which: The data collection study found that participants reflected on diet behaviors more than nutrition knowledge, and mentioned macronutrients more, and ingredients less, in negative reflections than in positive ones. The modeling study showed that some features are more salient than others for each prediction instance, and that salient features change dynamically for each feedback instance. The formative study found that participants prefer concise feedback, appreciated dynamic salient selection, and preferred historical information over detailed nutrition knowledge.
Why: The data collection study found that participants explained their diet behaviors using contextual information. The modeling study showed that Anchor rules learned decision boundaries that reveal counterfactual changes leading to different prediction outcomes of informativeness. The formative study found that participants wanted to relate feedback items to their healthiness goals.
How: The data collection study found that participants rated Auto feedback as more informative than Manual. The modeling study showed that the Auto and Manual prediction models may select different salient features for the same instance, suggesting that informativeness depends on feedback mode. The formative study found that participants preferred Auto feedback over Manual due to the latter's tediousness, but appreciated occasional Manual feedback for deeper reflection (i.e., they most preferred SalienTrack-SemiAuto).

We discuss limitations of our data collection study. 1) We limited the feedback to nutrition information to limit user burden, and excluded other contextual information, such as events, places, and people [52]. The excluded features may occasionally be more salient than nutrition knowledge or diet behaviors, which future work should investigate. 2) We also limited feedback sessions to weekly intervals, but future work can explore how much saliency is beneficial at varying feedback frequencies, from every meal [47] or once per day [76] to across years [84], which entail differing total logged data amounts and different reflection patterns [37]. 3) There was a high drop-off rate in the data collection study, though this is a common issue in health behavior change studies [26]. Our persistent users may be biased towards engagement and rate informativeness higher than average. Many machine learning techniques have been proposed to address the burdens of collection and reflection in self-tracking. For data collection, models can automatically detect and infer events, such as physical activity [76], sleep [49], stress [1], and eating [9, 87, 91].
For feedback reflection, models can predict opportunities, such as recommending context-aware actions [76], providing just-in-time adaptive interventions (JITAI) [53, 70], finding associations between macronutrients and blood glucose [65], and identifying disengaged users [55]. We add to the latter body of work by providing salient information subsets based on the potential to learn from the feedback. As more sophisticated models are used to support self-tracking and feedback, there is more need for explainable AI to support human reasoning goals [89]. For self-tracking, instead of generating explanations for data scientists to debug the models, it is important to explain to domain experts, such as diabetes educators [65] and public health officials [55], and to explain to consumers to persuade healthier behaviors [33]. We have leveraged Anchor rules to justify to the self-tracker why they should pay attention to the feedback. Our formative study informed how combining SalienTrack with health prediction modeling can provide more persuasive and explicitly goal-oriented feedback. Currently, the saliency selection of SalienTrack is purely data-driven based on end-user data, but the designer or dietitian may choose to prioritize some features. For example, we found that showing feedback on ingredients tends to produce positive-valence reflections. Future work can encode this requirement as a prediction prior, using machine learning regularization to prioritize features for positive valence. To increase the clinical relevance of the selected salient features, dietitians can also specify higher-priority factors (e.g., mentioning the vegetables food group and the fiber macronutrient), and these can be added as regularization terms to penalize their omission.
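As one lightweight alternative to a training-time regularizer, expert priorities could also be folded in at selection time by re-weighting SHAP attributions before choosing the top features; the sketch below is a minimal illustration with hypothetical feature names and priority values.

```python
# Minimal sketch of folding dietitian-specified priorities into the saliency
# ranking: a selection-time re-weighting, not a training-time regularizer.
def rank_with_priorities(attributions: dict, priorities: dict, k: int = 3):
    """Rank features by |SHAP attribution| scaled by an expert priority
    (>1 boosts a feature; a missing entry, 1.0, keeps it purely data-driven)."""
    scored = {f: abs(a) * priorities.get(f, 1.0) for f, a in attributions.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

attributions = {"fiber": 0.10, "fat": 0.25, "vegetables_group": 0.12, "calories": 0.20}
priorities = {"vegetables_group": 2.5, "fiber": 2.0}   # dietitian-specified boosts
print(rank_with_priorities(attributions, priorities))  # ['vegetables_group', 'fat', 'fiber']
```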
We have demonstrated SalienTrack for selectively providing concise and dynamic feedback in meal self-tracking. Applying the SalienTrack technique involves multiple steps in two phases, which we summarize here to help replicate the approach. First, conduct a usage trial of a self-tracking app with a feedback intervention, similar to diet tracking trials (e.g., [11, 30, 48]) or studies on personal informatics (e.g., [30, 35, 54]). In addition to recording activity and providing feedback, also survey users on the perceived informativeness of the feedback for each tracked event and for each information item. This data collection step is necessary to identify salient information because application contexts differ (e.g., healthy lifestyles or chronic disease management), as do cultural and population contexts. For example, including carbohydrates is important for ethnic Indians with higher diabetes risk [66], and including sodium intake can help Japanese users manage their stroke risk [69]. Finally, extract features by quantitatively and qualitatively analyzing the perceived informativeness ratings and rationales. We expect the historical temporal features that we extracted in our food logging use case to also be relevant for other applications.

The second phase involves engineering the AI system and app interface for salient feedback: 1) train a machine learning model to predict informativeness; 2) implement explanations using SHAP for saliency and Anchors for rule reasons; 3) implement the salient feedback by displaying or eliciting only the top-ranked features, based on the prediction confidences of the Auto and Manual models, respectively. We only demonstrated the feedback UI with simple lists and form widgets as in [30], but other pictorials or visualizations can be considered (e.g., [4, 36]). Note that saliency and informativeness predictions depend on the UI format; with better interactive visualizations, informativeness may be higher and tediousness lower, so modeling results may differ. A minimal code sketch of this second phase follows.
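The following Python sketch illustrates steps 1-3 under simplifying assumptions: it trains a single XGBoost classifier (the paper trains separate Auto and Manual models), ranks per-meal feature saliency with SHAP, and routes low-confidence meals to Manual elicitation. The data here is synthetic placeholder data; per-instance Anchor rules (step 2) would be generated similarly, e.g., with the anchor-exp package, and are omitted for brevity.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb

# Synthetic placeholders standing in for the engineered per-meal features
# and binarized informativeness labels from the usage trial.
rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(5)]
X_train = pd.DataFrame(rng.random((200, 5)), columns=feature_names)
y_train = rng.integers(0, 2, size=200)
X_new = pd.DataFrame(rng.random((10, 5)), columns=feature_names)

# Step 1: train an informativeness classifier (one model here for brevity;
# the paper trains separate Auto and Manual models).
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# Step 2: SHAP attributions give per-meal feature saliency.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_new)  # one attribution row per meal

# Step 3: show or elicit only the top-ranked features; route low-confidence
# meals to Manual elicitation, whose answers double as labels for later
# retraining (active learning, as discussed below).
def salient_feedback(i, k=3, conf_threshold=0.7):
    proba = model.predict_proba(X_new.iloc[[i]])[0, 1]
    top_k = np.argsort(-np.abs(shap_values[i]))[:k]
    mode = "Auto" if max(proba, 1 - proba) >= conf_threshold else "Manual"
    return [feature_names[j] for j in top_k], mode
```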
The SalienTrack technique can also be applied to other behaviors, such as physical activity, sleep, and savings. Consider applying SalienTrack to sleep tracking [49]: 0) track contextual information, such as sleep time, coffee intake, time between exercise and sleep, room temperature, and feedback informativeness ratings; 1) train a model to predict when a user will learn from sleep episode feedback (whether about good or poor sleep); 2) apply SHAP to select which features to include in the feedback; and 3) determine how to present the feedback (e.g., manually ask about temperature comfort, or automatically show the amount of movement during sleep). To contextualize feedback with goals, such as sleep quality, a prediction model should be trained with the contextual features, and Anchor rule explanations provided to explain why the context helped or harmed sleep.

Some users may be frustrated at having to do manual recording work when the app has automatic prediction capability. Nevertheless, the proposed ability to switch between Manual and Auto feedback in SalienTrack can mitigate users "losing the habit" and consequently abandoning self-tracking [31], by substituting forgotten logs with manual reflections. Furthermore, there is a technical benefit to occasionally switching the feedback from Auto to Manual. The manual feedback can be used for active learning for the machine, i.e., prompting the user to provide annotations when the machine learning model's prediction confidence is low. This provides labeled data to help the model improve its accuracy and improves the feasibility of Auto feedback. Hence, SalienTrack supports active learning both for the user [71] and for the machine [27]. In general, SalienTrack can be used wherever automatic feedback can be obtained for a tracked behavior [20]. Data collection need not be automated, as is the case for mobile photo-based food logging. Applications where automatic inference remains elusive include capturing the social context or background of an activity (e.g., overeating due to attending a party) or feeling bloated after eating trigger foods [24]. In such applications, where manual elicitation is needed for assessment, users are already burdened when asked to reflect; applying decision-theoretic models can help mitigate repeated elicitations by accounting for their costs [53, 78].

We have studied what users find salient in self-tracking feedback and found differences in perceived informativeness across logged meals and across nutrition, assessment, diet behavioral, and contextual information. Applying these insights, we quantified informativeness in self-tracking and proposed the SalienTrack framework, which defines the saliency of when, with which information, why, and how to provide feedback. We implemented a machine learning model with explanations to predict the informativeness of feedback at each meal event and to explain the most salient information for users to learn from. Our formative study showed the usefulness of SalienTrack for providing concise, dynamic feedback. SalienTrack demonstrates semi-automatic feedback based on informativeness and expands opportunities to make feedback more concise and engaging.

We analyzed the background attitudes and logging behavior of participants; these supplement our analysis of user informativeness. We analyzed the results of the pre-survey from all 136 respondents; the findings are similar for the subset of 53 who continued to the weekly study. About half of the participants (56.6%) perceived that they eat healthily, but almost all (91.9%) wanted to eat more healthily (Table 7). Participants most frequently ate grains, meat, and dairy, but there was a wide distribution in habitual eating of vegetables and fruits (Table 8, left). Participants ate baked foods (i.e., bread) most frequently, followed by grilled, pan fried, and steamed foods, but reported rarely eating boiled or deep fried foods (Table 8, right). Since participants were randomly assigned to Feedback Modes, there was no significant difference in pre-survey measures between groups. We assessed participants' nutrition knowledge to ensure they were able to understand their food and the provided information. We analyzed participants' performance on the Nutrition Knowledge Test [14] by first binarizing their responses by whether they selected the second (right) item (i.e., rating > 0 or not), then grading whether the selection was correct. This produces a correctness metric that ranges between 0 and 1. In the pre-survey, participants demonstrated good understanding (M=0.850, SD=0.304).
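A minimal sketch of this scoring scheme, assuming each test item is a bipolar rating where a positive value means the participant chose the second option (the function and argument names are ours):

```python
def score_item(rating: float, second_option_correct: bool) -> int:
    """Binarize a bipolar rating (>0 means the second item was chosen),
    then grade the choice against the answer key."""
    chose_second = rating > 0
    return int(chose_second == second_option_correct)

# A participant's correctness is the mean of score_item over all test
# items, ranging from 0 to 1.
```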
To help contextualize participants' reflections on meal feedback, we analyzed the meal information logged. Participants with Manual feedback had to write their own information and may have looked up, estimated, or guessed values, though they would be better able to identify their dishes. Participants with Auto feedback received annotations from the Wizard-of-Oz method, which were rigorously coded from nutrition databases, though dishes may have been misidentified. Therefore, both Feedback Modes were prone to errors, in different ways. We found that Auto feedback annotated more meals with below-average macronutrient levels than Manual feedback (Figure 15). This could be due to participants attempting to eat healthier meals, or eating more snacks rather than full meals; participants with Manual feedback may also have over-estimated their macronutrient levels. Participants ate meals with similar levels of Food Groups across Feedback Modes, with Grains being most common and Fruits least common (Figure 16, left). The distribution of Cooking Methods was also similar between Feedback Modes, with Baked being most common, followed by Pan-fried, Boiled, Steamed, and Deep fried (Figure 16, middle). Participants with Manual feedback reported fewer food groups and cooking methods, possibly due to less diligent annotation compared to the researcher annotators. In particular, participants seldom reported bread as being baked. The number of ingredients did not differ between Feedback Modes; participants reported 1 to 27 ingredients per meal (Median=5; Figure 16, right).

In our quantitative analyses, we employed the same method for all inferential statistics. We first binarized each measure from its bipolar Likert-scale rating to >0 versus ≤0, then trained a linear mixed effects model (e.g., Table 4) with Feedback Mode and Week as main fixed effects and Participant as a random effect (a sketch follows at the end of this subsection). Additional fixed effects were included for some analyses. We combined the ratings for perceived ease of use and mental (non-)demandingness into an overall binarized rating. We analyzed perceived tediousness per nutrition knowledge information type to assess which is most costly, and perceived learning with respect to nutrition knowledge and diet behavior information types.

As expected, participants perceived Auto feedback as easier to use (Figure 17a) and less tedious (Figure 17b) than Manual feedback, especially after the first week. Reviewing macronutrients was the most tedious, while cooking methods were the least (Figure 17c); there was no difference between Feedback Modes. Participants reported high perceived accuracy (M=82.4% agreed). Participants with Auto feedback increased their perceived accuracy after the first week, but those with Manual feedback did not (Figure 18, left), suggesting increasing trust in the Auto feedback over time. Participants with Auto feedback had marginally higher perceived informativeness than those with Manual feedback (M=62.7% vs. 42.1%, p=.0387). Not all feedback was perceived as informative (M=46.3% agreed), suggesting the need to not provide feedback all the time. Participants had higher perceived informativeness with Auto feedback after the first week, but not with Manual feedback (Figure 18, right). This suggests that participants could learn more over time with Auto feedback, but not with Manual feedback.
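The following sketch illustrates this analysis with statsmodels; the dataframe and column names are our assumptions, and interaction terms (e.g., feedback_mode * week) can be added where an analysis calls for them.

```python
import statsmodels.formula.api as smf

# df is assumed to have one row per rating, with hypothetical columns:
# rating (bipolar Likert), feedback_mode ("Auto"/"Manual"), week (1-5),
# and participant (id).
df["agree"] = (df["rating"] > 0).astype(int)  # binarize to >0 vs. <=0
model = smf.mixedlm("agree ~ feedback_mode + week", df,
                    groups=df["participant"])
print(model.fit().summary())
```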
Table 13 shows the SHAP attribution and Anchor rule explanations for a meal predicted to have low informativeness. Note that even though the meal "Baked salmon, carrots and potatoes" seems healthy, it may not be useful to show feedback. The user is unlikely to learn from the meal because: 1) the user's trend of eating steam-cooked meals is unchanged, which SalienTrack considers unremarkable; 2) the meal's carbohydrate level is above the medium level, which is typical for the user; and 3) recent meals remained similarly not deep fried (low SD), so the meal is not particularly diverse or interesting.

Figure 19: Global influence of features in the informativeness prediction model, calculated from SHAP attributions for the Manual (left) and Auto (right) feedback models. Each dot represents a test instance. Color represents the relative feature value (red=low, blue=high). Dots to the right of each vertical zero line indicate positive attribution towards high informativeness, and dots to the left of 0 indicate attribution towards low informativeness. Dots farther from 0 indicate stronger influence of the feature. Numbered yellow annotations indicate insights discussed in the text.

Aggregating SHAP attributions across a dataset in a scatter plot provides an overview of model behavior (Figure 19). This global explanation provides insights into which features are most influential for predicting informativeness. Unlike Lim et al. [55], who used a single decision tree for global interpretability, the global SHAP method is model-agnostic, i.e., it can be used to explain any underlying prediction model. We report key findings, annotated in Figure 19. 1) Features about prior eating habits are the most influential (widest spread). For example, users who often eat vegetables tend to learn more from Auto feedback. 2) Features about diet behavior were more influential than features about nutrition knowledge. This agrees with our thematic analysis finding that users learn more about their Diet Behavior than their Nutrition Knowledge (Figure 7). 3) Features in the Auto model had higher influence than features in the Manual model. This suggests that users with Auto feedback were either more surprised by the feedback or had less need for it (already knew), compared to users with Manual feedback. 4) The direction of informativeness depends on feedback mode: when eating meals with high levels of grains, participants with Manual feedback did not learn much (a), but participants with Auto feedback were surprised (b). 5) The level of influence of each feature also varies by feedback mode: for Manual feedback, whether the current meal was baked is not influential, but for Auto feedback, baked foods increase the informativeness of the meal and non-baked foods make it much less informative. Such overview plots can be generated directly from the models' SHAP attributions, as sketched below.
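A minimal sketch using the SHAP library's summary plot; the variables are assumed outputs of the two per-mode models. Note that SHAP's default beeswarm colors high feature values red, the opposite of the convention stated in our Figure 19 caption, so the color map would need adjusting to match.

```python
import shap

# Beeswarm overview per feedback-mode model (Figure 19-style).
shap.summary_plot(shap_values_manual, X_test_manual)  # Manual model
shap.summary_plot(shap_values_auto, X_test_auto)      # Auto model
```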
Formative study script for each feedback mode:

Baseline: Four groups of nutrition information about the current meal are shown, including the levels of calories and macronutrients, food groups, cooking methods, and ingredients. The levels of calories and macronutrients are calculated according to the American Dietary Guidelines for adults. Now please read through the information and select the items you find interesting or useful to you.

You can see two groups of information: five items of current meal information, and ten items of diet behavior information, which is accumulated nutrition information about your recent meals, e.g., "highest level of calories in the most recent 4 meals". Why these 15 items instead of others? We ran an AI system to analyze the data from our study and found that these information items were the most relevant to users' perception of learning. In other words, the selected items are the top 15 most important ones. Now please read through the information and select the items you find interesting or useful to you. Then I will ask you to compare this mode with the first one.

SalienTrack Manual: An AI system selected three items of information for you, because it thought these items would be more interesting to you than other information for this particular meal. Here, it requires you to manually log the selected information items. Note that for different meals, the selected information items may change accordingly. This feature is different from NutritionTrack and DietTrack. Now please tell me the answers to the prompts and whether you find them interesting or useful to you. Then I will ask you to compare this mode with the prior ones.

SalienTrack Auto: Another AI system selected three items of information for you with the same logic as the Manual mode, but it shows you the information directly instead of asking you to manually log it. As in the Manual mode, the selected information items may change for different meals. Now please tell me the answers you would type into the UI and whether you find them interesting or useful to you. Then I will ask you to compare this mode with the prior ones.

SalienTrack SemiAuto: The last mode is a hybrid of the SalienTrack Manual and Auto modes. As you can see, some items are from the Auto mode, while some are from the Manual mode. Why do we combine them? Our prior study showed that both manually logging and automatically receiving information have advantages and disadvantages. For example, manually logging requires more effort from users, but also makes users think more deeply about their food. You just experienced the Manual and Auto modes separately. Now please talk about whether this mode makes sense to you or not, and why.
References

[1] Towards personal stress informatics: Comparing minimally invasive techniques for measuring daily stress in the wild
[2] Crowdsourcing Exercise Plans Aligned with Expert Guidelines and Everyday Constraints
[3] Recruitment in response to a pandemic: pivoting a community-based recruitment strategy to Facebook for hard-to-reach populations during COVID-19
[4] Trackly: A Customisable and Pictorial Self-Tracking App to Support Agency in Multiple Sclerosis Self-Care
[5] Flexible and Mindful Self-Tracking: Design Implications from Paper Bullet Journals
[6] Understanding families' motivations for sustainable behaviors
[7] Battling the bulge: menu board calorie legislation and its potential impact on meal repurchase intentions
[8] Reflective Informatics: Conceptual Dimensions for Designing Technologies of Reflection
[9] EarBit: Using Wearable Sensors to Detect Eating Episodes in Unconstrained Environments
[10] Contextual design: defining customer-centered systems
[11] OneNote Meal: A Photo-Based Diary Study for Reflective Meal Tracking
[12] Lab of things: A platform for conducting studies with connected devices in multiple homes
[13] HomeLab: Shared Infrastructure for Home Technology Field Studies
[14] The role of explanations in casual observational learning about nutrition
[15] Lightweight visual data analysis on mobile devices - Providing self-monitoring feedback
[16] Effect of different cooking methods on vegetable oxalate content
[17] Deep understanding of cooking procedure for cross-modal recipe retrieval
[18] Deep-based Ingredient Recognition for Cooking Recipe Retrieval
[19] XGBoost: A Scalable Tree Boosting System
[20] Semi-Automated Tracking: A Balanced Approach for Self-Monitoring Applications
[21] Characterizing Visualization Insights from Quantified Selfers' Personal Data Presentations
[22] Understanding self-reflection: How people reflect on personal data through visual data exploration. Pervasive Health
[23] When Personal Tracking Becomes Social: Examining the Use of Instagram for Healthy Eating
[24] Identifying and Planning for Individualized Change: Patient-Provider Collaboration Using Lightweight Food Diaries in Healthy Eating and Irritable Bowel Syndrome
[25] Boundary negotiating artifacts in personal informatics: Patient-provider collaboration with patient-generated data
[26] No longer wearing: Investigating the abandonment of personal health-tracking technologies on craigslist
[27] Active Learning with Statistical Models
[28] Goal-setting Considerations for Persuasive Technologies That Encourage Physical Activity
[29] Activity Sensing in the Wild: A Field Trial of Ubifit Garden
[30] Rethinking the mobile food journal: Exploring opportunities for lightweight photo-based capture
[31] Barriers and negative nudges: Exploring challenges in food journaling
[32] The Effects of Source Expertise and Feedback Valence on Intrinsic Motivation
[33] Explainable AI Meets Persuasiveness: Translating Reasoning Results Into Behavioral Change Advice
[34] Image-based food calorie estimation using knowledge on food categories, ingredients and cooking directions
[35] Crumbs: Lightweight daily food challenges to promote engagement and mindfulness
[36] Taming Data Complexity in Lifelogs: Exploring Visual Cuts of Personal Informatics Data
[37] Opportunities and challenges for long-term tracking
[38] The law of attrition
[39] Single-View Food Portion Estimation Based on Geometric Models
[40] UbiGreen: Investigating a Mobile Tool for Tracking and Supporting Green Transportation Habits
[41] Recognition of cooking activities through air quality sensor data for supporting food journaling
[42] The Discovery of grounded theory: strategies for qualitative research
[43] Universal methods of design: 100 ways to research complex problems, develop innovative ideas, and design effective solutions
[44] Multi-Task Image-Based Dietary Assessment for Food Recognition and Portion Size Estimation
[45] Food, Not Nutrients, Is the Fundamental Unit in Nutrition
[46] Buffalo meat composition as affected by different cooking methods
[47] Foundations for Systematic Evaluation and Benchmarking of a Mobile Food Logger in a Large-scale Nutrition Study
[48] Food for Thought: The Impact of m-Health Enabled Interventions on Eating Behavior
[49] Lullaby: A Capture & Access System for Understanding the Sleep Environment
[50] Reflection Companion: A Conversational System for Engaging Users in Reflection on Physical Activity
[51] A Stage-Based Model of Personal Informatics Systems
[52] Using Context to Reveal Factors That Affect Physical Activity
[53] Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity
[54] Trade-off between automation and accuracy in mobile photo recognition food logging
[55] How Does a Nation Walk? Interpreting Large-Scale Step Count Activity with Weekly Streak Patterns
[56] Assessing Demand for Intelligibility in Context-Aware Applications
[57] To do or not to do: Using positive and negative role models to harness motivation
[58] TableChat: Mobile Food Journaling to Facilitate Family Support for Healthy Eating
[59] A Unified Approach to Interpreting Model Predictions
[60] FoodScrap: Promoting Rich Data Capture and Reflective Food Journaling Through Speech Input
[61] Co-Designing Food Trackers with Dietitians: Identifying Design Opportunities for Food Tracker Customization
[62] Krause's food & the nutrition care process-e-book
[63] One More Bite? Inferring Food Consumption Level of College Students Using Smartphone Sensing and Self-Reports
[64] Automated estimation of food type and amount consumed from body-worn audio and motion sensors
[65] From Reflection to Action: Combining Machine Learning with Expert Knowledge for Nutrition Goal Recommendations
[66] Why are Indians more prone to diabetes?
[67] What is the sense of agency and why does it matter?
[68] Im2Calories: Towards an automated mobile vision food diary
[69] Sodium intake and risk of death from stroke in Japanese men and women
[70] Just-in-Time Adaptive Interventions (JITAIs) in Mobile Health: Key Components and Design Principles for Ongoing Health Behavior Support
[71] Active learning - A cultural change needed in teacher education and schools. Teaching and Teacher Education
[72] PlateMate: Crowdsourcing nutrition analysis from food photographs
[73] Hierarchical Multi-Task Learning for Healthy Drink Classification
[74] Biases in Food Photo Taking Behavior
[75] Measuring agreement on set-valued items (MASI) for semantic and pragmatic annotation
[76] MyBehavior: automatic personalized health feedback from user behaviors and preferences using smartphones
[77] Anchors: High-Precision Model-Agnostic Explanations
[78] Using decision-theoretic experience sampling to build personalized mobile phone interruption models
[79] FoodAI: Food image recognition via deep learning for smart food logging
[80] Supporting patient-provider collaboration to identify individual triggers using food and symptom journals
[81] Reflections of everyday activities in spending data
[82] Modeling health behavior change: How to predict and modify the adoption and maintenance of health behaviors
[83] Investigating Preferred Food Description Practices in Digital Food Journaling
[84] Harnessing long term physical activity data - how long-term trackers use data and how an adherence-based interface supports new insights
[85] Predicting "About-to-Eat" Moments for Just-in-Time Eating Intervention
[86] 3 Types of Cooking Methods and the Foods That Love Them
[87] A Practical Approach for Recognizing Eating Moments with Wrist-mounted Inertial Sensing
[88] Shifting to Virtual CBPR Protocols in the Time of Corona Virus/COVID-19
[89] Designing Theory-Driven User-Centric Explainable AI
[90] "Fingerprints": Detecting meaningful moments for mobile health intervention
[91] NeckSense: A Multi-Sensor Necklace for Detecting Eating Activities in Free-Living Conditions

Per-Meal Survey

Figure 14: Example per-photo app feedback. Participants were presented with feedback information regarding the meal photo indicated with [App]. In Manual mode, all information is blank or unselected and users have to fill it out (shown here); in Auto mode, all information is pre-filled and users can edit it.

The features used in the informativeness prediction model include current-meal features and historical temporal features. Current-meal features include:
- Whether the current meal has meat/fish/poultry (none, contains).
- Whether the current meal has fruits (none, contains).
- Whether the current meal has dairy (none, contains).
- Number of food groups (grains, vegetables, meat/fish/poultry, fruit, dairy) the meal has (1 to 5).
- Whether the current meal is, or has a part, cooked with baking (not, has).

Historical temporal features include:
- Meal Macros (Calorie level) : Highest[Prev3-Current]. Highest meal calorie level in previous 3 to current meals.
- Meal Macros (Protein level) : Highest[Prev3-Current]. Highest meal protein level in previous 3 to current meals.
- Meal Macros (Fat level) : Highest[Prev3-Current]. Highest meal fat level in previous 3 to current meals.
- Meal Macros (Calorie level) : Change[Prev1-Current]. Change in calorie level (unchanged, decrease, increase) from the previous meal to the current meal.
- Meal Macros (Fat level) : Change[Prev2-Current]. Change in fat level (unchanged, decrease, increase) from the average of the previous 2 meals to the current meal.
- Change in presence of vegetables from the average of the previous 2 meals to the current meal.
- Change in presence of vegetables (unchanged, decrease, increase) from the previous meal of the same type (breakfast, lunch, dinner) to the current meal.
- Meal Ingredients Count : Highest[Prev2-Current]. Highest meal ingredient count in previous 2 to current meals.
- Average # meals with microwave cooking in previous and current meals.
- Meal Cooking (Microwaved) : Mean. Average # meals with microwave cooking in previous 3 to current meals.
- Meal Cooking (Pan/Air Fried) : Mean. Average # meals with pan/air fried cooking in previous 3 to current meals.
- Meal Cooking Method (Baked) : SD[Prev2-Current]. Standard deviation of # meals with baked cooking in previous 2 to current meals.
- Standard deviation of # meals with deep fried cooking in previous 2 to current meals.
- Meal Cooking (Raw) : SD[Prev3-Current]. Standard deviation of # meals with raw food in previous 3 to current meals.
- Meal Cooking (Steamed) : Trend.
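To make the temporal feature definitions above concrete, here is a minimal pandas sketch that computes three of them; the dataframe layout and column names are our assumptions, not the paper's code.

```python
import numpy as np
import pandas as pd

def add_history_features(df: pd.DataFrame) -> pd.DataFrame:
    """df: one row per logged meal in time order, with hypothetical columns
    'calorie_level' (ordinal) and 'deep_fried' (0/1)."""
    out = df.copy()
    # Highest[Prev3-Current]: max over the previous 3 meals plus the current one.
    out["calorie_highest_prev3"] = df["calorie_level"].rolling(4, min_periods=1).max()
    # Change[Prev1-Current]: direction of change from the previous meal.
    out["calorie_change_prev1"] = np.sign(df["calorie_level"].diff()).map(
        {-1.0: "decrease", 0.0: "unchanged", 1.0: "increase"})
    # SD[Prev2-Current]: dispersion over the previous 2 meals plus the current one.
    out["deep_fried_sd_prev2"] = df["deep_fried"].rolling(3, min_periods=2).std()
    return out

meals = pd.DataFrame({"calorie_level": [1, 2, 3, 3, 1],
                      "deep_fried":    [0, 0, 1, 0, 0]})
print(add_history_features(meals))
```

This appendix shows the different surveys used in the self-tracking informativeness study.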