key: cord-0475496-cofx7yh9
authors: Yfantidou, Sofia; Sermpezis, Pavlos; Vakali, Athena
title: 14 Years of Self-Tracking Technology for mHealth - Literature Review: Lessons Learnt and the PAST SELF Framework
date: 2021-04-23
journal: nan
DOI: nan
sha: 14d9fd4571d834d16213be9c8848ff447727108a
doc_id: 475496
cord_uid: cofx7yh9

In today's connected society, many people rely on mHealth and self-tracking (ST) technology to help them adopt healthier habits, with a focus on breaking their sedentary lifestyle and staying fit. However, evidence of the effectiveness of such technological interventions is scarce, and there are no standardized methods to evaluate their impact on people's physical activity (PA) and health. This work aims to help ST practitioners and researchers by providing systematic guidelines and a framework for designing and evaluating technological interventions that facilitate health behavior change (HBC) and user engagement (UE), focusing on increasing PA and decreasing sedentariness. To this end, we conduct a literature review of 129 papers published between 2008 and 2022, which identifies the core ST HCI design methods and their efficacy, as well as the most comprehensive list to date of UE evaluation metrics for ST. Based on the review's findings, we propose PAST SELF, a framework to guide the design and evaluation of ST technology that has potential applications in industrial and scientific settings. Finally, to support researchers and practitioners, we complement this paper with an open corpus and an online, adaptive exploration tool for the PAST SELF data.

Nowadays, technology is an inextricable part of our lives that has affected us in multiple ways. Naturally, the field of health and well-being has not been left untouched. Especially when viewed through the lens of health promotion, technology is a double-edged sword. On the one hand, research shows that technology may be detrimental to people's mental health and well-being [78], and it has also contributed to a decline in physical activity (PA) by promoting a more sedentary lifestyle [130]. On the other hand, technological advancements and mHealth have enabled the delivery of individualized health-promoting interventions to large populations through various channels [130]. For instance, electronic health records, remote monitoring, and digital diagnostics are revolutionizing the healthcare domain by enabling more integrated, effective, and faster care even from a distance.

mHealth: the role of PA. Simultaneously, technological innovations aim to tackle the root causes of the population's health problems. According to the World Health Organization (WHO), physical activity is amongst the determinants of good health. Specifically, physical inactivity has been identified by the WHO as the fourth leading risk factor for global mortality, accounting for 6% of deaths globally [135]. At the same time, according to Epstein et al. [44], PA is the most well-studied domain within mHealth, accounting for more than 1 out of 3 publications in the related literature. Hence, the concept of mHealth is inextricably linked to PA and PA-promoting technologies. At the individual level, physically active people enjoy various health benefits, such as improved muscular and cardio-respiratory fitness and lower coronary heart disease rates. At a collective level, more active societies can generate additional environmental and social returns, such as reduced use of fossil fuels, cleaner air, and healthier economies [134]. Nevertheless, in today's society, fewer and fewer people are sufficiently active, with more than a quarter of the global population not meeting the WHO recommendations for PA. To combat this physical inactivity pandemic, international agencies and organizations are launching short-term campaigns, e.g., the United Nations' (UN) PA challenge during the COVID-19 pandemic, or setting long-term goals, e.g., the WHO's Global Action Plan for Physical Activity 2018-2030 [134]. Similarly, technology is trying to address the challenges of this physical inactivity pandemic in multiple ways. For example, social media have provided health experts and aficionados with a platform to share health and PA-related content with billions of users; interactive games, such as Pokémon GO, have managed to lift video game players off their sofas [7]; and activity trackers keep us aware of our PA levels at all times. At the end of the day, though, self-tracking (ST) technology might be the game-changer for health promotion and PA.

mHealth: the role of ST. ST (also referred to as self-monitoring, life-logging, quantified self, and personal informatics) refers to "the practice of gathering data about oneself on a regular basis and then recording and analysing the data to produce statistics and other data (such as images) relating to regular habits, behaviours and feelings" [110, §1]. In the digital world, ST refers to the use of ubiquitous technology (such as wearable or mobile devices and apps) to help users monitor and manage various aspects of their lives, for instance, PA, sleep, and disease. Unlike desktop computing, ubiquitous computing can occur with any device, at any time, in any place, and in any data format across any network. Thus, connected wearable devices are becoming the default medium through which health and ST apps are booming. As a result, there is a shift from accusing technology of promoting a culture of sedentariness to recognizing mHealth's potential for empowering well-being.

To accomplish the mission of assisting users with improving their health outcomes, focusing on PA, ST technology needs to understand the human aspects of interaction with computers. In other words, ST technology needs to achieve two goals: (i) change users' behavior towards healthier habits, referred to as Health Behavior Change (HBC); and (ii) monitor these positive changes over time through sustained User Engagement (UE). To assist in the fight against physical inactivity, ST technology needs to enable people to successfully change their habits and attitudes to adopt better health behaviors (e.g., increase step counts, decrease prolonged sedentariness). While short-term PA changes can be temporarily beneficial, an efficient ST technology should aim for long-term HBC. However, assisting users in their HBC journey requires ways to reliably and accurately measure human behavior and technology usage. Microsoft co-founder and humanitarian Bill Gates (2013) [63] notes that what is often missing is good measurement and a commitment to follow the data: "I have been struck again and again by how important measurement is to improving the human condition". Indeed, designers and researchers need to be able to measure the effectiveness of their interventions to harvest and maintain the full benefits of digital HBC, through successful HCI and UE.

• Systematic Review.

- We explore the basic principles associated with ST technology for HBC (with a focus on PA) and UE and identify key challenges, limitations, and open questions that motivate our work (Section 2).
- We present our methodology for conducting a systematic review of the related literature, based on which we pose this article's research questions (Section 3).
- We analyze included works and present a synthesis of the results related to interventions, experimental setups, theoretical frameworks, and publication details (Section 4).
- We proceed to an in-depth study of the system design elements, namely interface components and system features, and how they affect the effectiveness of ST interventions, as well as the PA- and UE-related evaluation metrics most commonly used in ST research (Section 5).

• PAST SELF.

- We propose a framework to systematically classify and evaluate methods and experimental results in ST technology. Specifically, PAST SELF is a conceptual, prescriptive framework, in the sense that it guides system design and evaluation by transforming raw data into a set of abstractions and guidelines. The PAST SELF framework consists of two components: The Design Component (Section 5.1.6) introduces the Periodic Table of Self-Tracking Design (PAST), and enables ST developers and researchers to identify the most effective software and interface design elements for ST technology based on previous interventions. The Evaluation Component (Section 5.2) provides practitioners with a standardized way of measuring PA and UE to evaluate the effectiveness of their system over time, by introducing the Self-Tracking Evaluation Framework (SELF).
- We make our corpus of primary works, which aggregates detailed information about the methods and metrics used in each of them, as well as numerical results of their experimental findings, publicly available as an open dataset [190]. Moreover, we provide an online, interactive tool for the visualization of the PAST component of our framework, as well as its source code for reusability purposes [174]. We deem that the corpus dataset (which is open to contributions) and the visualization tool can further facilitate ST researchers and practitioners beyond this review's scope.

In this section, we initially provide an overview of the HBC and UE technology aspects, which are identified as crucial aspects for the ST technology's success, as seen in Section 1. We provide a formal background on these aspects valuable for ST practitioners and researchers. Moreover, we discuss relevant past reviews and meta-analyses to reveal critical challenges and limitations, as well as the novelty of the present work.

In the context of public health, HBC refers to "efforts put in place to change people's habits and attitudes, to prevent disease" [135]. HBC efforts can be aimed at different levels, including individual, organizational, community, and population levels, and there exists an interaction between them. HBC programs usually utilize behavioral change theories at different levels. Individual and interpersonal theories are frequently encountered in the field of ST technology, including, but not limited to, the Transtheoretical Model of Health Behavior Change [151] and the Self-Determination Theory [38]. To better understand the theories implemented in ST technology, we provide a qualitative analysis of theoretical frameworks in ST in Section 4.

However, behavioral change theories do not provide specific details on how their theoretical components could be translated into a real-world HBC system, such as ST technology. Thus, the interpretation is up to the ST practitioner. To bridge this gap, Oinas-Kukkonen and Harjumaa [133] have proposed a computer-science-based framework for Persuasive Systems Design (PSD), which is theory-creating by its nature. PSD is widely adopted and appreciated as a model that describes the content and software functionality required in a Behavior Change (BC) product or service. For these reasons, we organize and present the persuasive strategies utilized by the included studies (Sections 5.1.1 to 5.1.5), and we build the PAST component of our framework (Section 5.1.6) based on the generic PSD principles. In particular, the PSD framework defines four categories of strategies, namely, primary task support, dialogue support, system credibility support, and social support; each category comprises seven principles. Table 1 summarizes the PSD model strategies for BC. As a final note, the PSD framework per se provides no information about the effectiveness of each of its elements in achieving HBC, which we discuss below and address in our framework. Specifically, in the following sections, we present and discuss results of previous ST studies on the effectiveness of different system interaction components (Sections 5.1.1 to 5.1.5) and propose a way to formalize the quantification of their efficiency in our PAST SELF Framework (Section 5.1.6).

Table 1. Descriptions of PSD model strategies.

Primary task support
Reduction: The system has to decrease the effort and strain users expend when performing their target behavior by reducing complex behaviors into simple and easy tasks.
Tunneling: The system has to guide users in the attitude change process or experience by providing opportunities for action that bring the user closer to the target behavior.
Tailoring: The system offers tailored information for its user group according to their interests, needs, personality, or other factors related to the user group.
Personalization: The system has to provide personalized content and customized services for users.
Self-monitoring: The system has to give users means to track and monitor their performance, progress, or status in accomplishing their goals.
Simulation: The system needs to give users means for observing and noticing the connection between the cause and effect of their behavior.
Rehearsal: The system must deliver means for rehearsing a target behavior.

Dialogue support
Praise: The system has to deliver praise through images, symbols, words, videos, or sounds as a way to give users feedback on their behavior.
Rewards: The system should offer virtual rewards to give users credit for performing the target behavior.
Reminders: The system has to remind users to perform their target behavior while using the system.
Suggestion: The system has to suggest ways in which users can achieve the target behavior and keep performing it during the use of the system.
Similarity: The system must imitate its users in some particular manner, so that it reminds the users of themselves in a meaningful way.
Liking: The system should be visually attractive and have a look and feel that appeals to its users.
Social role: The system has to adopt a social role by supporting the communication between users and the system's specialists.

System credibility support
Trustworthiness: The system has to give truthful, fair, reasonable, and unbiased information.
Expertise: The system has to offer information displaying experience, knowledge, and competence.
Surface credibility: The system must have a competent look and feel that portrays system credibility based on an initial assessment.
Real-world feel: The system must give information about the organization or the real individuals behind its content and services.
Authority: The system should refer to people in the role of authority.
Third-party endorsements: The system should deliver endorsements from well-known and respected sources.
Verifiability: The system has to give means to investigate the accuracy of the system content through external sources.

Social support
Social learning: The system has to give a user the ability to observe other users and their performance outcomes while they are performing their target behavior.
Social comparison: The system should enable users to compare their performance with other users' performance.
Normative influence: The system has to have a feature for gathering together individuals who have identical objectives and letting them experience norms.
Social facilitation: The system should enable a user to discern other users who are performing the target behavior along with him/her.
Cooperation: The system should offer the opportunity for a user to cooperate with other users to achieve the target behavior goal.
Competition: The system should allow a user to compete with other users. In the competition principle, there is a chance of winning or losing a race.
Recognition: The system has to offer public recognition (e.g., ranking) for users who perform their target behavior.
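To make the PSD taxonomy concrete, the following minimal Python sketch encodes the four categories and their seven principles each as a lookup structure and groups a study's reported features by category. The principle names follow the PSD model of Oinas-Kukkonen and Harjumaa [133]; the names `PSD_MODEL`, `category_of`, and `tag_study` are hypothetical helpers introduced here for illustration and are not part of the PAST SELF tooling.

```python
# Hypothetical in-memory encoding of the PSD taxonomy (4 categories x 7 principles).
# Principle names follow the PSD model [133]; the structure itself is an assumption.
PSD_MODEL = {
    "Primary task support": [
        "Reduction", "Tunneling", "Tailoring", "Personalization",
        "Self-monitoring", "Simulation", "Rehearsal",
    ],
    "Dialogue support": [
        "Praise", "Rewards", "Reminders", "Suggestion",
        "Similarity", "Liking", "Social role",
    ],
    "System credibility support": [
        "Trustworthiness", "Expertise", "Surface credibility", "Real-world feel",
        "Authority", "Third-party endorsements", "Verifiability",
    ],
    "Social support": [
        "Social learning", "Social comparison", "Normative influence",
        "Social facilitation", "Cooperation", "Competition", "Recognition",
    ],
}


def category_of(principle):
    """Return the PSD category of a principle (case-insensitive), or None if unknown."""
    wanted = principle.strip().lower()
    for category, principles in PSD_MODEL.items():
        if wanted in (p.lower() for p in principles):
            return category
    return None


def tag_study(features):
    """Group a study's reported features by PSD category; unknown features are skipped."""
    grouped = {}
    for feature in features:
        category = category_of(feature)
        if category is not None:
            grouped.setdefault(category, []).append(feature)
    return grouped
```

For instance, `tag_study(["Praise", "Goal setting", "Rewards"])` keeps the two dialogue-support principles and drops "Goal setting", which is not a PSD principle name.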

For HBC to translate into population health, it must be maintained over the long run regardless of the use of technological interventions. It is important to note that attrition from a single ST technology does not equal abandonment of the desired behavior. There are two categories of lapses when it comes to ST technology: short-term and long-term lapses, and the latter may lead to attrition [47]. However, the causes behind such lapses are varied, ranging from the high cost of collecting and integrating, or having and sharing data, to changed life circumstances, and accomplished or alternative goals and contexts [45]. Additionally, users might choose to switch between ST systems to match their dynamic needs and health goals; hence abandoning one system does not necessarily mean abandoning ST or the effort for HBC [161]. Nevertheless, in the context of ST technology, HBC systems need to measure, increase, or sustain UE to drive users towards HBC. If abandoned early, ST technology evidently cannot accompany the users in their HBC journey, which may or may not continue independently. A clearer definition of UE will enable us to set the foundations for its quantification.

Attfield et al. [11] define UE as "the quality of the user experience that emphasizes the positive aspects of the interaction, particularly the phenomena associated with wanting to use a technological resource longer and frequently". UE is vital to measure because it quantifies whether the user interacts with the system successfully, which helps to detect and prevent high attrition rates, a common problem of ST technology, as seen in Section 1. We adopt a three-faceted view of UE: the emotional aspect, the cognitive aspect, and the behavioral aspect of UE [98, 144]. The emotional aspect refers to the user's feelings and state of mind regarding the system and is usually measured through self-reports. The cognitive aspect refers to the user's physical reaction to the system (e.g., eye gaze, bodily response). It is usually measured through physiological measurements, such as body temperature and heart rate measurements. Finally, the behavioral aspect refers to the user's behavioral response to the system (e.g., frequency of visits, duration) and is usually measured through analytics, for example, on usage logs. Each aspect captures different characteristics of UE, and a combination of all aspects offers a holistic view. Several UE metrics that fall under these aspects refer mostly to desktop computing or, in some cases, mobile computing, where measurements such as eye-tracking or mouse-tracking are still possible. Hence, in our work, we screen the included studies for elements related to the emotional, cognitive, and behavioral aspects of UE by adapting and refining the generic methodology of Lalmas et al. [98] to the ubiquitous ST technology domain. This refinement leads to the creation of the SELF component of our framework, as discussed in Section 5.2.
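As an illustration of the behavioral aspect of UE, the sketch below derives frequency-of-visit and duration measures from usage logs. It is a minimal example under stated assumptions: the log format (user identifier plus ISO 8601 start and end timestamps) and the function name `behavioral_ue` are hypothetical, and the three measures shown are only examples of behavioral metrics, not the SELF metric set itself.

```python
from collections import defaultdict
from datetime import datetime


def behavioral_ue(sessions):
    """Compute simple behavioral UE measures per user.

    sessions: iterable of (user_id, start_iso, end_iso) tuples, where the
    timestamps are ISO 8601 strings (an assumed, hypothetical log format).
    Returns {user_id: {"visits", "active_days", "mean_duration_min"}}.
    """
    per_user = defaultdict(lambda: {"durations": [], "days": set()})
    for user_id, start_iso, end_iso in sessions:
        start = datetime.fromisoformat(start_iso)
        end = datetime.fromisoformat(end_iso)
        # Session duration in minutes.
        per_user[user_id]["durations"].append((end - start).total_seconds() / 60.0)
        # Calendar days on which the user opened the system at least once.
        per_user[user_id]["days"].add(start.date())
    return {
        user: {
            "visits": len(data["durations"]),
            "active_days": len(data["days"]),
            "mean_duration_min": sum(data["durations"]) / len(data["durations"]),
        }
        for user, data in per_user.items()
    }


# Made-up log entries for illustration only.
example_log = [
    ("u1", "2021-01-01T08:00:00", "2021-01-01T08:10:00"),
    ("u1", "2021-01-01T20:00:00", "2021-01-01T20:20:00"),
    ("u1", "2021-01-02T09:00:00", "2021-01-02T09:30:00"),
    ("u2", "2021-01-01T12:00:00", "2021-01-01T12:05:00"),
]
metrics = behavioral_ue(example_log)
```

Emotional and cognitive measures would require self-report and physiological data, respectively, and are therefore outside the scope of this log-based sketch.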

Various studies have examined and evaluated the effectiveness of ubiquitous interventions for HBC with a focus on reducing sedentary behavior or increasing PA for individuals. Some researchers have conducted reviews that include interventions targeted to samples with specific characteristics, such as age group [65, 179], race [126], mental health issues [127], or prior experience with ST [155]. Others focus on examining the effects of specific behavior change techniques, such as incentives [168], personalization [124], social sharing [46], data summarization [72], or technological advancements [169] on the activity levels of individuals. While these studies provide valuable knowledge for future research in tailored interventions, it is evident that due to the strict inclusion criteria, these reviews suffer from a limited number of primary studies, and their results might not be generalizable to the whole population. At the same time, some reviews do not focus on ubiquitous technology solutions, failing to capture the requirements of designing and evaluating modern ST technology. Also, due to the subject's multidisciplinarity, multiple reviews from different domains (e.g., medicine, psychology) seem to neglect recommendations for designing and evaluating ST technology, thus failing to bridge the gap between theoretical foundations and practice.

On the other hand, reviews that explore the design space for HBC offer insights for extended explorations targeted by our work. For instance, researchers in persuasive computing focus on using ST technology to persuade people to change their health behaviors. Matthews et al. [114] and Aldenaini et al. [3] have conducted systematic reviews of 80 and 20 papers, respectively, to assess the effectiveness of mobile phone-based interventions in encouraging PA and identify research trends in the area. However, they neither assess the effectiveness of individual intervention components nor propose a comprehensive methodology for the overall evaluation of such interventions. To address the first limitation, Aldenaini et al. [4] have expanded their initial work by publishing a second systematic review of 170 papers, where they have evaluated the effectiveness of individual intervention components in promoting PA. Similar to our work, they categorize intervention components under the PSD framework and report the success rates per technique. However, due to the static nature of their report, it is cumbersome for the reader to assess the effectiveness of such components for a distinct sample population, intervention duration, or sample size. Additionally, none of the aforementioned works distills its review into a prescriptive, end-to-end framework, such as PAST SELF, for designing and evaluating ST technology.

To find such frameworks, we need to explore HCI research in the field of ST technologies for HBC. Several HCI user studies have culminated in frameworks, as well as models and guidelines, for designing successful HBC technologies. Specifically, Li et al. [103] have proposed the widely used stage-based model of personal informatics systems composed of five stages (preparation, collection, integration, reflection, and action) that describe the user's experience with ST technology through time. Similarly, Epstein et al. [47] have introduced the lived informatics model of personal informatics by surveying and interviewing past and present trackers of PA, finances, and location regarding their experiences. Both models are fundamental in HCI literature for HBC and provide valuable insights for ST technology design. However, their focus is inherently different from our framework's, which aims to provide researchers and HCI designers with actionable insights on translating theory into practice (prescriptive framework) rather than high-level recommendations (explanatory framework). Also, Elsden et al. [43] have proposed "Quantified Past", a model that seeks to explore how to design for long-term use, while Kumar et al. [97] conducted a literature review on mobile and wearable sensing frameworks for mHealth. All of the frameworks and models mentioned above are no less important than the one proposed in this paper, but they are conceptually different, in the sense that they are the result of empirical HCI research rather than of the systematic review and synthesis of years of literature in the field.

Evidently, though, HCI researchers have also conducted several reviews in the field of ST technology for HBC. Ayobi et al. [12] have published a review of 20 papers, identifying and characterizing three streams of research in personal informatics: psychological, phenomenological, and humanistic. Similarly, in their work, Kersten-van Dijk et al. [79] have reported promising insights and methodological pitfalls drawn from 24 empirical studies utilizing ST technology. While these reviews introduce distinct research directions, they do not provide any actionable information regarding designing and evaluating ST technology. Additionally, they are limited in scope, since they draw their primary studies solely from a computer science digital library (despite the interdisciplinarity of the domain). More recently, Epstein et al. [44] have published an exhaustive mapping review of the personal informatics literature, where they seek to answer questions related, but not limited, to identifying personal informatics sub-domains, tracking motivations, challenges and ethical concerns, and types of research contributions. While this mapping review is a significant contribution to the ST literature, its scope is orthogonal to this systematic review. Mapping reviews aim to summarize the range of findings on a research topic at a high level. By comparison, systematic reviews synthesize and summarize those findings, for example, into comprehensive guidelines or prescriptive frameworks, such as PAST SELF.

Summing up, to the best of our knowledge, there are no systematic reviews that provide an evidence-based, end-to-end solution for designing and evaluating ST technology based on synthesizing, correlating, and adapting past studies' results. Moreover, no review has bridged the gap between HBC and UE quantification in the ST domain as far as we are aware. The limited reviews that refer to UE focus primarily on UE on websites or generic software [164], rather than on HBC technology. Hence, the current work aims to bridge these gaps by following a formal research methodology, which we present in the following section.

The current study follows an established methodology to determine how researchers have approached the design and evaluation of ubiquitous ST technology solutions for HBC and UE. Specifically, we follow a systematic methodology to ensure the quality of included studies and limit the initial number of articles, based on the guidelines of Kitchenham's [87] widely recognized protocol for conducting a systematic review.

Based on this protocol, we initially identified the need for a systematic review. In Section 2.3, we gave an overview of the literature reviews and frameworks surrounding the use of ST technology in HBC interventions. However, none of them focused on this article's objective, namely identifying, categorizing, and presenting best practices for designing and evaluating ST technology for HBC and UE with a focus on PA. At the same time, we could not locate any similar work. Based on the Kitchenham methodology criteria and the research gaps discussed in Sections 1 and 2.3, we specify the following five research questions that drive our review. We believe that these questions provide researchers in the domain or general stakeholders with a comprehensive view of UE and HBC strategies in ubiquitous ST technology, with the ultimate purpose of increasing PA levels.

RQ1 How many research studies exist that address issues related to sustained UE and HBC in ST technology?
RQ2 Which set-ups have been used (e.g., sample size, intervention duration) for HBC and UE experiments?
RQ3 Which are the theoretical frameworks that have been used to increase PA and sustain UE?
RQ4 Which are the most effective HCI design strategies for ST technology (interface and system components and functionalities) that have been used to achieve HBC and sustain UE?
RQ5 How can we measure PA and evaluate UE in ST technology?

To locate the papers that would help us answer the questions above, we chose to perform a broad, automated search in digital libraries, focusing on articles that have been published in journals and conferences for the last 18 years (between 2004 and 2022). Early research works in the field of ubiquitous technology for HBC with a focus on PA (see related reviews [34, 44]) include [32] and UbiFit Garden (2008) [35]. Additionally, commercial wearable technology for ST (e.g., Nike + iPod Sport Kit, Fitbit) emerged in the late 2000s. Hence, our research time range is chosen based on the aforementioned scientific and industrial advancements. As our sources, we utilized the Google Scholar, Scopus, IEEE Xplore, and Web of Science digital libraries for their coverage and accessibility or the quality of their results. To identify appropriate search terms combined with Boolean operators, we followed the guidelines of Spanos and Angelis [166]. According to the authors, "the determination of search terms is an iterative procedure starting with trial searches using different search terms, considering an initial set of articles that is already known to belong to the research field of the systematic review". The procedure of determining search terms ends "when the initial set of already known articles is found by the search". Following this procedure led us to the search query below:

("mobile application" OR "mobile phone" OR smartphone OR "digital coach" OR "digital trainer" OR wearable OR "activity tracker" OR "self-tracking devices" OR smartwatch) AND ("user study" OR "persuasive technology" OR "user engagement" OR "user motivation") AND (fitness OR "physical activity" OR exercise)
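The query above is a conjunction of three OR-groups. As a minimal sketch, it can be assembled programmatically so that terms can be added or removed during the iterative refinement procedure. The group names (`DEVICE_TERMS`, `STUDY_TERMS`, `ACTIVITY_TERMS`) and helper functions are hypothetical labels introduced here, and individual digital libraries may require adapting the syntax.

```python
def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = ['"%s"' % term if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Combine several OR-groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


# Hypothetical group labels; the terms themselves are taken from the query above.
DEVICE_TERMS = [
    "mobile application", "mobile phone", "smartphone", "digital coach",
    "digital trainer", "wearable", "activity tracker",
    "self-tracking devices", "smartwatch",
]
STUDY_TERMS = [
    "user study", "persuasive technology", "user engagement", "user motivation",
]
ACTIVITY_TERMS = ["fitness", "physical activity", "exercise"]

QUERY = build_query(DEVICE_TERMS, STUDY_TERMS, ACTIVITY_TERMS)
```

Keeping the groups as separate lists makes each trial search in the iterative procedure a one-line change.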

To ensure the high quality and relevance of the included papers, we define appropriate inclusion and exclusion criteria which help us determine the final sample of articles:

Inclusion criteria:

(1) Articles published in a peer-reviewed academic journal or proceedings from an international scientific conference (NPR);
(2) Articles published in English (NE);
(3) Articles that include at least one user experiment, such as an intervention, a pilot study, or a longitudinal study (NI);
(4) Articles that utilize at least one ubiquitous device, such as a mobile or a wearable device (NUI);
(5) Articles that include a quantitative assessment of the intervention's effect, either in terms of altered PA levels or user engagement (NM).

Exclusion criteria:

(1) Articles that discuss HBC and UE in different domains, such as Marketing or Behavioral Economics (DD);
(2) Articles that discuss other forms of HBC, such as nutrition monitoring, stress monitoring, or smoking cessation, rather than physical activity (NPAO);
(3) Articles that refer to interventions that utilize a subjective assessment of activity levels, e.g., utilizing surveys measuring the perceived level of activity or manual data entry (NS);
(4) Articles that use out-of-the-box products, without any additional intervention components, solely for monitoring performance (OP);
(5) Articles that do not describe the ubiquitous ST technology's design features in the description of the user experiment (NDF);
(6) Articles published in different outlets but referring to the same user experiment (D).

Finally, the sequential execution of the steps above led to our review's final set of articles. Our search results from various digital libraries are depicted in Figure 2. Overall, we screened 16924 articles after duplicate elimination, of which we removed 374 based on date criteria (earlier than 2004), 15112 based on the title, and 862 based on the abstract. After carefully studying the remaining 576 articles, we excluded 447 articles based on our inclusion/exclusion criteria. Hence, 129 articles formed our final pool. We present the data features we extracted from the included articles and the results of their synthesis in the following sections.
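The screening counts reported above can be checked with a short sequential-subtraction sketch. The stage labels and the function name `screening_flow` are hypothetical; the numbers are those stated in the text.

```python
# Counts taken from the text; stage labels are hypothetical shorthand.
SCREENED = 16924  # articles after duplicate elimination
STAGES = [
    ("date", 374),
    ("title", 15112),
    ("abstract", 862),
    ("inclusion/exclusion criteria", 447),
]


def screening_flow(start, stages):
    """Return [(stage, removed, remaining)] after each sequential exclusion step."""
    remaining = start
    flow = []
    for name, removed in stages:
        if removed > remaining:
            raise ValueError("stage %r removes more articles than remain" % name)
        remaining -= removed
        flow.append((name, removed, remaining))
    return flow


flow = screening_flow(SCREENED, STAGES)
for name, removed, remaining in flow:
    print("%-30s removed %6d, remaining %6d" % (name, removed, remaining))
```

The intermediate value after the abstract stage corresponds to the 576 articles studied in full, and the final value to the 129 included articles.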

In this section, we present our quantitative findings based on the investigated articles. The section provides answers to our research questions RQ1, RQ2, and RQ3; namely, it presents the number and venues of published studies related to sustained UE and HBC in ST technology over the years, the experiment set-ups they have utilized, as well as the theoretical frameworks behind the intervention design.

We first present a general overview of the scientific topics in the field of ST technology for HBC in Figure 3. The depicted word cloud is produced from the keywords in our article pool. It illustrates topics gathering scientific interest in the domain of ST technology for HBC. Not surprisingly, "physical activity", "behavior change", and "mHealth" are some of the main topics of interest. It also highlights some of the design techniques utilized for HBC interventions (e.g., gamification, social support, competition) and the sampled populations' characteristics (e.g., adolescents, type-2 diabetes, obesity). Finally, it verifies the interdisciplinarity of the field, as discussed in previous sections, with terms originating from Behavioral Economics, Medicine, Psychology, and Computer Science.

Regarding our research questions, we present the results in response to RQ1 in Figures 4 and 5, which summarize the number of publications per year related to ST and HBC technology and the number of publications per publisher or conference organizer, respectively. User interventions utilizing commercial or custom-made ST technology for HBC first appeared in 2008, which is reasonable since ST technology became commercially available in the late 2000s. Since then, the number of related publications has increased steadily, more than doubling in three years (between 2014 and 2017). Note that Figure 4 is only based upon our final article pool (129 articles), but our exploration of the non-eligible article pool (447 articles) follows a similar trend. It is important to note that there is a significant body of work in the field prior to 2008, which refers to artifact contributions or experimental technology that led to the development of advanced interventions later on. While these works are ground-breaking in the field, they are out of the scope of this literature review, which focuses on user interventions for HBC and UE, their components, and reported results.

Key Finding: ST technology for HBC research is still in its infancy, but with growing interest over the years.

Key Finding: The interdisciplinarity of the domain is evident from its main publishing venues; medical and medical informatics journals hold the first spots for journals, while computer science conferences are the most popular conference venues.

Figure 6 shows that more than three-quarters of all investigated studies have a sample size of fewer than 100 subjects, while more than a third have a sample size of fewer than 20. Similarly, in Figure 7, we can see that 4 out of 5 interventions last less than three months, while almost half last less than a month. While such sample sizes and durations may be understandable in an experimental setting, small samples and short durations can negatively affect the generalizability of interventions' results [50]. Hence, these observations highlight the need for large-scale experiments and provide a fertile space for future experimentation with larger populations over extended periods.

Key Finding: The majority of interventions utilize a small sample size (< 100 subjects) and short duration (< 3 months), causing generalizability issues for the intervention outcomes and calling for large-scale interventions.

The mean age of the sample population for all studies is 38 years old. The majority of the experiments refer to young adults (22%), i.e., university students, or adults and middle-aged people (48%), while fewer experiments focus on kids (5%) and adolescents (13%) or the elderly (12%). However, the population worldwide is ageing, meaning that we are experiencing a growth in the number and proportion of older people. Hence, ST technology should cater to the needs of this growing market segment, and future studies should focus on exploring the design space of ST devices for the elderly. At the same time, adolescents are active users of technology and also the most inactive population subgroup, with 3 in 4 adolescents (aged 11-17 years) not meeting the global recommendations for PA set by WHO [134]. Future research should be directed towards designing adolescent-oriented ST technology by prioritizing the enhancement of features that they can offer to this user subgroup. Researchers working with sensitive user groups, such as the elderly, kids, and adolescents, should always consider the ethical and societal implications behind technological intervention deployments with these cohorts, as discussed in Section 6.

Key Finding: While research currently focuses on the adult population, ST technology for the elderly or adolescents demonstrates a hidden potential for the future.

Apart from the age criterion, a small percentage of studies have focused on populations with different characteristics regarding gender (<8%), ethnicity (<3%), physical health (<18%), PA level (<8%), employment status (<16%), and mental health (<1%). The most popular ST technologies utilized for the experiments involve Fitbit activity trackers (36 articles), ActiGraph accelerometers (12 articles), and Polar, Omron, and Jawbone activity trackers (4 articles each). Whenever an intervention focused on mobile ST technology, it usually utilized the mobile's built-in sensors and a custom-made application. Far fewer interventions incorporate powerful out-of-the-box APIs, such as the Google Fit SDK or Apple's Core Motion API. We presume that two reasons have contributed to this situation. First, many of the researchers behind ST for HBC interventions do not have a computer science background and potentially lack the skill set to develop more technologically advanced interventions. More importantly, though, many of these technologies have only been made available recently and are still in development. For example, Apple's Core Motion API was only released in 2013, meaning that researchers conducting interventions before then had to develop every component from scratch. Nowadays, such APIs help developers create ST technology interventions by giving them easy access to sensor data and users' real-time and historical PA data and by integrating with popular ST devices.

Key Finding: There is an abundance of existing, out-of-the-box APIs and ML libraries that can facilitate and enhance ST interventions development. However, up to now, studies have been limited to exploiting only a fraction of these capabilities.

Figure 8 gives us an answer to RQ3; namely, it outlines the theoretical frameworks utilized to support HBC interventions. Notice that almost half of the investigated articles (specifically 48%) either did not utilize any theoretical framework or did not clearly identify one in their full text, limiting their potential impact. This absence is partly driven by the plethora of overlapping behavior change theories and related strategies and the lack of domain expertise and interdisciplinary research [145]. However, BC theories can positively affect ST technology by informing design, guiding evaluation, and inspiring alternative experimental designs [74]. From those articles that did mention a psychological theory, we can see that the four most popular theories include the Social Cognitive Theory (21 articles) [157], the Behaviour Change Technique Taxonomy [119] and the Self-Determination Theory [38] (12 articles each), and the Transtheoretical Model (11 articles) [151]. While these theories mostly explain human behavior rather than provide methods for designing and evaluating ST technology, they work as foundations behind ST technology's proposed interventions. The PAST SELF framework considers several interventions, backed by different theories, bringing their insights together to provide a common framework for designing and evaluating ST technology for HBC and UE.

Key Finding: There is no common framework for ST interventions. Also, almost half of the investigated articles were not based on any theoretical framework or did not clearly identify one in their full text.

To sum up, our key findings showcase that ST technology research for HBC is still in its infancy with growing interest from various disciplines (RQ1). It is a field with several unexplored facets with great potential for research, e.g., conducting experiments with larger sample sizes, longer duration, and varied population characteristics (RQ2). Finally, the lack of utilization of development tools and theoretical frameworks (RQ3) highlights the need for a standardized, robust, and extensible framework for the design and evaluation of ST systems for HBC and UE; this is the gap that the PAST SELF framework we propose (Section 5) aims to fill.

In Section 4, we offer an evidence-based response to three of our research questions, RQ1, RQ2, and RQ3. This section responds to RQ4 and RQ5 through the thorough study of the 129 investigated articles and the synthesis of the outcomes towards establishing a systematic framework. In the following subsections, we present an analysis of the most common HCI design (Section 5.1) and evaluation (Section 5.2) techniques encountered in the included articles, providing answers to RQ4 and RQ5, respectively.

Additionally, we bring together past results and current practices in the form of the PAST SELF Framework. PAST SELF is an evidence-based framework for identifying, categorizing, and presenting ST technology's most common design and evaluation elements for HBC and UE. Specifically, in Section 5.1.6 we propose the design component of the framework, the Periodic Table of Self-Tracking Design (PAST), and in Section 5.2 we propose the evaluation component, the Self-Tracking Evaluation Framework (SELF). Finally, in Section 5.3, we demonstrate how researchers and practitioners in the field can apply and benefit from the PAST SELF framework through a use case scenario. PAST SELF aims to help researchers learn from previous experiences, identify best practices, facilitate and accelerate their current research, and avoid common pitfalls.

This section discusses how the general-purpose PSD persuasive techniques have been used in practice in prior research to design ST interfaces and system features. To achieve this and provide an answer to RQ4, we manually extract PSD elements from the investigated papers and discuss them in Sections 5.1.1 to 5.1.5. Finally, in Section 5.1.6, we bring together the knowledge extracted from the investigated literature, presenting the PAST Component of our PAST SELF Framework, which gives us an evidence-based insight into the effectiveness of different persuasive techniques for designing successful ST technology.

In the PSD Framework, Primary Task Support includes seven persuasive techniques: Self-monitoring, Personalization, Reduction, Tailoring, Tunneling, Simulation, and Rehearsal (See Table 1 ).

In the investigated papers, self-monitoring usually takes the form of online visual, textual, audio or haptic feedback on a mobile application dashboard, a watch face or application, a phone background, a push notification, a web application dashboard, a public display, or another wearable device interface. It can also take the form of offline feedback, such as e-mail or SMS communication. The feedback can be both real-time and analytical, even though the latter is less common in the investigated literature. Real-time, textual performance feedback, such as step count, is the most common information provided to the users as an indication of their daily PA. It comes as no surprise that self-monitoring is the most common PSD technique encountered in the literature, since recording and visualizing the users' data is by definition the purpose of ST technology.

Personalization is implemented as personalized goals based on past PA [2, 9, 15, 16, 25, 28, 30, 37, 41, 68, 69, 71, 76, 85, 91-93, 99, 117, 120, 128, 146, 165, 186, 196, 197] , automatic identification of psychological state [102] , personalized exercise plans based on user goals, preferences or past behaviors [19, 22, 76, 100, 143, 152, 153, 167] , or generally personalized content based on user's habits, time of day, location, weather, or preferences [1, 23, 39, 64, 66, 89, 118, 153, 159, 176, 185, 191] . Specifically, to set new, personalized goals, the system learns about the user's activity levels based on their past PA (usually daily step counts) and then recommends a more challenging daily step goal. The goal has to be challenging enough to motivate the user to increase their PA but not unrealistically demanding to avoid demotivating the user. Also, it has to be in line with the recommended guidelines. Similarly, personalized content, such as exercise plans, aims to contextualize PA in the user's lifestyle to make it easier to perform the target behavior, namely sufficient PA. Promoting active transportation, e.g., walking to work over driving, is an example of a personalized, contextualized suggestion. Note that, according to previous research [104] , associations between physical activity and contextual information help users become more aware of the factors that affect their PA levels. Finally, the application of Machine Learning (ML) for ST personalization is very promising and still at its early stages. Different users have different patterns of usage [156] and future work should focus more on learning the users' needs, wants, pain points, and habits based on their past data to automatically adjust the systems' features to the individual user and maximize its results.
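The goal-personalization logic described above can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed systems: the function name `recommend_step_goal`, the 10% increment, the use of the median as the baseline, and the 10,000-step guideline value are all assumptions chosen for illustration.

```python
from statistics import median


def recommend_step_goal(past_daily_steps, increase=0.10, guideline=10_000):
    """Suggest a next daily step goal from recent daily step counts.

    All defaults are illustrative assumptions, not values reported
    in the reviewed papers.
    """
    if not past_daily_steps:
        # No history yet: fall back to the guideline value.
        return guideline
    # The median is less sensitive to one unusually active or lazy day.
    baseline = median(past_daily_steps)
    if baseline >= guideline:
        # The user already meets the guideline: keep the current level.
        return round(baseline)
    # Slightly more challenging than the baseline, capped at the guideline
    # so the goal stays realistic and in line with recommendations.
    return round(min(baseline * (1 + increase), guideline))
```

For example, a user whose recent days were 5,000, 6,000, and 7,000 steps would be offered a goal of 6,600 steps, whereas a user close to the guideline would simply be offered the guideline itself.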

Reduction is achieved through multiple simultaneous goals [26, 27, 33, 55, 66, 67, 75, 96] , graded goals or difficulty levels [17, 27, 58, 61, 75, 85, 92, 100, 125, 129, 139, 152, 171, 172, 195] , contextualization of PA into everyday life [54, 67, 112, 128, 185, 187] , exercise planning functionality or guidance [5, 9, 17, 19, 23, 30, 59, 75, 76, 89, 101, 118, 129, 143, 148, 159, 167] , social network integration [2, 30] , or feedback on progress towards goal accomplishment [8, 42, 48, 75, 83, 102, 111, 125, 176, 194] . The idea behind reduction is that the complexity of performing PA (time, physical and mental effort, and persistence required) can be decreased by modularizing PA and providing users with step-by-step guidance along the way. For instance, by providing graded, simultaneous step goals, the system gives even less active users (e.g., 6000 daily steps) the satisfaction of goal accomplishment to motivate them to perform better in the future (e.g., 8000 daily steps or even the recommended 10000 steps). On a different note, systems that offer on demand exercise plans of different duration, PA type and PA level, facilitate even the most novice users in the process of performing PA.
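As a rough illustration of graded, simultaneous step goals, the following Python sketch reports the highest goal a user has already met and the next one to aim for. The function name and the return format are hypothetical; the three goal levels simply reuse the example values mentioned above.

```python
def graded_goal_status(steps_today, graded_goals=(6000, 8000, 10000)):
    """Return (highest goal met, next goal to reach) for today's steps.

    Either element is None when there is no such goal. The default
    levels mirror the example in the text and are not prescriptive.
    """
    ordered = sorted(graded_goals)
    reached = [goal for goal in ordered if steps_today >= goal]
    pending = [goal for goal in ordered if steps_today < goal]
    highest_met = reached[-1] if reached else None
    next_goal = pending[0] if pending else None
    return highest_met, next_goal
```

A user at 6,500 steps would thus see the 6,000-step goal as accomplished and the 8,000-step goal as the next target, which is the sense of partial accomplishment that graded goals aim to provide.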

Tailoring takes the form of personalized information based on psychological profiles [1, 25, 73, 102, 120, 128, 192] , gender [73, 162, 165] , age [5, 15, 59] , professional occupation [120, 165, 176] , interests [76] , health status [58, 64, 148] , or season [94] , as well as tailored motivational messages [92, 136] , and exercise plans based on user experience [69, 129, 143] or established guidelines [180] . For example, cancer survivors require different PA content than healthy adults, elderly with limited tech skills require different interfaces than the tech-savvy kids of today, exercise buffs need more demanding exercise plans than novices, and people of different ethnicities might have different habits in terms of PA that the system should take into consideration. Note that Tailoring differs from Personalization in that it caters for and adapts to the preferences of a user subgroup rather than an individual user. Similar to Personalization, Tailoring ST technology has great potential in the age of ML. Future researchers can train inclusive models for sample populations with different characteristics to build more inclusive ST technology that caters to different user groups' needs.

Tunneling is realized through information provision and PA recommendation pairing [52, 99, 115], reminders and goal-setting pairing [2, 108, 129, 147, 159], reminders and PA recommendation pairing [187], goal-setting and feedback loop pairing [129], context or goal identification and PA recommendation pairing [100, 153], gradual goal adjustments [16, 41, 52, 167], graded rewards availability [26, 195], or step-by-step guided PA routines [19, 23, 141]. The concept behind tunneling is transforming PA, or any related target behavior, into a step-by-step process that the user can follow. For example, based on the investigated literature, users are unlikely to set their own goals autonomously. To get the users into the habit of setting goals, some systems send goal-setting reminders, potentially accompanied by new goal recommendations. Similarly, to recommend appropriate PA programs to the user, some systems ask users about their goals and provide PA suggestions that make it possible to achieve these goals. These are examples of how a user can achieve a target behavior, e.g., set up a step goal or follow a PA recommendation, step-by-step with the system's help.

Simulation is implemented as cause-and-effect metaphors of growing gardens [33, 68, 76] that bloom with PA and wither with inactivity, pet avatars [85, 86, 141, 171] and human avatars [118, 162] that are happy and thriving when the user is active and sad otherwise, and other virtual experiences [58, 69, 129, 172] that are affected by real-world PA. Moreover, it is achieved through the presentation of expected health outcomes (short-term or long-term) based on current PA levels [23, 66, 67, 152] , or via connecting locations or hours of the day with levels of PA or sedentariness [10, 176] . The concept behind the investigated papers that utilize simulation is that the user will develop an emotional bond and a sense of responsibility about the virtual garden or pet, which will drive them to perform more PA to better care for them.

Finally, Rehearsal is achieved through demonstration of exercises via short videos [5, 90, 91] or animated icons [83, 101]. This technique can be beneficial for novice and less tech-savvy target groups, such as the elderly, but advanced or sporty users might be indifferent to it.

In the PSD Framework, Dialogue includes seven persuasive techniques: Rewards, Suggestion, Reminders, Similarity, Praise, Liking, and Social Role (See Table 1).

In the included papers, Rewards are implemented as free game commodities [14, 58, 61, 76, 102, 125, 172, 182] , congratulatory feedback for goal achievement or breaking sedentariness bouts [2, 18, 24, 39, 82, 85, 99, 101, 105, 128, 146, 183, 196, 197] , additional system functionality [85, 86] , badges or points [8-10, 17, 23, 26, 27, 31, 36, 40, 58, 59, 69, 77, 90, 93, 102, 117, 121, 132, 139, 146, 171, 172, 187, 195, 197] , raffle tickets [20, 83, 138, 140, 186] , and material [36, 139] or financial incentives [29, 30, 36, 42, 53, 71, 96, 113, 116] . Based on our article pool, rewards is the most ambiguous PSD technique. Various large-scale studies with financial incentives have reported statistically non-significant improvements in the user's PA, while studies utilizing virtual rewards, such as points and badges have reported mixed results. Hence, this technique should be used with caution in future work and always in combination with others. ST technologies should promote a long-term usage beyond rewards, as corroborated by related work [154] . Nevertheless, gained rewards create stored value for the user of a ST system, increasing the need to stay engaged [49] .

Suggestion takes the form of PA recommendations [1, 16, 19, 20, 28, 30, 36, 52, 58, 59, 66, 67, 89-91, 94, 96, 102, 120, 128, 129, 143, 148, 149, 152, 153, 165, 175, 176, 187, 195], exercise plans and guidance [5, 100, 101, 129, 159], break and stretching suggestions [10, 18, 23, 54, 81, 84, 177], goal adjustment recommendation [2, 15, 41, 75, 120, 146], behavior change tips [41, 85], emergency services communication in case of injury [136], or generally healthy living and self-care recommendations [73, 81, 167]. In the majority of the investigated articles, suggestions were limited to scripted tips and PA recommendations by the research team or PA experts. However, this one-size-fits-all approach is outdated in the era of ML and personalization, where suggestions can be micro-targeted and tailored to the needs of a specific user or group to increase their effectiveness.

Reminders are implemented via automated phone calls [52], text messages [1, 25, 64, 115, 117, 147, 170, 175, 177, 184], e-mails [20, 89, 159], social media notifications [30], random or just-in-time notifications [2, 5, 23, 31, 40, 66, 67, 77, 89, 92, 111, 129, 194, 196], watch reminders [83, 101], and visual, audio or haptic prompts and in-app reminders [10, 18, 54, 108, 142]. Their purpose includes reminding users of goal-setting, wearable wear time and activity logging, application usage instructions, sedentariness levels, break times, and current PA levels. However, reminders can be a double-edged sword. If not sent sparingly, they can be ignored or cause annoyance to the user. To be effective, a reminder should be sent at a time when a user is ready to receive it. Such reminders fall under the umbrella of Just-in-Time Adaptive Interventions (JITAIs), a field that has gathered scientific attention [70] and should be the focus of future work in the field of ST.
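To make the idea of sparing, well-timed reminders more concrete, the following Python sketch shows a simple rule-based gate. It is a deliberately naive stand-in rather than a JITAI from the literature: the function name, the two-hour minimum gap, the 60-minute sedentary threshold, and the quiet hours are all hypothetical parameters.

```python
from datetime import datetime, timedelta


def should_send_reminder(now, last_sent, sedentary_minutes,
                         min_gap=timedelta(hours=2),
                         sedentary_threshold=60,
                         quiet_start=22, quiet_end=8):
    """Decide whether a sedentariness reminder should be sent right now.

    Every threshold here is an illustrative assumption.
    """
    # Avoid disturbing the user late at night or early in the morning.
    in_quiet_hours = now.hour >= quiet_start or now.hour < quiet_end
    # Rate-limit reminders so that they do not become an annoyance.
    too_soon = last_sent is not None and (now - last_sent) < min_gap
    # Only remind when the user has actually been sedentary for a while.
    long_bout = sedentary_minutes >= sedentary_threshold
    return long_bout and not in_quiet_hours and not too_soon
```

A real JITAI would replace these fixed rules with a model of the user's receptivity; the sketch only shows where such a decision would sit in the notification pipeline.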

Similarity is achieved through the use of embodied conversational agents [16] , human avatars [5, 58, 59, 83, 85, 86, 118] , or the utilization of the user's physical location [102] . Human-like representations, such as avatars, have the potential to provide a user experience that resembles human-to-human interaction, triggering social responses from the users. In other words, a human-controlled avatar yields social presence, namely the perception that another individual is in the user's environment [109] . At the same time, the perceived "human control" that avatars seem to exert elicits stronger behavioral responses for users than the perception of machine control [56] .

Praise is achieved through motivational messages [1, 5, 16, 31, 37, 39, 52, 81, 83, 92, 112, 117, 128, 129, 136], feedback depending on goal achievement [58, 82, 94, 102, 123, 139], or happy icons and emojis [141, 181]. Similarly to reminders, praise should not be too intrusive or repetitive, as it can become an annoyance to the user.

Liking takes the form of stylized, interactive displays [2, 33, 171] , customizable displays [195] , enhanced usability [31, 143] , imaginary scenery interfaces [58, 68, 76, 77, 187] , and user-tailored interface design [59, 102, 128] . Accessible and usable interfaces are of vital importance in the field of ST. Frequently, users access the applications while performing PA, e.g., at the gym or while running, which means that their design should be easy and intuitive. Complicated or confusing interfaces can be unappealing and may soon lose user interest.

Finally, Social Role refers to utilizing the system as an accountability mechanism, such as a virtual or human coach or physician [19, 25, 52, 99, 167, 177] , a peer leader [129] , or a virtual pet [17, 77, 86, 141, 171, 195] . Based on the model of supportive accountability [122] , accountability mechanisms, such as external human coaches, can foster motivation, encouragement, and ultimately adherence and UE.

In the PSD Framework, Social Support includes seven persuasive techniques: Social Comparison, Social Learning, Cooperation, Competition, Recognition, Social Facilitation, and Normative Influence (See Table 1).

Social Comparison takes the form of public performance displays [8, 48, 106] , virtual and real-world competitors [9, 14, 102] , as well as performance sharing and comparison [2, 13, 26, 27, 36, 55, 80, 85, 120, 129, 131, 132, 146, 170, 177, 181, 194] . Users who are exposed to social comparison information desire to avoid the stigma of unhealthy behavior, such as decreased PA, and hence consider adapting their behavior to the majority rule [193] .

Cooperation is usually implemented through user team-ups (dyads or groups) [17, 20, 22, 26-28, 36, 48, 58, 59, 62, 77, 85, 86, 102, 105, 129, 131, 138, 140, 158] . The investigated articles reported better results when the groups consist of friends, colleagues or family members rather than random users. In other words, social connectedness is key for the success of cooperation-based interventions.

Social Learning is achieved via PA-related discussion forums [9, 73, 85, 86, 123, 167, 170] and social network groups [6, 13, 36, 40, 117, 148, 149, 173], instant messaging functionality [2, 26, 27, 30, 58, 62, 105, 123, 167, 170, 194], public profiles [2, 111, 146], multi-player gaming mode [173], real-world support [15, 77], or peers' performance and experiences sharing [158, 178]. By sharing their experiences, users can seek support and motivation and feel less alone in their HBC journey. However, some of the investigated papers report that users with social ties (e.g., colleagues, family members) did not utilize the social functionality as much, but preferred face-to-face communication instead. Hence, this technique's importance may vary depending on the use case.

Competition normally takes the form of individual and group-based PA competitions and challenges [17, 26, 27, 30, 36, 40, 48, 55, 59, 62, 69, 85, 86, 102, 105, 118, 131, 132, 150, 163, 182, 187, 195, 197] . In our article pool, competitions are reported as more effective when they consist of users with similar PA levels. Competitions with large PA differences between participants can be deemed too easy for the more advanced users and unattainable by the less active. Similarly, when it comes to one-to-one competitions, the investigated papers report that they were more effective when the involved users had similar PA levels and limited step differences throughout the day. Hence, the competitors of the user can be either real or contrived to match the user's PA behavior.
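The observation that competitions work better between users with similar PA levels suggests a simple matching step before a challenge starts. The Python sketch below pairs users whose average daily steps are close; the function name, the greedy pairing strategy, and the 1,500-step maximum gap are assumptions made for illustration only.

```python
def pair_competitors(avg_steps_by_user, max_gap=1500):
    """Greedily pair users with similar average daily step counts.

    Returns (pairs, unmatched). Users left unmatched could, for
    instance, be assigned a contrived competitor instead.
    """
    # Sort by activity level so neighbours in the list are most similar.
    ranked = sorted(avg_steps_by_user.items(), key=lambda item: item[1])
    pairs, unmatched = [], []
    i = 0
    while i < len(ranked):
        has_next = i + 1 < len(ranked)
        if has_next and ranked[i + 1][1] - ranked[i][1] <= max_gap:
            pairs.append((ranked[i][0], ranked[i + 1][0]))
            i += 2
        else:
            unmatched.append(ranked[i][0])
            i += 1
    return pairs, unmatched
```

Greedy neighbour pairing is not guaranteed to maximize the number of matched users, but it keeps the step difference within each pair bounded, which is the property the reviewed papers associate with more effective competitions.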

Recognition is implemented through competition leaderboards and podiums [9, 20, 27, 36, 40, 55, 62, 69, 77, 105, 118, 120, 150, 163, 167, 172, 182, 195, 197] , social network posts about winner teams or users [30] , physical awards [132] , and success stories testimonies [5, 20] . Evidently, Recognition is not a standalone intervention component but is usually combined with Competition or Social Comparison.

Social Facilitation examples include social network commenting on PA-related posts [13, 55, 58, 111, 117, 146, 172] , public testimonies [128] , virtual commodity swapping [125] , public participants' lists [69] , and user invitation schemes [102] . Note that Social Facilitation components can help increase UE by increasing user commitment. A user who performs a menial task, such as referring a friend to a fitness app, not only increases the user base, but in reality, invests in the product itself, creating stored value and promoting future use [49] .

Finally, Normative Influence is achieved via public PA-related commitments [128, 139] , presentation of the financial and environmental effects of inactivity [28, 128] , virtual and physical users demonstrating ideal PA behavior [85, 178] , and comparison against PA guidelines [2, 26, 27] and overall users' performance [20, 68, 102, 108, 140, 141, 181, 192] . For example, based on the investigated literature, users think twice before they break PA commitment pledges to family and friends on social media, since this might negatively affect their social image.

In the PSD Framework, System Credibility includes seven persuasive techniques: Authority, Real-world Feel, Expertise, Trustworthiness, Surface Credibility, Third-party Endorsements, and Verifiability (See Table 1). However, in the included papers, we only encounter the first four persuasive techniques.

In this review's article pool, Authority takes the form of external accountability mechanisms, such as human coaches [52, 143] or physicians [5, 91, 112, 167] , guideline recommendation by international organizations (e.g., WHO, US Health Agency) [15, 100, 136, 177, 192] , as well as health organizations and committees behind the app creation [31, 113] . Having high-profile organizations behind a system or functionality gives additional credibility to the system and enables the users to trust it more.

Expertise takes the form of PA-related content curated by domain experts [5, 128] , communication with human coaches and physicians [19, 37, 58, 99] , or tech support portals [102] . Such functionality can be costly and thus is not commonly encountered in practice in scientific interventions.

Trustworthiness is implemented via regular updates [31, 195] , intensive testing and debugging [31] , and data handling based on current regulations (e.g., GDPR) [136, 187] . While we rarely encounter this technique in the included works, system security is fundamental for the trusted use of ST technologies.

Real-world Feel is achieved through counseling services with the researchers [99, 111, 136], app store communication [31], or in-app and website contact forms [59, 91, 196].

Apart from PSD's persuasive techniques, we identify four additional techniques in the investigated papers: Goal-setting, Punishment, General Information, and Variability, which we explain further below.

Goal-setting is implemented as static or dynamic PA goals usually in terms of steps, active minutes or MVPA duration [2, 8, 9, 15, 16, 19-22, 24-31, 33, 37, 39, 41, 42, 48, 55, 58, 59, 64, 66, 67, 69, 71, 75, 76, 85, 86, 91-94, 96, 99-102, 108, 111, 117, 120, 121, 123, 125, 129, 138-141, 143, 146, 147, 159, 165, 167, 171, 175, 176, 180, 183, 185, 187, 195-197]. This technique can overlap with the Personalization technique regarding personalized goals or with the Reduction technique regarding multiple, simultaneous goals. However, it is more generic, also covering the large number of papers that utilize static goals following the international guidelines for PA. Static goals, though, apply a one-size-fits-all approach and cannot adjust to the user's changing needs and wants. Thus, future research should focus on tailoring and personalizing goals to the users' reality, including, but not limited to, habits, physical condition, personal expectations, daily schedule and location.

Punishment takes the form of negative visual or textual feedback for under-performance [83, 94, 101, 141, 181, 192] , and virtual or monetary reward loss for goal accomplishment failure [29, 139] . While most of the investigated papers utilize positive feedback to promote HBC, it is unclear whether positive or negative feedback leads to more favorable BC in an HBC intervention. HBC theories make contradicting predictions regarding the influence of the feedback polarity [95] . In the PAST component, Praise has a higher PAST_score than Punishment based on our article pool results, but either can potentially yield positive results.

General Information provision takes the form of e-mails or notifications that do not necessarily reflect on the user's performance, but rather provide general information and interesting facts regarding PA and HBC [10, 30, 37, 41, 58, 82, 91, 102, 129, 141, 165, 167, 175, 192].

Finally, Variability refers to the system's ability to provide a variable experience to the user through variable rewards [102, 125, 141] or variable game elements, e.g., levels, varying interfaces, and hidden tasks [58, 102, 129]. Practitioners have praised the power of variability in sustaining UE [49], but it is still not fully utilized in ST for HBC research. The limited related papers report neutral results, but future research should focus more on identifying the effect of variability on HBC and UE.

Designing ST products is far from straightforward, which is proven by current ST technology's pitfalls (e.g., dubious effectiveness and high attrition rates) as discussed in Section 1, and the variety of persuasive techniques utilized in practice as demonstrated in Sections 5.1.1 to 5.1.5. The PSD Framework (Section 2.1), which we use to analyze the persuasive techniques applied in the included papers, aims to aid practitioners in this intricate design process by presenting 28 design principles for persuasive system content and functionality (Table 1) . However, PSD attaches the same weight to all 28 principles, ranging from self-monitoring to third-party endorsements. It is evident, though, that some principles (e.g., self-monitoring, reminders, personalization) may bear higher importance than others in ST technology. Here is where the PAST component comes in. Not all techniques are equally important: All design techniques in ST systems can be significant under different settings, and no single technique can guarantee success. However, some techniques are more efficient than others or may have specific characteristics that make them more or less frequently used in practice. To capture the importance of ST design techniques, we devise an evidence-based score for each technique based upon the investigated papers' results. Table 2 provides the necessary notation for the understanding of the score generation formulas. We define a value , for the result of an investigated paper that uses a technique as follows: −1 corresponds to a negative result, −0.5 a partially negative result, 0 a neutral result, 0.5 a partially positive result, and 1 a positive result. A result is considered positive if the respective paper's intervention led to a statistically significant increase in the users' PA. It is considered neutral if the intervention caused no statistical change in the users' PA and negative if it led to a statistically significant decrease in the users' PA. 
Finally, we consider a result partially positive or negative if the paper reported mixed results. We manually perform the numerical coding of the results. Note that if a paper reports positive results ( , = 1) and utilizes two different techniques 1 and 2 , then the positive result will be split between the two techniques, assuming all techniques have equal contribution to the final result; namely, 1, = 0.5 and 2, = 0.5.

Then, we define the efficacy $E_t$ of a technique $t$ as the sum of the coded results of the papers in which the technique appears, divided by the number of these papers, and the frequency $F_t$ of the technique as the number of papers in which it appears, divided by the total number of papers:

$$E_t = \frac{\sum_{p \in P_t} r_{t,p}}{|P_t|}, \qquad F_t = \frac{|P_t|}{|P|},$$

where $P$ is the set of all investigated papers, $P_t \subseteq P$ is the subset of papers that use technique $t$, and $r_{t,p}$ is the coded result of paper $p$ for technique $t$.

The final score for a technique is a combination of its reported efficacy and usage frequency in the papers:

$$\mathrm{PAST\_score}_t = w \cdot E_t + (1 - w) \cdot F_t,$$

where $w \in [0, 1]$ is a weight that can be used to emphasize the efficacy or the frequency metric. We deemed this weighted combination necessary to avoid over-rewarding infrequent techniques for individual positive results or over-rewarding frequent techniques only for their prevalence in the investigated papers. In the remainder, for presentation purposes, we set $w = 0.5$ to achieve a balance between the measured quantities and normalize the PAST_score to the [0, 1] interval.

The Periodic Table of Successful HBC elements: We calculate the PAST_scores for the techniques in the studied papers and create the Periodic Table of Successful Health Behavior Change, as seen in Figure 9. Presenting these elements in a periodic table is human-friendly and easy to memorize. At the same time, it offers a comprehensive and formal classification that describes a design process for user interface and system features.
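As a rough illustration of the score definitions above (per-technique efficacy, frequency, and their weighted combination), the following Python sketch computes scores from a toy set of papers. The function name `past_scores`, the input layout, and the example data are hypothetical. The convex combination `w * efficacy + (1 - w) * frequency` and the min-max rescaling to [0, 1] are assumptions made for this sketch, since the text does not spell out the normalization method.

```python
def past_scores(papers, w=0.5):
    """Compute normalized scores per technique.

    papers: list of dicts, one per paper, mapping technique -> coded result
            share for that paper (already split across techniques).
    w:      weight in [0, 1]; higher values emphasize efficacy over frequency.
    """
    if not papers:
        raise ValueError("at least one paper is required")
    total = len(papers)
    raw = {}
    for technique in sorted({t for paper in papers for t in paper}):
        results = [paper[technique] for paper in papers if technique in paper]
        efficacy = sum(results) / len(results)   # mean coded result
        frequency = len(results) / total         # share of papers using it
        raw[technique] = w * efficacy + (1 - w) * frequency
    # ASSUMPTION: min-max rescaling across techniques maps scores to [0, 1].
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {t: (s - lo) / span if span > 0 else 0.0 for t, s in raw.items()}


# Toy data (made up for illustration, not taken from the review corpus).
example_papers = [
    {"self-monitoring": 0.5, "reminders": 0.5},  # positive result, two techniques
    {"self-monitoring": 1.0},                    # positive result, one technique
    {"reminders": 0.0, "praise": 0.0},           # neutral result, two techniques
]
example_scores = past_scores(example_papers, w=0.5)
for name, score in example_scores.items():
    print(f"{name}: {score:.2f}")
```

With this toy input, the most frequent and most effective technique ends up at 1.0 and the least at 0.0; other normalization choices would change the absolute values but not the ordering.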

Specifically, each element in the table represents a PSD technique (Table 1) or one of the additional persuasive techniques encountered in the investigated papers, namely, goal-setting, punishment, general information provision, and variability (32 elements in total), as discussed in detail in Sections 5.1.1 to 5.1.5. Elements in our periodic table are divided into five categories (Primary Task Support, Dialogue, Social Support, System Credibility, Other), each organized in its own column. Each cell contains an abbreviation for the technique, the technique's name, and the technique's PAST_score. Finally, we normalize the scores to a [+1, +5] interval for the periodic table as follows: scores ∈ [0, 0.2) → +1, scores ∈ [0.2, 0.4) → +2, scores ∈ [0.4, 0.6) → +3, scores ∈ [0.6, 0.8) → +4, and scores ∈ [0.8, 1] → +5. Each technique is color-coded based on its normalized PAST_score, and the PAST_score is presented at the top-right corner of each cell. Similar to the Periodic Table of Elements, PAST aims to provide a comprehensive, evidence-based, and robust standard for designing HBC interventions. Our PAST_score aims to guide creators in the design process of ST technology by helping them prioritize the development of certain ST features and functionalities (based on their expected impact on HBC and UE).

Analysis and Insights: Based on our periodic table, we analyze the reviewed techniques' scores, obtaining insights on their importance. Note that the score ranges between +1 and +5, with higher scores indicating more persuasive power for HBC, e.g., a technique with a score of +5 can be seen as more important than one with a +1. Nevertheless, a combination of different techniques should typically be used in practice for an effective system.
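The mapping from a [0, 1] score to the +1 to +5 levels of the periodic table can be sketched as a simple threshold lookup. The function name `periodic_level` is hypothetical, and placing a score of exactly 1.0 in the top level is an assumption of this sketch.

```python
from bisect import bisect_right

# Lower bounds of levels +2..+5, as listed in the text.
LEVEL_BOUNDS = [0.2, 0.4, 0.6, 0.8]


def periodic_level(score):
    """Map a normalized score in [0, 1] to a level between 1 and 5.

    Each interval is closed on the left: 0.2 maps to +2, 0.4 to +3, and so on.
    ASSUMPTION: a score of exactly 1.0 belongs to the top level (+5).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return bisect_right(LEVEL_BOUNDS, score) + 1


example_levels = [periodic_level(s) for s in (0.0, 0.19, 0.2, 0.55, 0.8, 1.0)]
print(example_levels)  # [1, 1, 2, 3, 5, 5]
```

Using explicit bounds rather than arithmetic on the score avoids floating-point surprises at interval edges.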
First, in the periodic table of Figure 9, we observe that not all categories are equally frequent; Primary Task Support is the most frequent (91% of investigated articles), followed by Dialogue (80%), Social Support (42%), and System Credibility (9%). The disproportionate frequencies already indicate the relative importance of certain feature categories (e.g., Primary Task Support) over others serving a complementary role (e.g., System Credibility).

Within the Primary Task Support category, Self-monitoring is the most effective technique (+5), followed by Personalization (+4), Reduction (+3), Tailoring and Tunneling (+2), and finally, Simulation and Rehearsal (+1). In the Dialogue category, Rewards are the most effective persuasive technique (+5), followed by Reminders (+4), Suggestion and Similarity (+3), and finally, Praise, Liking, and Social Role (+2). In the Social Support category, Social Comparison, Social Learning, and Cooperation share similar weight (+3), followed by Competition, Recognition, Social Facilitation, and Normative Influence (+2). In the System Credibility category, Authority is the most effective technique (+2), even though it still holds limited persuasive power, followed by Real-world Feel, Expertise, and Trustworthiness (+1). In contrast, the Surface Credibility, Third-party Endorsements, and Verifiability techniques are not explicitly mentioned in any investigated article. Note that the very low frequency of these techniques in the investigated papers and the lack of research on the persuasive power of system credibility features might be responsible for the low importance of this category's techniques. Finally, among the techniques that do not fall under the PSD framework, Goal-setting is the most effective (+5), followed by Punishment (+2), General Information provision (+1), and Variability (+1).

For presentation purposes, in our discussion, we use a weight of $w = 0.5$ for the PAST_score. However, the PAST_score weight can be adjusted according to application needs, emphasizing either the efficacy or the frequency aspect. To enable researchers and practitioners to adjust the PAST component according to their application requirements, we offer an online, interactive exploration tool for weight adjustment, filtering, and real-time visualization of the adapted PAST_scores, where the user can explore how different weights affect the positioning of persuasive techniques in the PAST table.
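To make the effect of the weight concrete, the short Python sketch below ranks two made-up techniques under different weights. It assumes the score is a convex combination of efficacy and frequency; the technique names, the numbers, and the helper `rank_by_weight` are hypothetical and do not reproduce the online tool.

```python
def rank_by_weight(stats, w):
    """Rank techniques by w * efficacy + (1 - w) * frequency (highest first).

    stats: dict mapping technique -> (efficacy, frequency).
    ASSUMPTION: the combined score is a convex combination with weight w.
    """
    combined = {t: w * e + (1 - w) * f for t, (e, f) in stats.items()}
    return sorted(combined, key=combined.get, reverse=True)


# Made-up values: one rarely used but effective technique,
# one widely used but only moderately effective technique.
example_stats = {
    "rare_but_effective": (1.0, 0.05),
    "common_but_moderate": (0.4, 0.90),
}

for weight in (0.1, 0.5, 0.9):
    print(weight, rank_by_weight(example_stats, weight))
```

In this toy case, emphasizing efficacy (w = 0.9) puts the rarely used technique first, while w = 0.1 and w = 0.5 favor the widely used one.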

The PAST Component helps practitioners and researchers design evidence-based ST technology to maximize its effect on user HBC. To verify this contribution, we need a way to measure user experience, beyond simple measurement of PA changes [161] . Notice that the user experience incorporates different facets of the user's behavior, such as the user's thoughts and emotions, their PA and behavioral patterns, and the user's interaction with the system and the environmental factors that affect it. However, as discussed in Section 1, there is no standard approach for evaluating the effects of ST interventions on user habits and UE. Moreover, as seen in Section 2.2, current UE metrics focus on web or sometimes mobile applications rather than wearable devices. Also, existing metrics focus on quantifying the users' purchasing or browsing behavior rather than their health behavior. This is where the SELF component of our PAST SELF framework comes in.

To find out which measures and metrics can be utilized to evaluate ST technology, we identified and extracted all measured quantities mentioned in the investigated papers (e.g., self-reported data, PA data, UE data, metadata) and categorized them based on four UE aspects (inspired by the work of Lalmas et al. [98] discussed in Section 2.2): the Perceived Self Aspect, the Physical Self Aspect, the Behavioral Self Aspect, and the Environmental Aspect. The Perceived Self Aspect (see SELF Table 3) refers to the user's self-reported image of their life regarding their everyday experiences, as well as psychological, technological, social, and health factors, and is usually measured through qualitative evaluation methods. Hence, in our work, it encompasses all self-reported data utilized in our article pool. The Physical Self Aspect (see SELF Table 4) refers to the user's physical reaction to the interaction with the system, which in ST technology can be interpreted, for example, as the PA performed in response to the system's intervention. Thus, it includes all PA-related quantities from the investigated articles. The Behavioral Self Aspect (see SELF Table 5) refers to the user's behavioral response to the system, which in ST takes the form of UE metrics, such as wear time and session duration. Finally, the Environmental Aspect (see SELF Table 6) refers to external factors that may affect the user's interaction with the system, such as weather or location. Hence, we identified and incorporated in this aspect all metadata reported in the investigated articles. Note that these aspects draw from the work of Lalmas et al., adapt it to the realities of ubiquitous computing and ST technology, and expand it to account for the particularities of the domain.
For example, we introduce a novel aspect of ubiquitous UE, namely the "Environmental Aspect", to incorporate various external factors that influence a user's behavior and interaction with the ST system, such as the weather or their daily schedule. These factors might not be central to a user's interaction with a traditional computer system. However, they are decisive for the user's PA behavior and, subsequently, their interaction with ST technology.

This categorization has led to the creation of the Self-Tracking Evaluation Framework (SELF). SELF is a novel tool for the standardization of the evaluation process of ubiquitous ST technology-based interventions. Each aspect of the SELF component is shown here through a SELF Table. Each SELF Table contains the measures and metrics related to its respective aspect. Note that the SELF Tables are not supposed to be exhaustive lists of all potential ST metrics. On the contrary, they constitute clear and organized presentations of the most commonly used metrics in the ST literature, as identified during our review process.

SELF Table 3 presents the measurable concepts related to the Perceived Self Aspect of UE. ST and HBC research is highly interdisciplinary and almost always utilizes self-reports. We identified that these reports are centered around five main factors: User Factors, Psychological Factors, Human-Computer Interaction Factors, Social Factors, and Health Factors. Each Factor consists of the following measurable concepts:

(1) User factors include concepts such as technological habits & competency [9, 14, 25, 30, 33, 39, 61, 64, 75, 83, 115, 128, 153, 170, 171, 180, 184, 187, 195], PA habits & competency [2, 5, 8, 15, 20, 21, 28, 33, 40, 42, 52-54, 73, 76, 80, 81, 83, 85, 86, 90, 91, 93, 96, 112, 118, 128, 138, 148, 150, 159, 165, 171, 181, 182, 184, 186, 189, 195, 196], sedentary habits [73, 85, 86, 148], dietary habits [30, 149], daily life patterns [10, 15, 24, 93, 171, 187, 191], time perspective [36, 71], and habit formation [93, 172]. Such information can be utilized to contextualize the system to the user's reality, for example, by providing personalized suggestions and JITAIs based on the user's schedule and daily habits.

(2) Psychological factors include concepts such as personality traits [93, 106, 132, 177], self-efficacy [28, 52, 91, 115, 116, 120, 149, 159, 162, 165, 178], the user's stage of behavior change [1, 15, 28, 33, 39, 67, 68, 73, 106, 115, 116, 128, 152, 158, 171], behavioral self-regulation [117, 120, 150, 162], motivation [20, 76, 77, 111, 118, 149, 178], emotional state [39, 152, 159], user goals & expectations [15, 52, 93, 120, 131, 152], attitudes & intentions toward HBC [9, 10, 15, 20, 24, 28, 59, 71, 80, 85, 93, 111, 117, 120, 143, 148-150, 152, 159, 162, 165, 182, 184, 196], attitudes toward technology [111, 184], as well as attitudes toward one's appearance [116]. This factor is vital in understanding user behavior regarding ST, as human psychology influences human behavior. For instance, applied research has reported "dramatic improvements in recruitment, retention, and progress using [BC] stage-matched interventions" [151].

(3) HCI factors include system usability [2, 5, 8, 10, 23, 83, 120, 131, 143, 165, 170, 177, 180, 187, 197], system utility [15, 142, 180], and user expectations & satisfaction [14, 22, 48, 64, 66, 69, 85, 86, 91, 100, 101, 112, 120, 132, 142, 148, 149, 153, 162, 175, 178, 197]. Such measures enable researchers to evaluate the effect of different features and UI components on HBC and UE, e.g., through A/B testing.

(4) Social factors measure concepts such as social support [15, 21, 28, 36, 48, 52, 71, 149, 165], social influence & norms [36, 80, 91, 120], social comparison tendencies [9], and group cohesion & closeness [15, 48, 178]. While the focus of ST technology has been on individual HBC, research supports that ST is a profoundly social practice [110, 161], and hence social factors play an important role in a user's HBC journey.

(5) Health factors incorporate concepts such as physical health [22, 25, 54, 71, 91, 117, 123, 148, 159, 186, 196], mental health [1, 5, 16, 25, 36], health literacy [16], and quality of life [25, 29, 53, 93, 102, 117, 148].

Self-reports can provide rich and meaningful information through user feedback, opening up new perspectives on user behavior. Qualitative data can reveal subtle yet critical design pitfalls, which, if discovered promptly, can help researchers improve the effectiveness of a system during user interventions. Hence, qualitative studies are critical in the early stages of development of ST technology, and user interventions can be utilized once the system is mature enough to demonstrate effectiveness in the real world [88].

When handling qualitative data, however, one has to accept that the results may be intertwined with the experience of the researchers and that the findings might not generalize as well as those of large-scale quantitative studies [65]. Additionally, a limitation of self-reports is that they rely on user input, and grabbing the users' attention and requesting their time can be challenging in an era of attention scarcity [57]. Thus, future work could focus on automating the extraction of knowledge related to some factors of the Perceived Self Aspect of UE. For instance, could an ML model predict the user's motivation or their attitude toward PA?

5.2.2 The Physical Self Aspect of UE. SELF Table 4 presents the metrics related to the Physical Self Aspect of UE, with a focus on PA. Hence, in the investigated papers, we identified PA-related measures capturing generic PA characteristics (e.g., Moderate to Vigorous Physical Activity (MVPA) duration, energy expenditure, heart rate) [5, 6, 9, 10, 14, 15, 17-19, 22, 23, 28, 30, 31, 33, 36, 39, 40, 53, 59, 64, 69, 73, 76, 76, 83, 86, 89-91, 101, 102, 106, 111, 112, 117, 118, 120, 131, 132, 142, 143, 148, 148, 149, 152, 153, 159, 162, 165, 175, 177, 183, 184, 186, 187, 189, 191, 194, 195] , walking-specific characteristics (e.g., step count, floor count) [1, 2, 6, 8-10, 15-17, 20-31, 36, 39, 42, 48, 52, 54, 55, 59, 61, 66, 68, 69, 71, 73, 75-77, 80, 81, 85, 89, 91, 93, 94, 96, 100, 105, 106, 111-113, 115-117, 120, 121, 123, 125, 128, 138-141, 146-150, 152, 153, 158, 159, 163, 165, 170-172, 175, 176, 178, 181-184, 186, 187, 189, 191, 194, 196] , and running-specific characteristics (e.g., running velocity, lap time) [13, 153] . In addition, we identified measurable concepts related to sedentariness (e.g., sedentary bout duration, number of sit-to-stand transitions) [10, 18, 53, 64, 73, 142, 153] or goal accomplishment (e.g., frequency of goal compliance) [8, 15, 71, 75, 108, 138-140, 158, 159, 194, 196, 197] .

SELF Table 5 presents the metrics related to the Behavioral Self Aspect of UE. Usage analytics are commonly utilized to quantify users' behavior concerning the system. Usage metrics can be split into intra-session metrics and inter-session metrics. Inter-session metrics are more common in our article pool, and they usually take the form of wear time for activity trackers or total usage time for apps. However, both intra-session and inter-session metrics give us valuable information related to UE. Currently, the majority of the investigated papers do not incorporate this aspect of UE. Future research should therefore focus more on exploring such metrics' effect on the user's HBC journey.

Intra-session metrics measure the user's engagement with the ST technology during an individual session. A session is a set of user interactions with a system that takes place within a given time frame. Optimizations can take place based on intra-session metrics, with increasing complexity from "Feature" granularity (e.g., number of accesses of the individual PA goal-setting functionality) to "Session" granularity (e.g., number of PA goals set after a system reminder). Intra-session metrics can be categorized into three groups: Involvement metrics (e.g., session duration, screen views) [24, 66-68, 76, 102, 165, 170, 195], Interaction metrics (e.g., number of accepted notifications, notification response time) [2, 23, 67, 68, 73, 76, 83, 89, 91, 117, 125, 142, 152, 165, 170, 175, 177, 187, 195], and Contribution metrics (e.g., number of User-Generated Content (UGC) posts) [6, 73, 93, 108, 117, 191, 195].

Inter-session metrics measure the user's loyalty to ST technology. They include Session to Session metrics, which capture the time between two sessions, and Session to Extended Period metrics, which capture longer periods, such as weeks, months, or years. Session to Session metrics [66-68] include metrics such as time between sessions (absence time) or MVPA between sessions.
Session to Extended Period metrics [2, 6, 9, 10, 13, 15-17, 24, 30, 36, 48, 61, 64, 66-69, 73, 75-77, 91, 93, 102, 105, 111, 120, 123, 125, 132, 146, 147, 163, 165, 170, 175, 178, 186, 187, 191, 194, 196, 197] include, among others, the number of valid wear days, wear time, and total usage. Metrics marked with an asterisk are primarily targeted at wearable technology. Note that the purpose of the SELF component and the respective documentation is not to provide definitions for all metrics used. Our goal is to give practitioners and researchers a concise summary of the metrics utilized in the investigated papers, most of which are widely used and easily identified through the citation list or a search engine.
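A few of the intra-session and inter-session metrics named above (session duration, absence time, valid wear days) can be derived from raw interaction timestamps, as in the Python sketch below. The function names, the 30-minute inactivity gap used to delimit sessions, and the 600-minute threshold for a valid wear day are assumptions chosen for illustration; the investigated papers may use different definitions.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, gap=timedelta(minutes=30)):
    """Group interaction timestamps into sessions.

    ASSUMPTION: a new session starts when the time since the previous
    interaction exceeds `gap`.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_durations(sessions):
    """Intra-session (Involvement) metric: duration of each session."""
    return [s[-1] - s[0] for s in sessions]


def absence_times(sessions):
    """Inter-session (Session to Session) metric: time between sessions."""
    return [nxt[0] - prev[-1] for prev, nxt in zip(sessions, sessions[1:])]


def valid_wear_days(wear_minutes_per_day, min_minutes=600):
    """Session to Extended Period metric: days with enough wear time.

    ASSUMPTION: a day counts as valid at `min_minutes` of wear time or more.
    """
    return sum(1 for m in wear_minutes_per_day.values() if m >= min_minutes)


# Made-up interaction log for a single day.
day = datetime(2021, 4, 23)
example_events = [day.replace(hour=h, minute=m)
                  for h, m in [(8, 0), (8, 5), (8, 12), (11, 0), (11, 3), (18, 0)]]
example_sessions = sessionize(example_events)
print(len(example_sessions))                 # 3
print(session_durations(example_sessions))   # 12 min, 3 min, 0 min
print(absence_times(example_sessions))       # 2 h 48 min, 6 h 57 min
print(valid_wear_days({"Mon": 720, "Tue": 300, "Wed": 610}))  # 2
```

Both thresholds are parameters, so the same helpers can be reused with whatever session and wear-day definitions a given study adopts.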

Finally, SELF Table 6 presents the Environmental Aspect of UE. It outlines the external constraints that the investigated papers have taken into account in the study of ST technology for HBC. In terms of User Constraints, articles have utilized the users' daily schedule [23, 31, 66, 152, 153, 165, 180], daily commute patterns [31, 66], anthropometrics [9, 21, 22, 24, 25, 28, 53, 54, 59, 69, 76, 90, 91, 102, 111, 116, 123, 147-149, 153, 170, 178, 194, 196], and cardiorespiratory fitness [25, 90, 148]. Regarding Social Constraints, articles have used the users' physical and virtual social network [2, 13, 26, 27, 55, 80, 96, 102, 108, 182], as well as their social interactions [2, 6, 9, 13, 26, 27, 30, 55, 117, 182]. Also, Geographical Constraints have been utilized to contextualize user content and explain user behavior, including geolocation [17, 31, 42, 66, 67, 89, 102, 125, 146, 152, 153, 165, 176], indoor location [23, 176], visited places [66, 146], and weather [89, 176, 186]. Finally, in terms of Time Constraints, time and day of the week [89], as well as bank holidays [89], have been utilized in the investigated articles. Various studies in our article pool reported differences in performed PA between interventions conducted during spring and interventions conducted in winter, as well as drops in PA during bank holidays. Such findings highlight the effect a user's environment has on HBC and performed PA; hence, environmental context should be taken into consideration in future research.

Now that we have showcased both components of the PAST SELF Framework in Sections 5.1.6 and 5.2, we proceed to demonstrate how practitioners and researchers can apply them in practice. To this end, we present in Figure 10 a use case scenario with the timeline of a regular workday of an ST technology user. The timeline consists of five distinct lanes. The first four represent the four aspects of UE, and the last one includes the design techniques utilized in the development of an ST technological product. Each dotted line represents a timestamped event in our scenario. The parallel timelines showcase how the PAST component interacts with the SELF component and how UE's different aspects are interwoven. As stated previously, our scenario assumes a day in an ST user's life, which consists of various groups of events:

• At 08:00, the user arrives at work. The system identifies the change in the user's location (Environmental Aspect of UE). It then sends the user a reminder on their wearable device to take the stairs instead of the elevator (Reminder Design Technique).

• After a few hours, at 11:00, the system detects prolonged sedentary activity (Physical Self Aspect of UE). Thus, a push notification from their ST app suggests that the user take a short walk (Suggestion Design Technique). Upon seeing the notification, the user accepts the prompt (Behavioral Self Aspect of UE) and completes a 10-min walking bout by 11:30 (Physical Self Aspect of UE). The user's acceptance can help the system improve itself by learning the appropriate interruption times based on the user's schedule.

• Based on a sunny weather prediction for the afternoon, the system recommends a personalized route home for the user (Personalization, Reduction & Suggestion Design Techniques). The recommendation happens upon identifying a change in the user's environment at 14:00 when the user leaves work. However, the user soon rejects the prompt (Behavioral Self Aspect of UE) and takes a bus (Environmental Aspect of UE). The ST technology developer can interpret this rejection in various ways; the user might go to the gym after work rather than home, or the suggested route might be too long or too dangerous for the user. In any case, identifying this user behavior can help the system improve itself by suggesting alternative routes or sending earlier notifications to help the user mentally prepare for this schedule change.

• Finally, at 18:00, the system sends an Ecological Momentary Assessment (EMA) to the user, which takes the form of a push notification from the ST app. The EMA's goal is to evaluate the user's satisfaction with the timing of today's notifications. The user completes the short survey prompt by 18:05, which means a 5-min EMA response time (Behavioral Self Aspect of UE). This fast response time may indicate that early evening is a convenient time for the user to complete short feedback forms (Real-world Feel Design Technique).

been used in practice more frequently than others. While commonly used techniques, such as Goal Setting or Self-monitoring, are proven to have high effectiveness, infrequent but promising techniques, such as Similarity and Variability, deserve more scientific attention.
We believe that this review and its comprehensive presentation of HCI feature design can guide ST researchers and practitioners in choosing the PSD techniques that are more suitable for their use case and implementing them in real-world systems.

HCI Evaluation & UE Quantification: While UE with a single system does not necessarily imply sustained HBC, it is evident that unsuccessful or ultimately abandoned ST technology cannot assist the user in their HBC journey. To monitor and improve the quality of the HCI with the ST system, we need a way to quantify UE. However, UE is a multi-faceted concept that goes beyond simply measuring PA or health changes. It encompasses various aspects of the user's persona and interaction with the system, such as the user's thoughts and affective states, health habits and behavioral patterns, or the environmental factors that affect the user's interaction with the system. This review provides a comprehensive list of multi-faceted UE evaluation metrics that have been used in the related literature. Its goal is to assist ST researchers and practitioners in accompanying the users on their ST journey through constant, iterative evaluation and adaptation of the ST system.

Sample Size & Intervention Duration: The majority of the investigated studies have limited sample size and duration, which might undermine the generalizability of their results and highlights a need for large-scale studies. However, it is important to note that the majority of the studies are restricted by the time and budget available to the researchers; hence, small-scale studies are more realistic given such restrictions. Additionally, small-scale studies are more suitable for rapid prototyping and testing in the early stages of the development of ST technology solutions for HBC. It might take several years to demonstrate sustained HBC, whereas quantifying indications of HBC in the short term can be more straightforward.
Nevertheless, in later stages of technology development, large-scale studies can help build more robust and reliable ST technology. Moreover, they enable the collection of greater amounts of data for service tailoring and personalization through big data analytics and A/B testing.

Cohorts, Equity & Accessibility: As our results illustrate, there is a lack of research on how different subgroups of users (e.g., by race, age, gender, or health status) interact with ST technology, presenting an opportunity for future work. Future research in the field should focus on equitable access to ST technology for HBC, taking into account the socioeconomic status, variable health, and technological literacy of cohorts, while being sensitive to their cultural and language needs. Similarly, a promising direction for future research is the accessibility of ST technology, which we have not encountered in the included primary studies. Accessibility should go beyond addressing physical usability to an in-depth analysis of the population's needs in terms of disabilities.

It is important to note that such user subgroups frequently come from disadvantaged or minority population segments; hence, research in the field should always raise questions concerning ethical concepts.

The PAST SELF Framework: Our article pool utilizes a variety of design and evaluation techniques and metrics. This lack of standardization makes it difficult to compare results, evaluate each work's contribution, and obtain insights. Our PAST SELF framework brings together techniques utilized in previous works and comprehensively organizes them to guide future studies. Specifically, we propose the Periodic Table of Self-Tracking Design (PAST), which showcases the most common design techniques for ST technology along with their expected efficacy. Also, we propose the Self-Tracking Evaluation Framework (SELF), which presents a comprehensive list of evaluation metrics organized under four key dimensions of UE. PAST SELF can help ST practitioners and researchers design and quantify the user experience to make more informed decisions for future interventions. Nevertheless, the role of the PAST SELF framework is, in reality, complementary. We recommend that interested parties combine knowledge from our review and framework with input from domain experts and the users themselves through participatory design and citizen science practices.

Open Science & Open Data: To ensure the PAST SELF framework's maintainability and abide by the FAIR Data Principles [188], we make our corpus of primary studies publicly available through GitHub [190]. Our aim for this repository is twofold: (i) It can serve as a live dataset of ST technology interventions for HBC and UE, where new studies will be added, thus keeping up with technological advances in ST for HBC. To this end, we encourage researchers and practitioners to share their work with the research team and contribute to this public repository.
(ii) Due to its detailed information, our dataset can facilitate further experimentation, analysis, and derivation of additional metrics.

Online Exploration Tool: To overcome the pitfalls of static reporting, we have created an online, interactive exploration tool for the PAST SELF framework. This tool enables researchers and practitioners to adjust the PAST component in real time according to their application requirements through various filtering and weight adjustment options. This way, a researcher can tailor the PAST_score to their application needs and to specialized cohorts, or they can choose to favor large-scale, long-term studies over small-scale, short-term studies, or vice versa. Such filtering options mitigate the risk of the authors' judgment or biases affecting the reported PAST_score. Instead, through the interaction with the tool, the user can explore if and how different weights and filters affect the positioning of persuasive techniques in the PAST table.

The entire field of ST for HBC is quite broad and cannot be covered by a single literature review. While our review primarily focuses on the design and evaluation aspects for the general population, future studies could focus on exploring the literature related to different sample groups to provide specialized versions of the PAST SELF framework for each user segment with varying characteristics. Our adaptive exploration tool is the first step in this direction. Future studies could also concentrate on HBC's different aspects, such as stress management, smoking cessation, or disease control. PA is only the beginning of the possibilities ST devices have in capturing human data. Moreover, due to the lack of evaluation standardization, the strict criteria required to conduct a meta-analysis would significantly limit our article pool.
For example, various studies did not report baselines before the intervention, while others did not report results for both control and intervention groups. However, given a standardized reporting format in the future, a meta-analysis could provide valuable information about the efficacy of different ST design techniques. Finally, this review has left out some cutting-edge aspects of ST technology that are still not present in the related literature. For instance, only a limited number of papers utilize ML models for personalizing the user experience in ST. Hence, a categorization of such methods would not make sense at this point. However, future studies should focus on taking advantage of the state of the art in ML to provide more customized ST products to the users. At the same time, new types of sensors and related functionality are being incorporated into ST devices, such as ECG, fall detection, SpO2 sensors, and integration with popular voice assistants, which may transform the field of ST into a complete health tracking experience. Future work should study whether these advancements affect the users' HBC journey and UE with the technology itself.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813162. The content of this paper reflects only the authors' view and the Agency and the Commission are not responsible for any use that may be made of the information it contains. The authors would also like to thank T. Valk and S. Karamanidis, for their contribution to the development of the PAST SELF tool.
