ODSearch: A Fast and Resource Efficient On-device Information Retrieval for Mobile and Wearable Devices
Yi Rong and Reza Rawassizadeh
2022-01-31

Abstract. Mobile and wearable technologies have promised significant changes to the healthcare industry. Although cutting-edge communication and cloud-based technologies have allowed for these upgrades, their implementation and popularization in low-income countries have been challenging. We propose ODSearch, an On-device Search framework equipped with a natural language interface for mobile and wearable devices. To implement search, ODSearch employs compression and a Bloom filter, and it provides near real-time search query responses without network dependency. Our experiments were conducted on a mobile phone and a smartwatch. We compared ODSearch with current state-of-the-art search mechanisms, and it outperformed them on average by 55 times in execution time, 26 times in energy usage, and 2.3% in memory utilization.

Technological advancements and the widespread adoption of mobile and wearable technologies have transformed healthcare services and improved the quality of delivery by offering convenient, low-cost apparatuses that provide inexpensive health monitoring and prognosis [12, 25, 27, 39, 73]. Consequently, new scientific disciplines have arisen, such as mobile health (mHealth) and the use of wearables for symptom diagnosis, and their popularity has increased since the Covid-19 outbreak [1, 20, 50]. As a result, the demand for mHealth integration into the current healthcare system is also increasing [47].
One primary application of mHealth is operating the devices as a data collection and repository platform for Personal Health Records (PHRs) [17, 33, 59] that are either manually entered by users [8] or automatically collected by device sensors, such as physical activities [35, 65] and heart rate [46]. mHealth services provide affordable access to medical facilities, especially in low-income countries with limited resources [3], which promises to improve the quality of healthcare in those regions. However, despite continuous efforts, mHealth has not yet been extensively adopted [4, 11].

• We compare the compression rate and average encoding/decoding time of lossless compression algorithms for use on battery-powered devices.

This section reviews studies that mitigate IR challenges for mobile and wearable devices. We have organized them into three categories: (i) indexing and information retrieval on resource-constrained devices, (ii) on-device databases and search frameworks, and (iii) text compression algorithms.

Mobile IR and indexing focus on finding appropriate ways to help users analyze collected contextual data [68]. Although there are advances in on-device machine learning [16], extensive exploration of mobile IR is still lacking.

Information Retrieval: There are some promising resource-efficient approaches for IR systems. Gupta et al. [26] demonstrated a term-weighting schema-based ranking function, where the ranking function was based on fuzzy logic to improve the accuracy of retrieving relevant documents. Subhashini and Kumar [63] concentrated on improving search accuracy by optimizing natural language processing techniques, considering only nouns and verbs. Rawassizadeh et al. [53] proposed an on-device natural language query interface that can parse closed-domain queries, but it does not include information retrieval. These studies did not investigate the issue of searching large amounts of data on small devices.
Indexing: Another common approach to facilitate access to large-scale data is indexing. Białecki et al. [7] proposed Apache Lucene, a full-text search engine that uses an inverted index as its kernel and is widely used as a practical platform in industry. Yang et al. [72] proposed Anserini, an extension toolkit on top of Lucene, which adds a scalable inverted index, streamlined IR, and an architecture enabling multi-stage ranking. Tan et al. [64] designed and implemented an upgraded inverted index combined with a hash function to simplify the inverted index, making the corresponding queries more accurate and memory efficient. Rawassizadeh et al. [52] used spatio-temporal clustering for index construction to facilitate search and reduce the search space by leveraging spatio-temporal indices. We chose to compare our approach with Lucene [7], because it is widely used as an inverted index on different platforms. We also benefited from the temporal filtering mechanism used by Rawassizadeh et al. [52] to reduce the search space when designing our search framework.

Tan et al. [64] built one of the earliest resource-efficient search engines that could work on embedded devices with a top-k query algorithm, using a buffer cache and an inverted index. Their approach is limited to keyword search and does not support range queries. Lyu et al. [43] presented an empirical study showing that local databases are often employed as on-device IR systems to provide users with data storage and retrieval. They also identified that the most frequently used databases in Android are SQLite, Oracle, and Realm. However, continuously using local databases can lead to excessive power consumption or security problems [43]. Moreover, there are other local databases that can operate on mobile phones and smartwatches with exceptional performance, such as H2, LevelDB [14], and ObjectBox [70].
We chose to compare our approach to three popular databases that run locally on Android devices: SQLite, Realm, and H2. From a technical perspective, H2 and SQLite both use a combination of a B-tree index and brute-force search, while Realm only uses B+Tree indices. Therefore, from a lower-level perspective, we are comparing our approach with B-Tree and B+Tree indices. To our knowledge, there is no empirical analysis of databases or search engines that can search data collected by a smartwatch. We compared the execution time, memory utilization, and energy usage of ODSearch against the aforementioned databases in search tasks on mobile phones and smartwatches.

The majority of lossless compression algorithms can be classified as either statistical-based or dictionary-based approaches [61, 71]. Statistical-based approaches take advantage of each character's frequency; popular examples are Run-length encoding [24], Shannon-Fano encoding [19, 62], Arithmetic encoding [49], and Huffman encoding [31]. Run-length encoding can effectively compress consecutive repeated characters in a text. Shannon-Fano encoding constructs prefix codes to compress symbols based on their probabilities, but it is challenging for it to achieve optimal compression efficiency [61]. Arithmetic encoding can approximate the optimal compression ratio, but encoding and decoding are very time-consuming [51]. Huffman encoding can approach the optimal compression rate if the frequencies of characters are very large [51]. Huffman encoding is also used in recent federated learning architectures [44] to reduce communication with the server while transferring neural network weights back and forth. Conversely, dictionary-based approaches store all recurring patterns, including single characters and strings of different lengths, while keeping the mapping between patterns and their codes in a dictionary.
This makes dictionary-based approaches relatively efficient, but their search process is computationally expensive [51], possibly resulting in excessive encoding and decoding time. A popular example, Lempel-Ziv-Welch (LZW) compression, is also used in Linux-based operating systems [18]. We compare the listed lossless compression algorithms later and report the rationale for our decision.

IR algorithms refer to algorithms used for indexing and retrieving structured or unstructured information from text, video, images, and audio. Hersh et al. [28] explained that four modules are required for building an IR system for health applications: content, metadata, search engine, and queries. In line with those principles, ODSearch includes four modules: two focused on searching the content and two focused on connecting natural language queries with the search engine. Those modules are (i) Query Translator, (ii) Bloom Filter, (iii) Compression, and (iv) Answer Translator. With these modules, ODSearch builds a pipeline of query, search, and information retrieval, and operates in two phases shown in Figure 1 (a) and (b). The first phase, "Preprocessing", focuses on constructing indices for the underlying datasets. It operates periodically, such as once per day, and processes the new data that was added to the system. This phase transfers the raw data from local storage to the Bloom filter and compresses it with Huffman encoding. The Bloom filter constructs bit arrays for each day's data entries, which we call the "Bit Catalogue". Huffman encoding compresses the original data and stores it in "Compressed Local Storage" (CLS). The second phase is "Query Execution". In this phase, the framework uses the "Query Translator" to obtain keywords from the natural language query entered by the user. Next, the query keywords are searched in the Bit Catalogue.
If a positive result is returned (i.e., the keywords may exist in the content), the query keywords will subsequently be searched in the CLS. If the keywords do not exist and the result of the Bit Catalogue check is negative, subsequent searching in the CLS and "Huffman Decoding" will be skipped. Finally, the "Answer Translator" calculates all search results from the previous step and presents the final query result in natural language. The term "calculate" here refers to summation, minimum, maximum, and average. Figure 1 shows the architecture of the ODSearch framework. We have separated the Preprocessing phase from Query Execution for clarity. Each component of Figure 1 will be explained later in more detail. With this architecture, we were able to get a near real-time response even with very large amounts of data on a resource-constrained device such as a smartwatch.

To conduct our experiments, we used two real-world datasets, one from smartphones and one from smartwatches. We used the "UbiqLog" dataset [59], generated from a lifelogging tool on mobile phones, and the "Insight for Wear" dataset [58], generated from a continuous sensing tool on a smartwatch. Both the UbiqLog and Insight for Wear datasets are publicly available for research purposes. Mobile and wearable devices can provide robust and affordable means to access data on a user's heart rate, step count, and activity type [21, 22, 30, 34]. Moreover, these are the de-facto standard information objects generated by mobile and wearable device sensors, and they are among the most frequently asked-for values when users query their PHRs [53]. We use heart rate, step count, and activity type from the "Insight for Wear" dataset and step count and activity type from the "UbiqLog" dataset. Missing data, due to the variety of sensors within different devices, is inherent in these datasets [54].
Therefore, we chose the user from each dataset (two users total) with the most available data and consequently the fewest missing data points. Choosing the largest dataset available simulates the worst case of a real-world search. Original PHR data collected from mobile or wearable sensors are all associated with time. A PHR record, r, can be represented as a 3-tuple: r = <S, T, D>, where S denotes the sensor name, T is the record's timestamp, and D is the sensor data. We define the combination of S, T, and D as the metadata of the content. Among the three categories of the dataset, heart-rate data is recorded in beats per minute (bpm), and step count refers to the total number of steps taken in a day. The activity types, including "still", "tilting", "onfoot", "invehicle", and "unknown", were extracted by the Google Activity Recognition API.

To demonstrate the scalability of our approach, we created synthetic datasets based on the two real users we selected. The synthetic data is simply a repetition of the original data segment, ignoring the data distribution. The size of the synthetic segments varied between 30 KB and 48 MB. According to the number of records, the largest synthetic dataset simulates approximately three years' worth of data. Moreover, the upper limit of 48 MB was based on the processing capacity of both devices. Table 1 provides a summary of the number of instances that occurred in the real-world and synthetic datasets.

The Bloom filter [13, 42] is a space-efficient probabilistic data structure used to test for member existence within a dataset. A Bloom filter consists of an array of m bits, which can represent n elements in a set S = {x1, x2, ..., xn}. Initially, all bits are set to zero. When inserting elements into a database that has a Bloom filter, each element x in S is hashed by k hash functions, and the resulting bit positions are set to one. We implemented Bloom filters through the Guava [5] Java library. To support time-based Bloom filter queries, we insert records from different dates into different Bloom filter instances.
That is, we created an independent bit array for each day's data. The bit arrays are collected and stored in the "Bit Catalogue". Our "Bit Catalogue" is a key-value data structure whose keys are unique; it uses the date as the key and a bit array as the value. The process of implementing the Bloom filter module is shown in Figure 2. First, we extract the three metadata attributes S, T, and D from the original data in local storage; then we assign the date of T as the key of the Bit Catalogue and insert S and D into the bit array, which is the corresponding value for that key. This approach makes each pair (date and bit array) unique and independent in the Bit Catalogue. When a user queries data for a certain period, the framework retrieves all dates in the given period and uses them to look up the corresponding bit arrays in the Bit Catalogue. If any of the queried bit arrays returns a positive result, the algorithm proceeds to the next step, which fetches all records containing the keyword. If none of the queried Bloom filters returns a positive result, the searched keyword never appears in the given period, and no further search will be done. In other words, the Bloom filter module operates as an 'initial filter' before searching the content of data files. This approach enhances the efficiency of the search and enriches the functionalities of the search framework. A Bloom filter is prone to false positives, because it could claim an element belongs to a set even when it is absent. However, within our framework, a false-positive result just means the queried element will be further searched in the next step. The evaluation section demonstrates the significant impact of using the Bloom filter.

Information storage is not typically an issue with cloud systems, but every query involves delays, both in the cloud and on-device [15, 57, 74].
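Conceptually, the per-day Bit Catalogue can be sketched in a few lines of Python. The Bloom filter class below is a minimal illustrative stand-in, not the Guava implementation the paper uses; the bit-array size m, the number of hash functions k, and the SHA-256-based hashing scheme are all assumptions for the sketch.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an m-bit array set by k hash functions."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0  # bits stored as one big int

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits >> p & 1 for p in self._positions(item))

# Bit Catalogue: one independent Bloom filter per day, keyed by date.
bit_catalogue = {}

def insert_record(date, sensor, value):
    bf = bit_catalogue.setdefault(date, BloomFilter())
    bf.add(sensor)
    bf.add(value)

def may_have(date, keyword):
    bf = bit_catalogue.get(date)
    return bf is not None and bf.might_contain(keyword)

insert_record("08-01-2021", "activity", "onfoot")
print(may_have("08-01-2021", "onfoot"))  # True
print(may_have("08-02-2021", "onfoot"))  # False: no bit array for that day
```

A query over a date range would simply call `may_have` once per day in the range, skipping the content search entirely for any day whose filter returns a negative result.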
To reduce this delay, we analyzed several lossless text compression algorithms, including Run-length encoding, Shannon-Fano encoding, Arithmetic encoding, Huffman encoding, and LZW compression [61]. Based on the results of our analysis, we chose to use Huffman encoding to compress data and facilitate data retrieval from the encoded data. Huffman encoding replaces characters that occur more frequently with binary codes that require fewer bits, and replaces less frequent characters with binary codes that require more bits. The process of Huffman encoding starts by counting the frequency of each character, and then a Huffman tree (a type of binary tree) is constructed based on those frequencies. A binary representation for each character can then be inferred from the Huffman tree. In simple terms, Huffman encoding, and the later decoding, are character-based and suitable for compressing textual content. The implementation of Huffman coding in this work is customized from the existing work in Algorithms [60]. The process flow of Huffman coding is shown in Figure 3. This module begins by counting the frequency of each character in the sensor data, and then it generates a Huffman tree. Since almost all mobile and wearable health-related queries are time-dependent [53], we compressed the data by time to enable temporal queries, and we developed a CLS that includes a "Nested Dictionary". Here, a Nested Dictionary refers to key-value storage that uses a dictionary as a value to allocate data by date and sensor type. In other words, the CLS is a two-layer dictionary with a nested structure. The outer dictionary uses T as its key and an inner dictionary as its value; that is, we create a new inner dictionary for each T. The inner dictionary assigns S, the sensor name, as its key and a collection of the sensor data's binary codes as its value.
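A minimal sketch of this design is shown below: a character-level Huffman coder feeds a two-layer nested dictionary, and keyword queries are matched in the coded space. The helper names, the toy corpus used to build the code table, and the greedy decoder are illustrative assumptions, not the paper's actual implementation.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a character-to-bitstring code table from symbol frequencies."""
    # One heap entry per symbol: [frequency, tiebreaker, {char: code}].
    heap = [[f, i, {c: ""}] for i, (c, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        merged = {c: "0" + code for c, code in lo[2].items()}
        merged.update({c: "1" + code for c, code in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], uid, merged])
        uid += 1
    return heap[0][2]

# Illustrative corpus: the alphabet the coder must be able to represent.
corpus = "still tilting onfoot invehicle unknown 08-01-2021 activity"
CODES = huffman_codes(corpus)
encode = lambda s: "".join(CODES[c] for c in s)

def decode(bits):
    # Huffman codes are prefix-free, so greedy left-to-right parsing works.
    inv = {code: c for c, code in CODES.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ""
    return "".join(out)

cls = {}  # Compressed Local Storage: {date: {sensor: [encoded readings]}}

def store(date, sensor, reading):
    cls.setdefault(date, {}).setdefault(sensor, []).append(encode(reading))

def keyword_query(date, sensor, keyword):
    target = encode(keyword)  # encode the keyword once, compare in coded space
    return [decode(v) for v in cls.get(date, {}).get(sensor, []) if v == target]

store("08-01-2021", "activity", "onfoot")
store("08-01-2021", "activity", "still")
print(keyword_query("08-01-2021", "activity", "onfoot"))  # ['onfoot']
```

Note that matching is done without decompressing the store: only records that match the encoded keyword are decoded, which mirrors the role Huffman decoding plays as the final step of the pipeline.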
All of the sensor data's binary codes are collected in an array and stored as the value of the inner dictionary. For instance, if we refer to an activity-type PHR on 08/01/2021 showing "onfoot" as the sensor data, it will be encoded as 001011011100001010. If we define the content between curly braces as a dictionary (i.e., key: value) and the content between square brackets as an array, the sensor reading will be stored in the Nested Dictionary as {"08-01-2021": {"activity": [001011011100001010]}}. Inside the Nested Dictionary, "08-01-2021" is the key of the outer dictionary, "activity" is the key of the inner dictionary, and [001011011100001010] is the value of the inner dictionary. Using this compression scheme, we can process keyword-based queries for sensor data, sensor type, and date. We encode the query keyword (extracted from the user query) using the constructed Huffman tree and then search for it in the CLS. After fetching all the compressed results matching the three criteria (date, sensor type, and keyword for sensor data), we use "Huffman Decoding" to translate the query results into human-understandable text. However, if the query does not search for a specific keyword, all records that match the date and sensor type will be retrieved. We report the difference between using and not using this type of Huffman coding in the "Experimental Evaluation" section. In Figure 3, the dotted lines show that the execution of Huffman encoding depends on keyword-based queries and the presence of the keyword. However, since results will always be decoded from the CLS, we do not use a dotted line for Huffman decoding.

To enable the NLI, we designed a translation component (Query Translator) between users' questions (natural-language queries) and machine-understandable language. The "Query Translator" extracts keywords from users' queries and organizes them. Rawassizadeh et al.
[53] proposed that a user's natural language query can be processed with four dictionaries of keywords: (a) question words, such as "how many" or "what"; (b) temporal notions, such as "today" or "this week"; (c) sensor names, such as "heart rate" or "activity"; and (d) aggregation words, such as "total" or "average". However, there was no implementation proposal that executes the query on the underlying dataset. Therefore, we implemented the "Query Translator" to bridge this gap, and it applies query classifications proposed by previous work [38] to construct the machine query. Figure 4 presents the workflow of the "Query Translator" and "Answer Translator" modules. The Query Translator begins by identifying four query keywords: a question word, a subject word, a temporal notion, and an aggregation word. This classification, and the possible types of questions that users could ask about their PHRs, have been proposed in previous work [53]. We identified query types by conducting a manual qualitative analysis of the question dataset proposed in [53]. Two researchers performed theme analysis, and their Fleiss-Kappa score was 82%, indicating substantial agreement. The descriptions of the query types are shown in Table 2, and Table 3 presents examples of parsing five different query types. We use the query type to refine the essential information of the natural language query and to control the "Answer Translator". For example, one of the query types is "keyword existence", which means the query only asks for the membership of a keyword; hence only the Bit Catalogue will be queried, and the subsequent search of the CLS will be skipped. After the keywords and the query type are identified, the "Query Processing" component, which consists of the Bloom Filter and Huffman Coding, searches the preprocessed data and returns records matching the query keywords.
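The four-dictionary extraction can be sketched as simple substring lookups against the lexicons. The dictionary contents and the mapping of question words to query types below are illustrative assumptions, not the paper's actual lexicons, and real matching would need more robust tokenization (for example, a naive substring test for "run" would also match "brunch").

```python
# Illustrative lexicons for the four keyword dictionaries.
QUESTION = {"how many": "keyword count", "did i": "keyword existence",
            "what is": "period aggregate"}
TEMPORAL = {"today", "this week", "this month"}
SENSORS = {"run": "activity", "steps": "steps", "heart rate": "heartrate"}
AGGREGATION = {"total": "sum", "average": "average",
               "maximum": "max", "minimum": "min"}

def translate(query):
    """Extract the four query keywords from a natural language query."""
    q = query.lower()
    return {
        "question": next((v for k, v in QUESTION.items() if k in q), None),
        "temporal": next((t for t in TEMPORAL if t in q), None),
        "sensor": next((v for k, v in SENSORS.items() if k in q), None),
        "aggregation": next((v for k, v in AGGREGATION.items() if k in q), None),
    }

print(translate("How many times did I run this month?"))
# {'question': 'keyword count', 'temporal': 'this month',
#  'sensor': 'activity', 'aggregation': None}
```

The extracted keyword structure is what the downstream modules consume: the question entry selects the query type, the temporal entry selects which per-day bit arrays to check, and the sensor entry selects the inner-dictionary key in the CLS.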
However, before presenting the returned records to users, the "Answer Translator" decorates them and converts them to natural language text. In particular, the "Answer Translator" performs two tasks: (i) it calculates the query result based on the query type; and (ii) it performs natural language decoration on the query result. The computed result can be either boolean or numeric. In both cases, it is decorated in a human-readable expression, and the final query result is presented on the user interface. For example, if a user query is "How many times did I run this month?", the query type will be "keyword count" and the query type calculator will return a numeric value. Assuming that the result is '7', the natural language decorator will tell the user "You did it 7 times", in which the value '7' is wrapped in human-readable text.

Table 2. Query types and their descriptions:
- keyword existence: check the membership of a keyword in a specific period
- keyword count: count the frequency of a keyword in a specific period
- period max/min: calculate the maximum/minimum value in a specific period, such as the maximum daily steps in a month
- period sum: calculate the sum over a specific period, such as the total steps in a week
- period average: calculate the average over a specific period, such as the average heart rate in one day

In this section, we first evaluate the compression efficiency of the listed lossless compression algorithms. Next, we report the impact of using or not using the Bloom filter or Huffman coding in our approach. We then provide a comparison between state-of-the-art (SOTA) search methods and ODSearch. To perform this comparison, we measured query response time, memory usage, energy consumption, and scalability on smartphones and smartwatches. All search experiments were implemented on a mobile phone and a smartwatch.
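Putting the modules described above together, the two-step query execution (Bit Catalogue check, then CLS search, then answer decoration) can be sketched with simple stand-ins for each module. All data structures below are illustrative stubs: a per-day keyword set stands in for the Bloom filter's bit array, and a plain list stands in for the Huffman-compressed storage.

```python
# Illustrative stubs for the preprocessed stores.
bit_catalogue = {"08-01-2021": {"onfoot", "still"}}
compressed_storage = {"08-01-2021": ["onfoot", "still", "onfoot"]}

def execute_query(date, keyword):
    # Step 1: membership check against the Bit Catalogue.
    if keyword not in bit_catalogue.get(date, set()):
        # Negative result: the keyword never occurs on that day,
        # so the storage search and decoding are skipped entirely.
        return "No matching records found."
    # Step 2: fetch matching records from (compressed) local storage.
    hits = [r for r in compressed_storage[date] if r == keyword]
    # Step 3: the Answer Translator counts and decorates the result.
    return f"You did it {len(hits)} times"

print(execute_query("08-01-2021", "onfoot"))  # You did it 2 times
print(execute_query("08-01-2021", "run"))     # No matching records found.
```

The efficiency argument is visible even in this toy form: a negative answer costs only the membership check, while the expensive storage scan runs only when the filter says the keyword may exist.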
We built an mHealth conversational interface, based on the UbiqLog architecture [59], for both mobile phones and smartwatches to query a user's PHRs, and then obtained performance metrics (response time, memory utilization, and energy usage) for each experiment. The mobile phone used for the experiments is an Oukitel C12, equipped with 2 GB RAM, 16 GB storage, and a 3300 mAh battery. The smartwatch is a TicWatch S2, which has only 512 MB RAM, 4 GB storage, and a 415 mAh battery. The smartphone's operating system is Android OS version 8.1, and the smartwatch runs Wear OS version 2.32. To cover different states of information retrieval in our experiments, we made five sample queries (see Table 3), covering various question words, temporal terms, subjects, and aggregation words. Henceforth, we refer to these queries using their alphabetical acronyms (symbols in Table 3). To reduce the stochasticity of the experiments, each experiment was repeated five times and the average is reported as the result.

We compared five well-known lossless compression algorithms [61] by encoding and decoding a set of sensor data. To make our experimental dataset as stochastic as real-world data, we synthesized random sensor data. The synthesized data was derived from the existing real-world data, but its sensor values were permuted. The size of the synthetic dataset was 188,889 bytes, made up of 30,000 records: 10,000 entries for activity type, 10,000 entries for step count, and 10,000 entries for heart rate. LZW encoding had the highest computational complexity among the five algorithms and achieved the highest compression ratio. Due to this complexity, it required the longest encoding time, 18 times more than Huffman encoding.
Based on the results presented in Table 4, we selected Huffman encoding because (i) Huffman encoding presented the best time efficiency based on the sum of encoding and decoding times, and its compression ratio was very close to the best result; (ii) although LZW had the best compression ratio, it is computationally too expensive for encoding, which is a significant issue on resource-constrained devices such as smartwatches; and (iii) compared with the other compression methods in the experiment, Huffman encoding outperformed each of them in at least one of compression ratio, encoding time, and decoding time. The backbone of our search framework is Bloom filters and compression (Huffman coding).

This section reports the execution time, memory utilization, and energy usage of our approach compared to four state-of-the-art (SOTA) methods. Three SOTA databases that are used on mobile or wearable devices were selected: SQLite, Realm, and H2. Lucene was also selected; although it is not a database itself, it is a popular indexing mechanism that is implemented on mobile devices [7, 9, 10]. For ODSearch to be usable in real-world mHealth applications, an essential factor that directly affects the user experience is the execution time (a.k.a. response time) [56, 67]. The execution time starts when the user sends a query and ends when the user receives the answer. As described above, we used SQLite, Realm, H2, and Lucene. Additionally, we added a brute-force search as a baseline. The mobile phone and smartwatch execution time results are shown separately in Table 6 (a) and (b). In addition, Figure 5 presents the average execution time of the five queries from Table 6. The results in Figure 5 show that our approach significantly outperforms all other search mechanisms on both devices. For both the smartphone and the smartwatch, the execution time is on average 55 times faster than the SOTA methods and 6 times faster than the fastest SOTA method, i.e., Realm.
Although it is not common to report, we also report the execution time of the preprocessing for each search method. In comparison to conventional computers, another limited resource in mobile and wearable devices is memory [6]. We used the memory profiler in Android Studio to collect the memory usage of the device in real-time. The memory profiler identified that the preprocessing of all search tools required the largest amount of memory.

Mobile and wearable devices are known for their limited battery capacity [29, 58], and energy utilization directly affects the usability of the device [69]. We report the energy usage of query execution to identify the energy impact of ODSearch. Energy usage monitoring started when a query was entered into the GUI and ended when the user received the answer. To monitor battery usage in millijoules (mJ) or joules (J), we analyzed the battery's current in amperes (A) and voltage in volts (V). Since we know the duration of a query or of preprocessing (in seconds, s), we can compute the energy consumption based on the following equation: 1 J (1000 mJ) = 1 A × 1 V × 1 s. Table 8 shows the average energy usage in millijoules for the five repeated queries on both devices. Moreover, Figure 7 presents the average results of the five sample queries from Table 8.

To measure scalability, we examined how the performance of the different search methods changes as the data size increases. We used synthetic datasets of increasing sizes for this experiment. As discussed, the synthetic datasets were constructed from the real-world data of users, but their sizes varied: 30 KB, 3 MB, 6 MB, 12 MB, 18 MB, 24 MB, 30 MB, 36 MB, 42 MB, and 48 MB.

4.8.1 Scalability on Mobile Phone. First, we measured the memory utilization of preprocessing for each data size. If the data size was too large, the preprocessing was terminated, shown as an "ERROR" (representing a Java "Out of Memory Error") in our report.
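The energy computation above reduces to multiplying current, voltage, and elapsed time; a small sketch with illustrative readings (the current, voltage, and interval values below are assumptions, not measurements from the paper):

```python
def energy_mj(current_a, voltage_v, seconds):
    """E = I * V * t, converted from joules to millijoules."""
    return current_a * voltage_v * seconds * 1000.0

# Illustrative readings: 0.2 A at 3.7 V over a 50 ms query window.
print(energy_mj(0.2, 3.7, 0.05))  # ~37.0 mJ
```

In practice the current and voltage vary during the interval, so a profiler samples them repeatedly and sums per-sample energy rather than applying the formula once.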
We also explored the changes in response time as a function of dataset size. Since the differences among the query response times are too large to visualize directly, we report the response times for the five sample queries on a logarithmic scale (see Figure 8). In all of these experiments, ODSearch outperforms all other methods. Another advantage of ODSearch is its constant efficiency, as illustrated by queries a and c (see Figure 8). This is because both queries a and c are membership queries, and ODSearch uses the Bloom filter to answer membership queries with O(1) complexity on our synthetic datasets. Moreover, we report the energy usage of querying different sizes of data. Since the energy usage also differs across dataset sizes, we use a logarithmic scale to present five charts for the five sample queries in Figure 9. ODSearch consumed the least amount of energy and outperformed all other methods in terms of energy consumption. (Based on our experiments, 1 MB is roughly an average week of data, and existing fitness trackers usually present data for one week.)

Table 10. Maximum memory utilization (in MB) of preprocessing the synthetic dataset on the smartwatch. "Error" here refers to a Java "Out of Memory Error".

We also report marginal memory utilization on the smartwatch. These results are calculated against the maximal loadable synthetic dataset (30 MB).

Fig. 10. Response time of querying synthetic data on smartwatch. For clarity, the Y axis is presented in logarithmic scale. Note that the intervals between data sizes on the X axis are not equal.

Fig. 11. Energy usage of querying synthetic data on smartwatch. For clarity, the Y axis is presented in logarithmic scale. Note that the intervals between data sizes on the X axis are not equal.

Figure 10 presents the query response time on a smartwatch, and Figure 11 presents the energy usage for the five sample queries. According to the response time and energy utilization results, ODSearch is the most energy-efficient and the fastest method.
These results also verify that, even on the smartwatch, ODSearch can answer a query in between 0.01 and 0.7 seconds, which can be considered a real-time response.

In this work, we described ODSearch, a network-independent, fast, and resource-efficient search framework that interacts through natural language and answers queries on mobile phones and wearables in real time or near real time. Our method consists of four modules: the Query Translator and Answer Translator modules are designed for natural language interaction, and the Bloom Filter and Huffman Coding modules are built for efficient information retrieval. We experimented with several lossless compression algorithms and demonstrated that Huffman coding is the best compression algorithm for ODSearch, and that it can significantly improve the execution time of the search operation. Furthermore, we showed that ODSearch can answer queries in real time on mobile phone and smartwatch devices and outperforms state-of-the-art search methods on large and small datasets in execution time, energy utilization, and memory use.

References
- Can mHealth technology help mitigate the effects of the COVID-19 pandemic? (2020)
Additionally, ODSearch is among the most scalable search methods that can handle large data searches while consuming a minimal amount of memory and energy. Future work will focus on building a conversational agent based on ODSearch, and on conducting a user study to analyze the impact of using conversational agents instead of the traditional graph-based data representations that are currently common in fitness applications.