key: cord-0146551-apv4yoky authors: Toda, Takamichi; Moriwaki, Daisuke; Ota, Kazuhiro title: Aggregate Learning for Mixed Frequency Data date: 2021-05-20 journal: nan DOI: nan sha: 1fad760d5e19e308233e05c8de7bbd7d76ec61dc doc_id: 146551 cord_uid: apv4yoky

Large and acute economic shocks such as the 2007-2009 financial crisis and the current COVID-19 infections rapidly change the economic environment. In such a situation, the importance of real-time economic analysis using alternative data is growing. Alternative data such as search queries and location data are closer to real time and richer than official statistics, which are typically released once a month in an aggregated form. We take advantage of the spatio-temporal granularity of alternative data and propose a Mixed-Frequency Aggregate Learning (MF-AGL) model that predicts economic indicators for smaller areas in real time. We apply the model to a real-world problem: prediction of the number of job applicants, which is closely related to the unemployment rate. We find that the proposed model predicts (i) the regional heterogeneity of labor market conditions and (ii) the rapidly changing economic status. The model can be applied to various tasks, especially in economic analysis.

Devastating external economic shocks such as the 2007-2009 financial crisis and the COVID-19 infections rapidly change economic circumstances. Amid such situations, real-time economic analysis is essential. The real-time analysis of economic conditions is often called nowcasting or economic nowcasting. Nowcasting provides important insights into the current economic status that are essential for appropriate policy responses, including monetary and fiscal stimulus. An understanding of the on-going economic situation is also needed by private companies, which make decisions on investment and employment.

Nowcasting often takes advantage of alternative data: non-standard data such as search queries, location data, SNS data, and satellite images [3, 6, 8, 14, 25, 37]. These data are suitable for economic nowcasting because of their high frequency. While most official statistics are reported once a month, alternative data is usually recorded in real time. Another beauty of alternative data is its granularity. While official statistics usually report state-level figures, alternative data can provide detailed information down to the individual level.

In response to the surging demand for alternative data, several works have already emerged. Tech giants such as Apple, Google, and Facebook provide mobility statistics based on their proprietary data [2, 12, 28]. These data help us understand several aspects of economic status. However, it is hard to understand the whole picture of the economy itself because the relationship between these alternative-data-based statistics and familiar economic indices such as GDP and unemployment rates is unclear. To fill this gap, many nowcasting/forecasting models that use high-frequency data such as Google search queries to predict economic variables have been proposed [3, 5, 6, 25, 32, 35, 37]. Combining high-frequency data like Google search queries with low-frequency data such as monthly unemployment rates and quarterly GDP has been actively studied by economists [9, 10]. One caveat of these studies is that they do not fully exploit the granularity of the alternative data. The target variables of these studies are often aggregated statistics such as GDP and unemployment rates.
Although the predictor variables have greater granularity, the predicted values are aggregated at the state or even national level. However, governments, especially local governments, need to deal with heterogeneity among small areas. For example, the bankruptcy of a large automaker will critically affect the local economy where its factories are located but does not affect the labor market at the national level. Authorities need to be sensitive not only to temporal changes but also to regional heterogeneity. The problem is that there is no granular data in the official statistics. That is, there is no label at the granular level. This problem is called aggregate output learning or aggregate learning, and only a few research papers had been published on it until recently [27]. The problem has, however, been attracting attention from an increasing number of researchers [7, 19].

We, here, combine the mixed-frequency data literature in economics and the aggregate learning literature in machine learning to fully utilize the richness of alternative data, and we provide an example of a useful application. Our Mixed-Frequency Aggregate Learning (MF-AGL) model takes advantage of the spatio-temporal granularity of alternative data and predicts economic indicators in smaller areas in real time, which is not possible with standard forecasting models. The model also updates the prediction in real time using high-frequency data, without high-frequency target data. To train the model, we define a novel loss function for spatio-temporally aggregated label data and granular predicted values. More specifically, we aggregate the predicted values for small areas and calculate the loss at the aggregated level. We also calculate the loss for each high-frequency feature so that the model can learn the mixed-frequency structure of the data. That means we reuse the same label data over and over.

We apply the model to a real-world problem: prediction of the number of job applicants for smaller areas in Japan. The number of job applicants reflects the condition of the labor market because job applicants are also unemployed persons. An acute increase in the number of job applicants implies a deterioration of labor market conditions. To the best of our knowledge, the present work is the first to propose a novel method combining mixed-frequency data and aggregate learning. We also demonstrate its practical importance in a real-world application. While we applied the model to nowcasting of the labor market, the model can be applied to any task that involves (i) infrequent and aggregated indices such as GDP and (ii) spatio-temporally granular data.

In the following sections, we first discuss the background and related works in Section 2, then state the problem setting and describe the proposed model in Section 3. We then show the experimental results, with details of the data and data pipeline, in Section 4. Finally, Section 5 concludes.

Nowcasting using mixed-frequency data has been an active research area [1, 4, 9, 10, 20, 24, 36]. In particular, nowcasting of labor market statistics with alternative data dates back to the late 2000s [3, 6, 37]. These studies suggested the potential predictive power of search query data. While most of the studies utilize web data represented by Google Trends, Moriwaki (2020) [25] uses smartphone GPS data to predict unemployment rates. The unprecedented COVID-19 pandemic is a fresh reminder of the importance of economic nowcasting [21].
The government needs to recognize the rapidly-changing economic environment to take appropriate actions. In response to the surging demand, giant tech companies such as Google, Facebook, and Apple have been contributing by providing alternative-data-based indices [2, 12, 28]. Among them, mobility data has gained a lot of attention. Mobility data provides insight into how people change their behavior in daily life, and monitoring people's mobility patterns is an important example [13, 18, 26]. Mobility patterns reflect people's shopping behavior in physical stores, leisure, and travel. In addition, more complicated economic activity can be assessed. One notable example is unemployment [25]. In Japan, unemployed persons are required to visit public employment offices to collect unemployment insurance benefits. Hence the number of visitors is correlated with the number of unemployed persons. Official statistics only provide monthly, prefecture-level data on the number of unemployment insurance benefits, and the monthly statistics are usually released a month after the end of the month. Real-time data for this number is therefore of high value in economic nowcasting.

In their seminal work [27], Musicant et al. (2007) first defined the aggregate output learning problem, in which the labels are only available in an aggregated form. They investigate various machine learning models that are applicable to the problem. The aggregate output learning problem has recently been re-investigated in various research [7, 19, 38]. The present work is also related to unsupervised learning for super-resolution [34] and video interpolation [33] in the computer vision literature, which aims at recovering granularity from the data without labels.

Let $y^A_T \in \mathbb{R}$ be a target variable which is of interest to economists (e.g., the unemployment rate or GDP). $A$ stands for a larger area such as a nation or state, and $T$ stands for a longer time period such as a quarter or month. Let $x^a_t \in \mathbb{R}$ be a feature variable which is correlated with the target variable $y^A_T$. $a$ stands for a smaller area such as a city or county, and $t$ stands for a shorter time period such as a day. The difference between $A$ and $a$, and between $T$ and $t$, is the granularity. In particular, an area $A$ is divided into multiple small areas $a$, and a time period $T$ is divided into multiple short time periods $t$. We use mapping functions $\mathcal{T}$ and $\mathcal{A}$ to describe the relationship. In particular, $\mathcal{T}(t) = T$ indicates that time $t$ belongs to $T$, and $\mathcal{A}(a) = A$ indicates that area $a$ belongs to $A$. Our goal is to find a predictor $f$ which predicts $y^a_{T_0}$ from granular data $(\mathbf{x}^a_t)_{t \le t_0}$, where $\mathcal{T}(t_0) = T_0$ (i.e., $t_0$ belongs to $T_0$). Notice that the superscript is not the larger area $A$ but the small area $a$. That is, $f$ predicts $y^a$ instead of $y^A$.

Table 1 illustrates the data structure of our problem. The feature vector $\mathbf{x}$ is observed for City 1 and City 2 and for each day, but the output value $y$ is only observed for the prefecture (assume the prefecture comprises only the two cities) and for each month. We want to predict the monthly values for each city (i.e., $\hat{y}^1_{\text{Jan}}$ and $\hat{y}^2_{\text{Jan}}$). Since the feature vector $\mathbf{x}$ is collected in real time, we want to update the prediction using the latest information. As shown in Table 1, the predicted values change according to the feature vector. That is, $\hat{y}_{\text{Jan}} \mid \mathbf{x}_{\text{Jan 1}} \neq \hat{y}_{\text{Jan}} \mid \mathbf{x}_{\text{Jan 2}}$. In this way, we can fully utilize the real-time, granular data for the forecast.

While we conduct spatial disaggregation, we directly predict the aggregated value for the longer period. That is, we predict $y^a_T$ rather than $y^a_t$. This looks like simple supervised learning, but it is not.
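To make this setting concrete, the following minimal Python sketch spells out the mapping functions $\mathcal{T}$ and $\mathcal{A}$ and the data layout of Table 1. The area names and numbers are hypothetical illustrations, not our actual data.

```python
from datetime import date

# Mapping function T: a short period t (a day) -> its long period T (a month).
def period_of(t: date) -> str:
    return t.strftime("%Y-%m")  # e.g., date(2020, 1, 15) -> "2020-01"

# Mapping function A: a small area a (a city) -> its large area A (a prefecture).
# Area names here are hypothetical.
AREA_OF = {
    "city_1": "prefecture_X",
    "city_2": "prefecture_X",
}

# Labels y are observed only at the aggregate (A, T) level ...
y = {("prefecture_X", "2020-01"): 1234.0}

# ... while features x are observed at the granular (a, t) level.
x = {
    ("city_1", date(2020, 1, 15)): [0.3, 1.2],
    ("city_2", date(2020, 1, 15)): [0.8, 0.5],
}

# The goal is a predictor f that outputs y-hat for ("city_1", "2020-01")
# from the granular features available up to t0, even though no label
# exists at the (a, T) level.
```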
To see why, let $t$ be Jan 15, 2020; then $T$ is Jan 2020. We have data only for the past. That is, we have $\mathbf{x}_{\text{Jan 1}}, \mathbf{x}_{\text{Jan 2}}, \dots, \mathbf{x}_{\text{Jan 15}}$ but do not have $\mathbf{x}_{\text{Jan 16}}, \mathbf{x}_{\text{Jan 17}}, \dots, \mathbf{x}_{\text{Jan 31}}$. Nowcasting predicts the aggregated value using incomplete time series data [10].

For economists, the monthly value $y_T$ is more informative than the daily value $y_t$. $y_t$ represents the economic status over a very short term, while $y_T$ represents the forecast for the longer time period. For example, let $y_T$ be the unemployment rate for January 2020 and $y_t$ be that for January 1, 2020. While the daily movement of the unemployment rate is interesting for stock traders, who are eager to know short-term fluctuations, the forecast of the monthly unemployment rate is of prime interest for policymakers, who need to know the trend of the economic status. Furthermore, policy interventions take some time to be implemented. On the other hand, geographically granular estimates are much more useful than aggregated values, especially for local governments, which need precise information about their local economies. Nevertheless, our method is easy to transform into a daily version. In sum, our goal is to find a good predictor $f(\mathbf{x}^a_t)$ for $y^a_T$, $\mathcal{T}(t) = T$.

Table 1: An illustration of the output data $y$, input data $\mathbf{x}$, and predicted data $\hat{y}$. The predicted data is updated every day using the latest information, although the label data is only available as spatially and temporally aggregated values.

In this section, we describe our proposed model. Following the aggregate learning literature [7, 27], we first define the aggregation function as

$$\hat{y}^A_T = \sum_{a:\, \mathcal{A}(a) = A} w^a \hat{y}^a_T, \qquad (1)$$

where $w^a \in [0, 1]$ is a weight. The weight controls for the share of the values of each granular area in the large area. In the simplest case (including the application discussed below), $w^a$ is set to one. In these cases, $f$ can learn the actual values of the target from the features. However, when we only obtain normalized features such as population per acre or average age, we need a weighted sum. In these cases, the weights are pre-determined based on real data and knowledge. Area and population are often surveyed at a fine granularity by censuses and can be used as good proxies for the shares.

The predictor $f$ predicts the outcome values for small areas. In contrast to standard supervised learning, the true values for the predictor cannot be observed. In other words, the predictor predicts latent values. The main feature vector $\mathbf{x}$ consists of granular information such as search queries, posts on social networking services (SNS), point-of-sale (POS) data, credit card data, and mobility data. These data are often called alternative data. The vector $\mathbf{x}_t$ contains the current value and its lagged values. That is, $\mathbf{x}_t = (x_t, x_{t-1}, x_{t-2}, \dots)$. Forecasters want to update the prediction when new data arrive. That is, $t$ can be any timing. For example, let $T$ be April 2020 and $t_1, t_2, \dots, t_{30}$ be April 1st, 2nd, ..., 30th. Then, $\mathbf{x}_{t_1}$ and $\mathbf{x}_{t_2}$ should give different predictions. However, the label data are only available for the complete data $\mathbf{x}_{t_{30}} = (x_{t_{30}}, x_{t_{29}}, \dots, x_{t_1})$; $x_{t_{30}}$ is missing in $\mathbf{x}_{t_{29}}$. In economics, missing data is dealt with by either (i) training the model on data with the same missing structure or (ii) imputation of the missing data. In the above example, a forecaster who adopts (i) uses only the data from the 1st to the 29th day of each month, to keep the same missing structure when training the model. Naturally, this causes a huge loss of information. A forecaster who resorts to (ii) needs to prepare another model to conduct the imputation.
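The aggregation and the aggregate loss can be sketched in a few lines of PyTorch. This is a minimal reading of Eq. (1) and the MSE criterion referred to as Eq. (2), assuming purely spatial aggregation with pre-determined weights; the numbers are hypothetical.

```python
import torch

def aggregate(y_hat_small: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Eq. (1): weighted sum of small-area predictions within one large area.

    y_hat_small: shape (n_small_areas,) -- latent predictions for each area a.
    weights:     shape (n_small_areas,) -- shares w^a in [0, 1] (all ones in
                 the simplest case, as in our application).
    """
    return (weights * y_hat_small).sum()

def aggregate_mse_loss(y_hat_small: torch.Tensor,
                       weights: torch.Tensor,
                       y_label: torch.Tensor) -> torch.Tensor:
    """Aggregate MSE: the label is only observed at the (A, T) level, so the
    loss compares the aggregated prediction against it. Because the loss is
    also evaluated once per high-frequency feature cutoff, the same label is
    reused many times during training."""
    y_hat_large = aggregate(y_hat_small, weights)
    return (y_hat_large - y_label) ** 2

# Example: two cities in one prefecture, equal weights of one.
y_hat = torch.tensor([510.0, 730.0], requires_grad=True)
w = torch.ones(2)
label = torch.tensor(1234.0)
loss = aggregate_mse_loss(y_hat, w, label)
loss.backward()  # gradients flow back to the granular (latent) predictions
```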
In our model, the missingness is treated by introducing an auxiliary feature vector and non-linearity in the parameters. The auxiliary vector contains the one-hot encoded year, month, day, larger area $A$, and small area $a$. By doing so, the predictor can learn the missing structure of the data and use the information appropriately. The non-linearity in the parameters is essential in this sense. We adopt a simple recurrent neural network (RNN) to make the predictor flexible (Fig. 3). As a result, we can fully utilize the information and train the model end-to-end. At each data point $(a, t)$, the predictor predicts the values for small area $a$ at short time period $t$. The predicted values are weighted-averaged and the loss is calculated. The trained predictor is then used for prediction on granular data. A forecaster can use real-time data $\mathbf{x}^a_t$ to predict a smaller area's current status as $\hat{y}^a_t = f(\mathbf{x}^a_t)$. The good news is that the forecaster does not need to re-train the model until new label data are released; she can re-use the same model for an extended period.

We apply the MF-AGL model to analyze the real-world economy. In particular, we apply the model to predict the number of job applicants in Japan. The Japanese government releases official statistics on the number of persons who file job applications at public employment offices on a monthly basis. In Japan, unemployed persons need to file a job application at a public employment office to receive unemployment insurance benefits. The number of job applicants is counted at 544 public employment offices, and the official statistics summarize the numbers for the 47 prefectures. The number of job applications is a good proxy for the number of unemployed persons. This is very similar to the unemployment insurance claims statistics in the U.S., which are considered one of the most important economic indicators by economists.

The reasons we chose this problem are the following. First, unemployment is a huge tragedy: it leads to loss of income and also loss of contact with society, which causes economic and mental hardship. Real-time analysis of labor market conditions is essential for a swift policy response. Second, the prefecture-level data provided by the reports are too coarse for an appropriate policy response. In Japan, each prefecture has a population of several million to ten million. More granular statistics are needed for careful policy intervention. Third, the monthly updates of the reports are too infrequent. Amid COVID-19, the deterioration of the market is very fast. Looking at monthly data from one or two months ago is not very meaningful. We need real-time updates of the data.

Fortunately, there is good alternative data for the number of job applicants. As Moriwaki [25] shows, GPS data from smartphones has good predictive power for the number of unemployed persons. In this study, we utilize similar datasets. As shown in Fig. 1, the GPS readings around public employment offices indicate visits to the offices. In contrast to [25], which only counts the number of GPS readings inside a fixed radius of the offices, we extract rich features from these data. The details of the feature extraction are provided in Section 4.2. The whole process is summarized in Fig. 2. We first extract the feature vector $\mathbf{z}$ from raw GPS readings taken from mobile apps. Then the visit predictor predicts $x^q$, the number of visits to the public employment offices located in each city. Then another predictor predicts $y$, the number of job applicants for each office.
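Before detailing each component, the pipeline of Fig. 2 can be summarized schematically as below. All function names and numbers are illustrative stand-ins, not our actual implementation.

```python
# Schematic, runnable sketch of the Fig. 2 pipeline with stub components.

def extract_features(raw_gps_logs, poi):
    # Stand-in for the trajectory feature extraction of Section 4.2
    # (stay counts, speeds, and cosines near the POI).
    return [len(raw_gps_logs), 1.5, 0.2]

def visit_predictor(z):
    # Stand-in for the transferred LightGBM visit classifier of Section 4.3;
    # here just a dummy count derived from the features.
    return max(0.0, 10.0 * z[0] - z[1])

def applicant_predictor(visit_counts):
    # Stand-in for the RNN-based job applicant predictor (Fig. 3).
    return 2.0 * sum(visit_counts)

raw_gps_logs = [(35.68, 139.76), (35.69, 139.75)]      # toy GPS readings
z = extract_features(raw_gps_logs, poi="office_123")   # step 1: features z
x_q = visit_predictor(z)                                # step 2: visit count x_q
y_hat = applicant_predictor([x_q])                      # step 3: applicants y-hat
```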
The visit predictor is trained on a different domain and transferred to the task; the transfer learning is explained in Section 4.3. We use three datasets, as follows.

The crucial challenge is to accurately count visitors from noisy GPS readings. GPS readings are very noisy, especially for smartphones, because the logs are usually very sparse. For example, many apps log the GPS records only when the phone is moved. This algorithm aims at minimizing battery consumption. As a result, the records are not recorded at regular intervals. Also, the accuracy of the location deteriorates inside buildings, as the signals from the satellites do not reach indoors. One possible solution is to rely on machine learning techniques that denoise the data and extract visited POIs (Points of Interest). There are various approaches to this job [11, 16, 29, 30]. We find that visited-point extraction methods typically rely on (i) the number of stay points, (ii) the stop location, (iii) the stop duration, and (iv) speed. We follow these approaches and extract extensively rich information from the location data to achieve high performance. The list of the features is presented in Table 2. The visit predictor in Fig. 2 uses the location trajectory to classify visit/non-visit to some POIs.

Table 2: Features extracted from GPS trajectories.
max speed — Max speed of the user
stay count — # of stay points
speed at 9 points — Speed at the 9 nearest points from the POI
cosine at 9 points — Cosine extracted from the trajectory of users

Another challenge for visit prediction is that there is no true label for public employment office visitors. We train the visit predictor on a different, proprietary source and transfer it to the main task: visits to public employment offices. The visit predictor uses the features extracted as in Section 4.2. The predictor is implemented using LightGBM [15], and the hyperparameter tuning is done with the LightGBM tuner. These features are powerful: the visit predictor achieved an AUC of 0.86 on the classification task for the original domain (train : test = 9,091 : 4,479).

Unemployed persons visit the nearest public employment office to file an unemployment insurance claim. There are 544 offices in Japan. We divide the entire country into 544 regions based on the locations of the offices, assuming each region is covered by its nearest office. We use the transferred predictor to predict the visit count for each office. With the predicted visit counts, we train the job applicant predictor, which predicts the number of job applicants for each public employment office. The input to the model is described in Table 3. The predictor uses past visit counts and dummy variables extracted from the date of making a prediction.

Table 3: Input to the job applicant predictor.
month dummy — One-hot encoded month dummy
day dummy — One-hot encoded day dummy
prefecture dummy — One-hot encoded prefecture dummy
public office dummy — One-hot encoded office dummy

The training/prediction model is a simple recurrent neural network, and Fig. 3 shows the network architecture. This model uses an LSTM layer and a multi-layer perceptron to predict the daily visit counts of small areas, and the predictions are aggregated to predict the monthly visit count of a large area according to Eq. (1). We use the Adam [17] solver for optimization with $\beta_1 = 0.9$, $\beta_2 = 0.999$, an initial learning rate of 0.0001, no weight decay, and no learning rate decay. We train our models for a total of 600 epochs with a batch size of 1 and the MSE loss described in Eq. (2), on a Tesla V100 GPU. We implemented the whole network in PyTorch [31].
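A minimal PyTorch sketch of this architecture and training setup follows. The hidden sizes and feature dimension are illustrative assumptions (the network is described here only at the level of Fig. 3), and the sketch shows a single small area for brevity.

```python
import torch
import torch.nn as nn

class MFAGLPredictor(nn.Module):
    """LSTM layer + multi-layer perceptron, as in Fig. 3.
    Hidden sizes are illustrative; the paper does not report them here."""
    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):                 # x: (batch, n_days, input_dim)
        h, _ = self.lstm(x)
        return self.mlp(h).squeeze(-1)    # daily predictions: (batch, n_days)

input_dim = 32                            # illustrative feature dimension
model = MFAGLPredictor(input_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=0.0)
criterion = nn.MSELoss()

# One toy training step (batch size 1, as in the paper). The daily
# predictions are summed into a monthly value and compared against the
# monthly label, in the spirit of Eq. (1) and Eq. (2).
x = torch.randn(1, 31, input_dim)         # one month of daily features
y_label = torch.tensor([1234.0])          # monthly aggregated label
y_daily = model(x)                        # (1, 31) daily predictions
y_monthly = y_daily.sum(dim=1)            # aggregate to the monthly level
loss = criterion(y_monthly, y_label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```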
Now we demonstrate the usefulness of the proposed model. Fig. 4 shows four maps representing the regional heterogeneity of the change in the number of job applicants. The color represents a decrease (good, bright) or an increase (bad, dark) in the number of job applicants from the previous year (i.e., year over year). The maps are generated from the actual Reports on Employment Service for October 2020 (Ground Truth), predictions by the proposed model (MF-AGL), predictions by an auto-regressive model (AR), and predictions by a Random Forest model (RF). The auto-regressive (AR) model is a standard model for economic forecasting and time-series prediction. Random Forest (RF) is a standard machine learning method that is especially good for small samples. The AR model uses the numbers of job applications of the past 11 months (i.e., 11 lags) as inputs, i.e., $\hat{y}_T = f(y_{T-1}, \dots, y_{T-11})$. The RF model uses dummy variables for year, month, and prefecture as well as the numbers of job applications of the past 11 months (i.e., 11 lags).

The MF-AGL model is trained on the data from October 2016 to September 2020, and then uses features extracted on October 31, 2020. The AR and RF models are trained on the data from October 2016 to September 2020 and use the data from September 2020 as input. It may seem unfair that only the MF-AGL model uses data from October 2020. However, this is precisely the strength of the nowcasting model. To see this, Table 4 shows the schedule of data availability. On October 1, 2020, we only have label data for July 2020 and before, because the Reports for August 2020 are released on October 2. On the other hand, we have the features $\mathbf{x}_{\text{Oct 1}}$ in real time. As such, on October 31, 2020, our MF-AGL model can use the latest values of the features, while the traditional AR and RF models can only use the label data up to the last released month.

Table 4: The schedule of data availability. On each day of October 2020, the available labels $y$ extend only through July, August, or September 2020, depending on the release dates of the Reports, while the input features $\mathbf{x}_{\text{Oct 1}}, \dots, \mathbf{x}_{\text{Oct 31}}$ are available in real time.

Figure 3: Our aggregate learning architecture for job applicant prediction. We use a simple recurrent neural network composed of an LSTM layer and a multi-layer perceptron as the predictor. Each daily prediction is aggregated into the monthly visit count of the large area. The loss value is computed by the MSE criterion in the training phase.

Now let us turn to the maps in Fig. 4. One significant difference of the MF-AGL model is its geographical granularity. While the other three maps only tell us that the southeastern areas are in bad condition (darker), the MF-AGL model shows that there is a mix of bad and good conditions at the granular level. We can also see darker areas in the granular map that are bright in the ground-truth data. Local governments need to take care of these hidden problems.

The other finding is that the AR and RF models are not good at prediction. To see this, we aggregate the predictions of the MF-AGL model at the prefecture level and calculate the Mean Absolute Percentage Error (MAPE), a standard metric for evaluating forecasting models. The reason we use MAPE is that the number of job applications in each prefecture is proportional to its population. Other metrics such as mean squared error (MSE) and mean absolute error (MAE) are more affected by errors in populous prefectures, while MAPE treats each prefecture equally. Although accuracy at the prefecture level is not the priority of the MF-AGL model, this comparison highlights the robustness of the model's predictions.
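For reference, the prefecture-level MAPE used in this comparison can be computed as in the following sketch; the counts are hypothetical.

```python
import numpy as np

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Percentage Error: treats each prefecture equally,
    unlike MSE/MAE, which are dominated by populous prefectures."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Hypothetical prefecture-level job applicant counts (47 prefectures in Japan).
rng = np.random.default_rng(0)
y_true = rng.uniform(1_000, 50_000, size=47)
y_pred = y_true * rng.uniform(0.9, 1.1, size=47)  # predictions within +/- 10%
print(f"MAPE: {mape(y_true, y_pred):.1f}%")
```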
Finally, we demonstrate another practical usefulness of our model. As discussed in Section 3.1, the beauty of our model is the real-time updating of forecasts using granular data. Fig. 5 shows how the model updates the prediction using real-time data. The model shows the year-over-year change for each city and for each day. That is, we first predict the disaggregated number of job applicants on the same day of the previous year and then calculate the year-over-year change in the number of job applicants. The figure implies a rapid improvement in the labor market during October 2020. A possible reason is the peaking-out of the second wave of the COVID-19 pandemic. As shown in Fig. 6, the number of cases decreased dramatically in September and was stable in October.

In this work, we proposed a novel aggregate learning method that adapts to mixed-frequency data. The model predicts spatio-temporal changes in economic indices without granular label data. We proposed a simple LSTM-based architecture and a loss function for its training. We applied the model to a real-world task: the prediction of the number of job applicants in Japan. Our MF-AGL model predicts well both the regional heterogeneity at the sub-prefecture level and the rapid change in labor market conditions within a month. The present model can be applied to broad areas including GDP prediction, labor market prediction, and industrial production prediction. Directions for future work include extension to other domains and further improvement of the model using more fine-grained architectures.

Figure 5: Changes in the number of job applicants in Japan. Due to the limited space of the paper, we exclude the Okinawa islands from the map. The color represents the good (lighter) to bad (darker) condition of the labor market relative to the previous year. The predictions are made by the MF-AGL model using the input available on each day.

References

Big Data for 21st Century Economic Statistics
Apple Inc. 2020. COVID-19 - Mobility Trends Reports - Apple
Google Econometrics and Unemployment Forecasting
State Space Models and MIDAS Regressions
Predicting the Present with Google Trends
"Google It!" Forecasting the US Unemployment Rate with a Google Job Search Index
An Aggregate Learning Approach for Interpretable Semi-Supervised Population Prediction and Disaggregation Using Ancillary Data
Forecasting GDP Growth from Outer Space
Macroeconomics and the reality of mixed frequency data
MIDAS Regressions: Further Results and New Directions
Deriving Personal Trip Data from GPS Data: A Literature Review on the Existing Methodologies
Google Inc. 2020. COVID-19 Community Mobility Reports
Understanding the Impact of the COVID-19 Pandemic on Transportation-related Behaviors with Human Mobility Data
From twitter to GDP: Estimating economic activity from social media
LightGBM: A Highly Efficient Gradient Boosting Decision Tree
Extracting visited points of interest from vehicle trajectories
Adam: A Method for Stochastic Optimization
The effect of human mobility and control measures on the COVID-19 epidemic in China
Variational Learning on Aggregate Outputs with Gaussian Processes
Factor-MIDAS for Now- and Forecasting with Ragged-Edge Data: A Model Comparison for German GDP
We Need to Change Our Economic Indicators to Keep up with the Crisis
Ministry of Health, Labour and Welfare
Bayesian MIDAS Penalized Regressions: Estimation, Selection, and Prediction
Nowcasting Unemployment Rates with Smartphone GPS Data
Nudging Preventive Behaviors in COVID-19 Crisis: A Large Scale RCT using Smartphone Advertising
Supervised Learning by Training on Aggregate Outputs
COVID-19 Mobility Data Network. 2020. Facebook Data for Good Mobility Dashboard
Extracting Arbitrary-shaped Stay Regions from Geospatial Trajectories with Outliers and Missing Points
Probabilistic identification of visited point-of-interest for personalized automatic check-in
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Nowcasting Unemployment Rates with Google Searches: Evidence from the Visegrad Group Countries
Unsupervised Video Interpolation Using Cycle Consistency
"Zero-shot" Super-resolution Using Deep Internal Learning
Query Indices and a 2008 Downturn: Israeli Data
High-dimensional macroeconomic forecasting and variable selection via penalized regression
Predicting the Present with Google Trends. SSRN Scholarly Paper ID 1659302
Multi-Task Learning for Aggregated Data Using Gaussian Processes