key: cord-183600-wxfq90zq authors: Arthur, Rudy title: Studying the UK Job Market During the COVID-19 Crisis with Online Job Ads date: 2020-10-07 journal: nan DOI: nan sha: doc_id: 183600 cord_uid: wxfq90zq

The COVID-19 global pandemic and the lockdown policies enacted to mitigate it have had profound effects on the labour market. Understanding these effects requires us to obtain and analyse data in as close to real time as possible, especially as rules change rapidly and local lockdowns are enacted. In this work we study the UK labour market by analysing data from the online job board Reed.co.uk. Using topic modelling and geo-inference methods we are able to break down the data by sector and geography. We also study how the salary, contract type and mode of work have changed since the COVID-19 crisis hit the UK in March. Overall, vacancies were down by 60 to 70% in the first weeks of lockdown. By mid-September numbers had recovered slightly, though they were still down by 40 to 50%. Broken down by sector, vacancies for hospitality and graduate jobs are greatly reduced, while there were more care work and nursing vacancies during lockdown. Differences by geography are less significant than between sectors, though there is some indication that local lockdowns stall recovery and that less badly hit areas may have experienced a smaller reduction in vacancies. There are also small but significant changes in the median salary and in the number of full time and permanent jobs. In addition to these results, this work presents an open methodology that enables a rapid and detailed survey of the job market in these unsettled conditions.

The COVID-19 pandemic has claimed over 40,000 lives in the UK as of June 24th [1]. The most drastic measure to limit the spread of the disease has been the imposition of so-called lockdown measures. 'Lockdown' refers to national or regional orders calling for the closure of businesses and restriction of assembly and travel.
Lockdown policies have not been uniform across nations. Some areas, for example Sweden and Japan, implemented essentially voluntary measures [2, 3], while others, for example China and Germany, imposed and enforced quite severe restrictions on assembly and business opening [4, 5]. The UK's lockdown policy was somewhere between these two extremes. Beginning somewhat later than many other European nations, on March 21st the UK government introduced the Health Protection (Coronavirus, Business Closure) (England) Regulations 2020 [6], which was superseded by the Health Protection (Coronavirus, Restrictions) (England) Regulations 2020 on March 26th [7]. This piece of legislation, which we will hereafter refer to as 'lockdown', included restricted freedom of movement, bans on gatherings and enforced business closures. The lifting of some of these rules began on May 13th, though in some heavily affected areas stricter measures were retained or re-imposed [8, 9] and the rules have subsequently been modified, e.g. 'the rule of six' [10].

The economic impacts of the COVID-19 crisis have been severe. The UK saw a 125.9% increase in unemployment claims between March and May, with vacancies dropping by 58% over the same period [11], though these impacts seem not to have affected some headline employment indicators by May [12]. To provide support during this period the UK government introduced unprecedented measures, namely the Coronavirus Job Retention (furlough) Scheme [13], which attempts to keep unemployment rates in check by providing grants to pay up to 80% of salaries.

The academic study of lockdown has necessarily been reactive and observational. The rapid onset of the crisis has meant that researchers have had to source and analyse real time labour market information. For example, Bick et al. [14] use an online labour market survey; Chetty et al. [15] use anonymised data from several large companies; Hensvik et al. [16] use vacancy postings, while Forsythe et al. [17] use data from the job market analysis company Burning Glass. This methodology yields useful insights into the effect of the COVID-19 crisis on the labour market. For example, Forsythe et al. [17] show a 44% drop in vacancy postings between February and April in the US, observed across occupational categories (essential or non-essential work) as well as across states which may have had different lockdown policies. Bick et al. [14] report similar drops in the US across sectors and demographics during the same period, with a slow recovery in the months after. Analysis of vacancy postings in Sweden, which has had probably the least restrictive lockdown in Europe, shows a drop in job adverts of around 40% [16]. Hensvik et al. also report that job seekers are searching less intensively and redirecting their searches towards less severely hit occupations.

The effect of the crisis on different demographic sectors of the economy has exacerbated previously existing inequalities [18]. In Europe, Adams-Prassl et al. [19] found that workers in Germany were insulated from job losses by longstanding institutional frameworks, compared to workers in the UK who are in a much more precarious position. The same work also finds that job losses and reductions in earnings disproportionately affect women, workers without a university degree and younger people. There is still serious academic debate about the efficacy of lockdown as a disease control measure [20], how it should be implemented [21] and the negative and unintended consequences of these policies [22].

This work is an observational study, using jobs posted on an online job board to study the effect of COVID-19, lockdown and other disease control measures on the job market in the UK. We will use topic modelling and location inference methods on job vacancy data to cross-section the data by geography and occupational sector.
The aim is to provide insight into the magnitude of the effect of COVID-19 and associated policies on job vacancies across sectors and geographies. In particular we will look at
• The number of job postings by date.
• Time series of vacancies by sector.
• Time series of vacancies by geographic region.
• The distribution of salary, type of contract (permanent, temporary or contract) and mode of work (full time or part time) before and after the COVID crisis hit in 2020.

The technical approach we take is similar to that used by Burning Glass or [23]. In [23] Turrell et al. used an unsupervised machine learning method on a corpus of online job adverts to discover how the labour market is segmented. In this work we use a similar data set but apply a supervised method which gives more robust results. We also add a location inference step to enable a geographic analysis.

We collect jobs from the online job board Reed.co.uk. Reed is a large recruitment agency and owner of the UK's first recruitment website, which receives around 7 million visits per month. Apart from its popularity, what makes it suitable for this analysis is an API which allows job adverts to be downloaded. We have collected job adverts from January 1st 2020 up to September 26th 2020, 6 months after the UK lockdown began. This gives us 3 months of pre-crisis data and 6 months of post-crisis data; in total 1,369,355 job adverts are used. This should be sufficient to cover a baseline of vacancy information from before COVID-19 became a national issue, the entire lockdown period and a significant post-lockdown period, so that we can study any potential recovery. With less than a year of data we cannot see seasonality, especially seasonal winter jobs which fall outside the study period, or long term trends, for example trends due to Brexit or global economic downturns or upturns. Figure 1 shows the number of job adverts collected per day.
Job adverts which have been deleted or removed are returned by the API as JSON (JavaScript Object Notation) objects with all null values and are not included in Figure 1. There is no indication that these removed job ads have a significant impact: they represent only around 3% of the total number of records, spread fairly uniformly across the period.

Figure 1 shows some trends quite clearly. We have indicated some key events on the plot, as well as the period we have defined as 'lockdown'. We can see that the first recorded case of coronavirus in the UK, on the 31st of January 2020, is associated with no particular change in trend. At this point coronavirus was not considered to be a major issue in the UK. The major drop in the number of job ads posted starts on March 16th, when UK Prime Minister Boris Johnson issued advice against "non-essential" travel and contact. By the time lockdown began the number of ads was reduced to under half of the early year baseline. A major spike in job ads is associated with the second amendment to the Health Protection Act [7], which allowed for the re-opening of certain businesses and services, e.g. garden centres, tennis courts and recycling centres. Aside from this spike, the number of jobs posted per day has slowly grown from its nadir in mid-April, though it has still only recovered to around 50% to 60% of January and February levels. There is an interesting spike in the week beginning the 14th of September. This is caused by the NHS Business Services Authority advertising 8,547 and 7,831 jobs, mostly in nursing, on the 16th and 18th of September respectively. Subtracting these ads puts this week in line with the previous several weeks.

Though the main site is searchable by topic and sector, the JSON payload returned by Reed's API does not include a theme or topic marker and so this must be inferred.
Algorithms such as LDA [24] and Doc2Vec [25] transform text documents to low dimensional vector representations which enhance automatic topic detection algorithms. LDA has previously been used with success on a very similar data set [23, 26]. We attempted to model our data using both of these methods. While some categories of job are readily detectable, e.g. software developer, the topics or clusters detected are often not stable when varying the algorithm parameters. Measures like coherence [27] fail to provide obvious evidence in favour of any one parameter set. Thus we do not use unsupervised methods in this work.

Category | Criteria | Number
hotel | "hotel" in employername or ("premier inn" in cleandescr and "discounted" not in cleandescr) or ("housekeep" in cleantitle and "hotel" in cleandescr) or ("chef" in cleantitle and "hotel" in cleandescr) |
graduate | "graduate scheme" in cleandescr | 108
nurse | "registered nurse" in cleantitle | 720
care | "care assistant" == cleantitle | 1162
teacher | "secondary teacher" == cleantitle or "primary teacher" == cleantitle | 161
software | "software developer" in cleantitle or "software engineer" in cleantitle | 776
delivery | "delivery driver" in cleantitle | 238
Table 1: The criteria for each set of seed job adverts and the number matched in the training set. cleantitle and cleandescr are the job title and job description after applying the pre-processing steps described in the text; employername is the name of the organization posting the ad.

Rather than infer job classes from the data as [23] did, we want to study predetermined categories of job, e.g. nursing, teaching or graduate jobs. We therefore take a different and somewhat simpler approach which nevertheless suffices to identify topics. For each ad we combine the job title with the job description to constitute a 'document'. The document text is cleaned to remove HTML artefacts, lower-cased, lemmatised using the wordnet lemmatiser [28] and tokenised.
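The cleaning steps just described could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in this work: the function name `clean_document`, the regular expressions and the pluggable `lemmatise` argument are all assumptions made for the example, and the sketch tokenises before lemmatising for convenience. The default lemmatiser is the identity; a WordNet-based one (for example the `lemmatize` method of NLTK's `WordNetLemmatizer`) could be passed in instead.

```python
import html
import re
from typing import Callable, List

TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag matcher (assumption)
TOKEN_RE = re.compile(r"[a-z0-9]+")  # alphanumeric tokens only (assumption)


def clean_document(
    title: str,
    description: str,
    lemmatise: Callable[[str], str] = lambda token: token,
) -> List[str]:
    """Combine title and description into one document and clean it."""
    text = f"{title} {description}"
    text = TAG_RE.sub(" ", text)   # remove HTML tags
    text = html.unescape(text)     # decode entities such as &amp;
    text = text.lower()            # lower-case
    tokens = TOKEN_RE.findall(text)
    return [lemmatise(token) for token in tokens]


tokens = clean_document("Registered Nurse", "<p>Night shifts &amp; days</p>")
print(tokens)  # ['registered', 'nurse', 'night', 'shifts', 'days']
```

Keeping the lemmatiser as an argument means the same function can be exercised without any external corpus download, while a real lemmatiser can be swapped in when available.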
Using data from 2019 (Reed job ids 39500000 to 39600000) we collect a number of 'seed' documents. These seeds are collections of job ads which are representative of given sectors, $S_c = \{d_{c1}, d_{c2}, \ldots, d_{cn}\}$. The label $c$ is a job category, e.g. 'teacher', and the documents $d_{ci}$ are job adverts definitively in that sector. Table 1 shows the criteria for collecting the seeds and the number in each set.

Figure 2: The sharp increase on the right corresponds to jobs which are very similar to the jobs in each seed. The cutoff is chosen to be 0.1; this is rather high, but the results in Section 3 are robust to small variations in the cutoff.

The seeds were chosen to highlight job categories which were likely to be impacted differently by the COVID-19 crisis. We expected delivery, nursing and care work demand to increase, while we expected hospitality and graduate jobs to decrease, the latter expectation informed by [19]. Software was chosen as a job which could in principle be done remotely, so it would be interesting to see how this affected vacancy numbers. Finally, we expected the number of teachers required to be largely unaffected by the crisis. These categories are not exhaustive and do not represent the only job categories detectable in the data.

We used Gensim's implementation of TF-IDF [29] to transform each document into a vector and compute the cosine similarity of every job in our 2020 collection with the vectors in each seed set. The average similarity score of the target job $d_t$ with the jobs in the seed set is used as a measure of the similarity of the target job with the job category:

$$\mathrm{similarity}_c = \frac{1}{n} \sum_{i=1}^{n} \cos(d_t, d_{ci}).$$

Figure 2 shows the ranked similarity of each job in the 2020 data set to each category. There is a sharp increase on the right hand side for jobs which are of the same type as the ones in the corresponding seed set.
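To make the averaged cosine similarity concrete, here is a small self-contained sketch. It deliberately uses a plain-Python TF-IDF rather than Gensim's implementation, so the weights will not match Gensim's exactly; all function names (`fit_idf`, `tfidf_vector`, `cosine`, `category_similarity`) and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: token -> weight


def fit_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency log(N / df) for each token in the corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}


def tfidf_vector(doc: List[str], idf: Dict[str, float]) -> Vector:
    """Sparse TF-IDF vector; tokens unseen when fitting the IDF are ignored."""
    counts = Counter(doc)
    return {tok: n * idf[tok] for tok, n in counts.items() if tok in idf}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zero)."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def category_similarity(target: Vector, seeds: List[Vector]) -> float:
    """Mean cosine similarity between the target and every seed document."""
    return sum(cosine(target, seed) for seed in seeds) / len(seeds)


# Toy corpus of already-cleaned documents (illustrative only).
corpus = [
    ["registered", "nurse", "ward"],
    ["registered", "nurse", "hospital"],
    ["delivery", "driver", "van"],
    ["nurse", "ward", "hospital"],
]
idf = fit_idf(corpus)
nurse_seeds = [tfidf_vector(corpus[0], idf), tfidf_vector(corpus[1], idf)]
nurse_score = category_similarity(tfidf_vector(corpus[3], idf), nurse_seeds)
driver_score = category_similarity(tfidf_vector(corpus[2], idf), nurse_seeds)
```

In this toy example the nursing advert scores above zero against the nurse seeds, while the delivery advert shares no tokens with them and scores exactly zero.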
We choose a cutoff value of $\mathrm{similarity}_c > 0.1$ and classify a job as a member of the corresponding category if its similarity is higher than the cutoff. If a job advert has similarity higher than the cutoff with multiple categories we assign it to the category with which it has maximum similarity. Small variations in the value of the cutoff slightly affect the absolute number of job matches, but not the relative numbers, which is what we aim to study.

Location inference is done using the 'location' field in the advert's JSON. Some adverts are only localised at the county level, e.g. Devon. These are identified by checking against a list of UK administrative counties. The rest of the adverts are checked by querying the location field against the Geonames [30] and Nominatim [31] databases, in that order. Geonames is effective at returning co-ordinates for larger towns and cities [32]. Nominatim is a web based geo-coding service which uses OpenStreetMap data to find locations by name and address. Since the same locations re-occur multiple times, every Nominatim lookup is saved in a database which is queried before calling the web service. This process returns GPS coordinates or bounding boxes for 97% of all non-null job adverts.

Figure 3 shows the spatial distribution of job ads, localised to NUTS2 regions [33]. The geographic distribution of job adverts roughly corresponds to the UK population distribution. The number of ads normalised by population (the bottom map in Figure 3) shows a higher density in the south east. This could be reflective of economic disparities within the UK [34] as well as the relative popularity of the job board Reed.co.uk in different regions. Even without knowing how use of Reed is spread across the UK we can still look at relative trends. Some changes may reflect a drift in the user base of the site, but large and sudden changes are likely to be reflective of significant exogenous events.
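The location lookup order described above (county list first, then Geonames, then a cached Nominatim query) might be organised along the following lines. This is a sketch under stated assumptions rather than the implementation used here: the class `LocationResolver` is hypothetical, the two lookup callables merely stand in for real Geonames and Nominatim clients, an in-memory dictionary stands in for the lookup database, and the coordinates in the usage example are rough illustrative values.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)
Lookup = Callable[[str], Optional[Coordinates]]


class LocationResolver:
    """Resolve a free-text location: county list, gazetteer, cached geocoder."""

    def __init__(self, counties: Iterable[str], gazetteer: Lookup, geocoder: Lookup):
        self.counties = {name.lower() for name in counties}
        self.gazetteer = gazetteer  # stand-in for a Geonames lookup
        self.geocoder = geocoder    # stand-in for the Nominatim web service
        self.cache: Dict[str, Optional[Coordinates]] = {}  # stand-in for the database

    def resolve(self, location: str) -> Tuple[str, object]:
        key = location.strip().lower()
        if key in self.counties:
            return ("county", key)
        hit = self.gazetteer(key)
        if hit is not None:
            return ("point", hit)
        if key not in self.cache:   # only call the web service on a cache miss
            self.cache[key] = self.geocoder(key)
        cached = self.cache[key]
        return ("point", cached) if cached is not None else ("unknown", None)


# Usage with toy lookups; `calls` records how often the geocoder is hit.
calls: List[str] = []


def fake_geocoder(name: str) -> Optional[Coordinates]:
    calls.append(name)
    return {"st ives": (50.21, -5.48)}.get(name)


resolver = LocationResolver(
    counties=["Devon"],
    gazetteer=lambda name: {"exeter": (50.72, -3.53)}.get(name),
    geocoder=fake_geocoder,
)
results = [
    resolver.resolve(loc)
    for loc in ["Devon", "Exeter", "St Ives", "St Ives", "Nowhere"]
]
```

Caching failed lookups as well as successful ones means a repeated unknown location does not trigger another web request.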
This section presents a collection of time series for different job classes identified by our topic modelling approach over the study period. These time series are accompanied by word clouds constructed from the titles of the matched job adverts to give a sense of the job category. The captions of the figures give some commentary on the observed trends. The job categories have been chosen to be representative of some key sectors and to show interesting trends; they do not represent every job sector that can be found in the data.

[19] we find a drastic drop in the number of adverts for these schemes, with a very slow recovery in the weeks after lockdown.

This time series also shows an increased demand for care workers during the crisis, especially during the lockdown period. The closure of many services through this period likely increased the need that older or disabled people have for special assistance.

Figure 8: Jobs matching 'teacher'. This time series shows a slight decrease in the demand for teachers during lockdown; however, some of the backlog seems to have been filled once the first easing of restrictions was announced. After lockdown the number of vacancies in this sector is still down slightly, but is reduced far less than in other sectors and the average.

With most high street stores closed throughout lockdown the number of deliveries could reasonably be expected to increase. However, at the beginning of lockdown the number of adverts for drivers decreased significantly, in line with trends in other sectors. With the easing of restrictions some of this backlog was filled and the number of adverts is now above pre-crisis levels. This unexpected behaviour shows that companies reacted to lockdown by reducing costs, rather than hiring in anticipation of increased demand.

The regions of the UK were not affected equally by COVID-19.
At the time of writing the death rates per 100,000 in the North East and North West were 80.7 and 85.7 respectively, while the rate in the South West was only 33.7 [1]. In some areas with high case rates a local lockdown was imposed. The first of these was in the city of Leicester and surrounding areas on July 4th [9], with another significant local lockdown implemented on July 25th in the area of Blackburn with Darwen [8]. Large areas of the North of England were subsequently subjected to more severe restrictions than the rest of the UK [35]. In this section we will examine how the effect of the crisis on vacancies was distributed across the UK; we will compare regions which were affected to different degrees by COVID-19 and examine whether local lockdowns have a compounding effect in depressing job vacancies. Below we show time series for different geographic regions, with commentary in the captions.

The number of job adverts in the North East. The North East was one of the worst affected regions in the UK. Surprisingly, unlike the pattern for the whole of the UK and Greater London, after the lockdown eased the number of job adverts recovered to near pre-crisis levels by mid-August. However, note that the absolute number of jobs is significantly lower, at around 200 postings per day.

Figure 13: The number of job adverts in the North West. The North West was the worst affected region in the UK and subject to different rules after August 5th [35]. Despite this, the time series looks broadly similar to the one for Greater London, Figure 11, and the UK as a whole, Figure 1.

[9]. There is no marked change in the number of job adverts in response to this local restriction, with the count staying around its previous (depressed) level. Lifting the local restrictions does correspond to an increase in the number of adverts posted, but this increase is slight and in line with national trends, so determining causality requires further investigation.

Figure 15:
There is a suggestion that job vacancies have stayed depressed during the local lockdown. Blackburn is part of the North and remains subject to different rules [35], so lockdown in this area is ongoing despite the end of the specific period of local lockdown. The spike immediately following the end of the local lockdown is observed nationally (corresponding to a large NHS recruitment campaign) and is not caused by lockdown lifting.

Other interesting information associated with job adverts includes the salary, contract type (temporary, permanent or contract) and mode of employment (full time or part time). In this section we investigate whether there has been any change in the frequency of different contract types, modes of employment or distribution of salaries. To do this we split the data into two sets. In one set we take job ads posted before 07/03/2020, which we identify as the last normal week before the COVID-19 crisis started to affect the number of job ads posted (see Figure 1). In the other set we take job ads posted after 26/03/2020, which is the first Monday after the enactment of lockdown. Figure 17 shows the proportion of jobs advertised as temporary, permanent or contract, Figure 18 shows the proportion of jobs advertised as full time or part time and Figure 19 shows the distribution of advertised salaries. There is a very small (but significant under a t-test and KS-test, $p < 10^{-10}$) increase in mean and median annual salary. There is also a slight but significant ($\chi^2$ test, $p < 10^{-10}$) trend towards part time and non-permanent jobs.

The data tells us that the job market has been hit hard by the COVID-19 crisis. Companies have responded by reducing hiring significantly. Certain sectors, like hospitality and graduate recruitment, have been particularly affected while others, like care work and teaching, are relatively unaffected.
The conditions of work in a post COVID-19 world, at least in terms of contracts, hours of employment and salary, seem to be broadly unchanged but with a slight shift towards higher salary, non-permanent and part time work. One hypothesis is that lower paying jobs are not being advertised, increasing the median salary, while the jobs that are being advertised are shifting towards part time and less fixed contracts, as companies try to hedge against future uncertainty. There may also be seasonal and long term trends. A full understanding of this pattern requires more data.

Regional differences are not as strong as sectoral ones, mirroring the results obtained in the US by [17]. The North East seems to have recovered faster than the national average, but the absolute number of adverts in this area is much lower than in other regions. This could be due to a number of factors: a previously depressed local job market, the local (un)popularity of Reed, or even that the types of jobs available in this region are not openly advertised online. Further investigation is required before any strong conclusions can be drawn. It also seems that local lockdowns do not have a compounding effect on the job market but may postpone recovery, though again more investigation is required to fully support this conclusion.

Future work could look at improving our topic detection algorithms, for example using a semi-supervised approach to detect known sectors [36]. We could also perform topic detection within sectors to study changes in job descriptions in response to new conditions of work, e.g. an increase in home working [37]. The topic and geographic filters can be combined to look at e.g. 'teaching jobs in the south west', and the salary and contract type analysis can be performed on a sectoral or geographic level.
The most effective way to enable stakeholders to get this information is likely the creation of an interactive dashboard using the methodology and data described in this work.

The research of [23] as well as the company Burning Glass shows that using machine learning and geo-inference techniques on job ads can help us to understand labour market trends and skills demand. This research shows these methods can be used to study short term shocks to the labour market, such as COVID-19 and associated lockdown policies. This paper presents a simple and transparent way to study the effect of economic shocks and major government interventions on labour market activity across employment sectors and geographies. This is complementary data to e.g. survey methodology [15] and is accessible to academics, unlike the very useful but opaque methodology of [17] and Burning Glass. Though 'evidence based policy' is something of a catchphrase [38], it is of course desirable that policy makers consider all the available information before taking action. The effect on the job market should be considered when extending or relaxing lockdown rules and especially in relation to extending or ending furlough and other compensation schemes. We hope that this paper and its methods provide valuable insights for broad or targeted policy interventions.

Swedish public health authority guidance
Japan state of emergency declaration
Wuhan lockdown travel restrictions
The Health Protection (Coronavirus, Business Closure) (England) Regulations 2020
The Health Protection (Coronavirus, Restrictions) (England) Regulations 2020
The Health Protection (Coronavirus, Restrictions) (Blackburn with Darwen and Luton) Regulations 2020 (SI 2020/800)
The Health Protection (Coronavirus, Restrictions) (Leicester) Regulations 2020 (SI 2020/685)
) (England) (Amendment) (No. 4) Regulations 2020
IZA COVID-19 crisis response monitoring: Short-run labor market impacts of COVID-19
UK unemployment
Coronavirus Job Retention Scheme
Real-time labor market estimates during the 2020 coronavirus outbreak
Real-time economics: A new platform to track the impacts of COVID-19 on people, businesses, and communities using private sector data
Job search during the COVID-19 crisis
Labor demand in the time of COVID-19: Evidence from vacancy postings and UI claims
Determinants of disparities in COVID-19 job losses
Coronavirus disease 2019: The harms of exaggerated information and non-evidence-based measures
Physical distancing interventions and incidence of coronavirus disease 2019: Natural experiment in 149 countries
An improved measure of deaths due to COVID-19 in England and Wales
Using online job vacancies to understand the UK labour market from the bottom-up
Latent Dirichlet allocation
Distributed representations of sentences and documents
Using job vacancies to understand the effects of labour market mismatch on UK output and productivity
Exploring the space of topic coherence measures
WordNet: An Electronic Lexical Database, Bradford Books
Software framework for topic modelling with large corpora
Social sensing of floods in the UK
Relationship of NUTS areas to UK administrative geographies
Divided and connected: Regional inequalities in the North, the UK and the developed world
The Health Protection (Coronavirus, Restrictions on Gatherings) (North of England) Regulations 2020
Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora
COVID-19 and remote work: An early look at US data
What is wrong with evidence based policy, and how can it be improved?