SIC codes and workplace assignment #27

Open
Hussein-Mahfouz opened this issue May 13, 2024 · 7 comments
Assignees
Labels
Task 2 assigning activities to geographic locations validation Model validation and consistency

Comments

@Hussein-Mahfouz
Collaborator

In the SPC, not all people with a job are assigned to a workplace. As a result, not all people with a job have a "commute" trip. The SIC codes could be very useful for assigning people to workplaces in our model - is there an issue with the SIC codes, and do they need editing?

Enriched Spenser <> TUS matching logic:

  • Individuals from Spenser are matched to the TUS based on age35g, sex, and nssec: see the findTUSmatch() function

  • Attributes, including sic1d2007 and sic2d2007, are taken from the matched TUS individual

  • If a person in the enriched Spenser dataset has a combination of age35g, sex, and nssec that does not exist in the Time Use Survey, they are then matched on age35g and sex only.

  • Things to think about:

    • Is the time use survey representative when it comes to % of employed people?
    • Is the number of employed people in the spc similar to what we should expect in any specific region?
    • Would the matching improve if the age35g column were relaxed? Breaking age down into 35 groups is very granular; fewer groups would give a higher match rate.
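
A minimal sketch of the fallback matching described above, assuming each record is a plain dict. The names `MATCH_LEVELS`, `build_indexes` and `match_person` are hypothetical and are not the SPC `findTUSmatch()` implementation; the rows are made-up illustration data.

```python
import random
from collections import defaultdict

# Matching levels, tried in order: full key first, then the relaxed key.
MATCH_LEVELS = (("age35g", "sex", "nssec"), ("age35g", "sex"))


def build_indexes(tus_rows):
    """Index the TUS rows once per matching level."""
    indexes = {}
    for keys in MATCH_LEVELS:
        index = defaultdict(list)
        for row in tus_rows:
            index[tuple(row[k] for k in keys)].append(row)
        indexes[keys] = index
    return indexes


def match_person(person, indexes, rng):
    """Return (matched TUS row or None, keys used for the match)."""
    for keys in MATCH_LEVELS:
        candidates = indexes[keys].get(tuple(person[k] for k in keys))
        if candidates:
            return rng.choice(candidates), keys
    return None, ()


# Made-up TUS rows for illustration only.
tus = [
    {"age35g": 7, "sex": 1, "nssec": 2, "sic1d2007": "C"},
    {"age35g": 7, "sex": 2, "nssec": 4, "sic1d2007": "Q"},
]
indexes = build_indexes(tus)
rng = random.Random(0)

full_match, full_keys = match_person({"age35g": 7, "sex": 1, "nssec": 2}, indexes, rng)
fallback_match, fallback_keys = match_person({"age35g": 7, "sex": 2, "nssec": 5}, indexes, rng)
no_match, no_keys = match_person({"age35g": 9, "sex": 1, "nssec": 2}, indexes, rng)
```

Returning the keys used makes it easy to count how many people needed the relaxed match, which is one way to judge whether coarser age groups would help.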

SPC commuting location assignment logic

  • workplaces are assigned based on SIC codes which are obtained from the time use survey

  • In the SPC commuting flows logic, workers are assigned to jobs based on SIC. Not all workers are matched, and it seems like there is an acceptable threshold for matching before ignoring the SIC

Our approach to workplace assignment (TODO)

We can use SIC codes as done in the SPC, but with fallback logic for people whose SIC code is missing or cannot be matched to a job.
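
One way this could look is a two-stage assignment: match on SIC first, then give leftover jobs to anyone still unassigned. This is only a sketch under stated assumptions; `assign_workplaces`, the worker and job structures, and the example data are all hypothetical and not part of the SPC or AcBM code.

```python
import random


def assign_workplaces(workers, jobs, rng):
    """Two-stage assignment sketch.

    workers: dict mapping person_id -> SIC section (or None if missing)
    jobs:    list of (workplace_id, sic) tuples, one entry per job
    Returns a dict mapping person_id -> workplace_id for assigned people.
    """
    # Group open jobs by SIC section and shuffle within each section.
    open_jobs = {}
    for workplace_id, sic in jobs:
        open_jobs.setdefault(sic, []).append(workplace_id)
    for pool in open_jobs.values():
        rng.shuffle(pool)

    assignment = {}
    unassigned = []

    # Stage 1: assign within the person's own SIC section while jobs remain.
    for person_id, sic in workers.items():
        pool = open_jobs.get(sic)
        if pool:
            assignment[person_id] = pool.pop()
        else:
            unassigned.append(person_id)

    # Stage 2 (fallback): ignore SIC and draw from the remaining jobs.
    leftovers = [w for pool in open_jobs.values() for w in pool]
    rng.shuffle(leftovers)
    for person_id in unassigned:
        if not leftovers:
            break
        assignment[person_id] = leftovers.pop()

    return assignment


# Made-up example: four workers, three jobs.
workers = {"p1": "C", "p2": "C", "p3": None, "p4": "Q"}
jobs = [("w1", "C"), ("w2", "G"), ("w3", "G")]
assignment = assign_workplaces(workers, jobs, random.Random(0))
```

Here `p1` takes the only sector C job in stage 1, `p2` and `p3` receive the two sector G jobs in stage 2, and `p4` stays unassigned because no jobs remain.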

@sgreenbury could you please take a look and edit if it doesn't make sense?

@Hussein-Mahfouz Hussein-Mahfouz added Task 2 assigning activities to geographic locations validation Model validation and consistency labels May 13, 2024
@BZ-BowenZhang BZ-BowenZhang self-assigned this May 21, 2024
@BZ-BowenZhang
Collaborator

BZ-BowenZhang commented May 24, 2024

In the current SPC dataset, the working-age population in West Yorkshire is 1,496,784, of which 597,873 people have a workplace assigned.

For West Yorkshire, the total employment recorded in the Business Register and Employment Survey (by MSOA) is 1,025,985, which should be the target number for assigned workplaces.

Possible reasons for unmatched workplaces

  • All of the 913,918 people (working-age population minus those who have workplaces) without an assigned workplace have a pwkstat: 32% are Employee FT, 16% are Student, and 16% are self-employed.
    • 359,578 of these 913,918 people have the salary_hourly and salary_yearly attributes, which means they should have been assigned a workplace.
  • 172,104 people have not been assigned a sic1d2007; 69% of them are Students, which is reasonable.

I suspect the main reason is that, when the 'job market' is generated, the distribution of sic1d2007 in the SPC does not match the sector distribution in the Business Register and Employment Survey. As a result, some jobs in each sector go unmatched even though the overall number of jobs is similar. I plotted the number of jobs in each sector in the Business Register and Employment Survey against the number in the SPC, and the plot supports this explanation.

[Figure: number of jobs in each sector, Business Register and Employment Survey versus SPC]
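
To illustrate why a sector mismatch leaves people unmatched even when the totals agree, here is a small sketch. The sector counts are made up for illustration and are not the West Yorkshire figures; `matchable_workers` and `sector_surplus` are hypothetical helper names.

```python
def matchable_workers(workers_by_sic, jobs_by_sic):
    """Upper bound on workers that can be matched within their own SIC sector."""
    sectors = set(workers_by_sic) | set(jobs_by_sic)
    return sum(
        min(workers_by_sic.get(s, 0), jobs_by_sic.get(s, 0)) for s in sectors
    )


def sector_surplus(workers_by_sic, jobs_by_sic):
    """Workers minus jobs per sector (positive means too many workers)."""
    sectors = sorted(set(workers_by_sic) | set(jobs_by_sic))
    return {s: workers_by_sic.get(s, 0) - jobs_by_sic.get(s, 0) for s in sectors}


# Made-up counts: both sides total 1,000, but the sector shares differ.
workers_by_sic = {"C": 300, "G": 500, "Q": 200}
jobs_by_sic = {"C": 450, "G": 250, "Q": 300}

matched = matchable_workers(workers_by_sic, jobs_by_sic)  # 300 + 250 + 200 = 750
surplus = sector_surplus(workers_by_sic, jobs_by_sic)
unmatched = sum(workers_by_sic.values()) - matched        # 250
```

In this toy case a quarter of the workers cannot be placed in their own sector, although the number of workers equals the number of jobs overall.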

@sgreenbury
Collaborator

Thanks @BZ-BowenZhang for the update on this, it's very helpful to see the distributions of the two datasets.

@Hussein-Mahfouz
Collaborator Author

Notes from today's meeting:

  • TUS is a representative sample of the UK population
  • The SPC matches on nssec + age + sex (link) and not region currently. Industry distributions vary across the country, and matching without accounting for region will probably lead to a population with SIC codes that are not representative of the regional distribution (as reported in the business registry). This could explain Bowen's findings above. We can:
    • Check if region is available in TUS, and how we can use it for matching
    • See if there is any logical grouping of SIC codes. If there is, we can be flexible with assigning people to jobs. When assigning to a location from the business registry, if there are no more jobs in the desired SIC code, assign to a job in the same SIC code group
    • Consider a scaling factor as per Stuart's comment (does this still apply or was this suggested when we thought the TUS sample was all from Oxford?)

@sgreenbury
Collaborator

Thanks for adding this @Hussein-Mahfouz.

@sgreenbury
Collaborator

Adding notes from discussion with @BZ-BowenZhang for options with increasing complexity:

  • Option 1: Do not use sic code to match employment location to individuals. This is available in SPC by setting sic_threshold higher (e.g. above 0.62). This may be a good option in the first instance since the TUS features associated with the sic1d2007 are not directly used in AcBM.
  • Option 2: Scale the number of jobs per business to match the number of workers. For example, if SIC sector C has business A with 90 jobs and business B with 10 jobs but 150 workers, scaling in proportion to the number of jobs gives A 135 jobs and B 15 jobs. This provides all SPC people with workplaces, but it conflicts with our view that the BRES is the more accurate record of the number of jobs by SIC code. A mixed assignment approach is another option: people left unassigned after the SIC code matching are given a workplace at random from the remaining pool of jobs. This would require updating the job market matching code to implement a two-stage matching procedure.
  • Option 3: Alternatively, we might rematch TUS individuals using region-specific information (the TUS covers the whole UK and has region variables available: duresmc [Government Office Regions and former Metropolitan Counties] and dgorpaf [Government Office Regions], see TUS). This is more complicated and further upstream in the current pipeline. It may also suffer from small sample sizes, and adding region to the other matching characteristics may reduce the variation in time use for a given region.
    • Next step: analyse the distribution of sic1d2007 (requires mapping the sic20070 code in, see here) when broken down by region in TUS and compare to the BRES distribution. If the distribution more closely matches the BRES, this would further motivate including regional matching.
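
The proportional scaling in Option 2 can be sketched as follows. `scale_jobs` is a hypothetical helper, not part of the SPC code; it uses integer arithmetic with largest-remainder rounding (an assumption, since the note does not say how fractional jobs would be handled) so that the scaled jobs always sum to the number of workers.

```python
def scale_jobs(jobs_by_business, n_workers):
    """Scale job counts within one SIC sector so they sum to n_workers."""
    total = sum(jobs_by_business.values())
    if total == 0:
        raise ValueError("cannot scale a sector with no jobs")

    # Integer floor of each business's proportional share, plus the remainder.
    floors = {b: n * n_workers // total for b, n in jobs_by_business.items()}
    remainders = {b: n * n_workers % total for b, n in jobs_by_business.items()}

    # Hand out any jobs lost to rounding, largest remainder first.
    shortfall = n_workers - sum(floors.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for business in by_remainder[:shortfall]:
        floors[business] += 1
    return floors


# The example from Option 2: 90 + 10 jobs scaled to 150 workers.
scaled = scale_jobs({"A": 90, "B": 10}, 150)  # {"A": 135, "B": 15}
```

Because the scaling is applied per sector, it changes the BRES job counts in every sector where workers and jobs disagree, which is the accuracy concern raised in Option 2.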

@sgreenbury
Collaborator

@Hussein-Mahfouz for reference

@BZ-BowenZhang
Collaborator

Update on 17th July:

The new SPC dataset, generated without SIC-based assignment, has been tested, and the matching results are slightly improved:

Previous: 597,873 assigned, 898,911 unassigned
Now: 656,296 assigned, 840,488 unassigned

There is still a gap between the current number and the target number from the Business Register and Employment Survey (1,025,985). The mismatches in the SIC code have not been resolved, so further checks of the matching process may be needed.
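
For reference, the size of the remaining gap follows directly from the figures quoted in this thread. The variable names below are just labels for those figures.

```python
# Figures quoted in this thread for West Yorkshire.
working_age = 1_496_784       # SPC working-age population
target_jobs = 1_025_985       # BRES total employment
assigned_before = 597_873     # assigned with SIC matching
assigned_after = 656_296      # assigned without SIC matching

unassigned_after = working_age - assigned_after   # 840,488
gain = assigned_after - assigned_before           # 58,423 more people assigned
gap_to_target = target_jobs - assigned_after      # 369,689 still missing
share_of_target = assigned_after / target_jobs    # roughly 64% of the BRES total

print(f"gain: {gain:,}, gap to target: {gap_to_target:,}, "
      f"share of target: {share_of_target:.1%}")
```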



3 participants