Workforce Data Initiative

Partner(s): Alfred P. Sloan Foundation, National Economic Council, Department of Labor
Status: Transitioning work to new organization
Github Repo:
White Paper:
Team: Tristan Crockett, Eddie Lin, Matt Bauman, Hyunzoo Chai, Matt Gee, Christina Sung


The Workforce Data Initiative (WDI) is a national effort led by the National Economic Council, the Department of Labor, and DSaPP. It aims to dramatically improve the availability of rich labor market information through a new public-private partnership model for administrative data. This model protects individual privacy and the business interests of participating institutions while generating a new public resource for research and applications.

The rapid growth in the use of data for business hasn’t led to proportional growth in the availability of data for research. Significant legal and technical hurdles prevent the majority of empirical social scientists from using administrative data from either public or private organizations. This is especially true for workforce data. Massive new unstructured data sources from private companies, together with increasingly integrated state and federal employment and wage data, provide some of the most important emerging opportunities for studying the labor market. These data sources lend themselves to the study of topics as varied as the demand for skills, human capital formation, occupational choice, and the return on investment in education and training.

The WDI seeks to establish the data architecture for the next generation of workforce research, combining dynamic, unstructured private data sources with public administrative data. The data made available through this partnership will result in a more comprehensive view of labor and business dynamics and aid in developing a diverse array of support services.

Open Skills API

Application developers are best served by an easy-to-use API that provides information about competencies, occupations, and how they relate to each other. With this, they can build useful interfaces for job seekers or others who wish to explore the labor market. To satisfy this need, DSaPP collaborated with the Department of Labor to build the Open Skills API using existing survey-based labor market data.

Research Hub

Economists and other researchers are best served by tabular datasets describing the state of the labor market over time. To satisfy this need, the team built the Research Hub, which contains counts of job titles combined with occupation codes, extracted competencies, and wage data for different geographic areas over time. The code that produced these datasets was released as open source.
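To make the shape of such a dataset concrete, the sketch below counts job postings by quarter, geographic area, and normalized job title using only the Python standard library. It is an illustration under stated assumptions, not the Research Hub's actual code: the field names (`title`, `area`, `posted`), the area labels, the sample records, and the helper functions are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical postings; the field names and values are illustrative
# assumptions, not the Research Hub's actual schema.
postings = [
    {"title": "Registered Nurse", "area": "area_1", "posted": date(2016, 1, 15)},
    {"title": "registered nurse ", "area": "area_1", "posted": date(2016, 2, 3)},
    {"title": "Software Engineer", "area": "area_1", "posted": date(2016, 2, 20)},
    {"title": "Registered Nurse", "area": "area_2", "posted": date(2016, 5, 9)},
]


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2016Q1."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def count_titles(rows):
    """Count postings per (quarter, area, normalized job title)."""
    counts = Counter()
    for row in rows:
        # Normalization here is only whitespace trimming and lowercasing.
        title = row["title"].strip().lower()
        counts[(quarter_of(row["posted"]), row["area"], title)] += 1
    return counts


def to_table(counts):
    """Flatten the counter into sorted rows suitable for a CSV export."""
    return [
        {"quarter": q, "area": a, "title": t, "count": n}
        for (q, a, t), n in sorted(counts.items())
    ]


table = to_table(count_titles(postings))
for row in table:
    print(row)
```

A real pipeline would likely need far more careful title normalization and a mapping from titles to occupation codes; this sketch only shows the aggregation step.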


Workforce data research is now moving toward large-scale data processing and analysis. Many social science researchers lack the technical capacity to work with large, unstructured job posting data sources, and the skills-ml library is intended to close part of that gap. By wrapping techniques such as machine learning and word embeddings in higher-level, workforce-related functions and classes, it aims to make it easier for users to build a data processing pipeline, train a classifier, and create a curated dataset.


The Training Provider Outcomes Toolkit aims to collect training providers’ records and outcomes data and make them available for further analysis. These training providers range from small trade apprenticeships to community colleges to multi-state organizations. The Workforce Innovation and Opportunity Act of 2014 requires training providers to submit performance reports in order to receive federal funding. This mandate is carried out through state and local workforce development boards, which have the authority to negotiate connections to wage and unemployment databases.

Competency Framework Data

  • O*NET: Contains hundreds of standardized descriptors of occupations and competencies for almost 1,000 occupations covering the entire U.S. economy.
  • ESCO: The European counterpart to O*NET.

Job Postings

  • NLX: Job posting data from the National Labor Exchange, which collects and distributes job openings found exclusively on corporate career websites and state job banks.
  • VA: Virginia open job posting data set.
  • CareerBuilder: Job posting data from CareerBuilder.

Representative Analysis

This analysis generates comparable statistics by occupation, geography, and time from both the Data@Work research database and relevant national labor market datasets released by the Bureau of Labor Statistics.
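One possible reading of "comparable statistics" is a ratio between each occupation's share of job postings and its share of employment in a reference dataset. The sketch below shows that calculation for a single geography and time period. The source does not specify the method, so this is an assumption; the function names and all counts are made up for illustration, and the occupation codes are used only as example keys.

```python
def shares(counts):
    """Convert raw counts per occupation into shares that sum to 1."""
    total = sum(counts.values())
    return {occ: n / total for occ, n in counts.items()}


def representation_ratio(sample_counts, reference_counts):
    """Ratio of sample share to reference share for each occupation.

    A ratio above 1 means the occupation is over-represented in the
    sample relative to the reference; below 1 means under-represented.
    """
    sample = shares(sample_counts)
    reference = shares(reference_counts)
    return {
        occ: sample.get(occ, 0.0) / ref_share
        for occ, ref_share in reference.items()
        if ref_share > 0
    }


# Made-up counts for one area and one quarter (illustrative only).
posting_counts = {"15-0000": 50, "29-0000": 30, "41-0000": 20}
employment_counts = {"15-0000": 25, "29-0000": 25, "41-0000": 50}

ratios = representation_ratio(posting_counts, employment_counts)
for occ, ratio in sorted(ratios.items()):
    print(occ, round(ratio, 2))
```

Repeating the calculation per geography and per time period would show where and when the posting data departs most from the reference.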

The team organized a community of interested parties around monthly researcher calls, with an audience drawn from the public, private, and academic sectors. The team presented works in progress at these calls to get feedback and recruited collaborators to give guest presentations to the rest of the group.

Data from the Research Hub was used by economics researchers to write two separate white papers about different segments of the labor market over time.

The Open Skills API has been used to prototype different applications aimed at job seekers, including one that was presented to the White House Opportunity Project and several that were pitched directly to a group of underserved job seekers.

Future Plans and Areas for Improvement
Future work will be guided by the T3 Innovation Network, a collaboration headed by the US Chamber of Commerce Foundation and involving 150+ organizations from the public, private, and academic sectors.
The team’s white paper detailing the Skills-ML development process discusses many technical improvements that would help bring the research forward.