Preventing San Jose Housing Violations

Partner(s): City of San Jose
Status: Transitioning technology to partner
GitHub Repo: https://github.com/dssg/san_jose_housing
Team: Kit Rodolfa, Jane Zanzig, Erika Salomon, Klaus Ackermann, Lauren Haynes

Summary
The City of San Jose conducts regular inspections of its multiple housing units (e.g., apartment buildings and hotels). San Jose assigns case loads to its inspectors based on two factors: number of prior major code violations (properties with more major violations are inspected more frequently) and the amount of time since the last inspection (properties inspected longer ago are prioritized). Although this approach incorporates two risk indicators, it neglects other important information, such as records of building repairs. In collaboration with the Multiple Housing inspection unit, DSaPP built a predictive model to more accurately prioritize the “worst first” as San Jose assigns case loads to inspectors. With this system, San Jose can identify when properties are likely to have a violation, and so find more violations than it does using the number of past violations alone.

Background
Across the country, housing quality has not kept pace with housing prices, which puts the health and wellbeing of poor occupants at continued risk. The 12% of San Jose residents who live in poverty are no exception, and they deserve to live in housing that is safe, decent, and sanitary. As Matthew Desmond and Monica Bell’s report on Housing, Poverty, and the Law states, substandard housing conditions “exert a powerful influence on health, job stability, educational success, and other matters essential to well-being.” In San Jose, ensuring safe, decent, and sanitary housing conditions for buildings with three or more units under one roof is the job of the Multiple Housing unit of Code Enforcement.

Scope
By building a stronger predictive model for high-risk properties, the City of San Jose (CSJ) will increase the safety and well-being of residents living in many rental properties. San Jose currently assigns case loads to its inspectors based on two factors: each property's number of prior major code violations (properties with more major violations are inspected more frequently) and the amount of time since its last inspection (properties inspected longer ago are prioritized). San Jose conducts approximately 300 proactive inspections (i.e., not in response to complaints) every three months. The goal of the model is to determine which 300 properties are most at risk of a health and safety violation in the following three months, so that properties can be inspected based on their risk of violations at the time of inspection (rather than their overall risk).
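The selection step described above amounts to ranking properties by predicted risk and keeping the top 300. The following is a minimal sketch of that idea only, not the project's actual code; the function name `select_for_inspection`, the `risk_scores` mapping, and the tie-breaking rule are all assumptions made for illustration.

```python
def select_for_inspection(risk_scores, k=300):
    """Return the IDs of the k properties with the highest predicted risk.

    risk_scores: hypothetical mapping of property ID -> predicted
    probability of a violation in the next three months.
    Ties are broken by property ID so the result is deterministic
    (an assumption; the source does not say how ties are handled).
    """
    ranked = sorted(risk_scores.items(), key=lambda item: (-item[1], item[0]))
    return [prop_id for prop_id, _ in ranked[:k]]


# Toy example with made-up scores
scores = {"prop_a": 0.91, "prop_b": 0.15, "prop_c": 0.72, "prop_d": 0.72}
print(select_for_inspection(scores, k=3))
```

Because the scores are recomputed each quarter from the latest data, the same property can move in or out of the list as its risk at the time of inspection changes.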

Data
This project used data from two CSJ systems: multiple housing property information in AMANDA (planning, building, property, business info) and inspections information in the Code Enforcement System (complaints and actions taken). We supplemented this with data about the area surrounding each property by using the US Census Bureau’s American Community Survey.

Analysis
DSaPP software generated thousands of machine learning models and evaluated the predictive accuracy of each, simulating how every model would have performed had it been in use in San Jose since 2012. Each model used only the data available at a given date in the simulation (e.g., January 1, 2012, April 1, 2012, and so on) to generate a list of properties to inspect in the following three months. We looked for models that consistently selected for inspection properties known to have violations in the three months following each prediction.
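The simulation described above can be sketched as a loop over quarterly prediction dates, scoring each date by the share of selected properties that went on to have a violation (often called precision at k). This is an illustrative sketch under stated assumptions, not the DSaPP software: `rank_properties` and `violations_after` are hypothetical callables standing in for a trained model and for the recorded inspection outcomes.

```python
from datetime import date


def quarterly_dates(start, end):
    """Yield the first day of each quarter from start up to (not including) end."""
    current = start
    while current < end:
        yield current
        month = current.month + 3
        year = current.year + (month - 1) // 12
        month = (month - 1) % 12 + 1
        current = date(year, month, 1)


def precision_at_k(ranked_ids, violators, k):
    """Fraction of the top-k ranked properties that had a violation."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for prop_id in top if prop_id in violators) / len(top)


def backtest(rank_properties, violations_after, start, end, k=300):
    """Evaluate one model at every quarterly prediction date.

    rank_properties(as_of): hypothetical; returns property IDs ranked by
        risk, using only data available before as_of.
    violations_after(as_of): hypothetical; returns the set of property IDs
        found in violation during the three months after as_of.
    """
    return {
        as_of: precision_at_k(rank_properties(as_of), violations_after(as_of), k)
        for as_of in quarterly_dates(start, end)
    }


# Toy usage with made-up rankings and outcomes
toy_results = backtest(
    rank_properties=lambda as_of: ["p1", "p2", "p3", "p4"],
    violations_after=lambda as_of: {"p1", "p3"},
    start=date(2012, 1, 1),
    end=date(2013, 1, 1),
    k=2,
)
for as_of, score in toy_results.items():
    print(as_of.isoformat(), score)
```

Restricting each ranking to data available before the prediction date is what keeps the simulation honest: a model never sees the outcomes it is being scored against.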

Results
Even though most building inspections already found serious violations, the model that we selected identified properties with serious violations at a 25% higher rate than current inspection practices in our simulations. We tested this model in the real world by conducting a three-month trial in which some inspections were based on the current practice (combining previous violations and time since last inspection) and others were based on our model. Although the current strategy and the model had substantial overlap in the properties they prioritized, where they differed, properties targeted by just the model showed an 8 percentage point lift in the violation rate relative to those targeted by just the current strategy (75% of properties inspected showing major violations vs 67%).
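The trial figures are reported as a percentage point difference, which is not the same as a relative improvement. The short sketch below works through the arithmetic using only the two rates quoted above (75% and 67%); the function name `lift` is an illustrative assumption, and the relative figure it prints is derived here, not reported by the project.

```python
def lift(model_rate_pct, baseline_rate_pct):
    """Return (percentage-point lift, relative lift) for two rates given in percent."""
    absolute = model_rate_pct - baseline_rate_pct
    relative = absolute / baseline_rate_pct
    return absolute, relative


# Trial rates quoted in the text: model-only 75%, current-strategy-only 67%
points, relative = lift(75, 67)
print(f"{points} percentage points, {relative:.1%} relative")
```

The relative figure from the trial applies only to properties where the two strategies disagreed, so it is not directly comparable to the 25% figure from the simulations.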

Future Plans and Areas for Improvement
We are working with CSJ’s IT department to transfer the software developed in this project to their infrastructure so that the Code Enforcement group can deploy the model’s predictions on an ongoing basis and CSJ can continue to develop and improve the system. These improvements may include generating better predictions using new data. As part of this project, we identified that many building codes cover a wide range of violations that vary in the immediacy and severity of their threat to tenant health and safety. Inspectors are now collecting data on the severity of violations that can eventually be used in the predictive models to identify properties at risk of the most severe kinds of violations.