We run training programs, workshops, and tutorials for students, government agencies, non-profits, foundations, and corporations. Some of our trainings include:
  • The value of data-driven decision-making (for managers and executives in government agencies and non-profits)
  • How to scope data science projects
  • Assessing your data maturity
  • Hands-on technical trainings, including Applied Data Analytics for Public Policy (with the Coleridge Initiative)
Our trainings for governments and non-profits are designed for directors and executives of organizations as well as analysts and policymakers.

Data Science Projects

We work with governments, non-profits, and other organizations on data science projects across health, criminal justice, public safety, education, economic development, transportation, and more. Most of our projects tackle operational problems with tangible impact and result in software that our partner organizations (and others) can use to improve policies and achieve social impact. Recent examples of our projects include:
  • Building Data-Driven Police Early Intervention Systems
  • Prioritizing Preventative Lead Hazard Inspections
  • Prioritizing Health and Safety Housing Inspections
  • Reducing incarcerations by identifying at-risk individuals in need of social services

Research Areas

Our research initiatives are motivated by our hands-on data science projects with governments, non-profits, and other policy organizations. As we tackle policy problems, we identify open areas where existing methods from computer science, machine learning, artificial intelligence, or the social sciences are lacking, and we formulate our research initiatives to fill those gaps. We then push the results of our research back into our data science tools so they can be used across our projects and by our project partners. We are currently working on:
  • Auditing and correcting for bias and equity issues in data science systems
  • Increasing the interpretability and transparency of machine learning models used in policy decisions
  • Designing experimental validation methodologies for machine learning systems
  • Developing methods for monitoring and updating deployed data science systems

Data Science Pipelines and Tools

We believe in open and reusable code and tools. All of our (non-confidential) project code is available under an open source license on our GitHub page. All of our internal data science tools are also available for other organizations to use. Examples of such tools include:
  • Triage: Our data science pipeline platform, used in many of our internal projects, with components for generating features, building machine learning models, and evaluating those models.
  • Entity Deduplication Tool (pgdedupe)
  • Post-Modeling Tools: For analyzing the models built, examining feature importances, and exploring the outputs of those models before deployment.
  • Bias Audits: To run bias audits on the outputs of machine learning models