We created Triage in response to commonly occurring challenges in the development of machine learning systems for public policy and social good problems. While many tools (sklearn, keras, pytorch, etc.) exist to build ML models, an end-to-end project requires a lot more than just building models.
Building a system around predictive models that will be used in production requires many design decisions, and those decisions must match how the system is going to be used. These choices then get turned into modeling choices and code.
We need to answer questions such as:
- What should be included in the data (cohort selection)?
- What should our rows consist of (unit of analysis)?
- What is our label? What outcome are we predicting or estimating and over what period of time?
- How should we generate and incorporate time- and space-varying features (spatiotemporal explanatory variables)?
- How do we deal with time in our training, selection, and validation process?
- Which models/methods and associated hyper-parameters should we try?
- What evaluation metrics do we use to compare and select models?
- How do we compare and evaluate models over time?
- How should we interpret and explain the models and their predictions to the user in the loop?
- How do we ensure fairness and equity in our system?
- How do we understand and communicate intervention lists generated by these models?
These questions are critical but complicated and hard to answer a priori. Even when these design choices are made, we still have to turn them into code throughout the course of a project. Triage is designed around these questions and generates a set of data matrices, models, predictions, evaluations, and analyses that make it easier for data scientists to select the best models to use.
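As a rough illustration of what "turning a design choice into code" can look like, the sketch below handles one of the questions above: how to deal with time when training and validating. It is not Triage's API. The names `TemporalSplit` and `make_splits` and their parameters are hypothetical, and the splitting rule (training labels must be fully observed before the test as-of date) is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TemporalSplit:
    """One train/test split anchored on a single test as-of date (hypothetical)."""
    train_end: date       # latest training as-of date whose label window closes by test_as_of
    test_as_of: date      # date at which test predictions are made
    test_label_end: date  # end of the window in which the test outcome is observed


def make_splits(first_as_of, data_end, step_days, label_days):
    """Build rolling splits so that no training label overlaps the test period.

    Assumption: a label for as-of date d is observed over [d, d + label_days).
    """
    label_window = timedelta(days=label_days)
    step = timedelta(days=step_days)
    splits = []
    as_of = first_as_of
    # Keep a split only if its test label window fits inside the available data.
    while as_of + label_window <= data_end:
        splits.append(
            TemporalSplit(
                train_end=as_of - label_window,
                test_as_of=as_of,
                test_label_end=as_of + label_window,
            )
        )
        as_of += step
    return splits


# Example: evaluate every 90 days with a 180-day outcome window.
for split in make_splits(date(2020, 1, 1), date(2021, 1, 1), 90, 180):
    print(split)
```

Even this small sketch forces decisions (the label window, the spacing between evaluations, how much history to train on), which is the kind of choice that has to be made explicitly and then kept consistent across the project.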
Triage aims to help solve these problems by:
- Guiding users (data scientists, analysts, researchers) through these design choices by highlighting operational use questions that are important.
- Providing interfaces to the different phases of a project, such as an Experiment. Each phase is defined by a configuration (corresponding to a design choice) specific to the needs of the project, and by an arrangement of core data science components that work together to produce the output of that phase.
Each of these components requires careful design choices. Triage facilitates this decision-making process for programmers and developers, with a special focus on tackling data science, AI, and ML problems in public policy and social impact.