Data Science Pipelines and Tools
We believe in open and reusable code and tools. All of our non-confidential project code is available under an open source license on our GitHub page. Our internal data science tools are also available for other organizations to use. Examples of such tools include:
Triage: Our data science pipeline platform, used in many of our internal projects. It contains components for generating features, building machine learning models, and evaluating those models.
Entity Deduplication Tool (pgdedupe)
Post-Modeling Tools: For analyzing the models built, inspecting feature importances, and exploring the outputs of those models before deployment.
Bias Audits: For running bias audits on the outputs of machine learning models.
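
As a rough illustration of what a bias audit on model outputs can involve, the sketch below compares false positive rates across groups in plain Python. It is not the implementation of any tool listed above: the function names, the record fields (`group`, `label`, `prediction`), and the sample data are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

def false_positive_rates(records):
    """Return {group: FPR}, where FPR = false positives / all true negatives."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for r in records:
        if r["label"] == 0:  # only true negatives can yield false positives
            negatives[r["group"]] += 1
            if r["prediction"] == 1:
                false_pos[r["group"]] += 1
    # Groups with no true negatives are skipped (their FPR is undefined).
    return {g: false_pos[g] / n for g, n in negatives.items()}

def disparity(rates, reference_group):
    """Return each group's rate divided by the reference group's rate."""
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group rate is zero; ratio is undefined")
    return {g: rate / ref for g, rate in rates.items()}

# Hypothetical sample data: (group, true label, model prediction).
rows = [
    ("A", 0, 0), ("A", 0, 0), ("A", 0, 0), ("A", 0, 1), ("A", 1, 1),
    ("B", 0, 0), ("B", 0, 0), ("B", 0, 1), ("B", 0, 1), ("B", 1, 0),
]
sample_records = [
    {"group": g, "label": y, "prediction": p} for g, y, p in rows
]

rates = false_positive_rates(sample_records)
ratios = disparity(rates, reference_group="A")
print(rates)
print(ratios)
```

A ratio well above or below 1 for a group suggests the model's errors fall unevenly across groups. Which metric to compare (false positive rate here) and which group to use as the reference are choices that depend on the project.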