Past work
Mathesar (Tech Lead)
Mathesar is an open‑source, self‑hosted web application that makes working with PostgreSQL simple for teams of all skill levels. It provides a familiar, spreadsheet‑like interface for viewing, editing, querying, and collaborating on data, while keeping that data on your own servers and using native Postgres roles and privileges for access control.
What this demonstrates for clients
- Building product-quality data tools on top of existing databases (Postgres-first, minimal abstraction).
- Leading delivery across a multi-language stack (Svelte/TypeScript + Python/Django + Postgres).
- Designing usable workflows for non-technical users (tables, relationships, query builder “explorations”, forms).
- Operating in open source: shipping, reviews, and coordination with a broad contributor community.
Links: GitHub (Mathesar) • Project site (mathesar.org)
CC Catalog / CC Search Catalog (Senior Data Engineer)
CC Catalog (archived and transferred to WordPress as Openverse Catalog) is an open-source data pipeline project focused on identifying and collecting Creative Commons–licensed works at web scale. It combined large-scale processing of Common Crawl data with ETL jobs pulling from public APIs, producing structured outputs used downstream for search and indexing.
What this demonstrates for clients
- Designing and operating large-scale data pipelines (web-scale ingestion, transformation, and outputs suitable for indexing).
- Practical distributed processing with Apache Spark on AWS EMR (parsing crawl metadata and extracting signals).
- Workflow orchestration and scheduled ETL using Apache Airflow (daily and monthly jobs).
- Production-minded engineering: reproducible pipelines, clear workflow boundaries, and data-quality considerations.
Links: GitHub (CC Catalog archive) • Current home (WordPress/openverse-catalog)