About

My default has always been to get something real into production as fast as possible, then improve from there. That rhythm shaped how I think about both data engineering and data science: define what's enough to be useful, ship it, watch how it behaves, sharpen it.

I've built most of my career in small data teams of two or three people. That context shaped how I work: I own the full stack across data engineering and modeling, which means a faster path from question to answer, and answers that hold up in production.

I work at cloud and big-data scale, across the full data stack. In the last year I've added Claude Code to my toolkit, and it has genuinely changed how I work. I spend more time on the parts that require judgment, and I'm constantly testing in practice how far AI can go in data work: what it can replace, what it can augment, and where human judgment still matters more than people think.

I care about clean abstractions, reproducible environments, and systems that explain themselves. The right system multiplies everything built on top of it. Getting that foundation right is always worth the time.

When I'm not wrangling pipelines, I'm reading, writing about data strategy, or exploring how technology shapes the way we make sense of things.

Experience

2024 — PRESENT

Senior Data Scientist · Madbox

Own end-to-end data pipelines, data infrastructure, and data governance across the full GCP stack. Developed LTV models using chain-ratio decomposition and maximum likelihood estimation, directly improving UA bid strategies. Built churn models to improve user retention and ran uplift analyses to measure their causal effect.

Python · BigQuery · dbt · LTV Modeling · GCP

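Chain-ratio decomposition breaks LTV into a product of conditional factors that can each be estimated separately. A minimal sketch with illustrative toy numbers (not actual Madbox data or the production model), using the fact that the sample mean is the Poisson MLE for purchase counts:

```python
import numpy as np

# hypothetical cohort inputs -- illustrative numbers, not Madbox data
installs = 10_000
payers = 400                                   # installs that ever pay
purchases = np.array([2, 4, 3, 5, 2, 3])       # toy per-payer purchase counts
values = np.array([4.99, 5.99, 4.99, 6.49])    # toy purchase values

payer_rate = payers / installs                 # P(pay | install)
purchases_per_payer = purchases.mean()         # Poisson MLE = sample mean
avg_purchase_value = values.mean()

# chain-ratio decomposition: LTV is the product of the conditional factors
ltv = payer_rate * purchases_per_payer * avg_purchase_value
```

Each factor can then be modeled and forecast on its own (e.g. payer rate by cohort age), which is what makes the decomposition useful for bidding.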
2022 — 2024

BI Engineer · Madbox

Migrated all data models from Airflow-orchestrated BigQuery pipelines to dbt. Optimized the Pub/Sub event streaming pipeline, halving infrastructure costs. Designed the post-ATT iOS attribution pipeline and defined the Conversion Value schema using unsupervised machine learning.

BigQuery · dbt · Airflow · Pub/Sub · Python

Projects

01/

Banking Fraud Detection Pipeline

End-to-end fraud detection on 13M credit card transactions with a LightGBM + focal-loss classifier served via FastAPI on Cloud Run. BigQuery/dbt feature layer, Terraform-managed GCP, and a LangChain agent that generates PDF reports.

LightGBM · BigQuery · FastAPI · GCP

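Focal loss down-weights easy, well-classified transactions so training concentrates on the rare fraud cases. A minimal sketch of the standard binary focal loss (the gamma and alpha values are common defaults, not the project's tuned settings):

```python
import numpy as np

def focal_loss(y_true, p, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    y_true: 0/1 labels; p: predicted probability of class 1.
    gamma > 0 shrinks the loss on confident, correct predictions,
    so the rare positive (fraud) class dominates the gradient.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)            # avoid log(0)
    p_t = np.where(y_true == 1, p, 1 - p)     # probability of the true class
    a_t = np.where(y_true == 1, alpha, 1 - alpha)
    return -a_t * (1 - p_t) ** gamma * np.log(p_t)

# an easy positive (p=0.95) contributes far less loss than a hard one (p=0.10)
easy = focal_loss(np.array([1]), np.array([0.95]))
hard = focal_loss(np.array([1]), np.array([0.10]))
```

In practice this plugs into LightGBM as a custom objective via its gradient and Hessian; the function above shows only the loss itself.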
02/

BigQuery Air Quality Forecasting

Multi-pollutant hourly forecasting across 25 Seoul stations using a LightGBM ensemble with conformalized quantile regression for calibrated prediction intervals. Supervised anomaly detection, dbt on BigQuery, FastAPI on Cloud Run.

BigQuery · LightGBM · Forecasting · Conformal Prediction
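Conformalized quantile regression wraps two quantile models with a calibration step so the resulting intervals carry a finite-sample coverage guarantee. A self-contained sketch on synthetic data (sklearn's gradient boosting stands in for the project's LightGBM ensemble; all data and parameters are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# synthetic noisy signal as a stand-in for hourly pollutant readings
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=2000)
X_tr, y_tr = X[:1000], y[:1000]             # fit the quantile models
X_cal, y_cal = X[1000:1500], y[1000:1500]   # calibrate the intervals
X_te, y_te = X[1500:], y[1500:]             # check coverage

alpha = 0.1  # target 90% prediction intervals
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# conformity score: how far each calibration point falls outside its band
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# widen (or tighten) the raw quantile band by the calibrated margin q
lower = lo.predict(X_te) - q
upper = hi.predict(X_te) + q
coverage = np.mean((y_te >= lower) & (y_te <= upper))
```

The calibration step is what makes the intervals trustworthy: even if the underlying quantile models are miscalibrated, the adjusted band covers roughly 1 − alpha of held-out points.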

Products

sessum_ai

Transforms Claude Code session transcripts into a structured knowledge base in Obsidian. Bronze/Silver/Gold processing pipeline with cross-project concept deduplication and zero additional API cost.

Python · SQLite · Claude Code · Obsidian

Kindle Highlights

Web app for organizing and searching Kindle highlights. Parses clippings files, deduplicates at the database layer, and renders with an amber highlighter effect.

Next.js · PostgreSQL · TypeScript

ccloquells

Portfolio website for Catalan writer and translator Carme Cloquells Tudurí. Responsive, SEO-optimized, with dynamic routing for books and publications.

Next.js · TypeScript · Tailwind

beacon-dqa (WIP)

Data pipeline testing framework with integrated AI-assisted resolution.

Python · dbt · AI