ATM-Bench
A benchmark for long-term personalized memory QA built from four years of real multimodal life data, with a public leaderboard and open evaluation stack.
Selected Work
My work spans multimodal retrieval, personal memory, agent tooling, and vision-language reasoning. This page collects the projects that are easier to understand as a portfolio than as a chronological blog feed.
An Explain-then-Detect approach for hateful meme detection, using preference optimization and conditional decision entropy to improve rationale-grounded decisions.
A local token and cost dashboard for agent sessions, designed to make multi-agent usage auditable without sending session metadata to a hosted analytics service.
A self-hosted SearXNG and Firecrawl workflow that restores web search (discovery) and page extraction when coding agents run against non-default model backends.
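The core of a local token and cost dashboard is simple arithmetic: cost per record is tokens divided by one million, times the price per million tokens, summed per session and agent. The sketch below shows that aggregation done entirely in memory, so no session metadata leaves the machine. It is a minimal illustration, not the project's code: the record fields (`session_id`, `agent`, `model`, token counts), the model names, and the prices in `PRICES_PER_MTOK` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-million-token prices in USD; real rates depend on provider and model.
PRICES_PER_MTOK = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.50, "output": 1.50},
}


@dataclass
class UsageRecord:
    # Hypothetical record layout for one agent call.
    session_id: str
    agent: str
    model: str
    input_tokens: int
    output_tokens: int


def record_cost(r: UsageRecord) -> float:
    """Cost of one call: tokens / 1e6 * price per million tokens."""
    p = PRICES_PER_MTOK[r.model]
    return (r.input_tokens * p["input"] + r.output_tokens * p["output"]) / 1_000_000


def summarize(records):
    """Aggregate tokens and cost per (session, agent), entirely locally."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0})
    for r in records:
        t = totals[(r.session_id, r.agent)]
        t["input_tokens"] += r.input_tokens
        t["output_tokens"] += r.output_tokens
        t["cost_usd"] += record_cost(r)
    return dict(totals)


if __name__ == "__main__":
    records = [
        UsageRecord("s1", "planner", "model-a", 12_000, 2_000),
        UsageRecord("s1", "coder", "model-b", 40_000, 8_000),
        UsageRecord("s1", "planner", "model-a", 3_000, 500),
    ]
    for key, totals in summarize(records).items():
        print(key, totals)
```

Keying totals by `(session, agent)` is what makes multi-agent usage auditable: each sub-agent's spend stays visible instead of being folded into one session total.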
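On the discovery side of the SearXNG and Firecrawl workflow, an agent tool only needs to query a self-hosted SearXNG instance for JSON results and hand the returned URLs to an extractor. The sketch below covers just the SearXNG half using the standard library. The instance address `http://localhost:8080` and the helper names are assumptions, and SearXNG serves `format=json` only if JSON is enabled under `search.formats` in its `settings.yml`; otherwise the request is rejected. The Firecrawl extraction step is not shown.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical address of a self-hosted SearXNG instance.
SEARXNG_BASE = "http://localhost:8080"


def searxng_search_url(query: str, base: str = SEARXNG_BASE, pageno: int = 1) -> str:
    """Build a SearXNG /search URL that requests JSON output."""
    params = {"q": query, "format": "json", "pageno": pageno}
    return f"{base.rstrip('/')}/search?{urlencode(params)}"


def search(query: str, base: str = SEARXNG_BASE, timeout: float = 10.0) -> dict:
    """Fetch and decode one page of JSON results (requires a running instance)."""
    with urllib.request.urlopen(searxng_search_url(query, base), timeout=timeout) as resp:
        return json.load(resp)


def result_urls(payload: dict, limit: int = 5) -> list:
    """Pick the top result URLs to pass on to an extraction step."""
    return [r["url"] for r in payload.get("results", [])[:limit] if "url" in r]
```

Keeping URL construction separate from the network call makes the tool easy to test offline and to point at a different instance.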