Research

Selected Work

Research systems, benchmarks, and tools I keep building on.

My work centers on multimodal retrieval, personal memory, agent tooling, and vision-language reasoning. This page collects the projects that are easier to understand as a portfolio than as a chronological blog feed.

Benchmark · Personal AI · Multimodal

ATM-Bench

A benchmark for long-term personalized memory QA built from four years of real multimodal life data, with a public leaderboard and open evaluation stack.

Project · Paper · Code · Blog

ICLR 2026 · Hateful Meme Detection

ExPO-HM

An Explain-then-Detect approach for hateful meme detection, using preference optimization and conditional decision entropy to improve rationale-grounded decisions.

Project · Paper · Code · Blog

Agents · Local Tooling

Tokdash

A local token and cost dashboard for agent sessions, designed to make multi-agent usage auditable without sending session metadata to a hosted analytics service.

Blog · Code

Agent Infrastructure · Self-hosting

Owned Search for Agents

A self-hosted SearXNG and Firecrawl workflow that restores discovery and extraction when coding agents run against non-default model backends.

Blog

Reading Paths

Personal AI · Multimodal · Agents · Machine Learning


