I received my master's degree in Computer Science from Tsinghua University. During my master's studies, I was fortunate to be mentored by Prof. Yang Liu and Prof. Peng Li. Before graduate school, I received my bachelor's degree from the School of Economics and Management at Tsinghua University.
2025.05: Interning at Samsung this summer in Mountain View, California.
2024.09: Starting my PhD journey at UMass Amherst.
Research
I am broadly interested in embodied intelligence and multi-modal foundation models. Currently, I focus on building lifelong embodied agents that can operate in real-world environments. I welcome collaboration opportunities and encourage interested individuals to reach out.
We propose Mirage, which interleaves latent visual tokens, representing compact imagery visual features, with explicit text tokens to solve diverse multimodal reasoning tasks, boosting reasoning performance without full pixel-level image generation.
In this work, we introduce VCA, a curiosity-driven video agent with self-exploration capability, which autonomously navigates video segments and efficiently builds a comprehensive understanding of complex video sequences.
In this paper, we inspect existing representative approaches and analyze their synergy with continual learning strategies. Also, we integrate these strategies into current approaches to further boost LLMs' efficiency in processing long contexts.
In this work, we introduce the principles of Unified Alignment for Agents (UA2), which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints such as the limitation of monetary budgets.
In this work, we propose our Tuning-free Rule Accumulation (TRAN) framework, which guides LLMs in improving their performance by learning from previous mistakes.
In this work, we propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The basic idea is to adopt a restricted orthogonal constraint that allows parameters to be optimized in a direction oblique to the whole frozen space, facilitating forward knowledge transfer while consolidating previous knowledge.
In this work, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference.
In this study, we propose a novel method to model the dual relationship between an emission inventory and pollution concentrations for emission inventory estimation.
Internship
Samsung Research America - Research Intern (May 2025 - Present)
Manager: Luowei Zhou
Zhiyuan Innovation Technology - Research Intern (Jan. 2024 - Aug. 2024)
Manager: Prof. Hao Dong