Research Interests
Large language models, reinforcement learning, and diffusion models.
I am a researcher working on large language models, reinforcement learning, and diffusion models. My biggest dream is to build a general-purpose agent that can learn to do anything.
Researcher · Engineer · Pianist
I am a first-year PhD student at INSAIT, supervised by Prof. Yuxia Wang. I am also an amateur pianist, and I like watching football matches.
INSAIT
Focus: Large language models, reinforcement learning, and diffusion models.
Fudan University
Focus: Natural language processing, reinforcement learning, and watermarking.
Harbin Institute of Technology
Operating systems, database systems, compilers, computer networks, etc.
We propose GumbelSoft, a novel language model watermarking method that leverages the GumbelMax-trick.
We propose PAR, a reward shaping method that mitigates reward hacking in RLHF.
StepFun
Worked on LLM post-training.
Fudan University
Worked as a teaching assistant for the course "Operating Systems".
National Gallery.
The road of an ordinary person.
A comfortable song for driving.