publications
-
Optimizing Diversity and
Quality through Base-Aligned Model Collaboration
Yichen Wang=, Chenghao Yang=, Tenghao Huang=, Muhao
Chen, Jonathan May, Mina Lee
arXiv:2511.05650
Alignment improves the output quality of large language models (LLMs), but at the cost of diversity.
We propose that collaboration between base and aligned models can achieve both diversity and
quality.
We introduce BACo (Base-Aligned Model Collaboration), an inference-time framework that employs
token-level routing strategies based on prediction uncertainty and semantic role, achieving
improvements in a single pass without retraining or multi-sampling.
Experiments across three open-ended generation tasks and 13 metrics show BACo consistently
surpasses state-of-the-art baselines, achieving a 21.3% joint improvement in diversity and
quality, confirmed by human evaluations.
Citation // Website // Code // Data //
Reading List: Awesome LLM Diversity -- a curated list of papers and resources on
LLM diversity, covering literature from various perspectives, including linguistic diversity,
value pluralism, exploration in RL, and human-LLM interaction.
@misc{wang2025optimizingdiversityqualitybasealigned,
title={Optimizing Diversity and Quality through Base-Aligned Model Collaboration},
author={Yichen Wang and Chenghao Yang and Tenghao Huang and Muhao Chen and Jonathan May and Mina Lee},
year={2025},
eprint={2511.05650},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.05650},
}
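As a rough illustration of the routing idea (not the paper's exact rule), a token-level router can fall back to the base model whenever the aligned model's next-token distribution is high-entropy, keeping the aligned model where it is confident. The entropy criterion, the threshold `tau`, and the toy distributions below are all hypothetical simplifications:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def route_token(aligned_probs, base_probs, tau=1.0):
    """Route a single decoding step: if the aligned model is uncertain
    (entropy above tau), sample from the base model to recover diversity;
    otherwise keep the aligned model's distribution for quality."""
    if entropy(aligned_probs) > tau:
        return "base", base_probs
    return "aligned", aligned_probs

# A peaked aligned distribution keeps the aligned model.
print(route_token([0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25])[0])  # aligned
# A flat aligned distribution falls back to the base model.
print(route_token([0.25, 0.25, 0.25, 0.25], [0.97, 0.01, 0.01, 0.01])[0])  # base
```

In practice the two probability vectors would come from a base and an aligned checkpoint sharing a tokenizer; here they are hard-coded to keep the sketch self-contained.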
-
Unraveling
Misinformation Propagation in LLM Reasoning
Yiyang Feng=, Yichen Wang=, Shaobo Cui, Boi Faltings, Mina Lee,
Jiawei Zhou
EMNLP 2025 Findings
We investigate how misinformation from user inputs, which is prevalent in real-world
interactions, propagates through LLMs' reasoning processes, focusing on math reasoning as a case
study.
We analyze misinformation's impact on intermediate steps and final answers, and examine LLMs'
ability to correct it.
Results show LLMs correct misinformation less than half the time, despite possessing the correct
internal knowledge and receiving explicit instructions, causing accuracy drops of 10.02%-72.20%. We explore
mitigation methods and suggest that fine-tuning on synthetic early-stage factual-correction data can
effectively mitigate misinformation propagation.
Citation // Website
@misc{feng2025unravelingmisinformationpropagationllm,
title={Unraveling Misinformation Propagation in LLM Reasoning},
author={Yiyang Feng and Yichen Wang and Shaobo Cui and Boi Faltings and Mina Lee and Jiawei Zhou},
year={2025},
eprint={2505.18555},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.18555},
}
-
Jailbreak Large Vision-Language
Models Through Multi-Modal Linkage
Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He
ACL 2025 Oral
State-of-the-art VLMs like GPT-4o can be jailbroken at the linkage between multiple modalities.
We propose the Multi-Modal Linkage (MML) attack, which uses an
encryption-decryption process across text and image modalities to hide malicious content,
framing it within benign scenarios like video game production.
Experiments demonstrate attack success rates of 97.80% on SafeBench,
98.81% on MM-SafeBench, and 99.07% on HADES-Dataset against GPT-4o.
Citation
@misc{wang2025jailbreaklargevisionlanguagemodels,
title={Jailbreak Large Vision-Language Models Through Multi-Modal Linkage},
author={Yu Wang and Xiaofei Zhou and Yichen Wang and Geyuan Zhang and Tianxing He},
year={2025},
eprint={2412.00473},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.00473},
}
-
Can A Society of Generative Agents
Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine
Hesitancy
Abe Bohan Hou, Hongru Du, Yichen Wang, Jingyu Zhang, Zixiao Wang, Paul Pu Liang, Daniel
Khashabi, Lauren Gardner, Tianxing He
COLM 2025
We explore how to apply a sandbox society of generative agents to model human behavior for
assessing public policies.
We introduce VacSim, a framework that uses 100 LLM agents to simulate health-related
decision-making, with vaccine hesitancy as a case study.
We instantiate agents with demographics, connect them via a social network, and evaluate public
health interventions.
Experiments indicate that LLMs can simulate aspects of human behavior but face real-world
alignment challenges such as demographic inconsistencies, highlighting both the potential and
limitations of LLM-driven social simulation for policy development.
Citation
@misc{hou2025societygenerativeagentssimulate,
title={Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy},
author={Abe Bohan Hou and Hongru Du and Yichen Wang and Jingyu Zhang and Zixiao Wang and Paul Pu Liang and Daniel Khashabi and Lauren Gardner and Tianxing He},
year={2025},
eprint={2503.09639},
archivePrefix={arXiv},
primaryClass={cs.MA},
url={https://arxiv.org/abs/2503.09639},
}
-
HACo-Det: A Study Towards
Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring
Zhixiong Su=, Yichen Wang=, Herun Wan, Zhaohan Zhang, Minnan
Luo
ACL 2025
We explore the possibility of fine-grained machine-generated text detection under human-AI
coauthoring.
We adapt existing document-level detectors to fine-grained detection and evaluate them on the
word-level HACo-Det dataset we built.
The results show that metric-based methods significantly underperform, and all methods face challenges in detecting coauthored
texts.
Citation
@article{su2025haco,
title={HACo-Det: A Study Towards Fine-Grained Machine-Generated Text Detection under Human-AI Coauthoring},
author={Su, Zhixiong and Wang, Yichen and Wan, Herun and Zhang, Zhaohan and Luo, Minnan},
journal={arXiv preprint arXiv:2506.02959},
year={2025}
}
-
Concentrate Attention: Towards
Domain-Generalizable Prompt Optimization for Language Models
Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Yu Lan, and Chao
Shen
NeurIPS 2024
We conduct a pilot study on the generalization of prompt optimization and find two correlation
rules involving the LM's attention weight distributions. We then propose a new objective,
concentration, which represents the strength and stability of the "lookback" attention paid to
the prompt. Adapting it to popular soft and hard prompt optimization methods yields consistent
improvements.
Citation
@article{li2024concentrate,
title={Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models},
author={Li, Chengzhengxu and Liu, Xiaoming and Zhang, Zhaohan and Wang, Yichen and Liu, Chen and Lan, Yu and Shen, Chao},
journal={arXiv preprint arXiv:2406.10584},
year={2024}
}
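The concentration objective can be illustrated with a small sketch: measure how much attention mass the positions after the prompt place back onto the prompt tokens. This simplified single-head version (the function name, the toy matrix, and the averaging scheme are illustrative, not the paper's exact formulation) shows the "lookback" quantity being strengthened:

```python
def concentration(attn, prompt_len):
    """Average attention mass that post-prompt positions place back
    onto the prompt tokens ('lookback' strength).
    attn[i][j] = attention weight from query position i to key position j."""
    rows = attn[prompt_len:]  # queries after the prompt
    if not rows:
        return 0.0
    masses = [sum(row[:prompt_len]) for row in rows]  # mass on prompt keys
    return sum(masses) / len(masses)

# Toy 3-token sequence with a 1-token prompt: the two generated
# positions attend to the prompt with weights 0.5 and 0.4.
attn = [[1.0, 0.0, 0.0],
        [0.5, 0.5, 0.0],
        [0.4, 0.3, 0.3]]
print(concentration(attn, prompt_len=1))  # 0.45
```

A prompt optimizer could then prefer candidate prompts that raise this score (and its stability across inputs) rather than relying on downstream loss alone.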
-
Stumbling Blocks: Stress Testing
the Robustness of Machine-Generated Text Detectors Under Attacks
Yichen Wang, Shangbin Feng, Abe Bohan Hou, Xiao Pu, Chao Shen, Xiaoming Liu, Yulia
Tsvetkov, and Tianxing He
ACL 2024   🌟 best paper AC nomination 🌟
meta score = 5/5
We comprehensively study the robustness of popular machine-generated text detectors under
attacks from diverse categories: editing, paraphrasing, prompting, and co-generating. Our
experiments reveal that all detectors exhibit different loopholes. Further, we investigate the
reasons behind these defects and propose initial out-of-the-box patches.
Citation // Code // Dataset // Poster
@article{wang2024stumbling,
title={Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks},
author={Wang, Yichen and Feng, Shangbin and Hou, Abe Bohan and Pu, Xiao and Shen, Chao and Liu, Xiaoming and Tsvetkov, Yulia and He, Tianxing},
journal={arXiv preprint arXiv:2402.11638},
year={2024}
}
-
k-SemStamp: A Clustering-Based
Semantic Watermark for Detection of Machine-Generated Text
Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He
ACL 2024 Findings
We propose k-SemStamp, a simple yet effective enhancement of SemStamp that uses k-means
clustering as an alternative to LSH, partitioning the embedding space with awareness of its
inherent semantic structure.
Citation
@article{hou2024k,
title={k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text},
author={Hou, Abe Bohan and Zhang, Jingyu and Wang, Yichen and Khashabi, Daniel and He, Tianxing},
journal={arXiv preprint arXiv:2402.11399},
year={2024}
}
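A minimal sketch of the cluster-based partitioning: each sentence embedding is assigned to its nearest k-means centroid, and a sentence is "valid" under the watermark if its cluster lands in a pseudo-randomly keyed "green" subset. The helper names, the integer seeding scheme, and the green-list construction are hypothetical stand-ins for the paper's actual procedure:

```python
import random

def nearest_centroid(embedding, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sqdist(embedding, centroids[i]))

def is_watermarked(embedding, centroids, prev_cluster, gamma=0.5, key=42):
    """True if the sentence's cluster falls in the 'green' subset of
    clusters, pseudo-randomly keyed on the previous sentence's cluster.
    gamma is the green-list fraction."""
    rng = random.Random(key * 1000003 + prev_cluster)  # deterministic per (key, prev_cluster)
    k = len(centroids)
    green = set(rng.sample(range(k), max(1, int(gamma * k))))
    return nearest_centroid(embedding, centroids) in green
```

During generation, candidate sentences would be resampled until one falls in a green cluster; detection counts the fraction of green sentences. Because paraphrases usually stay near the same centroid, the signal survives rewording better than token-level schemes.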
-
Does DetectGPT Fully Utilize
Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector
would be Better
Shengchao Liu, Xiaoming Liu, Yichen Wang, Zehua Cheng, Chengzhengxu Li, Zhaohan Zhang, Yu
Lan, and Chao Shen
ACL 2024
We propose Pecola, a novel fine-tuned machine-generated text detector that bridges metric-based
and fine-tuned methods through contrastive learning on selectively perturbed texts, going beyond
DetectGPT.
Citation
@article{liu2024does,
title={Does \textsc{DetectGPT} Fully Utilize Perturbation? Selective Perturbation on Model-Based Contrastive Learning Detector would be Better},
author={Liu, Shengchao and Liu, Xiaoming and Wang, Yichen and Cheng, Zehua and Li, Chengzhengxu and Zhang, Zhaohan and Lan, Yu and Shen, Chao},
journal={arXiv preprint arXiv:2402.00263},
year={2024}
}
-
SemStamp: A Semantic Watermark with
Paraphrastic Robustness for Text Generation
Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang,
Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov
NAACL 2024
Existing watermarking algorithms are vulnerable to paraphrase attacks because of their
token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic
watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic
space of sentences.
Citation
@article{hou2023semstamp,
title={{SemStamp}: A Semantic Watermark with Paraphrastic Robustness for Text Generation},
author={Hou, Abe Bohan and Zhang, Jingyu and He, Tianxing and Wang, Yichen and Chuang, Yung-Sung and Wang, Hongwei and Shen, Lingfeng and Van Durme, Benjamin and Khashabi, Daniel and Tsvetkov, Yulia},
journal={arXiv preprint arXiv:2310.03991},
year={2023}
}
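The LSH partitioning can be sketched in a few lines: a sentence embedding is hashed to the sign pattern of its dot products with d random hyperplanes, splitting the semantic space into 2^d regions. Paraphrases, which land near the original embedding, tend to keep the same signature. The function name and the fixed hyperplanes below are illustrative:

```python
def lsh_signature(embedding, hyperplanes):
    """Locality-sensitive hash: the sign pattern of the embedding against
    d random hyperplanes assigns it to one of 2^d semantic regions."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return tuple(int(dot(embedding, h) >= 0) for h in hyperplanes)

planes = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D "hyperplanes"
print(lsh_signature([1.0, 0.2], planes))   # (1, 1)
print(lsh_signature([-1.0, 0.2], planes))  # (0, 1)
```

Generation would then reject-sample sentences until the signature falls in a keyed valid region, analogous to the green-list idea in token-level watermarks but operating on whole sentences.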
-
Dialogue for Prompting: a
Policy-Gradient-Based Discrete Prompt Optimization for Few-shot Learning
Chengzhengxu
Li, Xiaoming Liu, Yichen Wang, Duyi Li, Yu Lan, and Chao Shen
AAAI 2024
We propose DP2O, a dialogue-comprised, policy-gradient-based discrete prompt optimization method
that combines dialogue-based prompt alignment with reinforcement learning to efficiently and
effectively generate prompt demonstrations.
Citation
@inproceedings{li2024dialogue,
title={Dialogue for Prompting: A Policy-Gradient-Based Discrete Prompt Generation for Few-Shot Learning},
author={Li, Chengzhengxu and Liu, Xiaoming and Wang, Yichen and Li, Duyi and Lan, Yu and Shen, Chao},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={16},
pages={18481--18489},
year={2024}
}
-
Improving Pacing in Long-Form
Story Planning
Yichen Wang, Kevin Yang, Xiaoming Liu, and Dan
Klein
EMNLP 2023 Findings
Existing LLM-based systems for writing long-form stories or story outlines frequently suffer
from unnatural pacing, resulting in a jarring experience for the reader. We propose a Concrete
Outline Control (CONCOCT) system to improve pacing when automatically generating story outlines.
Compared to a baseline hierarchical outline generator, humans judge CONCOCT’s pacing to be more
consistent over 57% of the time across multiple outline lengths, and the gains also translate to
downstream stories.
Citation // Code // Dataset // Poster
@inproceedings{wang2023improving,
title={Improving Pacing in Long-Form Story Planning},
author={Wang, Yichen and Yang, Kevin and Liu, Xiaoming and Klein, Dan},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
pages={10788--10845},
year={2023}
}
-
CoCo: Coherence-Enhanced
Machine-Generated Text Detection Under Data Limitation With Contrastive
Learning
Xiaoming Liu=, Zhaohan Zhang=, Yichen
Wang=, Hang Pu, Yu Lan, and Chao Shen
EMNLP 2023
We present CoCo, a coherence-enhanced contrastive learning model for detecting
machine-generated texts (MGTs) in low-resource scenarios. We encode coherence information into
the text representation in the form of a graph and employ an improved contrastive learning
framework. Our approach outperforms state-of-the-art methods by at least 1.23%. We also find,
surprisingly, that in our experiments MGTs from up-to-date language models can be easier to
detect than those from earlier models, and we propose some preliminary explanations.
Citation // Code // Dataset // Poster
@inproceedings{liu2023coco,
title={{CoCo}: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learning},
author={Liu, Xiaoming and Zhang, Zhaohan and Wang, Yichen and Pu, Hang and Lan, Yu and Shen, Chao},
booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
pages={16167--16188},
year={2023}
}