Hello! I’m Jinwei Hu, currently pursuing my Ph.D. in Computer Science at the University of Liverpool, supervised by Prof. Xiaowei Huang and Dr. Yi Dong. My research focuses on understanding and improving the safety, robustness, and reliability of modern AI systems, particularly LLM-driven agents operating in dynamic and adversarial environments. My work has appeared in leading venues including ACL, NeurIPS, AAAI, ICML, TACL, CVPR, ICASSP, IROS, and IEEE Transactions journals, and received an Outstanding Paper Award at ACL 2026.

I also contribute to the research community through academic service for venues including AAAI, NeurIPS, ICML, AAMAS, ACL, ECAI, TNNLS, and TOSEM, among others. Before beginning my Ph.D., I completed an MSc with Distinction in Applied Computational Science and Engineering at Imperial College London, where my research on AI for Science and explainable AI, supervised by Dr. Sibo Cheng and Dr. Rossella Arcucci at the Data Science Institute, resulted in publications in many leading interdisciplinary journals such as the Chemical Engineering Journal (CEJ).

📚 My current research interests include, but are not limited to:

AI Safety and Security
Responsible Agentic AI
LLM Post-training for fine-grained knowledge control
Testing and verification of AI-based systems
Neuro-Symbolic AI
AI4Science

I am always open to research discussions and collaborations. Please feel free to get in touch! 😃

🔥 News

2026.07: 🏆 Honoured to share that my first-author paper, “Lying with Truths,” received an Outstanding Paper Award at ACL 2026.
2026.07: 📝 Served as Session Chair at ACL 2026 for the session on Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning.
2026.07: 🎤 Gave an Oral Presentation at ACL 2026 in Harbor D at the Grand Hyatt Manchester Hotel, San Diego.
2026.06: 📝 Invited to serve as a Program Committee member for AAAI 2027.
2026.04: 📝 Invited to serve as a Reviewer for NeurIPS 2026.
2026.04: 🎉 Our paper on multi-agent collusive attacks on LLM-based agents in open channels has been accepted at ACL 2026.
2026.04: 📝 Invited to serve as a Journal Referee for IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
2026.03: 📝 Invited to serve as a Journal Referee for ACM Transactions on Software Engineering and Methodology (TOSEM).
2026.01: 🎤 Gave an Oral Presentation at AAAI 2026 at Singapore EXPO.
2026.01: 🎉 Our paper on adversarial robustness testing has been accepted at IEEE ICASSP 2026, which will be held in Barcelona, Spain.
2025.12: 🎤 Invited to present a talk on AI in Programmatic Agents at the Trustworthy AI+ Workshop, co-hosted by King’s College London and the University of Exeter.
2025.11: 🎉 Our paper on Domain Adaptation of Agentic AI has been accepted at AAAI 2026 and selected for an Oral Presentation.
2025.10: 🎉 Our paper on LLM guardrails is now available in Artificial Intelligence Review.
2025.09: 🎉 Our paper on LLM unlearning has been accepted at NeurIPS 2025.
2025.09: 🎉 Our emerging research on the responsibility of LLM-powered agents has been accepted as a poster at UKAIRS 2025.
2025.08: 📝 Served as a Reviewer for NeurIPS 2025, reviewing submissions in the area of AI safety.
2025.08: 📝 Served as a Programme Committee Member for AAAI 2026, contributing to the review process for papers in the main track.
2025.08: 🎉 Our paper on randomized smoothing for LLM-driven multi-agent systems was accepted as a Fast Track publication in the top-tier Chinese Journal of Aeronautics (CJoA).
2025.07: 🎉 Our paper on adversarial testing for industrial cyber-physical systems was published in IEEE Transactions on Industrial Cyber-Physical Systems.
2025.06: 🎉 Our paper on safe pruning for LoRA has been accepted at TACL.
2025.06: 📝 Served as a Programme Committee Member for UKAIRS 2025.
2025.02: 🎉 Our paper on social media deepfake detection has been accepted by CVPR 2025.
2024.09: 🔬 Serving as a Research Associate, working mainly on the project “CRoCS: Certified Robust and Scalable Autonomous Operation in Cyber Space,” funded by the Alan Turing Institute (AICD Research Centre).
2024.06: 🏆 Won the ELLIS Manchester Scholarship, which supported my attendance at the ELLIS Summer Session hosted at the University of Manchester.
2024.05: 🎤 Gave a tutorial session on “How to Control LLMs’ behaviors and Design Strategy to safeguard LLMs” at the TACPS & Trust-AI Reading Group.
2024.05: 📝 Served as a Reviewer for ECAI 2024.
2024.05: 🎉 Our paper on LLM guardrails has been accepted by ICML 2024.
2024.01: 🎉 My master’s thesis on Explainable AI and Chemistry (AI4Science) was accepted for publication in the top journal CEJ.
2023.12: 🏆 Awarded a full scholarship to pursue my PhD at the University of Liverpool and joined the Trustworthy Autonomous Cyber-Physical Systems (TACPS) Lab, under the supervision of Prof. Xiaowei Huang and Dr. Yi Dong.
2023.10: 🎓 Graduated from Imperial College London and was honoured to receive a Master of Science with Distinction, the highest classification for the degree in the UK.

📝 Publications

Selected Conference Publications

ACL 2026 Oral & Outstanding Paper Award 🏆

Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

Jinwei Hu, Xinmiao Huang, Youcheng Sun, Yi Dong, Xiaowei Huang

We identify and formalize cognitive collusion in LLM agents, and propose a multi-agent generative montage framework that manipulates beliefs using only truthful evidence, revealing a new class of reasoning-driven vulnerabilities in public channels.

AAAI 2026 Oral

Tapas Are Free! Training-Free Adaptation of Programmatic Agents via LLM-Guided Program Synthesis in Dynamic Environments

Jinwei Hu, Yi Dong, Youcheng Sun, Xiaowei Huang

We propose TAPA, a training-free framework that uses LLMs to synthesize agent actions for adaptive decision-making, shifting from policy retraining to action-level adaptation and demonstrating strong performance in cyber defense and swarm control.

NeurIPS 2025

FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model

Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, Xiaowei Huang

We propose a representation-guided unlearning framework that combines contrastive learning, gradient projection, and information-theoretic metrics to enable more precise knowledge removal in LLMs.

ICASSP 2026

DDSA: Dual-Domain Strategic Attack for Spatial-Temporal Efficiency in Adversarial Robustness Testing

Jinwei Hu, Shiyuan Meng, Yi Dong, Xiaowei Huang

We propose a dual-domain adversarial testing framework that improves efficiency by selectively attacking critical regions and frames, enabling resource-efficient robustness evaluation in real-time image systems.

Selected Journal Publications

Chinese Journal of Aeronautics & Featured Article

Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing

Jinwei Hu, Yi Dong, Zhengtao Ding, Xiaowei Huang

We design a defense framework for LLM-driven multi-agent systems that leverages randomized smoothing to provide probabilistic safety guarantees, mitigating malicious behaviors and hallucination propagation while preserving system performance.

IEEE Transactions on Industrial Cyber-Physical Systems

Hierarchical Testing With Rabbit Optimization for Industrial Cyber-Physical Systems

Jinwei Hu, Zezhi Tang, Xin Jin, Benyuan Zhang, Yi Dong, Xiaowei Huang

We propose HERO, a black-box adversarial testing framework that combines hierarchical analysis and optimization to efficiently generate high-quality time-series adversarial examples, enabling robust evaluation of ICPS applications.

Chemical Engineering Journal

Explainable AI models for predicting drop coalescence in microfluidics device

Jinwei Hu, Kewei Zhu, Sibo Cheng, Nina M Kovalchuk, Alfred Soulsby, Mark JH Simmons, Omar K Matar, Rossella Arcucci

We investigate droplet coalescence prediction in microfluidic systems using machine learning and explainable AI, revealing key physical factors that govern coalescence while ensuring interpretability in AI predictions.

Further publication details are available on Google Scholar.

💻 Involved Projects

Robustifying Generative AI through Human-Centric Integration of Neural and Symbolic Methods
- Role: Research Associate
- Funding: EU Horizon, €9.3M
CRoCS: Certified Robust and Scalable Autonomous Operation in Cyber Space
- Role: Research Associate
- Funding: Alan Turing Institute (AI for Cyber Defence (AICD) Research Centre), £80K

🎖 Honors and Awards

2026.07: Outstanding Paper Award, ACL 2026.
2024.06: ELLIS Manchester Scholarship, the University of Manchester.
2023.12: PhD Full Scholarship, University of Liverpool.
2023.10: MSc with Distinction, Imperial College London.

💬 Invited Talks

2026.07: Oral Presentation at ACL 2026, San Diego.
2026.01: Oral Presentation at AAAI 2026, Singapore EXPO.
2025.12: Invited talk on AI in Programmatic Agents at the Trustworthy AI+ Workshop, co-hosted by King’s College London and the University of Exeter.
2024.05: Tutorial on “How to Control LLMs’ behaviors and Design Strategy to safeguard LLMs” at the TACPS & Trust-AI Reading Group.

🤝 Services

Journal Reviewing

IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
ACM Transactions on Software Engineering and Methodology (TOSEM)
IEEE Internet of Things Journal (IoTJ)

Conference Program Committee / Reviewer

AAAI 2026/2027
AAMAS 2026
NeurIPS 2025/2026
ICML 2025
UKAIRS 2025
ECAI 2024