Carnegie Mellon University


AgInteracT: LLM-based Human-AI Teaming Simulation Environment

By Maarten Sap

We propose AgInteracT, a multi-party simulation environment in which humans, AI-based human digital twins, and AI agents interact with each other to accomplish a shared social goal or practical objective. Our environment will leverage large language models (LLMs) adapted to embody complex profiles (e.g., using psychological scales, interaction data) to build AI digital twins, and will enable agents and digital twins to communicate in natural language, allowing for a rich spectrum of interactions. Our proposed multi-level, multi-dimensional evaluation framework, AgInteracT Eval, will measure the performance and social intelligence of agents, digital twins, and human interactors along dimensions such as persona consistency, social norm following, and goal accomplishment. Teams will also be evaluated on goal completion rate, efficiency, and cohesion.

To showcase AgInteracT’s usefulness for studying human-AI teams, we propose to use our simulation environment to investigate how team composition and the characteristics of digital twin teammates affect team performance in two types of military-inspired tasks: (1) Cooperative tasks, in which teammates of different backgrounds, cultures, and expertise must cooperate towards a shared goal (e.g., deciding what to do in a search and rescue mission). (2) Semi-cooperative tasks, in which teammates with different characteristics have to negotiate to achieve their own goals (e.g., job offer negotiations, the game Diplomacy).

Within these tasks, we will investigate the following research questions: (1) How can we best create digital twins that are realistic and faithful to their human counterparts? Answering this will improve the methods we use to build human digital twins. (2) How can we most accurately evaluate the performance of agents and teams using automatic methods? Answering this will help us build better evaluation models for agents and teams, human and AI alike. (3) How do team size, composition, specific agent characteristics, and team communication strategies influence the outcomes of tasks? Answering this will address social-science-inspired questions about how teams can better achieve their goals.