Published in: Robotics · Naples, Italy

Multimodal Dialogue for Empathetic Human-Robot Interaction

By Niyati Rawal, Rahul Singh Maharjan, Giacomo Salici, Riccardo Catalini, Marta Romeo, Roberto Bigazzi, Lorenzo Baraldi, Roberto Vezzani, Rita Cucchiara, Angelo Cangelosi

17th International Conference on Social Robotics + AI (ICSR+AI 2025)

Abstract: Large Language Models (LLMs) such as ChatGPT and Llama enable dialogue with artificial agents but often lack empathy and fail to connect with humans. In social robotics, robots need to understand human emotions and respond to them empathetically. This study introduces a dataset of 5600 dialogues created with ChatGPT and Emma, a multimodal agent that takes the user's facial expressions into account. The model is trained in three steps: fine-tuning Llama2 on the dataset using both facial expressions and text, training a reward model based on changes in the user's emotion, and applying Proximal Policy Optimization (PPO) to steer the agent toward responses that elicit positive emotions. In a survey comparing our model's responses with ChatGPT's, participants rated our model as more empathetic and humanlike. We also ran a human-robot interaction (HRI) experiment with the Pepper robot, in which participants rated the empathy and appropriateness of the robot's responses; a chi-square test showed the differences were statistically significant.
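To give a feel for the second training step, here is a minimal, hypothetical sketch of a reward signal driven by the change in the user's detected emotion. The emotion labels, valence values, and function name are illustrative assumptions, not taken from the paper; the actual reward model is learned, not hand-coded.

```python
# Hypothetical illustration: reward an agent when the user's emotion shifts
# toward the positive after its response. Valence scores below are made up
# for demonstration and are NOT the values used in the paper.
VALENCE = {
    "angry": -0.8,
    "sad": -0.6,
    "neutral": 0.0,
    "surprised": 0.3,
    "happy": 0.8,
}

def emotion_change_reward(before: str, after: str) -> float:
    """Return the change in valence between the emotion detected before
    the agent's response and the emotion detected after it."""
    return VALENCE[after] - VALENCE[before]

# A response that moves the user from sad to happy earns a positive reward;
# one that leaves the user angry after a neutral start is penalized.
print(emotion_change_reward("sad", "happy"))
print(emotion_change_reward("neutral", "angry"))
```

In an RLHF-style setup such as the one described, a scalar reward of this kind would then be fed to PPO to update the dialogue policy toward responses associated with positive emotional shifts.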

Link to the paper