System and Method for Interactive Robot Action Replanning Using Large Language Models

Source: WIPO "tomato"
A robotic controller for controlling a robot according to a sequence of robotic actions comprises an input interface that receives multimodal inputs specifying instructions for performing a task in audio, video, and text modalities. The controller transforms the multimodal instructions into encodings using a large language model (LLM) encoder, then uses an LLM decoder to decode the encodings into a first sequence of robotic instructions and a robot action description of those actions. The controller receives human feedback input corresponding to at least one action in the first sequence of actions and encodes the feedback input together with the robot action description. The controller feeds the encoded data, along with multimodal features generated from the encodings, into the LLM decoder to generate a corrected sequence of actions. The controller is configured to control the robot according to the corrected sequence of actions.
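The data flow described above (encode, decode, receive feedback, re-decode, execute) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the names `Plan`, `Feedback`, `ReplanningController`, `toy_encoder`, and `toy_decoder` are hypothetical, the modalities are represented as plain strings, and the rule-based encoder and decoder are stand-ins for the LLM components, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    actions: List[str]   # sequence of robot actions
    description: str     # natural-language description of those actions


@dataclass
class Feedback:
    action_index: int    # which action of the first sequence is addressed
    text: str            # the human's correction for that action


# Hypothetical interfaces standing in for the LLM encoder and decoder.
Encoder = Callable[[List[str]], List[str]]
Decoder = Callable[[List[str], Optional[List[str]]], Plan]


class ReplanningController:
    def __init__(self, encoder: Encoder, decoder: Decoder) -> None:
        self.encoder = encoder
        self.decoder = decoder
        self.features: List[str] = []  # multimodal features kept for replanning

    def plan(self, audio: str, video: str, text: str) -> Plan:
        # Encode the multimodal instructions, then decode a first plan.
        self.features = self.encoder([audio, video, text])
        return self.decoder(self.features, None)

    def replan(self, plan: Plan, feedback: Feedback) -> Plan:
        # Encode the feedback together with the robot action description,
        # then decode again with the stored multimodal features.
        tagged = f"{feedback.action_index}:{feedback.text}"
        encoded = self.encoder([plan.description, tagged])
        return self.decoder(self.features, encoded)

    def run(self, plan: Plan, actuator: Callable[[str], None]) -> None:
        # Control the robot by issuing each action in order.
        for action in plan.actions:
            actuator(action)


def toy_encoder(inputs: List[str]) -> List[str]:
    # Assumption: "encoding" is just lowercasing non-empty inputs.
    return [s.strip().lower() for s in inputs if s.strip()]


def toy_decoder(features: List[str], feedback: Optional[List[str]] = None) -> Plan:
    # Assumption: steps are separated by ';' in the instruction text.
    actions = [step.strip() for f in features for step in f.split(";") if step.strip()]
    if feedback:
        # The toy decoder reads only the tagged correction (last item) and
        # ignores the encoded description; a real LLM decoder would use both.
        index, _, correction = feedback[-1].partition(":")
        actions[int(index)] = correction.strip()
    return Plan(actions, "Robot will: " + ", then ".join(actions))


controller = ReplanningController(toy_encoder, toy_decoder)
first = controller.plan(audio="", video="", text="Pick knife; slice tomato")
corrected = controller.replan(first, Feedback(action_index=1, text="dice tomato"))

executed: List[str] = []
controller.run(corrected, executed.append)  # the actuator here just records actions
```

Keeping the multimodal features on the controller mirrors the abstract's statement that the decoder receives both the encoded feedback and the features from the original encodings, so a correction does not require the instructions to be supplied again.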