SYSTEM AND METHOD FOR ROBOT PLANNING USING LARGE LANGUAGE MODELS

Source: WIPO "tomato"
A robotic controller for controlling a robot according to a sequence of robotic actions comprises an input interface configured to receive a plurality of multimodal inputs, each specifying instructions for performing a task in a different modality, the modalities including audio, video, and text. The robotic controller also comprises a multimodal large language model (LLM), an action sequence decoder, and a controller. The multimodal LLM includes a multimodal LLM encoder and an LLM decoder. The multimodal LLM encoder is trained with machine learning to transform the multimodal instructions into encodings, and the LLM decoder is configured to decode the encodings into a sequence of robotic instructions. The action sequence decoder is trained with machine learning to transform the sequence of robotic instructions into a sequence of actions using a library of robotic skills. The controller is configured to control the robot according to the sequence of actions.
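
The data flow described above (multimodal inputs, encodings, robotic instructions, actions, robot control) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the claimed implementation: the names `MultimodalInput`, `encode`, `decode_instructions`, `decode_actions`, `control_robot`, and `SKILL_LIBRARY` are assumptions introduced for illustration, and simple string handling takes the place of the trained encoder, LLM decoder, and action sequence decoder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MultimodalInput:
    modality: str  # "audio", "video", or "text"
    content: str   # placeholder payload; a real system would carry waveforms, frames, etc.


# Hypothetical library of robotic skills: keyword -> skill identifier.
SKILL_LIBRARY: Dict[str, str] = {"pick": "PICK", "place": "PLACE", "move": "MOVE"}


def encode(inputs: List[MultimodalInput]) -> List[str]:
    """Stand-in for the multimodal LLM encoder: one 'encoding' per input."""
    return [f"{item.modality}:{item.content.lower()}" for item in inputs]


def decode_instructions(encodings: List[str]) -> List[str]:
    """Stand-in for the LLM decoder: encodings -> sequence of robotic instructions."""
    instructions: List[str] = []
    for encoding in encodings:
        _, text = encoding.split(":", 1)  # drop the modality tag
        for part in text.split(" then "):
            if part.strip():
                instructions.append(part.strip())
    return instructions


def decode_actions(instructions: List[str], library: Dict[str, str]) -> List[str]:
    """Stand-in for the action sequence decoder: map each instruction to a skill."""
    actions: List[str] = []
    for instruction in instructions:
        for keyword, skill in library.items():
            if keyword in instruction:
                actions.append(skill)
                break  # first matching skill wins in this toy version
    return actions


def control_robot(actions: List[str], execute: Callable[[str], None]) -> None:
    """Stand-in for the controller: issue each action to the robot in order."""
    for action in actions:
        execute(action)


def plan(inputs: List[MultimodalInput]) -> List[str]:
    """Run the full pipeline from multimodal inputs to an action sequence."""
    return decode_actions(decode_instructions(encode(inputs)), SKILL_LIBRARY)


if __name__ == "__main__":
    demo_inputs = [
        MultimodalInput("text", "Pick the cup then place it on the shelf"),
        MultimodalInput("audio", "Move to the door"),
    ]
    control_robot(plan(demo_inputs), execute=print)
```

In this sketch the stages are kept as separate functions so that each one could, in principle, be swapped for a learned component without changing the others, which mirrors the separation between the multimodal LLM, the action sequence decoder, and the controller in the abstract.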