DIGITAL ASSISTANT INTERACTIONS BASED ON USER ATTENTION

Source: WIPO "tomato"
An example process includes: detecting audio data and video data, wherein the video data represents a scene; and in response to detecting the audio data and the video data: in accordance with a determination, based on the audio data and the video data, that the scene includes a user whose attention is directed to the electronic device while the user is speaking and that a set of initiation criteria is satisfied: determining whether the audio data includes speech that is intended for the electronic device; and in accordance with a determination, based on the audio data and the video data, that the scene does not include a user whose attention is directed to the electronic device while the user is speaking: forgoing determining whether the audio data includes speech that is intended for the electronic device.
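
The gating logic described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `AttentionResult`, `handle_audio_video`, and `classify_speech` are hypothetical, and the attention and initiation-criteria determinations are taken as precomputed booleans rather than derived from real audio and video data. The abstract does not say what happens when the user is attentive but the initiation criteria are not satisfied; the sketch assumes the speech determination is also forgone in that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttentionResult:
    """Hypothetical result of analyzing the audio and video data together."""
    user_attending_while_speaking: bool  # scene includes an attentive, speaking user
    initiation_criteria_met: bool        # the set of initiation criteria is satisfied


def handle_audio_video(
    result: AttentionResult,
    classify_speech: Callable[[], bool],
) -> Optional[bool]:
    """Return whether the speech is intended for the device, or None if forgone.

    classify_speech is a hypothetical callback standing in for the
    determination of whether the audio data includes device-directed speech.
    """
    if result.user_attending_while_speaking and result.initiation_criteria_met:
        # Attention and initiation criteria both hold: run the determination.
        return classify_speech()
    # No attentive speaking user: forgo the determination entirely.
    # Assumption: the same applies when only the initiation criteria fail.
    return None


if __name__ == "__main__":
    attentive = AttentionResult(True, True)
    inattentive = AttentionResult(False, True)
    print(handle_audio_video(attentive, lambda: True))
    print(handle_audio_video(inattentive, lambda: True))
```

Returning `None` rather than `False` keeps "the determination was forgone" distinct from "the determination ran and the speech was not intended for the device".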