VOICE RECOGNITION METHOD, VOICE RECOGNITION DEVICE, AND VOICE RECOGNITION PROGRAM

Publication date: 15/12/2022
Source: WIPO
An extraction unit (15b) extracts a sequence of features for each frame of a voice signal. A learning unit (15c) uses the extracted feature sequence to train a voice recognition model (14a) that uses CTC (connectionist temporal classification). A generation unit (15d) uses the trained voice recognition model (14a) to generate a spike sequence, that is, the sequence of labels that the model outputs as spikes. A prediction learning unit (15e) uses the generated spike sequence to train a spike point prediction model (14b), which predicts the point in time at which each spike will be output.
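The framing, spike generation, and target construction steps can be illustrated with a minimal sketch that assumes a greedy CTC readout. A CTC-trained model tends to emit each label as a sharp peak ("spike") on a single frame, and the blank label dominates the frames in between. The abstract does not specify the features, the model architecture, or how spikes are read out. The function names (`split_frames`, `frame_log_energy`, `extract_spikes`, `spike_targets`), the toy log-energy feature, blank index 0, and the frame-level 0/1 targets are all hypothetical. The targets mark the frames where spikes occur and are one plausible form of supervision for a spike point prediction model. The prediction model itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: label index 0 is the CTC blank


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def frame_log_energy(frame: Sequence[float], eps: float = 1e-10) -> float:
    """Toy per-frame feature (hypothetical); real systems typically use e.g. log-mel filterbanks."""
    return math.log(sum(x * x for x in frame) + eps)


def extract_spikes(posteriors: Sequence[Sequence[float]],
                   blank: int = BLANK) -> List[Tuple[int, int]]:
    """Greedy CTC readout returning (frame_index, label) for each emitted label.

    A frame counts as a spike when its most probable label is non-blank and
    differs from the previous frame's most probable label, because CTC
    collapses consecutive repeats of the same label.
    """
    spikes = []
    prev = blank
    for t, probs in enumerate(posteriors):
        best = max(range(len(probs)), key=probs.__getitem__)
        if best != blank and best != prev:
            spikes.append((t, best))
        prev = best
    return spikes


def spike_targets(num_frames: int, spikes: Sequence[Tuple[int, int]]) -> List[int]:
    """Hypothetical frame-level 0/1 targets marking the frames where spikes occur."""
    targets = [0] * num_frames
    for t, _ in spikes:
        targets[t] = 1
    return targets


if __name__ == "__main__":
    # Made-up per-frame posteriors over [blank, label 1, label 2].
    posteriors = [
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # spike for label 1
        [0.7, 0.2, 0.1],    # blank
        [0.1, 0.1, 0.8],    # spike for label 2
        [0.2, 0.1, 0.7],    # repeat of label 2, collapsed by CTC
        [0.8, 0.1, 0.1],    # blank
    ]
    spikes = extract_spikes(posteriors)
    print(spikes)                                   # [(1, 1), (3, 2)]
    print(spike_targets(len(posteriors), spikes))   # [0, 1, 0, 1, 0, 0]
```

A label that reappears after a blank (for example 1, blank, 1) produces two separate spikes, which matches the standard CTC collapsing rule.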