Measuring The Efficacy Of Large Language Models On Classification Tasks

Source: WIPO "tomato"
Techniques for evaluating the efficacy of large language models on classification tasks are disclosed. A prompt that includes an instruction and a content item to be classified is submitted multiple times to a large language model. For each submission of the prompt, a corresponding classification label from a set of two or more classification labels is returned. Each classification label is compared to the expected classification label for the content item using a label distance value metric. Using the label distance value metric, a confidence score is generated.
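The procedure described above can be illustrated with a minimal sketch. The abstract does not define the label distance metric or how the confidence score is derived from it, so everything concrete here is an assumption made for illustration: the ordered label set `LABELS`, the normalized index-difference distance in `label_distance`, the "one minus mean distance" formula in `confidence_score`, and the `make_stub_classifier` helper that stands in for a real large language model call.

```python
from typing import Callable, Iterable, Sequence

# Hypothetical ordered label set; the abstract only requires two or more labels.
LABELS = ["negative", "neutral", "positive"]


def label_distance(predicted: str, expected: str,
                   labels: Sequence[str] = LABELS) -> float:
    """Assumed metric: index difference between labels, scaled to [0, 1]."""
    span = len(labels) - 1
    return abs(labels.index(predicted) - labels.index(expected)) / span


def confidence_score(classify: Callable[[str], str], prompt: str,
                     expected: str, n_runs: int = 5,
                     labels: Sequence[str] = LABELS) -> float:
    """Submit the same prompt n_runs times and score the returned labels.

    Assumed formula: 1 minus the mean label distance, so 1.0 means every
    run returned the expected label.
    """
    distances = [
        label_distance(classify(prompt), expected, labels)
        for _ in range(n_runs)
    ]
    return 1.0 - sum(distances) / len(distances)


def make_stub_classifier(responses: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a model call: replays canned labels, ignoring the prompt."""
    it = iter(responses)

    def classify(prompt: str) -> str:
        return next(it)

    return classify


if __name__ == "__main__":
    # Instruction plus content item, as in the abstract; wording is made up.
    prompt = "Classify the sentiment of this review: 'Great value, works well.'"
    stub = make_stub_classifier(
        ["positive", "positive", "neutral", "positive", "negative"]
    )
    score = confidence_score(stub, prompt, expected="positive", n_runs=5)
    print(f"confidence: {score:.2f}")
```

A distance that is graded by label position, rather than a simple match or mismatch, penalizes a "neutral" answer less than a "negative" one when "positive" is expected. That choice only makes sense for ordered label sets; for unordered labels a 0/1 distance would be the natural substitute.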