Meta’s AI Model Autonomously Improves Other AI Models

Author: ChemistryViews

Meta’s Fundamental AI Research (FAIR) department has introduced the “Self-Taught Evaluator”, an AI model that autonomously evaluates and improves other AI models without human input. Meta’s iterative self-improvement scheme trains a Large Language Model (LLM) as a judge: starting from unlabeled instructions, it generates pairs of contrasting outputs, judges them, and refines the model’s reasoning and judgments with each iteration. The FAIR researchers used only AI-generated data to train the scoring model and report that it outperforms models that rely on human-labeled data.
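The loop described above can be illustrated with a minimal, runnable Python sketch. All helper functions here (generate_response, corrupt_instruction, judge, fine_tune) are toy placeholders standing in for real LLM calls, and none of the names correspond to Meta’s actual code; the point is the control flow: build contrasting pairs with a known preference, keep only the judge’s correct reasoned verdicts, fine-tune on them, and repeat.

```python
import random

# Toy placeholders (hypothetical, for illustration only). In the real
# system these would call an actual LLM; here they return dummy strings
# so the control flow below runs end to end.

def generate_response(instruction: str) -> str:
    return f"response to: {instruction}"

def corrupt_instruction(instruction: str) -> str:
    # A deliberately modified instruction: its answer should be a worse
    # response to the *original* instruction, so the preference label
    # is known by construction, with no human annotation.
    return instruction + " (slightly altered)"

def judge(model: str, instruction: str, first: str, second: str) -> tuple[str, str]:
    # Returns a chain-of-thought reasoning trace and a verdict.
    verdict = random.choice(["first_is_better", "second_is_better"])
    return f"reasoning about '{instruction}'", verdict

def fine_tune(model: str, examples: list) -> str:
    return f"{model}+ft({len(examples)} examples)"

def self_taught_evaluator(judge_model: str, instructions: list[str],
                          iterations: int = 3) -> str:
    """Iteratively improve a judge model from unlabeled instructions only."""
    for _ in range(iterations):
        training_examples = []
        for instruction in instructions:
            # 1. Build a contrasting pair: a response to the original
            #    instruction (preferred) and one to a corrupted variant
            #    (rejected by construction).
            preferred = generate_response(instruction)
            rejected = generate_response(corrupt_instruction(instruction))

            # 2. Ask the current judge for a reasoned verdict on the pair.
            reasoning, verdict = judge(judge_model, instruction,
                                       preferred, rejected)

            # 3. Keep only judgments that match the known preference; the
            #    reasoning traces become new, fully AI-generated training data.
            if verdict == "first_is_better":
                training_examples.append(
                    (instruction, preferred, rejected, reasoning))

        # 4. Fine-tune the judge on its own correct judgments and iterate.
        judge_model = fine_tune(judge_model, training_examples)
    return judge_model

if __name__ == "__main__":
    print(self_taught_evaluator("judge-v0", ["Explain catalysis briefly."]))
```

Because the rejected response is produced by construction rather than by a human labeler, the preference signal and the judge’s reasoning traces are both entirely AI-generated, which is the property the FAIR researchers highlight.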

The Self-Taught Evaluator had already been introduced in a paper published in August, which demonstrated its use of the “chain of thought” technique, in which complex problems are broken down into smaller logical steps to improve the accuracy of answers.

This approach aims to reduce costs and speed up AI development by removing the human involvement required in the so-called “Reinforcement Learning from Human Feedback” (RLHF) method, which relies on expert annotators with the specialized expertise needed to label data accurately and to verify that answers to complex math and writing queries are correct. Other companies, such as Google and Anthropic, have also explored alternatives to RLHF, but unlike Meta, they do not publicly release their models.
