Technical Terms
RLHF
Definition
Reinforcement Learning from Human Feedback, a method for improving model behaviour using human preference data.
In Plain English
Training the model based on which answers people preferred.
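The preference step can be made concrete with a small sketch. This is a minimal, illustrative version of the pairwise (Bradley–Terry style) loss commonly used when training a reward model on human preference data; the function name and example scores are assumptions, not part of any specific library.

```python
import math

def preference_loss(score_chosen, score_rejected):
    # The reward model is trained so the answer people preferred
    # ("chosen") scores higher than the one they rejected.
    # Loss = -log(sigmoid(score_chosen - score_rejected)).
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the preferred answer already scores higher, the loss is small.
low = preference_loss(2.0, -1.0)
# When the scores are in the wrong order, the loss is large,
# pushing the model to rank the preferred answer higher.
high = preference_loss(-1.0, 2.0)
```

Minimizing this loss over many human-labelled answer pairs gives a reward signal that reinforcement learning can then optimize against.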