LatentObservations
Why do we need RLHF? Imitation, Inverse RL…
Ran Wei
Jan 15, 2024
RLHF aims to exceed human performance, but could we just be fixing distribution shift?