LatentObservations
Why do we need RLHF? Imitation, Inverse RL, and the role of reward
Ran Wei
Jan 15
RLHF aims to exceed human performance, but could we just be fixing distribution shift?