LatentObservations
Why do we need RLHF? Imitation, Inverse RL, and the role of reward
latentobservations.substack.com
Ran Wei
Jan 15
RLHF aims to exceed human performance, but could we just be fixing distribution shift?