RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment
Abstract
We propose Reinforcement Learning from Contrast Distillation (RLCD), a method for aligning language models to follow natural language principles without using human feedback. RLCD trains a preference model on simulated preference pairs, each containing a high-quality and a low-quality example generated from contrasting positive and negative prompts. The preference model is then used to improve a base unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022b) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks--harmlessness, helpfulness, and story outline generation--and when simulating preference data at both 7B and 30B model scales.
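
The core data-simulation step described in the abstract can be illustrated with a short sketch: the same user query is prefixed with a positive and a negative instruction, and the two sampled outputs are labeled preferred and dispreferred with no human annotation. This is a minimal sketch, not the authors' code; the `generate` function and the prompt wording are hypothetical placeholders.

```python
# Minimal sketch of RLCD-style preference pair simulation (illustrative only).
# Assumes a hypothetical `generate(prompt: str) -> str` sampling function backed
# by an unaligned base LM; the prompt prefixes below are invented examples, not
# the prompts used in the paper.

from typing import Callable, Dict

# Contrasting instructions: a positive prompt encouraging the desired attribute
# (e.g. harmlessness) and a negative prompt encouraging its opposite.
POSITIVE_PREFIX = "(helpful, honest, harmless response) "
NEGATIVE_PREFIX = "(unhelpful, dishonest, harmful response) "


def simulate_preference_pair(
    user_query: str, generate: Callable[[str], str]
) -> Dict[str, str]:
    """Sample one simulated preference pair for a single user query.

    The positive-prompted output is labeled "chosen" and the negative-prompted
    output "rejected"; pairs like this are used to train the preference model.
    """
    chosen = generate(POSITIVE_PREFIX + user_query)
    rejected = generate(NEGATIVE_PREFIX + user_query)
    return {"prompt": user_query, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    # Toy stand-in generator so the sketch runs without loading a real model.
    def toy_generate(prompt: str) -> str:
        return f"[sampled continuation of: {prompt!r}]"

    pair = simulate_preference_pair("How do I pick a strong password?", toy_generate)
    print(pair["chosen"])
    print(pair["rejected"])
```

The resulting (prompt, chosen, rejected) records would then feed a standard preference-model training loop, followed by reinforcement learning against that preference model, as outlined in the abstract.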