This module contains some useful reward functions, primarily intended for use with the [GRPOTrainer].
GRPOTrainer
[[autodoc]] rewards.think_format_reward