
FLUX.1-dev LoRA Fine-tuned with Flow-GRPO

This LoRA (Low-Rank Adaptation) model is a fine-tuned version of FLUX.1-dev using Flow-GRPO (Flow-based Group Relative Policy Optimization), a novel reinforcement learning technique for flow matching models.

Model Description

This model was trained using the Flow-GRPO methodology described in the paper "Flow-GRPO: Training Flow Matching Models via Online RL". Flow-GRPO integrates online reinforcement learning into flow matching models by:

  1. ODE-to-SDE conversion: Transforms deterministic flow matching into stochastic sampling for RL exploration
  2. Denoising reduction: Uses fewer denoising steps during training while maintaining full quality at inference
  3. Human preference optimization: Trained with PickScore reward to align with human preferences
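
The "group relative" part of GRPO means advantages are computed within a group of images sampled for the same prompt: each image is scored by the reward model, and its advantage is the reward normalized against the group's mean and standard deviation. A minimal sketch of that idea (illustrative only, not the exact training code):

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: shape (group_size,), e.g. PickScore values for the images
    # generated from a single prompt
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 images sampled for one prompt and scored by the reward model
rewards = torch.tensor([0.21, 0.18, 0.25, 0.19])
advantages = group_relative_advantages(rewards)
# Images scoring above the group mean get positive advantages (reinforced),
# those below the mean get negative advantages (discouraged)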

Training Details

Core Configuration

  • Base Model: FLUX.1-dev
  • Training Method: Flow-GRPO with PickScore reward
  • Resolution: 512×512
  • Mixed Precision: bfloat16
  • Seed: 42

LoRA Configuration

  • LoRA Enabled: True
  • Rank: Not specified in config (typically 32-64)
  • Target Modules: Transformer layers

Training Hyperparameters

  • Learning Rate: 5e-5
  • Batch Size: 1 (with gradient accumulation: 32 steps)
  • Optimizer: 8-bit AdamW
    • β₁: 0.9
    • β₂: 0.999
    • Weight Decay: 1e-4
    • Epsilon: 1e-8
  • Gradient Clipping: Max norm 1.0
  • Max Epochs: 100,000
  • Save Frequency: Every 100 steps
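
As a rough illustration of the setup above, the bitsandbytes 8-bit AdamW with these hyperparameters can be constructed as follows. This is a sketch: the dummy parameter stands in for the FLUX transformer's trainable LoRA weights, and the loss is a placeholder for the actual Flow-GRPO objective.

import bitsandbytes as bnb
import torch

# Placeholder parameters standing in for the transformer's LoRA weights
lora_params = [torch.nn.Parameter(torch.randn(16, 16))]

optimizer = bnb.optim.AdamW8bit(
    lora_params,
    lr=5e-5,
    betas=(0.9, 0.999),
    weight_decay=1e-4,
    eps=1e-8,
)

# One optimizer update: 32 accumulated micro-batches of size 1
# (effective batch size 32), then gradient clipping at max norm 1.0
loss = (lora_params[0] ** 2).sum()  # stand-in for the Flow-GRPO policy loss
(loss / 32).backward()
torch.nn.utils.clip_grad_norm_(lora_params, max_norm=1.0)
optimizer.step()
optimizer.zero_grad()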

Flow-GRPO Specific

  • Reward Function: PickScore (human preference)
  • Beta (KL penalty): 0.001
  • Clip Range: 0.2
  • Advantage Clipping: Max 5.0
  • Timestep Fraction: 0.2
  • Guidance Scale: 3.5
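
To make the clip range, advantage clipping, and KL penalty above concrete, a GRPO/PPO-style per-sample loss using these values might look like the sketch below. This is illustrative pseudocode, not the repository's implementation, and the KL term is written as a simple log-probability penalty against the frozen reference model rather than the paper's exact estimator.

import torch

def flow_grpo_loss(
    logp_new: torch.Tensor,    # log-prob of the sampled denoising actions under the current policy
    logp_old: torch.Tensor,    # log-prob under the behavior policy that generated the samples
    logp_ref: torch.Tensor,    # log-prob under the frozen reference (pre-RL) model
    advantages: torch.Tensor,  # group-relative advantages
    clip_range: float = 0.2,
    adv_clip_max: float = 5.0,
    beta: float = 0.001,
) -> torch.Tensor:
    advantages = advantages.clamp(-adv_clip_max, adv_clip_max)
    ratio = (logp_new - logp_old).exp()
    unclipped = ratio * advantages
    clipped = ratio.clamp(1.0 - clip_range, 1.0 + clip_range) * advantages
    policy_loss = -torch.minimum(unclipped, clipped).mean()
    kl_penalty = beta * (logp_new - logp_ref).mean()  # simplified KL surrogate
    return policy_loss + kl_penalty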

Sampling Configuration

  • Training Steps: 2 (denoising reduction)
  • Evaluation Steps: 4
  • Images per Prompt: 4
  • Batches per Epoch: 4

Usage

With Diffusers

import torch
from diffusers import FluxPipeline

# Load the base model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Load the LoRA weights
pipe.load_lora_weights("ighoshsubho/lora-grpo-flux-dev", adapter_name="flow_grpo")

# Generate an image
prompt = "A serene landscape with mountains and a lake at sunset"
image = pipe(
    prompt,
    height=512,
    width=512,
    guidance_scale=3.5,
    num_inference_steps=20,
    max_sequence_length=256,
).images[0]

image.save("generated_image.png")

Adjusting LoRA Strength

# You can adjust the LoRA influence via the adapter name set above
pipe.set_adapters(["flow_grpo"], adapter_weights=[0.8])  # 80% LoRA influence
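
If you prefer to bake the adapter into the base weights, or to remove it again, diffusers also provides fusing and unloading helpers:

# Fuse the LoRA into the base weights at a chosen scale
pipe.fuse_lora(lora_scale=0.8)

# ...or undo the fusion and drop the adapter to get back the plain FLUX.1-dev
pipe.unfuse_lora()
pipe.unload_lora_weights()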

Training Data & Objectives

  • Dataset: Custom PickScore dataset for human preference alignment
  • Prompt Function: General OCR prompts
  • Optimization Target: Maximizing PickScore while maintaining image quality
  • KL Regularization: Prevents reward hacking and maintains model stability
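
For reference, a PickScore reward of the kind used as the optimization target can be computed with the publicly released PickScore_v1 checkpoint roughly as follows (a sketch based on the PickScore authors' example usage; the exact reward code used during training may differ):

import torch
from transformers import AutoModel, AutoProcessor

processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
reward_model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval()

def pickscore(prompt, image):
    # Score how well a PIL image matches the prompt under PickScore
    image_inputs = processor(images=image, return_tensors="pt")
    text_inputs = processor(text=prompt, padding=True, truncation=True,
                            max_length=77, return_tensors="pt")
    with torch.no_grad():
        img_emb = reward_model.get_image_features(**image_inputs)
        txt_emb = reward_model.get_text_features(**text_inputs)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
        score = reward_model.logit_scale.exp() * (txt_emb @ img_emb.T)
    return score.item()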

Performance Improvements

This model demonstrates improvements in:

  • Human preference alignment through PickScore optimization
  • Text rendering quality via OCR-focused training
  • Compositional understanding enhanced by Flow-GRPO's exploration mechanism
  • Stable training with minimal reward hacking due to KL regularization

Technical Notes

  • Uses denoising reduction during training (2 steps) for efficiency
  • Maintains full quality with standard inference steps (20-50)
  • Trained with mixed precision (bfloat16) for memory efficiency
  • 8-bit AdamW optimizer reduces memory footprint
  • Gradient accumulation (32 steps) enables effective large batch training

Limitations

  • Optimized for 512×512 resolution
  • Focused on PickScore preferences (may not generalize to all aesthetic preferences)
  • LoRA adaptation may have reduced capacity compared to full fine-tuning

Citation

If you use this model, please cite the Flow-GRPO paper:

@article{liu2025flow,
  title={Flow-GRPO: Training Flow Matching Models via Online RL},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}

License

This LoRA adapter is released under the Apache 2.0 License. Note that the base FLUX.1-dev model is distributed under the FLUX.1 [dev] Non-Commercial License, which continues to govern use of the base model weights.
