FLUX.1-dev FP16

High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.

Model Description

FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version keeps the original half-precision weights with no further quantization, making it ideal for creative professionals and researchers who need the highest output quality.

Key Capabilities:

  • High-resolution text-to-image generation
  • Advanced prompt understanding with T5-XXL text encoder
  • Superior detail and coherence in generated images
  • Wide range of artistic styles and subjects
  • Multi-text encoder architecture (CLIP + T5)

Repository Contents

flux-dev-fp16/
├── checkpoints/flux/
│   └── flux1-dev-fp16.safetensors          # 23 GB - Complete model checkpoint
├── clip/
│   └── t5xxl_fp16.safetensors              # 9.2 GB - T5-XXL text encoder
├── clip_vision/
│   └── clip_vision_h.safetensors           # CLIP vision encoder
├── diffusion_models/flux/
│   └── flux1-dev-fp16.safetensors          # 23 GB - Diffusion model
├── text_encoders/
│   ├── clip-vit-large.safetensors          # 1.6 GB - CLIP ViT-Large encoder
│   ├── clip_g.safetensors                  # 1.3 GB - CLIP-G encoder
│   ├── clip_l.safetensors                  # 235 MB - CLIP-L encoder
│   └── t5xxl_fp16.safetensors              # 9.2 GB - T5-XXL encoder
└── vae/flux/
    └── flux-vae-bf16.safetensors           # 160 MB - VAE decoder (BF16)

Total Size: ~72 GB
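To verify that a download is complete, a quick sketch that totals the .safetensors files on disk (the path is this repository's assumed local location):

from pathlib import Path

root = Path("E:/huggingface/flux-dev-fp16")
total = sum(f.stat().st_size for f in root.rglob("*.safetensors"))
print(f"{total / 1024**3:.1f} GB of safetensors files")  # expect roughly 70 GB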

Hardware Requirements

Minimum Requirements

  • VRAM: 24 GB (RTX 3090, RTX 4090, A5000, A6000)
  • RAM: 32 GB system memory
  • Disk Space: 80 GB free space
  • GPU: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)

Recommended Requirements

  • VRAM: 32+ GB (RTX 6000 Ada, A6000, H100)
  • RAM: 64 GB system memory
  • Disk Space: 100+ GB for workspace and outputs
  • GPU: NVIDIA RTX 4090 or professional GPUs

Performance Notes

  • FP16 precision gives the best quality but also the highest VRAM usage; a quick VRAM check is sketched below
  • Consider the FP8 version if VRAM is limited (see the flux-dev-fp8 directory)
  • Generation time: ~30-60 seconds per 1024x1024 image, depending on GPU
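Before committing to the FP16 checkpoint, it can help to confirm the GPU meets the minimums above. A minimal sketch using PyTorch's CUDA queries (the 24 GB and compute-capability 7.0 thresholds mirror the requirements stated in this README):

import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; this FP16 model requires an NVIDIA GPU")

props = torch.cuda.get_device_properties(0)
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
total_gb = total_bytes / 1024**3

print(f"{props.name}: compute capability {props.major}.{props.minor}, "
      f"{total_gb:.1f} GB VRAM ({free_bytes / 1024**3:.1f} GB free)")

if props.major < 7:
    print("Warning: compute capability below 7.0 (pre-Volta); FP16 support will be poor")
if total_gb < 24:
    print("Warning: under 24 GB VRAM; consider the FP8 version or CPU offloading")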

Usage Examples

Using with Diffusers Library

import torch
from diffusers import FluxPipeline

# Load the pipeline with local model files.
# Note: from_pretrained() expects a diffusers-format folder (one containing
# model_index.json); for the single-file checkpoints in this repository,
# see "Using Individual Components" below.
pipe = FluxPipeline.from_pretrained(
    "E:/huggingface/flux-dev-fp16",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image
prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=3.5,  # FLUX.1-dev uses distilled guidance; ~3.5 is typical, CFG-style values like 7.5 oversaturate
    height=1024,
    width=1024
).images[0]

image.save("output.png")
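For reproducible results, diffusers pipelines accept a seeded torch.Generator (the seed value 42 here is arbitrary):

# Same seed + same settings -> the same image every run
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=generator
).images[0]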

Using with ComfyUI

  1. Copy model files to ComfyUI directories (a copy script is sketched after these steps):

    • checkpoints/flux/flux1-dev-fp16.safetensors β†’ ComfyUI/models/checkpoints/
    • text_encoders/*.safetensors β†’ ComfyUI/models/clip/
    • vae/flux/flux-vae-bf16.safetensors β†’ ComfyUI/models/vae/
  2. In ComfyUI:

    • Load Checkpoint: Select flux1-dev-fp16
    • Text Encoder: Automatically loaded
    • VAE: Select flux-vae-bf16
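Instead of copying by hand, a short Python sketch can place the files. Both paths are assumptions to adjust for your machine; note that ComfyUI's extra_model_paths.yaml can also point at this folder directly, avoiding ~35 GB of duplicated weights:

import shutil
from pathlib import Path

SRC = Path("E:/huggingface/flux-dev-fp16")   # this repository
COMFY = Path("C:/ComfyUI/models")            # hypothetical ComfyUI install path

# Checkpoint, text encoders, and VAE go to their respective ComfyUI folders
shutil.copy2(SRC / "checkpoints/flux/flux1-dev-fp16.safetensors", COMFY / "checkpoints")
for f in (SRC / "text_encoders").glob("*.safetensors"):
    shutil.copy2(f, COMFY / "clip")
shutil.copy2(SRC / "vae/flux/flux-vae-bf16.safetensors", COMFY / "vae")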

Using Individual Components

import torch
from diffusers import AutoencoderKL
from safetensors.torch import load_file

# The VAE ships as a single .safetensors file; diffusers can load it directly
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
    torch_dtype=torch.bfloat16
)

# The text encoders here are bare state dicts (no config.json), so
# transformers' from_pretrained() cannot load them directly. Load the raw
# tensors for inspection or manual wiring; for ready-to-use encoder modules,
# load them from the official black-forest-labs/FLUX.1-dev repository instead.
t5_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/t5xxl_fp16.safetensors")
clip_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/clip_l.safetensors")
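Individually loaded components can be passed straight into the pipeline constructor. A sketch, assuming network access to the official black-forest-labs/FLUX.1-dev repository for the remaining components and reusing the vae loaded above:

import torch
from diffusers import FluxPipeline

# Components passed as kwargs override the ones bundled with the repo
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    vae=vae,  # the locally loaded BF16 VAE from the example above
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")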

Model Specifications

Architecture:

  • Type: Latent Diffusion Transformer
  • Parameters: ~12B (diffusion model)
  • Text Encoders:
    • T5-XXL: 4.7B parameters (FP16, 9.2 GB)
    • CLIP-G: ~695M parameters (1.3 GB)
    • CLIP-L: ~123M parameters (235 MB)
  • VAE: ~84M parameters (BF16, 160 MB)

Precision:

  • Diffusion Model: FP16 (float16)
  • Text Encoders: FP16 (float16)
  • VAE: BF16 (bfloat16)
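The stated precisions can be verified without loading whole files, since the safetensors library reads tensors lazily. A small sketch checking one tensor of the CLIP-L encoder:

from safetensors import safe_open

with safe_open("E:/huggingface/flux-dev-fp16/text_encoders/clip_l.safetensors",
               framework="pt", device="cpu") as f:
    name = next(iter(f.keys()))
    print(name, f.get_tensor(name).dtype)  # expect torch.float16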

Format:

  • .safetensors - Secure tensor format with fast loading

Resolution Support:

  • Native: 1024x1024
  • Range: 512x512 to 2048x2048
  • Aspect ratios: Non-square resolutions are supported; keep both dimensions divisible by 16 (see the example below)
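Non-square output only requires changing height and width. A short sketch reusing the pipe and prompt from the Diffusers example above (1344x768 is an assumed 16:9-style choice, both dimensions divisible by 16):

# Widescreen generation at a 16:9-style resolution
image = pipe(
    prompt,
    height=768,
    width=1344,
    num_inference_steps=50,
    guidance_scale=3.5
).images[0]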

Performance Tips

Memory Optimization

# Enable memory-efficient attention slicing
pipe.enable_attention_slicing()

# Enable VAE tiling for high resolutions
pipe.enable_vae_tiling()

# Use CPU offloading if VRAM is limited (much slower; call this instead of
# pipe.to("cuda") -- the pipeline manages device placement itself)
pipe.enable_sequential_cpu_offload()
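A middle ground between running fully on GPU and sequential offload is model-level offloading, which keeps one whole component (transformer, text encoders, or VAE) on the GPU at a time and is typically much faster than sequential offload:

# Call instead of pipe.to("cuda"); components move to the GPU on demand
pipe.enable_model_cpu_offload()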

Speed Optimization

# Use torch.compile for faster inference (PyTorch 2.0+).
# FLUX uses a transformer backbone, not a UNet, so compile pipe.transformer.
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)

# Reduce inference steps (trade quality for speed)
image = pipe(prompt, num_inference_steps=25)  # the examples above use 50

Quality Optimization

  • Use 50-75 inference steps for best quality
  • Guidance scale: ~3.5 is a good default; FLUX.1-dev uses distilled guidance, so useful values (roughly 2-5) run lower than classic CFG scales
  • Higher guidance strengthens prompt adherence but can oversaturate colors and flatten detail
  • Invest in prompt engineering: the T5-XXL encoder handles long, descriptive natural-language prompts well

License

This model is released under the FLUX.1 [dev] Non-Commercial License, not Apache 2.0 (Apache 2.0 applies to the separate FLUX.1-schnell model).

Usage Terms:

  • ✅ Non-commercial research and personal use allowed
  • ✅ Outputs you generate may be used under the terms of the license
  • ❌ Commercial use of the model weights requires a separate license from Black Forest Labs
  • ⚠️ Requires attribution to Black Forest Labs

See the LICENSE file for full terms.

Citation

If you use this model in your research or projects, please cite:

@misc{flux-dev,
  title={FLUX.1-dev: High-Quality Text-to-Image Generation},
  author={Black Forest Labs},
  year={2024},
  howpublished={\url{https://blackforestlabs.ai/}}
}

Related Resources

  • FP8 precision version (lower VRAM usage): see E:/huggingface/flux-dev-fp8/

Version Information

  • Model Version: FLUX.1-dev
  • Precision: FP16
  • Release: 2024
  • README Version: v1.4
