FLUX.1-dev FP16

High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.

Model Description

FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version keeps the original half-precision weights with no further quantization, making it ideal for creative professionals and researchers who need the highest output quality.

Key Capabilities:

  • High-resolution text-to-image generation
  • Advanced prompt understanding with T5-XXL text encoder
  • Superior detail and coherence in generated images
  • Wide range of artistic styles and subjects
  • Multi-text encoder architecture (CLIP + T5)

Repository Contents

flux-dev-fp16/
├── checkpoints/flux/
│   └── flux1-dev-fp16.safetensors          # 23 GB - Complete model checkpoint
├── clip/
│   └── t5xxl_fp16.safetensors              # 9.2 GB - T5-XXL text encoder
├── clip_vision/
│   └── clip_vision_h.safetensors           # CLIP vision encoder
├── diffusion_models/flux/
│   └── flux1-dev-fp16.safetensors          # 23 GB - Diffusion model
├── text_encoders/
│   ├── clip-vit-large.safetensors          # 1.6 GB - CLIP ViT-Large encoder
│   ├── clip_g.safetensors                  # 1.3 GB - CLIP-G encoder
│   ├── clip_l.safetensors                  # 235 MB - CLIP-L encoder
│   └── t5xxl_fp16.safetensors              # 9.2 GB - T5-XXL encoder
└── vae/flux/
    └── flux-vae-bf16.safetensors           # 160 MB - VAE decoder (BF16)

Total Size: ~72 GB
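To verify that a download is complete, a quick sketch that totals the .safetensors files on disk (the path is this repository's assumed local location):

from pathlib import Path

root = Path("E:/huggingface/flux-dev-fp16")
total = sum(f.stat().st_size for f in root.rglob("*.safetensors"))
print(f"{total / 1024**3:.1f} GB of safetensors files")  # expect roughly 70 GB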

Hardware Requirements

Minimum Requirements

  • VRAM: 24 GB (RTX 3090, RTX 4090, A5000, A6000)
  • RAM: 32 GB system memory
  • Disk Space: 80 GB free space
  • GPU: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)

Recommended Requirements

  • VRAM: 32+ GB (RTX 6000 Ada, A6000, H100)
  • RAM: 64 GB system memory
  • Disk Space: 100+ GB for workspace and outputs
  • GPU: NVIDIA RTX 4090 or professional GPUs

Performance Notes

  • FP16 precision gives the best quality but also the highest VRAM usage; a quick VRAM check is sketched below
  • Consider the FP8 version if VRAM is limited (see the flux-dev-fp8 directory)
  • Generation time: ~30-60 seconds per 1024x1024 image, depending on GPU
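Before committing to the FP16 checkpoint, it can help to confirm the GPU meets the minimums above. A minimal sketch using PyTorch's CUDA queries (the 24 GB and compute-capability 7.0 thresholds mirror the requirements stated in this README):

import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; this FP16 model requires an NVIDIA GPU")

props = torch.cuda.get_device_properties(0)
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
total_gb = total_bytes / 1024**3

print(f"{props.name}: compute capability {props.major}.{props.minor}, "
      f"{total_gb:.1f} GB VRAM ({free_bytes / 1024**3:.1f} GB free)")

if props.major < 7:
    print("Warning: compute capability below 7.0 (pre-Volta); FP16 support will be poor")
if total_gb < 24:
    print("Warning: under 24 GB VRAM; consider the FP8 version or CPU offloading")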

Usage Examples

Using with Diffusers Library

import torch
from diffusers import FluxPipeline

# Load the pipeline with local model files.
# Note: from_pretrained() expects a diffusers-format folder (one containing
# model_index.json); for the single-file checkpoints in this repository,
# see "Using Individual Components" below.
pipe = FluxPipeline.from_pretrained(
    "E:/huggingface/flux-dev-fp16",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image
prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=3.5,  # FLUX.1-dev uses distilled guidance; ~3.5 is typical, CFG-style values like 7.5 oversaturate
    height=1024,
    width=1024
).images[0]

image.save("output.png")
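For reproducible results, diffusers pipelines accept a seeded torch.Generator (the seed value 42 here is arbitrary):

# Same seed + same settings -> the same image every run
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt,
    num_inference_steps=50,
    guidance_scale=3.5,
    generator=generator
).images[0]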

Using with ComfyUI

  1. Copy model files to ComfyUI directories (a copy script is sketched after these steps):

    • checkpoints/flux/flux1-dev-fp16.safetensors β†’ ComfyUI/models/checkpoints/
    • text_encoders/*.safetensors β†’ ComfyUI/models/clip/
    • vae/flux/flux-vae-bf16.safetensors β†’ ComfyUI/models/vae/
  2. In ComfyUI:

    • Load Checkpoint: Select flux1-dev-fp16
    • Text Encoder: Automatically loaded
    • VAE: Select flux-vae-bf16
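Instead of copying by hand, a short Python sketch can place the files. Both paths are assumptions to adjust for your machine; note that ComfyUI's extra_model_paths.yaml can also point at this folder directly, avoiding ~35 GB of duplicated weights:

import shutil
from pathlib import Path

SRC = Path("E:/huggingface/flux-dev-fp16")   # this repository
COMFY = Path("C:/ComfyUI/models")            # hypothetical ComfyUI install path

# Checkpoint, text encoders, and VAE go to their respective ComfyUI folders
shutil.copy2(SRC / "checkpoints/flux/flux1-dev-fp16.safetensors", COMFY / "checkpoints")
for f in (SRC / "text_encoders").glob("*.safetensors"):
    shutil.copy2(f, COMFY / "clip")
shutil.copy2(SRC / "vae/flux/flux-vae-bf16.safetensors", COMFY / "vae")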

Using Individual Components

import torch
from diffusers import AutoencoderKL
from safetensors.torch import load_file

# The VAE ships as a single .safetensors file; diffusers can load it directly
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
    torch_dtype=torch.bfloat16
)

# The text encoders here are bare state dicts (no config.json), so
# transformers' from_pretrained() cannot load them directly. Load the raw
# tensors for inspection or manual wiring; for ready-to-use encoder modules,
# load them from the official black-forest-labs/FLUX.1-dev repository instead.
t5_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/t5xxl_fp16.safetensors")
clip_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/clip_l.safetensors")
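Individually loaded components can be passed straight into the pipeline constructor. A sketch, assuming network access to the official black-forest-labs/FLUX.1-dev repository for the remaining components and reusing the vae loaded above:

import torch
from diffusers import FluxPipeline

# Components passed as kwargs override the ones bundled with the repo
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    vae=vae,  # the locally loaded BF16 VAE from the example above
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")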

Model Specifications

Architecture:

  • Type: Latent Diffusion Transformer
  • Parameters: ~12B (diffusion model)
  • Text Encoders:
    • T5-XXL: 4.7B parameters (FP16, 9.2 GB)
    • CLIP-G: ~695M parameters (1.3 GB)
    • CLIP-L: ~123M parameters (235 MB)
  • VAE: ~84M parameters (BF16, 160 MB)

Precision:

  • Diffusion Model: FP16 (float16)
  • Text Encoders: FP16 (float16)
  • VAE: BF16 (bfloat16)
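The stated precisions can be verified without loading whole files, since the safetensors library reads tensors lazily. A small sketch checking one tensor of the CLIP-L encoder:

from safetensors import safe_open

with safe_open("E:/huggingface/flux-dev-fp16/text_encoders/clip_l.safetensors",
               framework="pt", device="cpu") as f:
    name = next(iter(f.keys()))
    print(name, f.get_tensor(name).dtype)  # expect torch.float16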

Format:

  • .safetensors - Secure tensor format with fast loading

Resolution Support:

  • Native: 1024x1024
  • Range: 512x512 to 2048x2048
  • Aspect ratios: Non-square resolutions are supported; keep both dimensions divisible by 16 (see the example below)
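Non-square output only requires changing height and width. A short sketch reusing the pipe and prompt from the Diffusers example above (1344x768 is an assumed 16:9-style choice, both dimensions divisible by 16):

# Widescreen generation at a 16:9-style resolution
image = pipe(
    prompt,
    height=768,
    width=1344,
    num_inference_steps=50,
    guidance_scale=3.5
).images[0]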

Performance Tips

Memory Optimization

# Enable memory-efficient attention slicing
pipe.enable_attention_slicing()

# Enable VAE tiling for high resolutions
pipe.enable_vae_tiling()

# Use CPU offloading if VRAM is limited (much slower; call this instead of
# pipe.to("cuda") -- the pipeline manages device placement itself)
pipe.enable_sequential_cpu_offload()
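A middle ground between running fully on GPU and sequential offload is model-level offloading, which keeps one whole component (transformer, text encoders, or VAE) on the GPU at a time and is typically much faster than sequential offload:

# Call instead of pipe.to("cuda"); components move to the GPU on demand
pipe.enable_model_cpu_offload()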

Speed Optimization

# Use torch.compile for faster inference (PyTorch 2.0+).
# FLUX uses a transformer backbone, not a UNet, so compile pipe.transformer.
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)

# Reduce inference steps (trade quality for speed)
image = pipe(prompt, num_inference_steps=25)  # the examples above use 50

Quality Optimization

  • Use 50-75 inference steps for best quality
  • Guidance scale: ~3.5 is a good default; FLUX.1-dev uses distilled guidance, so useful values (roughly 2-5) run lower than classic CFG scales
  • Higher guidance strengthens prompt adherence but can oversaturate colors and flatten detail
  • Invest in prompt engineering: the T5-XXL encoder handles long, descriptive natural-language prompts well

License

This model is released under the FLUX.1 [dev] Non-Commercial License, not Apache 2.0 (Apache 2.0 applies to the separate FLUX.1-schnell model).

Usage Terms:

  • ✅ Non-commercial research and personal use allowed
  • ✅ Outputs you generate may be used under the terms of the license
  • ❌ Commercial use of the model weights requires a separate license from Black Forest Labs
  • ⚠️ Requires attribution to Black Forest Labs

See the LICENSE file for full terms.

Citation

If you use this model in your research or projects, please cite:

@misc{flux-dev,
  title={FLUX.1-dev: High-Quality Text-to-Image Generation},
  author={Black Forest Labs},
  year={2024},
  howpublished={\url{https://blackforestlabs.ai/}}
}

Related Resources

  • FP8 precision version (lower VRAM usage): see E:/huggingface/flux-dev-fp8/

Version Information

  • Model Version: FLUX.1-dev
  • Precision: FP16
  • Release: 2024
  • README Version: v1.4
