FLUX.1-dev FP16
High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.
Model Description
FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version maintains full precision for maximum quality output, ideal for creative professionals and researchers requiring the highest image quality.
Key Capabilities:
- High-resolution text-to-image generation
- Advanced prompt understanding with T5-XXL text encoder
- Superior detail and coherence in generated images
- Wide range of artistic styles and subjects
- Multi-text encoder architecture (CLIP + T5)
Repository Contents
flux-dev-fp16/
├── checkpoints/flux/
│   └── flux1-dev-fp16.safetensors      # 23 GB - Complete model checkpoint
├── clip/
│   └── t5xxl_fp16.safetensors          # 9.2 GB - T5-XXL text encoder
├── clip_vision/
│   └── clip_vision_h.safetensors       # CLIP vision encoder
├── diffusion_models/flux/
│   └── flux1-dev-fp16.safetensors      # 23 GB - Diffusion model
├── text_encoders/
│   ├── clip-vit-large.safetensors      # 1.6 GB - CLIP ViT-Large encoder
│   ├── clip_g.safetensors              # 1.3 GB - CLIP-G encoder
│   ├── clip_l.safetensors              # 235 MB - CLIP-L encoder
│   └── t5xxl_fp16.safetensors          # 9.2 GB - T5-XXL encoder
└── vae/flux/
    └── flux-vae-bf16.safetensors       # 160 MB - VAE decoder (BF16)
Total Size: ~72 GB
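To confirm the download is complete, one option is to walk the directory and compare file sizes against the listing above. A minimal sketch, assuming the repository lives at the E:/huggingface path used in the examples below:

import os

base = "E:/huggingface/flux-dev-fp16"
total = 0
for root, _, files in os.walk(base):
    for name in files:
        if name.endswith(".safetensors"):
            size = os.path.getsize(os.path.join(root, name))
            total += size
            print(f"{name}: {size / 1e9:.1f} GB")
print(f"Total: {total / 1e9:.0f} GB")  # should land near the ~72 GB above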
Hardware Requirements
Minimum Requirements
- VRAM: 24 GB (RTX 3090, RTX 4090, A5000, A6000)
- RAM: 32 GB system memory
- Disk Space: 80 GB free space
- GPU: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)
Recommended Requirements
- VRAM: 32+ GB (RTX 6000 Ada, A6000, H100)
- RAM: 64 GB system memory
- Disk Space: 100+ GB for workspace and outputs
- GPU: NVIDIA RTX 4090 or professional GPUs
Performance Notes
- FP16 precision provides the best quality but the highest VRAM usage
- Consider the FP8 version if VRAM is limited (see the flux-dev-fp8 directory)
- Generation time: ~30-60 seconds per image at 1024x1024, depending on the GPU
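Before loading 23 GB of weights, it can be worth verifying that the machine meets the minimums above. A small sketch using standard torch.cuda queries:

import torch

assert torch.cuda.is_available(), "No CUDA GPU detected"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
capability = torch.cuda.get_device_capability(0)
print(f"{props.name}: {vram_gb:.0f} GB VRAM, compute capability {capability[0]}.{capability[1]}")
if vram_gb < 24 or capability < (7, 0):
    print("Below the FP16 minimums; consider the FP8 version instead")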
Usage Examples
Using with Diffusers Library
import torch
from diffusers import FluxPipeline

# Load the pipeline from the local model directory
pipe = FluxPipeline.from_pretrained(
    "E:/huggingface/flux-dev-fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate an image
prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
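For reproducible outputs, the pipeline also accepts a seeded generator (a standard diffusers argument); this sketch reuses the pipe and prompt defined above:

# Fix the random seed so the same prompt reproduces the same image
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(prompt, generator=generator, num_inference_steps=50).images[0]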
Using with ComfyUI
Copy model files to ComfyUI directories:
- checkpoints/flux/flux1-dev-fp16.safetensors → ComfyUI/models/checkpoints/
- text_encoders/*.safetensors → ComfyUI/models/clip/
- vae/flux/flux-vae-bf16.safetensors → ComfyUI/models/vae/
In ComfyUI:
- Load Checkpoint: select flux1-dev-fp16
- Text Encoder: loaded automatically
- VAE: select flux-vae-bf16
Using Individual Components
import torch
from diffusers import AutoencoderKL
from safetensors.torch import load_file

base = "E:/huggingface/flux-dev-fp16"

# The text encoder files are bare weight checkpoints (state dicts), not full
# Hugging Face model directories, so from_pretrained() cannot load them
# directly; read the weights with safetensors instead and load them into
# models built from the matching configs.
t5_state = load_file(f"{base}/text_encoders/t5xxl_fp16.safetensors")
clip_state = load_file(f"{base}/text_encoders/clip_l.safetensors")

# The VAE checkpoint can be loaded directly via diffusers' single-file loader
vae = AutoencoderKL.from_single_file(
    f"{base}/vae/flux/flux-vae-bf16.safetensors",
    torch_dtype=torch.bfloat16,
)
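Individually loaded components can then be handed to the pipeline constructor, the standard diffusers pattern for overriding a bundled component. For example, reusing the VAE loaded above:

from diffusers import FluxPipeline

# Override the bundled VAE with the one loaded above (kept in BF16, per this repo)
pipe = FluxPipeline.from_pretrained(
    "E:/huggingface/flux-dev-fp16",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")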
Model Specifications
Architecture:
- Type: Latent Diffusion Transformer
- Parameters: ~12B (diffusion model)
- Text Encoders:
- T5-XXL: 4.7B parameters (FP16)
- CLIP-G: ~695M parameters (text encoder; the 1.3 GB above is the FP16 file size)
- CLIP-L: ~123M parameters (text encoder; the 235 MB above is the FP16 file size)
- VAE: ~84M parameters (BF16 precision)
Precision:
- Diffusion Model: FP16 (float16)
- Text Encoders: FP16 (float16)
- VAE: BF16 (bfloat16)
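The stored dtypes can be verified directly from the checkpoints; a quick sketch on the (small) VAE file using the safetensors library:

from safetensors.torch import load_file

state = load_file("E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors")
print({str(t.dtype) for t in state.values()})  # expected: {'torch.bfloat16'}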
Format:
- .safetensors: secure tensor format with fast loading
Resolution Support:
- Native: 1024x1024
- Range: 512x512 to 2048x2048
- Aspect ratios: Supports non-square resolutions
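Non-square sizes should keep both dimensions multiples of 16 (an assumption based on FLUX's 8x VAE downsampling plus 2x2 latent packing). A small helper for snapping arbitrary sizes, reusing pipe and prompt from the example above:

def snap16(value: int) -> int:
    """Round a pixel dimension down to the nearest multiple of 16."""
    return max(16, (value // 16) * 16)

# Roughly 16:9 at about one megapixel
image = pipe(prompt, width=snap16(1344), height=snap16(768)).images[0]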
Performance Tips
Memory Optimization
# Enable memory efficient attention
pipe.enable_attention_slicing()
# Enable VAE tiling for high resolutions
pipe.enable_vae_tiling()
# Use CPU offloading if VRAM limited (slower)
pipe.enable_sequential_cpu_offload()
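If sequential offload proves too slow, diffusers also offers per-component offloading, a middle ground that moves whole modules to the GPU only while they run:

# Milder alternative to sequential offload: offload whole components
pipe.enable_model_cpu_offload()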
Speed Optimization
# Use torch.compile for faster inference (PyTorch 2.0+)
# FLUX's denoiser is a diffusion transformer, exposed as pipe.transformer (not pipe.unet)
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
# Reduce inference steps (trade quality for speed)
image = pipe(prompt, num_inference_steps=25)  # vs. 50 in the example above
Quality Optimization
- Use 50-75 inference steps for best quality
- Guidance scale: 7-9 for balanced results
- Higher guidance (10-15) for stronger prompt adherence
- Consider prompt engineering for better results
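A quick way to apply the guidance advice above is to sweep the suggested range and compare outputs side by side (a sketch, reusing pipe and prompt from earlier):

# Render the same prompt across the suggested guidance range
for gs in (7.0, 9.0, 12.0):
    image = pipe(prompt, guidance_scale=gs, num_inference_steps=50).images[0]
    image.save(f"output_gs{gs}.png")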
License
This model is released under the FLUX.1 [dev] Non-Commercial License, not Apache 2.0 (Apache 2.0 applies to FLUX.1-schnell).
Usage Terms:
- ✅ Non-commercial research and personal use allowed
- ✅ Modification and distribution of derivatives allowed under the license's conditions
- ❌ Commercial use of the model weights requires a separate license from Black Forest Labs
- ⚠️ Requires attribution to Black Forest Labs
See the LICENSE file for full terms.
Citation
If you use this model in your research or projects, please cite:
@misc{flux-dev,
  title={FLUX.1-dev: High-Quality Text-to-Image Generation},
  author={Black Forest Labs},
  year={2024},
  howpublished={\url{https://blackforestlabs.ai/}}
}
Related Resources
- Official Website: https://blackforestlabs.ai/
- Model Card: https://huggingface.co/black-forest-labs/FLUX.1-dev
- Documentation: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
- Community: https://huggingface.co/black-forest-labs
Version Information
- Model Version: FLUX.1-dev
- Precision: FP16
- Release: 2024
- README Version: v1.4
For the FP8 precision version (lower VRAM usage), see E:/huggingface/flux-dev-fp8/