---
language:
- en
tags:
- dllm
- diffusion-language-model
- text-generation
- diffusion
- language-model
license: apache-2.0
---

# HDLM-Epsilon: Hybrid Diffusion Language Model

[Paper](https://arxiv.org/abs/2504.06416)
[Code](https://github.com/ServiceNow/hdlm)

This model card is for the **hdlm-base model with epsilon=0.0**.

## Model Description

HDLM-Epsilon is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through epsilon-hybrid noising. The model interpolates its evolution operators between absorbing and uniform processes, making it conceptually closer to MDLM (Sahoo et al., 2024) while maintaining the benefits of both paradigms.

The epsilon parameter (ε) controls the blend between the absorbing and uniform processes during training: smaller values emphasize the absorbing process, while larger values incorporate more uniform noise.
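
In schematic form (an illustrative reading of the description above, not the exact parameterization from the paper), the ε-hybrid corruption can be pictured as a convex combination of the two elementary processes:

$$
Q_t^{\text{hybrid}} = (1-\varepsilon)\, Q_t^{\text{absorb}} + \varepsilon\, Q_t^{\text{uniform}}
$$

where the absorbing operator moves probability mass toward the dedicated absorbing token, the uniform operator spreads it over the full vocabulary, and ε = 0 recovers a purely absorbing (mask-style) process.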

## Model Architecture

- **Base Model**: Transformer architecture with custom conditioning layers
- **Vocabulary Size**: 50,258 tokens (GPT-2 vocabulary + absorbing token; see the sketch after this list)
- **Context Length**: 1024 tokens
- **Training**: Hybrid loss combining token masking with random token corruption
- **Inference**: Supports multiple sampling algorithms, including ACS (Adaptive Correction Sampler)
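
The vocabulary size follows from GPT-2's 50,257-entry vocabulary plus one extra index for the absorbing (mask) state. A minimal sketch of that arithmetic; the assumption that the absorbing token occupies the next index after GPT-2's vocabulary is ours, so check the repository config for the identifier actually used:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# GPT-2 ships with 50,257 tokens.
gpt2_vocab_size = len(tokenizer)        # 50257

# Assumption: the absorbing token is appended right after GPT-2's vocabulary.
absorbing_token_id = gpt2_vocab_size    # 50257
model_vocab_size = gpt2_vocab_size + 1  # 50258, matching the figure above

print(gpt2_vocab_size, absorbing_token_id, model_vocab_size)
```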

## Usage

### Quick Start

```python
from hdlm.hf_utils import smart_model_loader
from hdlm.epsilon_hybrid.sample import full_diff
from transformers import GPT2TokenizerFast
import torch

# Load the model with the smart loader (automatically detects the model type)
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.0",
    model_type="auto",  # automatically detects epsilon_hybrid
    device="cuda"
)

# Load the tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

# Encode the prompt
prompt = "The future of artificial intelligence"
prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)

# Full diffusion sampling
generated = full_diff(
    model=model,
    prompt=prompt_ids,
    batch_size=1,
    alg='acs',  # or 'original', 'remask', 'remdm'
    steps=512,
    temperature=1.0,
    context_length=1024,
    device=device
)

# Decode the generated text
generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
print(generated_text)
```

### Evaluation

```bash
# Text generation evaluation
python hdlm/eval_generation.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
    --sampling_method full_diff \
    --algorithm acs \
    --save_samples

# Perplexity evaluation
python hdlm/eval_modeling.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
    --work_dir "./logs/eval_modeling_epsilon" \
    --dataset ptb
```

## Training Details

- **Dataset**: OpenWebText
- **Batch Size**: 512
- **Learning Rate**: 3e-4 with cosine scheduling
- **Epsilon (ε)**: 0.01 (controls the hybrid noising blend)
- **Lambda (λ)**: 1.0 (weighting factor for unmasked tokens)
- **Loss Type**: Hybrid loss combining masking and random token corruption
- **Training Steps**: 1M iterations
- **Warmup**: 50K steps

## Sampling Algorithms

The model supports several sampling algorithms (a comparison sketch follows the list):

- **`original`**: Standard diffusion sampling
- **`acs`**: Adaptive Correction Sampler with error correction
- **`remask`**: Remasking strategy for improved quality
- **`remdm`**: ReMDM-style sampling with probability mixing
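
To compare the algorithms on the same prompt, the `full_diff` call from the Quick Start can simply be looped over the `alg` values (a minimal sketch; the prompt, step count, and temperature are arbitrary choices here):

```python
from hdlm.hf_utils import smart_model_loader
from hdlm.epsilon_hybrid.sample import full_diff
from transformers import GPT2TokenizerFast

# Load the model and tokenizer once, as in the Quick Start.
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.0",
    model_type="auto",
    device="cuda"
)
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
prompt_ids = tokenizer.encode("The future of artificial intelligence", return_tensors='pt').to(device)

# Generate one sample per algorithm so the outputs can be compared side by side.
for alg in ['original', 'acs', 'remask', 'remdm']:
    generated = full_diff(
        model=model,
        prompt=prompt_ids,
        batch_size=1,
        alg=alg,
        steps=512,
        temperature=1.0,
        context_length=1024,
        device=device
    )
    print(f"--- {alg} ---")
    print(tokenizer.decode(generated[0], skip_special_tokens=True))
```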

## Model Variants

Available epsilon values and their characteristics (a loading sketch follows the list):

- **ε = 0.01**: Minimal uniform noise, closest to a pure absorbing process
- **ε = 0.1**: Moderate hybrid behavior
- **ε = 0.5**: Balanced absorbing-uniform blend
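
Loading one of these variants should only require changing the `model_path` passed to `smart_model_loader`. The repository id below is an assumption that the other checkpoints follow the same naming pattern as this one; check the hdlm-group organization on the Hub for the exact names:

```python
from hdlm.hf_utils import smart_model_loader

# Hypothetical repository id, assuming the "hdlm-base-epsilon-<value>" pattern holds.
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.1",
    model_type="auto",  # the loader detects epsilon_hybrid automatically
    device="cuda"
)
```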

## Citation

```bibtex
@article{fathi2025unifying,
  title={Unifying autoregressive and diffusion-based sequence generation},
  author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
  journal={arXiv preprint arXiv:2504.06416},
  year={2025}
}
```

## License

This model is released under the same license as the original HDLM codebase. Please refer to the [GitHub repository](https://github.com/ServiceNow/hdlm) for license details.