nimafathi committed on
Commit 2a57330 · verified · 1 Parent(s): 98a076a

Update README.md

Files changed (1)
  1. README.md +118 -112
README.md CHANGED
@@ -2,138 +2,144 @@
  language:
  - en
  tags:
  - text-generation
  - diffusion
  - language-model
- license: mit
  ---

- # hdlm-group/hdlm-base-gamma-0.01
-
- This is a gamma_hybrid diffusion language model trained on text data.
-
- ## Model Details
-
- - **Model Type**: gamma_hybrid
- - **Architecture**: Diffusion-based language model
- - **Training Method**: Gamma-hybrid diffusion training
-
- ## Configuration
-
- ```yaml
- ngpus: 4
- gradient_accumulation_steps: 8
- model_type: gamma_hybrid
- tokenizer:
-   tokens: 50257
-   model: gpt2
- training:
-   batch_size: 512
-   accum: ${gradient_accumulation_steps}
-   n_iters: 1000000
-   snapshot_freq: 100
-   log_freq: 10
-   eval_freq: 100
-   snapshot_freq_for_preemption: 3000
-   weight: standard
-   snapshot_sampling: true
-   ema: 0.9999
-   warmup_iter: -1
- data:
-   train: openwebtext-train
-   valid: wikitext103
-   cache_dir: /home/toolkit/research-diffcodegen/data
-   debug: false
- graph:
-   type: QGamma
-   gamma: 0.01
-   file: /home/toolkit/research-diffcodegen/data
-   report_all: false
-   expanded_sigma: true
- noise:
-   type: loglinear
-   sigma_min: 0.0001
-   sigma_max: 2.0
- ar_diffusion: false
- expanded_sigma: ${graph.expanded_sigma}
- sampling:
-   predictor: analytic
-   steps_per_level: 1
-   noise_removal: true
-   strategy: direct
-   strategy_param: 0.9
- annealing:
-   type: block
-   efficient: false
-   width: 1024
-   tau: 2048
-   eval_tau: 512
-   steps_per_level: ${sampling.steps_per_level}
-   sampling_method: SAR
- diffusion_loss_weight: 1.0
- ce_loss_weight: 4.0
- sampling_eps: 0.0001
- attention:
-   context_type: block_causal
-   block_type: full
-   match_inference: true
- eval:
-   batch_size: 32
-   perplexity: true
-   perplexity_batch_size: 16
- optim:
-   weight_decay: 0.0
-   optimizer: AdamW
-   lr: 0.0003
-   beta1: 0.9
-   beta2: 0.999
-   eps: 1.0e-08
-   warmup: 10000
-   grad_clip: 1.0
-   scheduler: lambda
- experiment:
-   name: QGamma0.01-v2
-   wandb_project: debug-QGamma
- model:
-   name: gamma_hdlm
-   type: ddit
-   hidden_size: 768
-   cond_dim: 128
-   length: 1024
-   n_blocks: 12
-   n_heads: 12
-   scale_by_sigma: false
-   dropout: 0.1
-   transformer_sigma_conditioning: true
-   hybrid_sigma_embedding: true
-   post_process_logits: true
-   use_timestep_embedding: true
-
- ```

  ## Usage

  ```python
- from our.hf_utils import smart_model_loader
-
- # Load the model
- model, config, device, accelerator, metaschedule = smart_model_loader(
-     "hdlm-group/hdlm-base-gamma-0.01",
-     model_type="gamma_hybrid"
  )

- # Use the model for text generation
- # (Add specific usage examples based on your model's capabilities)
  ```

  ## Training Details

- This model was trained using the research-diffcodegen framework.

  ## Citation

- If you use this model in your research, please cite the original paper and this implementation.

  ## License

- This model is released under the MIT License.
 
  language:
  - en
  tags:
+ - dllm
+ - diffusion-language-model
  - text-generation
  - diffusion
  - language-model
+ license: apache-2.0
  ---

+ # HDLM-Gamma: Hybrid Diffusion Language Model

+ [![Paper](https://img.shields.io/badge/Paper-arXiv-red)](https://arxiv.org/abs/2504.06416)
+ [![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/ServiceNow/hdlm)
+
+ This is the model card for **hdlm-group/hdlm-base-gamma-0.01**.
+
+ ## Model Description
+
+ HDLM-Gamma is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through gamma-hybrid noising. This model interpolates transition operators between absorbing and uniform processes, making it conceptually closer to SEDD (Lou et al., 2024) while maintaining the benefits of both paradigms.
+
+ The gamma parameter (γ) controls the blend between the absorbing and uniform transition matrices: Q_gamma = (1-γ) * Q_absorb + γ * Q_uniform, where smaller values emphasize the absorbing process and larger values incorporate more uniform transitions.
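
To make the blend concrete, here is a minimal, self-contained sketch (not code from the hdlm repository) that forms Q_gamma from row-stochastic absorbing and uniform transition matrices over a toy vocabulary, with the absorbing (mask) token as the last index:

```python
import torch

def q_gamma(vocab_size: int, gamma: float) -> torch.Tensor:
    """Blend absorbing and uniform transition matrices: (1 - gamma) * Q_absorb + gamma * Q_uniform."""
    mask_id = vocab_size - 1                       # treat the last token as the absorbing state
    q_absorb = torch.zeros(vocab_size, vocab_size)
    q_absorb[:, mask_id] = 1.0                     # every token transitions to the mask token
    q_uniform = torch.full((vocab_size, vocab_size), 1.0 / vocab_size)
    return (1.0 - gamma) * q_absorb + gamma * q_uniform

Q = q_gamma(vocab_size=8, gamma=0.01)              # toy vocabulary; gamma as in this checkpoint
print(Q.sum(dim=-1))                               # rows remain stochastic (all ones)
```

With γ = 0.01 the uniform term contributes only 1% of the transition mass, which is why this checkpoint behaves almost like a pure absorbing (masking) process.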
+
+ ## Model Architecture
+
+ - **Base Model**: Transformer architecture with staggered score conditioning
+ - **Vocabulary Size**: 50,258 tokens (GPT-2 vocabulary + absorbing token)
+ - **Context Length**: Variable (supports up to 2048 tokens)
+ - **Training**: Continuous-time diffusion with gamma-hybrid graph structure
+ - **Inference**: Analytic predictor with staggered score computation
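
As a quick sanity check on the vocabulary figure, the count follows from the GPT-2 tokenizer plus one extra absorbing token (a small sketch assuming the standard `transformers` GPT-2 tokenizer):

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
base_vocab = len(tokenizer)     # 50257 tokens in the GPT-2 vocabulary
model_vocab = base_vocab + 1    # plus one absorbing/mask token -> 50258
print(base_vocab, model_vocab)
```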

  ## Usage

+ ### Quick Start
+
  ```python
+ from hdlm.hf_utils import smart_model_loader
+ from hdlm.gamma_hybrid.sampling import get_sa_sampling_fn
+ from transformers import GPT2TokenizerFast
+ import torch
+
+ # Load model using smart loader (automatically detects model type)
+ model, cfg, device, accelerator, metaschedule = smart_model_loader(
+     model_path="hdlm-group/hdlm-base-gamma-0.01",
+     model_type="auto",  # automatically detects gamma_hybrid
+     device="cuda"
+ )

+ # Load tokenizer
+ tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
+
+ # Generate text
+ prompt = "The future of artificial intelligence"
+ prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
+
+ # Configure sampling function (automatically set up from config)
+ sampling_fn = get_sa_sampling_fn(
+     config=cfg,
+     graph=None,   # Will be created from config
+     noise=None,   # Will be created from config
+     meta_schedule=metaschedule,
+     batch_dims=(1,),
+     eps=1e-4,
+     device=device
  )

+ # Generate samples
+ generated = sampling_fn(
+     model=model,
+     prompt=prompt_ids,
+     context_length=1024
+ )
+
+ # Decode generated text
+ generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
+ ### Evaluation
+
+ ```bash
+ # Text generation evaluation
+ python hdlm/eval_generation.py \
+     --checkpoint_path hdlm-group/hdlm-base-gamma-0.01 \
+     --sampling_method SAR \
+     --save_samples
+
+ # Perplexity evaluation
+ python hdlm/eval_modeling.py \
+     --checkpoint_path hdlm-group/hdlm-base-gamma-0.01 \
+     --work_dir "./logs/eval_modeling_gamma" \
+     --dataset ptb
  ```

  ## Training Details

+ - **Dataset**: OpenWebText
+ - **Batch Size**: 256
+ - **Learning Rate**: 3e-4 with lambda scheduling
+ - **Gamma (γ)**: 0.01 (controls hybrid transition blend)
+ - **Graph Type**: QGamma with expanded sigma conditioning
+ - **Noise Schedule**: Log-linear (σ_min=1e-4, σ_max=10.0)
+ - **Training Steps**: 1M iterations
+ - **Warmup**: 50K steps
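
A log-linear schedule is typically a geometric interpolation between σ_min and σ_max in log-space; the following is a minimal sketch under that assumption (the exact parameterization used in the hdlm code may differ):

```python
import torch

def loglinear_sigma(t: torch.Tensor, sigma_min: float = 1e-4, sigma_max: float = 10.0) -> torch.Tensor:
    """Noise level at time t in [0, 1], linear in log-space between sigma_min and sigma_max."""
    return sigma_min ** (1.0 - t) * sigma_max ** t

t = torch.linspace(0.0, 1.0, 5)
print(loglinear_sigma(t))   # ~[1e-4, 1.8e-3, 3.2e-2, 5.6e-1, 1.0e+1]
```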
+
+ ## Key Components
+
+ ### Graph Structure
+ The QGamma graph combines absorbing and uniform transition matrices:
+ - **Absorbing component**: Transitions to absorbing state (mask token)
+ - **Uniform component**: Uniform transitions between all tokens
+ - **Hybrid blend**: Controlled by gamma parameter
+
+ ### Staggered Score
+ The model uses staggered score computation that applies different transformations to absorbing and uniform branches before combining them, enabling more flexible generation patterns.
+
+ ### Sampling Strategy
+ - **Predictor**: Analytic predictor with exact transition computation
+ - **Strategy**: Direct sampling with configurable strategy parameter
+ - **Noise Removal**: Optional final denoising step
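
Schematically, these pieces combine into a reverse-diffusion loop like the one below. The helper names (`analytic_reverse_step`, `denoise_final`) are hypothetical placeholders rather than the hdlm API; use `get_sa_sampling_fn` from the Quick Start for actual generation:

```python
def sample_schematic(model, x, sigmas, analytic_reverse_step, denoise_final, noise_removal=True):
    """Illustrative reverse loop: step through noise levels from high to low with an
    analytic predictor update at each level, then optionally apply a final denoising step."""
    for sigma_now, sigma_next in zip(sigmas[:-1], sigmas[1:]):       # high -> low noise levels
        score = model(x, sigma_now)                                  # model output at this level
        x = analytic_reverse_step(x, score, sigma_now, sigma_next)   # exact transition update
    if noise_removal:                                                # optional final denoising
        x = denoise_final(x, model(x, sigmas[-1]), sigmas[-1])
    return x
```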
+
+ ## Model Variants
+
+ Available gamma values and their characteristics:
+
+ - **γ = 0.01**: Minimal uniform transitions, closest to pure absorbing process
+ - **γ = 0.1**: Moderate hybrid behavior with increased uniform mixing
+ - **γ = 0.5**: Balanced absorbing-uniform transition blend

  ## Citation

+ ```bibtex
+ @article{fathi2025unifying,
+   title={Unifying autoregressive and diffusion-based sequence generation},
+   author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
+   journal={arXiv preprint arXiv:2504.06416},
+   year={2025}
+ }
+ ```

  ## License

+ This model is released under the same license as the original HDLM codebase. Please refer to the [GitHub repository](https://github.com/ServiceNow/hdlm) for license details.