nimafathi committed
Commit c9dcaeb · verified · 1 Parent(s): 66ce891

Upload HDLM model with complete HF integration

Files changed (2)
  1. README.md +5 -136
  2. model.safetensors +3 -0
README.md CHANGED
@@ -1,140 +1,9 @@
  ---
- language:
- - en
  tags:
- - text-generation
- - diffusion
- - language-model
- license: mit
  ---

- # hdlm-group/hdlm-base-gamma-0.01
-
- This is a gamma_hybrid diffusion language model trained on text data.
-
- ## Model Details
-
- - **Model Type**: gamma_hybrid
- - **Architecture**: Diffusion-based language model
- - **Training Method**: Gamma-hybrid diffusion training
-
- ## Configuration
-
- ```yaml
- ngpus: 4
- gradient_accumulation_steps: 8
- pretrain_autoregressive_path: /home/toolkit/research-diffcodegen/exp_local/openwebtext/mdlm-autoregressive/org-DiTAR-absorb-v2/checkpoints-meta/checkpoint.pth
- tokenizer:
-   tokens: 50257
-   model: gpt2
- training:
-   batch_size: 512
-   accum: ${gradient_accumulation_steps}
-   n_iters: 1000000
-   snapshot_freq: 100
-   log_freq: 10
-   eval_freq: 100
-   snapshot_freq_for_preemption: 3000
-   weight: standard
-   snapshot_sampling: true
-   ema: 0.9999
-   warmup_iter: -1
- data:
-   train: openwebtext-train
-   valid: wikitext103
-   cache_dir: /home/toolkit/research-diffcodegen/data
-   debug: false
- graph:
-   type: QGamma
-   gamma: 0.01
-   file: /home/toolkit/research-diffcodegen/data
-   report_all: false
-   expanded_sigma: true
- noise:
-   type: loglinear
-   sigma_min: 0.0001
-   sigma_max: 2.0
-   ar_diffusion: false
-   expanded_sigma: ${graph.expanded_sigma}
- sampling:
-   predictor: analytic
-   steps_per_level: 1
-   noise_removal: true
-   strategy: direct
-   strategy_param: 0.9
- annealing:
-   type: block
-   efficient: false
-   width: 1024
-   tau: 2048
-   eval_tau: 512
-   steps_per_level: ${sampling.steps_per_level}
-   sampling_method: SAR
-   diffusion_loss_weight: 1.0
-   ce_loss_weight: 4.0
-   sampling_eps: 0.0001
- attention:
-   context_type: block_causal
-   block_type: full
-   match_inference: true
- eval:
-   batch_size: 32
-   perplexity: true
-   perplexity_batch_size: 16
- optim:
-   weight_decay: 0.0
-   optimizer: AdamW
-   lr: 0.0003
-   beta1: 0.9
-   beta2: 0.999
-   eps: 1.0e-08
-   warmup: 10000
-   grad_clip: 1.0
-   scheduler: lambda
- experiment:
-   name: QGamma0.01-v2
-   wandb_project: debug-QGamma
- model:
-   name: gamma_hdlm
-   type: ddit
-   hidden_size: 768
-   cond_dim: 128
-   length: 1024
-   n_blocks: 12
-   n_heads: 12
-   scale_by_sigma: false
-   dropout: 0.1
-   transformer_sigma_conditioning: true
-   hybrid_sigma_embedding: true
-   post_process_logits: true
-   use_timestep_embedding: true
-   model_type: gamma_hybrid
- ```
-
- ## Usage
-
- ```python
- from our.hf_utils import smart_model_loader
-
- # Load the model
- model, config, device, accelerator, metaschedule = smart_model_loader(
-     "hdlm-group/hdlm-base-gamma-0.01",
-     model_type="gamma_hybrid"
- )
-
- # Use the model for text generation
- # (Add specific usage examples based on your model's capabilities)
- ```
-
- ## Training Details
-
- This model was trained using the research-diffcodegen framework.
-
- ## Citation
-
- If you use this model in your research, please cite the original paper and this implementation.
-
- ## License
-
- This model is released under the MIT License.
 
  ---
  tags:
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
  ---

+ This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+ - Library: [More Information Needed]
+ - Docs: [More Information Needed]
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0450c9317152e1fd6273392242a10ae4c6009c0f339d2475d953653a101c4893
+ size 677728720
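The three lines above are not the weights themselves but a git-LFS pointer stub: the real ~678 MB safetensors blob lives in LFS storage, addressed by the sha256 oid. The fixed key-value format can be parsed with a few lines of standard-library Python:

```python
# Parse a git-LFS pointer file ("version", "oid", "size" lines,
# each "key value" separated by a single space).
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0450c9317152e1fd6273392242a10ae4c6009c0f339d2475d953653a101c4893
size 677728720
"""

info = parse_lfs_pointer(pointer)
# info["size"] is the byte count of the real file: "677728720" (~678 MB)
```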