nimafathi committed · verified
Commit 16ee68c · Parent(s): dacddf4

Upload README.md with huggingface_hub

Files changed (1): README.md (+136 -5)
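The commit message refers to the `huggingface_hub` client. A hypothetical sketch of such an upload step (it assumes you are authenticated via `huggingface-cli login` and have write access to the repo):

```python
# Hypothetical sketch of the upload named in the commit message above;
# assumes authentication via `huggingface-cli login` and write access.
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="README.md",   # local file to push
    path_in_repo="README.md",      # destination path inside the repo
    repo_id="hdlm-group/hdlm-base-gamma-0.05",
    commit_message="Upload README.md with huggingface_hub",
)
```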
README.md CHANGED
@@ -1,9 +1,140 @@
  ---
  tags:
- - model_hub_mixin
- - pytorch_model_hub_mixin
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: [More Information Needed]
- - Docs: [More Information Needed]

---
language:
- en
tags:
- text-generation
- diffusion
- language-model
license: mit
---

# hdlm-group/hdlm-base-gamma-0.05

This is a gamma_hybrid diffusion language model trained on text data.

## Model Details

- **Model Type**: gamma_hybrid
- **Architecture**: Diffusion-based language model
- **Training Method**: Gamma-hybrid diffusion training

## Configuration

```yaml
ngpus: 4
gradient_accumulation_steps: 8
pretrain_autoregressive_path: /home/toolkit/research-diffcodegen/exp_local/openwebtext/mdlm-autoregressive/org-DiTAR-absorb-v2/checkpoints-meta/checkpoint.pth
tokenizer:
  tokens: 50257
  model: gpt2
training:
  batch_size: 512
  accum: ${gradient_accumulation_steps}
  n_iters: 1000000
  snapshot_freq: 500
  log_freq: 100
  eval_freq: 500
  snapshot_freq_for_preemption: 3000
  weight: standard
  snapshot_sampling: true
  ema: 0.9999
  warmup_iter: -1
data:
  train: openwebtext-train
  valid: wikitext103
  cache_dir: /home/toolkit/research-diffcodegen/data
  debug: false
graph:
  type: QGamma
  gamma: 0.05
  file: /home/toolkit/research-diffcodegen/data
  report_all: false
  expanded_sigma: true
noise:
  type: loglinear
  sigma_min: 0.0001
  sigma_max: 2.0
  ar_diffusion: false
  expanded_sigma: ${graph.expanded_sigma}
sampling:
  predictor: analytic
  steps_per_level: 1
  noise_removal: true
  strategy: direct
  strategy_param: 0.9
annealing:
  type: block
  efficient: false
  width: 1024
  tau: 2048
  eval_tau: 256
  steps_per_level: ${sampling.steps_per_level}
  sampling_method: SAR
  diffusion_loss_weight: 1.0
  ce_loss_weight: 4.0
  sampling_eps: 0.0001
attention:
  context_type: block_causal
  block_type: full
  match_inference: true
eval:
  batch_size: 32
  perplexity: true
  perplexity_batch_size: 16
optim:
  weight_decay: 0.0
  optimizer: AdamW
  lr: 0.0003
  beta1: 0.9
  beta2: 0.999
  eps: 1.0e-08
  warmup: 10000
  grad_clip: 1.0
  scheduler: lambda
experiment:
  name: QGamma0.05-v2
  wandb_project: debug-QGamma
model:
  name: gamma_hdlm
  type: ddit
  hidden_size: 768
  cond_dim: 128
  length: 1024
  n_blocks: 12
  n_heads: 12
  scale_by_sigma: false
  dropout: 0.1
  transformer_sigma_conditioning: true
  hybrid_sigma_embedding: true
  post_process_logits: true
  use_timestep_embedding: true
model_type: gamma_hybrid
```
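The `${...}` values above are OmegaConf-style interpolations that resolve against other keys in the same config. As a quick sanity check, a minimal sketch (an illustration, not part of the released code) that assumes the block above is saved locally as `config.yaml` and that the `omegaconf` package is installed:

```python
# Minimal sketch: load the config above and resolve its ${...} interpolations.
# Assumes the YAML block is saved as config.yaml and that omegaconf is
# installed (pip install omegaconf).
from omegaconf import OmegaConf

cfg = OmegaConf.load("config.yaml")
OmegaConf.resolve(cfg)  # materialize ${gradient_accumulation_steps} etc.

print(cfg.training.accum)        # 8, resolved from ${gradient_accumulation_steps}
print(cfg.noise.expanded_sigma)  # True, resolved from ${graph.expanded_sigma}
print(cfg.model.hidden_size)     # 768
```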

## Usage

```python
from our.hf_utils import smart_model_loader

# Load the model, its config, and sampling state from the Hub
model, config, device, accelerator, metaschedule = smart_model_loader(
    "hdlm-group/hdlm-base-gamma-0.05",
    model_type="gamma_hybrid"
)

# The returned objects can then be used for text generation with the
# research-diffcodegen framework.
```
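If the research-diffcodegen utilities are not installed, the repository's files can still be fetched directly with `huggingface_hub`. A minimal fallback sketch; the exact filenames in the repo are an assumption here, so list them first:

```python
# Fallback sketch using only huggingface_hub (pip install huggingface_hub).
# The filename below is an assumption - list the repo's files to confirm.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "hdlm-group/hdlm-base-gamma-0.05"

# Inspect what the repository actually contains.
print(list_repo_files(repo_id))

# Download one file (cached locally under ~/.cache/huggingface by default).
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
print(config_path)
```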

## Training Details

This model was trained using the research-diffcodegen framework.

## Citation

If you use this model in your research, please cite the original paper and this implementation.

## License

This model is released under the MIT License.