ridger committed
Commit 37bc7e8 · verified · 1 Parent(s): 7c389c4

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +31 -0
  2. config.json +1 -0
  3. configuration_ouro.py +4 -0
README.md CHANGED
@@ -27,6 +27,37 @@ tags:
  - **Iterative Latent Reasoning**: Performs reasoning through recurrent computation in latent space
  - **Adaptive Computation**: Supports early exit mechanisms for dynamic compute allocation
 
+ ## Configuration
+
+ ### Recurrent Steps and Adaptive Exit
+
+ The model's computational behavior can be configured through the `config.json` file:
+
+ ```json
+ {
+     "total_ut_steps": 4,
+     "early_exit_threshold": 1.0
+ }
+ ```
+
+ - **`total_ut_steps`**: Controls the number of recurrent steps (default: 4). You can adjust this value to trade off between task performance and computation time.
+ - **`early_exit_threshold`**: Controls the adaptive exit mechanism (default: 1.0). Lower values encourage earlier exit; 1.0 means all recurrent steps are always used.
+
+ **Example: Modify recurrent steps**
+ ```python
+ from transformers import AutoConfig, AutoModelForCausalLM
+
+ config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B")
+ config.total_ut_steps = 3  # Use 3 recurrent steps instead of 4
+ model = AutoModelForCausalLM.from_pretrained(
+     "ByteDance/Ouro-1.4B",
+     config=config,
+     device_map="auto"
+ )
+ ```
+
+ > **Note**: vLLM does not currently support the adaptive exit feature due to its inference optimizations. When using vLLM, the model always executes the full `total_ut_steps`.
+
  ## Model Architecture
 
  Ouro-1.4B is based on the decoder-only Transformer architecture with parameter sharing across recurrent steps:
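The README example above only adjusts `total_ut_steps`. For symmetry, here is a minimal sketch that sets the new adaptive-exit knob the same way, assuming (consistent with the `configuration_ouro.py` change below) that `early_exit_threshold` is an ordinary config attribute; the value 0.8 is purely hypothetical:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ByteDance/Ouro-1.4B")
# Hypothetical value: per the README, anything below 1.0 permits exiting
# before all total_ut_steps recurrent steps run; 1.0 (the default) disables early exit.
config.early_exit_threshold = 0.8
model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-1.4B",
    config=config,
    device_map="auto",  # add trust_remote_code=True if the repo's custom code requires it
)
```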
config.json CHANGED
@@ -54,6 +54,7 @@
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "total_ut_steps": 4,
+ "early_exit_threshold": 1.0,
  "transformers_version": "4.55.0",
  "use_cache": true,
  "use_sliding_window": false,
configuration_ouro.py CHANGED
@@ -169,6 +169,8 @@ class OuroConfig(PretrainedConfig):
          max_window_layers=28,
          layer_types=None,
          attention_dropout=0.0,
+         total_ut_steps=4,
+         early_exit_threshold=1.0,
          **kwargs,
      ):
          self.vocab_size = vocab_size
@@ -193,6 +195,8 @@ class OuroConfig(PretrainedConfig):
          self.rope_theta = rope_theta
          self.rope_scaling = rope_scaling
          self.attention_dropout = attention_dropout
+         self.total_ut_steps = total_ut_steps
+         self.early_exit_threshold = early_exit_threshold
          # Validate the correctness of rotary position embeddings parameters
          # BC: if there is a 'type' field, move it to 'rope_type'.
          if self.rope_scaling is not None and "type" in self.rope_scaling:
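Neither file in this commit shows how the modeling code consumes the two new attributes. Purely as an illustration of the adaptive-exit idea the README describes, a hypothetical loop is sketched below; the confidence proxy and exit criterion are assumptions, not Ouro's actual implementation:

```python
import torch

def recurrent_forward(block, hidden_states, config):
    """Hypothetical sketch: run up to config.total_ut_steps recurrent steps,
    exiting early once a confidence estimate crosses early_exit_threshold."""
    for step in range(config.total_ut_steps):
        hidden_states = block(hidden_states)
        # Placeholder confidence proxy (mathematically < 1.0), so the default
        # early_exit_threshold of 1.0 never triggers and all steps always run.
        confidence = torch.sigmoid(hidden_states.mean())
        if confidence >= config.early_exit_threshold:
            break
    return hidden_states
```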