🧩 Model Architecture Summary
The model is configured via HRMACTModelConfig, which is designed for hierarchical recurrent memory with adaptive computation. The key architectural parameters are listed below, followed by a minimal configuration sketch:
- Sequence length: 81
- Vocabulary size: 10
- High-level cycles: 2
- Low-level cycles: 2
Transformer Configuration:
- Layers: 4
- Hidden size: 256
- Attention heads: 4
- Feed-forward expansion ratio: 4
Adaptive Computation Time (ACT):
- Maximum halting steps: 16
- Exploration probability: 0.1
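To see these values in one place, here is a minimal, self-contained sketch of a configuration object that mirrors the numbers above. The field names (seq_len, act_max_steps, etc.) are illustrative assumptions; consult the repository's actual HRMACTModelConfig definition for the real parameter names.

```python
from dataclasses import dataclass

@dataclass
class HRMACTModelConfig:
    """Hypothetical mirror of the configuration summarized above;
    the real class may use different field names."""
    # Sequence / vocabulary
    seq_len: int = 81
    vocab_size: int = 10
    # Hierarchical recurrence
    high_level_cycles: int = 2
    low_level_cycles: int = 2
    # Transformer backbone
    num_layers: int = 4
    hidden_size: int = 256
    num_heads: int = 4
    ffn_expansion: int = 4
    # Adaptive Computation Time (ACT)
    act_max_steps: int = 16
    act_exploration_prob: float = 0.1

config = HRMACTModelConfig()
print(config)
```

Note that with a hidden size of 256 and 4 attention heads, each head operates on a 64-dimensional subspace, and the feed-forward expansion ratio of 4 implies an inner feed-forward width of 1024.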
🏋️ Training details
Total training time: 3 days, 12 hours, 57 minutes (84.95 hours)
GPU: 1× NVIDIA Tesla V100