🧩 Model Architecture Summary
The model is configured via HRMACTModelConfig, which is designed for hierarchical recurrent memory with adaptive computation. The key architectural parameters are listed below, followed by a minimal configuration sketch:
- Sequence length: 81
- Vocabulary size: 10
- High-level cycles: 2
- Low-level cycles: 2
Transformer Configuration:
- Layers: 4
- Hidden size: 256
- Attention heads: 4
- Feed-forward expansion ratio: 4
Adaptive Computation Time (ACT):
- Maximum halting steps: 16
- Exploration probability: 0.1
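To see these values in one place, here is a minimal, self-contained sketch of a configuration object that mirrors the numbers above. The field names (seq_len, act_max_steps, etc.) are illustrative assumptions; consult the repository's actual HRMACTModelConfig definition for the real parameter names.

```python
from dataclasses import dataclass

@dataclass
class HRMACTModelConfig:
    """Hypothetical mirror of the configuration summarized above;
    the real class may use different field names."""
    # Sequence / vocabulary
    seq_len: int = 81
    vocab_size: int = 10
    # Hierarchical recurrence
    high_level_cycles: int = 2
    low_level_cycles: int = 2
    # Transformer backbone
    num_layers: int = 4
    hidden_size: int = 256
    num_heads: int = 4
    ffn_expansion: int = 4
    # Adaptive Computation Time (ACT)
    act_max_steps: int = 16
    act_exploration_prob: float = 0.1

config = HRMACTModelConfig()
print(config)
```

Note that with a hidden size of 256 and 4 attention heads, each head operates on a 64-dimensional subspace, and the feed-forward expansion ratio of 4 implies an inner feed-forward width of 1024.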
🏋️ Training details
Total training time: 3 days, 12 hours, 57 minutes (84.95 hours)
GPU: 1× NVIDIA Tesla V100