🧩 Model Architecture Summary

The model is configured using `HRMACTModelConfig`, designed for hierarchical recurrent memory with adaptive computation. Key architectural parameters (a configuration sketch follows the lists below):

  • Sequence length: 81
  • Vocabulary size: 10
  • High-level cycles: 2
  • Low-level cycles: 2

Transformer Configuration:

  • Layers: 4
  • Hidden size: 256
  • Attention heads: 4
  • Feed-forward expansion ratio: 4
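
Taken together, the parameters above suggest a configuration object along the following lines. This is a minimal illustrative sketch: the field names (`seq_len`, `hidden_size`, and so on) are assumptions and may not match the actual `HRMACTModelConfig` definition.

```python
from dataclasses import dataclass

@dataclass
class HRMACTModelConfig:
    # Illustrative sketch only; field names are assumed, not taken from the real config class.
    seq_len: int = 81            # sequence length
    vocab_size: int = 10         # vocabulary size
    high_level_cycles: int = 2   # high-level recurrent cycles
    low_level_cycles: int = 2    # low-level recurrent cycles
    num_layers: int = 4          # transformer layers
    hidden_size: int = 256       # hidden size
    num_heads: int = 4           # attention heads
    ffn_expansion: int = 4       # feed-forward expansion ratio (FFN width = 4 * 256 = 1024)

config = HRMACTModelConfig()
assert config.hidden_size % config.num_heads == 0  # 256 / 4 = 64-dim attention heads
```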

Adaptive Computation Time (ACT):

  • Maximum halting steps: 16
  • Exploration probability: 0.1
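
These two values govern an outer halting loop: the recurrent computation runs for at most 16 steps, and the exploration probability randomizes the halting decision during training. The sketch below shows one common way such a loop can be wired up; the mechanism (including the random override and the `step_fn` callback) is an assumption for illustration, not this model's actual implementation.

```python
import random
from typing import Any, Callable, Tuple

def run_with_act(
    step_fn: Callable[[Any], Tuple[Any, bool]],  # hypothetical: returns (new_state, wants_to_halt)
    state: Any,
    max_steps: int = 16,
    exploration_prob: float = 0.1,
    training: bool = True,
) -> Tuple[Any, int]:
    """Illustrative ACT-style outer loop (assumed mechanism, not the model's actual code)."""
    for step in range(1, max_steps + 1):
        state, wants_to_halt = step_fn(state)
        # Assumed use of the exploration probability: during training, occasionally
        # override the halting decision so both shorter and longer computations are explored.
        if training and random.random() < exploration_prob:
            wants_to_halt = not wants_to_halt
        if wants_to_halt or step == max_steps:
            return state, step
    return state, max_steps  # fallback, only reached if max_steps < 1
```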

πŸ‹οΈ Training details

Total training time: 3 days, 12 hours, 57 minutes (84.95 hours)

GPU: 1× NVIDIA Tesla V100
