HRM Maze 30x30 Hard
A Hierarchical Reasoning Model (HRM) trained to solve hard 30×30 maze navigation problems using hierarchical processing and adaptive computation.
Model Details
Model Description
This is a Hierarchical Reasoning Model checkpoint fine-tuned specifically for solving hard maze pathfinding problems on 30×30 grids. The model employs a two-level hierarchical architecture inspired by human cognition, with high-level (H) modules for abstract route planning and low-level (L) modules for detailed navigation decisions. It uses Adaptive Computation Time (ACT) with Q-learning based halting to dynamically allocate computational resources.
The model processes maze grids up to 30×30 (900 tokens) and predicts optimal navigation paths through complex maze environments.
- Developed by: Sapient Inc.
- Model type: Hierarchical Reasoning Model (HRM)
- Language(s): Symbolic reasoning (maze navigation symbols)
- License: Apache 2.0
- Original checkpoint: sapientinc/HRM-checkpoint-maze-30x30-hard
Model Sources
- Repository: [transformers](https://github.com/huggingface/transformers)
- Paper: [Hierarchical Reasoning Model](https://arxiv.org/abs/2506.21734)
- Original Repository: [HRM GitHub](https://github.com/sapientinc/HRM)
Uses
Direct Use
This model is designed for solving hard maze navigation problems. It can:
- Find optimal paths through complex 30×30 maze environments
- Navigate mazes with multiple obstacles and dead ends
- Process partial maze representations and predict navigation sequences
- Demonstrate hierarchical planning strategies for spatial reasoning tasks
Downstream Use
The model can be used as:
- A component in game AI and procedural content generation
- A baseline for research in hierarchical spatial reasoning
- An example of applying neural networks to pathfinding and navigation problems
- A planning module in robotics and autonomous navigation research
Recommendations
Users should be aware that:
- The model is specialized for maze pathfinding and should not be used for general spatial reasoning tasks
- Input must be properly formatted as a flat grid of token IDs using the 6-token vocabulary (see the encoding sketch after this list)
- Inference time may vary due to the adaptive computation mechanism
- The model is optimized for hard difficulty mazes and may be over-engineered for simple mazes
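As a hedged illustration of the expected input format, the sketch below encodes a small ASCII maze into flat token IDs. The symbol-to-ID mapping is taken from the example comments in the quickstart further down (itself marked "e.g."), so verify it against the original HRM repository before relying on it.

```python
import torch

# Assumed symbol-to-token mapping, mirroring the quickstart comments below;
# verify against the original HRM repository before relying on it.
CHAR_TO_TOKEN = {".": 0, "#": 1, "S": 2, "G": 3}

def encode_maze(rows: list[str]) -> torch.Tensor:
    """Flatten an ASCII maze (row-major) into a (1, H*W) tensor of token IDs."""
    tokens = [CHAR_TO_TOKEN[c] for row in rows for c in row]
    return torch.tensor(tokens, dtype=torch.long).unsqueeze(0)

maze = [
    "S.#.",
    ".##.",
    "....",
    "#.#G",
]
input_ids = encode_maze(maze)  # shape (1, 16) for this 4x4 example
```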
How to Get Started with the Model
```python
import torch
from transformers import HrmForCausalLM

# Load the model
model = HrmForCausalLM.from_pretrained("zbloss/HRM-maze-30x30-hard")
model.eval()

# Prepare a maze grid (e.g., 20x20 = 400 tokens)
# Vocabulary: 0-5 representing different maze elements
# (e.g., 0=empty, 1=wall, 2=start, 3=goal, 4=path, 5=visited)
maze_grid = torch.randint(0, 6, (1, 400))  # Example 20x20 maze
puzzle_ids = torch.zeros(1, dtype=torch.long)

# Run inference
with torch.no_grad():
    outputs = model(input_ids=maze_grid, puzzle_identifiers=puzzle_ids)

# Get predictions
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted navigation path: {predictions}")
print(f"Q-halt: {outputs.q_halt_logits[0]:.4f}")
print(f"Q-continue: {outputs.q_continue_logits[0]:.4f}")
```
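To read a path off the flat prediction tensor, reshape it back to grid shape. A minimal follow-on sketch, assuming token ID 4 marks path cells as the vocabulary comment above suggests:

```python
# Reshape the flat (1, 400) predictions back to the 20x20 grid used above.
grid = predictions[0].reshape(20, 20)

# Assumed: token 4 marks cells on the predicted path (see vocabulary comment).
path_cells = (grid == 4).nonzero(as_tuple=False)  # (N, 2) rows of (row, col)
print(f"Cells on the predicted path: {path_cells.tolist()}")
```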
Training Details
Training Data
The model was trained on a dataset of hard difficulty 30×30 maze environments. These mazes feature:
- Complex layouts with multiple branching paths
- Dead ends requiring backtracking
- Long optimal paths requiring multi-step planning
- Variable start and goal positions
Training Procedure
The model uses a hierarchical architecture with the components below (a minimal sketch of the nested cycle loop follows the list):
- High-level (H) module: 4 transformer layers for abstract route planning
- Low-level (L) module: 4 transformer layers for detailed navigation decisions
- H-cycles: 2 high-level reasoning cycles for strategic planning
- L-cycles: 2 low-level computation cycles per H-cycle for tactical moves
- ACT mechanism: Q-learning based adaptive halting with max 16 steps
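To make the cycle structure concrete, here is a hedged sketch of how the nested H/L recurrence could be wired. The names `h_block` and `l_block` stand in for the two 4-layer transformer stacks and do not reflect the exact module names or update rules of the released implementation.

```python
import torch
import torch.nn as nn

def hrm_cycles(h_state, l_state, x, h_block, l_block,
               h_cycles: int = 2, l_cycles: int = 2):
    """Hypothetical nested recurrence: each slow H-cycle drives several
    fast L-cycles, then the refined L-state updates the H-state."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            # L-level: fast, detailed update conditioned on H-state and input.
            l_state = l_block(l_state + h_state + x)
        # H-level: slow, abstract update from the refined L-state.
        h_state = h_block(h_state + l_state)
    return h_state, l_state

# Placeholder blocks standing in for the two 4-layer transformer stacks.
h_block, l_block = nn.Identity(), nn.Identity()
x = torch.randn(1, 900, 512)  # embedded 30x30 maze tokens
h, l = hrm_cycles(torch.zeros_like(x), torch.zeros_like(x), x, h_block, l_block)
```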
Training Hyperparameters
- Training regime: bfloat16 mixed precision
- Architecture: 4 H-layers, 4 L-layers, 8 attention heads
- Hidden size: 512
- Intermediate size: 1536
- Max position embeddings: 900 (supports up to 30×30 grids)
- Vocabulary size: 6 (maze navigation symbols)
Model Architecture
Technical Specifications
| Component | Value |
|---|---|
| Total Parameters | 27,270,658 (27.3M) |
| Model Size | 109.09 MB |
| Vocabulary Size | 6 |
| Hidden Size | 512 |
| Intermediate Size | 1536 |
| H-level Layers | 4 |
| L-level Layers | 4 |
| Attention Heads | 8 |
| H-cycles | 2 |
| L-cycles | 2 |
| Max Halting Steps | 16 |
| Max Grid Size | 30×30 (900 tokens) |
| Position Encoding | RoPE (Rotary Position Embeddings) |
| Activation | SwiGLU |
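The parameter count in the table can be roughly cross-checked from the hyperparameters alone. The arithmetic below assumes a standard multi-head attention block (four 512×512 projections) and a SwiGLU MLP (three 512×1536 matrices); the small residual is presumably embeddings, the Q-head, and other odds and ends, which is an assumption.

```python
hidden, inter, layers = 512, 1536, 8  # 4 H-layers + 4 L-layers

attn_per_layer = 4 * hidden * hidden   # Q, K, V, O projections
mlp_per_layer = 3 * hidden * inter     # SwiGLU: gate, up, down matrices
block_params = layers * (attn_per_layer + mlp_per_layer)

print(block_params)               # 27,262,976
print(27_270_658 - block_params)  # 7,682 left for embeddings, Q-head, etc.
```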
Model Architecture and Objective
The Hierarchical Reasoning Model (HRM) features:
Two-level Hierarchical Processing:
- H-level (High-level): Performs slow, abstract route planning and strategic navigation
- L-level (Low-level): Executes fast, detailed navigation decisions and obstacle avoidance
Adaptive Computation Time (ACT):
- Q-learning based halting mechanism
- Dynamically determines when sufficient computation has been performed
- Allows variable computational depth based on maze complexity
- More complex mazes with longer paths trigger more reasoning cycles (see the halting sketch after this list)
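As a purely illustrative sketch of the halting rule, reusing the `model`, `maze_grid`, and `puzzle_ids` names from the quickstart above (the converted checkpoint may run this loop internally, and the recurrent carry state threaded between steps is omitted here):

```python
# Hypothetical outer ACT loop; the real implementation threads a recurrent
# carry state between steps, which this sketch omits.
for step in range(16):  # max halting steps from the config
    outputs = model(input_ids=maze_grid, puzzle_identifiers=puzzle_ids)
    # Halt once the learned Q-value for halting exceeds that for continuing.
    if outputs.q_halt_logits[0] > outputs.q_continue_logits[0]:
        break
```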
Recurrent Carry State:
- Maintains H and L hidden states across reasoning cycles
- Enables iterative refinement of navigation strategies
- Supports backtracking and path correction
Positional Encoding:
- RoPE (Rotary Position Embeddings) for position-aware attention
- Critical for spatial reasoning in grid-based environments
- Supports up to 900 positions (30×30 grids); a generic RoPE sketch follows below
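For reference, here is a compact sketch of the standard interleaved RoPE formulation applied to a query or key tensor. This is generic textbook RoPE, not code lifted from the checkpoint; implementations differ in details such as interleaved versus half-split channel pairing.

```python
import torch

def apply_rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0):
    """Rotate channel pairs of x (..., seq, dim) by position-dependent angles."""
    dim = x.shape[-1]
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = positions[:, None].float() * freqs[None, :]  # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 900, 64)  # e.g., one attention head over 900 positions
q_rot = apply_rope(q, torch.arange(900))
```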
Compute Infrastructure
Software
- Framework: PyTorch with transformers library
- Precision: bfloat16
- Format: Safetensors
Performance
The model is designed to solve hard difficulty mazes on 30×30 grids, demonstrating:
- Multi-step planning capabilities for long navigation sequences
- Ability to recognize and avoid dead ends
- Strategic backtracking when necessary
- Hierarchical decomposition of complex navigation problems
Citation
BibTeX:
```bibtex
@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}
```
APA:
Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.
More Information
This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-maze-30x30-hard, formatted for use with the HuggingFace transformers library.
For more details about the HRM architecture and training methodology, see:
- Paper: https://arxiv.org/abs/2506.21734
- Original Implementation: https://github.com/sapientinc/HRM
Example Use Cases
- Game AI: Intelligent maze navigation in video games
- Path Planning Research: Baseline for hierarchical planning algorithms
- Robotics: Inspiration for hierarchical navigation strategies
- Education: Demonstrating neural approaches to classic AI problems
Model Card Contact
For questions or issues with this converted checkpoint, please open an issue in the transformers repository.