HRM Maze 30x30 Hard

A Hierarchical Reasoning Model (HRM) trained to solve hard 30×30 maze navigation problems using hierarchical processing and adaptive computation.

Model Details

Model Description

This is a Hierarchical Reasoning Model checkpoint trained specifically to solve hard maze pathfinding problems on 30×30 grids. The model employs a two-level hierarchical architecture inspired by human cognition: a high-level (H) module for abstract route planning and a low-level (L) module for detailed navigation decisions. It uses Adaptive Computation Time (ACT) with Q-learning-based halting to allocate computational resources dynamically.

The model processes maze grids up to 30×30 (900 tokens) and predicts optimal navigation paths through complex maze environments.
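As an illustration of how a grid becomes a 900-token sequence, here is a minimal sketch assuming the conventional row-major layout (an assumption; the upstream card does not specify the ordering):

# Row-major flattening: cell (r, c) of a 30x30 grid maps to flat index r * 30 + c
# (illustrative convention; the checkpoint's exact layout is assumed here)
def flat_index(r: int, c: int, width: int = 30) -> int:
    return r * width + c

assert flat_index(29, 29) == 899  # last cell of a 30x30 maze -> token position 899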

  • Developed by: Sapient Inc.
  • Model type: Hierarchical Reasoning Model (HRM)
  • Language(s): Symbolic reasoning (maze navigation symbols)
  • License: Apache 2.0
  • Original checkpoint: sapientinc/HRM-checkpoint-maze-30x30-hard

Uses

Direct Use

This model is designed for solving hard maze navigation problems. It can:

  • Find optimal paths through complex 30×30 maze environments
  • Navigate mazes with multiple obstacles and dead ends
  • Process partial maze representations and predict navigation sequences
  • Demonstrate hierarchical planning strategies for spatial reasoning tasks

Downstream Use

The model can be used as:

  • A component in game AI and procedural content generation
  • A baseline for research in hierarchical spatial reasoning
  • An example of applying neural networks to pathfinding and navigation problems
  • A planning module in robotics and autonomous navigation research

Recommendations

Users should be aware that:

  • The model is specialized for maze pathfinding and should not be expected to handle general spatial reasoning tasks
  • Input must be formatted as a flattened grid over the 6-token vocabulary (see the encoding sketch below)
  • Inference time varies, because the adaptive computation mechanism allocates more reasoning steps to harder mazes
  • The model is optimized for hard-difficulty mazes and may spend more computation than necessary on simple ones
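The following is a minimal encoding sketch. The character-to-token mapping and the encode_maze helper are illustrative assumptions, not part of the released checkpoint:

import torch

# Hypothetical mapping from ASCII maze characters to the 6-token vocabulary;
# the exact assignment used in training is an assumption here.
CHAR_TO_TOKEN = {".": 0, "#": 1, "S": 2, "G": 3}

def encode_maze(rows):
    """Flatten an ASCII maze (a list of equal-length strings) row-major
    into a (1, H*W) tensor of token IDs."""
    tokens = [CHAR_TO_TOKEN[c] for row in rows for c in row]
    return torch.tensor(tokens, dtype=torch.long).unsqueeze(0)

input_ids = encode_maze(["S.#", ".##", "..G"])  # tiny 3x3 example, shape (1, 9)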

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM

# Load the model. The HRM architecture is not part of the core transformers
# library, so the checkpoint's custom model code must be trusted.
model = AutoModelForCausalLM.from_pretrained(
    "zbloss/HRM-maze-30x30-hard",
    trust_remote_code=True,
)
model.eval()

# Prepare a maze grid (30x30 = 900 tokens, flattened).
# Vocabulary: 0-5 representing different maze elements
# (e.g., 0=empty, 1=wall, 2=start, 3=goal, 4=path, 5=visited)
maze_grid = torch.randint(0, 6, (1, 900))  # random grid, for illustration only
puzzle_ids = torch.zeros(1, dtype=torch.long)  # puzzle-set identifier

# Run inference
with torch.no_grad():
    outputs = model(input_ids=maze_grid, puzzle_identifiers=puzzle_ids)

# Per-cell predictions over the 6-token vocabulary
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted navigation path: {predictions}")

# Q-values from the ACT halting head
print(f"Q-halt: {outputs.q_halt_logits[0]:.4f}")
print(f"Q-continue: {outputs.q_continue_logits[0]:.4f}")

Training Details

Training Data

The model was trained on a dataset of hard-difficulty 30×30 maze environments. These mazes feature:

  • Complex layouts with multiple branching paths
  • Dead ends requiring backtracking
  • Long optimal paths requiring multi-step planning
  • Variable start and goal positions

Training Procedure

The model uses a hierarchical architecture with the following components (a schematic of the nested reasoning loop follows the list):

  • High-level (H) module: 4 transformer layers for abstract route planning
  • Low-level (L) module: 4 transformer layers for detailed navigation decisions
  • H-cycles: 2 high-level reasoning cycles for strategic planning
  • L-cycles: 2 low-level computation cycles per H-cycle for tactical moves
  • ACT mechanism: Q-learning based adaptive halting with max 16 steps
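
A minimal sketch of this nested cycle structure, with hypothetical h_module and l_module callables standing in for the actual transformer stacks:

def reasoning_segment(h_state, l_state, x, h_module, l_module,
                      h_cycles=2, l_cycles=2):
    """One reasoning segment: each H-cycle runs several fast L-cycles,
    then updates the slow high-level plan from the refined L-state."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            # tactical update, conditioned on the current plan and the maze
            l_state = l_module(l_state, h_state, x)
        # strategic update from the refined low-level state
        h_state = h_module(h_state, l_state)
    return h_state, l_state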

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Architecture: 4 H-layers, 4 L-layers, 8 attention heads
  • Hidden size: 512
  • Intermediate size: 1536
  • Max position embeddings: 900 (supports up to 30×30 grids)
  • Vocabulary size: 6 (maze navigation symbols)

Model Architecture

Technical Specifications

  • Total parameters: 27,270,658 (27.3M)
  • Model size: 109.09 MB
  • Vocabulary size: 6
  • Hidden size: 512
  • Intermediate size: 1536
  • H-level layers: 4
  • L-level layers: 4
  • Attention heads: 8
  • H-cycles: 2
  • L-cycles: 2
  • Max halting steps: 16
  • Max grid size: 30×30 (900 tokens)
  • Position encoding: RoPE (Rotary Position Embeddings)
  • Activation: SwiGLU
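
As a rough sanity check on the parameter count, the eight transformer blocks alone account for nearly all of the 27.3M total (a back-of-the-envelope estimate; the small remainder would be token embeddings and the output heads):

# Back-of-the-envelope estimate for the 8 transformer blocks (4 H + 4 L)
hidden, intermediate, layers = 512, 1536, 8
attn = 4 * hidden * hidden              # Q, K, V, and output projections
swiglu = 3 * hidden * intermediate      # gate, up, and down projections
print(layers * (attn + swiglu))         # 27,262,976 of the 27,270,658 total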

Model Architecture and Objective

The Hierarchical Reasoning Model (HRM) features:

  1. Two-level Hierarchical Processing:

    • H-level (High-level): Performs slow, abstract route planning and strategic navigation
    • L-level (Low-level): Executes fast, detailed navigation decisions and obstacle avoidance
  2. Adaptive Computation Time (ACT), sketched in the halting loop after this list:

    • Q-learning based halting mechanism
    • Dynamically determines when sufficient computation has been performed
    • Allows variable computational depth based on maze complexity
    • More complex mazes with longer paths trigger more reasoning cycles
  3. Recurrent Carry State:

    • Maintains H and L hidden states across reasoning cycles
    • Enables iterative refinement of navigation strategies
    • Supports backtracking and path correction
  4. Positional Encoding:

    • RoPE (Rotary Position Embeddings) for position-aware attention
    • Critical for spatial reasoning in grid-based environments
    • Supports up to 900 positions (30×30 grids)
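
A minimal sketch of the ACT outer loop described in item 2, assuming a hypothetical model_step that advances the carry state by one segment and returns logits plus the two Q-values:

def run_with_act(carry, x, model_step, max_steps=16):
    """Run reasoning segments, carrying H/L state across them, until the
    Q-head prefers halting or the step budget (16) is exhausted."""
    for step in range(max_steps):
        carry, logits, q_halt, q_continue = model_step(carry, x)
        if q_halt > q_continue:  # learned halting decision
            break
    return logits, step + 1  # predictions and the number of segments used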

Compute Infrastructure

Software

  • Framework: PyTorch with transformers library
  • Precision: bfloat16
  • Format: Safetensors

Performance

The model is designed to solve hard-difficulty mazes on 30×30 grids, demonstrating:

  • Multi-step planning capabilities for long navigation sequences
  • Ability to recognize and avoid dead ends
  • Strategic backtracking when necessary
  • Hierarchical decomposition of complex navigation problems

Citation

BibTeX:

@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}

APA:

Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.

More Information

This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-maze-30x30-hard, reformatted for use with the Hugging Face transformers library.

For more details about the HRM architecture and training methodology, see the HRM paper cited above (arXiv:2506.21734).

Example Use Cases

  1. Game AI: Intelligent maze navigation in video games
  2. Path Planning Research: Baseline for hierarchical planning algorithms
  3. Robotics: Inspiration for hierarchical navigation strategies
  4. Education: Demonstrating neural approaches to classic AI problems

Model Card Contact

For questions or issues with this converted checkpoint, please open an issue in the transformers repository.
