HRM Maze 30x30 Hard

A Hierarchical Reasoning Model (HRM) trained to solve hard 30×30 maze navigation problems using hierarchical processing and adaptive computation.

Model Details

Model Description

This is a Hierarchical Reasoning Model checkpoint trained specifically to solve hard maze pathfinding problems on 30×30 grids. The model employs a two-level hierarchical architecture inspired by human cognition: a high-level (H) module for abstract route planning and a low-level (L) module for detailed navigation decisions. It uses Adaptive Computation Time (ACT) with Q-learning-based halting to allocate computational resources dynamically.

The model processes maze grids up to 30×30 (900 tokens) and predicts optimal navigation paths through complex maze environments.
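As an illustration of how a grid becomes a 900-token sequence, here is a minimal sketch assuming the conventional row-major layout (an assumption; the upstream card does not specify the ordering):

# Row-major flattening: cell (r, c) of a 30x30 grid maps to flat index r * 30 + c
# (illustrative convention; the checkpoint's exact layout is assumed here)
def flat_index(r: int, c: int, width: int = 30) -> int:
    return r * width + c

assert flat_index(29, 29) == 899  # last cell of a 30x30 maze -> token position 899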

  • Developed by: Sapient Inc.
  • Model type: Hierarchical Reasoning Model (HRM)
  • Language(s): Symbolic reasoning (maze navigation symbols)
  • License: Apache 2.0
  • Original checkpoint: sapientinc/HRM-checkpoint-maze-30x30-hard

Uses

Direct Use

This model is designed for solving hard maze navigation problems. It can:

  • Find optimal paths through complex 30×30 maze environments
  • Navigate mazes with multiple obstacles and dead ends
  • Process partial maze representations and predict navigation sequences
  • Demonstrate hierarchical planning strategies for spatial reasoning tasks

Downstream Use

The model can be used as:

  • A component in game AI and procedural content generation
  • A baseline for research in hierarchical spatial reasoning
  • An example of applying neural networks to pathfinding and navigation problems
  • A planning module in robotics and autonomous navigation research

Recommendations

Users should be aware that:

  • The model is specialized for maze pathfinding and should not be expected to handle general spatial reasoning tasks
  • Input must be formatted as a flattened grid over the 6-token vocabulary (see the encoding sketch below)
  • Inference time varies, because the adaptive computation mechanism allocates more reasoning steps to harder mazes
  • The model is optimized for hard-difficulty mazes and may spend more computation than necessary on simple ones
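The following is a minimal encoding sketch. The character-to-token mapping and the encode_maze helper are illustrative assumptions, not part of the released checkpoint:

import torch

# Hypothetical mapping from ASCII maze characters to the 6-token vocabulary;
# the exact assignment used in training is an assumption here.
CHAR_TO_TOKEN = {".": 0, "#": 1, "S": 2, "G": 3}

def encode_maze(rows):
    """Flatten an ASCII maze (a list of equal-length strings) row-major
    into a (1, H*W) tensor of token IDs."""
    tokens = [CHAR_TO_TOKEN[c] for row in rows for c in row]
    return torch.tensor(tokens, dtype=torch.long).unsqueeze(0)

input_ids = encode_maze(["S.#", ".##", "..G"])  # tiny 3x3 example, shape (1, 9)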

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM

# Load the model. The HRM architecture is not part of the core transformers
# library, so the checkpoint's custom model code must be trusted.
model = AutoModelForCausalLM.from_pretrained(
    "zbloss/HRM-maze-30x30-hard",
    trust_remote_code=True,
)
model.eval()

# Prepare a maze grid (30x30 = 900 tokens, flattened).
# Vocabulary: 0-5 representing different maze elements
# (e.g., 0=empty, 1=wall, 2=start, 3=goal, 4=path, 5=visited)
maze_grid = torch.randint(0, 6, (1, 900))  # random grid, for illustration only
puzzle_ids = torch.zeros(1, dtype=torch.long)  # puzzle-set identifier

# Run inference
with torch.no_grad():
    outputs = model(input_ids=maze_grid, puzzle_identifiers=puzzle_ids)

# Per-cell predictions over the 6-token vocabulary
predictions = torch.argmax(outputs.logits, dim=-1)
print(f"Predicted navigation path: {predictions}")

# Q-values from the ACT halting head
print(f"Q-halt: {outputs.q_halt_logits[0]:.4f}")
print(f"Q-continue: {outputs.q_continue_logits[0]:.4f}")

Training Details

Training Data

The model was trained on a dataset of hard-difficulty 30×30 maze environments. These mazes feature:

  • Complex layouts with multiple branching paths
  • Dead ends requiring backtracking
  • Long optimal paths requiring multi-step planning
  • Variable start and goal positions

Training Procedure

The model uses a hierarchical architecture with the following components (a schematic of the nested reasoning loop follows the list):

  • High-level (H) module: 4 transformer layers for abstract route planning
  • Low-level (L) module: 4 transformer layers for detailed navigation decisions
  • H-cycles: 2 high-level reasoning cycles for strategic planning
  • L-cycles: 2 low-level computation cycles per H-cycle for tactical moves
  • ACT mechanism: Q-learning based adaptive halting with max 16 steps
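
A minimal sketch of this nested cycle structure, with hypothetical h_module and l_module callables standing in for the actual transformer stacks:

def reasoning_segment(h_state, l_state, x, h_module, l_module,
                      h_cycles=2, l_cycles=2):
    """One reasoning segment: each H-cycle runs several fast L-cycles,
    then updates the slow high-level plan from the refined L-state."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            # tactical update, conditioned on the current plan and the maze
            l_state = l_module(l_state, h_state, x)
        # strategic update from the refined low-level state
        h_state = h_module(h_state, l_state)
    return h_state, l_state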

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Architecture: 4 H-layers, 4 L-layers, 8 attention heads
  • Hidden size: 512
  • Intermediate size: 1536
  • Max position embeddings: 900 (supports up to 30×30 grids)
  • Vocabulary size: 6 (maze navigation symbols)

Model Architecture

Technical Specifications

  • Total parameters: 27,270,658 (27.3M)
  • Model size: 109.09 MB
  • Vocabulary size: 6
  • Hidden size: 512
  • Intermediate size: 1536
  • H-level layers: 4
  • L-level layers: 4
  • Attention heads: 8
  • H-cycles: 2
  • L-cycles: 2
  • Max halting steps: 16
  • Max grid size: 30×30 (900 tokens)
  • Position encoding: RoPE (Rotary Position Embeddings)
  • Activation: SwiGLU
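
As a rough sanity check on the parameter count, the eight transformer blocks alone account for nearly all of the 27.3M total (a back-of-the-envelope estimate; the small remainder would be token embeddings and the output heads):

# Back-of-the-envelope estimate for the 8 transformer blocks (4 H + 4 L)
hidden, intermediate, layers = 512, 1536, 8
attn = 4 * hidden * hidden              # Q, K, V, and output projections
swiglu = 3 * hidden * intermediate      # gate, up, and down projections
print(layers * (attn + swiglu))         # 27,262,976 of the 27,270,658 total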

Model Architecture and Objective

The Hierarchical Reasoning Model (HRM) features:

  1. Two-level Hierarchical Processing:

    • H-level (High-level): Performs slow, abstract route planning and strategic navigation
    • L-level (Low-level): Executes fast, detailed navigation decisions and obstacle avoidance
  2. Adaptive Computation Time (ACT), sketched in the halting loop after this list:

    • Q-learning based halting mechanism
    • Dynamically determines when sufficient computation has been performed
    • Allows variable computational depth based on maze complexity
    • More complex mazes with longer paths trigger more reasoning cycles
  3. Recurrent Carry State:

    • Maintains H and L hidden states across reasoning cycles
    • Enables iterative refinement of navigation strategies
    • Supports backtracking and path correction
  4. Positional Encoding:

    • RoPE (Rotary Position Embeddings) for position-aware attention
    • Critical for spatial reasoning in grid-based environments
    • Supports up to 900 positions (30×30 grids)
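
A minimal sketch of the ACT outer loop described in item 2, assuming a hypothetical model_step that advances the carry state by one segment and returns logits plus the two Q-values:

def run_with_act(carry, x, model_step, max_steps=16):
    """Run reasoning segments, carrying H/L state across them, until the
    Q-head prefers halting or the step budget (16) is exhausted."""
    for step in range(max_steps):
        carry, logits, q_halt, q_continue = model_step(carry, x)
        if q_halt > q_continue:  # learned halting decision
            break
    return logits, step + 1  # predictions and the number of segments used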

Compute Infrastructure

Software

  • Framework: PyTorch with transformers library
  • Precision: bfloat16
  • Format: Safetensors

Performance

The model is designed to solve hard-difficulty mazes on 30×30 grids, demonstrating:

  • Multi-step planning capabilities for long navigation sequences
  • Ability to recognize and avoid dead ends
  • Strategic backtracking when necessary
  • Hierarchical decomposition of complex navigation problems

Citation

BibTeX:

@article{wang2025hierarchical,
  title={Hierarchical Reasoning Model},
  author={Wang, Guan and Li, Jin and Sun, Yuhao and Chen, Xing and Liu, Changling and Wu, Yue and Lu, Meng and Song, Sen and Yadkori, Yasin Abbasi},
  journal={arXiv preprint arXiv:2506.21734},
  year={2025}
}

APA:

Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., & Yadkori, Y. A. (2025). Hierarchical Reasoning Model. arXiv preprint arXiv:2506.21734.

More Information

This checkpoint is a converted version of the original HRM checkpoint from sapientinc/HRM-checkpoint-maze-30x30-hard, reformatted for use with the Hugging Face transformers library.

For more details about the HRM architecture and training methodology, see the HRM paper cited above (arXiv:2506.21734).

Example Use Cases

  1. Game AI: Intelligent maze navigation in video games
  2. Path Planning Research: Baseline for hierarchical planning algorithms
  3. Robotics: Inspiration for hierarchical navigation strategies
  4. Education: Demonstrating neural approaches to classic AI problems

Model Card Contact

For questions or issues with this converted checkpoint, please open an issue in the transformers repository.
