---
title: RobotHub Inference Server
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8001
suggested_hardware: t4-small
suggested_storage: medium
short_description: Real-time ACT model inference server for robot control
tags:
  - robotics
pinned: false
fullWidth: true
---

# 🤖 RobotHub Inference Server

**AI-Powered Robot Control Engine for Real-time Robotics**

The RobotHub Inference Server is the **AI brain** of the RobotHub ecosystem. It's a FastAPI server that processes real-time camera feeds and robot state data to generate precise control commands using learned policies such as ACT, Pi0, SmolVLA, and Diffusion Policy.

## ๐Ÿ—๏ธ How It Works in the RobotHub Ecosystem

The RobotHub Inference Server is part of a complete robotics control pipeline:

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  RobotHub       │    │  RobotHub       │    │  RobotHub       │    │  Physical       │
│  Frontend       │───▶│  TransportServer│───▶│  InferenceServer│───▶│  Robot          │
│                 │    │                 │    │                 │    │                 │
│ • Web Interface │    │ • Video Streams │    │ • AI Models     │    │ • USB/Network   │
│ • Robot Config  │    │ • Joint States  │    │ • Real-time     │    │ • Joint Control │
│ • Monitoring    │    │ • WebRTC/WS     │    │ • Inference     │    │ • Cameras       │
└─────────────────┘    └─────────────────┘    └─────────────────┘    └─────────────────┘
        │                        ▲                        │                        │
        │                        │                        │                        │
        └────────────────────────┼────────────────────────┼────────────────────────┘
                                 │                        │
                            Status & Control         Action Commands
```

### 🔄 Data Flow

1. **Input Sources** → **TransportServer**:
   - **Camera Feeds**: Real-time video from robot cameras (front, wrist, overhead, etc.)
   - **Joint States**: Current robot joint positions and velocities
   - **Robot Configuration**: Joint limits, kinematics, calibration data

2. **TransportServer** → **Inference Server**:
   - Streams normalized camera images (RGB, 224x224 or custom resolution)
   - Sends normalized joint positions (most joints -100 to +100, gripper 0 to +100)
   - Maintains real-time communication via WebSocket/WebRTC

3. **Inference Server** → **AI Processing**:
   - **Vision Processing**: Multi-camera image preprocessing and encoding
   - **State Encoding**: Joint position normalization and history buffering
   - **Policy Inference**: Transformer model processes visual + proprioceptive data
   - **Action Generation**: Outputs sequence of robot joint commands

4. **Output** → **Robot Execution**:
   - **Action Chunks**: Sequences of joint commands (ACT outputs 10-100 actions per inference)
   - **Real-time Control**: 20Hz control loop, 2Hz inference loop (sketched below)
   - **Safety Monitoring**: Emergency stop, joint limit checking
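
The interplay between the 2Hz inference loop and the 20Hz control loop is easiest to see in code. Below is a minimal, illustrative sketch of an action buffer bridging the two rates; the server implements this internally, and the joint name and chunk size here are placeholders:

```python
import time
from collections import deque

CONTROL_HZ = 20   # commands sent to the robot per second
INFERENCE_HZ = 2  # policy forward passes per second

action_queue: deque = deque()

def run_inference() -> list:
    """Stand-in for a policy forward pass returning an action chunk."""
    return [{"shoulder_pan_joint": 0.0}] * 10  # e.g. an ACT-style chunk of future actions

next_inference = time.monotonic()
for _ in range(100):  # stand-in for "while the session is running"
    now = time.monotonic()
    if now >= next_inference:
        action_queue.extend(run_inference())   # refill the buffer at ~2Hz
        next_inference = now + 1.0 / INFERENCE_HZ
    if action_queue:
        command = action_queue.popleft()       # drain the buffer at ~20Hz
        # ... send `command` to the robot via the TransportServer here ...
    time.sleep(1.0 / CONTROL_HZ)
```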

## 🚀 Quick Start

The server is primarily a **FastAPI REST API**, but includes an optional **Gradio web interface** for easy debugging and testing without needing to write code or use curl commands.

### Option 1: Server + UI (Recommended for Testing)

```bash
# Clone and setup
git clone https://github.com/julien-blanchon/RobotHub-InferenceServer
cd RobotHub-InferenceServer
uv sync

# Launch with integrated UI (FastAPI + Gradio on same port)
python launch_simple.py
```

**Access Points:**
- 🎨 **Web Interface**: http://localhost:7860/ (create sessions, monitor performance)
- 📖 **API Documentation**: http://localhost:7860/api/docs (REST API reference)
- 🔍 **Health Check**: http://localhost:7860/api/health (system status)

### Option 2: Server Only (Production)

```bash
# Launch FastAPI server only (no UI)
python -m inference_server.cli --server-only

# Or with custom configuration
python -m inference_server.cli --server-only --host localhost --port 8080
```

**Access (default port):**
- 📖 **API Documentation**: http://localhost:7860/api/docs
- 🔍 **Health Check**: http://localhost:7860/api/health

### Option 3: Docker

```bash
# Build and run
docker build -t robothub-inference-server .
docker run -p 7860:7860 robothub-inference-server
```

## 🛠️ Setting Up Your Robot

### 1. **Connect Your Hardware**

You need the RobotHub TransportServer running first:

```bash
# Start the RobotHub TransportServer (dependency)
cd ../RobotHub-TransportServer
docker run -p 8000:8000 robothub-transport-server
```

### 2. **Create an Inference Session**

**Via Web Interface (Gradio UI):**
1. Open http://localhost:7860/
2. Enter your **model path** (e.g., `./checkpoints/act_pick_place_model`)
3. Configure **camera names** (e.g., `front,wrist,overhead`)
4. Set **TransportServer URL** (default: `http://localhost:8000`)
5. Click **"Create & Start AI Control"**

**Via REST API:**
```python
import httpx

session_config = {
    "session_id": "robot_assembly_task",
    "policy_path": "./checkpoints/act_assembly_model",
    "policy_type": "act",  # or "pi0", "smolvla", "diffusion"
    "camera_names": ["front_cam", "wrist_cam"],
    "transport_server_url": "http://localhost:8000",
    "language_instruction": "Pick up the red block and place it on the blue platform"  # For SmolVLA
}

async with httpx.AsyncClient() as client:
    # Create session
    response = await client.post("http://localhost:7860/api/sessions", json=session_config)
    
    # Start inference
    await client.post(f"http://localhost:7860/api/sessions/{session_config['session_id']}/start")
```

### 3. **Connect Robot & Cameras**

The robot and cameras connect to the **TransportServer**, not directly to the Inference Server:

```python
# Example: Connect robot to TransportServer.
# workspace_id and the *_room_id values below come from the session creation response.
from transport_server_client import RoboticsConsumer, RoboticsProducer
from transport_server_client.video import VideoProducer

# Robot receives AI commands and executes them
joint_consumer = RoboticsConsumer('http://localhost:8000')
await joint_consumer.connect(workspace_id, joint_input_room_id)

def execute_joint_commands(commands):
    """Execute commands on your actual robot hardware"""
    for cmd in commands:
        joint_name = cmd['name']
        position = cmd['value']  # Normalized: most joints -100 to +100, gripper 0 to +100
        robot.move_joint(joint_name, position)

joint_consumer.on_joint_update(execute_joint_commands)

# Robot sends its current state back
joint_producer = RoboticsProducer('http://localhost:8000')
await joint_producer.connect(workspace_id, joint_output_room_id)

# Send current robot state periodically
await joint_producer.send_state_sync({
    'shoulder_pan_joint': current_joint_positions[0],
    'shoulder_lift_joint': current_joint_positions[1],
    # ... etc
})

# Cameras stream to TransportServer
for camera_name, camera_device in cameras.items():
    video_producer = VideoProducer('http://localhost:8000')
    await video_producer.connect(workspace_id, camera_room_ids[camera_name])
    await video_producer.start_camera(camera_device)
```

## 🎮 Supported AI Models

### **ACT (Action Chunking Transformer)**
- **Best for**: Complex manipulation tasks requiring temporal coherence
- **Output**: Chunks of 10-100 future actions per inference
- **Use case**: Pick-and-place, assembly, cooking tasks

### **Pi0 (Vision-Language Policy)**
- **Best for**: Tasks requiring language understanding
- **Output**: Single actions with language conditioning
- **Use case**: "Pick up the red mug", "Open the top drawer"

### **SmolVLA (Small Vision-Language-Action)**
- **Best for**: Lightweight vision-language tasks
- **Use case**: Simple manipulation with natural language

### **Diffusion Policy**
- **Best for**: High-precision continuous control
- **Use case**: Precise assembly, drawing, writing
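
All four policies are driven through the same session request; only `policy_type` changes, plus `language_instruction` for the language-conditioned models. A sketch with an illustrative checkpoint path:

```python
# Sketch: selecting a policy via `policy_type`; the checkpoint path is illustrative.
smolvla_config = {
    "session_id": "drawer_task",
    "policy_path": "./checkpoints/smolvla_drawer",  # hypothetical checkpoint
    "policy_type": "smolvla",                       # or "act", "pi0", "diffusion"
    "camera_names": ["front"],
    "transport_server_url": "http://localhost:8000",
    "language_instruction": "Open the top drawer",  # used by Pi0/SmolVLA
}
```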

## 📊 Monitoring & Debugging

### Using the Web Interface

The Gradio UI provides real-time monitoring:
- **Active Sessions**: View all running inference sessions
- **Performance Metrics**: Inference rate, control rate, camera FPS
- **Action Queue**: Current action buffer status
- **Error Logs**: Real-time error tracking

### Using the REST API

```bash
# Check active sessions
curl http://localhost:7860/api/sessions

# Get detailed session info
curl http://localhost:7860/api/sessions/my_robot_session

# Stop a session
curl -X POST http://localhost:7860/api/sessions/my_robot_session/stop

# Emergency stop all sessions
curl -X POST http://localhost:7860/api/debug/emergency_stop
```
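
For scripted monitoring, the same endpoints can be polled from Python. A small sketch (the exact response fields are defined by the server; check /api/docs for the real schema):

```python
import asyncio
import httpx

async def watch_session(session_id: str, interval: float = 2.0) -> None:
    """Poll a session's status endpoint and print whatever it reports."""
    async with httpx.AsyncClient(base_url="http://localhost:7860/api") as client:
        while True:
            response = await client.get(f"/sessions/{session_id}")
            response.raise_for_status()
            print(response.json())  # inference rate, queue depth, errors, ...
            await asyncio.sleep(interval)

# asyncio.run(watch_session("my_robot_session"))
```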

## 🔧 Configuration

### Multi-Camera Setup

```python
# Configure multiple camera angles
session_config = {
    "camera_names": ["front_cam", "wrist_cam", "overhead_cam", "side_cam"],
    # Each camera gets its own TransportServer room
}
```

### Custom Joint Mappings

The server handles various robot joint naming conventions automatically:
- **LeRobot names**: `shoulder_pan_joint`, `shoulder_lift_joint`, `elbow_joint`, etc.
- **Custom names**: `base_rotation`, `shoulder_tilt`, `elbow_bend`, etc.
- **Alternative names**: `joint_1`, `joint_2`, `base_joint`, etc.

See `src/inference_server/models/joint_config.py` for full mapping details.
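As an illustration, such a mapping can be thought of as an alias table like the hypothetical sketch below (the real table in `joint_config.py` may differ):

```python
# Hypothetical sketch of an alias table; the actual mapping lives in
# src/inference_server/models/joint_config.py.
JOINT_ALIASES: dict[str, list[str]] = {
    "shoulder_pan_joint": ["base_rotation", "base_joint", "joint_1"],
    "shoulder_lift_joint": ["shoulder_tilt", "joint_2"],
    "elbow_joint": ["elbow_bend", "joint_3"],
}

def to_canonical(name: str) -> str:
    """Resolve a robot-specific joint name to its canonical LeRobot name."""
    for canonical, aliases in JOINT_ALIASES.items():
        if name == canonical or name in aliases:
            return canonical
    raise KeyError(f"Unknown joint name: {name}")
```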

## 🔌 Integration Examples

### **Standalone Python Application**

```python
import asyncio
import time

import httpx
from transport_server_client import RoboticsConsumer, RoboticsProducer
from transport_server_client.video import VideoProducer

class RobotAIController:
    def __init__(self):
        self.inference_client = httpx.AsyncClient(base_url="http://localhost:7860/api")
        self.transport_url = "http://localhost:8000"
        
    async def start_ai_control(self, task_description: str):
        # 1. Create inference session
        session_config = {
            "session_id": f"task_{int(time.time())}",
            "policy_path": "./checkpoints/general_manipulation_act",
            "policy_type": "act",
            "camera_names": ["front", "wrist"],
            "language_instruction": task_description
        }
        
        response = await self.inference_client.post("/sessions", json=session_config)
        session_data = response.json()
        
        # 2. Connect robot to the same workspace/rooms
        await self.connect_robot_hardware(session_data)
        
        # 3. Start AI inference
        await self.inference_client.post(f"/sessions/{session_config['session_id']}/start")
        
        print(f"๐Ÿค– AI control started for task: {task_description}")

# Usage
async def main():
    controller = RobotAIController()
    await controller.start_ai_control("Pick up the blue cup and place it on the shelf")

asyncio.run(main())
```

## 🚨 Safety & Best Practices

- **Emergency Stop**: Built-in emergency stop via the API (`/sessions/{id}/stop`); see the sketch after this list
- **Joint Limits**: All joint values are normalized (most joints -100 to +100, gripper 0 to +100)
- **Hardware Limits**: Robot driver should enforce actual hardware joint limits
- **Session Timeouts**: Automatic cleanup prevents runaway processes
- **Error Handling**: Graceful degradation when cameras disconnect
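
One practical pattern is to wrap a session in `try`/`finally` so the stop endpoint is always called, as in this sketch:

```python
import httpx

def run_with_safe_stop(session_id: str) -> None:
    """Guarantee the stop endpoint is called even if the controlling code fails."""
    with httpx.Client(base_url="http://localhost:7860/api") as client:
        try:
            client.post(f"/sessions/{session_id}/start")
            # ... application logic while the AI controls the robot ...
        finally:
            # The robot stops receiving AI commands no matter what happened above.
            client.post(f"/sessions/{session_id}/stop")
```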

## 🚀 Deployment

### **Local Development**
```bash
# All services on one machine
python launch_simple.py  # Inference Server with UI
```

### **Production Setup**
```bash
# Server only (no UI)
python -m inference_server.cli --server-only --host localhost --port 7860

# Or with Docker
docker run -p 7860:7860 robothub-inference-server
```