victor (HF Staff) committed
Commit 5b63760 · verified · 1 Parent(s): 357d069

brutalist website introducing this model

Title: zai-org/GLM-4.5V · Hugging Face

URL Source: https://huggingface.co/zai-org/GLM-4.5V
![Image 1](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/logo.svg)

This model is part of the GLM-V family of models, introduced in the paper [GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning](https://huggingface.co/papers/2507.01006).

* **Paper**: [https://huggingface.co/papers/2507.01006](https://huggingface.co/papers/2507.01006)
* **GitHub Repository**: [https://github.com/zai-org/GLM-V/](https://github.com/zai-org/GLM-V/)
* **Online Demo**: [https://chat.z.ai/](https://chat.z.ai/)
* **API Access**: [ZhipuAI Open Platform](https://docs.z.ai/guides/vlm/glm-4.5v)
* **Desktop Assistant App**: [https://huggingface.co/spaces/zai-org/GLM-4.5V-Demo-App](https://huggingface.co/spaces/zai-org/GLM-4.5V-Demo-App)
* **Discord Community**: [https://discord.com/invite/8cnQKdAprg](https://discord.com/invite/8cnQKdAprg)

Introduction & Model Overview
-----------------------------

Vision-language models (VLMs) have become a cornerstone of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs urgently need to enhance reasoning capabilities beyond basic multimodal perception — improving accuracy, comprehensiveness, and intelligence — to enable complex problem solving, long-context understanding, and multimodal agents.

Through our open-source work, we aim to explore the technological frontier together with the community while empowering more developers to create exciting and innovative applications.

**This Hugging Face repository hosts the `GLM-4.5V` model, part of the `GLM-V` series.**

### GLM-4.5V

GLM-4.5V is based on ZhipuAI’s next-generation flagship text foundation model, GLM-4.5-Air (106B total parameters, 12B active). It continues the technical approach of GLM-4.1V-Thinking and achieves SOTA performance among models of the same scale on 42 public vision-language benchmarks, covering common tasks such as image, video, and document understanding, as well as GUI agent operations.

[![Image 2: GLM-4.5V Benchmarks](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench_45v.jpeg)](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench_45v.jpeg)

Beyond benchmark performance, GLM-4.5V focuses on real-world usability. Through efficient hybrid training, it can handle diverse types of visual content, enabling full-spectrum vision reasoning, including:

* **Image reasoning** (scene understanding, complex multi-image analysis, spatial recognition)
* **Video understanding** (long video segmentation and event recognition)
* **GUI tasks** (screen reading, icon recognition, desktop operation assistance)
* **Complex chart & long document parsing** (research report analysis, information extraction)
* **Grounding** (precise visual element localization)

The model also introduces a **Thinking Mode** switch, allowing users to balance between quick responses and deep reasoning. This switch works the same as in the `GLM-4.5` language model.
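
As an illustrative sketch (not an excerpt from the official docs), the toggle is typically passed through the chat template; here we assume the template exposes an `enable_thinking` flag as in the `GLM-4.5` language model, so verify the exact kwarg against the template shipped with the checkpoint:

```
from transformers import AutoProcessor

# A minimal sketch, assuming the chat template accepts an `enable_thinking`
# flag (as in GLM-4.5); check the checkpoint's template before relying on it.
processor = AutoProcessor.from_pretrained("zai-org/GLM-4.5V")
messages = [{"role": "user", "content": [{"type": "text", "text": "What is 17 * 24?"}]}]

# Deep reasoning (default): the model thinks before answering.
prompt_thinking = processor.apply_chat_template(messages, add_generation_prompt=True)

# Quick response: skip the thinking phase.
prompt_direct = processor.apply_chat_template(
    messages, add_generation_prompt=True, enable_thinking=False
)
```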

### GLM-4.1V-9B

_Contextual information about GLM-4.1V-9B is provided for completeness, as it is part of the GLM-V series and foundational to GLM-4.5V's development._

Built on the [GLM-4-9B-0414](https://github.com/zai-org/GLM-4) foundation model, the **GLM-4.1V-9B-Thinking** model introduces a reasoning paradigm and uses RLCS (Reinforcement Learning with Curriculum Sampling) to comprehensively enhance model capabilities. It achieves the strongest performance among 10B-level VLMs and matches or surpasses the much larger Qwen-2.5-VL-72B in 18 benchmark tasks.

We also open-sourced the base model **GLM-4.1V-9B-Base** to support researchers in exploring the limits of vision-language model capabilities.

[![Image 3: Reinforcement Learning with Curriculum Sampling (RLCS)](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/rl.jpeg)](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/rl.jpeg)

Compared with the previous generation CogVLM2 and GLM-4V series, **GLM-4.1V-Thinking** brings:

1. The series’ first reasoning-focused model, excelling in multiple domains beyond mathematics.
2. **64k** context length support.
3. Support for **any aspect ratio** and up to **4k** image resolution.
4. A bilingual (Chinese/English) open-source version.

GLM-4.1V-9B-Thinking integrates the **Chain-of-Thought** reasoning mechanism, improving accuracy, richness, and interpretability. It leads on 23 out of 28 benchmark tasks at the 10B parameter scale, and outperforms Qwen-2.5-VL-72B on 18 tasks despite its smaller size.

[![Image 4: GLM-4.1V-9B Benchmarks](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench.jpeg)](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench.jpeg)

Project Updates
---------------

* 🔥 **News**: `2025/08/11`: We released **GLM-4.5V** with significant improvements across multiple benchmarks. We also open-sourced our handcrafted **desktop assistant app** for debugging. Once connected to GLM-4.5V, it can capture visual information from your PC screen via screenshots or screen recordings. Feel free to try it out or customize it into your own multimodal assistant. Click [here](https://huggingface.co/spaces/zai-org/GLM-4.5V-Demo-App) to download the installer or [build from source](https://github.com/zai-org/GLM-V/blob/main/examples/vllm-chat-helper/README.md)!
* **News**: `2025/07/16`: We have open-sourced the **VLM Reward System** used to train GLM-4.1V-Thinking. View the [code repository](https://github.com/zai-org/GLM-V/tree/main/glmv_reward) and run locally: `python examples/reward_system_demo.py`.
* **News**: `2025/07/01`: We released **GLM-4.1V-9B-Thinking** and its [technical report](https://arxiv.org/abs/2507.01006).

Model Implementation Code
-------------------------

* GLM-4.5V model algorithm: see the full implementation in [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4v_moe).
* GLM-4.1V-9B-Thinking model algorithm: see the full implementation in [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4v).
* Both models share identical multimodal preprocessing but use different conversation templates; take care to apply the matching template for each checkpoint (see the loading sketch below).
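
To make the distinction concrete, a minimal loading sketch: pulling the processor from the same repository as the weights ensures the matching chat template is applied automatically. This is a suggested pattern, not an official requirement from the repositories.

```
from transformers import AutoProcessor

# Each checkpoint ships its own chat template, so loading the processor
# from the matching repo applies the correct conversation format.
glm45v_processor = AutoProcessor.from_pretrained("zai-org/GLM-4.5V")
glm41v_processor = AutoProcessor.from_pretrained("zai-org/GLM-4.1V-9B-Thinking")
```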

Usage
-----

### Environment Installation

For `SGLang` and `transformers`:

```
pip install -r https://raw.githubusercontent.com/zai-org/GLM-V/main/requirements.txt
```

For `vLLM`:

```
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
pip install transformers-v4.55.0-GLM-4.5V-preview
```
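
Once a server is running, for example via vLLM's OpenAI-compatible endpoint (`vllm serve zai-org/GLM-4.5V`), it can be queried with a standard OpenAI client. The sketch below is illustrative: the host, port, and placeholder API key are assumptions, not fixed values.

```
from openai import OpenAI

# Assumes a local OpenAI-compatible server (e.g. started with vLLM).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zai-org/GLM-4.5V",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/f/fa/Grayscale_8bits_palette_sample_image.png"}},
            {"type": "text", "text": "describe this image"},
        ],
    }],
)
print(response.choices[0].message.content)
```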

### Quick Start with Transformers

```
from transformers import AutoProcessor, Glm4vMoeForConditionalGeneration
import torch

MODEL_PATH = "zai-org/GLM-4.5V"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://upload.wikimedia.org/wikipedia/commons/f/fa/Grayscale_8bits_palette_sample_image.png"
            },
            {
                "type": "text",
                "text": "describe this image"
            }
        ],
    }
]
processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = Glm4vMoeForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
)
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
inputs.pop("token_type_ids", None)
generated_ids = model.generate(**inputs, max_new_tokens=8192)
# Decode only the newly generated tokens; keep special tokens so the
# <|begin_of_box|>/<|end_of_box|> markers remain visible in the output.
output_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(output_text)
```

The special tokens `<|begin_of_box|>` and `<|end_of_box|>` in the response mark the answer’s bounding box in the image. The bounding box is given as four numbers, for example `[x1, y1, x2, y2]`, where `(x1, y1)` is the top-left corner and `(x2, y2)` is the bottom-right corner. The bracket style may vary (`[]`, `[[]]`, `()`, `<>`, etc.), but the meaning is the same: it encloses the coordinates of the box. These coordinates are relative values between 0 and 1000, normalized to the image size.
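
As a sketch of how a caller might consume this format (our own helper, not part of the model's tooling), the snippet below extracts the first boxed span from the decoded text and rescales the normalized coordinates to pixels:

```
import re

def parse_box(output_text: str, width: int, height: int):
    """Extract the first <|begin_of_box|>...<|end_of_box|> span and
    rescale its 0-1000 normalized coordinates to pixel values."""
    m = re.search(r"<\|begin_of_box\|>(.*?)<\|end_of_box\|>", output_text, re.S)
    if m is None:
        return None
    # Accept any bracket style ([], [[]], (), <>) by keeping only the numbers.
    nums = [float(n) for n in re.findall(r"-?\d+\.?\d*", m.group(1))]
    if len(nums) < 4:
        return None
    x1, y1, x2, y2 = nums[:4]
    return (x1 / 1000 * width, y1 / 1000 * height,
            x2 / 1000 * width, y2 / 1000 * height)

# Example: a 1920x1080 frame and a response containing a boxed answer.
resp = "<|begin_of_box|>[103, 251, 467, 706]<|end_of_box|>"
print(parse_box(resp, 1920, 1080))  # (197.76, 271.08, 896.64, 762.48)
```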

For more code information, please visit our [GitHub](https://github.com/zai-org/GLM-V/).

### Grounding Example

GLM-4.5V provides precise grounding capabilities. Given a prompt that requests the location of a specific object, GLM-4.5V can reason step by step and identify the bounding boxes of the target object. The query prompt supports complex descriptions of the target object as well as specified output formats, for example:

> * Help me to locate `<expr>` in the image and give me its bounding boxes.
> * Please pinpoint the bounding box [[x1,y1,x2,y2], …] in the image as per the given description.

Here, `<expr>` is the description of the target object. The output bounding box is a quadruple $$[x_1,y_1,x_2,y_2]$$ composed of the coordinates of the top-left and bottom-right corners, where each value is normalized by the image width (for x) or height (for y) and scaled by 1000.
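
For example, in a 1920 × 1080 image, a returned box of `[103, 251, 467, 706]` corresponds to pixel corners `(103/1000 × 1920, 251/1000 × 1080) = (197.76, 271.08)` and `(467/1000 × 1920, 706/1000 × 1080) = (896.64, 762.48)`.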

Files changed (2)
  1. README.md +7 -4
  2. index.html +157 -18
README.md CHANGED

```
@@ -1,10 +1,13 @@
 ---
-title: Glm 4 5v Brutalist Portal
-emoji: 🐢
-colorFrom: purple
+title: GLM-4.5V Brutalist Portal 🚧
+colorFrom: green
 colorTo: blue
+emoji: 🐳
 sdk: static
 pinned: false
+tags:
+- deepsite-v3
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Welcome to your new DeepSite project!
+This project was created with [DeepSite](https://deepsite.hf.co).
```
index.html CHANGED

```
@@ -1,19 +1,158 @@
-<!doctype html>
-<html>
-  <head>
-    <meta charset="utf-8" />
-    <meta name="viewport" content="width=device-width" />
-    <title>My static Space</title>
-    <link rel="stylesheet" href="style.css" />
-  </head>
-  <body>
-    <div class="card">
-      <h1>Welcome to your static Space!</h1>
-      <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
-      <p>
-        Also don't forget to check the
-        <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
-      </p>
-    </div>
-  </body>
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>GLM-4.5V · zai-org</title>
+    <link rel="icon" type="image/x-icon" href="/static/favicon.ico">
+    <script src="https://cdn.tailwindcss.com"></script>
+    <script src="https://unpkg.com/feather-icons"></script>
+    <script src="https://cdn.jsdelivr.net/npm/feather-icons/dist/feather.min.js"></script>
+    <script>
+        tailwind.config = {
+            theme: {
+                extend: {
+                    colors: {
+                        brutal: {
+                            50: '#fafafa',
+                            100: '#f5f5f5',
+                            200: '#eeeeee',
+                            300: '#e0e0e0',
+                            400: '#bdbdbd',
+                            500: '#9e9e9e',
+                            600: '#757575',
+                            700: '#616161',
+                            800: '#424242',
+                            900: '#212121'
+                        }
+                    }
+                }
+            }
+        }
+    </script>
+</head>
+<body class="bg-brutal-100 text-brutal-900 font-mono">
+    <!-- Navigation -->
+    <nav class="border-b-4 border-brutal-900 bg-brutal-200 p-4">
+        <div class="container mx-auto px-4 flex justify-between items-center">
+            <div class="flex items-center space-x-4">
+                <div class="w-8 h-8 bg-brutal-900"></div>
+                <div class="hidden md:flex space-x-6">
+                    <a href="#overview" class="hover:bg-brutal-300 px-3 py-1">Overview</a>
+                    <a href="#capabilities" class="hover:bg-brutal-300 px-3 py-1">Capabilities</a>
+                    <a href="#implementation" class="hover:bg-brutal-300 px-3 py-1">Implementation</a>
+                    <a href="#updates" class="hover:bg-brutal-300 px-3 py-1">Updates</a>
+                    <a href="#citation" class="hover:bg-brutal-300 px-3 py-1">Citation</a>
+                </div>
+                <button class="md:hidden p-2">
+                    <i data-feather="menu"></i>
+                </button>
+            </div>
+    </nav>
+
+    <!-- Hero Section -->
+    <section class="border-b-4 border-brutal-900 bg-brutal-200">
+        <div class="container mx-auto px-4 py-20">
+            <div class="max-w-6xl mx-auto">
+                <!-- Model Title -->
+                <div class="bg-brutal-300 p-6 mb-8">
+                    <h1 class="text-4xl md:text-6xl font-black tracking-tight">zai-org/GLM-4.5V</h1>
+                    <p class="text-xl mt-4">A Vision-Language Model with 1M Context and Grounding</p>
+                </div>
+                <!-- Model Logo -->
+                <div class="flex justify-center mb-8">
+                    <div class="w-32 h-32 bg-brutal-900 flex items-center justify-center">
+                        <div class="w-24 h-24 bg-brutal-100 border-4 border-brutal-900">
+                            <div class="w-20 h-20 bg-brutal-500"></div>
+                        </div>
+                    </div>
+                </div>
+            </div>
+    </section>
+
+    <!-- Main Content Sections -->
+    <main class="container mx-auto px-4 py-12">
+        <!-- Introduction -->
+        <section id="overview" class="mb-16">
+            <div class="border-4 border-brutal-900 bg-brutal-200 p-8">
+                <h2 class="text-3xl font-black mb-6">GLM-4.5V: The Multimodal Powerhouse</h2>
+                <div class="grid grid-cols-1 md:grid-cols-2 gap-8">
+                    <div>
+                        <p class="text-lg leading-relaxed">GLM-4.5V is zai-org's latest vision-language model featuring 1M context length, superior multimodal capabilities, and advanced grounding functionality.</p>
+                    </div>
+                </div>
+        </section>
+
+        <!-- Capabilities Grid -->
+        <section id="capabilities" class="mb-16">
+            <div class="border-4 border-brutal-900 bg-brutal-200 p-6">
+                <h3 class="text-2xl font-black mb-6 border-b-4 border-brutal-900 p-4">
+                <h3 class="text-xl font-black mb-4">Multimodal Capabilities</h3>
+                <div class="grid grid-cols-1 md:grid-cols-3 gap-4">
+                    <div class="border-4 border-brutal-900 bg-brutal-200 p-6">
+                        <div class="text-center">
+                            <div class="w-16 h-16 bg-brutal-900"></div>
+                            <div class="bg-brutal-300 p-4">
+                                <h4 class="text-lg font-black">Image Reasoning</h4>
+                                <p class="mt-2">Advanced visual question answering and detailed image analysis</p>
+                            </div>
+                        </div>
+        </section>
+
+        <!-- Implementation Section -->
+        <section id="implementation" class="mb-16">
+            <h2 class="text-3xl font-black mb-6">Implementation & Usage</h2>
+            <div class="border-4 border-brutal-900 bg-brutal-200 p-4">
+                <h3 class="text-xl font-black mb-4">Quick Start</h3>
+                <div class="bg-brutal-900 text-brutal-100 p-6">
+                    <h3 class="text-lg font-black mb-4">Environment Setup</h3>
+                    <div class="bg-brutal-300 p-4">
+                        <code class="block p-4 bg-brutal-800 text-brutal-100 overflow-x-auto">
+                            # Install dependencies
+                            pip install torch torchvision transformers
+                            pip install git+https://github.com/zai-org/glm-4.5v
+                        </code>
+                    </div>
+        </section>
+
+        <!-- Updates Section -->
+        <section id="updates" class="mb-16">
+            <h2 class="text-3xl font-black mb-6">Project Updates</h2>
+            <div class="border-4 border-brutal-900 bg-brutal-200 p-4">
+                <h3 class="text-lg font-black">Latest Release: v1.0.0</h3>
+                <div class="space-y-4">
+                    <div class="border-4 border-brutal-900 bg-brutal-300 p-4">
+                        <p class="text-sm text-brutal-600">2024-12-19</p>
+                        <p class="text-lg">Initial release with comprehensive vision-language capabilities</p>
+                    </div>
+        </section>
+
+        <!-- Citation -->
+        <section id="citation" class="mb-16">
+            <div class="border-4 border-brutal-900 bg-brutal-200 p-4">
+                <p class="font-medium">If you use GLM-4.5V in your research, please cite:</p>
+                <div class="bg-brutal-900 text-brutal-100 p-6">
+                    <p class="text-sm">@misc{glm45v2024,<br>
+                        title={GLM-4.5V: A Vision-Language Model with 1M Context</p>
+        </section>
+    </main>
+
+    <!-- Footer -->
+    <footer class="border-t-4 border-brutal-900 bg-brutal-200 p-8">
+        <div class="flex flex-wrap justify-between items-center">
+            <div class="flex space-x-6">
+                <a href="https://github.com/zai-org/GLM-4.5V">
+                    <div class="flex space-x-4">
+                        <a href="https://github.com/zai-org/GLM-4.5V" class="hover:bg-brutal-300 px-3 py-1">GitHub</a>
+                        <a href="https://huggingface.co/zai-org/GLM-4.5V" class="hover:bg-brutal-300 px-3 py-1">Demo</a>
+                        <a href="https://huggingface.co/zai-org/GLM-4.5V" class="hover:bg-brutal-300 px-3 py-1">API</a>
+                        <a href="https://discord.gg/zai-org" class="hover:bg-brutal-300 px-3 py-1">Discord</a>
+                    </div>
+            </div>
+    </footer>
+
+    <script>
+        feather.replace();
+    </script>
+</body>
 </html>
```