Zai-org
GLM-4.5
The GLM-4.5 series models are foundation models engineered specifically for intelligent agents. The flagship GLM-4.5 integrates 355 billion total parameters (32 billion active), unifying reasoning, coding, and agent capabilities to address complex application demands. As a hybrid reasoning system, it offers two operational modes:

- Thinking Mode: enables complex reasoning, tool invocation, and strategic planning
- Non-Thinking Mode: delivers low-latency responses for real-time interactions

This architecture bridges high-performance AI with adaptive functionality for dynamic agent environments.
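To make the two modes concrete, here is a minimal sketch using the OpenAI-compatible chat pattern that many GLM deployments expose. The base URL, model identifier, and the `thinking` request field are illustrative assumptions, not confirmed parameter names; consult the official API reference for the actual switch.

```python
# Sketch: toggling GLM-4.5's thinking mode over an OpenAI-compatible
# chat endpoint. The base_url, model name, and "thinking" field are
# illustrative assumptions, not a documented API surface.
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

def ask(prompt: str, thinking: bool) -> str:
    response = client.chat.completions.create(
        model="glm-4.5",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
        # Hypothetical vendor extension selecting the operational mode.
        extra_body={"thinking": {"type": "enabled" if thinking else "disabled"}},
    )
    return response.choices[0].message.content

# Thinking Mode for a planning-heavy task; Non-Thinking for low latency.
print(ask("Plan a three-step refactor of a legacy ETL pipeline.", thinking=True))
print(ask("What is the capital of France?", thinking=False))
```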
GLM-4.5V
Z.ai's GLM-4.5V sets a new standard in visual reasoning, achieving state-of-the-art performance among open-source models across 42 benchmarks. Beyond benchmarks, it excels in real-world applications through hybrid training, enabling comprehensive visual understanding: from image and video analysis to GUI interaction, complex document processing, and precise visual grounding. In China's GeoGuessr challenge, GLM-4.5V outperformed 99% of the 21,000 human players within 16 hours and reached 66th place within a week. Built on the GLM-4.5-Air foundation and incorporating the approach of GLM-4.1V-Thinking, it uses a 106-billion-parameter MoE architecture to deliver scalable, efficient performance. This model bridges advanced AI research with practical deployment, delivering strong visual intelligence.
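For a sense of the multimodal interface, here is a hedged sketch of sending an image to GLM-4.5V through the same OpenAI-compatible chat pattern. The endpoint, model identifier, and image URL are assumptions; the `image_url` content part follows the common vision-chat message convention rather than a confirmed GLM-4.5V schema.

```python
# Sketch: image understanding with GLM-4.5V over an OpenAI-compatible
# endpoint. Endpoint URL and model name are illustrative assumptions;
# the image_url content part follows the common vision-chat convention.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="glm-4.5v",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text",
             "text": "Describe the trend shown in this chart."},
        ],
    }],
)
print(response.choices[0].message.content)
```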
GLM-4.1V-9B-Thinking
GLM-4.1V-9B-Thinking is an open-source Vision-Language Model (VLM) jointly released by Zhipu AI and Tsinghua University's KEG Lab, designed specifically for complex multimodal cognitive tasks. Built on the GLM-4-9B-0414 base model, it integrates Chain-of-Thought (CoT) reasoning and employs reinforcement learning strategies, significantly enhancing its cross-modal reasoning capabilities and stability. As a lightweight model with 9 billion parameters, it strikes a strong balance between deployment efficiency and performance: across 28 authoritative benchmark evaluations, it matches or surpasses the much larger 72B-parameter Qwen-2.5-VL-72B on 18 of them. The model excels at image-text understanding, mathematical and scientific reasoning, and video comprehension, while also supporting 4K-resolution images and arbitrary aspect ratios.
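Since the 9B model is small enough for local deployment, a single-GPU inference sketch with Hugging Face transformers may be useful. The checkpoint id, the generic `AutoModelForImageTextToText` loader, and the chat-template message shape are assumptions about the published packaging; check the official model card for the exact loading recipe.

```python
# Sketch: local inference with GLM-4.1V-9B-Thinking via transformers.
# The checkpoint id and generic Auto* classes are assumptions; consult
# the official model card for the exact loading recipe.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "zai-org/GLM-4.1V-9B-Thinking"  # assumed Hub id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Multimodal chat message: one image plus a text question.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/diagram.png"},
        {"type": "text", "text": "Explain this diagram step by step."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```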