zai / org-GLM-OCR

AI · OCR · Multimodal · Computer Vision · LLM · Document Understanding

// summary

GLM-OCR is a multimodal OCR model built on the GLM-V architecture and designed specifically for complex document understanding. By introducing a multi-token prediction loss and reinforcement learning, the model achieves strong recognition accuracy and generalization while keeping a lightweight 0.9B-parameter footprint. The project also ships a comprehensive SDK and multiple deployment options, supporting both efficient local deployment and cloud API access.
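As a rough illustration of what a cloud API call might look like, the sketch below builds an OpenAI-compatible chat-completions payload that embeds a document image as base64. The endpoint shape, the `glm-ocr` model identifier, and the payload fields are assumptions for illustration, not taken from the project's actual SDK documentation.

```python
# Hypothetical sketch of preparing a request for a GLM-OCR cloud endpoint,
# assuming an OpenAI-compatible chat-completions API. Model name and payload
# shape are assumptions, not confirmed by the project docs.
import base64


def build_ocr_request(image_bytes: bytes,
                      prompt: str = "Extract all text from this document.") -> dict:
    """Build a chat-completions-style payload with the image inlined as base64."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "glm-ocr",  # assumed model identifier
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


# The resulting dict would be POSTed as JSON to the provider's
# /chat/completions endpoint with an API key in the Authorization header.
payload = build_ocr_request(b"<png bytes here>")
```

For local deployment the same payload shape is commonly reused, pointed at a locally served endpoint instead of the cloud URL.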

// use cases

01 Layout analysis and high-precision information extraction for complex documents
02 Recognition of special elements such as tables, code, and stamps
03 Lightweight deployment in high-concurrency services and edge computing environments