glm-4.1v-thinking是智谱ai推出的开源视觉语言模型,专为复杂认知任务设计,支持图像、视频、文档等多模态输入。模型在glm-4v架构基础上引入思维链推理机制,基于课程采样强化学习策略,系统性提升跨模态因果推理能力与稳定性。模型轻量版glm-4.1v-9b-thinking(glm-4.1v-9b-base基座模型和glm-4.1v-9b-thinking具备深度思考和推理能力)参数量控制在10b级别,在28项权威评测中,有23项达成10b级模型最佳成绩,其中18项持平或超越参数量高达72b的qwen-2.5-vl,展现出小体积模型的极限性能潜力。
模型在MMStar、MMMU-Pro、ChartQAPro、OSWorld等28项权威评测中,有23项达成10B级模型的最佳成绩,其中18项持平或超越参数量高达72B的Qwen-2.5-VL。
<span>import</span> requests <span>import</span> json <span># 设置API接口地址和API Key</span> api_url <span>=</span> <span>"http://api.zhipuopen.com/v1/glm-4.1v-thinking"</span> api_key <span>=</span> <span>"your_api_key"</span> <span># 准备输入数据</span> input_data <span>=</span> <span>{</span> <span>"image"</span><span>:</span> <span>"image_url_or_base64_encoded_data"</span><span>,</span> <span>"text"</span><span>:</span> <span>"your_input_text"</span> <span>}</span> <span># 设置请求头</span> headers <span>=</span> <span>{</span> <span>"Authorization"</span><span>:</span> <span><span>f"Bearer </span><span><span>{</span>api_key<span>}</span></span><span>"</span></span><span>,</span> <span>"Content-Type"</span><span>:</span> <span>"application/json"</span> <span>}</span> <span># 发送请求</span> response <span>=</span> requests<span>.</span>post<span>(</span>api_url<span>,</span> headers<span>=</span>headers<span>,</span> data<span>=</span>json<span>.</span>dumps<span>(</span>input_data<span>)</span><span>)</span> <span># 获取结果</span> result <span>=</span> response<span>.</span>json<span>(</span><span>)</span> <span>print</span><span>(</span>result<span>)</span>
<span>from</span> transformers <span>import</span> AutoModelForVision2Seq<span>,</span> AutoProcessor <span>import</span> torch <span># 加载模型和处理器</span> model_name <span>=</span> <span>"THUDM/glm-4.1v-thinking"</span> model <span>=</span> AutoModelForVision2Seq<span>.</span>from_pretrained<span>(</span>model_name<span>)</span> processor <span>=</span> AutoProcessor<span>.</span>from_pretrained<span>(</span>model_name<span>)</span> <span># 准备输入数据</span> image_url <span>=</span> <span>"image_url_or_image_path"</span> text <span>=</span> <span>"your_input_text"</span> inputs <span>=</span> processor<span>(</span>images<span>=</span>image_url<span>,</span> text<span>=</span>text<span>,</span> return_tensors<span>=</span><span>"pt"</span><span>)</span> <span># 进行推理</span> <span>with</span> torch<span>.</span>no_grad<span>(</span><span>)</span><span>:</span> outputs <span>=</span> model<span>(</span><span>**</span>inputs<span>)</span> <span># 获取结果</span> result <span>=</span> processor<span>.</span>decode<span>(</span>outputs<span>.</span>logits<span>[</span><span>0</span><span>]</span><span>,</span> skip_special_tokens<span>=</span><span>True</span><span>)</span> <span>print</span><span>(</span>result<span>)</span>
以上就是GLM-4.1V-Thinking— 智谱AI开源的视觉语言模型系列的详细内容,更多请关注php中文网其它相关文章!
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号