Hands-on with the Intel-Optimized Build of Ollama - 小果冻之家

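ModelScope hosts an Intel-optimized build of Ollama that can run large language models on supported Intel GPUs, including integrated graphics, which is good news for anyone without a discrete GPU. Link: https://modelscope.cn/models/ipexllm/ollama-ipex-llm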

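GPU support

Integrated graphics on 11th to 14th generation Intel Core processors;
Integrated graphics on Intel Core Ultra series processors;
Arc A-series;
Arc B-series.

On the model side, common models such as deepseek-r1 are supported.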

Running

Using Windows as an example:

Change into the directory where you extracted the archive and run .\start-ollama.bat;
In the same directory, run .\ollama.exe run --verbose deepseek-r1:7b.
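
Before sending requests from a script, it can help to confirm that the server started by start-ollama.bat is actually listening. The following is a minimal sketch, not part of the original setup: it assumes the default address http://localhost:11434 and the standard Ollama /api/version endpoint, and the helper names server_is_up and wait_for_server are made up for illustration.

```python
import time
import urllib.request


def server_is_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url (assumed default address)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP error responses.
        return False


def wait_for_server(probe=server_is_up, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Call probe() repeatedly until it succeeds or timeout_s seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Typical use, once start-ollama.bat has been launched:
#     if not wait_for_server():
#         raise SystemExit("Ollama server is not reachable")
```

The probe is passed in as a parameter so the polling loop can be exercised without a running server.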

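Speed

On my work PC with an Intel i5-13420H (the integrated GPU is Intel UHD Graphics; with 16 GB of RAM, the iGPU can use up to half, i.e. 8 GB), running deepseek-r1:7b uses 6.1 GB of video memory and generates about 7 tokens/s. My home machine with an Intel i5-12500H (integrated GPU: Intel Iris Xe Graphics) runs the same model at about 8.6 tokens/s.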
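
The same kind of figure can be computed from the REST API instead of reading the --verbose output. The sketch below assumes the response of /api/generate (with "stream": False) carries the standard Ollama fields eval_count (generated tokens) and eval_duration (nanoseconds); the helper name tokens_per_second is hypothetical, and the sample numbers are made up for illustration, not measurements.

```python
def tokens_per_second(result: dict) -> float:
    """Generation speed from eval_count and eval_duration (duration is in nanoseconds)."""
    duration_ns = result["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return result["eval_count"] / (duration_ns / 1e9)


# Illustrative values only: 172 tokens generated in 20 seconds.
sample = {"eval_count": 172, "eval_duration": 20_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

Using eval_duration rather than wall-clock time excludes model loading and prompt processing, so the result reflects generation speed only.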

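Interaction scripts

import requests

# Full model name as registered locally (check it via /api/tags)
model = 'modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M'
prompt = 'Tell me a joke'

# Send a single generation request to the local Ollama server
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": model,
        "prompt": prompt,
        "stream": False  # set to False so the total time is easy to measure
    }
)
response.raise_for_status()  # fail loudly on HTTP errors such as an unknown model

response_json = response.json()

print(response_json)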
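
With "stream": True, the same endpoint returns one JSON object per line, each carrying a fragment of the answer in "response" and the last one marked with "done": true. The following sketch shows one way to reassemble such a stream; join_stream is a hypothetical helper name, and the commented usage assumes the requests call above with streaming enabled.

```python
import json


def join_stream(lines) -> str:
    """Concatenate the "response" fragments of a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip keep-alive blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object signals the end of generation
    return "".join(parts)


# Possible use with requests (needs a running server):
#     resp = requests.post(url, json={"model": model, "prompt": prompt, "stream": True}, stream=True)
#     print(join_stream(resp.iter_lines()))
```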

Alternatively, use the ollama library:

# Install with pip first
# pip install ollama  -i https://mirrors.163.com/pypi/simple/
import ollama

model = 'modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M'
prompt = 'Tell me a joke'
response = ollama.generate(model=model, prompt=prompt)
print(response)

The deepseek-r1:7b model pulled from ModelScope is actually registered under the name modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M; you can check this at http://localhost:11434/api/tags.
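
The lookup can also be scripted. This sketch assumes /api/tags returns the standard Ollama shape, a JSON object with a "models" list whose entries have a "name" field; fetch_tags and model_names are hypothetical helper names, and the sample payload is trimmed to the one field used here.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Download the list of locally available models (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_names(tags: dict) -> list:
    """Extract the full model names from an /api/tags payload."""
    return [entry["name"] for entry in tags.get("models", [])]


# Trimmed example payload; a real response carries more fields per model.
sample_tags = {
    "models": [
        {"name": "modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M"},
    ]
}
print(model_names(sample_tags))
```

Against a live server, model_names(fetch_tags()) gives the exact strings to pass as "model" in the scripts above.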
