使用 Ollama 进行结构化 LLM 输出
原文towardsdatascience.com/structured-llm-output-using-ollama-73422889c7ad在版本 0.5 中Ollama 对其 LLM API 进行了重大增强。通过引入结构化输出Ollama 现在使得将模型的输出约束到由 JSON 模式定义的特定格式成为可能。在底层大多数系统使用 Pydantic 的功能来实现这一点。https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/9ac95b944b4e1578f3355945d51e0ebe.png作者Dalle-3的图片结构化输出解决了许多开发者在系统或流程从 LLM 获取输出以进行进一步处理时面临的一个棘手问题。对于该系统“知道”其输入的内容以准确处理并每次都产生可重复的结果来说这是非常重要的。同样你希望在每次向用户展示模型输出时都使用相同的格式以避免混淆和错误到目前为止确保大多数模型输出格式的一致性一直是一个头疼的问题但 Ollama 的新功能使得这样做变得相当容易正如我在我的示例代码片段中所希望展示的那样。在此之前你需要安装 Ollama 的最新版本。这不是一个关于 Ollama 或如何运行它的教程。如果你想了解这些信息请点击下面的文章我将详细介绍所有这些内容。Ollama 入门 – 第一部分话虽如此Ollama 可以在 Windows、Linux 和 macOS 上运行你可以在 Windows 或 macOS 上通过导航到ollama.com/并点击屏幕上出现的大的黑色下载按钮来安装最新版本。我将使用 Linux 系统为此你可以通过运行以下命令来安装$ curl-fsSL https://ollama.com/install.sh|sh下载完成后运行安装程序。接下来我们需要设置我们的开发环境。设置我们的开发环境在编码之前我总是创建一个单独的 Python 开发环境这样我就可以安装任何需要的软件。现在我在这个环境中所做的任何操作都是隔离的不会影响我的其他项目。我使用 Miniconda 来做这件事但你可以使用你了解的任何方法只要它最适合你。如果你想要走 Miniconda 路线并且还没有安装它你必须首先安装 Miniconda。使用此链接获取它Miniconda – Anaconda 文档1/ 创建我们的新开发环境并安装所需的库(base)$ conda create-n ollama_test python3.12-y(base)$ conda activate ollama_test(ollama_test)$ pip install ollama--upgrade(ollama_test)$ pip install pydantic bs4# Check the installed version is 0.5(ollama_test)$ ollama--version ollama versionis0.5.1(ollama_test)$2/ 决定使用 Ollama 的哪个模型Ollama 可以访问数百个开源模型。选择你想要使用的一个或多个模型并从 Ollama 中拉取它们。Meta 最近发布了他们最新的 llama 模型版本 3.3所以我将使用它。此外由于我将尝试一个基于图像的任务我将使用 Meta 的 Lama3.2 视觉模型。(ollama_test)$ ollama pull llama3.2-vision(ollama_test)$ ollama pull llama3.3我通常在我的 Jupyter 笔记本中编写示例代码。然而由于与第三方库不兼容目前尝试使用 Ollama 运行 Jupyter 的最新版本时存在问题。Jupyter 期望存在这个库的某个版本而 Ollama 期望存在这个库的另一个版本。因此这次我仅仅将我的代码保存到一个 Python 文件中并在命令行上用 Python 运行它。示例代码 1 - 图像解释对于这个例子我要求模型识别 PNG 图像中的不同动物类型。这里是那张图片。https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/eeeb2298bca58150cbc8e1cd28838c9a.png图片拼贴由作者制作单个动物图片来自 pexels.com这里是代码。它注释很多而且很短所以我就不详细解释它所做的事情了。fromollamaimportchatfrompydanticimportBaseModel# Define a Pydantic model for representing a single animal with its type.classAnimal(BaseModel):animal:str# Define a Pydantic model for representing a list of animals.# This model contains a list of Animal objects.classAnimalList(BaseModel):animals:list[Animal]# Function to analyze an image and identify all animals present in it.# Uses the Ollama chat function to interact with a vision-based model (llama3.2-vision).# Returns the results as an AnimalList object.defanalyze_animals_in_image(image_path:str)-AnimalList:# Call the chat function with the specified model, format, and parameters.responsechat(modelllama3.2-vision,formatAnimalList.model_json_schema(),messages[{role:user,content:Analyze this image and identify all animals present. For each animal, provide: - The type of animal Return information for ALL animal types visible in the image.,images:[image_path],},],options{temperature:0}# Ensure deterministic output by setting temperature to 0)# Validate and parse the response JSON into an AnimalList object.animals_dataAnimalList.model_validate_json(response.message.content)returnanimals_data# Main block to execute the script.if__name____main__:# Path to the image to be analyzed.image_pathD:/photos/2024/animals.png# Print an initial message before starting the analysis.print(nAnalyzing image for animals...)# Call the function to analyze the image and get the results.animals_resultanalyze_animals_in_image(image_path)# Print the analysis results.print(Animal Analysis Results:)print(fFound{len(animals_result.animals)}animals in the image:)# Loop through the list of animals and print details for each one.fori,animalinenumerate(animals_result.animals,1):print(fAnimal #{i}:)print(animal.model_dump_json)这产生了以下输出。Analyzing imageforanimals...Animal Analysis Results:Found5animalsinthe image:]Animal#1:bound method BaseModel.model_dump_json of Animal(animalWalrus)Animal#2:bound method BaseModel.model_dump_json of Animal(animalElephant Seal)Animal#3:bound method BaseModel.model_dump_json of Animal(animalZebra)Animal#4:bound method BaseModel.model_dump_json of Animal(animalElephants)Animal#5:bound method BaseModel.model_dump_json of Animal(animalKittens)这并不算太糟糕。模型可能对左上角的图片感到困惑。我不确定它是一只海象还是海豹。我认为是前者。示例代码 2 - 文本摘要如果你有一大堆不同的文本想要摘要但又希望摘要具有相同的结构这很有用。在这个例子中我们将处理一些著名科学家的维基百科条目并以高度组织化的方式检索他们的一些关键事实。在我们的总结中我们希望为每位科学家输出以下结构科学家姓名、出生时间和地点、他们主要的成就、他们获得诺贝尔奖的年份、他们去世的时间和地点这里是代码。frompydanticimportBaseModelimportrequestsfrombs4importBeautifulSoupfromollamaimportchatfromtypingimportListimportjson# For parsing JSON content from the response# List of Wikipedia URLsurls[https://en.wikipedia.org/wiki/Albert_Einstein,https://en.wikipedia.org/wiki/Richard_Feynman,https://en.wikipedia.org/wiki/James_Clerk_Maxwell,https://en.wikipedia.org/wiki/Alan_Guth]# Scientist names extracted from URLs for validationspecified_scientists[Albert Einstein,Richard Feynman,James Clerk Maxwell,Alan Guth]# Function to scrape Wikipedia contentdefget_article_content(url):try:print(fScraping URL:{url})# Debug printresponserequests.get(url)soupBeautifulSoup(response.content,html.parser)articlesoup.find(div,class_mw-body-content)ifarticle:contentn.join(p.textforpinarticle.find_all(p))print(fSuccessfully scraped content from:{url})# Debug printreturncontentelse:print(fNo content found in:{url})# Debug printreturnexceptrequests.exceptions.RequestExceptionase:print(fError scraping{url}:{e})return# Fetch content from each URLprint(Fetching content from all URLs...)# Debug printcontents[get_article_content(url)forurlinurls]print(Finished fetching content from all URLs.)# Debug print# Prompt for the summarization tasksummarization_prompt You will be provided with content from an article about a famous scientist. Your goal will be to summarize the article following the schema provided. Focus only on the specified scientist in the article. Here is a description of the parameters: - name: The name of the Scientist - born: When and where the scientist was born - fame: A summary of what their main claim to fame is - prize: The year they won the Nobel Prize - death: When and where they died # Pydantic model classesclassArticleSummary(BaseModel):name:strborn:strfame:strprize:intdeath:strclassArticleSummaryList(BaseModel):articles:List[ArticleSummary]# Function to summarize an articledefget_article_summary(text:str):try:print(Sending content to chat model for summarization...)# Debug printcompletionchat(modelllama3.3,messages[{role:system,content:summarization_prompt},{role:user,content:text}],formatArticleSummaryList.model_json_schema(),)print(Chat model returned a response.)# Debug print# Parse and validate the JSON responsearticlesArticleSummaryList.model_validate_json(completion.message.content)print(Successfully validated and parsed articles.)# Debug printreturnarticlesexceptExceptionase:print(fError during summarization:{e})returnNone# Function to format and filter summariesdefformat_summary(summary:ArticleSummaryList):formatted[]forarticleinsummary.articles:# Accessing the articles attribute directly# Filter out scientists not in the specified listifarticle.nameinspecified_scientists:formatted.append(fThe name of the Scientist:{article.name}nfWhen and where they were born:{article.born}nfTheir main claim to fame:{article.fame}nfThe year they won the Nobel Prize:{article.prize}nfWhen and where they died:{article.death}n)print(Finished formatting summary.)# Debug printreturnn.join(formatted)# Main function to process all articlesdefmain():summaries[]fori,contentinenumerate(contents):print(fProcessing content{i1}/{len(contents)}...)# Debug printifcontent.strip():# Skip empty articlessummaryget_article_summary(content)ifsummary:formatted_summaryformat_summary(summary)ifformatted_summary:# Only add if not empty after filteringsummaries.append(formatted_summary)# Print all formatted summariesprint(Final Summaries:)print(nn.join(summaries))if__name____main__:main()这里是最终输出。它完全运行大约需要 5 分钟我的系统配置相当高所以请小心。此外响应的质量高度依赖于你使用的 LLM 的质量。我尝试了 Llama3.2输出质量明显低于使用 3.3 版本时的输出。(ollama_test)C:Usersthomaollama-testpython tomtest.py Fetching contentfromallURLs...Scraping URL:https://en.wikipedia.org/wiki/Albert_Einstein Successfully scraped contentfrom:https://en.wikipedia.org/wiki/Albert_Einstein Scraping URL:https://en.wikipedia.org/wiki/Richard_Feynman Successfully scraped contentfrom:https://en.wikipedia.org/wiki/Richard_Feynman Scraping URL:https://en.wikipedia.org/wiki/James_Clerk_Maxwell Successfully scraped contentfrom:https://en.wikipedia.org/wiki/James_Clerk_Maxwell Scraping URL:https://en.wikipedia.org/wiki/Alan_Guth Successfully scraped contentfrom:https://en.wikipedia.org/wiki/Alan_Guth Finished fetching contentfromallURLs.Processing content1/4...Sending content to chat modelforsummarization...Chat model returned a response.Successfully validatedandparsed articles.Finished formatting summary.Processing content2/4...Sending content to chat modelforsummarization...Chat model returned a response.Successfully validatedandparsed articles.Finished formatting summary.Processing content3/4...Sending content to chat modelforsummarization...Chat model returned a response.Successfully validatedandparsed articles.Finished formatting summary.Processing content4/4...Sending content to chat modelforsummarization...Chat model returned a response.Successfully validatedandparsed articles.Finished formatting summary.Final Summaries:The name of the Scientist:Albert Einstein Whenandwhere they were born:14March1879Their main claim to fame:Einstein became one of the most famous scientific celebrities after the confirmation of his general theory of relativityin1919.The year they won the Nobel Prize:1921Whenandwhere they died:18April1955The name of the Scientist:Richard Feynman Whenandwhere they were born:May11,1918Their main claim to fame:Physicistandmathematician The year they won the Nobel Prize:1965Whenandwhere they died:February15,1988The name of the Scientist:James Clerk Maxwell Whenandwhere they were born:13June1831Their main claim to fame:Scottish physicistandmathematician The year they won the Nobel Prize:0Whenandwhere they died:5November1879The name of the Scientist:Alan Guth Whenandwhere they were born:Their main claim to fame:theoretical physics The year they won the Nobel Prize:2014Whenandwhere they died:注意艾伦·古斯仍然健在因此关于他的去世时间和地点的部分是空的。詹姆斯·克拉克·麦克斯韦在他生前没有获得诺贝尔奖因为他们当时并不存在。此外请注意模型无法提取任何科学家的去世地点尽管这些信息包含在维基百科摘录中。摘要在这篇文章中我提供了代码并展示了使用 Ollama 的结构化输出的两个关键功能。第一个示例展示了结构化输出在图像处理中的应用而第二个示例则专注于文本摘要。从 LLMs 中指定结构化输出是 Ollama 的一大步并且有众多应用。通过以可预测的 JSON 格式组织信息结构化输出提高了清晰度并使 LLMs 的响应更加一致减少了歧义。这种结构化方法使得无缝集成到下游应用如 API、数据库或可视化工具成为可能无需进行大量预处理同时简化了数据解析和自动化。与预定义规则的验证变得更容易最小化错误并确保符合预期的标准。最终结构化输出将 LLMs 转化为适用于各种实际用例的高度实用工具。_ 现在就到这里吧。希望你觉得这篇文章有用。如果你觉得有用请访问我的个人资料页面这里。从那里你可以看到我其他发表的故事并订阅以获取我发布新内容的通知。_我知道现在经济困难钱包紧张但如果这篇文章对你真的有帮助请考虑买我一杯小酒。如果你喜欢这个内容我认为你也会对以下这些文章感兴趣。介绍新的 Anthropic 令牌计数 APIPolars……但更快