Published 2025-04-12 · Updated 2025-06-04 · AI / LangChain · 17 min read (about 2,613 characters)

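LangChain: Model I/O Output Parsers

LangChain's output parsers (Output Parser) are one of the core components of the Model I/O module. Their main job is to convert the unstructured text produced by a large language model (LLM) into structured data. They make sure the model's output conforms to a specific format so that downstream code can process it.

In LLM application development, structured data makes processing more efficient, simplifies storage and retrieval, supports data analysis, and helps improve data quality.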

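What Output Parsers Do

An output parser works by changing the prompt template: it adds output instructions that guide the model to respond in a specific format.

A prompt template contains no output instructions by default. To get results in a particular format you have to supply output instructions, and that is where an output parser comes in. If no particular format is needed, leave the prompt template as it is.

There are two ways to get output in a specific format: write the instructions directly into the prompt template, or build the prompt template first and then use the output parser's preset instructions. The two are equivalent in effect. The only differences are whether you write the instructions yourself or use the preset ones, and whether they are written together with the template or kept separate.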

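Function

An output parser embeds format instructions in the prompt to steer the LLM toward a response in a specific format, then applies parsing logic to extract structured data. For example, it can ask the LLM to output JSON, a list, or a date.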
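
The round trip described above can be sketched in plain Python. This is only an illustration under stated assumptions: `SimpleListParser` is a hypothetical stand-in rather than LangChain's class, and `fake_llm_reply` is a made-up response used in place of a real model call.

```python
# Hypothetical stand-in for a comma-separated list parser (not LangChain's class).
class SimpleListParser:
    def get_format_instructions(self) -> str:
        # Instruction text that gets embedded in the prompt.
        return "Your response should be a list of comma separated values, e.g. `foo, bar, baz`"

    def parse(self, text: str) -> list[str]:
        # Split the raw model text on commas and trim whitespace around each item.
        return [item.strip() for item in text.split(",") if item.strip()]


parser = SimpleListParser()

# Step 1: embed the format instructions in the prompt.
template = "List {num} attractions in {city}.\n{format_instructions}"
prompt = template.format(
    num=3, city="Shenzhen", format_instructions=parser.get_format_instructions()
)

# Step 2: a real application would send `prompt` to an LLM here.
fake_llm_reply = "Lianhuashan Park, Shenzhen Bay Park , Dameisha Beach"

# Step 3: parse the free text into a Python list.
attractions = parser.parse(fake_llm_reply)
print(attractions)
```

The parser owns both halves of the contract: the instruction that asks for a format and the logic that reads it back.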

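Purpose

Structured output: converts the free text returned by the LLM into structured formats such as JSON, lists, or dictionaries.
Data validation: ensures the output conforms to a predefined format or data type (such as a date or a number).
Process standardization: gives downstream business logic a uniform input format.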
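
To make the data-validation point concrete, here is a minimal standard-library sketch. The `Event` structure and `parse_event` helper are hypothetical names chosen for illustration; a real project would more likely rely on a parser class or Pydantic for this.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:  # hypothetical target structure
    title: str
    day: date
    attendees: int


def parse_event(text: str) -> Event:
    """Turn a JSON string into a validated Event, raising ValueError on bad data."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    try:
        return Event(
            title=str(data["title"]),
            day=date.fromisoformat(data["day"]),  # must be YYYY-MM-DD
            attendees=int(data["attendees"]),     # must be convertible to int
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"output does not match the expected format: {exc}") from exc


good = parse_event('{"title": "Launch", "day": "2025-04-12", "attendees": "30"}')
print(good)

try:
    parse_event('{"title": "Launch", "day": "next Friday", "attendees": 30}')
except ValueError as err:
    print("rejected:", err)
```

Downstream code then receives either a well-typed `Event` or a single, predictable exception type.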

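Advantages

Standardized output: avoids parsing errors caused by inconsistent LLM output formats.
Greater control: supports generating complex structures (such as nested objects and multi-field data).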

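Types of Output Parsers

LangChain provides a set of pre-built output parsers, each supplying output instructions suited to a particular data type.

Basic Parsers

List parser (ListOutputParser): parses list-type output. For example, CommaSeparatedListOutputParser turns comma-separated text into a list.
Datetime parser (DatetimeOutputParser): extracts a date-time string and converts it into a Python datetime object.
Boolean parser (BooleanOutputParser): parses boolean output.
Enum parser (EnumOutputParser): parses output into an enum value.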
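
As a rough idea of what the boolean, datetime, and enum parsers above have to do, the following sketch uses only the standard library. The function names, the YES/NO convention, and the `Color` enum are assumptions made for this example and are not LangChain's implementation.

```python
from datetime import datetime
from enum import Enum


class Color(Enum):  # hypothetical enum for the example
    RED = "red"
    GREEN = "green"


def parse_bool(text: str) -> bool:
    """Map a YES/NO style answer to a bool."""
    cleaned = text.strip().upper()
    if cleaned not in ("YES", "NO"):
        raise ValueError(f"expected YES or NO, got {text!r}")
    return cleaned == "YES"


def parse_datetime(text: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> datetime:
    """Convert a timestamp string to a datetime; strptime raises ValueError on mismatch."""
    return datetime.strptime(text.strip(), fmt)


def parse_enum(text: str) -> Color:
    """Look up an enum member by value; Enum raises ValueError for unknown values."""
    return Color(text.strip().lower())


print(parse_bool(" yes "))
print(parse_datetime("2025-04-12 09:30:00"))
print(parse_enum("Green"))
```

Each helper either returns a typed Python value or raises `ValueError`, which is the behavior a caller typically wants from a basic parser.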

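Structured Parsers

Structured output parser (StructuredOutputParser): parses output with a specific structure, defined by a list of ResponseSchema fields.
Pydantic parser (PydanticOutputParser): defines fields and types with a Pydantic model and parses the model's structured JSON output into that model.
JSON parser (JsonOutputParser): parses a JSON string directly.

Advanced Parsers

Auto-fixing parser (OutputFixingParser): when parsing fails, calls another LLM to fix the error.
Retry parser (RetryOutputParser): when parsing fails, sends the original prompt together with the failed output back to the LLM to generate a new output.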
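
The fix-and-retry idea behind these advanced parsers can be sketched without any LLM. In the example below, `parse_with_fix` and `fake_fixer` are hypothetical names; `fake_fixer` stands in for the LLM call that a real fixing parser would make.

```python
import json
from typing import Callable


def parse_with_fix(text: str, fix: Callable[[str, str], str], max_attempts: int = 2):
    """Try json.loads; on failure ask `fix` for a corrected string and try again."""
    for attempt in range(max_attempts + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            if attempt == max_attempts:
                raise ValueError("could not repair the output") from exc
            # A real application would call an LLM here with the bad output
            # and the error message.
            text = fix(text, str(exc))


def fake_fixer(bad_output: str, error: str) -> str:
    # Stand-in for an LLM call: only repairs single-quoted JSON.
    return bad_output.replace("'", '"')


result = parse_with_fix("{'name': 'Shenzhen Bay Park', 'rank': 1}", fake_fixer)
print(result)
```

Bounding the number of attempts matters in practice, because every repair round costs another model call.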

Custom Parsers

Subclass BaseOutputParser and implement parse() and get_format_instructions(), for example to extract the contents of a Markdown code block.

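Using Output Parsers

Usage

List Parser

An output parser is used mainly by injecting its output-instruction string into the prompt template. This is done either with the PromptTemplate object's partial method or by passing the partial_variables argument when instantiating PromptTemplate.

First, call output_parser.get_format_instructions() to get the preset output instructions.
Then pass format_instructions as part of partial_variables when instantiating PromptTemplate.
Finally, call output_parser.parse() to parse the response content.

Example: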

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from deepseek.config import DeepSeekConfig

# deepseek config
config = DeepSeekConfig()

model = ChatOpenAI(
    model=config.model,
    api_key=config.api_key,
    base_url=config.base_url,
    temperature=0.7,
    max_tokens=1024,
)
# Output parser
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

# Build the prompt template (the Chinese prompt asks for the top {num} free attractions in {city})
prompt = PromptTemplate(
    template="列出{city}的排名前{num}的免费景点。{format_instructions}",
    input_variables=["city", "num"],
    partial_variables={"format_instructions": format_instructions}
)

# Fill in the remaining variables to get the final prompt string
partial_prompt = prompt.format(city="深圳", num=3)
response = model.invoke(partial_prompt)
# Parse the returned content into a list
format_response = output_parser.parse(response.content)
print(format_response)

# Output: ['莲花山公园', '深圳湾公园', '大梅沙海滨公园']

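Structured Parser

The structured output parser (StructuredOutputParser) turns the string the model returns into a data structure that code can use directly.

You define the structure of the output, and the prompt template gains output instructions that contain that definition, so the model produces output matching it. In essence, you tell the model the data-structure definition and ask it to return data that fits the definition: no longer just a one-sentence answer, but structured data.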
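
A rough sketch of how a field definition might be turned into output instructions is given here. The helper `build_format_instructions` and the exact wording it emits are assumptions for illustration; the text that StructuredOutputParser actually generates may differ.

```python
FENCE = "`" * 3  # built at runtime to avoid a literal nested fence in this example

# (name, description) pairs, mirroring the idea of a response schema.
schemas = [
    ("name", "attraction name"),
    ("description", "attraction description"),
    ("rank", "attraction rank"),
]


def build_format_instructions(fields: list[tuple[str, str]]) -> str:
    """Render a field list as a JSON-snippet instruction for the model."""
    lines = [f'    "{name}": string  // {desc}' for name, desc in fields]
    body = "\n".join(lines)
    return (
        "Return a Markdown code snippet with a JSON object in exactly this shape:\n"
        f"{FENCE}json\n{{\n{body}\n}}\n{FENCE}"
    )


instructions = build_format_instructions(schemas)
print(instructions)
```

The resulting string is what ends up in the `{format_instructions}` slot of the prompt template.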

import re
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain_core.output_parsers import BaseOutputParser
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_openai import ChatOpenAI
from deepseek.config import DeepSeekConfig
import json

# deepseek config
config = DeepSeekConfig()

model = ChatOpenAI(
    model=config.model,
    api_key=config.api_key,
    base_url=config.base_url,
    temperature=0.7,
    max_tokens=1024,
)

# Define the response schemas (descriptions: attraction name, description, rank)
response_schemas = [
    ResponseSchema(name="name", description="景点名称"),
    ResponseSchema(name="description", description="景点描述"),
    ResponseSchema(name="rank", description="景点排名")
]

# Create the structured output parser
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)


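class JsonBlockParser(BaseOutputParser[str]):
    """Extracts the contents of the first fenced code block in the model's reply."""

    def get_format_instructions(self) -> str:
        # This parser adds no instructions of its own; the prompt already
        # carries the StructuredOutputParser instructions.
        return ""

    def parse(self, text: str) -> str:
        # Capture everything between the opening fence line and the closing fence.
        match = re.search(r'```.*?\n(.*?)```', text, re.DOTALL)
        return match.group(1).strip() if match else ""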


# Get the format instructions and build the prompt template
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="请列出{city}的排名前{num}的免费景点，每个景点需要包含名称、描述和排名。{format_instructions}",
    input_variables=["city", "num"],
    partial_variables={"format_instructions": format_instructions}
)

partial_prompt = prompt.format(city="深圳", num=2)
response = model.invoke(partial_prompt)

json_parser = JsonBlockParser()

# print(response.content)
try:
    # The reply is a JSON array of objects rather than a single object,
    # so the fence is stripped with the custom parser and the rest loaded as JSON.
    format_response = json_parser.parse(response.content)
    print(format_response)

    json_obj = json.loads(format_response)
    print(json_obj[0])
    print(json_obj[0].get("name"))
except Exception as e:
    print(f"Error parsing output: {e}")
    print(f"Raw response: {response.content}")

The raw output is a JSON code block:

```json
[
    {
        "name": "深圳湾公园",
        "description": "深圳湾公园是深圳市内一处著名的海滨休闲公园，拥有优美的海岸线和丰富的自然景观，是市民和游客休闲、健身、观鸟的热门去处。",
        "rank": "1"
    },
    {
        "name": "莲花山公园",
        "description": "莲花山公园位于深圳市中心区，是市民休闲娱乐的重要场所。公园内有邓小平雕像、风筝广场等景点，登顶可俯瞰深圳市中心全景。",
        "rank": "2"
    }
]
```

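The custom parser extracts the contents of the JSON code block. The descriptions differ slightly from the raw output above, which suggests the two listings were captured from separate runs. Output: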

[
    {
        "name": "深圳湾公园",
        "description": "深圳湾公园是深圳市著名的海滨公园，拥有优美的海岸线和丰富的自然景观，是市民休闲、健身、观海的好去处。",
        "rank": "1"
    },
    {
        "name": "莲花山公园",
        "description": "莲花山公园位于深圳市中心，是市民喜爱的城市公园之一，山顶可俯瞰深圳市区全景，公园内还有邓小平雕像等景点。",
        "rank": "2"
    }
]
{'name': '深圳湾公园', 'description': '深圳湾公园是深圳市著名的海滨公园，拥有优美的海岸线和丰富的自然景观，是市民休闲、健身、观海的好去处。', 'rank': '1'}
深圳湾公园
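
The extraction step can also be tried without a model. The sketch below uses a made-up `reply` shaped like the raw output shown above; `extract_json` and `as_records` are hypothetical helpers, and the fallback to parsing the whole text when no fence is present is a design choice of this example, not behavior of the parser above.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example has no literal nested fence

# Made-up model reply, shaped like the raw output shown above.
reply = (
    "Here are the results:\n"
    f"{FENCE}json\n"
    '[{"name": "A", "rank": "1"}, {"name": "B", "rank": "2"}]\n'
    f"{FENCE}\n"
)


def extract_json(text: str):
    """Return parsed JSON from the first fenced block, or from the whole text."""
    match = re.search(FENCE + r".*?\n(.*?)" + FENCE, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


def as_records(value) -> list:
    """Normalize a single object or an array of objects to a list."""
    return value if isinstance(value, list) else [value]


records = as_records(extract_json(reply))
for rec in records:
    # Ranks arrive as strings in the sample output, so convert explicitly.
    print(int(rec["rank"]), rec["name"])
```

Normalizing to a list lets the calling code treat "one object" and "array of objects" replies the same way.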

Usage Examples

Example 1: List Parser

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from deepseek.config import DeepSeekConfig

# deepseek config
config = DeepSeekConfig()

model = ChatOpenAI(
    model=config.model,
    api_key=config.api_key,
    base_url=config.base_url,
    temperature=0.7,
    max_tokens=1024,
)

output_parser = CommaSeparatedListOutputParser()
prompt = ChatPromptTemplate.from_template("列出{city}的{num}个景点。{format_instructions}")

partial_variables = {"city": "深圳", "num": 3, "format_instructions": output_parser.get_format_instructions()}

chain = prompt | model | output_parser
response = chain.invoke(partial_variables)
print(response)
# Output: ['世界之窗', '欢乐谷', '东部华侨城']

Example 2: Pydantic Parser

from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

from deepseek.config import DeepSeekConfig


class Book(BaseModel):
    title: str = Field(description="书名")
    author: str = Field(description="作者")


output_parser = PydanticOutputParser(pydantic_object=Book)
prompt = ChatPromptTemplate.from_template("解析书籍信息：{text}\n{format_instructions}")

config = DeepSeekConfig()

model = ChatOpenAI(
    model=config.model,
    api_key=config.api_key,
    base_url=config.base_url,
    temperature=0.7,
    max_tokens=1024,
)

chain = prompt | model | output_parser

partial_variables = {"text": "《朝花夕拾》是鲁迅的散文集。", "format_instructions": output_parser.get_format_instructions()}

response = chain.invoke(partial_variables)
print(response)
# Output: Book(title='朝花夕拾', author='鲁迅')

Example 3: Custom Parser

Extracting a Markdown code block

from langchain_core.output_parsers import BaseOutputParser
import re

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from deepseek.config import DeepSeekConfig


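class CodeBlockParser(BaseOutputParser[str]):
    """Extracts the contents of the first fenced code block in the model's reply."""

    def get_format_instructions(self) -> str:
        # Instruction sent to the model: wrap the result in a sql code block.
        return "将结果包裹在```sql代码块中。"

    def parse(self, text: str) -> str:
        # Capture everything between the opening fence line and the closing fence.
        match = re.search(r'```.*?\n(.*?)```', text, re.DOTALL)
        return match.group(1).strip() if match else ""


output_parser = CodeBlockParser()
prompt = ChatPromptTemplate.from_template("生成查询用户年龄的SQL语句。{format_instructions}")

config = DeepSeekConfig()

model = ChatOpenAI(
    model=config.model,
    api_key=config.api_key,
    base_url=config.base_url,
    temperature=0.7,
    max_tokens=1024,
)

# Take the format instructions from the parser instead of hard-coding them here
partial_variables = {"format_instructions": output_parser.get_format_instructions()}

chain = prompt | model | output_parser
response = chain.invoke(partial_variables)
print(response)
# Output: SELECT age FROM users WHERE id = 1;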
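
One limitation of returning an empty string when no code block is found is that failures pass silently. A possible variation, sketched below with the standard library only, raises an error instead and can filter by language tag. `extract_block` and the sample `reply` are hypothetical and not part of the example above.

```python
import re

FENCE = "`" * 3  # built at runtime to avoid a literal nested fence in this example
BLOCK_RE = re.compile(FENCE + r"(\w*)[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_block(text: str, language: str = "") -> str:
    """Return the first fenced block, optionally restricted to one language tag."""
    for tag, body in BLOCK_RE.findall(text):
        if not language or tag.lower() == language.lower():
            return body.strip()
    raise ValueError(f"no {language or 'fenced'} code block found")


reply = (
    "Sure, here is the query:\n"
    f"{FENCE}sql\nSELECT age FROM users WHERE id = 1;\n{FENCE}\n"
    "Let me know if you need anything else."
)
print(extract_block(reply, "sql"))
```

Raising on a missing block makes it easier to route the failure into a retry or fixing step.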

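Best Practices and Extensions

Chain with LCEL
Use the | pipe operator to connect the prompt template, model, and parser, which keeps the code concise.
Error handling
Use OutputFixingParser or RetryOutputParser to make parsing more robust against deviations in LLM output.
Multi-format parsing
XML, YAML, and other formats are also supported, which suits API interfaces and complex data-exchange scenarios.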
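
To show why the pipe operator composes so neatly, here is a toy model of the idea in plain Python. The `Step` class is a hypothetical miniature and not how LangChain implements its runnables; the three steps are stand-ins for a prompt template, a model, and a parser.

```python
class Step:
    """Minimal runnable wrapper: `a | b` builds a step that runs a, then b."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Compose: feed this step's result into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for a prompt template, a model, and a parser.
prompt = Step(lambda inputs: f"List {inputs['num']} attractions in {inputs['city']}.")
model = Step(lambda prompt_text: "Window of the World, Happy Valley, OCT East")
parser = Step(lambda reply: [part.strip() for part in reply.split(",")])

chain = prompt | model | parser
print(chain.invoke({"city": "Shenzhen", "num": 3}))
```

Because every step exposes the same `invoke` interface, swapping one parser for another changes a single term in the chain.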

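Official Recommendations and Updates

Newer features: LangChain v0.1 and later include XMLOutputParser and YamlOutputParser, for scenarios that need a strict data format.
Documentation: consult the official LangChain documentation for the latest parser types and API details.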
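
As a rough idea of what handling an XML-formatted reply involves, the sketch below uses the standard library's `xml.etree.ElementTree`. The `reply` text and the `xml_to_dict` helper are made up for illustration and only handle a flat, one-level element; this is not the XMLOutputParser implementation.

```python
import xml.etree.ElementTree as ET

# Made-up model reply in XML form.
reply = """
<book>
  <title>Dawn Blossoms Plucked at Dusk</title>
  <author>Lu Xun</author>
</book>
"""


def xml_to_dict(text: str) -> dict:
    """Parse a flat XML element into {root_tag: {child_tag: text}}."""
    root = ET.fromstring(text.strip())  # raises ET.ParseError on malformed XML
    return {root.tag: {child.tag: (child.text or "").strip() for child in root}}


print(xml_to_dict(reply))
```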

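By choosing the right parser type, developers can build applications that handle anything from simple lists to complex nested structures, and make LLM output much easier to use.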

Original post: http://blog.gxitsky.com/2025/04/12/AI-LangChain-009-Output-Parse/

Author: 光星
