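LLM Agent¶

LlmAgent (often simply called Agent) is a core component of ADK that acts as the "thinking" part of your application. It uses the power of a large language model (LLM) to reason, understand natural language, make decisions, generate responses, and interact with tools.

Unlike deterministic workflow agents, which follow predefined execution paths, an LlmAgent behaves non-deterministically. It uses the LLM to interpret instructions and context, and decides dynamically how to proceed, which tools to use (if any), or whether to transfer control to another agent.

Building an effective LlmAgent involves defining its identity, guiding its behavior clearly through instructions, and equipping it with the tools and capabilities it needs.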

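Defining the Agent's Identity and Purpose¶

First, establish what the agent is and what it is for.

name (required): Every agent needs a unique string identifier. This name is critical for internal operations, especially in multi-agent systems where agents need to refer to one another or delegate tasks. Choose a descriptive name that reflects the agent's function (for example, customer_support_router or billing_inquiry_agent). Avoid reserved names such as user.
description (optional, recommended for multi-agent systems): Provide a concise summary of the agent's capabilities. This description is used mainly by other LLM agents to decide whether they should route a task to this agent. Make it specific enough to distinguish the agent from its peers (for example, "Handles inquiries about current billing statements" rather than just "Billing agent").
model (required): Specify the underlying LLM that powers this agent's reasoning. This is a string identifier such as "gemini-2.0-flash". The choice of model affects the agent's capabilities, cost, and performance. See the Models documentation for available options and considerations.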

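# Example: defining the basic identity
from google.adk.agents import LlmAgent

capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country."
    # instruction and tools will be added next
)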
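The naming guidance above (unique, descriptive, not reserved) can be checked mechanically before any agents are constructed. The following is a minimal sketch, not part of ADK: `check_agent_names` and `RESERVED_NAMES` are hypothetical helpers, and the "plain identifier" rule is an assumption modeled on the snake_case names used in the examples.

```python
# Hypothetical pre-flight check for agent names; not an ADK API.
RESERVED_NAMES = {"user"}  # the text names "user"; extend as needed (assumption)


def check_agent_names(names: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means every name passes."""
    problems = []
    seen = set()
    for name in names:
        if name in RESERVED_NAMES:
            problems.append(f"{name!r} is reserved")
        # Assumption: names should look like plain identifiers (snake_case).
        if not name.isidentifier():
            problems.append(f"{name!r} is not a plain identifier")
        if name in seen:
            problems.append(f"{name!r} is used more than once")
        seen.add(name)
    return problems


print(check_agent_names(["customer_support_router", "billing_inquiry_agent"]))
print(check_agent_names(["user", "billing agent", "router", "router"]))
```

Running a check like this over every agent in a multi-agent system catches name collisions early, before one agent tries to delegate to another by name.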

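Guiding the Agent: Instructions (`instruction`)¶

The instruction parameter is arguably the most critical one for shaping an LlmAgent's behavior. It is a string (or a function that returns a string) that tells the agent:

Its core task or goal.
Its personality or persona (for example, "You are a helpful assistant" or "You are a witty pirate").
Constraints on its behavior (for example, "Only answer questions about X" or "Never reveal Y").
How and when to use its tools. Explain the purpose of each tool and the circumstances in which it should be called, supplementing any description within the tool itself.
The desired format of its output (for example, "Respond in JSON" or "Provide a bulleted list").

Tips for effective instructions:

Be clear and specific: Avoid ambiguity. State the desired actions and outcomes plainly.
Use Markdown: Use headings, lists, and similar structure to make complex instructions easier to read.
Provide examples (few-shot): For complex tasks or specific output formats, include examples directly in the instruction.
Guide tool use: Do not just list the tools; explain when and why the agent should use them.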

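# Example: adding an instruction
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country.
When a user asks for the capital of a country:
1. Identify the country name from the user's query.
2. Use the `get_capital_city` tool to find the capital.
3. Respond clearly to the user, stating the capital city.
Example Query: "What's the capital of France?"
Example Response: "The capital of France is Paris."
""",
    # tools will be added next
)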
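The tips above (Markdown structure, numbered steps, few-shot examples) are easier to apply consistently when the instruction text is assembled by a small helper. The sketch below is illustrative only: `build_instruction` is a hypothetical function, and the heading names it emits are assumptions rather than anything ADK requires. Its return value is a plain string, which is the form the instruction parameter accepts.

```python
# Hypothetical helper that assembles a Markdown-formatted instruction string.
def build_instruction(
    role: str,
    steps: list[str],
    examples: list[tuple[str, str]],
) -> str:
    """Combine a role, numbered steps, and few-shot examples into one string."""
    lines = ["# Role", role, "", "# Steps"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    if examples:
        lines += ["", "# Examples"]
        for query, response in examples:
            lines.append(f'- Query: "{query}"')
            lines.append(f'  Response: "{response}"')
    return "\n".join(lines)


instruction_text = build_instruction(
    role="You are an agent that provides the capital city of a country.",
    steps=[
        "Identify the country name from the user's query.",
        "Use the `get_capital_city` tool to find the capital.",
        "Respond clearly to the user, stating the capital city.",
    ],
    examples=[("What's the capital of France?", "The capital of France is Paris.")],
)
print(instruction_text)
```

Keeping steps and examples as data makes it simple to add a few-shot example or reorder steps without hand-editing a long triple-quoted string.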

(Note: for instructions that apply to every agent in the system, consider using global_instruction on the root agent, described in the Multi-Agents section.)

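Equipping the Agent: Tools (`tools`)¶

Tools give your LlmAgent capabilities beyond the LLM's built-in knowledge and reasoning. They let the agent interact with the outside world, perform calculations, fetch real-time data, or carry out specific actions.

tools (optional): Provide a list of tools the agent can use. Each item in the list can be:
- A Python function (automatically wrapped as a FunctionTool).
- An instance of a class that inherits from BaseTool.
- An instance of another agent (AgentTool, which enables agent-to-agent delegation; see Multi-Agents).

The LLM uses the function or tool name, the description (from the docstring or the description field), and the parameter schema to decide which tool to call, based on the conversation and its instructions.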

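# Define a tool function
def get_capital_city(country: str) -> str:
    """Retrieves the capital city of a given country."""
    # Replace with real logic (for example, an API call or a database lookup)
    capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}
    return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")

# Add the tool to the agent
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country... (previous instruction text)""",
    tools=[get_capital_city]  # Provide the function directly
)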
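As noted earlier in this section, the LLM chooses a tool from its name, its description, and its parameter schema. The sketch below uses only the standard library to show where those three pieces come from for a plain Python function. `describe_tool` is a hypothetical helper, and the dictionary it returns is an illustration, not the schema format ADK actually generates.

```python
import inspect


def get_capital_city(country: str) -> str:
    """Retrieves the capital city of a given country."""
    capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}
    return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")


def describe_tool(func) -> dict:
    """Collect the name, docstring, and annotated parameters of a function."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            # Use the type's name when available, else its string form.
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


print(describe_tool(get_capital_city))
```

The practical takeaway is that a vague function name, a missing docstring, or untyped parameters leave the model with little to go on, so it is worth writing all three with the model as the reader.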

Learn more about tools in the Tools section.

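Advanced Configuration & Control¶

Beyond the core parameters, LlmAgent offers several options for finer control:

Fine-Tuning LLM Generation (`generate_content_config`)¶

You can adjust how the underlying LLM generates responses using generate_content_config.

generate_content_config (optional): Pass an instance of google.genai.types.GenerateContentConfig to control parameters such as temperature (randomness), max_output_tokens (response length), top_p, top_k, and safety settings.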

from google.genai import types

agent = LlmAgent(
    # ... other parameters
    generate_content_config=types.GenerateContentConfig(
        temperature=0.2,  # More deterministic output
        max_output_tokens=250
    )
)
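To build intuition for why a low temperature such as 0.2 gives more deterministic output, the sketch below applies the standard temperature-scaled softmax to a made-up set of token scores. This is a conceptual illustration only: the logits are invented, `softmax_with_temperature` is a hypothetical helper, and the model's real sampling pipeline (including top_p and top_k) is internal to the service.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # This sketch only handles positive temperatures.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At lower temperatures almost all of the probability mass lands on the top-scoring candidate, so repeated runs tend to pick the same token; at higher temperatures the distribution flattens and outputs vary more.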

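Structured Data (`input_schema`, `output_schema`, `output_key`)¶

For scenarios that require structured data exchange, you can use Pydantic models.

input_schema (optional): Define a Pydantic BaseModel class that represents the expected input structure. If set, the user message content passed to this agent must be a JSON string conforming to this schema. Your instructions should guide the user or the preceding agent accordingly.
output_schema (optional): Define a Pydantic BaseModel class that represents the desired output structure. If set, the agent's final response must be a JSON string conforming to this schema.
- Constraint: Using output_schema enables controlled generation within the LLM but disables the agent's ability to use tools or transfer control to other agents. Your instructions must guide the LLM to produce JSON that matches the schema directly.
output_key (optional): Provide a string key. If set, the text content of the agent's final response is automatically saved to the session's state dictionary under this key (for example, session.state[output_key] = agent_response_text). This is useful for passing results between agents or between steps in a workflow.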

from pydantic import BaseModel, Field

class CapitalOutput(BaseModel):
    capital: str = Field(description="The capital of the country.")

structured_capital_agent = LlmAgent(
    # ... name, model, description
    instruction="""You are a capital information agent. Given a country, respond ONLY with a JSON object containing the capital. Format: {"capital": "capital_name"}""",
    output_schema=CapitalOutput,  # Enforce JSON output
    output_key="found_capital"  # Store the result in state['found_capital']
    # tools=[get_capital_city] cannot be used effectively here
)
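To make the two ideas above concrete, the sketch below shows what "a JSON string conforming to the schema" means and how a later step might read the value stored under an output key. It is a standard-library stand-in, not ADK code: the dataclass replaces the Pydantic model, `parse_capital_output` is a hypothetical helper, and the plain `state` dictionary only imitates session state.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CapitalOutput:
    capital: str


def parse_capital_output(response_text: str) -> CapitalOutput:
    """Parse response text and check that it matches the expected shape."""
    data = json.loads(response_text)  # raises json.JSONDecodeError on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(CapitalOutput)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    if not isinstance(data["capital"], str):
        raise ValueError("'capital' must be a string")
    return CapitalOutput(**data)


state = {}  # stand-in for the session state dictionary
agent_response_text = '{"capital": "Paris"}'  # invented final response
state["found_capital"] = agent_response_text  # what output_key="found_capital" would store

# A downstream step reads the stored text and parses it back into a typed object.
parsed = parse_capital_output(state["found_capital"])
print(parsed.capital)
```

Note that the value stored under the key is the response text, so a consumer that wants typed fields still has to parse it.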

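Managing Context (`include_contents`)¶

Control whether the agent receives the prior conversation history.

include_contents (optional, default: 'default'): Determines whether contents (the history) are sent to the LLM.
- 'default': The agent receives the relevant conversation history.
- 'none': The agent receives no prior contents. It operates solely on its current instruction and any input provided in the current turn (useful for stateless tasks or for enforcing a specific context).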
```
stateless_agent = LlmAgent(
    # ... other parameters
    include_contents='none'
)
```
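As a mental model for the two settings, the sketch below shows which messages would reach the LLM in each mode. It is deliberately simplified and is not how ADK is implemented: `select_contents` is a hypothetical function, messages are plain strings, and 'default' here forwards the entire history, whereas the text above only promises the "relevant" history.

```python
# Simplified model of include_contents; not ADK internals.
def select_contents(
    history: list[str],
    new_message: str,
    include_contents: str = "default",
) -> list[str]:
    """Return the messages that would be sent to the LLM for this turn."""
    if include_contents == "default":
        # Assumption: treat the whole history as relevant.
        return [*history, new_message]
    if include_contents == "none":
        # Only the current turn's input is visible.
        return [new_message]
    raise ValueError(f"unsupported include_contents value: {include_contents!r}")


history = ["user: What is the capital of France?", "agent: Paris."]
print(select_contents(history, "user: And of Japan?"))
print(select_contents(history, "user: And of Japan?", include_contents="none"))
```

With 'none', a follow-up such as "And of Japan?" arrives without the earlier exchange, so the instruction and the current input must carry everything the agent needs.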

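Planning & Code Execution¶

For more complex reasoning that involves multiple steps or executing code:

planner (optional): Assign a BasePlanner instance to enable multi-step reasoning and planning before execution. (See the Multi-Agents patterns section.)
code_executor (optional): Provide a BaseCodeExecutor instance to let the agent execute code blocks (for example, Python) found in the LLM's response. (See Tools/Built-in tools.)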

Putting It Together: Example¶

Here is the complete basic capital_agent example:

# Full example code for the basic capital agent
# --- Full example code demonstrating LlmAgent with Tools vs. Output Schema ---
import json # Needed for pretty printing dicts

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from pydantic import BaseModel, Field

# --- 1. Define Constants ---
APP_NAME = "agent_comparison_app"
USER_ID = "test_user_456"
SESSION_ID_TOOL_AGENT = "session_tool_agent_xyz"
SESSION_ID_SCHEMA_AGENT = "session_schema_agent_xyz"
MODEL_NAME = "gemini-2.0-flash"

# --- 2. Define Schemas ---

# Input schema used by both agents
class CountryInput(BaseModel):
    country: str = Field(description="The country to get information about.")

# Output schema ONLY for the second agent
class CapitalInfoOutput(BaseModel):
    capital: str = Field(description="The capital city of the country.")
    # Note: Population is illustrative; the LLM will infer or estimate this
    # as it cannot use tools when output_schema is set.
    population_estimate: str = Field(description="An estimated population of the capital city.")

# --- 3. Define the Tool (Only for the first agent) ---
def get_capital_city(country: str) -> str:
    """Retrieves the capital city of a given country."""
    print(f"\n-- Tool Call: get_capital_city(country='{country}') --")
    country_capitals = {
        "united states": "Washington, D.C.",
        "canada": "Ottawa",
        "france": "Paris",
        "japan": "Tokyo",
    }
    result = country_capitals.get(country.lower(), f"Sorry, I couldn't find the capital for {country}.")
    print(f"-- Tool Result: '{result}' --")
    return result

# --- 4. Configure Agents ---

# Agent 1: Uses a tool and output_key
capital_agent_with_tool = LlmAgent(
    model=MODEL_NAME,
    name="capital_agent_tool",
    description="Retrieves the capital city using a specific tool.",
    instruction="""You are a helpful agent that provides the capital city of a country using a tool.
The user will provide the country name in a JSON format like {"country": "country_name"}.
1. Extract the country name.
2. Use the `get_capital_city` tool to find the capital.
3. Respond clearly to the user, stating the capital city found by the tool.
""",
    tools=[get_capital_city],
    input_schema=CountryInput,
    output_key="capital_tool_result", # Store final text response
)

# Agent 2: Uses output_schema (NO tools possible)
structured_info_agent_schema = LlmAgent(
    model=MODEL_NAME,
    name="structured_info_agent_schema",
    description="Provides capital and estimated population in a specific JSON format.",
    instruction=f"""You are an agent that provides country information.
The user will provide the country name in a JSON format like {{"country": "country_name"}}.
Respond ONLY with a JSON object matching this exact schema:
{json.dumps(CapitalInfoOutput.model_json_schema(), indent=2)}
Use your knowledge to determine the capital and estimate the population. Do not use any tools.
""",
    # *** NO tools parameter here - using output_schema prevents tool use ***
    input_schema=CountryInput,
    output_schema=CapitalInfoOutput, # Enforce JSON output structure
    output_key="structured_info_result", # Store final JSON response
)

# --- 5. Set up Session Management and Runners ---
session_service = InMemorySessionService()

# Create separate sessions for clarity, though not strictly necessary if context is managed
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_TOOL_AGENT)
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_SCHEMA_AGENT)

# Create a runner for EACH agent
capital_runner = Runner(
    agent=capital_agent_with_tool,
    app_name=APP_NAME,
    session_service=session_service
)
structured_runner = Runner(
    agent=structured_info_agent_schema,
    app_name=APP_NAME,
    session_service=session_service
)

# --- 6. Define Agent Interaction Logic ---
async def call_agent_and_print(
    runner_instance: Runner,
    agent_instance: LlmAgent,
    session_id: str,
    query_json: str
):
    """Sends a query to the specified agent/runner and prints results."""
    print(f"\n>>> Calling Agent: '{agent_instance.name}' | Query: {query_json}")

    user_content = types.Content(role='user', parts=[types.Part(text=query_json)])

    final_response_content = "No final response received."
    async for event in runner_instance.run_async(user_id=USER_ID, session_id=session_id, new_message=user_content):
        # print(f"Event: {event.type}, Author: {event.author}") # Uncomment for detailed logging
        if event.is_final_response() and event.content and event.content.parts:
            # For output_schema, the content is the JSON string itself
            final_response_content = event.content.parts[0].text

    print(f"<<< Agent '{agent_instance.name}' Response: {final_response_content}")

    current_session = session_service.get_session(app_name=APP_NAME,
                                                  user_id=USER_ID,
                                                  session_id=session_id)
    stored_output = current_session.state.get(agent_instance.output_key)

    # Pretty print if the stored output looks like JSON (likely from output_schema)
    print(f"--- Session State ['{agent_instance.output_key}']: ", end="")
    try:
        # Attempt to parse and pretty print if it's JSON
        parsed_output = json.loads(stored_output)
        print(json.dumps(parsed_output, indent=2))
    except (json.JSONDecodeError, TypeError):
         # Otherwise, print as string
        print(stored_output)
    print("-" * 30)


# --- 7. Run Interactions ---
async def main():
    print("--- Testing Agent with Tool ---")
    await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "France"}')
    await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "Canada"}')

    print("\n\n--- Testing Agent with Output Schema (No Tool Use) ---")
    await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "France"}')
    await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "Japan"}')

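if __name__ == "__main__":
    import asyncio

    # A bare top-level `await main()` only works in environments that already
    # run an event loop (for example, a notebook); a script needs asyncio.run().
    asyncio.run(main())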

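(This example demonstrates the core concepts. More complex agents might incorporate schemas, context control, planning, and so on.)

While this page covers the core configuration of LlmAgent, several related concepts that provide more advanced control are detailed elsewhere:

Callbacks: Intercept execution points (before/after model calls, before/after tool calls) using before_model_callback, after_model_callback, and so on. See Callbacks.
Multi-agent control: Advanced strategies for agent interaction, including planning (planner), controlling agent transfer (disallow_transfer_to_parent, disallow_transfer_to_peers), and system-wide instructions (global_instruction). See Multi-Agents.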
