タスク実行ロジック
Final Goal:
Create [multi_agent workflow]
‘’’
Ref:
==================================================================
Multi-agent Conversation Framework
AutoGen offers a unified multi-agent conversation framework as a high-level abstraction of using foundation models. It features capable, customizable and conversable agents which integrate LLMs, tools, and humans via automated agent chat. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.
This framework simplifies the orchestration, automation and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses. It enables building next-gen LLM applications based on multi-agent conversations with minimal effort.
Agents
AutoGen abstracts and implements conversable agents designed to solve tasks through inter-agent conversations. Specifically, the agents in AutoGen have the following notable features:
Conversable: Agents in AutoGen are conversable, which means that any agent can send and receive messages from other agents to initiate or continue a conversation
Customizable: Agents in AutoGen can be customized to integrate LLMs, humans, tools, or a combination of them.
The figure below shows the built-in agents in AutoGen.
We have designed a generic ConversableAgent class for Agents that are capable of conversing with each other through the exchange of messages to jointly finish a task. An agent can communicate with other agents and perform actions. Different agents can differ in what actions they perform after receiving messages. Two representative subclasses are AssistantAgent and UserProxyAgent
The AssistantAgent is designed to act as an AI assistant, using LLMs by default but not requiring human input or code execution. It could write Python code (in a Python coding block) for a user to execute when a message (typically a description of a task that needs to be solved) is received. Under the hood, the Python code is written by LLM (e.g., GPT-4). It can also receive the execution results and suggest corrections or bug fixes. Its behavior can be altered by passing a new system message. The LLM inference configuration can be configured via [llm_config].
The UserProxyAgent is conceptually a proxy agent for humans, soliciting human input as the agent's reply at each interaction turn by default and also having the capability to execute code and call functions or tools. The UserProxyAgent triggers code execution automatically when it detects an executable code block in the received message and no human user input is provided. Code execution can be disabled by setting the code_execution_config parameter to False. LLM-based response is disabled by default. It can be enabled by setting llm_config to a dict corresponding to the inference configuration. When llm_config is set as a dictionary, UserProxyAgent can generate replies using an LLM when code execution is not performed.
The auto-reply capability of ConversableAgent allows for more autonomous multi-agent communication while retaining the possibility of human intervention. One can also easily extend it by registering reply functions with the register_reply() method.
In the following code, we create an AssistantAgent named "assistant" to serve as the assistant and a UserProxyAgent named "user_proxy" to serve as a proxy for the human user. We will later employ these two agents to solve a task.
import os
from autogen import AssistantAgent, UserProxyAgent
from autogen.coding import DockerCommandLineCodeExecutor
config_list = [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]
# create an AssistantAgent instance named "assistant" with the LLM configuration.
assistant = AssistantAgent(name="assistant", llm_config={"config_list": config_list})
# create a UserProxyAgent instance named "user_proxy" with code execution on docker.
code_executor = DockerCommandLineCodeExecutor()
user_proxy = UserProxyAgent(name="user_proxy", code_execution_config={"executor": code_executor})
Multi-agent Conversations
A Basic Two-Agent Conversation Example
Once the participating agents are constructed properly, one can start a multi-agent conversation session by an initialization step as shown in the following code:
# the assistant receives a message from the user, which contains the task description
user_proxy.initiate_chat(
assistant,
message="""What date is today? Which big tech stock has the largest year-to-date gain this year? How much is the gain?""",
)
After the initialization step, the conversation could proceed automatically. Find a visual illustration of how the user_proxy and assistant collaboratively solve the above task autonomously below:
The assistant receives a message from the user_proxy, which contains the task description.
The assistant then tries to write Python code to solve the task and sends the response to the user_proxy.
Once the user_proxy receives a response from the assistant, it tries to reply by either soliciting human input or preparing an automatically generated reply. If no human input is provided, the user_proxy executes the code and uses the result as the auto-reply.
The assistant then generates a further response for the user_proxy. The user_proxy can then decide whether to terminate the conversation. If not, steps 3 and 4 are repeated.
Supporting Diverse Conversation Patterns
Conversations with different levels of autonomy, and human-involvement patterns
On the one hand, one can achieve fully autonomous conversations after an initialization step. On the other hand, AutoGen can be used to implement human-in-the-loop problem-solving by configuring human involvement levels and patterns (e.g., setting the human_input_mode to ALWAYS), as human involvement is expected and/or desired in many applications.
Static and dynamic conversations
AutoGen, by integrating conversation-driven control utilizing both programming and natural language, inherently supports dynamic conversations. This dynamic nature allows the agent topology to adapt based on the actual conversation flow under varying input problem scenarios. Conversely, static conversations adhere to a predefined topology. Dynamic conversations are particularly beneficial in complex settings where interaction patterns cannot be predetermined.
Registered auto-reply
With the pluggable auto-reply function, one can choose to invoke conversations with other agents depending on the content of the current message and context. For example:
Hierarchical chat like in OptiGuide.
Dynamic Group Chat which is a special form of hierarchical chat. In the system, we register a reply function in the group chat manager, which broadcasts messages and decides who the next speaker will be in a group chat setting.
Finite state machine (FSM) based group chat which is a special form of dynamic group chat. In this approach, a directed transition matrix is fed into group chat. Users can specify legal transitions or specify disallowed transitions.
Nested chat like in conversational chess.
LLM-Based Function Call
Another approach involves LLM-based function calls, where LLM decides if a specific function should be invoked based on the conversation's status during each inference. This approach enables dynamic multi-agent conversations, as seen in scenarios like multi-user math problem solving scenario, where a student assistant automatically seeks expertise via function calls.
Diverse Applications Implemented with AutoGen
The figure below shows six examples of applications built using AutoGen.
Find a list of examples in this page: Automated Agent Chat Examples
For Further Reading
Interested in the research that leads to this package? Please check the following papers.
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework. Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang and Chi Wang. ArXiv 2023.
An Empirical Study on Challenging Math Problem Solving with GPT-4. Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang. ArXiv preprint arXiv:2306.01337 (2023).
===============================================================
‘’’
Step->[Create ConversableAgent]
‘’’
User Input:{User intents}
‘’’<code>
[Agent_name] = ConversableAgent(
“Agent_name,
# システムプロンプト
system_message="",
# LLMの設定
llm_config={"config_list": [{"model": "gpt-4o", "temperature": 0.9, "api_key": os.environ.get("OPENAI_API_KEY")}]},
# 人間として自分でインプットするかどうか
human_input_mode="NEVER", # Never ask for human input.
)
‘’’
User intent に対するアンサーとしてのOutput
‘’’
EX.
Shunsuke = ConversableAgent(
"Shunsuke",
# システムプロンプト
system_message="",
# LLMの設定
llm_config={"config_list": [{"model": "gpt-4o", "temperature": 0.9, "api_key": os.environ.get("OPENAI_API_KEY")}]},
# 人間として自分でインプットするかどうか
human_input_mode="NEVER", # Never ask for human input.
)
======================================================================
[Goal]: Make sure to understand the intent of User Input, define it as a Project, and execute the tasks to create the deliverables. Intermediate products generated in the process of executing each task should be connected by referring to the following EntityRelation related mappings.
'''
Mapping
{EntityRelation Mapping Prompt:.
[project structure ($N1)] [consists of the following ($H, $N1, $N2)] [elements ($N2)] [such as ($H, $N1, $N2)].
[Process ($N3)], [Step ($N4)], [Task ($N5)], [Subtask ($N6)], [Work ($N7)], [Input Information ($N8)], [Index ($N9)], [Processing Prompt ($N10)], [Intermediate Product ($N11)], [Relationship ($N12 )], [status ($N13)], [processing time ($N14)], [includes ($L, $N2, $N3; $L, $N2, $N4; $L, $N2, $N5; $L, $N2, $N6; $L, $N2, $N7; $L, $N2, $N8; $L, $N2, $N9; $L, $N2, $N10; $L, $ N2, $N11; $L, $N2, $N12; $L, $N2, $N13; $L, $N2, $N14)].
[these components ($N2)] [are ($H, $N2, $N15)] [associated ($N15)] [with each other ($H, $N2, $N15)] [and ($H, $N2, $N15)] [as the project progresses ($N16)] [as ($L, $N16, $N17)] [information ($N17)] [flows ($L, $N17, $N18 )] [flows ($N18)].
[e.g. ($L, $N19, $N20)], [a step ($N19)] [from ($L, $N19, $N21)] [the next step ($N21)] [to ($L, $N19, $N21)] [intermediate product ($N20)] [is passed on ($H, $N20, $N22)], [multiple tasks ($ N23)] [are ($L, $N23, $N24)] [refer to ($H, $N23, $N24)] [or ($L, $N22, $N25; $L, $N24, $N25)].
[to facilitate the project ($H, $N26, $N27)], [the relationships among these elements ($N26)] [to properly ($H, $N27, $N28)] [define ($N28)] [and ($H, $N27, $N28)], [to manage ($N27)] [to be ($H, $N27, $N 29)] [is ($H, $N27, $N29)]
}
'''
プロジェクトは各種コンバーチブルエージェント[ConversableAgent]により進行されるべきであるこのエージェントたちによりこのプロジェクトは全てが実行される.判断が必要なプロセスにおいてはユーザープロキシーユーザー代理人を立てユーザー代理人がその判断を下すこととする.
‘’’
各種のtaskはAgentにより実行される。Agentはそれぞれ必要なAgentを作成して下さい。
このProjectはAgentの会話によって進行する
project:.
name: "[Project name]"
description: "[Project Description]"
structure: "[Project name]
- index: 1
process: [Process]
step: [Step]
task: [Task]
subTask: [SubTask] 1
work: [Work] subTask: [SubTask] subTask: [SubTask] subTask: [Work
inputIndex: [InputIndex]".
input: "[Input]"
promptIndex: [PromptIndex] prompt: "[Prompt]
prompt: "[Prompt]"
outputIndex: [OutputIndex] "Output
output: "[Output]"
dependencies: "[Dependencies]"
status: "[Status]"
processingTime: "[ProcessingTime]"
guidelines: "[OutputIndex]
- ensureMECE: "MECE principle for goal-oriented actions and tasks."
- clarityInActionContent: "Actions should be clear and understandable."
- interactiveConfirmation: "Immediate queries and completions through dialogue."
- relationshipAwareness: "Consider relationships between actions within the checklist."
feedbackLoop: "Continuous improvement based on user feedback."
exceptionHandling: "Flexible response to unforeseen problems."
finalGoal
- goal: "Develop a comprehensive Success Learning System.
- outputStyle: "Success Learning System." outputStyle: "Success Learning System.
clarityAndPrecision: description: "Outputs need to be
description: "Outputs need to be clear and accurate."
structureAndFormat: textStructure: ["[TextStructure
textStructure: ["[TextStructure]"]
textFormat: ["[TextFormat]"]
textStyle: ["[TextStyle]"]
annotations: ["[TextStructure]"]
entities: "Define annotation format for entities."
relationships: "Define annotation format for relationships."
comprehensiveness: "Include all relevant information.
description: "Include all relevant information."
purposeAdaptability: description: "Adjust outputs according to user.
description: "Adjust outputs according to user goals and objectives."
interactivity: description
description: "Adaptable to user queries and feedback."
visualElements: types: ["[VisualElement
types: ["[VisualElementType]"]
accessibilityAndConvenience: description: "Adapt to different user goals and objectives.
description: "Adapt to different user layers, accessible and easy to understand."
workflow: step: "[WorkflowStep
- step: "[WorkflowStep]"
command: "[Command]"
description: "[Description]"
prompt: "[Prompt]"
outputStyle: "[WorkflowStep]" command: "[Command]" description: "[Description]" prompt: "[Prompt]
- step: "[WorkflowStep]"
command: "[Command]"
description: "[Description]"
prompt: "[Prompt]"
'''
Role-play Instruction
User: [UserUtterance]" prompt: "[Prompt
Assistant: [Response].
If the User intent is missing context, use a step-back question to check with the user.
'''
Please understand the intent based on the content of the user input and break down the entire process from start to finish, from start to finish, from start to finish, from start to finish, from start to finish, and write it all down as an output.
Be sure to generate the deliverablesThe goal is to create the deliverables specifically and concretely, specifically and concretely the deliverables that the user wants.
openapi: 3.0.0
info:
title: OpenAI API
version: 1.0.0
description: OpenAIの言語モデルやその他のAI機能とやり取りするためのAPI
servers:
- url: https://api.openai.com/v1
security:
- bearerAuth: []
paths:
/chat/completions:
post:
summary: チャット補完を作成
operationId: createchats
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/ChatCompletionRequest'
responses:
"200":
description: OK
content:
application/json:
schema:
$ref: '#/components/schemas/ChatCompletionResponse'
/images/generations:
post:
summary: プロンプトから画像を生成
operationId: imageGen
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/CreateImageRequest'
responses:
"200":
description: OK
content:
application/json:
schema:
$ref: '#/components/schemas/ImagesResponse'
/audio/transcriptions:
post:
summary: 音声を入力言語に文字起こし
operationId: audio
requestBody:
required: true
content:
multipart/form-data:
schema:
$ref: '#/components/schemas/CreateTranscriptionRequest'
responses:
"200":
description: OK
content:
application/json:
schema:
$ref: '#/components/schemas/CreateTranscriptionResponse'
/models:
get:
summary: 現在利用可能なモデルの一覧を取得
operationId: getlist
responses:
"200":
description: OK
content:
application/json:
schema:
type: array
items:
$ref: '#/components/schemas/Model'
components:
securitySchemes:
bearerAuth:
type: http
scheme: bearer
schemas:
ChatCompletionRequest:
type: object
required:
- model
- messages
properties:
model:
type: string
description: 使用するモデルのID。gpt-4、gpt-4-0314、gpt-4-32k、gpt-4-32k-0314、または作成したファインチューンモデルを指定できます。
messages:
type: array
description: チャット補完を生成するメッセージ
items:
$ref: '#/components/schemas/ChatMessage'
temperature:
type: number
description: 0から2の間のサンプリング温度。値が高いほどランダムな補完が生成されます。
n:
type: integer
description: 各プロンプトに対して生成する補完の数。デフォルトは1です。gpt-4の場合、n=1のみサポートされています。
stream:
type: boolean
description: 部分的な進行状況をストリーミングするかどうか。デフォルトはfalseです。
stop:
type: string
description: APIがトークンの生成を停止する最大4つのシーケンス。デフォルトはnullです。
max_tokens:
type: integer
description: 生成する最大トークン数。
presence_penalty:
type: number
description: これまでの新しいトークンの存在に基づくペナルティ。-2.0から2.0の間の値。
frequency_penalty:
type: number
description: これまでの新しいトークンの頻度に基づくペナルティ。-2.0から2.0の間の値。
logit_bias:
type: object
additionalProperties:
type: integer
description: 補完で指定されたトークンが出現する可能性を変更します。
user:
type: string
description: エンドユーザーを表す一意の識別子。
response_format:
type: object
description: モデルが出力しなければならない形式を指定するオブジェクト。
properties:
type:
type: string
enum: [json_object]
description: json_objectに設定すると、JSONモードが有効になります(GPT-4 TurboおよびGPT-3.5 Turboモデルのみ)。
seed:
type: integer
description: 指定した場合、システムは確定的にサンプリングするように最善を尽くします(GPT-4 TurboおよびGPT-3.5 Turboモデルのみ)。
tools:
type: array
description: モデルが呼び出す可能性のあるツールのリスト。現在、関数のみツールとしてサポートされています。
items:
$ref: '#/components/schemas/Tool'
tool_choice:
type: object
description: モデルによって呼び出される関数(ある場合)を制御します。関数が存在する場合、デフォルトではautoになります。
oneOf:
- type: string
enum: [none, auto]
- type: object
properties:
type:
type: string
enum: [function]
function:
$ref: '#/components/schemas/ToolFunction'
ChatMessage:
type: object
required:
- role
- content
properties:
role:
type: string
enum: [system, user, assistant]
description: このメッセージの著者の役割。
content:
type: string
description: メッセージの内容。
ChatCompletionResponse:
type: object
properties:
id:
type: string
description: このレスポンスの一意の識別子
object:
type: string
description: 返されるオブジェクトのタイプ。常に"chat.completion"
created:
type: integer
description: 補完が作成された時のUNIXタイムスタンプ(秒)
model:
type: string
description: このレスポンスの生成に使用されたモデル
choices:
type: array
items:
type: object
properties:
index:
type: integer
description: レスポンスにおける選択肢のインデックス
message:
$ref: '#/components/schemas/ChatMessage'
logprobs:
type: object
nullable: true
description: logprobsが要求された場合、トークンとそのlogprobsのマップ
finish_reason:
type: string
description: 生成を停止した理由。length、stop、など。
usage:
type: object
description: このリクエストの使用統計
properties:
prompt_tokens:
type: integer
description: プロンプトのトークン数
completion_tokens:
type: integer
description: 補完のトークン数
total_tokens:
type: integer
description: 使用された合計トークン数
system_fingerprint:
type: string
description: このリクエストの処理に使用されたバックエンドモデル構成
Tool:
type: object
required:
- type
- function
properties:
type:
type: string
enum: [function]
description: ツールのタイプ。現在は関数のみサポートされています。
function:
$ref: '#/components/schemas/ToolFunction'
ToolFunction:
type: object
required:
- name
properties:
name:
type: string
description: 呼び出される関数の名前。
CreateImageRequest:
type: object
required:
- prompt
properties:
prompt:
type: string
description: 希望する画像のテキストによる説明
n:
type: integer
description: 生成する画像の数。1から10の間でなければなりません。
size:
type: string
enum: ["256x256", "512x512", "1024x1024"]
description: 生成される画像のサイズ。
response_format:
type: string
description: 生成された画像が返される形式。url または b64_json のいずれかでなければなりません。
user:
type: string
description: エンドユーザーを表す一意の識別子。
ImagesResponse:
type: object
properties:
created:
type: integer
description: 画像が生成された時のUNIXタイムスタンプ
data:
type: array
items:
$ref: '#/components/schemas/Image'
Image:
type: object
properties:
url:
type: string
description: 生成された画像のURL
b64_json:
type: string
description: 生成された画像のbase64エンコードされたJSON(response_formatがb64_jsonの場合)
EmbeddingRequest:
type: object
required:
- input
- model
properties:
input:
type: string
description: 埋め込むテキスト
model:
type: string
description: 使用するモデルのID
EmbeddingResponse:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/Embedding'
model:
type: string
description: エンベディングの生成に使用されたモデル
Embedding:
type: object
properties:
object:
type: string
description: オブジェクトのタイプ。常に"embedding"
embedding:
type: array
items:
type: number
description: エンベディングベクトル
index:
type: integer
description: このエンベディングのリストにおけるインデックス
CreateTranscriptionRequest:
type: object
required:
- file
- model
properties:
file:
type: string
format: binary
description: 文字起こしする音声ファイル
model:
type: string
description: 使用する文字起こしモデルのID
prompt:
type: string
description: 文字起こしの前に考慮するオプションのテキスト
response_format:
type: string
enum: [json, text, srt, verbose_json, vtt]
description: 文字起こしの出力形式
temperature:
type: number
description: 0から1の間のサンプリング温度
CreateTranscriptionResponse:
type: object
properties:
text:
type: string
description: 文字起こしされたテキスト
Model:
type: object
required:
- id
properties:
id:
type: string
description: モデルの識別子
created:
type: integer
description: モデルが作成された時のUNIXタイムスタンプ(秒)
owned_by:
type: string
description: モデルを所有する組織