Add Finance Agent Example (#1752)

Signed-off-by: minmin-intel <minmin.hou@intel.com>
Signed-off-by: Rita Brugarolas <rita.brugarolas.brufau@intel.com>
Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lkk <33276950+lkk12014402@users.noreply.github.com>
Co-authored-by: lkk12014402 <kaokao.lv@intel.com>
minmin-intel
2025-04-13 23:27:07 -07:00
committed by GitHub
parent 72ce335663
commit 1852e6bcc3
26 changed files with 2454 additions and 0 deletions

FinanceAgent/README.md

@@ -0,0 +1,145 @@
# Finance Agent
## 1. Overview
The architecture of this Finance Agent example is shown in the figure below. The agent has 3 main functions:
1. Summarize long financial documents and provide key points.
2. Answer questions over financial documents, such as SEC filings.
3. Conduct research on a public company and provide an investment report.
![Finance Agent Architecture](assets/finance_agent_arch.png)
The `dataprep` microservice can ingest financial documents in two formats (see the ingestion sketch after the notes below):
1. PDF documents stored locally, such as SEC filings saved in a local directory.
2. URLs, such as earnings call transcripts ([example](https://www.fool.com/earnings/call-transcripts/2025/03/06/costco-wholesale-cost-q2-2025-earnings-call-transc/)) and online SEC filings ([example](https://investors.3m.com/financials/sec-filings/content/0000066740-25-000006/0000066740-25-000006.pdf)).
Please note:
1. Each financial document should be about one company.
2. URLs ending in `.htm` are not supported.
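For reference, ingestion ultimately goes through the dataprep API. Below is a minimal Python sketch of ingesting a URL, mirroring `tests/test_redis_finance.py`; it assumes the dataprep service is already up on port 6007 (see Section 3.2), with `localhost` as a placeholder host.
```python
import json

import requests

# Placeholder host; use the host IP of the dataprep service.
url = "http://localhost:6007/v1/dataprep/ingest"
link_list = [
    "https://www.fool.com/earnings/call-transcripts/2025/03/06/costco-wholesale-cost-q2-2025-earnings-call-transc/",
]
# The ingest endpoint expects the URL list as a JSON-encoded form field.
resp = requests.post(url=url, data={"link_list": json.dumps(link_list)})
resp.raise_for_status()
print(resp.text)
```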
## 2. Getting started
### 2.1 Download repos
```bash
mkdir /path/to/your/workspace/
export WORKDIR=/path/to/your/workspace/
cd $WORKDIR
git clone https://github.com/opea-project/GenAIComps.git
git clone https://github.com/opea-project/GenAIExamples.git
```
### 2.2 Set up env vars
```bash
export HF_CACHE_DIR=/path/to/your/model/cache/
export HF_TOKEN=<your-hf-token>
```
### 2.3 Build docker images
Build the Docker images for dataprep, agent, and agent-ui.
```bash
cd GenAIExamples/FinanceAgent/docker_image_build
docker compose -f build.yaml build --no-cache
```
If deploying on Gaudi, you also need to build the vLLM image.
```bash
cd $WORKDIR
git clone https://github.com/HabanaAI/vllm-fork.git
# get the latest release tag of vllm gaudi
cd vllm-fork
VLLM_VER=$(git describe --tags "$(git rev-list --tags --max-count=1)")
echo "Check out vLLM tag ${VLLM_VER}"
git checkout ${VLLM_VER}
docker build --no-cache -f Dockerfile.hpu -t opea/vllm-gaudi:latest --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
```
## 3. Deploy with docker compose
### 3.1 Launch vllm endpoint
Below is the command to launch a vLLM endpoint on Gaudi that serves the `meta-llama/Llama-3.3-70B-Instruct` model on 4 Gaudi cards.
```bash
cd $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/intel/hpu/gaudi
bash launch_vllm.sh
```
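Once the endpoint is up, you can sanity-check it. Below is a minimal sketch, assuming vLLM's OpenAI-compatible API is exposed on port 8086 as configured in this example; replace `localhost` with the Gaudi host IP if running remotely.
```python
import requests

# Port 8086 matches the vLLM service port mapping in this example's compose file.
base = "http://localhost:8086"
print(requests.get(f"{base}/health").status_code)  # expect 200 once the model is loaded
resp = requests.post(
    f"{base}/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```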
### 3.2 Prepare knowledge base
The commands below will upload some example files into the knowledge base. You can also upload files through the UI.
First, launch the Redis databases and the dataprep microservice.
```bash
# inside $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/intel/hpu/gaudi/
bash launch_dataprep.sh
```
Validate data ingestion and retrieval from the database:
```bash
python $WORKDIR/GenAIExamples/FinanceAgent/tests/test_redis_finance.py --port 6007 --test_option ingest
python $WORKDIR/GenAIExamples/FinanceAgent/tests/test_redis_finance.py --port 6007 --test_option get
```
### 3.3 Launch the multi-agent system
```bash
# inside $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/intel/hpu/gaudi/
bash launch_agents.sh
```
### 3.4 Validate agents
FinQA Agent:
```bash
export agent_port="9095"
prompt="What is Gap's revenue in 2024?"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port
```
Research Agent:
```bash
export agent_port="9096"
prompt="generate NVDA financial research report"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port --tool_choice "get_current_date" --tool_choice "get_share_performance"
```
Supervisor ReAct Agent:
```bash
export agent_port="9090"
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --stream
```
Supervisor ReAct Agent (multi-turn):
```bash
python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --multi-turn --stream
```
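Under the hood, `tests/test.py` simply POSTs to each agent's `/v1/chat/completions` endpoint. Below is a minimal direct call, assuming the supervisor agent is reachable on port 9090 on `localhost`:
```python
import requests

# Payload shape mirrors tests/test.py; host and port assume the supervisor agent above.
url = "http://localhost:9090/v1/chat/completions"
query = {"role": "user", "messages": "What was Gap's revenue growth in 2024?", "stream": "false"}
resp = requests.post(url, json=query, proxies={"http": ""})
resp.raise_for_status()
print(resp.json()["text"])
```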
## 4. How to interact with the agent system through the UI
The UI microservice is launched in the previous step along with the other microservices.
To access the UI, open a web browser to `http://${ip_address}:5175`. Note that `ip_address` here is the host IP of the UI microservice.
1. Create an admin account with a random value.
2. Use an OPEA agent endpoint, for example the `Research Agent` endpoint `http://$ip_address:9096/v1`, which is an OpenAI-compatible API.
![opea-agent-setting](assets/opea-agent-setting.png)
3. Test the OPEA agent with the UI.
![opea-agent-test](assets/opea-agent-test.png)


@@ -0,0 +1,133 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
worker-finqa-agent:
image: opea/agent:latest
container_name: finqa-agent-endpoint
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
- ${PROMPT_PATH}:/home/user/prompts/
ports:
- "9095:9095"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_llama
with_memory: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${TEMPERATURE}
max_new_tokens: ${MAX_TOKENS}
stream: false
tools: /home/user/tools/finqa_agent_tools.yaml
custom_prompt: /home/user/prompts/finqa_prompt.py
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
REDIS_URL_VECTOR: $REDIS_URL_VECTOR
REDIS_URL_KV: $REDIS_URL_KV
TEI_EMBEDDING_ENDPOINT: $TEI_EMBEDDING_ENDPOINT
port: 9095
worker-research-agent:
image: opea/agent:latest
container_name: research-agent-endpoint
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
- ${PROMPT_PATH}:/home/user/prompts/
ports:
- "9096:9096"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_llama
with_memory: false
recursion_limit: ${recursion_limit_worker}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
stream: false
tools: /home/user/tools/research_agent_tools.yaml
custom_prompt: /home/user/prompts/research_prompt.py
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
FINNHUB_API_KEY: ${FINNHUB_API_KEY}
FINANCIAL_DATASETS_API_KEY: ${FINANCIAL_DATASETS_API_KEY}
port: 9096
supervisor-react-agent:
image: opea/agent:latest
container_name: supervisor-agent-endpoint
depends_on:
- worker-finqa-agent
# - worker-research-agent
volumes:
- ${TOOLSET_PATH}:/home/user/tools/
- ${PROMPT_PATH}:/home/user/prompts/
ports:
- "9090:9090"
ipc: host
environment:
ip_address: ${ip_address}
strategy: react_llama
with_memory: true
recursion_limit: ${recursion_limit_supervisor}
llm_engine: vllm
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
llm_endpoint_url: ${LLM_ENDPOINT_URL}
model: ${LLM_MODEL_ID}
temperature: ${TEMPERATURE}
max_new_tokens: ${MAX_TOKENS}
stream: true
tools: /home/user/tools/supervisor_agent_tools.yaml
custom_prompt: /home/user/prompts/supervisor_prompt.py
require_human_feedback: false
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
WORKER_FINQA_AGENT_URL: $WORKER_FINQA_AGENT_URL
WORKER_RESEARCH_AGENT_URL: $WORKER_RESEARCH_AGENT_URL
DOCSUM_ENDPOINT: $DOCSUM_ENDPOINT
REDIS_URL_VECTOR: $REDIS_URL_VECTOR
REDIS_URL_KV: $REDIS_URL_KV
TEI_EMBEDDING_ENDPOINT: $TEI_EMBEDDING_ENDPOINT
port: 9090
docsum-vllm-gaudi:
image: opea/llm-docsum:latest
container_name: docsum-vllm-gaudi
ports:
- ${DOCSUM_PORT:-9000}:9000
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LLM_ENDPOINT: ${LLM_ENDPOINT}
LLM_MODEL_ID: ${LLM_MODEL_ID}
HF_TOKEN: ${HF_TOKEN}
LOGFLAG: ${LOGFLAG:-False}
MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
DocSum_COMPONENT_NAME: ${DocSum_COMPONENT_NAME:-OpeaDocSumvLLM}
restart: unless-stopped
agent-ui:
image: opea/agent-ui:latest
container_name: agent-ui
environment:
host_ip: ${host_ip}
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
ports:
- "5175:8080"
ipc: host


@@ -0,0 +1,82 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
tei-embedding-serving:
image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
container_name: tei-embedding-serving
entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
ports:
- "${TEI_EMBEDDER_PORT:-10221}:80"
volumes:
- "./data:/data"
shm_size: 1g
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
host_ip: ${host_ip}
HF_TOKEN: ${HF_TOKEN}
healthcheck:
test: ["CMD", "curl", "-f", "http://${host_ip}:${TEI_EMBEDDER_PORT}/health"]
interval: 10s
timeout: 6s
retries: 48
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
ports:
- "${REDIS_PORT1:-6379}:6379"
- "${REDIS_PORT2:-8001}:8001"
environment:
- no_proxy=${no_proxy}
- http_proxy=${http_proxy}
- https_proxy=${https_proxy}
healthcheck:
test: ["CMD", "redis-cli", "ping"]
timeout: 10s
retries: 3
start_period: 10s
redis-kv-store:
image: redis/redis-stack:7.2.0-v9
container_name: redis-kv-store
ports:
- "${REDIS_PORT3:-6380}:6379"
- "${REDIS_PORT4:-8002}:8001"
environment:
- no_proxy=${no_proxy}
- http_proxy=${http_proxy}
- https_proxy=${https_proxy}
healthcheck:
test: ["CMD", "redis-cli", "ping"]
timeout: 10s
retries: 3
start_period: 10s
dataprep-redis-finance:
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
container_name: dataprep-redis-server-finance
depends_on:
redis-vector-db:
condition: service_healthy
redis-kv-store:
condition: service_healthy
tei-embedding-serving:
condition: service_healthy
ports:
- "${DATAPREP_PORT:-6007}:5000"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
DATAPREP_COMPONENT_NAME: ${DATAPREP_COMPONENT_NAME}
REDIS_URL_VECTOR: ${REDIS_URL_VECTOR}
REDIS_URL_KV: ${REDIS_URL_KV}
TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
LLM_ENDPOINT: ${LLM_ENDPOINT}
LLM_MODEL: ${LLM_MODEL}
HUGGINGFACEHUB_API_TOKEN: ${HF_TOKEN}
HF_TOKEN: ${HF_TOKEN}
LOGFLAG: true


@@ -0,0 +1,36 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
export ip_address=$(hostname -I | awk '{print $1}')
export HUGGINGFACEHUB_API_TOKEN=${HF_TOKEN}
export TOOLSET_PATH=$WORKDIR/GenAIExamples/FinanceAgent/tools/
echo "TOOLSET_PATH=${TOOLSET_PATH}"
export PROMPT_PATH=$WORKDIR/GenAIExamples/FinanceAgent/prompts/
echo "PROMPT_PATH=${PROMPT_PATH}"
export recursion_limit_worker=12
export recursion_limit_supervisor=10
vllm_port=8086
export LLM_MODEL_ID="meta-llama/Llama-3.3-70B-Instruct"
export LLM_ENDPOINT_URL="http://${ip_address}:${vllm_port}"
export TEMPERATURE=0.5
export MAX_TOKENS=4096
export WORKER_FINQA_AGENT_URL="http://${ip_address}:9095/v1/chat/completions"
export WORKER_RESEARCH_AGENT_URL="http://${ip_address}:9096/v1/chat/completions"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export TEI_EMBEDDING_ENDPOINT="http://${ip_address}:10221"
export REDIS_URL_VECTOR="redis://${ip_address}:6379"
export REDIS_URL_KV="redis://${ip_address}:6380"
export MAX_INPUT_TOKENS=2048
export MAX_TOTAL_TOKENS=4096
export DocSum_COMPONENT_NAME="OpeaDocSumvLLM"
export DOCSUM_ENDPOINT="http://${ip_address}:9000/v1/docsum"
export FINNHUB_API_KEY=${FINNHUB_API_KEY}
export FINANCIAL_DATASETS_API_KEY=${FINANCIAL_DATASETS_API_KEY}
docker compose -f compose.yaml up -d


@@ -0,0 +1,15 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
export ip_address=$(hostname -I | awk '{print $1}')
export host_ip=${ip_address}
export DATAPREP_PORT="6007"
export TEI_EMBEDDER_PORT="10221"
export REDIS_URL_VECTOR="redis://${ip_address}:6379"
export REDIS_URL_KV="redis://${ip_address}:6380"
export LLM_MODEL=${model:-"meta-llama/Llama-3.3-70B-Instruct"}
export LLM_ENDPOINT="http://${ip_address}:${vllm_port:-8086}"
export DATAPREP_COMPONENT_NAME="OPEA_DATAPREP_REDIS_FINANCE"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export TEI_EMBEDDING_ENDPOINT="http://${ip_address}:${TEI_EMBEDDER_PORT}"
docker compose -f dataprep_compose.yaml up -d


@@ -0,0 +1,7 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
export LLM_MODEL_ID="meta-llama/Llama-3.3-70B-Instruct"
export MAX_LEN=16384
docker compose -f vllm_compose.yaml up -d


@@ -0,0 +1,35 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
vllm-service:
image: ${REGISTRY:-opea}/vllm-gaudi:${TAG:-latest}
container_name: vllm-gaudi-server
ports:
- "8086:8000"
volumes:
- ${HF_CACHE_DIR}:/data
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
HF_TOKEN: ${HF_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${HF_TOKEN}
HF_HOME: /data
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
LLM_MODEL_ID: ${LLM_MODEL_ID}
VLLM_TORCH_PROFILER_DIR: "/mnt"
VLLM_SKIP_WARMUP: true
PT_HPU_ENABLE_LAZY_COLLECTIVES: true
healthcheck:
test: ["CMD-SHELL", "curl -f http://$host_ip:8086/health || exit 1"]
interval: 10s
timeout: 10s
retries: 100
runtime: habana
cap_add:
- SYS_NICE
ipc: host
command: --model $LLM_MODEL_ID --tensor-parallel-size 4 --host 0.0.0.0 --port 8000 --max-seq-len-to-capture $MAX_LEN


@@ -0,0 +1,28 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
services:
dataprep:
build:
context: GenAIComps
dockerfile: comps/dataprep/src/Dockerfile
args:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ${no_proxy}
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
agent:
build:
context: GenAIComps
dockerfile: comps/agent/src/Dockerfile
args:
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
no_proxy: ${no_proxy}
image: ${REGISTRY:-opea}/agent:${TAG:-latest}
# agent-ui:
# build:
# context: ../ui
# dockerfile: ./docker/Dockerfile
# extends: agent
# image: ${REGISTRY:-opea}/agent-ui:${TAG:-latest}


@@ -0,0 +1,40 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
REACT_AGENT_LLAMA_PROMPT = """\
You are a helpful assistant engaged in multi-turn conversations with financial analysts.
You have access to the following two tools:
{tools}
**Procedure:**
1. Read the question carefully. Divide the question into sub-questions and conquer sub-questions one by one.
2. If there is execution history, read it carefully and reason about the information gathered so far, then decide if you can answer the question or if you need to call more tools.
**Output format:**
You should output your thought process. Finish thinking first. Output tool calls or your answer at the end.
When making tool calls, you should use the following format:
TOOL CALL: {{"tool": "tool1", "args": {{"arg1": "value1", "arg2": "value2", ...}}}}
If you can answer the question, provide the answer in the following format:
FINAL ANSWER: {{"answer": "your answer here"}}
======= Conversations with user in previous turns =======
{thread_history}
======= End of previous conversations =======
======= Your execution History in this turn =========
{history}
======= End of execution history ==========
**Tips:**
* You may need to do multi-hop calculations and call tools multiple times to get an answer.
* Do not assume any financial figures. Always rely on the tools to get the factual information.
* If you need a certain financial figure, search for the figure instead of the financial statement name.
* If you did not get the answer at first, do not give up. Reflect on the steps that you have taken and try a different way. Think outside the box. Your hard work will be rewarded.
* Give concise, factual and relevant answers.
* If the user question is too ambiguous, ask for clarification.
Now take a deep breath and think step by step to answer the user's question in this turn.
USER MESSAGE: {input}
"""


@@ -0,0 +1,70 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
REACT_AGENT_LLAMA_PROMPT = """\
Role: Expert Investor
Department: Finance
Primary Responsibility: Generation of Customized Financial Analysis Reports
Role Description:
As an Expert Investor within the finance domain, your expertise is harnessed to develop bespoke Financial Analysis Reports that cater to specific client requirements. This role demands a deep dive into financial statements and market data to unearth insights regarding a company's financial performance and stability. Engaging directly with clients to gather essential information and continuously refining the report with their feedback ensures the final product precisely meets their needs and expectations.
Key Objectives:
Analytical Precision: Employ meticulous analytical prowess to interpret financial data, identifying underlying trends and anomalies.
Effective Communication: Simplify and effectively convey complex financial narratives, making them accessible and actionable to non-specialist audiences.
Client Focus: Dynamically tailor reports in response to client feedback, ensuring the final analysis aligns with their strategic objectives.
Adherence to Excellence: Maintain the highest standards of quality and integrity in report generation, following established benchmarks for analytical rigor.
Performance Indicators: The efficacy of the Financial Analysis Report is measured by its utility in providing clear, actionable insights. This encompasses aiding corporate decision-making, pinpointing areas for operational enhancement, and offering a lucid evaluation of the company's financial health. Success is ultimately reflected in the report's contribution to informed investment decisions and strategic planning.
Reply TERMINATE when everything is settled.
You have access to the following tools:
{tools}
For writing a comprehensive financial research report, you can use all the tools provided to retrieve information available for the company.
**Pay attention to the following:**
1. Explicitly explain your working plan before you kick off.
2. Read the question carefully. First, you need accurate `start_date` and `end_date` values, because most tools (such as company news and financials) require them. You can get `end_date` with the `get_current_date` tool if the user doesn't provide it, and you can infer `start_date` from `end_date` using the rule that `start_date` is one year earlier than `end_date`.
3. Use tools one by one for clarity, especially when asking for instructions.
4. Provide stock performance, because the financial report is used for stock investment analysis.
5. Read the execution history if any to understand the tools that have been called and the information that has been gathered.
6. Reason about the information gathered so far and decide if you can answer the question or if you need to call more tools.
**Output format:**
You should output your thought process:
When you need to make tool calls, use the following format:
TOOL CALL: {{"tool": "tool1", "args": {{"arg1": "value1", "arg2": "value2", ...}}}}
TOOL CALL: {{"tool": "tool2", "args": {{"arg1": "value1", "arg2": "value2", ...}}}}
If you have enough financial data, provide the financial report in the following format:
FINAL ANSWER: {{"answer": "compile all the analyzed data and insights into a comprehensive financial report, which contains the following paragraphs: income summarization, market position, business overview, risk assessment, competitors analysis, share performance analysis."}}
Follow these guidelines when formulating your answer:
1. If the question contains a false premise or assumption, answer “invalid question”.
2. If you are uncertain or do not know the answer, answer “I don't know”.
3. Give concise, factual and relevant answers.
**IMPORTANT:**
* Do not generate history messages repeatedly.
* Divide the question into sub-questions and conquer sub-questions one by one.
* Questions may be time sensitive. Pay attention to the time when the question was asked.
* You may need to combine information from multiple tools to answer the question.
* If you did not get the answer at first, do not give up. Reflect on the steps that you have taken and try a different way. Think outside the box. Your hard work will be rewarded.
* Do not make up tool outputs.
======= Conversations with user in previous turns =======
{thread_history}
======= End of previous conversations =======
======= Your execution History in this turn =========
{history}
======= End of execution history ==========
Now take a deep breath and think step by step to answer the user's question in this turn.
USER MESSAGE: {input}
"""


@@ -0,0 +1,34 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
REACT_AGENT_LLAMA_PROMPT = """\
You are a helpful assistant engaged in multi-turn conversations with users.
You have the following worker agents working for you. You can call them as calling tools.
{tools}
**Procedure:**
1. Read the question carefully. Decide which agent you should call to answer the question.
2. The worker agents need detailed inputs. Ask the user to clarify when you lack certain information or are uncertain about something. Do not assume anything. For example, if the user asks about the "recent earnings call of Microsoft", ask the user to specify the quarter and year.
3. Read the execution history if any to understand the worker agents that have been called and the information that has been gathered.
4. Reason about the information gathered so far and decide if you can answer the question or if you need to gather more info.
**Output format:**
You should output your thought process. Finish thinking first. Output tool calls or your answer at the end.
When calling worker agents, you should use the following tool-call format:
TOOL CALL: {{"tool": "tool1", "args": {{"arg1": "value1", "arg2": "value2", ...}}}}
TOOL CALL: {{"tool": "tool2", "args": {{"arg1": "value1", "arg2": "value2", ...}}}}
If you can answer the question, provide the answer in the following format:
FINAL ANSWER: {{"answer": "your answer here"}}
======= Conversations with user in previous turns =======
{thread_history}
======= End of previous conversations =======
======= Your execution History in this turn =========
{history}
======= End of execution history ==========
Now take a deep breath and think step by step to answer the user's question in this turn.
USER MESSAGE: {input}
"""


@@ -0,0 +1,98 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import argparse
import json
import uuid
import requests
def process_request(url, query, is_stream=False):
proxies = {"http": ""}
content = json.dumps(query) if query is not None else None
try:
resp = requests.post(url=url, data=content, proxies=proxies, stream=is_stream)
resp.raise_for_status()  # raise an exception for unsuccessful HTTP status codes before parsing
if not is_stream:
ret = resp.json()["text"]
else:
for line in resp.iter_lines(decode_unicode=True):
print(line)
ret = None
return ret
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None
def test_worker_agent(args):
url = f"http://{args.ip_addr}:{args.ext_port}/v1/chat/completions"
if args.tool_choice is None:
query = {"role": "user", "messages": args.prompt, "stream": "false"}
else:
query = {"role": "user", "messages": args.prompt, "stream": "false", "tool_choice": args.tool_choice}
ret = process_request(url, query)
print("Response: ", ret)
def add_message_and_run(url, user_message, thread_id, stream=False):
print("User message: ", user_message)
query = {"role": "user", "messages": user_message, "thread_id": thread_id, "stream": stream}
ret = process_request(url, query, is_stream=stream)
print("Response: ", ret)
def test_chat_completion_multi_turn(args):
url = f"http://{args.ip_addr}:{args.ext_port}/v1/chat/completions"
thread_id = f"{uuid.uuid4()}"
# first turn
print("===============First turn==================")
user_message = "Key takeaways of Gap's 2024 Q4 earnings call?"
add_message_and_run(url, user_message, thread_id, stream=args.stream)
print("===============End of first turn==================")
# second turn
print("===============Second turn==================")
user_message = "What was Gap's forecast for 2025?"
add_message_and_run(url, user_message, thread_id, stream=args.stream)
print("===============End of second turn==================")
def test_supervisor_agent_single_turn(args):
url = f"http://{args.ip_addr}:{args.ext_port}/v1/chat/completions"
query_list = [
"What was Gap's revenue growth in 2024?",
"Can you summarize Costco's 2025 Q2 earnings call?",
# "Should I increase investment in Costco?",
]
for query in query_list:
thread_id = f"{uuid.uuid4()}"
add_message_and_run(url, query, thread_id, stream=args.stream)
print("=" * 50)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--ip_addr", type=str, default="127.0.0.1", help="endpoint ip address")
parser.add_argument("--ext_port", type=str, default="9090", help="endpoint port")
parser.add_argument("--stream", action="store_true", help="streaming mode")
parser.add_argument("--prompt", type=str, help="prompt message")
parser.add_argument("--agent_role", type=str, default="supervisor", help="supervisor or worker")
parser.add_argument("--multi-turn", action="store_true", help="multi-turn conversation")
parser.add_argument("--tool_choice", nargs="+", help="limit tools")
args, _ = parser.parse_known_args()
print(args)
if args.agent_role == "supervisor":
if args.multi_turn:
test_chat_completion_multi_turn(args)
else:
test_supervisor_agent_single_turn(args)
elif args.agent_role == "worker":
test_worker_agent(args)
else:
raise ValueError("Invalid agent role")


@@ -0,0 +1,270 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
set -xe
export WORKPATH=$(dirname "$PWD")
export WORKDIR=$WORKPATH/../../
echo "WORKDIR=${WORKDIR}"
export ip_address=$(hostname -I | awk '{print $1}')
LOG_PATH=$WORKPATH
#### env vars for LLM endpoint #############
model=meta-llama/Llama-3.3-70B-Instruct
vllm_image=opea/vllm-gaudi:latest
vllm_port=8086
HF_CACHE_DIR=${model_cache:-"/data2/huggingface"}
vllm_volume=${HF_CACHE_DIR}
#######################################
#### env vars for dataprep #############
export host_ip=${ip_address}
export DATAPREP_PORT="6007"
export TEI_EMBEDDER_PORT="10221"
export REDIS_URL_VECTOR="redis://${ip_address}:6379"
export REDIS_URL_KV="redis://${ip_address}:6380"
export LLM_MODEL=$model
export LLM_ENDPOINT="http://${ip_address}:${vllm_port}"
export DATAPREP_COMPONENT_NAME="OPEA_DATAPREP_REDIS_FINANCE"
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export TEI_EMBEDDING_ENDPOINT="http://${ip_address}:${TEI_EMBEDDER_PORT}"
#######################################
function get_genai_comps() {
if [ ! -d "GenAIComps" ] ; then
git clone --depth 1 --branch ${opea_branch:-"main"} https://github.com/opea-project/GenAIComps.git
fi
}
function build_dataprep_agent_images() {
cd $WORKDIR/GenAIExamples/FinanceAgent/docker_image_build/
get_genai_comps
echo "Build agent image with --no-cache..."
docker compose -f build.yaml build --no-cache
}
function build_agent_image_local(){
cd $WORKDIR/GenAIComps/
docker build -t opea/agent:latest -f comps/agent/src/Dockerfile . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
}
function build_vllm_docker_image() {
echo "Building the vllm docker image"
cd $WORKPATH
echo $WORKPATH
if [ ! -d "./vllm-fork" ]; then
git clone https://github.com/HabanaAI/vllm-fork.git
fi
cd ./vllm-fork
# VLLM_VER=$(git describe --tags "$(git rev-list --tags --max-count=1)")
VLLM_VER=v0.6.6.post1+Gaudi-1.20.0
git checkout ${VLLM_VER} &> /dev/null
docker build --no-cache -f Dockerfile.hpu -t $vllm_image --shm-size=128g . --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy
if [ $? -ne 0 ]; then
echo "$vllm_image failed"
exit 1
else
echo "$vllm_image successful"
fi
}
function start_vllm_service_70B() {
echo "token is ${HF_TOKEN}"
echo "start vllm gaudi service"
echo "**************model is $model**************"
docker run -d --runtime=habana --rm --name "vllm-gaudi-server" -e HABANA_VISIBLE_DEVICES=all -p $vllm_port:8000 -v $vllm_volume:/data -e HF_TOKEN=$HF_TOKEN -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN -e HF_HOME=/data -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy -e VLLM_SKIP_WARMUP=true --cap-add=sys_nice --ipc=host $vllm_image --model ${model} --max-seq-len-to-capture 16384 --tensor-parallel-size 4
sleep 10s
echo "Waiting vllm gaudi ready"
n=0
until [[ "$n" -ge 200 ]] || [[ $ready == true ]]; do
docker logs vllm-gaudi-server &> ${LOG_PATH}/vllm-gaudi-service.log
n=$((n+1))
if grep -q "Uvicorn running on" ${LOG_PATH}/vllm-gaudi-service.log; then
break
fi
if grep -q "No such container" ${LOG_PATH}/vllm-gaudi-service.log; then
echo "container vllm-gaudi-server not found"
exit 1
fi
sleep 10s
done
sleep 10s
echo "Service started successfully"
}
function stop_llm(){
cid=$(docker ps -aq --filter "name=vllm-gaudi-server")
echo "Stopping container $cid"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
}
function start_dataprep(){
docker compose -f $WORKPATH/docker_compose/intel/hpu/gaudi/dataprep_compose.yaml up -d
sleep 1m
}
function validate() {
local CONTENT="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
echo "EXPECTED_RESULT: $EXPECTED_RESULT"
echo "Content: $CONTENT"
if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
echo "[ $SERVICE_NAME ] Content is as expected: $CONTENT"
echo 0
else
echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
echo 1
fi
}
function ingest_validate_dataprep() {
# test /v1/dataprep/ingest
echo "=========== Test ingest ==========="
local CONTENT=$(python $WORKPATH/tests/test_redis_finance.py --port $DATAPREP_PORT --test_option ingest)
local EXIT_CODE=$(validate "$CONTENT" "200" "dataprep-redis-finance")
echo "$EXIT_CODE"
local EXIT_CODE="${EXIT_CODE:0-1}"
if [ "$EXIT_CODE" == "1" ]; then
docker logs dataprep-redis-server-finance
exit 1
fi
# test /v1/dataprep/get
echo "=========== Test get ==========="
local CONTENT=$(python $WORKPATH/tests/test_redis_finance.py --port $DATAPREP_PORT --test_option get)
local EXIT_CODE=$(validate "$CONTENT" "Request successful" "dataprep-redis-finance")
echo "$EXIT_CODE"
local EXIT_CODE="${EXIT_CODE:0-1}"
if [ "$EXIT_CODE" == "1" ]; then
docker logs dataprep-redis-server-finance
exit 1
fi
}
function stop_dataprep() {
echo "Stopping databases"
cid=$(docker ps -aq --filter "name=dataprep-redis-server*" --filter "name=redis-*" --filter "name=tei-embedding-*")
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid && sleep 1s; fi
}
function start_agents() {
echo "Starting Agent services"
cd $WORKDIR/GenAIExamples/FinanceAgent/docker_compose/intel/hpu/gaudi/
bash launch_agents.sh
sleep 2m
}
function validate_agent_service() {
# # test worker finqa agent
echo "======================Testing worker finqa agent======================"
export agent_port="9095"
prompt="What is Gap's revenue in 2024?"
local CONTENT=$(python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port)
echo $CONTENT
local EXIT_CODE=$(validate "$CONTENT" "15" "finqa-agent-endpoint")
echo $EXIT_CODE
local EXIT_CODE="${EXIT_CODE:0-1}"
if [ "$EXIT_CODE" == "1" ]; then
docker logs finqa-agent-endpoint
exit 1
fi
# # test worker research agent
echo "======================Testing worker research agent======================"
export agent_port="9096"
prompt="generate NVDA financial research report"
local CONTENT=$(python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --prompt "$prompt" --agent_role "worker" --ext_port $agent_port --tool_choice "get_current_date" --tool_choice "get_share_performance")
local EXIT_CODE=$(validate "$CONTENT" "NVDA" "research-agent-endpoint")
echo $CONTENT
echo $EXIT_CODE
local EXIT_CODE="${EXIT_CODE:0-1}"
if [ "$EXIT_CODE" == "1" ]; then
docker logs research-agent-endpoint
exit 1
fi
# test supervisor react agent
echo "======================Testing supervisor agent: single turns ======================"
export agent_port="9090"
local CONTENT=$(python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --stream)
echo $CONTENT
# local EXIT_CODE=$(validate "$CONTENT" "" "react-agent-endpoint")
# echo $EXIT_CODE
# local EXIT_CODE="${EXIT_CODE:0-1}"
# if [ "$EXIT_CODE" == "1" ]; then
# docker logs react-agent-endpoint
# exit 1
# fi
echo "======================Testing supervisor agent: multi turns ======================"
local CONTENT=$(python3 $WORKDIR/GenAIExamples/FinanceAgent/tests/test.py --agent_role "supervisor" --ext_port $agent_port --multi-turn --stream)
echo $CONTENT
# local EXIT_CODE=$(validate "$CONTENT" "" "react-agent-endpoint")
# echo $EXIT_CODE
# local EXIT_CODE="${EXIT_CODE:0-1}"
# if [ "$EXIT_CODE" == "1" ]; then
# docker logs react-agent-endpoint
# exit 1
# fi
}
function stop_agent_docker() {
cd $WORKPATH/docker_compose/intel/hpu/gaudi/
container_list=$(cat compose.yaml | grep container_name | cut -d':' -f2)
for container_name in $container_list; do
cid=$(docker ps -aq --filter "name=$container_name")
echo "Stopping container $container_name"
if [[ ! -z "$cid" ]]; then docker rm $cid -f && sleep 1s; fi
done
}
echo "workpath: $WORKPATH"
echo "=================== Stop containers ===================="
stop_llm
stop_agent_docker
stop_dataprep
cd $WORKPATH/tests
# echo "=================== #1 Building docker images===================="
build_vllm_docker_image
build_dataprep_agent_images
#### for local test
# build_agent_image_local
# echo "=================== #1 Building docker images completed===================="
# echo "=================== #2 Start vllm endpoint===================="
start_vllm_service_70B
# echo "=================== #2 vllm endpoint started===================="
# echo "=================== #3 Start dataprep and ingest data ===================="
start_dataprep
ingest_validate_dataprep
# echo "=================== #3 Data ingestion and validation completed===================="
echo "=================== #4 Start agents ===================="
start_agents
validate_agent_service
echo "=================== #4 Agent test passed ===================="
echo "=================== #5 Stop microservices ===================="
stop_agent_docker
stop_dataprep
stop_llm
echo "=================== #5 Microservices stopped===================="
echo y | docker system prune
echo "ALL DONE!!"


@@ -0,0 +1,70 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import json
import os
import requests
def test_html(url, link_list):
proxies = {"http": ""}
payload = {"link_list": json.dumps(link_list)}
try:
resp = requests.post(url=url, data=payload, proxies=proxies)
print(resp.text)
resp.raise_for_status() # Raise an exception for unsuccessful HTTP status codes
print("Request successful!")
except requests.exceptions.RequestException as e:
print("An error occurred:", e)
def test_delete(url, filename):
proxies = {"http": ""}
payload = {"file_path": filename}
try:
resp = requests.post(url=url, json=payload, proxies=proxies)
print(resp.text)
resp.raise_for_status() # Raise an exception for unsuccessful HTTP status codes
print("Request successful!")
except requests.exceptions.RequestException as e:
print("An error occurred:", e)
def test_get(url):
proxies = {"http": ""}
try:
resp = requests.post(url=url, proxies=proxies)
print(resp.text)
resp.raise_for_status() # Raise an exception for unsuccessful HTTP status codes
print("Request successful!")
except requests.exceptions.RequestException as e:
print("An error occurred:", e)
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--test_option", type=str, default="ingest", help="ingest, get, delete")
parser.add_argument("--port", type=str, default="6007", help="port number")
args = parser.parse_args()
port = args.port
if args.test_option == "ingest":
url = f"http://localhost:{port}/v1/dataprep/ingest"
link_list = [
"https://www.fool.com/earnings/call-transcripts/2025/03/06/costco-wholesale-cost-q2-2025-earnings-call-transc/",
"https://www.fool.com/earnings/call-transcripts/2025/03/07/gap-gap-q4-2024-earnings-call-transcript/",
]
test_html(url, link_list)
elif args.test_option == "delete":
url = f"http://localhost:{port}/v1/dataprep/delete"
filename = "Costco Wholesale"
test_delete(url, filename)
elif args.test_option == "get":
url = f"http://localhost:{port}/v1/dataprep/get"
test_get(url)
else:
raise ValueError("Invalid test_option value. Please choose from ingest, get, delete.")


@@ -0,0 +1,20 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
search_knowledge_base:
description: Search knowledge base of SEC filings.
callable_api: finqa_tools.py:get_context_bm25_llm
args_schema:
query:
type: str
description: the query to search for. Should be detailed. Do not include the company name.
company:
type: str
description: the company of interest.
year:
type: str
description: the year of interest, can only specify one year. can be an empty string.
quarter:
type: str
description: the quarter of interest, can only specify one quarter. can be 'Q1', 'Q2', 'Q3', 'Q4'. can be an empty string.
return_output: retrieved_data


@@ -0,0 +1,100 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
from tools.utils import *
def get_context_bm25_llm(query, company, year, quarter=""):
k = 5
company_list = get_company_list()
company = get_company_name_in_kb(company, company_list)
if "Cannot find" in company or "Database is empty" in company:
return company
print(f"Company: {company}")
# chunks
index_name = f"chunks_{company}"
vector_store = get_vectorstore(index_name)
chunks_bm25 = bm25_search_broad(query, company, year, quarter, k=k, doc_type="chunks")
chunks_sim = similarity_search(vector_store, k, query, company, year, quarter)
chunks = chunks_bm25 + chunks_sim
# tables
try:
index_name = f"tables_{company}"
vector_store_table = get_vectorstore(index_name)
# get tables matching metadata
tables_bm25 = bm25_search_broad(query, company, year, quarter, k=k, doc_type="tables")
tables_sim = similarity_search(vector_store_table, k, query, company, year, quarter)
tables = tables_bm25 + tables_sim
except Exception:
tables = []
# get unique results
context = get_unique_docs(chunks + tables)
print("Context:\n", context[:500])
if context:
query = f"{query} for {company} in {year} {quarter}"
prompt = ANSWER_PROMPT.format(query=query, documents=context)
response = generate_answer(prompt)
response = parse_response(response)
else:
response = f"No relevant information found for {company} in {year} {quarter}."
print("Search result:\n", response)
return response
def search_full_doc(query, company):
company = company.upper()
# decide if company is in company list
company_list = get_company_list()
company = get_company_name_in_kb(company, company_list)
if "Cannot find" in company or "Database is empty" in company:
return company
# search most similar doc title
index_name = f"titles_{company}"
vector_store = get_vectorstore_titles(index_name)
k = 1
docs = vector_store.similarity_search(query, k=k)
if docs:
doc = docs[0]
doc_title = doc.page_content
print(f"Most similar doc title: {doc_title}")
kvstore = RedisKVStore(redis_uri=REDIS_URL_KV)
doc = kvstore.get(doc_title, f"full_doc_{company}")
content = doc["full_doc"]
doc_length = doc["doc_length"]
print(f"Doc length: {doc_length}")
print(f"Full doc content: {content[:100]}...")
# once summary is done, can save to kvstore
# first delete the old record
# kvstore.delete(doc_title, f"full_doc_{company}")
# then save the new record with summary
# kvstore.put(doc_title, {"full_doc": content, "summary":summary,"doc_length":doc_length, **metadata}, f"full_doc_{company}")
return content
if __name__ == "__main__":
# company="Gap"
# year="2024"
# quarter="Q4"
company = "Costco"
year = "2025"
quarter = "Q2"
collection_name = f"chunks_{company}"
search_metadata = ("company", company)
resp = get_context_bm25_llm("revenue", company, year, quarter)
print("***Response:\n", resp)
print("=" * 50)
print("testing retrieve full doc")
query = f"{company} {year} {quarter} earning call"
search_full_doc(query, company)


@@ -0,0 +1,146 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import json
from collections import OrderedDict
from typing import Any, Dict, List, Optional, Tuple
from redis import Redis
from redis.asyncio import Redis as AsyncRedis
DEFAULT_COLLECTION = "data"
DEFAULT_BATCH_SIZE = 1
class RedisKVStore:
def __init__(
self,
redis_uri: Optional[str] = "redis://127.0.0.1:6379",
**kwargs: Any,
):
try:
# connect to redis from url
self._redis_client = Redis.from_url(redis_uri, **kwargs)
self._async_redis_client = AsyncRedis.from_url(redis_uri, **kwargs)
except ValueError as e:
raise ValueError(f"Redis failed to connect: {e}")
def put(self, key: str, val: dict, collection: str = DEFAULT_COLLECTION) -> None:
"""Put a key-value pair into the store.
Args:
key (str): key
val (dict): value
collection (str): collection name
"""
self._redis_client.hset(name=collection, key=key, value=json.dumps(val))
async def aput(self, key: str, val: dict, collection: str = DEFAULT_COLLECTION) -> None:
"""Put a key-value pair into the store.
Args:
key (str): key
val (dict): value
collection (str): collection name
"""
await self._async_redis_client.hset(name=collection, key=key, value=json.dumps(val))
def put_all(
self,
kv_pairs: List[Tuple[str, dict]],
collection: str = DEFAULT_COLLECTION,
batch_size: int = DEFAULT_BATCH_SIZE,
) -> None:
"""Put a dictionary of key-value pairs into the store.
Args:
kv_pairs (List[Tuple[str, dict]]): key-value pairs
collection (str): collection name
"""
with self._redis_client.pipeline() as pipe:
cur_batch = 0
for key, val in kv_pairs:
pipe.hset(name=collection, key=key, value=json.dumps(val))
cur_batch += 1
if cur_batch >= batch_size:
cur_batch = 0
pipe.execute()
if cur_batch > 0:
pipe.execute()
def get(self, key: str, collection: str = DEFAULT_COLLECTION) -> Optional[dict]:
"""Get a value from the store.
Args:
key (str): key
collection (str): collection name
"""
val_str = self._redis_client.hget(name=collection, key=key)
if val_str is None:
return None
return json.loads(val_str)
async def aget(self, key: str, collection: str = DEFAULT_COLLECTION) -> Optional[dict]:
"""Get a value from the store.
Args:
key (str): key
collection (str): collection name
"""
val_str = await self._async_redis_client.hget(name=collection, key=key)
if val_str is None:
return None
return json.loads(val_str)
def get_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
"""Get all values from the store."""
collection_kv_dict = OrderedDict()
for key, val_str in self._redis_client.hscan_iter(name=collection):
value = json.loads(val_str)
collection_kv_dict[key.decode()] = value
return collection_kv_dict
async def aget_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
"""Get all values from the store."""
collection_kv_dict = OrderedDict()
async for key, val_str in self._async_redis_client.hscan_iter(name=collection):
value = json.loads(val_str)
collection_kv_dict[key.decode()] = value
return collection_kv_dict
def delete(self, key: str, collection: str = DEFAULT_COLLECTION) -> bool:
"""Delete a value from the store.
Args:
key (str): key
collection (str): collection name
"""
deleted_num = self._redis_client.hdel(collection, key)
return bool(deleted_num > 0)
async def adelete(self, key: str, collection: str = DEFAULT_COLLECTION) -> bool:
"""Delete a value from the store.
Args:
key (str): key
collection (str): collection name
"""
deleted_num = await self._async_redis_client.hdel(collection, key)
return bool(deleted_num > 0)
@classmethod
def from_host_and_port(
cls,
host: str,
port: int,
):
"""Load a RedisPersistence from a Redis host and port.
Args:
host (str): Redis host
port (int): Redis port
"""
url = f"redis://{host}:{port}".format(host=host, port=port)
return cls(redis_uri=url)
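A minimal usage sketch of `RedisKVStore`, assuming the `redis-kv-store` container from `dataprep_compose.yaml` is reachable on localhost port 6380:
```python
# Collection naming follows the full_doc_{company} convention used in finqa_tools.py.
kvstore = RedisKVStore(redis_uri="redis://localhost:6380")
kvstore.put("GAP 2024 Q4 earnings call", {"full_doc": "...", "doc_length": 1234}, collection="full_doc_GAP")
print(kvstore.get("GAP 2024 Q4 earnings call", collection="full_doc_GAP"))
```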


@@ -0,0 +1,124 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
get_company_profile:
description: get a company's profile information.
callable_api: research_tools.py:get_company_profile
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
return_output: profile
get_company_news:
description: retrieve market news related to designated company.
callable_api: research_tools.py:get_company_news
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
start_date:
type: str
description: start date of the search period for the company news, yyyy-mm-dd.
end_date:
type: str
description: end date of the search period for the company news, yyyy-mm-dd.
max_news_num:
type: int
description: maximum number of news to return, default to 10.
return_output: news
get_basic_financials_history:
description: get historical basic financials for a designated company.
callable_api: research_tools.py:get_basic_financials_history
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
freq:
type: str
description: reporting frequency of the company's basic financials, such as annual, quarterly.
start_date:
type: str
description: start date of the search period for the company's basic financials, yyyy-mm-dd.
end_date:
type: str
description: end date of the search period for the company's basic financials, yyyy-mm-dd.
selected_columns:
type: list
description: List of column names of the basic financials to return, should be chosen from 'assetTurnoverTTM', 'bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'fcfPerShareTTM', 'grossMargin', 'inventoryTurnoverTTM', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'payoutRatioTTM', 'pb', 'peTTM', 'pfcfTTM', 'pretaxMargin', 'psTTM', 'ptbv', 'quickRatio', 'receivablesTurnoverTTM', 'roaTTM', 'roeTTM', 'roicTTM', 'rotcTTM', 'salesPerShare', 'sgaToSale', 'tangibleBookValue', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio','10DayAverageTradingVolume', '13WeekPriceReturnDaily', '26WeekPriceReturnDaily', '3MonthADReturnStd', '3MonthAverageTradingVolume', '52WeekHigh', '52WeekHighDate', '52WeekLow', '52WeekLowDate', '52WeekPriceReturnDaily', '5DayPriceReturnDaily', 'assetTurnoverAnnual', 'assetTurnoverTTM', 'beta', 'bookValuePerShareAnnual', 'bookValuePerShareQuarterly', 'bookValueShareGrowth5Y', 'capexCagr5Y', 'cashFlowPerShareAnnual', 'cashFlowPerShareQuarterly', 'cashFlowPerShareTTM', 'cashPerSharePerShareAnnual', 'cashPerSharePerShareQuarterly', 'currentDividendYieldTTM', 'currentEv/freeCashFlowAnnual', 'currentEv/freeCashFlowTTM', 'currentRatioAnnual', 'currentRatioQuarterly', 'dividendGrowthRate5Y', 'dividendPerShareAnnual', 'dividendPerShareTTM', 'dividendYieldIndicatedAnnual', 'ebitdPerShareAnnual', 'ebitdPerShareTTM', 'ebitdaCagr5Y', 'ebitdaInterimCagr5Y', 'enterpriseValue', 'epsAnnual', 'epsBasicExclExtraItemsAnnual', 'epsBasicExclExtraItemsTTM', 'epsExclExtraItemsAnnual', 'epsExclExtraItemsTTM', 'epsGrowth3Y', 'epsGrowth5Y', 'epsGrowthQuarterlyYoy', 'epsGrowthTTMYoy', 'epsInclExtraItemsAnnual', 'epsInclExtraItemsTTM', 'epsNormalizedAnnual', 'epsTTM', 'focfCagr5Y', 'grossMargin5Y', 'grossMarginAnnual', 'grossMarginTTM', 'inventoryTurnoverAnnual', 'inventoryTurnoverTTM', 'longTermDebt/equityAnnual', 'longTermDebt/equityQuarterly', 'marketCapitalization', 'monthToDatePriceReturnDaily', 'netIncomeEmployeeAnnual', 'netIncomeEmployeeTTM', 'netInterestCoverageAnnual', 'netInterestCoverageTTM', 'netMarginGrowth5Y', 'netProfitMargin5Y', 'netProfitMarginAnnual', 'netProfitMarginTTM', 'operatingMargin5Y'.
return_output: history_financials
get_basic_financials:
description: get latest basic financials for a designated company.
callable_api: research_tools.py:get_basic_financials
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
selected_columns:
type: list
description: List of column names of the basic financials to return, should be chosen from 'assetTurnoverTTM', 'bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'fcfPerShareTTM', 'grossMargin', 'inventoryTurnoverTTM', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'payoutRatioTTM', 'pb', 'peTTM', 'pfcfTTM', 'pretaxMargin', 'psTTM', 'ptbv', 'quickRatio', 'receivablesTurnoverTTM', 'roaTTM', 'roeTTM', 'roicTTM', 'rotcTTM', 'salesPerShare', 'sgaToSale', 'tangibleBookValue', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio','10DayAverageTradingVolume', '13WeekPriceReturnDaily', '26WeekPriceReturnDaily', '3MonthADReturnStd', '3MonthAverageTradingVolume', '52WeekHigh', '52WeekHighDate', '52WeekLow', '52WeekLowDate', '52WeekPriceReturnDaily', '5DayPriceReturnDaily', 'assetTurnoverAnnual', 'assetTurnoverTTM', 'beta', 'bookValuePerShareAnnual', 'bookValuePerShareQuarterly', 'bookValueShareGrowth5Y', 'capexCagr5Y', 'cashFlowPerShareAnnual', 'cashFlowPerShareQuarterly', 'cashFlowPerShareTTM', 'cashPerSharePerShareAnnual', 'cashPerSharePerShareQuarterly', 'currentDividendYieldTTM', 'currentEv/freeCashFlowAnnual', 'currentEv/freeCashFlowTTM', 'currentRatioAnnual', 'currentRatioQuarterly', 'dividendGrowthRate5Y', 'dividendPerShareAnnual', 'dividendPerShareTTM', 'dividendYieldIndicatedAnnual', 'ebitdPerShareAnnual', 'ebitdPerShareTTM', 'ebitdaCagr5Y', 'ebitdaInterimCagr5Y', 'enterpriseValue', 'epsAnnual', 'epsBasicExclExtraItemsAnnual', 'epsBasicExclExtraItemsTTM', 'epsExclExtraItemsAnnual', 'epsExclExtraItemsTTM', 'epsGrowth3Y', 'epsGrowth5Y', 'epsGrowthQuarterlyYoy', 'epsGrowthTTMYoy', 'epsInclExtraItemsAnnual', 'epsInclExtraItemsTTM', 'epsNormalizedAnnual', 'epsTTM', 'focfCagr5Y', 'grossMargin5Y', 'grossMarginAnnual', 'grossMarginTTM', 'inventoryTurnoverAnnual', 'inventoryTurnoverTTM', 'longTermDebt/equityAnnual', 'longTermDebt/equityQuarterly', 'marketCapitalization', 'monthToDatePriceReturnDaily', 'netIncomeEmployeeAnnual', 'netIncomeEmployeeTTM', 'netInterestCoverageAnnual', 'netInterestCoverageTTM', 'netMarginGrowth5Y', 'netProfitMargin5Y', 'netProfitMarginAnnual', 'netProfitMarginTTM', 'operatingMargin5Y'.
return_output: basic_financials
get_current_date:
description: get current date.
callable_api: research_tools.py:get_current_date
return_output: current_date
analyze_balance_sheet:
description: gets balance sheets for a given ticker over a given period.
callable_api: research_tools.py:analyze_balance_sheet
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
period:
type: str
description: The period of the balance sheets, possible values such as annual, quarterly, ttm. Default is 'annual'.
limit:
type: int
description: The number of balance sheets to return. Default is 10.
return_output: balance_sheet
analyze_income_stmt:
description: gets income statements for a given ticker over a given period.
callable_api: research_tools.py:analyze_income_stmt
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
period:
type: str
description: The period of the income statements, possible values such as annual, quarterly, ttm. Default is 'annual'.
limit:
type: int
description: The number of income statements to return. Default is 10.
return_output: income_stmt
analyze_cash_flow:
description: gets cash flow statements for a given ticker over a given period.
callable_api: research_tools.py:analyze_cash_flow
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
period:
type: str
description: The period of the cash flow statements, possible values such as annual, quarterly, ttm. Default is 'annual'.
limit:
type: int
description: The number of cash flow statements to return. Default is 10.
return_output: cash_flow
get_share_performance:
description: gets stock prices for a given ticker over 60 days.
callable_api: research_tools.py:get_share_performance
args_schema:
symbol:
type: str
description: the company name or ticker symbol.
end_date:
type: str
description: end date of the search period for the share performance, yyyy-mm-dd.
return_output: stock_price


@@ -0,0 +1,468 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import json
import os
import random
from collections import defaultdict
from datetime import date, datetime, timedelta
from textwrap import dedent
from typing import Annotated, Any, List, Optional
import pandas as pd
finnhub_client = None
try:
if os.environ.get("FINNHUB_API_KEY") is None:
print("Please set the environment variable FINNHUB_API_KEY to use the Finnhub API.")
else:
import finnhub
finnhub_client = finnhub.Client(api_key=os.environ["FINNHUB_API_KEY"])
print("Finnhub client initialized")
except Exception as e:
print(str(e))
# https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/financial_datasets.py
"""
Util that calls several of financial datasets stock market REST APIs.
Docs: https://docs.financialdatasets.ai/
"""
import requests
from pydantic import BaseModel
FINANCIAL_DATASETS_BASE_URL = "https://api.financialdatasets.ai/"
class FinancialDatasetsAPIWrapper(BaseModel):
"""Wrapper for financial datasets API."""
financial_datasets_api_key: Optional[str] = None
def __init__(self, **data: Any):
super().__init__(**data)
self.financial_datasets_api_key = data["api_key"]
@property
def _api_key(self) -> str:
if self.financial_datasets_api_key is None:
raise ValueError(
"API key is required for the FinancialDatasetsAPIWrapper. "
"Please provide the API key by either:\n"
"1. Manually specifying it when initializing the wrapper: "
"FinancialDatasetsAPIWrapper(financial_datasets_api_key='your_api_key')\n"
"2. Setting it as an environment variable: FINANCIAL_DATASETS_API_KEY"
)
return self.financial_datasets_api_key
def get_income_statements(
self,
ticker: str,
period: str,
limit: Optional[int],
) -> Optional[dict]:
"""Get the income statements for a stock `ticker` over a `period` of time.
:param ticker: the stock ticker
:param period: the period of time to get the income statements for.
Possible values are: annual, quarterly, ttm.
:param limit: the number of results to return, default is 10
:return: a list of income statements
"""
url = (
f"{FINANCIAL_DATASETS_BASE_URL}financials/income-statements/"
f"?ticker={ticker}"
f"&period={period}"
f"&limit={limit if limit else 10}"
)
# Add the api key to the headers
headers = {"X-API-KEY": self._api_key}
# Execute the request
response = requests.get(url, headers=headers)
data = response.json()
return data.get("income_statements", None)
def get_balance_sheets(
self,
ticker: str,
period: str,
limit: Optional[int],
) -> List[dict]:
"""Get the balance sheets for a stock `ticker` over a `period` of time.
:param ticker: the stock ticker
:param period: the period of time to get the balance sheets for.
Possible values are: annual, quarterly, ttm.
:param limit: the number of results to return, default is 10
:return: a list of balance sheets
"""
url = (
f"{FINANCIAL_DATASETS_BASE_URL}financials/balance-sheets/"
f"?ticker={ticker}"
f"&period={period}"
f"&limit={limit if limit else 10}"
)
# Add the api key to the headers
headers = {"X-API-KEY": self._api_key}
# Execute the request
response = requests.get(url, headers=headers)
data = response.json()
return data.get("balance_sheets", None)
def get_cash_flow_statements(
self,
ticker: str,
period: str,
limit: Optional[int],
) -> List[dict]:
"""Get the cash flow statements for a stock `ticker` over a `period` of time.
:param ticker: the stock ticker
:param period: the period of time to get the cash flow statements for.
Possible values are: annual, quarterly, ttm.
:param limit: the number of results to return, default is 10
:return: a list of cash flow statements
"""
url = (
f"{FINANCIAL_DATASETS_BASE_URL}financials/cash-flow-statements/"
f"?ticker={ticker}"
f"&period={period}"
f"&limit={limit if limit else 10}"
)
# Add the api key to the headers
headers = {"X-API-KEY": self._api_key}
# Execute the request
response = requests.get(url, headers=headers)
data = response.json()
return data.get("cash_flow_statements", None)
def run(self, mode: str, ticker: str, **kwargs: Any) -> str:
if mode == "get_income_statements":
period = kwargs.get("period", "annual")
limit = kwargs.get("limit", 10)
return json.dumps(self.get_income_statements(ticker, period, limit))
elif mode == "get_balance_sheets":
period = kwargs.get("period", "annual")
limit = kwargs.get("limit", 10)
return json.dumps(self.get_balance_sheets(ticker, period, limit))
elif mode == "get_cash_flow_statements":
period = kwargs.get("period", "annual")
limit = kwargs.get("limit", 10)
return json.dumps(self.get_cash_flow_statements(ticker, period, limit))
else:
raise ValueError(f"Invalid mode {mode} for financial datasets API.")
financial_datasets_client = None
try:
if os.environ.get("FINANCIAL_DATASETS_API_KEY") is None:
print("Please set the environment variable FINANCIAL_DATASETS_API_KEY to use the financialdatasets.ai data.")
else:
financial_datasets_client = FinancialDatasetsAPIWrapper(api_key=os.environ["FINANCIAL_DATASETS_API_KEY"])
print("FINANCIAL DATASETS client initialized")
except Exception as e:
print(str(e))
def get_company_profile(symbol: Annotated[str, "ticker symbol"]) -> str:
    """Get a company's profile information."""
    if finnhub_client is None:
        return "Finnhub client is not initialized. Please set the FINNHUB_API_KEY environment variable."
    profile = finnhub_client.company_profile2(symbol=symbol)
    if not profile:
        return f"Failed to find company profile for symbol {symbol} from finnhub!"
formatted_str = (
"[Company Introduction]:\n\n{name} is a leading entity in the {finnhubIndustry} sector. "
"Incorporated and publicly traded since {ipo}, the company has established its reputation as "
"one of the key players in the market. As of today, {name} has a market capitalization "
"of {marketCapitalization:.2f} in {currency}, with {shareOutstanding:.2f} shares outstanding."
"\n\n{name} operates primarily in the {country}, trading under the ticker {ticker} on the {exchange}. "
"As a dominant force in the {finnhubIndustry} space, the company continues to innovate and drive "
"progress within the industry."
).format(**profile)
return formatted_str
def get_company_news(
    symbol: Annotated[str, "ticker symbol"],
    start_date: Annotated[
        str,
        "start date of the search period for the company news, yyyy-mm-dd",
    ],
    end_date: Annotated[
        str,
        "end date of the search period for the company news, yyyy-mm-dd",
    ],
    max_news_num: Annotated[int, "maximum number of news items to return, default to 10"] = 10,
):
    """Retrieve market news related to the designated company."""
news = finnhub_client.company_news(symbol, _from=start_date, to=end_date)
if len(news) == 0:
print(f"No company news found for symbol {symbol} from finnhub!")
news = [
{
"date": datetime.fromtimestamp(n["datetime"]).strftime("%Y%m%d%H%M%S"),
"headline": n["headline"],
"summary": n["summary"],
}
for n in news
]
    # Randomly select a subset of news items if their number exceeds the maximum
    # (random.sample avoids the duplicates that random.choices could produce)
    if len(news) > max_news_num:
        news = random.sample(news, k=max_news_num)
news.sort(key=lambda x: x["date"])
output = pd.DataFrame(news)
return output.to_json(orient="split")
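# Illustrative usage (a sketch, assuming FINNHUB_API_KEY is set):
#   news_json = get_company_news("AAPL", "2025-01-01", "2025-03-31", max_news_num=5)
#   # returns a pandas "split"-oriented JSON string with date/headline/summary columns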
def get_basic_financials_history(
symbol: Annotated[str, "ticker symbol"],
freq: Annotated[
str,
"reporting frequency of the company's basic financials: annual / quarterly",
],
start_date: Annotated[
str,
"start date of the search period for the company's basic financials, yyyy-mm-dd",
],
end_date: Annotated[
str,
"end date of the search period for the company's basic financials, yyyy-mm-dd",
],
    selected_columns: Annotated[
        list[str] | None,
        "List of financial metric names to return, should be chosen from 'assetTurnoverTTM', 'bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'fcfPerShareTTM', 'grossMargin', 'inventoryTurnoverTTM', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'payoutRatioTTM', 'pb', 'peTTM', 'pfcfTTM', 'pretaxMargin', 'psTTM', 'ptbv', 'quickRatio', 'receivablesTurnoverTTM', 'roaTTM', 'roeTTM', 'roicTTM', 'rotcTTM', 'salesPerShare', 'sgaToSale', 'tangibleBookValue', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio'",
    ] = None,
) -> str:
    """Get historical basic financials for a designated company between start_date and end_date."""
    if freq not in ["annual", "quarterly"]:
        return f"Invalid reporting frequency {freq}. Please specify either 'annual' or 'quarterly'."
    basic_financials = finnhub_client.company_basic_financials(symbol, "all")
    if not basic_financials["series"]:
        return f"Failed to find basic financials for symbol {symbol} from finnhub! Try a different symbol."
output_dict = defaultdict(dict)
for metric, value_list in basic_financials["series"][freq].items():
if selected_columns and metric not in selected_columns:
continue
for value in value_list:
if value["period"] >= start_date and value["period"] <= end_date:
output_dict[metric].update({value["period"]: value["v"]})
financials_output = pd.DataFrame(output_dict)
financials_output = financials_output.rename_axis(index="date")
return financials_output.to_json(orient="split")
def get_basic_financials(
symbol: Annotated[str, "ticker symbol"],
selected_columns: Annotated[
list[str] | None,
"List of column names of news to return, should be chosen from 'assetTurnoverTTM', 'bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'fcfPerShareTTM', 'grossMargin', 'inventoryTurnoverTTM', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'payoutRatioTTM', 'pb', 'peTTM', 'pfcfTTM', 'pretaxMargin', 'psTTM', 'ptbv', 'quickRatio', 'receivablesTurnoverTTM', 'roaTTM', 'roeTTM', 'roicTTM', 'rotcTTM', 'salesPerShare', 'sgaToSale', 'tangibleBookValue', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio','10DayAverageTradingVolume', '13WeekPriceReturnDaily', '26WeekPriceReturnDaily', '3MonthADReturnStd', '3MonthAverageTradingVolume', '52WeekHigh', '52WeekHighDate', '52WeekLow', '52WeekLowDate', '52WeekPriceReturnDaily', '5DayPriceReturnDaily', 'assetTurnoverAnnual', 'assetTurnoverTTM', 'beta', 'bookValuePerShareAnnual', 'bookValuePerShareQuarterly', 'bookValueShareGrowth5Y', 'capexCagr5Y', 'cashFlowPerShareAnnual', 'cashFlowPerShareQuarterly', 'cashFlowPerShareTTM', 'cashPerSharePerShareAnnual', 'cashPerSharePerShareQuarterly', 'currentDividendYieldTTM', 'currentEv/freeCashFlowAnnual', 'currentEv/freeCashFlowTTM', 'currentRatioAnnual', 'currentRatioQuarterly', 'dividendGrowthRate5Y', 'dividendPerShareAnnual', 'dividendPerShareTTM', 'dividendYieldIndicatedAnnual', 'ebitdPerShareAnnual', 'ebitdPerShareTTM', 'ebitdaCagr5Y', 'ebitdaInterimCagr5Y', 'enterpriseValue', 'epsAnnual', 'epsBasicExclExtraItemsAnnual', 'epsBasicExclExtraItemsTTM', 'epsExclExtraItemsAnnual', 'epsExclExtraItemsTTM', 'epsGrowth3Y', 'epsGrowth5Y', 'epsGrowthQuarterlyYoy', 'epsGrowthTTMYoy', 'epsInclExtraItemsAnnual', 'epsInclExtraItemsTTM', 'epsNormalizedAnnual', 'epsTTM', 'focfCagr5Y', 'grossMargin5Y', 'grossMarginAnnual', 'grossMarginTTM', 'inventoryTurnoverAnnual', 'inventoryTurnoverTTM', 'longTermDebt/equityAnnual', 'longTermDebt/equityQuarterly', 'marketCapitalization', 'monthToDatePriceReturnDaily', 'netIncomeEmployeeAnnual', 'netIncomeEmployeeTTM', 'netInterestCoverageAnnual', 'netInterestCoverageTTM', 'netMarginGrowth5Y', 'netProfitMargin5Y', 'netProfitMarginAnnual', 'netProfitMarginTTM', 'operatingMargin5Y', 'operatingMarginAnnual', 'operatingMarginTTM', 'payoutRatioAnnual', 'payoutRatioTTM', 'pbAnnual', 'pbQuarterly', 'pcfShareAnnual', 'pcfShareTTM', 'peAnnual', 'peBasicExclExtraTTM', 'peExclExtraAnnual', 'peExclExtraTTM', 'peInclExtraTTM', 'peNormalizedAnnual', 'peTTM', 'pfcfShareAnnual', 'pfcfShareTTM', 'pretaxMargin5Y', 'pretaxMarginAnnual', 'pretaxMarginTTM', 'priceRelativeToS&P50013Week', 'priceRelativeToS&P50026Week', 'priceRelativeToS&P5004Week', 'priceRelativeToS&P50052Week', 'priceRelativeToS&P500Ytd', 'psAnnual', 'psTTM', 'ptbvAnnual', 'ptbvQuarterly', 'quickRatioAnnual', 'quickRatioQuarterly', 'receivablesTurnoverAnnual', 'receivablesTurnoverTTM', 'revenueEmployeeAnnual', 'revenueEmployeeTTM', 'revenueGrowth3Y', 'revenueGrowth5Y', 'revenueGrowthQuarterlyYoy', 'revenueGrowthTTMYoy', 'revenuePerShareAnnual', 'revenuePerShareTTM', 'revenueShareGrowth5Y', 'roa5Y', 'roaRfy', 'roaTTM', 'roe5Y', 'roeRfy', 'roeTTM', 'roi5Y', 'roiAnnual', 'roiTTM', 'tangibleBookValuePerShareAnnual', 'tangibleBookValuePerShareQuarterly', 'tbvCagr5Y', 'totalDebt/totalEquityAnnual', 'totalDebt/totalEquityQuarterly', 'yearToDatePriceReturnDaily'",
] = None,
) -> str:
"""Get latest basic financials for a designated company."""
basic_financials = finnhub_client.company_basic_financials(symbol, "all")
if not basic_financials["series"]:
return f"Failed to find basic financials for symbol {symbol} from finnhub! Try a different symbol."
output_dict = basic_financials["metric"]
for metric, value_list in basic_financials["series"]["quarterly"].items():
value = value_list[0]
output_dict.update({metric: value["v"]})
    results = {}
    # if no columns are selected, return all available metrics
    keys = selected_columns if selected_columns else output_dict.keys()
    for k in keys:
        if k in output_dict:
            results[k] = output_dict[k]
    return json.dumps(results, indent=2)
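# Illustrative usage (a sketch, assuming FINNHUB_API_KEY is set):
#   metrics_json = get_basic_financials("MSFT", selected_columns=["peTTM", "epsTTM", "roeTTM"])
#   # returns a JSON object mapping each selected metric to its latest value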
def get_current_date():
    """Return today's date as a yyyy-mm-dd string."""
    return date.today().strftime("%Y-%m-%d")
def combine_prompt(instruction, resource):
    """Combine the retrieved resource and the analysis instruction into a single prompt."""
    prompt = f"Resource: {resource}\n\nInstruction: {instruction}"
    return prompt
def analyze_balance_sheet(
symbol: Annotated[str, "ticker symbol"],
period: Annotated[
str, "the period of time to get the balance sheets for. Possible values are: annual, quarterly, ttm."
],
limit: int = 10,
) -> str:
"""Retrieve the balance sheet for the given ticker symbol with the related section of its 10-K report.
Then return with an instruction on how to analyze the balance sheet.
"""
balance_sheet = financial_datasets_client.run(
mode="get_balance_sheets",
ticker=symbol,
period=period,
limit=limit,
)
df_string = "Balance sheet:\n" + balance_sheet
instruction = dedent(
"""
Delve into a detailed scrutiny of the company's balance sheet for the most recent fiscal year, pinpointing
the structure of assets, liabilities, and shareholders' equity to decode the firm's financial stability and
operational efficiency. Focus on evaluating the liquidity through current assets versus current liabilities,
the solvency via long-term debt ratios, and the equity position to gauge long-term investment potential.
Contrast these metrics with previous years' data to highlight financial trends, improvements, or deteriorations.
Finalize with a strategic assessment of the company's financial leverage, asset management, and capital structure,
providing insights into its fiscal health and future prospects in a single paragraph. Less than 130 words.
"""
)
prompt = combine_prompt(instruction, df_string)
return prompt
def analyze_income_stmt(
symbol: Annotated[str, "ticker symbol"],
    period: Annotated[
        str, "the period of time to get the income statements for. Possible values are: annual, quarterly, ttm."
    ],
    limit: int = 10,
) -> str:
    """Retrieve the income statement for the given ticker symbol over the given period.

    Then return it together with an instruction on how to analyze the income statement.
    """
# Retrieve the income statement
income_stmt = financial_datasets_client.run(
mode="get_income_statements",
ticker=symbol,
period=period,
limit=limit,
)
df_string = "Income statement:\n" + income_stmt
# Analysis instruction
instruction = dedent(
"""
Conduct a comprehensive analysis of the company's income statement for the current fiscal year.
Start with an overall revenue record, including Year-over-Year or Quarter-over-Quarter comparisons,
and break down revenue sources to identify primary contributors and trends. Examine the Cost of
Goods Sold for potential cost control issues. Review profit margins such as gross, operating,
and net profit margins to evaluate cost efficiency, operational effectiveness, and overall profitability.
Analyze Earnings Per Share to understand investor perspectives. Compare these metrics with historical
data and industry or competitor benchmarks to identify growth patterns, profitability trends, and
        operational challenges. The output should be a strategic overview of the company's financial health
in a single paragraph, less than 130 words, summarizing the previous analysis into 4-5 key points under
respective subheadings with specific discussion and strong data support.
"""
)
# Combine the instruction, section text, and income statement
prompt = combine_prompt(instruction, df_string)
return prompt
def analyze_cash_flow(
symbol: Annotated[str, "ticker symbol"],
    period: Annotated[
        str, "the period of time to get the cash flow statements for. Possible values are: annual, quarterly, ttm."
    ],
    limit: int = 10,
) -> str:
    """Retrieve the cash flow statement for the given ticker symbol over the given period.

    Then return it together with an instruction on how to analyze the cash flow statement.
    """
cash_flow = financial_datasets_client.run(
mode="get_cash_flow_statements",
ticker=symbol,
period=period,
limit=limit,
)
df_string = "Cash flow statement:\n" + cash_flow
instruction = dedent(
"""
Dive into a comprehensive evaluation of the company's cash flow for the latest fiscal year, focusing on cash inflows
and outflows across operating, investing, and financing activities. Examine the operational cash flow to assess the
core business profitability, scrutinize investing activities for insights into capital expenditures and investments,
and review financing activities to understand debt, equity movements, and dividend policies. Compare these cash movements
to prior periods to discern trends, sustainability, and liquidity risks. Conclude with an informed analysis of the company's
cash management effectiveness, liquidity position, and potential for future growth or financial challenges in a single paragraph.
Less than 130 words.
"""
)
prompt = combine_prompt(instruction, df_string)
return prompt
def get_share_performance(
symbol: Annotated[str, "Ticker symbol of the stock (e.g., 'AAPL' for Apple)"],
end_date: Annotated[
str,
"end date of the search period for the company's basic financials, yyyy-mm-dd",
],
) -> str:
"""Plot the stock performance of a company compared to the S&P 500 over the past year."""
filing_date = datetime.strptime(end_date, "%Y-%m-%d")
start = (filing_date - timedelta(days=60)).strftime("%Y-%m-%d")
end = filing_date.strftime("%Y-%m-%d")
interval = "day" # possible values are {'second', 'minute', 'day', 'week', 'month', 'year'}
interval_multiplier = 1 # every 1 day
# create the URL
url = (
f"https://api.financialdatasets.ai/prices/"
f"?ticker={symbol}"
f"&interval={interval}"
f"&interval_multiplier={interval_multiplier}"
f"&start_date={start}"
f"&end_date={end}"
)
headers = {"X-API-KEY": "your_api_key_here"}
response = requests.get(url, headers=headers)
# parse prices from the response
prices = response.json().get("prices")
df_string = "Past 60 days Stock prices:\n" + json.dumps(prices)
instruction = dedent(
"""
Dive into a comprehensive evaluation of the company's stock price for the latest 60 days.
Less than 130 words.
"""
)
prompt = combine_prompt(instruction, df_string)
return prompt

View File

@@ -0,0 +1,114 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import json
import requests
try:
from tools.redis_kv import RedisKVStore
from tools.utils import *
except ImportError:
from redis_kv import RedisKVStore
from utils import *
def get_summary_else_doc(query, company):
    """Return (doc_title, content, is_summary): the stored summary if one exists, else the full document content."""
    # search for the most similar doc title
index_name = f"titles_{company}"
vector_store = get_vectorstore_titles(index_name)
k = 1
docs = vector_store.similarity_search(query, k=k)
if docs:
doc = docs[0]
doc_title = doc.page_content
print(f"Most similar doc title: {doc_title}")
kvstore = RedisKVStore(redis_uri=REDIS_URL_KV)
try:
# Check if summary already exists in the KV store
content = kvstore.get(f"{doc_title}_summary", f"full_doc_{company}")["summary"]
is_summary = True
print("Summary already exists in KV store.")
except Exception as e:
doc = kvstore.get(doc_title, f"full_doc_{company}")
content = doc["full_doc"]
is_summary = False
print("No summary found in KV store, returning full document content.")
else:
print(f"No similar document found for query: {query}")
doc_title = None
content = None
is_summary = False
return doc_title, content, is_summary
def save_doc_summary(summary, doc_title, company):
    """Save a document summary into the key-value store.

    Args:
        summary: The summary to be saved.
        doc_title: The title of the document.
        company: The company associated with the document.
    """
    kvstore = RedisKVStore(redis_uri=REDIS_URL_KV)
    kvstore.put(f"{doc_title}_summary", {"summary": summary}, collection=f"full_doc_{company}")
def summarize(doc_name, company):
    docsum_url = os.environ.get("DOCSUM_ENDPOINT")
    print(f"Docsum Endpoint URL: {docsum_url}")
    company = format_company_name(company)
    doc_title, content, is_summary = get_summary_else_doc(doc_name, company)
    if not doc_title:
        return f"Cannot find documents related to {doc_name} in KV store."
    if is_summary:
        # summary already cached in the KV store
        return content
    data = {
        "messages": content,
        "max_tokens": 512,
        "language": "en",
        "stream": False,
        "summary_type": "auto",
        "chunk_size": 2000,
    }
    headers = {"Content-Type": "application/json"}
    try:
        print("Computing Summary with OPEA DocSum...")
        resp = requests.post(url=docsum_url, data=json.dumps(data), headers=headers)
        resp.raise_for_status()  # Raise an exception for unsuccessful HTTP status codes
        ret = resp.json()["text"]
        # save summary into db only on success
        print("Saving Summary into KV Store...")
        save_doc_summary(ret, doc_title, company)
    except requests.exceptions.RequestException as e:
        ret = f"An error occurred: {e}"
    return ret
if __name__ == "__main__":
# company = "Gap"
# year = "2024"
# quarter = "Q4"
company = "Costco"
year = "2025"
quarter = "Q2"
print("testing summarize")
doc_name = f"{company} {year} {quarter} earning call"
ret = summarize(doc_name, company)
print("Summary: ", ret)
print("=" * 50)

View File

@@ -0,0 +1,32 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
finqa_agent:
description: answer financial questions about a company.
callable_api: supervisor_tools.py:finqa_agent
args_schema:
query:
type: str
description: should include company name and time, for example, which business unit had the highest growth for Microsoft in 2024.
return_output: retrieved_data
summarization_tool:
  description: Searches the KV store for an existing summary; if none exists, pulls the full document and summarizes it.
callable_api: sum_agent_tools.py:summarize
args_schema:
doc_name:
type: str
description: Descriptive name of the document
company:
type: str
      description: Name of the company the document belongs to
return_output: summary
# research_agent:
# description: generate research report on a specified company with fundamentals analysis, sentiment analysis and risk analysis.
# callable_api: supervisor_tools.py:research_agent
# args_schema:
# company:
# type: str
# description: the company name
# return_output: report
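# Note (illustrative reading of the entries above, as used in this example): callable_api
# points at a "<python file>:<function>" target, args_schema declares the arguments the
# supervisor agent should supply, and return_output names the field holding the tool's result.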

View File

@@ -0,0 +1,28 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import os
import requests
def finqa_agent(query: str):
    """Send a financial question to the worker FinQA agent and return its answer."""
    url = os.environ.get("WORKER_FINQA_AGENT_URL")
    print(f"WORKER_FINQA_AGENT_URL: {url}")
    proxies = {"http": ""}
    payload = {
        "messages": query,
    }
    response = requests.post(url, json=payload, proxies=proxies)
    return response.json()["text"]
def research_agent(company: str):
    """Request a research report on a company from the worker research agent."""
    url = os.environ.get("WORKER_RESEARCH_AGENT_URL")
    print(f"WORKER_RESEARCH_AGENT_URL: {url}")
    proxies = {"http": ""}
    payload = {
        "messages": company,
    }
    response = requests.post(url, json=payload, proxies=proxies)
    return response.json()["text"]

359
FinanceAgent/tools/utils.py Normal file
View File

@@ -0,0 +1,359 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import os
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain_redis import RedisConfig, RedisVectorStore
from openai import OpenAI
try:
from tools.redis_kv import RedisKVStore
except ImportError:
from redis_kv import RedisKVStore
# Embedding model
EMBED_MODEL = os.getenv("EMBED_MODEL", "BAAI/bge-base-en-v1.5")
TEI_EMBEDDING_ENDPOINT = os.getenv("TEI_EMBEDDING_ENDPOINT", "")
# Redis URL
REDIS_URL_VECTOR = os.getenv("REDIS_URL_VECTOR", "redis://localhost:6379/")
REDIS_URL_KV = os.getenv("REDIS_URL_KV", "redis://localhost:6380/")
# LLM config
LLM_MODEL = os.getenv("model", "meta-llama/Llama-3.3-70B-Instruct")
LLM_ENDPOINT = os.getenv("llm_endpoint_url", "http://localhost:8086")
print(f"LLM endpoint: {LLM_ENDPOINT}")
MAX_TOKENS = 1024
TEMPERATURE = 0.2
COMPANY_NAME_PROMPT = """\
Here is the list of company names in the knowledge base:
{company_list}
This is the company of interest: {company}
Determine if the company of interest is the same as any of the companies in the knowledge base.
If yes, map the company of interest to the company name in the knowledge base. Output the company name in {{}}. Example: {{3M}}.
If none of the companies in the knowledge base match the company of interest, output "NONE".
"""
ANSWER_PROMPT = """\
You are a financial analyst. Read the documents below and answer the question.
Documents:
{documents}
Question: {query}
Now take a deep breath and think step by step to answer the question. Wrap your final answer in {{}}. Example: {{The company has a revenue of $100 million.}}
"""
def format_company_name(company):
company = company.upper()
# decide if company is in company list
company_list = get_company_list()
print(f"company_list {company_list}")
company = get_company_name_in_kb(company, company_list)
if "Cannot find" in company or "Database is empty" in company:
raise ValueError(f"Company not found in knowledge base: {company}")
print(f"Company: {company}")
return company
def get_embedder():
if TEI_EMBEDDING_ENDPOINT:
# create embeddings using TEI endpoint service
# Huggingface API token for TEI embedding endpoint
HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN", "")
assert HUGGINGFACEHUB_API_TOKEN, "HuggingFace API token is required for TEI embedding endpoint."
embedder = HuggingFaceEndpointEmbeddings(model=TEI_EMBEDDING_ENDPOINT)
else:
# create embeddings using local embedding model
embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL)
return embedder
def generate_answer(prompt):
"""Use vllm endpoint to generate the answer."""
# send request to vllm endpoint
client = OpenAI(
base_url=f"{LLM_ENDPOINT}/v1",
api_key="token-abc123",
)
params = {
"max_tokens": MAX_TOKENS,
"temperature": TEMPERATURE,
}
completion = client.chat.completions.create(
model=LLM_MODEL, messages=[{"role": "user", "content": prompt}], **params
)
# get response
response = completion.choices[0].message.content
print(f"LLM Response: {response}")
return response
def parse_response(response):
if "{" in response:
ret = response.split("{")[1].split("}")[0]
else:
ret = ""
return ret
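# Illustrative behavior (a sketch):
#   parse_response("The answer is {The company has a revenue of $100 million.}")
#   # -> "The company has a revenue of $100 million."
#   parse_response("no braces here")  # -> ""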
def get_company_list():
kvstore = RedisKVStore(redis_uri=REDIS_URL_KV)
company_list_dict = kvstore.get("company", "company_list")
if company_list_dict:
company_list = company_list_dict["company"]
return company_list
else:
return []
def get_company_name_in_kb(company, company_list):
if not company_list:
return "Database is empty."
company = company.upper()
if company in company_list:
return company
prompt = COMPANY_NAME_PROMPT.format(company_list=company_list, company=company)
response = generate_answer(prompt)
if "NONE" in response.upper():
return f"Cannot find {company} in knowledge base."
else:
ret = parse_response(response)
if ret:
return ret
else:
return "Failed to parse LLM response."
def get_docs_matching_metadata(metadata, collection_name):
    """Return all docs in `collection_name` whose metadata contains the given (key, value) pair.

    Example: metadata = ("company_year", "3M_2023")
    """
    key, value = metadata
    kvstore = RedisKVStore(redis_uri=REDIS_URL_KV)
collection = kvstore.get_all(collection_name) # collection is a dict
matching_docs = []
for idx in collection:
doc = collection[idx]
if doc["metadata"][key] == value:
print(f"Found doc with matching metadata {metadata}")
print(doc["metadata"]["doc_title"])
matching_docs.append(doc)
print(f"Number of docs found with search_metadata {metadata}: {len(matching_docs)}")
return matching_docs
def convert_docs(docs):
# docs: list of dicts
converted_docs_content = []
converted_docs_summary = []
for doc in docs:
content = doc["content"]
# convert content to Document object
metadata = {"type": "content", **doc["metadata"]}
converted_content = Document(id=doc["metadata"]["doc_id"], page_content=content, metadata=metadata)
# convert summary to Document object
metadata = {"type": "summary", "content": content, **doc["metadata"]}
converted_summary = Document(id=doc["metadata"]["doc_id"], page_content=doc["summary"], metadata=metadata)
converted_docs_content.append(converted_content)
converted_docs_summary.append(converted_summary)
return converted_docs_content, converted_docs_summary
def bm25_search(query, metadata, company, doc_type="chunks", k=10):
collection_name = f"{doc_type}_{company}"
print(f"Collection name: {collection_name}")
docs = get_docs_matching_metadata(metadata, collection_name)
if docs:
docs_text, docs_summary = convert_docs(docs)
# BM25 search over content
retriever = BM25Retriever.from_documents(docs_text, k=k)
docs_bm25 = retriever.invoke(query)
print(f"BM25: Found {len(docs_bm25)} docs over content with search metadata: {metadata}")
# BM25 search over summary/title
retriever = BM25Retriever.from_documents(docs_summary, k=k)
docs_bm25_summary = retriever.invoke(query)
print(f"BM25: Found {len(docs_bm25_summary)} docs over summary with search metadata: {metadata}")
results = docs_bm25 + docs_bm25_summary
else:
results = []
return results
def bm25_search_broad(query, company, year, quarter, k=10, doc_type="chunks"):
    # first search with a company-only filter, using the query augmented with year and quarter
    metadata = ("company", f"{company}")
    query1 = f"{query} {year} {quarter}"
docs1 = bm25_search(query1, metadata, company, k=k, doc_type=doc_type)
# search with metadata filters
metadata = ("company_year_quarter", f"{company}_{year}_{quarter}")
print(f"BM25: Searching for docs with metadata: {metadata}")
docs = bm25_search(query, metadata, company, k=k, doc_type=doc_type)
    if not docs:
        print("BM25: No docs found with company, year and quarter filter; retrying with company and year filter")
        metadata = ("company_year", f"{company}_{year}")
        docs = bm25_search(query, metadata, company, k=k, doc_type=doc_type)
    if not docs:
        print("BM25: No docs found with company and year filter; retrying with company filter only")
        metadata = ("company", f"{company}")
        docs = bm25_search(query, metadata, company, k=k, doc_type=doc_type)
docs = docs + docs1
if docs:
return docs
else:
return []
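# Illustrative usage (a sketch, assuming the KV store was populated by the dataprep service):
#   docs = bm25_search_broad("total revenue", "COSTCO", "2025", "Q2", k=5, doc_type="chunks")
#   # relaxes filters from company+year+quarter to company+year to company-only when nothing matches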
def set_filter(metadata_filter):
# metadata_filter: tuple of (key, value)
from redisvl.query.filter import Text
key = metadata_filter[0]
value = metadata_filter[1]
filter_condition = Text(key) == value
return filter_condition
def similarity_search(vector_store, k, query, company, year, quarter=None):
    query1 = f"{query} {year} {quarter}" if quarter else f"{query} {year}"
filter_condition = set_filter(("company", company))
docs1 = vector_store.similarity_search(query1, k=k, filter=filter_condition)
print(f"Similarity search: Found {len(docs1)} docs with company filter and query: {query1}")
filter_condition = set_filter(("company_year_quarter", f"{company}_{year}_{quarter}"))
docs = vector_store.similarity_search(query, k=k, filter=filter_condition)
    if not docs:  # if no relevant document found, relax the filter
        print("No relevant document found with company, year and quarter filter; retrying with company and year filter")
        filter_condition = set_filter(("company_year", f"{company}_{year}"))
        docs = vector_store.similarity_search(query, k=k, filter=filter_condition)
    if not docs:  # if still no relevant document found, relax the filter further
        print("No relevant document found with company_year filter; retrying with company filter only")
        filter_condition = set_filter(("company", company))
        docs = vector_store.similarity_search(query, k=k, filter=filter_condition)
print(f"Similarity search: Found {len(docs)} docs with filter and query: {query}")
docs = docs + docs1
if not docs:
return []
else:
return docs
def get_index_name(doc_type: str, metadata: dict):
    company = metadata["company"]
    if doc_type not in ["chunks", "tables", "titles", "full_doc"]:
        raise ValueError("doc_type should be either chunks, tables, titles, or full_doc.")
    return f"{doc_type}_{company}"
def get_content(doc):
    # doc may be a BM25-converted Document or a Document saved in the vector store
    if "type" in doc.metadata and doc.metadata["type"] == "summary":
print("BM25 retrieved doc...")
content = doc.metadata["content"]
elif "type" in doc.metadata and doc.metadata["type"] == "content":
print("BM25 retrieved doc...")
content = doc.page_content
else:
print("Dense retriever doc...")
doc_id = doc.metadata["doc_id"]
# doc_summary=doc.page_content
kvstore = RedisKVStore(redis_uri=REDIS_URL_KV)
collection_name = get_index_name(doc.metadata["doc_type"], doc.metadata)
result = kvstore.get(doc_id, collection_name)
content = result["content"]
# print(f"***Doc Metadata:\n{doc.metadata}")
# print(f"***Content: {content[:100]}...")
return content
def get_unique_docs(docs):
results = []
context = ""
i = 1
for doc in docs:
content = get_content(doc)
if content not in results:
results.append(content)
doc_title = doc.metadata["doc_title"]
ret_doc = f"Doc [{i}] from {doc_title}:\n{content}\n"
context += ret_doc
i += 1
print(f"Number of unique docs found: {len(results)}")
return context
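# Illustrative usage (a sketch): combine BM25 retrieval with deduplication into one context string.
#   docs = bm25_search_broad("revenue growth", "COSTCO", "2025", "Q2")
#   context = get_unique_docs(docs)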
def get_vectorstore(index_name):
config = RedisConfig(
index_name=index_name,
redis_url=REDIS_URL_VECTOR,
metadata_schema=[
{"name": "company", "type": "text"},
{"name": "year", "type": "text"},
{"name": "quarter", "type": "text"},
{"name": "doc_type", "type": "text"},
{"name": "doc_title", "type": "text"},
{"name": "doc_id", "type": "text"},
{"name": "company_year", "type": "text"},
{"name": "company_year_quarter", "type": "text"},
],
)
embedder = get_embedder()
vector_store = RedisVectorStore(embedder, config=config)
return vector_store
def get_vectorstore_titles(index_name):
config = RedisConfig(
index_name=index_name,
redis_url=REDIS_URL_VECTOR,
metadata_schema=[
{"name": "company", "type": "text"},
{"name": "year", "type": "text"},
{"name": "quarter", "type": "text"},
{"name": "doc_type", "type": "text"},
{"name": "doc_title", "type": "text"},
{"name": "company_year", "type": "text"},
{"name": "company_year_quarter", "type": "text"},
],
)
embedder = get_embedder()
vector_store = RedisVectorStore(embedder, config=config)
return vector_store