LangChain RAG with Elasticsearch

This post uses LangChain and Elasticsearch to do RAG over Nvidia's annual report. PDF: 2024annualreport.

Some of the package versions:

langchain==0.2.0
elasticsearch==8.0.0

The Elasticsearch service itself was pulled directly as a Docker image, version 8.12.2.

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.12.2
docker pull docker.elastic.co/kibana/kibana:8.12.2

1. Load the PDF

from langchain_community.document_loaders import PyPDFLoader

# Load the annual report and split it into chunks ready for embedding
loader = PyPDFLoader(r"./a2024annualreport.pdf")
pages = loader.load_and_split()
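To make the "split" half of `load_and_split()` more concrete, here is a minimal sketch of cutting text into overlapping chunks. It is a simplification: as far as I know the loader defaults to LangChain's `RecursiveCharacterTextSplitter`, which prefers to break on separators such as newlines, whereas the hypothetical `split_with_overlap` helper below just slides a fixed-size window. The sample string and sizes are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Toy input standing in for the text of one PDF page
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, overlap=3)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.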

2. Build the ES embedding retrieval

from elasticsearch import Elasticsearch
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import ElasticsearchStore

# Connect to the ES node; the "xx" values are placeholders for your own
# certificate fingerprint, host, and password
finger = "CE:xx"
es = Elasticsearch(
    "https://xx:9200",
    ssl_assert_fingerprint=finger,
    basic_auth=("elastic", "xx"),
)

# Local copy of the all-MiniLM-L6-v2 sentence-embedding model
embeddings = HuggingFaceEmbeddings(model_name="./all-MiniLM-L6-v2")

# Embed every chunk and index it into the "annualreport" index
esVectorStore = ElasticsearchStore.from_documents(
    pages,
    es_connection=es,
    index_name="annualreport",
    embedding=embeddings,
    strategy=ElasticsearchStore.ExactRetrievalStrategy(),
)

retriever = esVectorStore.as_retriever()
query = "What are the year-over-year changes in the company's revenue and profit?"
print(esVectorStore.similarity_search(query))
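As I understand it, the exact retrieval strategy scores the query vector against every stored vector instead of using an approximate index. A minimal sketch of that brute-force idea in plain Python, assuming cosine similarity as the metric; the three-dimensional vectors and page names are toy values, and `cosine` and `exact_top_k` are hypothetical helpers, not part of LangChain or Elasticsearch.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query_vec, doc_vecs, k=4):
    """Score the query against every document vector and keep the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "index": real embeddings have hundreds of dimensions
toy_index = {
    "revenue_page": [0.9, 0.1, 0.0],
    "risk_page": [0.1, 0.9, 0.2],
    "legal_page": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

top = exact_top_k(toy_query, toy_index, k=2)
print([doc_id for doc_id, _ in top])
```

Scoring everything is simple and precise, and for a single annual report the index is small enough that the cost hardly matters; with much larger corpora an approximate kNN strategy is usually the better trade-off.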

3. Answer with an LLM

DeepSeek is used here, because it's cheap!

from langchain_openai import ChatOpenAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

deepseek = "xx"  # placeholder: your DeepSeek API key

# DeepSeek is called through ChatOpenAI by pointing base_url at its endpoint
llms = ChatOpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=deepseek,
    model="deepseek-chat",
)

# Trailing spaces keep the adjacent string literals from running together
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know.\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# Put the retrieved chunks into {context}, then place the retriever in front
question_answer_chain = create_stuff_documents_chain(llms, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": query})
print(response["answer"])
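For readers wondering what the "stuff" in `create_stuff_documents_chain` means: roughly, the retrieved chunks are concatenated and dropped into the `{context}` slot of the prompt before the model is called. The sketch below imitates that step in plain Python without calling any LLM. The `stuff_documents` helper, the two chunk strings, and the blank-line separator are assumptions for illustration, not LangChain's actual implementation.

```python
def stuff_documents(docs: list[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks into one context string."""
    return separator.join(docs)


system_template = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know.\n\n"
    "{context}"
)

# Hypothetical retriever output; in the real chain these come from Elasticsearch
retrieved = ["chunk about revenue", "chunk about earnings per share"]
question = "What are the year-over-year changes in the company's revenue and profit?"

# The messages that would be sent to the chat model
messages = [
    ("system", system_template.format(context=stuff_documents(retrieved))),
    ("human", question),
]

for role, text in messages:
    print(f"[{role}]\n{text}\n")
```

Because everything is packed into a single prompt, this approach only works while the retrieved chunks fit in the model's context window.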

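The answer is shown below; the corresponding figures appear on page 24 of the report.

This post only demonstrates a single PDF, but multiple PDFs can be indexed in the same way. You can also combine ES keyword search with embedding search to get more accurate retrieval results. A full walkthrough of those variations is beyond this post.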

The year-over-year changes in the company's revenue and profit are as follows:
 
- Full-year revenue increased by 126% to a record $60.9 billion.
- GAAP earnings per diluted share was $11.93, up 586% from a year ago.
- Non-GAAP earnings per diluted share were $12.96, up 288% from a year ago.
- Non-GAAP gross margin was 73.8%.
 
These figures indicate significant growth in both revenue and profit for the company compared to the previous year.
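As a rough illustration of the keyword-plus-embedding idea mentioned above, one common way to merge two ranked result lists is reciprocal rank fusion (RRF). This is only a sketch of the fusion step under stated assumptions: the page IDs are made up, `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is just the constant conventionally used with RRF. It is not necessarily the combination the post had in mind.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each appearance adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results: best match first in each list
keyword_hits = ["p24", "p31", "p7"]      # e.g. from a keyword (BM25) query
embedding_hits = ["p12", "p24", "p31"]   # e.g. from the embedding search

fused = reciprocal_rank_fusion([keyword_hits, embedding_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one method is still kept, just lower down.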


Posted in: langchain

2024-05-20

Reposting: unless otherwise noted, articles on this site are published under the CC-4.0 license. To repost, please contact tensortimes@gmail.com.

