1. Q&A over Documents
Let's start with an example. Given the CSV file OutdoorClothingCatalog_1000.csv, we can build a vector index over it with VectorstoreIndexCreator and then query its contents directly. How is this implemented?
```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain.indexes import VectorstoreIndexCreator

# 1. Load the CSV
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

# 2. Build a vector index over the documents
index = VectorstoreIndexCreator(vectorstore_cls=DocArrayInMemorySearch)\
    .from_loaders([loader])

query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."
response = index.query(query)
display(Markdown(response))
```
Under the hood, each document is encoded into an embedding vector. The query is embedded the same way, and the lookup simply returns the top-k stored vectors closest to the query vector.

In practice, however, documents are usually large, or there are many similar documents. In that case we split them into small chunks, compute an embedding vector for each chunk, and then look up the top-k vectors closest to the query vector. The chunks behind those vectors are the relevant ones, and they are what gets returned. A sketch of this flow is shown below.
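A rough sketch of this chunk-and-embed flow, using LangChain's RecursiveCharacterTextSplitter (the chunk sizes and query string here are only illustrative):

```python
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

# Load the documents and split them into small chunks (sizes are illustrative).
docs = CSVLoader(file_path='OutdoorClothingCatalog_1000.csv').load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed each chunk and store the vectors in an in-memory vector store.
db = DocArrayInMemorySearch.from_documents(chunks, OpenAIEmbeddings())

# Return the top-k chunks whose vectors are closest to the query vector.
top_chunks = db.similarity_search("shirts with sun protection", k=4)
```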

Once we have a vector store, how do we use it? There are several ways to query a document vector store:

- Query the index directly:
```python
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
docs = loader.load()
llm = ChatOpenAI(temperature=0.0)
index = VectorstoreIndexCreator(vectorstore_cls=DocArrayInMemorySearch)\
    .from_loaders([loader])
response = index.query(query, llm=llm)
```
- Call `db.similarity_search(query)` directly on the vector store:
```python
db = DocArrayInMemorySearch.from_documents(
    docs,
    embeddings
)
query = "Please suggest a shirt with sunblocking"
docs = db.similarity_search(query)
print(docs)
```
- Use the vector store as a retriever and stuff the retrieved documents into the prompt yourself:
```python
retriever = db.as_retriever()
qdocs = "".join([docs[i].page_content for i in range(len(docs))])
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.")
```
- Use a `RetrievalQA` chain to search for the answer:
```python
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)
response = qa_stuff.run(query)
```
There are other approaches as well; see the LangChain documentation for the full list.
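For example, the same `RetrievalQA` chain can be built with a different `chain_type`; a sketch using `map_reduce` (other values include `refine` and `map_rerank`), reusing `llm`, `retriever`, and `query` from above:

```python
# Run the LLM over each retrieved chunk separately, then combine the
# per-chunk answers into a final answer (more LLM calls, but scales to
# many chunks that would not all fit into a single "stuff" prompt).
qa_map_reduce = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=retriever,
    verbose=True
)
response = qa_map_reduce.run(query)
```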

2. Evaluation
Setting `langchain.debug = True` prints every intermediate step of the QA chain, which makes it easy to check whether the process behaves as expected.

`QAEvalChain` checks whether the generated answers agree with the reference answers. The evaluation code below assumes a list of `examples` (question/answer pairs) and the chain's `predictions` for them.
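A minimal sketch of how they might be prepared, reusing the `qa_stuff` chain built above (the two example questions are only illustrative):

```python
# Hand-written question/answer pairs (illustrative; the course also shows
# QAGenerateChain for generating examples automatically from the documents).
examples = [
    {"query": "Do the Cozy Comfort Pullover Set have side pockets?",
     "answer": "Yes"},
    {"query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
     "answer": "The DownTek collection"},
]

# Run the QA chain on each example to get its predicted answers.
predictions = qa_stuff.apply(examples)
```

With `examples` and `predictions` in hand, `QAEvalChain` grades each prediction: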
```python
from langchain.evaluation.qa import QAEvalChain

llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)
graded_outputs = eval_chain.evaluate(examples, predictions)

for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question:" + predictions[i]['query'])
    print("Real Answer:" + predictions[i]['answer'])
    print("Predicted Answer:" + predictions[i]['result'])
    print("Predicted Grade:" + graded_outputs[i]['text'])
    print()
```
3. Agents
The general steps (see the official agent documentation for details):

- Instantiate the LLM and load the tools, which determine what domain knowledge the agent can draw on
- Initialize the agent
- Call `agent.run(question)` to answer the question.
```python
from langchain.agents.agent_toolkits import create_python_agent
from langchain.agents import load_tools, initialize_agent
from langchain.agents import AgentType
from langchain.tools.python.tool import PythonREPLTool
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True)
agent.run("Which country is the northernmost?")
# agent.run("How to find the center of gravity of a triangle?")
```
`create_python_agent` lets you build your own Python assistant, for example one that sorts customer names:
```python
agent = create_python_agent(
    llm,
    tool=PythonREPLTool(),
    verbose=True
)
customer_list = [["Harrison", "Chase"],
                 ["Lang", "Chain"],
                 ["Dolly", "Too"],
                 ["Elle", "Elem"],
                 ["Geoff", "Fusion"],
                 ["Trance", "Former"],
                 ["Jen", "Ayai"]
                 ]
agent.run(f"""Sort these customers by \
last name and then first name \
and print the output: {customer_list}""")
```
The `@tool` decorator lets you build your own tools, for example:
```python
from langchain.agents import tool
from datetime import date

@tool
def time(text: str) -> str:
    """Returns today's date; use this for any \
    questions related to knowing today's date. \
    The input should always be an empty string, \
    and this function will always return today's \
    date - any date mathematics should occur \
    outside this function."""
    return str(date.today())

agent = initialize_agent(tools + [time],
                         llm,
                         agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
                         handle_parsing_errors=True,
                         verbose=True)
```
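A quick way to check that the agent actually picks up the new tool (the question wording is just an example):

```python
agent.run("What's the date today?")
```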
This course covered how to use LangChain, mainly including:
- Models, Prompts and parsers
- Memory
- Chains
- QA
- Evaluation
- Agents