We install the Postgres extension called pgvector, which enables tables to have columns of type vector where vector is a set of floats. In this example we use a 768-dimensional vector, i.e. a vector of length 768.
We create a table that will save the articles for our knowledge base — the text of each article, the title of the article, and a vector embedding of the article text. We name the table articles and the columns title, text, and embedding.
We extract the content at four Wikipedia URLs and separate the title and content for each.
We clean each article body, divide the text into chunks of 500 characters, and use an embedding model to create a 768-dimensional vector from each chunk. The vector is a numerical representation (a float) of the meaning of the chunk of text.
We save the title, a chunk from the body, and the embedding vector for the chunk in a row of the database. For each article, there are as many vectors as there are chunks.
We index the vector column for similarity search in Part 2.
import psycopg2
from sentence_transformers import SentenceTransformer
import requests
from bs4 import BeautifulSoup
import re
import ollama
# Your connection params here
MY_DB_HOST = ‘localhost’
MY_DB_PORT = 5432
MY_DB_NAME = ‘nitin’
MY_DB_USER = ‘nitin’
MY_DB_PASSWORD = ”
# Set up the database connection
conn = psycopg2.connect(
host=MY_DB_HOST,
port=MY_DB_PORT,
dbname=MY_DB_NAME,
user=MY_DB_USER,
password=MY_DB_PASSWORD
)
cur = conn.cursor()
# Create the articles table with the pgvector extension
# If the pgvector extension is not installed on your machine it will need to be installed.
# See https://github.com/pgvector/pgvector or cloud instances with pgvector installed.
# First create the pgvector extension, then a table with a 768 dim vector column for embeddings.
# Note that the title and full text of the article is also saved with the embedding.
# This allows vector similarity search on the embedding column, returning matched text
# along with matched embeddings depending on what is needed.
# After this SQL command is executed we will have
# a) a pgvector extension installed if it did not already exist
# b) an empty table with a column of type vector along with two columns,
# one to save the title of the article and one to save a chunk of text.
# Postgres does not put a limit on the number of dimensions in pgvector embeddings.
# It is worth experimenting with larger lengths but note they need to match the length of embeddings
# created by the model you use. Embeddings of ~1k, 2k, or more dimensions are common among embeddings APIs.
cur.execute(”’
CREATE EXTENSION IF NOT EXISTS vector;
DROP TABLE IF EXISTS articles;
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
text TEXT,
embedding VECTOR(768)
);
”’)
conn.commit()
# Below are the sources of content for creating embeddings to be inserted in our demo vector db.
# Feel free to add your own links but note that different sources other than Wikipedia may
# have different junk characters and may require different pre-processing.
# As a start try other Wikipedia pages, then expand to other sources.
urls= [
‘https://en.wikipedia.org/wiki/Pentax_K-x’,
‘https://en.wikipedia.org/wiki/2008_Tour_de_France’,
‘https://en.wikipedia.org/wiki/Onalaska,_Texas’,
‘https://en.wikipedia.org/wiki/List_of_extinct_dog_breeds’
]
# Fetch the HTML at a given link and extract only the text, separating title and content.
# We will use this text to extract content from Wikipedia articles to answer queries.
def extract_title_and_content(url):
try:
response = requests.get(url)
if response.status_code == 200: # success
# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, ‘html.parser’)
# Extract the title of the page
title = soup.title.string.strip() if soup.title else “”
# Extract the text content from the page
content = soup.get_text(separator=” “)
return {“title”: title, “text”: content}
else:
print(f”Failed to retrieve content from {url}. Status code: {response.status_code}”)
return None
except requests.exceptions.RequestException as e:
print(f”Error occurred while retrieving content from {url}: {str(e)}”)
return None
# Create the embedding model
# This is the model we use to generate embeddings, i.e. to encode text chunks into numeric vectors of floats.
# Sentence Transformers (sbert.net) is a collection of transformer models designed for creating embeddings
# from sentences. These are trained on data sets used for different applications. We use one tuned for Q&A,
# hence the ‘qa’ in the name. There are other embedding models, some tuned for speed, some for breadth, etc.
# The site sbert.net is worth studying for picking the right model for other uses. It’s also worth looking
# at the embedding models of providers like OpenAI, Cohere, etc. to learn the differences, but note that
# the use of an online model involves a potential loss of privacy.
embedding_model = SentenceTransformer(‘multi-qa-mpnet-base-dot-v1′)
articles = []
embeddings = []
# Extract title,content from each URL and store it in the list.
for url in urls:
article = extract_title_and_content(url)
if article:
articles.append(article)
for article in articles:
raw_text = article[“text”]
# Pre-processing: Replace large chunks of white space with a space, eliminate junk characters.
# This will vary with each source and will need custom cleanup.
text = re.sub(r’\s+’, ‘ ‘, raw_text)
text = text.replace(“]”, “”).replace(“[“, “”)
# chunk into 500 character chunks, this is a typical size, could be lower if total size of article is small.
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]
for chunk in chunks:
# This is where we invoke our model to generate a list of floats.
# The embedding model returns a numpy ndarray of floats.
# Psycopg coerces the list into a vector for insertion.
embedding = embedding_model.encode([chunk])[0].tolist()
cur.execute(”’
INSERT INTO articles (title, text, embedding)
VALUES (%s, %s, %s); ”’, (article[“title”], chunk, embedding)
)
embeddings.append(embedding)
conn.commit()
# Create an index
# pgvector allows different indexes for similarity search.
# See the docs in the README at https://github.com/pgvector/pgvector for detailed explanations.
# Here we use ‘hnsw’ which is an index that assumes a Hierarchical Network Small Worlds model.
# HNSW is a pattern seen in network models of language. Hence this is one of the indexes
# that is expected to work well for language embeddings. For this small demo it will probably not
# make much of a difference which index you use, and the others are also worth trying.
# The parameters provided in the ‘USING’ clause are ’embedding vector_cosine_ops’
# The first, ’embedding’ in this case, needs to match the name of the column which holds embeddings.
# The second, ‘vector_cosine_ops’, is the operation used for similarity search i.e. cosine similarity.
# The same README doc on GitHub gives other choices but for most common uses it makes little difference
# hence cosine similarity is used as our default.
cur.execute(”’
CREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops);
”’)
conn.commit()
cur.close()
conn.close()
# End of file
Part 2. Retrieve context from the vector database and query the LLM
In part 2 we ask a natural language question of our knowledge base, using similarity search to find a context and using an LLM (in this case Meta’s Llama 3) to generate an answer to the question in the provided context. The steps:
We encode our natural language query as a vector using the same embedding model we used to encode the chunks of text we extracted from the Wikipedia pages.
We perform a similarity search on this vector using a SQL query. Similarity, or specifically cosine similarity, is a way to find the vectors in our database that are nearest to the vector query. Once we find the nearest vectors, we can use them to retrieve the corresponding text that is saved with each vector. That’s the context for our query to the LLM.
We append this context to our natural language query text, explicitly telling the LLM that the provided text is to be taken as the context for answering the query.
We use a programmatic wrapper around Ollama to pass the natural language query and contextual text to the LLM’s request API and fetch the response. We submit three queries, and we receive the answer in context for each query. An example screenshot for the first query is shown below.
IDG