Chatbot using llama2 LLM and Django in ten steps

Victor Yeo

This article describes a simple and fast way of creating a chatbot using the llama2 LLM and Django. We will download an open-source llama2 model from Hugging Face and run it on a laptop's CPU. Yes, CPU only: an Nvidia GPU is not required. The LangChain framework is used extensively in the chatbot code.

Best of all, the whole thing can be completed in ten steps.

Firstly, create a new folder, cd into it, and start a new Django project.

django-admin startproject <proj_name> .

Notice the dot at the end. It tells django-admin to start the Django project in the current directory. We can use djangoproj as the <proj_name>.
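
Concretely, the first step can look like this (the folder name chatbot is just an example):

mkdir chatbot && cd chatbot                 # any folder name works
django-admin startproject djangoproj .     # note the trailing dot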

Secondly, create a new Django app.

django-admin startapp <app_name>

Let’s use djangoapp as the <app_name>.
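
At this point the directory should look roughly like this (the default files Django generates):

manage.py
djangoproj/   # project package: settings.py, urls.py, ...
djangoapp/    # app package: views.py, models.py, ...
              # (we will add urls.py to it in the 5th step)
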
Thirdly, modify the urls.py in <proj_name> and include the djangoapp.urls in it.

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('', include('djangoapp.urls')),
    path('admin/', admin.site.urls),
]

Fourthly, modify the settings.py in <proj_name> and add the djangoapp to the installed apps.

INSTALLED_APPS = [
    .....
    'djangoapp',  # add the custom app to the list of installed apps
]

5th step, set up the urls.py in djangoapp by adding the paths.

from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),
    path('db_status/', views.db_status, name='db_status'),
    path('build_db/', views.build_db, name='build_db'),
]

6th step, add the code to views.py in djangoapp. The purpose is to display an index page when the browser loads the djangoapp and, when the chat button is clicked, to call the answer_query function. (The db_status and build_db views referenced in urls.py are sketched after the snippet.)

from django.shortcuts import render
from django.http import JsonResponse

from .logic import answer_query

def index(request):
    if request.method == 'POST':
        query = request.POST.get('query')
        result = answer_query(query)
        return JsonResponse(result)
    return render(request, 'djangoapp/index.html')
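
The urls.py above also references db_status and build_db views, which the article does not show. Minimal placeholders (their bodies are assumptions; a real build_db would ingest documents into the "web_docs" collection) could look like this, in the same views.py:

import os   # additional import at the top of views.py
from .logic import CHROMA_DB_DIRECTORY

def db_status(request):
    # report whether the persisted Chroma index exists yet
    return JsonResponse({'ready': os.path.isdir(CHROMA_DB_DIRECTORY)})

def build_db(request):
    # ingestion of documents into the "web_docs" collection would go here
    return JsonResponse({'status': 'build not implemented in this sketch'})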

7th step, add the index.html file. We use standard HTML to present the query-and-answer interface. JavaScript embedded in the HTML file processes the query by posting it to the URL wired up in urls.py and views.py of the djangoapp.
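
The article does not include the template itself, so the following is only a minimal sketch. It posts the form (the query plus the CSRF token) back to the index view and prints the answer field of the JSON response (the key name is an assumption, matching the chain output used later):

<!-- djangoapp/templates/djangoapp/index.html -->
<!DOCTYPE html>
<html>
<body>
  <h3>llama2 chatbot</h3>
  <form id="chat-form">
    {% csrf_token %}
    <input type="text" name="query" size="60" placeholder="Ask a question">
    <button type="submit">Chat</button>
  </form>
  <pre id="answer"></pre>
  <script>
    document.getElementById('chat-form').addEventListener('submit', async (e) => {
      e.preventDefault();
      // POST the form data (query + CSRF token) back to the index view
      const response = await fetch('', { method: 'POST', body: new FormData(e.target) });
      const data = await response.json();
      document.getElementById('answer').textContent = data.answer;  // key name assumed
    });
  </script>
</body>
</html>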

8th step, create a new file called logic.py and add the llama2 code to it. The Chroma instance is initialised with the embeddings, and we use LlamaCpp to create the llm instance.

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQAWithSourcesChain

CHROMA_DB_DIRECTORY = "chroma_db"  # example value; point this at your persisted Chroma index

def answer_query(query):
    model_name = "sentence-transformers/all-mpnet-base-v2"
    model_kwargs = {"device": "cpu"}
    embeddings = HuggingFaceEmbeddings(
        model_name=model_name,
        model_kwargs=model_kwargs
    )
    db = Chroma(
        collection_name="web_docs",
        embedding_function=embeddings,
        persist_directory=CHROMA_DB_DIRECTORY
    )
    llm = LlamaCpp(
        model_path="djangoapp/models/llama-2-7b-chat.Q4_0.gguf",
        n_gpu_layers=40,  # ignored by a CPU-only llama-cpp-python build
        n_batch=512,      # batch size for model processing
        n_ctx=2048,       # context size
        verbose=False,    # enable detailed logging or not
    )
    ...

    chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(),
        chain_type_kwargs={"prompt": prompt}  # prompt is a PromptTemplate, sketched below
    )

Here we reach the important part of the code, where we connect Chroma and the llm using RetrievalQAWithSourcesChain.
The stuffing method used above works by feeding all of the retrieved documents to the LLM in a single call, rather than summarizing or splitting them first.
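
The article elides the prompt and the tail end of answer_query. One way to fill them in is sketched below; the wording of the prompt is an assumption, but the stuff chain of RetrievalQAWithSourcesChain does expect the summaries and question variables:

from langchain.prompts import PromptTemplate

# assumed prompt; the chain injects the retrieved documents as {summaries}
prompt = PromptTemplate(
    input_variables=["summaries", "question"],
    template=(
        "Use the following context to answer the question.\n"
        "Context: {summaries}\n"
        "Question: {question}\n"
        "Answer:"
    ),
)

and, at the end of answer_query:

    # run the chain and return a dict that the view can wrap in a JsonResponse
    result = chain({"question": query}, return_only_outputs=True)
    return {"answer": result["answer"], "sources": result.get("sources", "")}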

9th step, go to https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF and download the llama-2-7b-chat.Q4_0.gguf llama2 model. The model file is more than 3 GB in size.
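
The file needs to end up at the path used in logic.py. For example (the direct-download URL follows the usual Hugging Face file pattern):

mkdir -p djangoapp/models
wget -P djangoapp/models https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_0.gguf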

10th step, run pip install for the Python packages used, apply the database migrations, and run the app!

pip install .....             # install the Python packages
python manage.py migrate      # run once to set up the database tables
python manage.py runserver    # run the Django app
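
The article does not list the packages, but the imports in the snippets above imply at least the following (a non-exhaustive guess; pin versions as needed):

pip install django langchain chromadb sentence-transformers llama-cpp-python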

Afterthoughts:
This article is highly simplified. There are many possible variations on using the llama2 model with a Chroma vector database.

The source code can be found at
https://github.com/victoryeo/django-llama.git

The screenshot of the chatbot:

References:
https://python.plainenglish.io/django-langchained-e53aab3ad6bf
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/use-cases/document-summarization/summarization_large_documents_langchain.ipynb
