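If you have used OpenAI's models and would like them to answer questions about your own data, this post walks through one way to do it. Strictly speaking, the approach below does not fine-tune the model itself: it uses llama-index to build an index over your documents, and that index is queried whenever you ask a question.

Let us set up the environment first: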

pip3 install openai
pip3 install llama-index 
pip3 install pypdf 
pip3 install gradio 
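
After installing, it can help to confirm that the packages are visible to the Python interpreter you will actually run the app with. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this illustration, and the module names assume the usual import names for the four packages above (for example, `llama-index` is imported as `llama_index`).

```python
import importlib.util

# Import names for the packages installed above.
# Note that the pip name and the import name can differ (llama-index -> llama_index).
REQUIRED_MODULES = ["openai", "llama_index", "pypdf", "gradio"]


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed for this interpreter:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If something is reported as missing even though pip3 said it was installed, pip3 and python3 are probably pointing at different environments.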

As my custom document, I downloaded "Getting ROIC right" from the EY website, indexed it, and asked the chatbot a few questions:

Q1: Why is Return on Invested Capital Important?

[Image: Question_1]

Q2: Can you explain what a liability is? Can you give me a full list in bullet points?

[Image: Question_2]

Q3: Your role is investor and your task is to calculate ROIC. Also give me an example with actual numbers.

[Image: Question_3]

Now, let's talk about the code.

My folder structure looks like this. I have copied my PDF into the training-data folder:

training-data

[Image: Folder_Structure]

In the above image, my code is in custom_chat.py. The two other folders are generated when the app runs: the indexes folder, which llama-index builds from your PDF content, and the flagged folder, which gradio creates.
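
Before building the index, it is worth checking that the training-data folder exists and actually contains a PDF, since an empty or misnamed folder is an easy mistake to make. The following is a small sketch using only the standard library; the helper name `list_training_pdfs` is hypothetical and is not part of llama-index.

```python
from pathlib import Path


def list_training_pdfs(training_dir="training-data"):
    """Return the PDF files found directly inside training_dir, sorted by path."""
    folder = Path(training_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected a folder named {folder}")
    # Match the extension case-insensitively so "Report.PDF" is also picked up.
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


# Example usage, run from the folder that contains custom_chat.py:
#   for pdf in list_training_pdfs():
#       print(pdf.name)
```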

Now the code part:

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, LLMPredictor, ServiceContext, StorageContext, load_index_from_storage
from langchain.chat_models import ChatOpenAI
import gradio
import os

# Replace the placeholder with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = 'INSERT_KEY_HERE'

def construct_index(directory_path):
    # set the maximum number of output tokens
    num_outputs = 256

    # gpt-3.5-turbo is a chat model, so it is wrapped with ChatOpenAI
    _llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.5, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

    service_context = ServiceContext.from_defaults(llm_predictor=_llm_predictor)

    docs = SimpleDirectoryReader(directory_path).load_data()

    index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)

    # Persist the index to the "indexes" directory so it can be reloaded later
    index.storage_context.persist(persist_dir="indexes")

    return index

def chatbot(input_text):
    # rebuild the storage context from the persisted "indexes" directory
    storage_context = StorageContext.from_defaults(persist_dir="indexes")

    # load the index from storage and wrap it in a query engine
    query_engine = load_index_from_storage(storage_context).as_query_engine()

    response = query_engine.query(input_text)

    # return only the answer text
    return response.response

# Create the web UI using gradio
iface = gradio.Interface(fn=chatbot,
                     inputs=gradio.Textbox(lines=5, label="Enter your question here"),
                     outputs="text",
                     title="Custom-trained AI Chatbot")

# Build the index from the documents in the training-data folder.
# This step can be skipped if the index has already been built and you only need to re-run the app.
index = construct_index("training-data")

# Launch the web UI using gradio (share=True also creates a temporary public link)
iface.launch(share=True)
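
As the comment in the script notes, the index only needs to be built once. The sketch below shows one possible way to automate that check, and to read the API key from the environment instead of pasting it into the file. All three helper names (`require_api_key`, `index_is_persisted`, `ensure_index`) are hypothetical and are not part of llama-index or gradio; the sketch also assumes that a non-empty indexes folder means a usable index was persisted earlier.

```python
import os
from pathlib import Path


def require_api_key(env=os.environ):
    """Return the OpenAI key from the environment, or raise with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable before starting the app")
    return key


def index_is_persisted(persist_dir="indexes"):
    """Return True if persist_dir exists and contains at least one entry."""
    folder = Path(persist_dir)
    return folder.is_dir() and any(folder.iterdir())


def ensure_index(build, source_dir="training-data", persist_dir="indexes"):
    """Call build(source_dir) only when nothing has been persisted yet.

    Returns True if a build was triggered and False if it was skipped.
    """
    if index_is_persisted(persist_dir):
        return False
    build(source_dir)
    return True


# Possible use in custom_chat.py, replacing the hard-coded key and the
# unconditional construct_index call:
#   require_api_key()
#   ensure_index(construct_index)
#   iface.launch(share=True)
```

One consequence of this design is that new or changed PDFs are not picked up automatically: after editing the training-data folder, delete the indexes folder so the next run rebuilds it.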
