Building An OpenAI GPT with Your API: A Step-by-Step Guide

Alexander Sniffin
13 min readNov 12, 2023

--

At OpenAI’s first DevDay in 2023, a new product called GPTs were announced. These GPTs offer a quick and easy way to build a ChatGPT extension through a “no-code” platform which greatly simplifies the development of complex multi-modal chat bots.

We’re rolling out custom versions of ChatGPT that you can create for a specific purpose — called GPTs. GPTs are a new way for anyone to create a tailored version of ChatGPT to be more helpful in their daily life, at specific tasks, at work, or at home — and then share that creation with others. For example, GPTs can help you learn the rules to any board game, help teach your kids math, or design stickers.

Anyone can easily build their own GPT — no coding is required. You can make them for yourself, just for your company’s internal use, or for everyone. Creating one is as easy as starting a conversation, giving it instructions and extra knowledge, and picking what it can do, like searching the web, making images or analyzing data. Try it out at chat.openai.com/create.

Let’s explore what this means by going over the existing functionality and concepts. Then by building our own GPT and how to add both an application programming interface (API) and custom knowledge!

GPTs Overview

To start, let’s review the existing features of what we can do with GPTs and create a simple GPT before moving onto a more advanced one using an API.

User Interface

The UI for GPTs is simple and can be made completely from your browser.

GPT Builder Home Screen

It has two components, a GPT Builder that allows you to communicate what you want to do, almost like a GPT for building GPT’s but also a more manual configuration option.

It’s designed to be easy to use and requires “no-code” but it does provide more complex functionality by giving developers the ability to upload their own knowledge and provide API’s as Actions.

GPTs are a multi-model copy of ChatGPT. They have support for vision, DALL-E, and tools like web browsing, a code interpreter using Python and custom actions that use public API’s.

This is very similar concept to what has stemmed from open-source projects like Agents which LangChain, a popular framework for building LLM applications describes as the following:

The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.

OpenAI has abstracted the building of Agents with GPTs without any programming. They also provide a similar developer API known as Assistants that give more flexibility into building complex applications like GPTs.

Store

GPTs can be publicly shared through the GPT Store, a way to discover and share your creations with other users.

GPT Store

Simple GPT Example

Let’s build a simple GPT with no added knowledge or actions. Luckily, OpenAI has built a lot of functionality that handles the “magic” on how GPTs work and how they can extend ChatGPT. This means creating a GPT takes only a few minutes.

Before, building something similar to GPTs required programming a conversational bot with lots of complexities. Even using the OpenAI API, it still required a lot of understanding on how to use the Chat Completion API with tools like LangChain for building bots that could use tools or multiple models. This has been simplified and abstracted away allowing the quick development of advance conversational bots.

However, it’s important to note that this simplification comes with a trade-off in terms of reduced flexibility to more custom approaches.

Getting Started

Let’s demonstrate how to take advantage of the underlying multi-model architecture of ChatGPT.

We’ll call our GPT prototype “Reverse Fashion Search”, a GPT that allows users to upload images of an outfit where the vision model identifies the different clothing pieces then attempting to find those same clothing pieces online.

This can be done in a few minutes, something which would’ve previously taken a significant amount of effort.

Instructions Prompt

Lets start with our prompt which is the most important part to our GPT.

If you’re unfamiliar with prompts, checkout promptingguide.ai. A very useful resource for diving into prompting techniques used with language models like ChatGPT.

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).

Put simply, a prompt is just the text that we instruct our LLM to follow.

Our prompt should be well structured but also lay out the stage for how the LLM should respond to messages.

ChatGPT works by prompting the LLM with a conversational format. The LLM will generate text on the input, it infers the meaning of the context based on its training data and the prompt. A very simple example of a conversation prompt might look like:

system: You are a helpful AI assistant.
user: Hi!
assistant: Hello, how can I help you today?

Each message in the conversation will continue the prompt until some stop word is reached or the token limit is reached. Lucky for us, we don’t need to worry about this because ChatGPT will handle this for our GPT, nice!

Writing the Reverse Fashion Search Prompt

When thinking about our instructions prompt, we’ll think of it as our system prompt like the example above, which sets the stage for how the language model will reason.

Here’s an example of what we can use with our Reverse Fashion Search GPT:

You're an AI assistant designed to help the user find similar clothing online by analyzing and identify clothing from example images. These images can be sourced from social media posts, user uploads like screenshots, etc. Your task involves detailed analysis and subsequent search for similar clothing items available for purchase.

Step-by-Step Process:
1. Image Acquisition:
- Request the user to provide an image. This can be a direct upload or a screenshot from social media platforms.
- Note: Inform the user that screenshots may be necessary for certain social media platforms that require login, as you cannot access these platforms directly.

2. Identifying the Subject:
- If the image contains multiple people, ask the user to specify whose clothing they are interested in.
- Proceed once the user identifies the subject of interest.

3. Detailed Clothing Analysis:
- Thoroughly describe each piece of clothing worn by the chosen subject in the image.
- Include details such as color, pattern, fabric type, style (e.g., v-neck, button-down), and any distinctive features (e.g., logos, embellishments).

4. Verification:
- Present the clothing description to the user for confirmation.
- If there are inaccuracies or missing details, ask the user to clarify or provide additional information.

5. Search and Present Options:
- Once the description is confirmed, begin web browsing for similar clothing items.
- Ask the user if they prefer to search for all items simultaneously or one at a time.
- Searched results can be direct links to a specific item or a search query to another site.
- For each item found, provide a direct purchase link for each line item, the link should be the entire summery of the item. e.g. "[- Amazon: A white t-shirt](link)"
- Try to provide a price if possible for each item

6. User Confirmation and Iteration:
- After presenting each find, ask the user to confirm if it matches their expectations.
- If the user is not satisfied, either adjust the search based on new input (repeat from step 5) or ask if they wish to start the process over with a new image.


Constraints:
- When asking the user questions, prompt in clear and simple to understand format, give the user a selection of options in a structured manner. e.g. "... Let me know if this correct, here are the next steps: - Search for all items - Search each item one at a time"
- Format your responses in HTML or MD to be easier to read
- Be concise when possible, remember the user is trying to find answers quickly
- Speak with emojis and be helpful, for example this would be an intro:
"""
# 🌟 Welcome to Your Fashion Search Assistant Powered by ChatGPT! 🌟

Hello! 👋 If you're looking to **find clothing items similar to those in a photo**, I'm here to help. 🛍️👗👔
### Getting Started is Easy:
1. **Upload an Image** 🖼️ or
2. **Provide a Screenshot** from a social media platform. 📱💻 🔍

**Remember:** If it's from a social media platform that requires login, a **screenshot** will be necessary. Let's embark on this fashion-finding journey together! 🚀
"""

This is a detailed prompt to instruct the LLM. It provides an overview on what is should do, a step-by-step process and constraints. It takes advantage of a few prompting techniques like few-shot prompt by providing an example on how to speak. It also provides some detailed reasoning steps for the model to follow.

Longer prompts can be a problem, this is because the GPT-4 model can only process so many tokens from both input and output tokens. Once again, this isn’t something we need to worry about because ChatGPT understands how to paraphrase, summarize and continue long running conversations. Even so, it’s important to know as the quality of the conversation will eventually degrade as more tokens are added and the conversation grows. This prompt has 522 words which is roughly 653 tokens. OpenAI provides a good example for the estimates on this where they describe a token as being about 3/4 of a word.

Knowledge & Capabilities

For this example it doesn’t need any extra knowledge, instead we will just give it access to the “Web Browsing” tool.

Result

A cool DALL-E 3 interpretation of the GPT

Once you finish adding your prompt and any conversation starters, you can save and publish your GPT! The finished result is a simple GPT where no programming was needed. It has vision capabilities, web browsing and GPT-4 for helping with reverse fashion searching, super cool and easy to do!

Demo

Demo 3x Speed

Here’s the link for giving it a try.

Action GPT Example

Extending our GPT is a fairly straight forward process. We can give our GPT API access to other systems not provided by OpenAI. We do this by using an action. Actions use Web API’s which allow ChatGPT to communicate by using a developer provided interface that gives it the ability to make requests over a network. The format of the interface uses OpenAPI specifications (previously known as Swagger), a standardized schema for sharing your API to others, for this case our GPT.

Imagine we want to modify our previous Reverse Fashion Search GPT to search only on our private website. This way we can disable web browsing and limit traffic to only our site. This is possible using an action and our own API, the GPT will have the ability to interface directly with our site, abstracting the usage of a website and adding the capabilities of the underlying LLM to help guide us.

API Specification

If you have an existing web service, you’ll need to make sure you can generate an OpenAPI specification. You’ll want to follow the official documentation on doing this if you don’t already have a specification generated. Alternatively, asking ChatGPT to generate a schema for an endpoint is possible too, just verify the schema and any constraints. This should be at least version 3.0 as of when this was published.

Mock API

For this example, I created a mock API to test with. This is an API that returns some clothing options for a hypothetical product company.

Actions need a valid domain in order to work. If this is a public API, you just add that in the servers field. If the API requires authentication, you can supply either an API key or OAuth credentials in the action configuration settings.

If you want to test local changes you can set up a network tunneling service like LocalTunnel. After installing their CLI it’s as easy as just running:

lt --port 8080

This will forward HTTP traffic to port 8080 on our localhost and give us a public address that can be used for our servers for quick local testing.

After you have an API setup with your specification you’ll want to create a new action.

Defining a GPT Action

Here’s an example of a JSON doc for a mock API of a product company:

{
"openapi": "3.0",
"info": {
"description": "API for interfacing with some product company",
"title": "Product Company API",
"contact": {
"name": "Admin",
"email": "admin@email.com"
},
"version": "1.0.0"
},
"servers": [
{
"url": "https://api.product-company.nice"
}
],
"paths": {
"/search": {
"get": {
"operationId": "searchProducts",
"description": "Allows users to search for clothing products by various criteria.",
"parameters": [
{
"name": "query",
"in": "query",
"description": "The search query string",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "priceRange",
"in": "query",
"description": "Filter by price range",
"required": false,
"schema": {
"type": "string"
}
},
{
"name": "brand",
"in": "query",
"description": "Filter by brand",
"required": false,
"schema": {
"type": "string"
}
},
{
"name": "sortBy",
"in": "query",
"description": "Sort results by a specific field",
"required": false,
"schema": {
"type": "string"
}
},
{
"name": "page",
"in": "query",
"description": "Page number for pagination",
"required": false,
"schema": {
"type": "integer"
}
},
{
"name": "limit",
"in": "query",
"description": "Number of items per page",
"required": false,
"schema": {
"type": "integer"
}
}
],
"responses": {
"200": {
"description": "OK",
"content": {
"application/json": {
"schema": {
"type": "array",
"items": {
"$ref": "#/components/schemas/Product"
}
}
}
}
},
"400": {
"description": "Bad Request",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"error": {
"type": "string",
"description": "Incorrect request"
}
}
}
}
}
},
"500": {
"description": "Internal Server Error",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"error": {
"type": "string",
"description": "Server error"
}
}
}
}
}
}
}
}
}
},
"components": {
"schemas": {
"Product": {
"type": "object",
"properties": {
"brand": {
"type": "string"
},
"category": {
"type": "string"
},
"color": {
"type": "string"
},
"description": {
"type": "string"
},
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"price": {
"type": "number"
},
"size": {
"type": "string"
},
"url": {
"type": "string"
}
}
}
}
}
}

This will be validated and give us a breakdown on which actions are available. If you have a privacy link, you’ll need to add that in before publishing a public GPT.

Make sure that you specify the correct constraints and schema or the GPT will be more prone to making mistakes when making the API request.

Knowledge

One of the best features of GPTs is the ability to perform retrieval augmented generation (RAG) on documents which may not have been trained as part of the underlying model. GPTs can automatically consume this document context and abstract the usage completely away from anyone developing the GPT.

According to the Assistants API, which might give us a hint at how GPTs work, they say:

Once a file is uploaded and passed to the Assistant, OpenAI will automatically chunk your documents, index and store the embeddings, and implement vector search to retrieve relevant content to answer user queries.

This is very convenient as it removes some of the complexity but could also can be limiting as it doesn’t provide any control on how documents are split (chunked), indexed or how similarity search occurs with the vector store.

Nevertheless, for the most simple use-cases, this has been generalized. If your GPT has a complex dataset, you might want to consider implementing your own RAG as part of your action.

Demo

Demo 3x Speed

Awesome, it works! I can see the logs showing that the request came into my local server.

2023/11/12 14:29:16 Method: GET, URI: /search?query=high-waisted+white+skirt+button+detail&limit=5, Host: salty-bees-drum.loca.lt, RemoteAddr: [::1]:63445, UserAgent: Mozilla/5.0 AppleWebKit/537.36 (KHT
ML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
2023/11/12 14:29:16 Query Parameter: query, Value: high-waisted white skirt button detail
2023/11/12 14:29:16 Query Parameter: limit, Value: 5

Custom Actions Example

If you’re interested in seeing a public GPT with Actions being used, checkout my Message In a Bottle GPT.

This GPT demonstrates a neat concept on how communication might work for multiple users by giving the ability to send messages (as bottles) to other users. A fun way to share a message or DALL-E generation to other people.

Message in a Bottle GPT

Publishing Your GPT

To publish your GPT to the GPT store you’ll need to update your builder profile. This requires registering your account to a domain.

Builder Profile

Once your profile has been set up, you’ll be able to publish your GPT and find it on the GPT store!

Publish

Current Problems

GPTs face a lot of the same underlying issues ChatGPT and other LLM backed assistants face today, these are primarily:

Other issues specific to GPTs are:

  • prone to errors
  • how they reason their usage with tools and actions, like sometimes making schema mistakes
  • how they perform RAG with large and complex datasets

Some of these will be understood more in time, especially as the platform and technology mature.

Hopefully things will continue to get better and new features will be released as fast as they have been so far.

Thanks for reading!

--

--

Alexander Sniffin

Software Engineer solving the next big problem one coffee at a time @ alexsniffin.com