Categories
Tutorials

ChatGPT for SQL: A Repeatable Prompt for Better Data Queries

ChatGPT has been a lifesaver when I had to get some data for my articles stored in a database, but ChatGPT is only effective when trained. This tutorial will walk you through:

  • Prompting ChatGPT to help you describe the structure
  • Writing a prompt to help you interpret the data
  • Handling errors

The final prompt has this structure:

I have a [database type] with a table called [table name]. I want you to help me find information from the database. 

Here's the columns, data type, description and examples. 

[question]

[paste column description]

Here’s how to build the prompt.

Step 1: Get information about the database

Describing the database’s structure requires you to find out the columns, data types and their definitions.

To create this tutorial, I’m using some flight data from the online database sandbox SQLize so you don’t have to set anything up. Go to the dropdown labelled “SQL Server” and pick “PostgreSQL 10 Bookings (Read Only)”.

In the absence of knowing what’s in the table, I generally like to ask for 100 rows of all columns. The first query will include these bits of information:

  • Database type (e.g. MySQL? SQLite? PostgreSQL?)
  • Table name (in this case, “flights”)
  • Request (“show 100 rows with all columns”)
I have a PostgreSQL 10 database. I know the table is called "flights". Can you show me 100 rows with all columns?
chatgpt sql query
SELECT *
FROM flights
LIMIT 100;

Copy the query and run it in SQLize and you’ll get this output on the right:

chatgpt database output
Click to see sample data
flight_idflight_noscheduled_departurescheduled_arrivaldeparture_airportarrival_airportstatusaircraft_codeactual_departureactual_arrival
1185PG01342017-09-10 06:50:00+002017-09-10 11:55:00+00DMEBTKScheduled319[null][null]
3979PG00522017-08-25 11:50:00+002017-08-25 14:35:00+00VKOHMAScheduledCR2[null][null]
4739PG05612017-09-05 09:30:00+002017-09-05 11:15:00+00VKOAERScheduled763[null][null]
5502PG05292017-09-12 06:50:00+002017-09-12 08:20:00+00SVOUFAScheduled763[null][null]

Next, find out the data type of each column. I asked ChatGPT “are you able to tell me the data type for each column?” and it gave me this query:

SELECT 
    column_name,
    data_type,
    is_nullable
FROM information_schema.columns
WHERE table_name = 'flights'
ORDER BY ordinal_position;

I ran that and got this:

column_namedata_typeis_nullable
flight_idintegerNO
flight_nocharacterNO
scheduled_departuretimestamp with time zoneNO
scheduled_arrivaltimestamp with time zoneNO
departure_airportcharacterNO
arrival_airportcharacterNO
statuscharacter varyingNO
aircraft_codecharacterNO
actual_departuretimestamp with time zoneYES
actual_arrivaltimestamp with time zoneYES

Step 2: How to describe columns for ChatGPT to understand your table

Study the output and describe each column to create the heart of your prompt. I use this format:

[column name] : [data_type], [description OR distinct data points] [examples]

So here are the columns, and how I’d describe them:

flight_id : integer, a serial number (example: 1185, 3979, 4739, 5502)

flight_no: character, follows flight number conventions, 2 letters and usually 4 to 5 digits (example: PG0134, PG0052, PG0561)


scheduled_departure: timestamp with time zone (example: 2017-09-05 09:30:00+00)

scheduled_arrival: timestamp with time zone (example: 2017-09-05 11:15:00+00)

departure_airport: character, IATA codes consisting of 3 letters (example: DME, VKO, SVO)

arrival_airport: character,  IATA codes consisting of 3 letters (example: BTK, HMA, AER)

aircraft_code: character, usually 3 character (e.g. 319, CR2, 763)

status: *... see section below ...*

actual_departure: timestamp with time zone, may be NULL (example: 2017-09-05 11:15:00+00)

actual_arrival: timestamp with time zone, may be NULL (example: 2017-09-05 11:15:00+00)

Describing the ‘status’ column

When I reached the column status, I realized there probably aren’t too many distinct values in the column, so I decided to find out all the values and put them in the prompt.

Again, I prompted ChatGPT, “there’s a column called “status”, tell me its distinct values?”

chatgpt sql query

Add the unique values to the prompt, like this

status: character varying. Distinct values: "Departed", "Arrived", "On Time", "Cancelled", "Delayed", "Scheduled"

Step 3: Building the rest of the prompt

The next step involves building the head of the prompt by telling ChatGPT your database system (PostgreSQL 10) and your table’s name (flights).

I have a PostgreSQL 10 database with a table called "flights". I want you to help me find information from the database. 

Here's the columns, data type, description and examples. 

[insert question here if free tier, see below]

[paste column description from above]

If you are subscribed to a paid tier, consider creating your own GPT with the prompt. If you’re using the free tier, you can paste the whole prompt in with your question.

chatgpt prompt to write sql query

And the output:

statuscount
Arrived16707
Scheduled15383
On Time518
Cancelled414
Departed58
Delayed41

Query: Most common planes departing from DME

Next: let’s use the ChatGPT query “which are the top 3 most common airplane_codes with departure_airport being DME”

I deliberately used actual column names in my prompt (airplane_codes, departure_airport) to avoid misinterpretation.

SELECT 
    aircraft_code,
    COUNT(*) AS flight_count
FROM 
    flights
WHERE 
    departure_airport = 'DME'
GROUP BY 
    aircraft_code
ORDER BY 
    flight_count DESC
LIMIT 3;

Output:

aircraft_codeflight_count
SU9994
CR2829
321427

You can find out what aircraft these are using this Wikipedia table.

Handling errors

Error? Copy it back into the same ChatGPT prompt and use its updated query.

If that doesn’t work, you’d have to examine whether you’ve described the column incorrectly.

Also, try feeding ChatGPT more examples. You can do this effectively by uploading the raw output from the 100-sample test done as the first step.

How this can help you

The prompt above has helped me get data much faster, rather than requesting it through a third party.

Once you’ve structured your prompt this way, you’ll start to see ChatGPT consistently give you correct, efficient SQL queries even if you don’t have deep SQL knowledge yourself.

You can iterate as much as you want, and have different analyses just by speaking plain English to ChatGPT. Plus the technique above is repeatable on any table.

Have a project in mind?

Websites. Graphics. SEO-oriented content.

I can get your next project off the ground.

See how I have helped my clients.