Published

How Marqo can help generate SQL using AI via the Vanna Python library

Marqo is an end-to-end, multimodal vector search engine. Users can store and query unstructured data such as text, images, and code

Vanna now supports Marqo as a native integration to help generate SQL using AI

What is Marqo?

Marqo

Marqo is an end-to-end, multimodal vector search engine. Users can store and query unstructured data such as text, images, and code. You can see more about Marqo here . Once you've got a Marqo instance up and running, you can use it to generate SQL with the help of the Vanna Python library.

What is Vanna?

Vanna is a Python library that uses Retrieval Augmented Generation (RAG) to generate SQL queries. Vanna

Fundamentally it provides 2 high-level functions:

  1. vn.train() - This is used to store information in a vector database or vector search like Marqo.
  2. vn.ask() - This will retrieve information from the vector database and use it to construct a prompt for the LLM. The LLM will then generate SQL queries based on the prompt.

How can you use it to help generate SQL?

You can give the Vanna Python library information about your database and then ask it to generate SQL queries for you. The metadata will be stored in Marqo and then used to generate SQL queries via your favorite LLM.

Start the Marqo Docker Container

vanna-marqo-start

Install the Vanna Python Library and the Marqo Python Library

pip install vanna
pip install marqo

Configure Vanna to use Marqo as the Vector Store, with your favorite LLM

In this case we're using OpenAI's GPT-4 model.

from vanna.marqo.marqo import Marqo_VectorStore
from vanna.openai.openai_chat import OpenAI_Chat

class MyVanna(Marqo_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        Marqo_VectorStore.__init__(self, config=config) # You can pass additional Marqo configuration options like the url and model name here
        OpenAI_Chat.__init__(self, config=config)

vn = MyVanna(config={'api_key': os.environ['OPENAI_API_KEY'], 'model': 'gpt-4'})

Add Training Data

An example of training data could be DDL statements, SQL queries, or any other text-based documentation you have about your database, business, or industry.

Here we'll add a simple DDL statement to the vector store.

vn.train(ddl='CREATE TABLE users (id INT, name VARCHAR(255), email VARCHAR(255), PRIMARY KEY (id))')

In reality, you'd want to add a lot more training data. You can add as much as you want. The more you add, the better the results will be. See this example notebook for how you can do this.

Ask a Question

Now we can ask a question and get the SQL, tabular results, automatically generated Plotly code, and potential follow-up questions.

vn.ask('How many users are there?')

Vanna Jupyter Notebook

Connect to a Database

You can also connect to a database to run the queries and get the tabular results, chart, and more. See this example notebook for how you can do this. The notebook also shows how to connect to various databases and other LLMs.

User Interface

You can use this with Streamlit or any other Python-based UI framework. We have 2 open-source streamlit apps that you can use as a starting point.

Vanna Streamlit

Run this as a Notebook

Run as a Notebook

Vanna Logo Vanna.AI

The fastest way to get insights from your database just by asking questions