%pip install vanna
%pip install snowflake-connector-python
import vanna as vn
import snowflake.connector
Login¶
Creating a login and getting an API key is as easy as entering your email (after you run this cell) and entering the code we send to you. Check your Spam folder if you don't see the code.
api_key = vn.get_api_key('my-email@example.com')
vn.set_api_key(api_key)
Set your Model¶
You need to choose a globally unique model name. Try using your company name or another unique string. All data from models are isolated - there's no leakage.
vn.set_model('my-model') # Enter your model name here. This is a globally unique identifier for your model.
Automatic Training¶
If you'd like to use automatic training, the Vanna package can crawl your database to fetch metadata to train your model. You can put in your Snowflake credentials here. These details are only referenced within your notebook. These database credentials are never sent to Vanna's severs.
vn.connect_to_snowflake(account='my-account', username='my-username', password='my-password', database='my-database')
training_plan = vn.get_training_plan_experimental(filter_databases=['SNOWFLAKE_SAMPLE_DATA'], filter_schemas=['TPCH_SF1'])
training_plan
Trying query history Trying INFORMATION_SCHEMA.COLUMNS for SNOWFLAKE_SAMPLE_DATA
Train on SQL: What are the top 10 customers ranked by total sales? Train on SQL: What are the top 10 customers in terms of total sales? Train on SQL: What are the top two customers with the highest total sales for each region? Train on SQL: What are the top 5 customers with the highest total sales? Train on SQL: What is the total quantity of each product sold in each region, ordered by region name and total quantity in descending order? Train on SQL: What is the number of orders for each week, starting from the most recent week? Train on SQL: What countries are in the region 'EUROPE'? Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 SUPPLIER Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 LINEITEM Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 CUSTOMER Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 PARTSUPP Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 PART Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 ORDERS Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 REGION Train on Information Schema: SNOWFLAKE_SAMPLE_DATA.TPCH_SF1 NATION
vn.train(plan=training_plan)
View Training Data¶
At any time you can see what training data is in your model
vn.get_training_data()
id | training_data_type | question | content | |
---|---|---|---|---|
0 | 15-doc | documentation | None | This is a table in the PARTSUPP table.\n\nThe ... |
1 | 11-doc | documentation | None | This is a table in the CUSTOMER table.\n\nThe ... |
2 | 14-doc | documentation | None | This is a table in the ORDERS table.\n\nThe fo... |
3 | 1244-sql | sql | What are the names of the top 10 customers? | SELECT c.c_name as customer_name\nFROM snowf... |
4 | 1242-sql | sql | What are the top 5 customers in terms of total... | SELECT c.c_name AS customer_name, SUM(l.l_quan... |
5 | 17-doc | documentation | None | This is a table in the REGION table.\n\nThe fo... |
6 | 16-doc | documentation | None | This is a table in the PART table.\n\nThe foll... |
7 | 1243-sql | sql | What are the top 10 customers with the highest... | SELECT c.c_name as customer_name,\n sum(... |
8 | 1239-sql | sql | What are the top 100 customers based on their ... | SELECT c.c_name as customer_name,\n sum(... |
9 | 13-doc | documentation | None | This is a table in the SUPPLIER table.\n\nThe ... |
10 | 1241-sql | sql | What are the top 10 customers in terms of tota... | SELECT c.c_name as customer_name,\n sum(... |
11 | 12-doc | documentation | None | This is a table in the LINEITEM table.\n\nThe ... |
12 | 18-doc | documentation | None | This is a table in the NATION table.\n\nThe fo... |
13 | 1248-sql | sql | How many customers are in each country? | SELECT n.n_name as country,\n count(*) a... |
14 | 1240-sql | sql | What is the number of orders placed each week? | SELECT date_trunc('week', o_orderdate) as week... |
Removing Training Data¶
If you added some training data by mistake, you can remove it. Model performance is directly linked to the quality of the training data.
vn.remove_training_data(id='my-training-data-id')