
Connecting Data

Adding structured and unstructured data for your AI Analyst to work with

Connect your database

You'll be asked to connect a database when creating a bot. We currently support Postgres, BigQuery, MySQL, and Snowflake and plan to support every major database in the future.

The standard bot setup requires read/write access to your database. Patterns needs to create tables for persisting result sets from analyses; when you connect Patterns to a new database, these tables are created in their own namespace.

No writable database, need help setting up?

Don't have a data warehouse to connect? Need help setting one up? We provide full-service implementation, data source setup, and data modeling. Please inquire here for more information.

Managing Table Access

After your database is connected, you will be asked to choose available Tables for your AI Analyst through the Table Manager.

Table manager

Tables can be hidden from the Agent if they are not intended to be queried; this is common when a Table is part of a larger data pipeline and serves only as an intermediate representation of the data.

You can access the Table manager anytime by clicking here:

Table context

In addition to being queryable, Tables also provide your AI Analyst with context for improved query performance. Context includes the Table's schema, its readme, and a sample of the data.

Implied context

  • Schema - a simple representation of all of a table's columns and types.

  • Data Sample - five random rows of data from the table.

Additional context

In a Table's Readme you can add hints and clues that help the LLM query that table more effectively. It's common to add column definitions, common values, or query suggestions to the Readme.

For example, a Table of companies might have an employee_count column that contains strings for different company-size segments. The base Agent may be able to intuit the 8 categories contained in this column from a sample of the data, but to make sure this column is always queried correctly we can add a hint like this:

employee_count contains the following values:
- 1-10
- 11-50
- 51-100
- 101-250
- 251-500
- 5001-10000
- 501-1000
- unknown

In another example, a company's customers table has two different types of customer ids per order. The first, customer_id, is generated for each order, while another column, customer_unique_id, is a stable customer identifier that persists across all orders. In this case you might add the following text to the customers table's Readme to clarify querying:

customer_id - an identifier assigned to each order in the dataset; it joins to the orders table. The same customer will get a different id for each order.
customer_unique_id - a stable identifier that can be used to query for all of a customer's orders
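
To see why this distinction matters, here is a minimal sketch in plain Python (with made-up ids, outside of the Patterns API) contrasting the two columns:

# Each order gets its own customer_id, but customer_unique_id stays stable
orders = [
    {"customer_id": "a1", "customer_unique_id": "u1"},
    {"customer_id": "a2", "customer_unique_id": "u1"},  # same customer, second order
    {"customer_id": "a3", "customer_unique_id": "u2"},
]

print(len({o["customer_id"] for o in orders}))         # 3 - one id per order
print(len({o["customer_unique_id"] for o in orders}))  # 2 - actual number of customers

Counting distinct customer_id values would overstate the number of customers; the Readme hint steers the Analyst toward customer_unique_id for customer-level questions.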

Unstructured Documents

You can add markdown documents for your AI Analyst to use in generating responses to questions you ask it. Your Analyst will retrieve a few documents that are semantically similar to the question asked and base its answer on the content of those documents.

You can add documents by clicking on the box below:

When queried, your AI Analyst will retrieve chunks of documents that are delimited by hash characters. E.g. when you add a # to a document, the content before and after it will be considered separate chunks. Your AI Analyst will retrieve 2-3 semantically similar documents every time a user question is asked.
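
As a rough sketch of this chunking behavior (the exact logic is internal to Patterns, so the function below is an assumption based on the description above), a document is split wherever a # heading begins:

def chunk_markdown(doc: str) -> list[str]:
    # Split a markdown document into chunks at lines that start with '#'.
    # An approximation of the hash-delimiter chunking described above;
    # the actual implementation inside Patterns may differ.
    chunks, current = [], []
    for line in doc.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = "Intro text\n# Pricing\nOur plans...\n# Support\nEmail us..."
print(chunk_markdown(doc))
# ['Intro text', '# Pricing\nOur plans...', '# Support\nEmail us...']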

Advanced ways to add data

Patterns' backend is a data and code automation platform; in Developer Mode you can find other ways to import data into your database.

Python scripts to upload data

You can write arbitrary Python within a Patterns node to import data from an API, CSV, database, file, or any other external source. Use the Patterns Table object to persist Tables in your database. Below is an example that fetches the current location of the International Space Station (ISS) from the Open Notify API:

from patterns import State, Table
import requests

desired_data_points = 60

# Connect to a table in write mode
iss_locations = Table("iss_locations", "w")

# Use State to track how many data points we have collected across runs
state = State()
data_count = state.get_value("data_count", 0)

if data_count >= desired_data_points:
    # We already have enough data points, so stop here
    exit()

# Fetch the current location of the International Space Station
resp = requests.get("http://api.open-notify.org/iss-now.json")

# Append the location record to the table and update the run counter
iss_locations.append(resp.json())
state.set_value("data_count", data_count + 1)

Upload CSV

Simply click and upload. Files must be in CSV format and no larger than 30 MB.

Webhooks for data ingestion

Webhook nodes generate an endpoint on the Patterns platform for you to POST data to. The webhook provides a URL that you can send form-encoded or JSON data to via HTTP POST. Your webhook will be automatically connected to an output table where webhook request payloads will appear.
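
For example, you could send a JSON payload to the generated endpoint with Python's requests library (the URL below is a placeholder; copy the actual endpoint from your webhook node):

import requests

# Placeholder - copy the real endpoint URL from your webhook node
webhook_url = "https://<your-webhook-endpoint>"

payload = {"customer_id": "u1", "event": "signup", "value": 42}

# POST the payload as JSON; it will appear in the webhook's output table
resp = requests.post(webhook_url, json=payload)
resp.raise_for_status()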
