
Declare data sources

Data sources are the inputs to your semantic model. They include Snowflake tables, CSV files, in-memory DataFrames, and Python literals. This guide shows you how to declare those sources in your model. See Define Base Facts for the next step: turning source data into entities and relationships.

Your source type determines how you reference fields (columns) in semantic declarations and queries. Most projects start with in-memory sources during prototyping and move to Snowflake tables when the data is shared and stable. Most inputs fall into two buckets:

  • Table-like sources. These are row-and-column datasets. This includes Snowflake tables/views and in-memory tabular data like DataFrames. It also includes CSV data you load into pandas. Use Model.Table for Snowflake. Use Model.data for DataFrames and small inline datasets.
  • Model constants. These are fixed values you write in Python. Use Python literals for one-off constants. Use Model.Enum for a small, fixed set of named constants that are queryable in the model.

This table helps you choose the right source type for your project:

| I have a… | I can use… |
| --- | --- |
| Snowflake table or view | Model.Table to reference it by path. See Use a Snowflake table with Model.Table. |
| pandas DataFrame in Python | Model.data to treat it as a temporary table in queries and definitions. See Use a DataFrame with Model.data. |
| CSV file on disk | Model.data for local iteration; load it into Snowflake and switch to Model.Table when it's shared or large. See Use CSV data with Model.data. |
| A few explicit rows for examples or tests | Model.data to keep the example self-contained. See Use inline Python data with Model.data. |
| A small set of model-specific named constants | Model.Enum for named values you can store and query; use Python literals for one-off constants. See Create model constants with Model.Enum. |

Use a Snowflake table with Model.Table

Model.Table() gives you a table-like object backed by a supported Snowflake source in the semantics DSL. This is the most common way to declare data sources in PyRel because it connects your model directly to the data in Snowflake and supports data at scale.

You can use the following Snowflake objects as sources with Model.Table() as long as your configured Snowflake role has SELECT on the source and change tracking is enabled:

| Snowflake object | Supported? | Notes |
| --- | --- | --- |
| Standard table or view | Yes | |
| Snowflake-managed Iceberg table | Yes | This is a preview feature. |
| Temporary or transient table | No | |
| Dynamic table | No | |
| External table or view | No | |

Follow these steps to declare a standard Snowflake table or view as a source with Model.Table():

  1. Ensure change tracking is enabled

    Before you declare the table, check whether Snowflake change tracking is already enabled on the table or view you plan to use. PyRel needs change tracking to read from Snowflake-backed sources through data streams.

    You can use SQL or Python to check for change tracking.

    Run a metadata check for the source you plan to use and check that the CHANGE_TRACKING column is ON:

    -- For a table
    SHOW TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA;
    -- For a view
    SHOW VIEWS LIKE '<MY_VIEW_NAME>' IN SCHEMA DB.SCHEMA;
    What to do if change tracking is missing

    You need OWNERSHIP on the table or view to enable change tracking. Once change tracking is enabled, anyone with only SELECT privileges can use that source through PyRel.

    If you own the source, enable change tracking directly in Snowflake. For the SQL and PyRel enablement steps, see Enable Change Tracking on a Table or View.

    If you do not own the source, ask the table owner or a Snowflake admin to enable it for you.

    You can also let PyRel try to enable change tracking with data.ensure_change_tracking. That setting is off by default and only works when the role running PyRel can alter the source. See Enable or disable automatic change tracking.

  2. Declare the table

    You can declare the table with or without a table schema:

    • With schema is more explicit and helps schema mismatches fail quickly in production.
    • Without schema is useful when you want faster exploration and iteration during development.

    Use the Model.Table() method to declare the table:

    from relationalai.semantics import Integer, Model, String

    m = Model("MyModel")
    t = m.Table(
        "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>",
        schema={
            "COLUMN_1": Integer,
            "COLUMN_2": String,
        },
    )
  3. Verify by selecting a column

    Use index access, such as t["COL"], to reference a column in a query with Model.select:

    m.select(t["COLUMN_1"], t["COLUMN_2"]).to_df()

    If the column name is a valid Python identifier, you can also use attribute access, such as t.col:

    m.select(t.column_1, t.column_2).to_df()

    Attribute access is case-insensitive, so t.COLUMN_1 and t.column_1 both work.

    What to do if your test query fails

    If m.select(...).to_df() fails, check these first:

    • The Snowflake path points to the table or view you intended to use.
    • Your Snowflake role has SELECT on that source.
    • You completed step 1, or PyRel is configured to try enabling change tracking with data.ensure_change_tracking = true.
    • If change tracking is still missing and you do not have OWNERSHIP, ask the table owner or a Snowflake admin to enable it.

    For more, see Troubleshoot common issues.

  • The value returned by Model.Table is a Table object. It behaves like a table whose columns you can reference in queries and definitions.
  • Without schema=, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or source schema are not what you expect, you can get an error later when you run a query that references the table.
  • With schema={...}, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or source schema are not what you expect, you can get an error as soon as you create the table object.
  • Turning table rows into entities and relationships is a separate step.
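The change-tracking requirement from step 1 can be checked programmatically once you have a SHOW TABLES or SHOW VIEWS result row in Python. The sketch below assumes the row is represented as a plain dict; the helper name and row shape are illustrative, not a PyRel or Snowflake API:

```python
def change_tracking_enabled(show_row: dict) -> bool:
    # SHOW TABLES / SHOW VIEWS report change tracking as the string "ON" or "OFF".
    return str(show_row.get("CHANGE_TRACKING", "OFF")).upper() == "ON"

row = {"name": "MY_TABLE", "CHANGE_TRACKING": "ON"}
print(change_tracking_enabled(row))  # True
```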

Follow these steps to declare a Snowflake-managed Iceberg table as a source with Model.Table():

  1. Ensure change tracking is enabled

    Before you declare the table, check whether Snowflake change tracking is already enabled on the Iceberg table you plan to use. PyRel needs change tracking to read from Snowflake-backed sources through data streams.

    You can use SQL or Python to check for change tracking.

    Run a metadata check for the source you plan to use and check that the CHANGE_TRACKING column is ON:

    SHOW ICEBERG TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA;
    What to do if change tracking is missing

    You need OWNERSHIP on the Iceberg table to enable change tracking. Once change tracking is enabled, anyone with only SELECT privileges can use that source through PyRel.

    If you own the source, enable change tracking directly in Snowflake. For the SQL and PyRel enablement steps, see Enable Change Tracking on a Table or View.

    If you do not own the source, ask the table owner or a Snowflake admin to enable it for you.

    You can also let PyRel try to enable change tracking with data.ensure_change_tracking. That setting is off by default and only works when the role running PyRel can alter the source. See Enable or disable automatic change tracking.

  2. Declare the table

    You can declare the table with or without a table schema:

    • With schema is more explicit and helps schema mismatches fail quickly in production.
    • Without schema is useful when you want faster exploration and iteration during development.

    Use the Model.Table() method to declare the table:

    from relationalai.semantics import Integer, Model, String

    m = Model("MyModel")
    t = m.Table(
        "<MY_DB>.<MY_SCHEMA>.<MY_ICEBERG_TABLE>",
        schema={
            "CUSTOMER_ID": Integer,
            "NAME": String,
        },
    )
  3. Verify by selecting a column

    Use index access, such as t["COL"], to reference a column in a query with Model.select:

    m.select(t["CUSTOMER_ID"], t["NAME"]).to_df()

    If the column name is a valid Python identifier, you can also use attribute access, such as t.col:

    m.select(t.customer_id, t.name).to_df()

    Attribute access is case-insensitive, so t.CUSTOMER_ID and t.customer_id both work.

    What to do if your test query fails

    If m.select(...).to_df() fails, check these first:

    • The Snowflake path points to the table you intended to use.
    • Your Snowflake role has SELECT on that source.
    • You completed step 1, or PyRel is configured to try enabling change tracking with data.ensure_change_tracking = true.
    • If change tracking is still missing and you do not have OWNERSHIP, ask the table owner or a Snowflake admin to enable it.

    For more, see Troubleshoot common issues.

  • Snowflake-managed Iceberg table support is a preview feature.
  • The value returned by Model.Table is a Table object. It behaves like a table whose columns you can reference in queries and definitions.
  • Prefer bracket access (t["COL"]) when a column name is not a valid Python identifier or contains spaces.
  • Without schema=, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or source schema are not what you expect, you can get an error later when you run a query that references the table.
  • With schema={...}, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or source schema are not what you expect, you can get an error as soon as you create the table object.
  • Turning table rows into entities and relationships is a separate step.
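The lazy-versus-eager trade-off in the notes above can be illustrated with a small plain-Python sketch. This is not PyRel's implementation, only the general pattern: with a declared schema, validation runs at construction time; without one, metadata is fetched on first column access.

```python
# Illustrative sketch only, not PyRel internals.
class EagerSource:
    def __init__(self, available_columns, declared_schema):
        # With schema={...}: mismatches surface as soon as the object is created.
        missing = set(declared_schema) - set(available_columns)
        if missing:
            raise ValueError(f"unknown columns: {sorted(missing)}")
        self.columns = list(declared_schema)

class LazySource:
    def __init__(self, fetch_columns):
        # fetch_columns stands in for a metadata lookup against the source.
        self._fetch = fetch_columns
        self._columns = None

    def __getitem__(self, name):
        # Without schema=: metadata is resolved on first column access.
        if self._columns is None:
            self._columns = set(self._fetch())
        if name not in self._columns:
            raise KeyError(name)
        return name

lazy = LazySource(lambda: ["CUSTOMER_ID", "NAME"])  # nothing validated yet
print(lazy["CUSTOMER_ID"])  # CUSTOMER_ID
```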

Set a default Snowflake schema to avoid fully-qualified names


To avoid writing fully qualified Snowflake paths in your model code, you can set a default schema in your configuration. This is especially helpful when most of your input tables live in the same Snowflake database and schema. It lets you write Model.Table("MY_TABLE") instead of Model.Table("MY_DB.MY_SCHEMA.MY_TABLE").

To set a default schema:

  1. Set the tables.default_schema option in raiconfig.yaml

    raiconfig.yaml
    connections:
      # ...
    tables:
      default_schema: "<MY_DB>.<MY_SCHEMA>"
  2. Use an unqualified table name in your model

    from relationalai.semantics import Model
    m = Model("MyModel")
    t = m.Table("<MY_TABLE_NAME>")
    print(t.info.physical_name)
    # <MY_DB>.<MY_SCHEMA>.<MY_TABLE_NAME>
  • If you call Model.Table() with a fully-qualified name, that name takes precedence over default_schema. This means you can use default_schema for most tables and still reference a few tables with fully-qualified names when needed.

Define named inputs in configuration

You can define named inputs in your configuration and reference them in your model code with Model.Table(). Use this pattern when you want to change the physical source behind a named input across environments without having to change your model code.

To define named inputs in configuration:

  1. Create a named input in raiconfig.yaml

    Define the tables section with a name for your source, such as MY_TABLE_SOURCE, and set the fqn to the physical Snowflake path for that source:

    raiconfig.yaml
    connections:
      # ...
    tables:
      MY_TABLE_SOURCE:
        fqn: "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>"
        type: "iceberg" # Optional: set the table type if it's not native.

    Alternatively, you can define the name in a profile override if you want to use different sources across environments:

    raiconfig.yaml
    connections:
      # ...
    profiles:
      dev:
        tables:
          MY_TABLE_SOURCE:
            fqn: "<MY_DEV_DB>.<MY_DEV_SCHEMA>.<MY_DEV_TABLE>"
      prod:
        tables:
          MY_TABLE_SOURCE:
            fqn: "<MY_PROD_DB>.<MY_PROD_SCHEMA>.<MY_PROD_TABLE>"

    See Use profiles to manage multiple configurations for details on using profiles.

  2. Use the named input in your model

    from relationalai.semantics import Model
    m = Model("MyModel")
    t = m.Table("MY_TABLE_SOURCE")
    # Inspect the physical name to verify that the correct table is resolved from the config.
    print(t.info.physical_name)
    # Output: <MY_DB>.<MY_SCHEMA>.<MY_TABLE>

Use CSV data with Model.data

CSV files are a convenient starting point for local iteration. PyRel treats CSV data as in-memory tabular data after you load it in Python and pass it to Model.data. Choose the variant that matches how you prefer to parse CSVs.

Choose this when you already use pandas for cleanup and type normalization. This variant reads a CSV file into a DataFrame and then wraps it with Model.data.

  1. Create a sample CSV file

    Create a file named sample.csv in your working directory (or anywhere you can reference by path) with the following contents:

    customer_id,name
    1,Alice
    2,Bob
  2. Load the CSV file with Model.data

    Read the file with pandas.read_csv, then call Model.data:

    from pathlib import Path
    import pandas as pd
    from relationalai.semantics import Model
    m = Model("MyModel")
    csv_path = Path("sample.csv")
    # If you created the file elsewhere, update the path.
    # Example: csv_path = Path("/absolute/path/to/sample.csv")
    df = pd.read_csv(csv_path, encoding="utf-8")
    d = m.data(df)
  3. Verify by selecting all columns

    Use index access, like d["COL"], to reference a column in a query with Model.select:

    m.select(d["customer_id"], d["name"]).to_df()

    You can also use attribute access, like d.col, for columns with valid Python identifiers:

    m.select(d.customer_id, d.name).to_df()

    Accessing columns by attribute is case-insensitive, so d.CUSTOMER_ID and d.customer_id both work.

  • pandas.read_csv infers dtypes. If a column should stay a string, pass dtype= to pandas.read_csv or normalize types before you call Model.data. For example, treat an ID with leading zeros as a string.
  • If you see unexpected column names, fix them in pandas before you reference them in definitions. For example, trim leading and trailing whitespace.
  • Mapping CSV-backed columns into entities and relationships is a separate step.
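The dtype note above is easiest to see with an ID column that has leading zeros. A small self-contained example using an in-memory CSV string instead of a file:

```python
import io

import pandas as pd

csv_text = "customer_id,name\n007,Alice\n042,Bob\n"

# Default inference parses customer_id as an integer and drops the leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["customer_id"].tolist())  # [7, 42]

# Passing dtype= keeps the column as strings.
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})
print(as_strings["customer_id"].tolist())  # ['007', '042']
```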

Choose this when you want to avoid pandas and keep dependencies minimal. This variant parses CSV text into a list of dictionaries and passes it to Model.data.

  1. Create a sample CSV file

    Create a file named sample.csv in your working directory (or anywhere you can reference by path) with the following contents:

    customer_id,name
    1,Alice
    2,Bob
  2. Load the CSV file with Model.data

    Parse the file with csv.DictReader, then call Model.data:

    import csv
    from pathlib import Path
    from relationalai.semantics import Model
    m = Model("MyModel")
    csv_path = Path("sample.csv")
    # If you created the file elsewhere, update the path.
    # Example: csv_path = Path("/absolute/path/to/sample.csv")
    with csv_path.open("r", encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))
    d = m.data(rows)
  3. Verify by selecting a column

    Use index access, like d["COL"], to reference a column in a query with Model.select:

    m.select(d["customer_id"], d["name"]).to_df()
  • csv.DictReader returns strings for all values. If you need numeric types, convert values in Python before you call Model.data.
  • Always open the file with newline="" (as shown) so the csv module handles newlines consistently across platforms.
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • Mapping CSV-backed columns into entities and relationships is a separate step.
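The string-values caveat above can be handled with a short conversion pass before calling Model.data. A self-contained example using an in-memory CSV string:

```python
import csv
import io

csv_text = "customer_id,name\n1,Alice\n2,Bob\n"
raw_rows = list(csv.DictReader(io.StringIO(csv_text)))
print(raw_rows[0])  # {'customer_id': '1', 'name': 'Alice'} -- every value is a string

# Convert numeric fields before handing the rows to Model.data.
rows = [{"customer_id": int(r["customer_id"]), "name": r["name"]} for r in raw_rows]
print(rows[0])  # {'customer_id': 1, 'name': 'Alice'}
```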

Use a DataFrame with Model.data

A DataFrame source lets you reuse transformed in-memory tabular data as an input to model definitions. Choose this when you already have a pandas DataFrame from preprocessing, feature engineering, or notebook exploration.

  1. Wrap a DataFrame with Model.data

    Start from a DataFrame with stable column names, then call Model.data:

    import pandas as pd
    from relationalai.semantics import Model
    m = Model("MyModel")
    df = pd.DataFrame(
        [
            {"customer_id": 1, "name": "Alice"},
            {"customer_id": 2, "name": "Bob"},
        ]
    )
    d = m.data(df)
  2. Verify by selecting columns

    Select a couple of columns with Model.select to confirm the mapping is what you expect:

    m.select(d.customer_id, d.name).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • You can use either dot access (d.name) or bracket access (d["name"]) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier.
  • If results look surprising, check df.dtypes and normalize critical columns before you call Model.data.
  • Mapping DataFrame-backed columns into entities and relationships is a separate step.
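Checking and normalizing dtypes, as the notes above suggest, takes only a couple of lines. For example, IDs that arrived as strings can be cast before you call Model.data:

```python
import pandas as pd

df = pd.DataFrame({"customer_id": ["1", "2"], "name": ["Alice", "Bob"]})
print(df.dtypes["customer_id"])  # object -- the IDs came in as strings

# Normalize the critical column before passing the DataFrame to Model.data.
df["customer_id"] = df["customer_id"].astype("int64")
print(df.dtypes["customer_id"])  # int64
```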

Use inline Python data with Model.data

Inline data is the fastest way to seed small, explicit rows for examples and tests. Choose this when you want the smallest possible repro without relying on external files or a database. Keep inline datasets small and schema-like so they don't drift from your production sources.

Choose this variant when you want column names to come directly from your Python keys.

  1. Create a Data source from rows

    Call Model.data with a list of dictionaries:

    from relationalai.semantics import Model
    m = Model("MyModel")
    d = m.data(
        [
            {"name": "Alice", "age": 10},
            {"name": "Bob", "age": 30},
        ]
    )
  2. Preview the columns

    Query the columns with Model.select:

    m.select(d.name, d.age).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • You can use either dot access (d.name) or bracket access (d["name"]) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier.
  • If you have exactly one active model, you can also use the top-level data helper as a convenience wrapper around Model.data.
  • Mapping inline data columns into entities and relationships is a separate step.

Choose this variant when your data is naturally row-oriented and you want to provide the column names explicitly.

  1. Create a Data source and set column names

    Pass columns=[...] so your column names are stable and readable in later declarations:

    from relationalai.semantics import Model
    m = Model("MyModel")
    d = m.data(
        [(0, 72.5), (1, 71.9)],
        columns=["minute", "temperature"],
    )
  2. Preview the columns

    Preview the columns with Model.select:

    m.select(d.minute, d.temperature).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions. You can use either dot access (d.minute) or bracket access (d["minute"]) to reference columns.
  • If you omit columns for tuple rows, you can access columns by 0-based integer index, such as d[0] and d[1]. They are also exposed with the default names col0, col1, col2, … so you can write d.col0 or d["col0"] if you prefer.
  • Mapping inline data columns into entities and relationships is a separate step.
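Conceptually, columns=[...] pairs each tuple positionally with a name, and omitting it falls back to positional defaults. A plain-Python sketch of that pairing (illustrative only, not how PyRel stores the data):

```python
rows = [(0, 72.5), (1, 71.9)]
columns = ["minute", "temperature"]

# With explicit names, each tuple pairs positionally with a column name.
named = [dict(zip(columns, row)) for row in rows]
print(named[0])  # {'minute': 0, 'temperature': 72.5}

# Without names, positions fall back to defaults such as col0 and col1.
defaults = [f"col{i}" for i in range(len(rows[0]))]
print(defaults)  # ['col0', 'col1']
```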

Create model constants with Model.Enum

Model.Enum creates a small, fixed set of named constants inside your model. Choose this when you want values that behave like model entities (so you can store them, join on them, and query them) rather than one-off Python literals. Enum members are defined lazily the first time you reference them in a query or definition.

  1. Declare an enum type

    Define an enum by subclassing Model.Enum:

    from relationalai.semantics import Model
    m = Model("MyModel")
    class Status(m.Enum):
        ACTIVE = "ACTIVE"
        INACTIVE = "INACTIVE"
  2. Verify by selecting an enum member

    Reference an enum member in a query with Model.select:

    m.select(Status.ACTIVE).to_df()
  • If you only need a one-off constant, prefer a Python literal.
  • You can use enum members in queries and definitions just like other concepts and relationships. They are stored in the model and can be joined on, returned in results, and used in logic.
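If you know Python's standard-library enums, Model.Enum's declaration style will look familiar. For intuition only, here is the same Status set as a plain enum.Enum (this is not Model.Enum, so the members are not stored in or queryable from a model):

```python
from enum import Enum

class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"

print(Status.ACTIVE.name)  # ACTIVE
print(Status("INACTIVE"))  # Status.INACTIVE -- lookup by value
```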

Troubleshoot common issues

Choose the table that matches the source type you're troubleshooting.

If you’re using a Snowflake-backed source with Model.Table():

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Model.Table("DB.SCHEMA.OBJECT") or your first .select(...).to_df() call fails immediately | The table path or object name is wrong, or it resolves to a different object than you expected | Check the fully qualified Snowflake path and confirm that the object exists in the database and schema you intended to use. |
| .select(...).to_df() fails with an access or permission error | Your Snowflake role does not have SELECT on the source table or view, even if change tracking is already enabled | Grant SELECT on the source object, or switch to a role that already has it. |
| .select(...).to_df() fails because change tracking is not enabled | PyRel reads Snowflake-backed sources through data streams, which require change tracking on the table or view | Enable change tracking on the table or view, or set data.ensure_change_tracking = true so PyRel can try to enable it automatically. Enabling change tracking requires OWNERSHIP; if you are not the owner, ask the table owner or a Snowflake admin to enable it for you. |
| Queries run, but recent Snowflake changes do not appear in results | Source declaration succeeded, but the end-to-end sync path or query-time freshness settings are not behaving as you expect | Use Manage Data Shared With the RAI Native App to check CDC service status and data stream health. Then review Configure data sync behavior for data.wait_for_stream_sync and data.data_freshness_mins. |
| The Snowflake object resolves, but PyRel still cannot use it as a Snowflake-backed source | The source object type is not supported for this workflow | Use a supported standard table or standard view instead. Do not use temporary tables, transient tables, dynamic tables, or external tables or views here. Snowflake-managed Iceberg tables are supported as a preview feature. |

If you’re loading tabular data with Model.data():

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| A CSV loaded with pandas.read_csv() produces unexpected types in PyRel | pandas.read_csv() inferred dtypes you did not want | Pass dtype= to pandas.read_csv(), or normalize types before you call Model.data(). This matters most for columns such as IDs that should remain strings. |
| A CSV loaded with csv.DictReader keeps every value as a string | csv.DictReader returns strings for all values | Convert values in Python before you call Model.data() if you need numeric types. |
| Column references fail, or column names are not what you expected after loading CSV data | The CSV headers need cleanup before you reference them | Clean up the column names before you reference them in definitions. For example, trim leading and trailing whitespace in pandas before you call Model.data(). |
| d.some_column fails for a DataFrame source or inline Python rows | The column name is not a valid Python identifier | Use bracket access such as d["some column"]. Prefer bracket access whenever a column name is not a valid Python identifier. |
| Results from a DataFrame-backed source look surprising | The underlying DataFrame dtypes are not what you expected | Check df.dtypes and normalize critical columns before you call Model.data(). |
| You loaded inline tuple rows, but the column names you expected are not available | You omitted columns=[...], so PyRel exposed the tuple fields by index, with default names such as col0 and col1 | Pass columns=[...] for stable names, or access the data with d[0], d[1], d.col0, or d["col0"]. |