Declare data sources
Data sources are the inputs to your semantic model. They include Snowflake tables, CSV files, in-memory DataFrames, and Python literals. This guide shows you how to declare those sources in your model. See Define Base Facts for the next step: turning source data into entities and relationships.
Before you begin:
- PyRel is installed and configured. See Set Up Your Environment for instructions.
Choose the right source type
Your source type determines how you reference fields (columns) in semantic declarations and queries. Most projects start with in-memory sources during prototyping and move to Snowflake tables when the data is shared and stable. Most inputs fall into two buckets:
- Table-like sources. These are row-and-column datasets, including Snowflake tables and views, in-memory tabular data like DataFrames, and CSV data you load into pandas. Use `Model.Table` for Snowflake. Use `Model.data` for DataFrames and small inline datasets.
- Model constants. These are fixed values you write in Python. Use Python literals for one-off constants. Use `Model.Enum` for a small, fixed set of named constants that are queryable in the model.
This table helps you choose the right source type for your project:
| I have a… | I can use… |
|---|---|
| Snowflake table or view | `Model.Table` to reference it by path. See Use a Snowflake table with Model.Table |
| pandas DataFrame in Python | `Model.data` to treat it as a temporary table in queries and definitions. See Use a DataFrame with Model.data |
| CSV file on disk | `Model.data` for local iteration; load it into Snowflake and switch to `Model.Table` when it’s shared or large. See Use CSV data with Model.data |
| A few explicit rows for examples or tests | `Model.data` to keep the example self-contained. See Use inline Python data with Model.data |
| A small set of model-specific named constants | `Model.Enum` for named values you can store and query; use Python literals for one-off constants. See Create model constants with Model.Enum |
Use a Snowflake table with Model.Table
`Model.Table()` gives you a table-like object backed by a supported Snowflake source in the semantics DSL.
This is the most common way to declare data sources in PyRel because it connects your model directly to the data in Snowflake and supports data at scale.
What Snowflake objects are supported
You can use the following Snowflake objects as sources with `Model.Table()` as long as your configured Snowflake role has `SELECT` on the source and change tracking is enabled:
| Snowflake object | Supported? | Notes |
|---|---|---|
| Standard table or view | Yes | |
| Snowflake-managed Iceberg table | Yes | This is a preview feature. |
| Temporary or transient table | No | |
| Dynamic table | No | |
| External table or view | No | |
Use a native table or view
Follow these steps to declare a standard Snowflake table or view as a source with `Model.Table()`:
1. Ensure change tracking is enabled.

   Before you declare the table, check whether Snowflake change tracking is already enabled on the table or view you plan to use. PyRel needs change tracking to read from Snowflake-backed sources through data streams.

   You can use SQL or Python to check for change tracking. Run a metadata check for the source you plan to use and check that the `CHANGE_TRACKING` column is `ON`:

   ```sql
   -- For a table
   SHOW TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA;

   -- For a view
   SHOW VIEWS LIKE '<MY_VIEW_NAME>' IN SCHEMA DB.SCHEMA;
   ```

   The same check in Python:

   ```python
   from relationalai.client import connect_sync

   with connect_sync() as client:
       sql = client.core.sql_executor
       assert sql is not None
       rows = sql.collect(
           "SHOW TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA",
           operation="data.change_tracking.check_table",
       )
       print(rows)
   ```

   For a view, run `SHOW VIEWS LIKE '<MY_VIEW_NAME>' IN SCHEMA DB.SCHEMA` instead. In either result, the `CHANGE_TRACKING` column should be `ON`.

   What to do if change tracking is missing:

   You need `OWNERSHIP` on the table or view to enable change tracking. Once change tracking is enabled, anyone with only `SELECT` privileges can use that source through PyRel.

   - If you own the source, enable change tracking directly in Snowflake. For the SQL and PyRel enablement steps, see Enable Change Tracking on a Table or View.
   - If you do not own the source, ask the table owner or a Snowflake admin to enable it for you.
   - You can also let PyRel try to enable change tracking with `data.ensure_change_tracking`. That setting is off by default and only works when the role running PyRel can alter the source. See Enable or disable automatic change tracking.
2. Declare the table.

   You can declare the table with or without a table schema:

   - With a schema is more explicit and helps schema mismatches fail quickly in production.
   - Without a schema is useful when you want faster exploration and iteration during development.

   Use the `Model.Table()` method to declare the table. With a schema:

   ```python
   from relationalai.semantics import Integer, Model, String

   m = Model("MyModel")
   t = m.Table(
       "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>",
       schema={
           "COLUMN_1": Integer,
           "COLUMN_2": String,
       },
   )
   ```

   Without a schema:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")
   t = m.Table("<MY_DB>.<MY_SCHEMA>.<MY_TABLE>")
   ```
3. Verify by selecting a column.

   Use index access, such as `t["COL"]`, to reference a column in a query with `Model.select`:

   ```python
   m.select(t["COLUMN_1"], t["COLUMN_2"]).to_df()
   ```

   If the column name is a valid Python identifier, you can also use attribute access, such as `t.col`:

   ```python
   m.select(t.column_1, t.column_2).to_df()
   ```

   Attribute access is case-insensitive, so `t.COLUMN_1` and `t.column_1` both work.

   What to do if your test query fails:

   If `m.select(...).to_df()` fails, check these first:

   - The Snowflake path points to the table or view you intended to use.
   - Your Snowflake role has `SELECT` on that source.
   - You completed step 1, or PyRel is configured to try enabling change tracking with `data.ensure_change_tracking = true`.
   - If change tracking is still missing and you do not have `OWNERSHIP`, ask the table owner or a Snowflake admin to enable it.

   For more, see Troubleshoot common issues.
- The value returned by `Model.Table` is a `Table` object. It behaves like a table whose columns you can reference in queries and definitions.
- Without `schema=`, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or source schema are not what you expect, you can get an error later when you run a query that references the table.
- With `schema={...}`, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or source schema are not what you expect, you can get an error as soon as you create the table object.
- Turning table rows into entities and relationships is a separate step.
Use an Iceberg table
Follow these steps to declare a Snowflake-managed Iceberg table as a source with `Model.Table()`:
1. Ensure change tracking is enabled.

   Before you declare the table, check whether Snowflake change tracking is already enabled on the Iceberg table you plan to use. PyRel needs change tracking to read from Snowflake-backed sources through data streams.

   You can use SQL or Python to check for change tracking. Run a metadata check for the source you plan to use and check that the `CHANGE_TRACKING` column is `ON`:

   ```sql
   SHOW ICEBERG TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA;
   ```

   The same check in Python:

   ```python
   from relationalai.client import connect_sync

   with connect_sync() as client:
       sql = client.core.sql_executor
       assert sql is not None
       rows = sql.collect(
           "SHOW ICEBERG TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA",
           operation="data.change_tracking.check_table",
       )
       print(rows)
   ```

   In the result, the `CHANGE_TRACKING` column should be `ON`.

   What to do if change tracking is missing:

   You need `OWNERSHIP` on the Iceberg table to enable change tracking. Once change tracking is enabled, anyone with only `SELECT` privileges can use that source through PyRel.

   - If you own the source, enable change tracking directly in Snowflake. For the SQL and PyRel enablement steps, see Enable Change Tracking on a Table or View.
   - If you do not own the source, ask the table owner or a Snowflake admin to enable it for you.
   - You can also let PyRel try to enable change tracking with `data.ensure_change_tracking`. That setting is off by default and only works when the role running PyRel can alter the source. See Enable or disable automatic change tracking.
2. Declare the table.

   You can declare the table with or without a table schema:

   - With a schema is more explicit and helps schema mismatches fail quickly in production.
   - Without a schema is useful when you want faster exploration and iteration during development.

   Use the `Model.Table()` method to declare the table. With a schema:

   ```python
   from relationalai.semantics import Integer, Model, String

   m = Model("MyModel")
   t = m.Table(
       "<MY_DB>.<MY_SCHEMA>.<MY_ICEBERG_TABLE>",
       schema={
           "CUSTOMER_ID": Integer,
           "NAME": String,
       },
   )
   ```

   Without a schema:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")
   t = m.Table("<MY_DB>.<MY_SCHEMA>.<MY_ICEBERG_TABLE>")
   ```
3. Verify by selecting a column.

   Use index access, such as `t["COL"]`, to reference a column in a query with `Model.select`:

   ```python
   m.select(t["CUSTOMER_ID"], t["NAME"]).to_df()
   ```

   If the column name is a valid Python identifier, you can also use attribute access, such as `t.col`:

   ```python
   m.select(t.customer_id, t.name).to_df()
   ```

   Attribute access is case-insensitive, so `t.CUSTOMER_ID` and `t.customer_id` both work.

   What to do if your test query fails:

   If `m.select(...).to_df()` fails, check these first:

   - The Snowflake path points to the table you intended to use.
   - Your Snowflake role has `SELECT` on that source.
   - You completed step 1, or PyRel is configured to try enabling change tracking with `data.ensure_change_tracking = true`.
   - If change tracking is still missing and you do not have `OWNERSHIP`, ask the table owner or a Snowflake admin to enable it.

   For more, see Troubleshoot common issues.
- Snowflake-managed Iceberg table support is a preview feature.
- The value returned by `Model.Table` is a `Table` object. It behaves like a table whose columns you can reference in queries and definitions.
- Prefer bracket access (`t["COL"]`) when a column name is not a valid Python identifier or contains spaces.
- Without `schema=`, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or source schema are not what you expect, you can get an error later when you run a query that references the table.
- With `schema={...}`, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or source schema are not what you expect, you can get an error as soon as you create the table object.
- Turning table rows into entities and relationships is a separate step.
Set a default Snowflake schema to avoid fully-qualified names
To avoid writing fully qualified Snowflake paths in your model code, you can set a default schema in your configuration.
This is especially helpful when most of your input tables live in the same Snowflake database and schema.
It lets you write `Model.Table("MY_TABLE")` instead of `Model.Table("MY_DB.MY_SCHEMA.MY_TABLE")`.
To set a default schema:
1. Set the `tables.default_schema` option in `raiconfig.yaml`:

   ```yaml
   # raiconfig.yaml
   connections:
     # ...
   tables:
     default_schema: "<MY_DB>.<MY_SCHEMA>"
   ```

2. Use an unqualified table name in your model:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")
   t = m.Table("<MY_TABLE_NAME>")

   # Inspect the physical name to verify that the default schema is applied.
   print(t.info.physical_name)
   # Output: <MY_DB>.<MY_SCHEMA>.<MY_TABLE_NAME>
   ```
Alternatively, you can set the default schema in Python:

1. Set the same value with `create_config()`:

   ```python
   from relationalai.config import create_config

   cfg = create_config(
       tables={
           "default_schema": "ANALYTICS.RAW",
       },
   )
   ```

2. Use an unqualified table name in your model:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel", config=cfg)
   t = m.Table("<MY_TABLE_NAME>")

   # Inspect the physical name to verify that the default schema is applied.
   print(t.info.physical_name)
   # Output: ANALYTICS.RAW.<MY_TABLE_NAME>
   ```
- If you call `Model.Table()` with a fully qualified name, that name takes precedence over `default_schema`. This means you can use `default_schema` for most tables and still reference a few tables with fully qualified names when needed.
Define named inputs in configuration
You can define named inputs in your configuration and reference them in your model code with `Model.Table()`.
Use this pattern when you want to change the physical source behind a named input across environments without having to change your model code.
To define named inputs in configuration:
1. Create a named input in `raiconfig.yaml`.

   Define the `tables` section with a name for your source, such as `MY_TABLE_SOURCE`, and set the `fqn` to the physical Snowflake path for that source:

   ```yaml
   # raiconfig.yaml
   connections:
     # ...
   tables:
     MY_TABLE_SOURCE:
       fqn: "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>"
       type: "iceberg" # Optional: set the table type if it's not native.
   ```

   Alternatively, you can define the name in a profile override if you want to use different sources across environments:

   ```yaml
   # raiconfig.yaml
   connections:
     # ...
   profiles:
     dev:
       tables:
         MY_TABLE_SOURCE:
           fqn: "<MY_DEV_DB>.<MY_DEV_SCHEMA>.<MY_DEV_TABLE>"
     prod:
       tables:
         MY_TABLE_SOURCE:
           fqn: "<MY_PROD_DB>.<MY_PROD_SCHEMA>.<MY_PROD_TABLE>"
   ```

   See Use profiles to manage multiple configurations for details on using profiles.
2. Use the named input in your model:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")
   t = m.Table("MY_TABLE_SOURCE")

   # Inspect the physical name to verify that the correct table is resolved from the config.
   print(t.info.physical_name)
   # Output: <MY_DB>.<MY_SCHEMA>.<MY_TABLE>
   ```
Alternatively, you can create the same named input in Python:

1. Pass a `tables` dictionary to `create_config()` with the following structure:

   ```python
   from relationalai.config import create_config

   cfg = create_config(
       tables={
           "MY_TABLE_SOURCE": {
               "fqn": "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>",
               "type": "iceberg",  # Optional: set the table type if it's not native.
           },
       },
   )
   ```

   Alternatively, you can set `fqn` from an environment variable and pass different values for that variable across environments:

   ```python
   import os

   from relationalai.config import create_config

   cfg = create_config(
       tables={
           "MY_TABLE_SOURCE": {
               "fqn": os.getenv("MY_TABLE_SOURCE_FQN"),
               "type": "iceberg",  # Optional: set the table type if it's not native.
           },
       },
   )
   ```
Use the named input in your model
from relationalai.semantics import Modelm = Model("MyModel", config=cfg)t = m.Table("MY_TABLE_SOURCE")print(t.info.physical_name)# Output: <MY_DB>.<MY_SCHEMA>.<MY_TABLE>
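One way to organize per-environment named inputs is to keep a plain Python mapping and select it with an environment variable before building the config. The environment names, variable name, and paths below are examples, not anything PyRel ships with.

```python
import os

# Sketch: choose a named-input mapping per environment. APP_ENV and the
# fqn values are hypothetical; substitute your own variable and paths.
SOURCES = {
    "dev":  {"MY_TABLE_SOURCE": {"fqn": "DEV_DB.DEV_SCHEMA.MY_TABLE"}},
    "prod": {"MY_TABLE_SOURCE": {"fqn": "PROD_DB.PROD_SCHEMA.MY_TABLE"}},
}

env = os.getenv("APP_ENV", "dev")
tables = SOURCES.get(env, SOURCES["dev"])  # fall back to dev if unknown
print(tables["MY_TABLE_SOURCE"]["fqn"])
```

You could then pass `tables` to `create_config(tables=...)` as shown in the steps above.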
Use CSV data with Model.data
CSV files are a convenient starting point for local iteration.
PyRel treats CSV data as in-memory tabular data after you load it in Python and pass it to Model.data.
Choose the variant that matches how you prefer to parse CSVs.
Load a CSV with pandas.read_csv
Choose this when you already use pandas for cleanup and type normalization.
This variant reads a CSV file into a DataFrame and then wraps it with Model.data.
1. Create a sample CSV file.

   Create a file named `sample.csv` in your working directory (or anywhere you can reference by path) with the following contents:

   ```csv
   customer_id,name
   1,Alice
   2,Bob
   ```

2. Load the CSV file with `Model.data`.

   Read the file with `pandas.read_csv`, then call `Model.data`:

   ```python
   from pathlib import Path

   import pandas as pd

   from relationalai.semantics import Model

   m = Model("MyModel")

   csv_path = Path("sample.csv")
   # If you created the file elsewhere, update the path.
   # Example: csv_path = Path("/absolute/path/to/sample.csv")

   df = pd.read_csv(csv_path, encoding="utf-8")
   d = m.data(df)
   ```

3. Verify by selecting all columns.

   Use index access, like `d["COL"]`, to reference a column in a query with `Model.select`:

   ```python
   m.select(d["customer_id"], d["name"]).to_df()
   ```

   You can also use attribute access, like `d.col`, for columns with valid Python identifiers:

   ```python
   m.select(d.customer_id, d.name).to_df()
   ```

   Accessing columns by attribute is case-insensitive, so `d.CUSTOMER_ID` and `d.customer_id` both work.
- `pandas.read_csv` infers dtypes. If a column should stay a string, pass `dtype=` to `pandas.read_csv` or normalize types before you call `Model.data`. For example, treat an ID with leading zeros as a string.
- If you see unexpected column names, fix them in pandas before you reference them in definitions. For example, trim leading and trailing whitespace.
- Mapping CSV-backed columns into entities and relationships is a separate step.
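The dtype-inference note above is easy to demonstrate. The sketch below reads the same CSV text twice, once with inference and once with an explicit `dtype=`, to show how leading zeros in an ID column survive only in the second case:

```python
import io

import pandas as pd

csv_text = "customer_id,name\n001,Alice\n002,Bob\n"

# Without dtype=, pandas infers customer_id as an integer and the
# leading zeros are lost:
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["customer_id"].tolist())   # [1, 2]

# Pass dtype= to keep IDs as strings before handing the frame to Model.data:
typed = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})
print(typed["customer_id"].tolist())      # ['001', '002']
```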
Load a CSV with csv.DictReader
Choose this when you want to avoid pandas and keep dependencies minimal.
This variant parses CSV text into a list of dictionaries and passes it to Model.data.
1. Create a sample CSV file.

   Create a file named `sample.csv` in your working directory (or anywhere you can reference by path) with the following contents:

   ```csv
   customer_id,name
   1,Alice
   2,Bob
   ```

2. Load the CSV file with `Model.data`.

   Parse the file with `csv.DictReader`, then call `Model.data`:

   ```python
   import csv
   from pathlib import Path

   from relationalai.semantics import Model

   m = Model("MyModel")

   csv_path = Path("sample.csv")
   # If you created the file elsewhere, update the path.
   # Example: csv_path = Path("/absolute/path/to/sample.csv")

   with csv_path.open("r", encoding="utf-8", newline="") as f:
       rows = list(csv.DictReader(f))

   d = m.data(rows)
   ```

3. Verify by selecting a column.

   Use index access, like `d["COL"]`, to reference a column in a query with `Model.select`:

   ```python
   m.select(d["customer_id"], d["name"]).to_df()
   ```
- `csv.DictReader` returns strings for all values. If you need numeric types, convert values in Python before you call `Model.data`.
- Always open the file with `newline=""` (as shown) so the `csv` module handles newlines consistently across platforms.
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions.
- Mapping CSV-backed columns into entities and relationships is a separate step.
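The string-values note above matters in practice. As a small sketch, here is one way to convert a numeric column after parsing with `csv.DictReader` and before passing the rows to `Model.data` (the CSV text is inlined with `io.StringIO` so the example is self-contained):

```python
import csv
import io

csv_text = "customer_id,name\n1,Alice\n2,Bob\n"

# csv.DictReader yields strings for every field, so convert numeric
# columns yourself before passing the rows on:
raw_rows = list(csv.DictReader(io.StringIO(csv_text)))
rows = [{**r, "customer_id": int(r["customer_id"])} for r in raw_rows]

print(raw_rows[0])  # {'customer_id': '1', 'name': 'Alice'}
print(rows[0])      # {'customer_id': 1, 'name': 'Alice'}
```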
Use a DataFrame with Model.data
A DataFrame source lets you reuse transformed in-memory tabular data as an input to model definitions. Choose this when you already have a pandas DataFrame from preprocessing, feature engineering, or notebook exploration.
1. Wrap a DataFrame with `Model.data`.

   Start from a DataFrame with stable column names, then call `Model.data`:

   ```python
   import pandas as pd

   from relationalai.semantics import Model

   m = Model("MyModel")

   df = pd.DataFrame([
       {"customer_id": 1, "name": "Alice"},
       {"customer_id": 2, "name": "Bob"},
   ])
   d = m.data(df)
   ```

2. Verify by selecting columns.

   Select a couple of columns with `Model.select` to confirm the mapping is what you expect:

   ```python
   m.select(d.customer_id, d.name).to_df()
   ```
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions.
- You can use either dot access (`d.name`) or bracket access (`d["name"]`) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier.
- If results look surprising, check `df.dtypes` and normalize critical columns before you call `Model.data`.
- Mapping DataFrame-backed columns into entities and relationships is a separate step.
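The normalization advice above can be sketched briefly: trim stray whitespace from column names and cast critical columns before wrapping the frame. The messy column name here is deliberate, to show the cleanup.

```python
import pandas as pd

# A frame with a sloppy column name and string-typed IDs:
df = pd.DataFrame([
    {" customer_id ": "1", "name": "Alice"},
    {" customer_id ": "2", "name": "Bob"},
])

# Trim whitespace in column names and normalize dtypes before you hand
# the frame to Model.data:
df.columns = [c.strip() for c in df.columns]
df["customer_id"] = df["customer_id"].astype(int)

print(list(df.columns))          # ['customer_id', 'name']
print(df.dtypes["customer_id"])  # int64
```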
Use inline Python data with Model.data
Inline data is the fastest way to seed small, explicit rows for examples and tests. Choose this when you want the smallest possible repro without relying on external files or a database. Keep inline datasets small and schema-like so they don’t drift from your production sources.
Provide rows as a list of dictionaries
Choose this variant when you want column names to come directly from your Python keys.
1. Create a `Data` source from rows.

   Call `Model.data` with a list of dictionaries:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")

   d = m.data([
       {"name": "Alice", "age": 10},
       {"name": "Bob", "age": 30},
   ])
   ```

2. Preview the columns.

   Query the columns with `Model.select`:

   ```python
   m.select(d.name, d.age).to_df()
   ```
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions.
- You can use either dot access (`d.name`) or bracket access (`d["name"]`) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier.
- If you have exactly one active model, you can also use the top-level `data` helper as a convenience wrapper around `Model.data`.
- Mapping inline data columns into entities and relationships is a separate step.
Provide rows as a list of tuples
Choose this variant when your data is naturally row-oriented and you want to provide the column names explicitly.
1. Create a `Data` source and set column names.

   Pass `columns=[...]` so your column names are stable and readable in later declarations:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")

   d = m.data(
       [(0, 72.5), (1, 71.9)],
       columns=["minute", "temperature"],
   )
   ```

2. Preview the columns.

   Preview the columns with `Model.select`:

   ```python
   m.select(d.minute, d.temperature).to_df()
   ```
- `Model.data` returns a `Data` object that behaves like a table with columns you can reference in queries and definitions. You can use either dot access (`d.minute`) or bracket access (`d["minute"]`) to reference columns.
- If you omit `columns` for tuple rows, you can access columns by 0-based integer index, such as `d[0]` and `d[1]`. They are also exposed with the default names `col0`, `col1`, `col2`, and so on, so you can write `d.col0` or `d["col0"]` if you prefer.
- Mapping inline data columns into entities and relationships is a separate step.
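The default-naming rule in the notes above can be sketched in plain Python. This is illustrative only; PyRel generates these names internally when `columns=` is omitted.

```python
# Sketch: when columns= is omitted for tuple rows, each positional field
# gets a default name col0, col1, ... in order.
rows = [(0, 72.5), (1, 71.9)]
default_names = [f"col{i}" for i in range(len(rows[0]))]

print(default_names)  # ['col0', 'col1']
```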
Create model constants with Model.Enum
`Model.Enum` creates a small, fixed set of named constants inside your model.
Choose this when you want values that behave like model entities (so you can store them, join on them, and query them) rather than one-off Python literals.
Enum members are defined lazily the first time you reference them in a query or definition.
1. Declare an enum type.

   Define an enum by subclassing `Model.Enum`:

   ```python
   from relationalai.semantics import Model

   m = Model("MyModel")

   class Status(m.Enum):
       ACTIVE = "ACTIVE"
       INACTIVE = "INACTIVE"
   ```

2. Verify by selecting an enum member.

   Reference an enum member in a query with `Model.select`:

   ```python
   m.select(Status.ACTIVE).to_df()
   ```
- If you only need a one-off constant, prefer a Python literal.
- You can use enum members in queries and definitions just like other concepts and relationships. They are stored in the model and can be joined on, returned in results, and used in logic.
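As a rough analogy only, the member/value shape mirrors Python's standard `enum` module. Unlike `Model.Enum` members, standard enum members live only in Python and are not stored in or queryable from the model.

```python
from enum import Enum

# Plain Python analogy: named constants with stable values. This is not
# a substitute for Model.Enum, just the same declaration shape.
class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"

print(Status.ACTIVE.name)        # ACTIVE
print([s.name for s in Status])  # ['ACTIVE', 'INACTIVE']
```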
Troubleshoot common issues
Choose the table that matches the source type you’re troubleshooting.
If you’re using a Snowflake-backed source with `Model.Table()`:
| Symptom | Likely cause | Fix |
|---|---|---|
| `Model.Table("DB.SCHEMA.OBJECT")` or your first `.select(...).to_df()` call fails immediately | The table path or object name is wrong, or it resolves to a different object than you expected | Check the fully qualified Snowflake path and confirm that the object exists in the database and schema you intended to use. |
| `.select(...).to_df()` fails with an access or permission error | Your Snowflake role does not have `SELECT` on the source table or view, even if change tracking is already enabled | Grant `SELECT` on the source object, or switch to a role that already has it. |
| `.select(...).to_df()` fails because change tracking is not enabled | PyRel reads Snowflake-backed sources through data streams, which require change tracking on the table or view | Enable change tracking on the table or view, or set `data.ensure_change_tracking = true` so PyRel can try to enable it automatically. Enabling change tracking requires `OWNERSHIP`. If you are not the owner, ask the table owner or a Snowflake admin to enable it for you. |
| Queries run, but recent Snowflake changes do not appear in results | Source declaration succeeded, but the end-to-end sync path or query-time freshness settings are not behaving as you expect | Use Manage Data Shared With the RAI Native App to check CDC service status and data stream health. Then review Configure data sync behavior for `data.wait_for_stream_sync` and `data.data_freshness_mins`. |
| The Snowflake object resolves, but PyRel still cannot use it as a Snowflake-backed source | The source object type is not supported for this workflow | Use a supported standard table or standard view instead. Do not use temporary tables, transient tables, dynamic tables, or external tables or views here. Snowflake-managed Iceberg tables are supported as a preview feature. |
If you’re loading tabular data with `Model.data()`:
| Symptom | Likely cause | Fix |
|---|---|---|
| A CSV loaded with `pandas.read_csv()` produces unexpected types in PyRel | `pandas.read_csv()` inferred dtypes you did not want | Pass `dtype=` to `pandas.read_csv()`, or normalize types before you call `Model.data()`. This matters most for columns such as IDs that should remain strings. |
| A CSV loaded with `csv.DictReader` keeps every value as a string | `csv.DictReader` returns strings for all values | Convert values in Python before you call `Model.data()` if you need numeric types. |
| Column references fail, or column names are not what you expected after loading CSV data | The CSV headers need cleanup before you reference them | Clean up the column names before you reference them in definitions. For example, trim leading and trailing whitespace in pandas before you call `Model.data()`. |
| `d.some_column` fails for a DataFrame source or inline Python rows | The column name is not a valid Python identifier | Use bracket access such as `d["some column"]`. Prefer bracket access whenever a column name is not a valid Python identifier. |
| Results from a DataFrame-backed source look surprising | The underlying DataFrame dtypes are not what you expected | Check `df.dtypes` and normalize critical columns before you call `Model.data()`. |
| You loaded inline tuple rows, but the column names you expected are not available | You omitted `columns=[...]`, so PyRel exposed the tuple fields by index, with default names such as `col0` and `col1` | Pass `columns=[...]` for stable names, or access the data with `d[0]`, `d[1]`, `d.col0`, or `d["col0"]`. |
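For the dot-versus-bracket rows above, you can predict which access style a column name supports with Python's own identifier check. Note that keywords like `class` pass `str.isidentifier()` but still break dot access, so the sketch also checks `keyword.iskeyword`.

```python
import keyword

# Predict whether a column name requires bracket access (d["..."]) or
# also supports dot access (d.col):
def needs_bracket_access(column):
    # Keywords like "class" are identifiers but still break dot access.
    return not column.isidentifier() or keyword.iskeyword(column)

print(needs_bracket_access("customer_id"))  # False -> d.customer_id works
print(needs_bracket_access("some column"))  # True  -> use d["some column"]
print(needs_bracket_access("2024_sales"))   # True  -> starts with a digit
```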