
Declare data sources

Data sources are the inputs to your semantic model. They include Snowflake tables, CSV files, in-memory DataFrames, and Python literals. This guide shows you how to declare those sources in your model. See Define Base Facts for the next step: turning source data into entities and relationships.

Your source type determines how you reference fields (columns) in semantic declarations and queries. Most projects start with in-memory sources during prototyping and move to Snowflake tables when the data is shared and stable. Most inputs fall into two buckets:

  • Table-like sources. These are row-and-column datasets. This includes Snowflake tables/views and in-memory tabular data like DataFrames. It also includes CSV data you load into pandas. Use Model.Table for Snowflake. Use Model.data for DataFrames and small inline datasets.
  • Model constants. These are fixed values you write in Python. Use Python literals for one-off constants. Use Model.Enum for a small, fixed set of named constants that are queryable in the model.

This table helps you choose the right source type for your project:

| I have a… | I can use… |
| --- | --- |
| Snowflake table or view | Model.Table to reference it by path. See Use a Snowflake table with Model.Table. |
| pandas DataFrame in Python | Model.data to treat it as a temporary table in queries and definitions. See Use a DataFrame with Model.data. |
| CSV file on disk | Model.data for local iteration; load it into Snowflake and switch to Model.Table when it's shared or large. See Use CSV data with Model.data. |
| A few explicit rows for examples or tests | Model.data to keep the example self-contained. See Use inline Python data with Model.data. |
| A small set of model-specific named constants | Model.Enum for named values you can store and query; use Python literals for one-off constants. See Create model constants with Model.Enum. |

Use a Snowflake table with Model.Table

Model.Table() gives you a table-like object backed by a supported Snowflake source in the semantics DSL. This is the most common way to declare data sources in PyRel because it connects your model directly to the data in Snowflake and supports data at scale.

You can use the following Snowflake objects as sources with Model.Table() as long as your configured Snowflake role has SELECT on the source and change tracking is enabled:

| Snowflake object | Supported? | Notes |
| --- | --- | --- |
| Standard table or view | Yes | |
| Snowflake-managed Iceberg table | Yes | This is a preview feature. |
| Temporary or transient table | No | |
| Dynamic table | No | |
| External table or view | No | |

Follow these steps to declare a standard Snowflake table or view as a source with Model.Table():

  1. Ensure change tracking is enabled

    Before you declare the table, check whether Snowflake change tracking is already enabled on the table or view you plan to use. PyRel needs change tracking to read from Snowflake-backed sources through data streams.

    You can use SQL or Python to check for change tracking.

    Run a metadata check for the source you plan to use and check that the CHANGE_TRACKING column is ON:

    -- For a table
    SHOW TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA;
    -- For a view
    SHOW VIEWS LIKE '<MY_VIEW_NAME>' IN SCHEMA DB.SCHEMA;
    What to do if change tracking is missing

    You need OWNERSHIP on the table or view to enable change tracking. Once change tracking is enabled, anyone with only SELECT privileges can use that source through PyRel.

    If you own the source, enable change tracking directly in Snowflake. For the SQL and PyRel enablement steps, see Enable Change Tracking on a Table or View.

    If you do not own the source, ask the table owner or a Snowflake admin to enable it for you.

    You can also let PyRel try to enable change tracking with data.ensure_change_tracking. That setting is off by default and only works when the role running PyRel can alter the source. See Enable or disable automatic change tracking.

  2. Declare the table

    You can declare the table with or without a table schema:

    • With schema is more explicit and helps schema mismatches fail quickly in production.
    • Without schema is useful when you want faster exploration and iteration during development.

    Use the Model.Table() method to declare the table:

    from relationalai.semantics import Integer, Model, String

    m = Model("MyModel")
    t = m.Table(
        "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>",
        schema={
            "COLUMN_1": Integer,
            "COLUMN_2": String,
        },
    )
  3. Verify by selecting a column

    Use index access, such as t["COL"], to reference a column in a query with Model.select:

    m.select(t["COLUMN_1"], t["COLUMN_2"]).to_df()

    If the column name is a valid Python identifier, you can also use attribute access, such as t.col:

    m.select(t.column_1, t.column_2).to_df()

    Attribute access is case-insensitive, so t.COLUMN_1 and t.column_1 both work.

    What to do if your test query fails

    If m.select(...).to_df() fails, check these first:

    • The Snowflake path points to the table or view you intended to use.
    • Your Snowflake role has SELECT on that source.
    • You completed step 1, or PyRel is configured to try enabling change tracking with data.ensure_change_tracking = true.
    • If change tracking is still missing and you do not have OWNERSHIP, ask the table owner or a Snowflake admin to enable it.

    For more, see Troubleshoot common issues.

  • The value returned by Model.Table is a Table object. It behaves like a table whose columns you can reference in queries and definitions.
  • Without schema=, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or source schema are not what you expect, you can get an error later when you run a query that references the table.
  • With schema={...}, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or source schema are not what you expect, you can get an error as soon as you create the table object.
  • Turning table rows into entities and relationships is a separate step.
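The change-tracking requirement from step 1 can be checked programmatically once you have a SHOW TABLES or SHOW VIEWS result row in Python. The sketch below assumes the row is represented as a plain dict; the helper name and row shape are illustrative, not a PyRel or Snowflake API:

```python
def change_tracking_enabled(show_row: dict) -> bool:
    # SHOW TABLES / SHOW VIEWS report change tracking as the string "ON" or "OFF".
    return str(show_row.get("CHANGE_TRACKING", "OFF")).upper() == "ON"

row = {"name": "MY_TABLE", "CHANGE_TRACKING": "ON"}
print(change_tracking_enabled(row))  # True
```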

Follow these steps to declare a Snowflake-managed Iceberg table as a source with Model.Table():

  1. Ensure change tracking is enabled

    Before you declare the table, check whether Snowflake change tracking is already enabled on the Iceberg table you plan to use. PyRel needs change tracking to read from Snowflake-backed sources through data streams.

    You can use SQL or Python to check for change tracking.

    Run a metadata check for the source you plan to use and check that the CHANGE_TRACKING column is ON:

    SHOW ICEBERG TABLES LIKE '<MY_TABLE_NAME>' IN SCHEMA DB.SCHEMA;
    What to do if change tracking is missing

    You need OWNERSHIP on the Iceberg table to enable change tracking. Once change tracking is enabled, anyone with only SELECT privileges can use that source through PyRel.

    If you own the source, enable change tracking directly in Snowflake. For the SQL and PyRel enablement steps, see Enable Change Tracking on a Table or View.

    If you do not own the source, ask the table owner or a Snowflake admin to enable it for you.

    You can also let PyRel try to enable change tracking with data.ensure_change_tracking. That setting is off by default and only works when the role running PyRel can alter the source. See Enable or disable automatic change tracking.

  2. Declare the table

    You can declare the table with or without a table schema:

    • With schema is more explicit and helps schema mismatches fail quickly in production.
    • Without schema is useful when you want faster exploration and iteration during development.

    Use the Model.Table() method to declare the table:

    from relationalai.semantics import Integer, Model, String

    m = Model("MyModel")
    t = m.Table(
        "<MY_DB>.<MY_SCHEMA>.<MY_ICEBERG_TABLE>",
        schema={
            "CUSTOMER_ID": Integer,
            "NAME": String,
        },
    )
  3. Verify by selecting a column

    Use index access, such as t["COL"], to reference a column in a query with Model.select:

    m.select(t["CUSTOMER_ID"], t["NAME"]).to_df()

    If the column name is a valid Python identifier, you can also use attribute access, such as t.col:

    m.select(t.customer_id, t.name).to_df()

    Attribute access is case-insensitive, so t.CUSTOMER_ID and t.customer_id both work.

    What to do if your test query fails

    If m.select(...).to_df() fails, check these first:

    • The Snowflake path points to the table you intended to use.
    • Your Snowflake role has SELECT on that source.
    • You completed step 1, or PyRel is configured to try enabling change tracking with data.ensure_change_tracking = true.
    • If change tracking is still missing and you do not have OWNERSHIP, ask the table owner or a Snowflake admin to enable it.

    For more, see Troubleshoot common issues.

  • Snowflake-managed Iceberg table support is a preview feature.
  • The value returned by Model.Table is a Table object. It behaves like a table whose columns you can reference in queries and definitions.
  • Prefer bracket access (t["COL"]) when a column name is not a valid Python identifier or contains spaces.
  • Without schema=, column metadata is resolved lazily the first time you access columns. If the table path, permissions, or source schema are not what you expect, you can get an error later when you run a query that references the table.
  • With schema={...}, column metadata is resolved immediately and column names become available right away. If the table path, permissions, or source schema are not what you expect, you can get an error as soon as you create the table object.
  • Turning table rows into entities and relationships is a separate step.
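The lazy-versus-eager trade-off in the notes above can be illustrated with a small plain-Python sketch. This is not PyRel's implementation, only the general pattern: with a declared schema, validation runs at construction time; without one, metadata is fetched on first column access.

```python
# Illustrative sketch only, not PyRel internals.
class EagerSource:
    def __init__(self, available_columns, declared_schema):
        # With schema={...}: mismatches surface as soon as the object is created.
        missing = set(declared_schema) - set(available_columns)
        if missing:
            raise ValueError(f"unknown columns: {sorted(missing)}")
        self.columns = list(declared_schema)

class LazySource:
    def __init__(self, fetch_columns):
        # fetch_columns stands in for a metadata lookup against the source.
        self._fetch = fetch_columns
        self._columns = None

    def __getitem__(self, name):
        # Without schema=: metadata is resolved on first column access.
        if self._columns is None:
            self._columns = set(self._fetch())
        if name not in self._columns:
            raise KeyError(name)
        return name

lazy = LazySource(lambda: ["CUSTOMER_ID", "NAME"])  # nothing validated yet
print(lazy["CUSTOMER_ID"])  # CUSTOMER_ID
```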

Set a default Snowflake schema to avoid fully-qualified names


To avoid writing fully qualified Snowflake paths in your model code, you can set a default schema in your configuration. This is especially helpful when most of your input tables live in the same Snowflake database and schema. It lets you write Model.Table("MY_TABLE") instead of Model.Table("MY_DB.MY_SCHEMA.MY_TABLE").

To set a default schema:

  1. Set the tables.default_schema option in raiconfig.yaml

    raiconfig.yaml
    connections:
      # ...
    tables:
      default_schema: "<MY_DB>.<MY_SCHEMA>"
  2. Use an unqualified table name in your model

    from relationalai.semantics import Model
    m = Model("MyModel")
    t = m.Table("<MY_TABLE_NAME>")
    print(t.info.physical_name)
    # <MY_DB>.<MY_SCHEMA>.<MY_TABLE_NAME>
  • If you call Model.Table() with a fully-qualified name, that name takes precedence over default_schema. This means you can use default_schema for most tables and still reference a few tables with fully-qualified names when needed.

Define named inputs in configuration

You can define named inputs in your configuration and reference them in your model code with Model.Table(). Use this pattern when you want to change the physical source behind a named input across environments without having to change your model code.

To define named inputs in configuration:

  1. Create a named input in raiconfig.yaml

    Define the tables section with a name for your source, such as MY_TABLE_SOURCE, and set the fqn to the physical Snowflake path for that source:

    raiconfig.yaml
    connections:
      # ...
    tables:
      MY_TABLE_SOURCE:
        fqn: "<MY_DB>.<MY_SCHEMA>.<MY_TABLE>"
        type: "iceberg" # Optional: set the table type if it's not native.

    Alternatively, you can define the name in a profile override if you want to use different sources across environments:

    raiconfig.yaml
    connections:
      # ...
    profiles:
      dev:
        tables:
          MY_TABLE_SOURCE:
            fqn: "<MY_DEV_DB>.<MY_DEV_SCHEMA>.<MY_DEV_TABLE>"
      prod:
        tables:
          MY_TABLE_SOURCE:
            fqn: "<MY_PROD_DB>.<MY_PROD_SCHEMA>.<MY_PROD_TABLE>"

    See Use profiles to manage multiple configurations for details on using profiles.

  2. Use the named input in your model

    from relationalai.semantics import Model
    m = Model("MyModel")
    t = m.Table("MY_TABLE_SOURCE")
    # Inspect the physical name to verify that the correct table is resolved from the config.
    print(t.info.physical_name)
    # Output: <MY_DB>.<MY_SCHEMA>.<MY_TABLE>

Use CSV data with Model.data

CSV files are a convenient starting point for local iteration. PyRel treats CSV data as in-memory tabular data after you load it in Python and pass it to Model.data. Choose the variant that matches how you prefer to parse CSVs.

Choose this when you already use pandas for cleanup and type normalization. This variant reads a CSV file into a DataFrame and then wraps it with Model.data.

  1. Create a sample CSV file

    Create a file named sample.csv in your working directory (or anywhere you can reference by path) with the following contents:

    customer_id,name
    1,Alice
    2,Bob
  2. Load the CSV file with Model.data

    Read the file with pandas.read_csv, then call Model.data:

    from pathlib import Path
    import pandas as pd
    from relationalai.semantics import Model
    m = Model("MyModel")
    csv_path = Path("sample.csv")
    # If you created the file elsewhere, update the path.
    # Example: csv_path = Path("/absolute/path/to/sample.csv")
    df = pd.read_csv(csv_path, encoding="utf-8")
    d = m.data(df)
  3. Verify by selecting all columns

    Use index access, like d["COL"], to reference a column in a query with Model.select:

    m.select(d["customer_id"], d["name"]).to_df()

    You can also use attribute access, like d.col, for columns with valid Python identifiers:

    m.select(d.customer_id, d.name).to_df()

    Accessing columns by attribute is case-insensitive, so d.CUSTOMER_ID and d.customer_id both work.

  • pandas.read_csv infers dtypes. If a column should stay a string, pass dtype= to pandas.read_csv or normalize types before you call Model.data. For example, treat an ID with leading zeros as a string.
  • If you see unexpected column names, fix them in pandas before you reference them in definitions. For example, trim leading and trailing whitespace.
  • Mapping CSV-backed columns into entities and relationships is a separate step.
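The dtype note above is easiest to see with an ID column that has leading zeros. A small self-contained example using an in-memory CSV string instead of a file:

```python
import io

import pandas as pd

csv_text = "customer_id,name\n007,Alice\n042,Bob\n"

# Default inference parses customer_id as an integer and drops the leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["customer_id"].tolist())  # [7, 42]

# Passing dtype= keeps the column as strings.
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={"customer_id": str})
print(as_strings["customer_id"].tolist())  # ['007', '042']
```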

Choose this when you want to avoid pandas and keep dependencies minimal. This variant parses CSV text into a list of dictionaries and passes it to Model.data.

  1. Create a sample CSV file

    Create a file named sample.csv in your working directory (or anywhere you can reference by path) with the following contents:

    customer_id,name
    1,Alice
    2,Bob
  2. Load the CSV file with Model.data

    Parse the file with csv.DictReader, then call Model.data:

    import csv
    from pathlib import Path
    from relationalai.semantics import Model
    m = Model("MyModel")
    csv_path = Path("sample.csv")
    # If you created the file elsewhere, update the path.
    # Example: csv_path = Path("/absolute/path/to/sample.csv")
    with csv_path.open("r", encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f))
    d = m.data(rows)
  3. Verify by selecting a column

    Use index access, like d["COL"], to reference a column in a query with Model.select:

    m.select(d["customer_id"], d["name"]).to_df()
  • csv.DictReader returns strings for all values. If you need numeric types, convert values in Python before you call Model.data.
  • Always open the file with newline="" (as shown) so the csv module handles newlines consistently across platforms.
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • Mapping CSV-backed columns into entities and relationships is a separate step.
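The string-values caveat above can be handled with a short conversion pass before calling Model.data. A self-contained example using an in-memory CSV string:

```python
import csv
import io

csv_text = "customer_id,name\n1,Alice\n2,Bob\n"
raw_rows = list(csv.DictReader(io.StringIO(csv_text)))
print(raw_rows[0])  # {'customer_id': '1', 'name': 'Alice'} -- every value is a string

# Convert numeric fields before handing the rows to Model.data.
rows = [{"customer_id": int(r["customer_id"]), "name": r["name"]} for r in raw_rows]
print(rows[0])  # {'customer_id': 1, 'name': 'Alice'}
```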

Use a DataFrame with Model.data

A DataFrame source lets you reuse transformed in-memory tabular data as an input to model definitions. Choose this when you already have a pandas DataFrame from preprocessing, feature engineering, or notebook exploration.

  1. Wrap a DataFrame with Model.data

    Start from a DataFrame with stable column names, then call Model.data:

    import pandas as pd
    from relationalai.semantics import Model
    m = Model("MyModel")
    df = pd.DataFrame(
        [
            {"customer_id": 1, "name": "Alice"},
            {"customer_id": 2, "name": "Bob"},
        ]
    )
    d = m.data(df)
  2. Verify by selecting columns

    Select a couple of columns with Model.select to confirm the mapping is what you expect:

    m.select(d.customer_id, d.name).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • You can use either dot access (d.name) or bracket access (d["name"]) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier.
  • If results look surprising, check df.dtypes and normalize critical columns before you call Model.data.
  • Mapping DataFrame-backed columns into entities and relationships is a separate step.
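Checking and normalizing dtypes, as the notes above suggest, takes only a couple of lines. For example, IDs that arrived as strings can be cast before you call Model.data:

```python
import pandas as pd

df = pd.DataFrame({"customer_id": ["1", "2"], "name": ["Alice", "Bob"]})
print(df.dtypes["customer_id"])  # object -- the IDs came in as strings

# Normalize the critical column before passing the DataFrame to Model.data.
df["customer_id"] = df["customer_id"].astype("int64")
print(df.dtypes["customer_id"])  # int64
```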

Use inline Python data with Model.data

Inline data is the fastest way to seed small, explicit rows for examples and tests. Choose this when you want the smallest possible repro without relying on external files or a database. Keep inline datasets small and schema-like so they don't drift from your production sources.

Choose this variant when you want column names to come directly from your Python keys.

  1. Create a Data source from rows

    Call Model.data with a list of dictionaries:

    from relationalai.semantics import Model
    m = Model("MyModel")
    d = m.data(
        [
            {"name": "Alice", "age": 10},
            {"name": "Bob", "age": 30},
        ]
    )
  2. Preview the columns

    Query the columns with Model.select:

    m.select(d.name, d.age).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions.
  • You can use either dot access (d.name) or bracket access (d["name"]) to reference columns. Prefer bracket access when a column name isn’t a valid Python identifier.
  • If you have exactly one active model, you can also use the top-level data helper as a convenience wrapper around Model.data.
  • Mapping inline data columns into entities and relationships is a separate step.

Choose this variant when your data is naturally row-oriented and you want to provide the column names explicitly.

  1. Create a Data source and set column names

    Pass columns=[...] so your column names are stable and readable in later declarations:

    from relationalai.semantics import Model
    m = Model("MyModel")
    d = m.data(
        [(0, 72.5), (1, 71.9)],
        columns=["minute", "temperature"],
    )
  2. Preview the columns

    Preview the columns with Model.select:

    m.select(d.minute, d.temperature).to_df()
  • Model.data returns a Data object that behaves like a table with columns you can reference in queries and definitions. You can use either dot access (d.minute) or bracket access (d["minute"]) to reference columns.
  • If you omit columns for tuple rows, you can access columns by 0-based integer index, such as d[0] and d[1]. They are also exposed with the default names col0, col1, col2, … so you can write d.col0 or d["col0"] if you prefer.
  • Mapping inline data columns into entities and relationships is a separate step.
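Conceptually, columns=[...] pairs each tuple positionally with a name, and omitting it falls back to positional defaults. A plain-Python sketch of that pairing (illustrative only, not how PyRel stores the data):

```python
rows = [(0, 72.5), (1, 71.9)]
columns = ["minute", "temperature"]

# With explicit names, each tuple pairs positionally with a column name.
named = [dict(zip(columns, row)) for row in rows]
print(named[0])  # {'minute': 0, 'temperature': 72.5}

# Without names, positions fall back to defaults such as col0 and col1.
defaults = [f"col{i}" for i in range(len(rows[0]))]
print(defaults)  # ['col0', 'col1']
```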

Create model constants with Model.Enum

Model.Enum creates a small, fixed set of named constants inside your model. Choose this when you want values that behave like model entities (so you can store them, join on them, and query them) rather than one-off Python literals. Enum members are defined lazily the first time you reference them in a query or definition.

  1. Declare an enum type

    Define an enum by subclassing Model.Enum:

    from relationalai.semantics import Model
    m = Model("MyModel")
    class Status(m.Enum):
        ACTIVE = "ACTIVE"
        INACTIVE = "INACTIVE"
  2. Verify by selecting an enum member

    Reference an enum member in a query with Model.select:

    m.select(Status.ACTIVE).to_df()
  • If you only need a one-off constant, prefer a Python literal.
  • You can use enum members in queries and definitions just like other concepts and relationships. They are stored in the model and can be joined on, returned in results, and used in logic.
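If you know Python's standard-library enums, Model.Enum's declaration style will look familiar. For intuition only, here is the same Status set as a plain enum.Enum (this is not Model.Enum, so the members are not stored in or queryable from a model):

```python
from enum import Enum

class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"

print(Status.ACTIVE.name)  # ACTIVE
print(Status("INACTIVE"))  # Status.INACTIVE -- lookup by value
```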

Troubleshoot common issues

Choose the table that matches the source type you're troubleshooting.

If you’re using a Snowflake-backed source with Model.Table():

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Model.Table("DB.SCHEMA.OBJECT") or your first .select(...).to_df() call fails immediately | The table path or object name is wrong, or it resolves to a different object than you expected | Check the fully qualified Snowflake path and confirm that the object exists in the database and schema you intended to use. |
| .select(...).to_df() fails with an access or permission error | Your Snowflake role does not have SELECT on the source table or view, even if change tracking is already enabled | Grant SELECT on the source object, or switch to a role that already has it. |
| .select(...).to_df() fails because change tracking is not enabled | PyRel reads Snowflake-backed sources through data streams, which require change tracking on the table or view | Enable change tracking on the table or view, or set data.ensure_change_tracking = true so PyRel can try to enable it automatically. Enabling change tracking requires OWNERSHIP; if you are not the owner, ask the table owner or a Snowflake admin to enable it for you. |
| Queries run, but recent Snowflake changes do not appear in results | Source declaration succeeded, but the end-to-end sync path or query-time freshness settings are not behaving as you expect | Use Manage Data Shared With the RAI Native App to check CDC service status and data stream health. Then review Configure data sync behavior for data.wait_for_stream_sync and data.data_freshness_mins. |
| The Snowflake object resolves, but PyRel still cannot use it as a Snowflake-backed source | The source object type is not supported for this workflow | Use a supported standard table or standard view instead. Do not use temporary tables, transient tables, dynamic tables, or external tables or views here. Snowflake-managed Iceberg tables are supported as a preview feature. |

If you’re loading tabular data with Model.data():

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| A CSV loaded with pandas.read_csv() produces unexpected types in PyRel | pandas.read_csv() inferred dtypes you did not want | Pass dtype= to pandas.read_csv(), or normalize types before you call Model.data(). This matters most for columns such as IDs that should remain strings. |
| A CSV loaded with csv.DictReader keeps every value as a string | csv.DictReader returns strings for all values | Convert values in Python before you call Model.data() if you need numeric types. |
| Column references fail, or column names are not what you expected after loading CSV data | The CSV headers need cleanup before you reference them | Clean up the column names before you reference them in definitions. For example, trim leading and trailing whitespace in pandas before you call Model.data(). |
| d.some_column fails for a DataFrame source or inline Python rows | The column name is not a valid Python identifier | Use bracket access such as d["some column"]. Prefer bracket access whenever a column name is not a valid Python identifier. |
| Results from a DataFrame-backed source look surprising | The underlying DataFrame dtypes are not what you expected | Check df.dtypes and normalize critical columns before you call Model.data(). |
| You loaded inline tuple rows, but the column names you expected are not available | You omitted columns=[...], so PyRel exposed the tuple fields by index, with default names such as col0 and col1 | Pass columns=[...] for stable names, or access the data with d[0], d[1], d.col0, or d["col0"]. |