# Match text with patterns

Use [`std.re`](/api/python/latest/semantics/std/re/) to filter facts and validate string formats with regex-based checks in PyRel.
This guide covers using `std.re.match()` and `std.re.fullmatch()` to check for patterns, using match results in conditions and derived facts, and understanding current limitations.

:::prereqs

- PyRel is installed and importable in Python.
  See [Set Up Your Environment](/build/setup) for instructions.
- You have a `Model` instance with declared concepts and relationships.
  See [Create a Model Instance](/build/guides/modeling/create-model-instance), [Declare Concepts](/build/guides/modeling/declare-concepts), and [Declare Relationships and Properties](/build/guides/modeling/declare-relationships-and-properties).
- You are comfortable deriving facts with `Model.define()` and filtering with `Model.where()`.
  See [Derive facts with logic](/build/guides/modeling/derive-facts-with-logic).

:::

## Understand how `std.re` matches work

The functions in the `std.re` module let you apply a regular expression to a string expression in your model.

Here’s what you need to know:

- `std.re.match()` and `std.re.fullmatch()` do not return booleans, and they do not return Python `re.Match` objects or `None`.
  They return [`RegexMatch`](/api/python/latest/semantics/std/re/regexmatch_class) expressions.
- Using a `RegexMatch` in a condition keeps only the facts where the match exists.
  If there is no match, the expression is missing and the fact is filtered out.
- You can use `RegexMatch` expressions in conditions just like any other expression in your model.
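
As a rough analogy only, the "missing instead of `False`" behavior can be sketched with Python's standard `re` module. The rows and ids below are hypothetical, and the loop is plain Python rather than PyRel: in a model, the filtering happens inside the definition or query.

```python
import re

# Hypothetical rows, keyed by an illustrative id.
rows = {1: "alpha-1", 2: "beta", 3: "alpha-22"}

# Rows with no match produce no entry at all, rather than an entry set to False.
matched = {}
for row_id, text in rows.items():
    m = re.match(r"alpha-\d+", text)
    if m is not None:
        matched[row_id] = m.group(0)

print(matched)
```

Row `2` does not appear in the result at all, which is the closest plain-Python picture of a fact being filtered out.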

The following sections in this guide show how to use `std.re` functions and work with `RegexMatch` expressions in your definitions and queries.

:::related

- [`std.re` API reference](/api/python/latest/semantics/std/re)
- [`RegexMatch` API reference](/api/python/latest/semantics/std/re/regexmatch_class)
- [Work with strings](/build/guides/reasoning/rules-based/work-with-strings)

:::

## Match prefixes with `re.match`

Use [`re.match()`](/api/python/latest/semantics/std/re/match) when you want prefix-like logic.
`re.match()` checks for a match at the start of the string.

This example derives `CriticalTicket` membership for tickets whose subject starts with the literal tag `[P0]`:

```python
from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import re

m = Model("SupportModel")

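# Declare the identifier and the subject property used by Ticket.new() below.
Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.subject = m.Property(f"{Ticket} has subject {String:subject}")
CriticalTicket = m.Concept("CriticalTicket", extends=[Ticket])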

m.define(
    Ticket.new(id=201, subject="[P0] Outage: login"),
    Ticket.new(id=202, subject="Re: [P0] outage"),
    Ticket.new(id=203, subject="[P1] Degraded performance"),
)

m.define(CriticalTicket(Ticket)).where(re.match(r"\[P0\]", Ticket.subject))

df = m.select(CriticalTicket.id, CriticalTicket.subject).to_df()
print(df)
```

:::in_this_example

- The pattern escapes `[` and `]` because they are regex metacharacters.
- `CriticalTicket(Ticket)` is a concept membership check that you can reuse in other definitions.

:::
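
To preview which subjects a start-anchored pattern accepts, you can run the same pattern through Python's standard `re` module. This is a local sketch, and it assumes `std.re` interprets this simple pattern the same way Python does, which this guide does not confirm. `re.search()` appears only as a Python-side contrast.

```python
import re

PATTERN = r"\[P0\]"
subjects = [
    "[P0] Outage: login",
    "Re: [P0] outage",
    "[P1] Degraded performance",
]

# re.match succeeds only when the pattern matches at position 0.
starts_with_tag = [s for s in subjects if re.match(PATTERN, s)]

# re.search (Python only, shown for contrast) finds the tag anywhere.
contains_tag = [s for s in subjects if re.search(PATTERN, s)]

print(starts_with_tag)
print(contains_tag)
```

Only the first subject passes the start-anchored check. The second one contains the tag but does not start with it.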

## Validate whole strings with `re.fullmatch`

Use `re.fullmatch()` when the entire string must match an expected format.
`re.match()` only requires the pattern to match at the start of the string, so using it for validation can accept strings with trailing junk.

This example validates a strict external ID format and defines `Ticket.has_valid_external_id` when it matches:

```python
from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import re

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.external_id = m.Property(f"{Ticket} has external id {String:external_id}")
Ticket.has_valid_external_id = m.Relationship(f"{Ticket} has valid external id")
Ticket.external_id_validity = m.Property(f"{Ticket} external id validity is {String:validity}")

m.define(
    Ticket.new(id=301, external_id="INC-2026-0042"),
    Ticket.new(id=302, external_id="REQ-2025-0007"),
    Ticket.new(id=303, external_id="INC-2026-0042-extra"),
    Ticket.new(id=304, external_id="inc-2026-0042"),
)

EXTERNAL_ID_RE = r"(?:INC|REQ)-\d{4}-\d{4}"

m.define(Ticket.has_valid_external_id(Ticket)).where(
    re.fullmatch(EXTERNAL_ID_RE, Ticket.external_id)
)

m.define(Ticket.external_id_validity("valid")).where(Ticket.has_valid_external_id(Ticket))
m.define(Ticket.external_id_validity("invalid")).where(m.not_(Ticket.has_valid_external_id(Ticket)))

df = m.select(
    Ticket.id,
    Ticket.external_id,
    Ticket.external_id_validity,
).to_df()
print(df)
```

:::in_this_example

- `re.fullmatch(...)` rejects values with trailing junk like `"INC-2026-0042-extra"`.
- `Ticket.has_valid_external_id` is a unary relationship you can reuse as an existence check.
- `Ticket.external_id_validity` makes the result easy to inspect in a verification query.

:::
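
The difference between the two checks is easy to see with Python's standard `re` module and the same pattern. This is a plain-Python sketch for local experimentation, not PyRel code, and it assumes `std.re` treats the pattern the way Python does.

```python
import re

EXTERNAL_ID_RE = r"(?:INC|REQ)-\d{4}-\d{4}"
candidates = [
    "INC-2026-0042",
    "REQ-2025-0007",
    "INC-2026-0042-extra",
    "inc-2026-0042",
]

# Start-anchored only: the "-extra" suffix slips through.
prefix_ok = [c for c in candidates if re.match(EXTERNAL_ID_RE, c)]

# Whole-string match: the suffix and the lowercase variant are both rejected.
whole_ok = [c for c in candidates if re.fullmatch(EXTERNAL_ID_RE, c)]

print(prefix_ok)
print(whole_ok)
```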

## Derive facts from pattern matches

When multiple definitions depend on the same pattern logic, define a derived fact early and reuse it.
This keeps your patterns in one place and makes it easier to verify and change them.

This example derives ticket categories from subject patterns, then reuses those categories to derive a routing queue:

```python
from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import re

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.subject = m.Property(f"{Ticket} has subject {String:subject}")

BillingTicket = m.Concept("BillingTicket", extends=[Ticket])
OutageTicket = m.Concept("OutageTicket", extends=[Ticket])
PriorityOutageTicket = m.Concept("PriorityOutageTicket", extends=[Ticket])

Ticket.queue = m.Property(f"{Ticket} routes to queue {String:queue}")

m.define(
    Ticket.new(id=401, subject="Billing: invoice mismatch"),
    Ticket.new(id=402, subject="[P0] Outage: login"),
    Ticket.new(id=403, subject="Customer outage report (urgent)"),
    Ticket.new(id=404, subject="Question about pricing"),
)

is_billing = re.match(r"(?i)billing:", Ticket.subject)
is_outage = re.match(r"(?i).*\boutage\b.*", Ticket.subject)

m.define(BillingTicket(Ticket)).where(is_billing)
m.define(OutageTicket(Ticket)).where(is_outage)
m.define(PriorityOutageTicket(Ticket)).where(
    OutageTicket(Ticket),
    re.match(r"\[P0\]", Ticket.subject),
)

m.define(Ticket.queue("billing")).where(BillingTicket(Ticket))
m.define(Ticket.queue("incident")).where(OutageTicket(Ticket))
m.define(Ticket.queue("general")).where(m.not_(BillingTicket(Ticket) | OutageTicket(Ticket)))

df = m.select(
    Ticket.id,
    Ticket.subject,
    Ticket.queue,
).to_df()
print(df)
```

:::in_this_example

- `BillingTicket(Ticket)` and `OutageTicket(Ticket)` turn pattern checks into reusable derived facts.
- `PriorityOutageTicket(Ticket)` reuses `OutageTicket(Ticket)` so the “outage” regex stays in one place.
- The queue routing is derived from those match-based facts, not from duplicated regex patterns.

:::
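
If you want a quick expectation to compare the query result against, the three queue rules can be mirrored in plain Python. This is a sketch under assumptions: `expected_queues` is a hypothetical helper, and it assumes `std.re` interprets both patterns the same way Python's `re` module does.

```python
import re

subjects = {
    401: "Billing: invoice mismatch",
    402: "[P0] Outage: login",
    403: "Customer outage report (urgent)",
    404: "Question about pricing",
}


def expected_queues(subject):
    """Mirror the three queue rules: billing, incident, otherwise general."""
    queues = set()
    if re.match(r"(?i)billing:", subject):
        queues.add("billing")
    if re.match(r"(?i).*\boutage\b.*", subject):
        queues.add("incident")
    # The "general" rule applies only when neither category matched.
    return queues or {"general"}


preview = {ticket_id: expected_queues(s) for ticket_id, s in subjects.items()}
for ticket_id, queues in preview.items():
    print(ticket_id, sorted(queues))
```

The helper returns a set because a subject that matched both patterns would satisfy both rules. None of the sample subjects do, so each ticket gets exactly one queue here.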

## Use match positions with `RegexMatch.span`

`RegexMatch` exposes position metadata you can use in downstream logic.
This is useful when a prefix/suffix check is not enough and you need offsets to reason about structure.

This example matches a bracketed severity tag at the start of the subject, derives the tag length, and selects the match start, end, and span:

```python
from relationalai.semantics import Integer, Model, String
from relationalai.semantics.std import re

m = Model("SupportModel")

Ticket = m.Concept("Ticket", identify_by={"id": Integer})
Ticket.subject = m.Property(f"{Ticket} has subject {String:subject}")
Ticket.tag_length = m.Property(f"{Ticket} has severity tag length {Integer:length}")

m.define(
    Ticket.new(id=501, subject="[P0] Outage: login"),
    Ticket.new(id=502, subject="Outage without a tag"),
    Ticket.new(id=503, subject="[P12] Long tag example"),
)

tag_match = re.match(r"\[[A-Z0-9]+\]", Ticket.subject)

start = tag_match.start()
end = tag_match.end()

m.define(Ticket.tag_length(end - start + 1)).where(tag_match)

df = m.select(
    Ticket.id,
    Ticket.subject,
    start.alias("tag_start"),
    end.alias("tag_end"),
    Ticket.tag_length,
    tag_match.span().alias("tag_span"),
).to_df()
print(df)
```

:::in_this_example

- `RegexMatch.start()` and `RegexMatch.end()` return 0-based indexes.
- In `std.re`, `RegexMatch.end()` is inclusive.
  The matched length is `end - start + 1`.
- If there is no match, the match expression and its derived values are missing.

:::
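
Python's own `re` module reports an exclusive end index, so offsets computed in Python differ by one from the inclusive convention described above. The sketch below converts a Python span to that inclusive form. `inclusive_span` is a hypothetical helper, and the code is plain Python for comparison, not PyRel.

```python
import re

TAG_RE = r"\[[A-Z0-9]+\]"


def inclusive_span(pattern, text):
    """Return (start, end) with an inclusive end, or None when there is no match."""
    m = re.match(pattern, text)
    if m is None:
        return None
    start, end_exclusive = m.span()  # Python's end index is exclusive
    return start, end_exclusive - 1


def tag_length(pattern, text):
    """Length under the inclusive convention: end - start + 1."""
    span = inclusive_span(pattern, text)
    if span is None:
        return None
    start, end = span
    return end - start + 1


for subject in ["[P0] Outage: login", "Outage without a tag", "[P12] Long tag example"]:
    print(subject, inclusive_span(TAG_RE, subject), tag_length(TAG_RE, subject))
```

Keep the off-by-one in mind if you compare offsets from a model against offsets computed upstream in Python.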

## Know current `std.re` limitations

`std.re` is not a drop-in replacement for Python’s `re` module.
In particular, several helpers are present in the API reference but are not implemented yet.

- Supported today: `re.match()`, `re.fullmatch()`, and match position metadata (`RegexMatch.start()`, `RegexMatch.end()`, `RegexMatch.span()`).
- Not implemented: `re.search()`, `re.findall()`, and `re.sub()`.
- Not implemented: capture-group helpers on `RegexMatch` (`RegexMatch.group()`, `RegexMatch.group_by_name()`).
- Not supported: `re.compile()` or reusable compiled-pattern objects.

If you need search, substitution, or capture-group extraction, keep the regex step upstream (for example, in your ingestion pipeline), then define facts on string fields that are already normalized and parsed.
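
One way to do that upstream step is with Python's standard `re` module before the data reaches the model. The sketch below is illustrative: `parse_external_id` and the field names `kind`, `year`, and `seq` are hypothetical, and how you load the parsed fields into your model is outside its scope.

```python
import re

# Named groups do the capture-group work that std.re does not offer yet.
EXTERNAL_ID_RE = re.compile(r"(?P<kind>INC|REQ)-(?P<year>\d{4})-(?P<seq>\d{4})")


def parse_external_id(external_id):
    """Split an external ID into fields, or return None if it is malformed."""
    m = EXTERNAL_ID_RE.fullmatch(external_id)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "year": int(m.group("year")),
        "seq": int(m.group("seq")),
    }


raw_ids = ["INC-2026-0042", "INC-2026-0042-extra"]
parsed = [parse_external_id(x) for x in raw_ids]
print(parsed)
```

The model can then work with the already-parsed fields instead of extracting them with capture groups.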

:::related

- [`std.re` API reference](/api/python/latest/semantics/std/re)
- [`re.search()` API reference](/api/python/latest/semantics/std/re/search)
- [`re.findall()` API reference](/api/python/latest/semantics/std/re/findall)
- [`re.sub()` API reference](/api/python/latest/semantics/std/re/sub)

:::