Rego 103: Types of values and rules


November 16, 2023


This blog post series offers a gentle introduction to Rego, the policy language from the creators of the Open Policy Agent (OPA) engine. If you’re a beginner and want to get started with writing Rego policy as code, you’re in the right place.

In this three-part series, we’ll go over the following:

Part 1: Rego 101: Introduction to Rego
Part 2: Rego 102: Combining queries with AND/OR and custom messages
Part 3 (this part!): Rego 103: Types of values and rules

As a reminder, Rego is a declarative query language from the makers of the Open Policy Agent (OPA) framework. The Cloud Native Computing Foundation (CNCF) accepted OPA as an incubation-level hosted project in April 2019, and OPA graduated from incubating status in 2021.

Rego is used to write policy as code, which applies programming practices such as version control and modular design to the evaluation of cloud and infrastructure as code (IaC) resources. OPA is the engine that evaluates policy as code written in Rego. And Snyk uses the Rego language for custom rules.

Part 2 recap

In Part 2, we showed you how to use the following:

AND and OR rules
The default keyword
The not keyword
Custom deny messages

In this part, we'll round out the series by focusing on set rules, object rules, functions, and iteration.

Types of values

In Rego, a value is a representation of some kind of data. Each value is of a specific type. Rego types fall into two categories: scalar and composite.

Scalar values represent a single unit of data and include the following types:

Strings are surrounded by double quotes.
Numbers include positive and negative integers and decimals.
Booleans can only be true or false.
null represents the absence of a value.

Composite values represent a collection of values and include the following types:

Arrays
Objects
Sets

If you've been reading our blog post series, you've already seen several examples of scalar and composite values. Composite values are a little more complex, so we'll take some time to dig into them.

Composite values

Arrays are ordered lists of values surrounded by square brackets. You can have an array of strings, an array of numbers, an array of arrays, an array of mixed types, and so on:

["alice", "bob", "carlotta"]
[2, -5, 3.8]
[[1, 2, 3], [4, 5, 6]]
[true, "banana", 17]

You can access any element inside an array by referring to its index or position in the array. Indexes always start at 0, so the first item in an array has the index 0, the second has the index 1, and so on, as shown in the users array below:

users := ["alice", "bob", "carlotta"]
#            0       1         2

If you want to retrieve the first item in the users list, you'd use this syntax:

users[0]  # evaluates to "alice"

If you want to assign the second item to the variable admin, you'd refer to it like so:

admin := users[1]  # "bob" is assigned to admin

If you use the syntax users[3] to try to retrieve a (nonexistent) fourth item in the list, OPA won't find a match, so the expression is undefined. You don't get an error, just no result.

Objects are unordered collections of key-value pairs surrounded by curly braces. Keys and values can be of any type, and the types don't have to match. Each key is joined to its value by a colon:

{ "alice": "admin", "bob": "user", "carlotta": "user" }
{ "ports": [80, 443] }
{ 80: true }

You can access the value of an object by specifying its key. For example, the users object below contains three key-value pairs:

users := { "alice": "admin", "bob": "user", "carlotta": "user" }

In a query, to retrieve the value of the key-value pair with the key "alice", you'd use this syntax:

users["alice"]  # evaluates to "admin"

Sets are unordered collections of unique values, also surrounded by curly braces. (An empty set is written set(), because {} is an empty object.) Values can be of any type:

{1, 2, 3}
{"alice", "bob", "carlotta"}

Because sets are unordered, two sets can be equal if they have the same elements, even if they are in a different order. For example, {1, 2, 3} is equal to {2, 3, 1}.

You can look up whether an element is in a set. Let's say you have the following set:

nums := {1, 2, 3}

In a query, to look up whether the integer 4 is in the nums set, you'd use this syntax:

nums[4]  # undefined, because 4 is not in nums (nums[2] would evaluate to 2)
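If you know Python, the three composite types map roughly onto Python's list, dict, and set. The sketch below is only an analogy, not Rego: the main difference is that a Rego lookup that finds nothing is undefined, while Python returns False for a failed membership test and raises an exception for a missing index.

```python
# Rough Python analogues of Rego's composite values (an analogy, not Rego semantics)
users_array = ["alice", "bob", "carlotta"]                            # Rego array  -> list
users_object = {"alice": "admin", "bob": "user", "carlotta": "user"}  # Rego object -> dict
nums = {1, 2, 3}                                                      # Rego set    -> set

first = users_array[0]             # like users[0]: "alice"
role = users_object["alice"]       # like users["alice"]: "admin"
has_four = 4 in nums               # Rego's nums[4] is undefined; Python gives False
same_set = {1, 2, 3} == {2, 3, 1}  # order doesn't matter for set equality

# Python has no "undefined": an out-of-range index raises IndexError instead
try:
    users_array[3]
    missing = False
except IndexError:
    missing = True

print(first, role, has_four, same_set, missing)  # alice admin False True True
```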

Types of rules

There are several different types of rules in Rego:

Complete rules that produce a single result.
Rules that generate sets.
Rules that generate objects.
Functions, which are actually a little different from rules.

Each of these can have a body consisting of queries. The difference is how the queries are constructed and what information they return. In the next section, we'll explain each type of rule, starting with complete rules.

Complete rules

If you've been reading this series, you've seen a complete rule already:

allow := true {
  input.user == "alice"
}

A complete rule assigns a single value to a variable. Above, the variable allow would be assigned the value true if the condition in the query is met.

Here's another example, this time with the number 80 being assigned to the variable allowed_port if the condition in the query is met:

allowed_port := 80 {
  input.account_id != "123456789012"
}

A complete rule can have one query, as above, or it can have multiple queries, like we've seen in another example from Part 2:

allow := true {
  input.user == "alice"
  input.environment == "prod"
}
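From Python, one way to picture a complete rule is as a function of the input that either produces the rule's value or produces nothing at all. The sketch below is a hypothetical model: it uses None to stand in for Rego's "undefined", which is not how OPA represents it internally.

```python
from typing import Optional

def allow(inp: dict) -> Optional[bool]:
    """Models: allow := true { input.user == "alice"; input.environment == "prod" }"""
    # Every expression in the rule body must hold (logical AND) for the rule to get a value
    if inp.get("user") == "alice" and inp.get("environment") == "prod":
        return True
    return None  # stands in for "undefined": the rule simply has no value

print(allow({"user": "alice", "environment": "prod"}))  # True
print(allow({"user": "alice", "environment": "dev"}))   # None
```

Note that the second call yields "no value" rather than False; in Rego, that's the gap the default keyword from Part 2 fills.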

Constants

What if you want a variable to hold a particular value no matter what? This is called a constant in other programming languages.

You could write it like this:

pi := 3.14 {
  true
}

Because the body's only condition is the literal true, it's always met, so the variable pi always has the value 3.14.

Thanks to some syntactic sugar, you can then leave off the body altogether:

pi := 3.14

The above is a complete rule, and no matter where you refer to pi in the program, it always represents 3.14.

Set and object rules

There are many cases where you might want to assign a variable a collection of values, and that's where set rules and object rules (sometimes called partial rules) come in.

Set rules

A set rule adds elements to a variable one at a time, producing a set. Sometimes you'll want to iterate through the input document to generate a set of all the values that meet certain conditions, and a set rule handles exactly that.

Suppose you have this input document representing an array of all the users who are currently logged in across a system:

{
  "users": [
    "alice",
    "bob",
    "carlotta"
  ]
}

Remember the company policy that says only Alice has administrative permissions? Let's say you want to create a unique list (set) of all the non-administrative users who are logged in.

You could generate a set like this:

nonadmins[name] {
  name := input.users[i]
  name != "alice"
}

To explain this, we'll look at the rule body first, then the head.

Rule body: This instructs OPA to scan the input.users array for elements that don't match "alice".

How does this work? We mentioned earlier that you refer to an item in an array by its index. We use the variable i here to represent the index (you could give it another name if you like). When OPA evaluates the queries, it binds i to each index of the users array in turn, grabbing each element one at a time. If you're familiar with imperative programming, this resembles an imperative loop, though it's not quite the same.

The first time OPA passes through the users array in the input, this is what happens behind the scenes:

name := input.users[0]  # Evaluates to "alice"
name != "alice"  # This condition is not fulfilled, so OPA discards the username and moves on.

On the second pass, this is what happens:

name := input.users[1]  # Evaluates to "bob"
name != "alice"  # This condition evaluates to true, so OPA adds the list item -- the username "bob" -- to the nonadmins set.

On the third pass, this is what happens:

name := input.users[2]  # Evaluates to "carlotta"
name != "alice"  # This condition also evaluates to true, so OPA adds the list item -- the username "carlotta" -- to the nonadmins set.

Rule head: The variable nonadmins refers to the set itself, and the variable name represents each unique name in the set (which in the body is input.users[i], as explained above). When OPA follows the logic in the body, if the current value of name matches all the conditions, that value is added to the nonadmins set.

Put it all together: On each pass through the list of users, OPA adds the current input.users[i] value to the nonadmins set if the value fulfills all the conditions listed in the queries.

The result is this nonadmins set (OPA displays sets as JSON arrays, because JSON has no set type):

{
  "nonadmins": [
    "bob",
    "carlotta"
  ]
}

And we can see that the logged-in non-admin users are Bob and Carlotta!
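For comparison, here is roughly the same computation written as a Python set comprehension. It's an analogy only: OPA doesn't run a loop like this, but the result (the set of logged-in users other than alice) matches the nonadmins set above.

```python
input_doc = {"users": ["alice", "bob", "carlotta"]}

# Analogue of: nonadmins[name] { name := input.users[i]; name != "alice" }
nonadmins = {name for name in input_doc["users"] if name != "alice"}

print(sorted(nonadmins))  # ['bob', 'carlotta']
```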

Iteration

A note about iteration in Rego — iteration is implicit. There are no "while x == true" or "for y in z" loops here, in contrast to other languages such as Python. Instead, you iterate through an array, set, or object by using a variable instead of an array index, set element, or object key, as we've done with i below in the input.users array:

name := input.users[i]

Because we've put a variable inside the brackets, OPA knows that we aren't referring to a single particular value — we're looking at all of them, one at a time.

This is a bit of an adjustment if you're used to imperative loops, but it's succinct! The above is roughly equivalent to the following Python loop:

for user in users:
    name = user

Or, closer to the Rego version, where i is an index:

for i in range(len(users)):
    name = users[i]

The underscore operator

If you don't need to refer to the index anywhere else, you can use the wildcard operator (an underscore) instead of a named variable like i:

nonadmins[name] {
  name := input.users[_]
  name != "alice"
}

When used as an iterator (like the variable i), the wildcard operator stands for any index in an array, any element in a set, or any key in an object. In effect, the wildcard is a variable without a name: a throwaway variable. In the rule above, OPA checks each name in the input.users array and, if it isn't equal to "alice", adds it to the nonadmins set. The end result is exactly the same as if you'd used name := input.users[i].

On the other hand, if you need to keep track of the index in a rule, you'll want to use a named variable, as below:

deny[msg] {
  input.users[i] != "alice"
  msg := sprintf("User %v is denied access", [input.users[i]])
}

When used with the following input document…

{
  "users": [
    "alice",
    "bob",
    "carlotta"
  ]
}

… you'd see the following output:

{
  "deny": [
    "User bob is denied access",
    "User carlotta is denied access"
  ]
}

In this case, we need to use a named variable because we want to make sure the name we're checking in the query input.users[i] != "alice" is the same name we're printing in the msg query. Therefore, it's important to keep track of the index. This is easier to understand if you look at what OPA is doing behind the scenes, where it's substituting an index for the variable i. Here's an example of one iteration through the input.users array:

deny[msg] {
  input.users[2] != "alice"
  msg := sprintf("User %v is denied access", [input.users[2]])
}

We need to use input.users[2] in both queries to make sure we're referring to the value at the same index (in this case, "carlotta").
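To see why sharing the index matters, here is a rough Python analogue of the deny rule. It's a comparison sketch, not OPA's evaluation strategy; Python's % formatting stands in for Rego's sprintf. The same i is used both in the check and in the message, so each message always names the user that failed the check.

```python
users = ["alice", "bob", "carlotta"]

# Analogue of:
#   deny[msg] { input.users[i] != "alice"; msg := sprintf("User %v is denied access", [input.users[i]]) }
deny = {
    "User %s is denied access" % users[i]  # same i as in the condition below
    for i in range(len(users))
    if users[i] != "alice"
}

print(sorted(deny))  # ['User bob is denied access', 'User carlotta is denied access']
```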

Object rules

An object rule adds key-value pairs to a variable one at a time, producing an object, much as a set rule generates a set. But while a set rule creates a collection of unique values, an object rule produces a collection of key-value pairs.

The process is very similar to writing a set rule. However, in an object rule, you specify the value part of the key-value pair in the rule head:

nonadmins[name] := "logged-in" {
  name := input.users[i]
  name != "alice"
}

This example is exactly the same as our first set rule example, except this one declares the value of each key-value pair to be "logged-in".

Let's use the same input document:

{
  "users": [
    "alice",
    "bob",
    "carlotta"
  ]
}

When we evaluate the rule against the input above, the output is this:

{
  "nonadmins": {
    "bob": "logged-in",
    "carlotta": "logged-in"
  }
}

The result is a nonadmins object containing two key-value pairs. Each pair has a username as the key and "logged-in" as the value.
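The Python counterpart of this object rule is roughly a dict comprehension. Again, this is an analogy for readers who know Python, not a description of how OPA evaluates the rule.

```python
input_doc = {"users": ["alice", "bob", "carlotta"]}

# Analogue of: nonadmins[name] := "logged-in" { name := input.users[i]; name != "alice" }
nonadmins = {name: "logged-in" for name in input_doc["users"] if name != "alice"}

print(nonadmins)  # {'bob': 'logged-in', 'carlotta': 'logged-in'}
```

One difference worth knowing: if an object rule produced the same key with two different values, OPA would report a conflict error, whereas a Python dict comprehension silently keeps the last value.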

Functions

Functions in Rego are like functions in other languages: they provide a modular, reusable way to instruct the program to do something. Function syntax is similar to rule syntax, and both declare queries the same way, but a function also takes one or more parameters (placeholders for the real arguments), listed in parentheses after its name.

For example, the function below takes the value of x, doubles it, and assigns the resulting value to y.

double_function(x) := y {
  y := x + x
}

Elsewhere in the package, you can call it by providing an argument for it to operate on like so:

z := double_function(2)

And z would evaluate to 4.

You could call it again by passing in argument 12 and assigning the result to foo, and foo would evaluate to 24:

foo := double_function(12)

Functions are a good fit for a specific task you need to carry out repeatedly with different inputs, especially inside other functions. They keep your code cleaner and more modular.

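You can also write "helper" functions that are used in other functions. Below, the allow variable is assigned the value true if all elements in the input.tags array are valid. To determine whether an element is valid, the helper functions is_lowercase_value and is_long_enough check whether a string argument is lowercase and at least three characters long, respectively. They're both used inside the function is_valid. Because iteration in Rego succeeds when any element matches, "every tag is valid" is written as "no tag is invalid": the rule has_invalid_tag calls is_valid on each tag, and allow uses not to check that has_invalid_tag isn't true. (As a consequence, allow is also true when there are no tags at all.)

is_lowercase_value(tag) {
  lower(tag) == tag
}

is_long_enough(tag) {
  count(tag) >= 3
}

is_valid(tag) {
  is_lowercase_value(tag)
  is_long_enough(tag)
}

# True if at least one tag fails validation
has_invalid_tag {
  tag := input.tags[_]
  not is_valid(tag)
}

# True only when no tag is invalid
allow {
  not has_invalid_tag
}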
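The intent described above (allow only when every tag is valid) can be sketched in Python with the same helper functions and all(). This is a hypothetical translation for comparison, not something OPA runs; len() plays the role of Rego's count() on a string, and the function returns False where Rego would leave allow undefined.

```python
def is_lowercase_value(tag: str) -> bool:
    return tag.lower() == tag

def is_long_enough(tag: str) -> bool:
    return len(tag) >= 3

def is_valid(tag: str) -> bool:
    # Both helpers must succeed, like the two expressions in the Rego body
    return is_lowercase_value(tag) and is_long_enough(tag)

def allow(inp: dict) -> bool:
    # all() is True only if every tag passes validation
    return all(is_valid(tag) for tag in inp.get("tags", []))

print(allow({"tags": ["web", "prod"]}))  # True
print(allow({"tags": ["web", "Prod"]}))  # False: "Prod" is not lowercase
print(allow({"tags": ["web", "db"]}))    # False: "db" is too short
```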

Evaluating example rules with OPA

Let's experiment with set rules, object rules, and functions in Rego. As in previous blog posts, we will focus on two ways of interacting with OPA:

Using the OPA Playground
Using OPA’s command line tool

For instructions on using these interfaces, see Part 1.

Once again, we're using a more realistic example involving a Kubernetes pod. Here's the JSON manifest we'll use as input:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "mypod",
    "labels": {
      "stage": "prod"
    }
  },
  "spec": {
    "shareProcessNamespace": true,
    "containers": [
      {
        "name": "myapp1",
        "image": "myapp1:latest"
      },
      {
        "name": "myapp2",
        "image": "myapp2"
      }
    ]
  }
}

And here is the policy we'll be evaluating it against, enforcing the corporate requirement "Containers in production-stage pods should not use the latest image":

package rules.check_prod_pod

is_labeled_prod(labels) {
  labels.stage == "prod"
} {
  labels.stage == "production"
}

latest_containers[container] {
  container := input.spec.containers[_]
  endswith(container.image, ":latest")
}

deny[msg] {
  input.kind == "Pod"
  is_labeled_prod(input.metadata.labels)
  container := latest_containers[_]
  msg := sprintf("Container %v is using latest image on prod", [container.name])
}

This set of rules demonstrates several concepts we've discussed in this blog post, such as functions, set rules, iteration, and the underscore operator. Here's how it all works:

is_labeled_prod(labels) — This is a function that returns true if the labels object passed to it includes a "stage" label with the value "prod" or "production". Its two bodies act as an OR: the function is true if either one is satisfied.

latest_containers[container] — This is a set rule. OPA uses the underscore operator as an iterator to check if any of the containers in the input uses the latest image, and if so, adds it to the latest_containers set. We determine if an image is the latest by checking if the image name ends in ":latest".

deny[msg] — This is also a set rule! It's a more advanced version of the deny[msg] set rule we showed you in Part 2. Its first three queries are conditions, which OPA evaluates as follows:

Checks if the input.kind is "Pod".
Calls the is_labeled_prod function by passing in input.metadata.labels to check if there are any "stage" labels with the value "prod" or "production".
Iterates through the latest_containers set to check for any containers in the set.

IF the above three queries evaluate to true (i.e., OPA finds a match in the input for each condition), THEN OPA adds a custom message to the deny set listing the name of the non-compliant container.
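To make the AND/OR structure of the policy explicit, here is a hypothetical Python translation that applies the same checks to the same pod manifest. It's a sketch for comparison only: the names mirror the Rego rules, and latest_containers returns a list in place of a Rego set because Python dicts can't be set members.

```python
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mypod", "labels": {"stage": "prod"}},
    "spec": {
        "shareProcessNamespace": True,
        "containers": [
            {"name": "myapp1", "image": "myapp1:latest"},
            {"name": "myapp2", "image": "myapp2"},
        ],
    },
}

def is_labeled_prod(labels: dict) -> bool:
    # The two alternative Rego bodies become a logical OR
    return labels.get("stage") in ("prod", "production")

def latest_containers(inp: dict) -> list:
    # Every container whose image name ends in ":latest"
    return [c for c in inp["spec"]["containers"] if c["image"].endswith(":latest")]

def deny(inp: dict) -> set:
    # All conditions must hold (AND) before any message is produced
    if inp.get("kind") != "Pod":
        return set()
    if not is_labeled_prod(inp.get("metadata", {}).get("labels", {})):
        return set()
    # Iterating the list mirrors latest_containers[_] in the Rego rule
    return {
        "Container %s is using latest image on prod" % c["name"]
        for c in latest_containers(inp)
    }

print(sorted(deny(pod)))  # ['Container myapp1 is using latest image on prod']
```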

For your convenience, we've created a playground with this content already: https://play.openpolicyagent.org/p/HgHE4w2b4y

If you evaluate the rules by selecting the Evaluate button in the playground, or by running a command such as opa eval -i input.json -d check_prod_pod.rego "data.rules.check_prod_pod" --format pretty with a local OPA installation, you'll see this output:

{
  "deny": [
    "Container myapp1 is using latest image on prod"
  ],
  "latest_containers": [
    {
      "image": "myapp1:latest",
      "name": "myapp1"
    }
  ]
}

The first item in the output is the deny set, showing a message that the myapp1 container is not compliant with our policy. You can also see the elements in the latest_containers set, which includes the name and image for each container — in this case, the only container is myapp1.

Let's see what happens if we change the image for myapp2 from "myapp2" to "myapp2:latest" (line 19 of the input). If we evaluate the rules again, the deny set now includes a message for both myapp1 and myapp2, and the latest_containers set includes both containers:

{
  "deny": [
    "Container myapp1 is using latest image on prod",
    "Container myapp2 is using latest image on prod"
  ],
  "latest_containers": [
    {
      "image": "myapp1:latest",
      "name": "myapp1"
    },
    {
      "image": "myapp2:latest",
      "name": "myapp2"
    }
  ]
}

Finally, let's remove the "stage": "prod" line (line 7) so the input looks like this:

    "labels": {
    }

If we evaluate the rules now, we can see that latest_containers still includes both containers, but the deny set is empty:

{
  "deny": [],
  "latest_containers": [
    {
      "image": "myapp1:latest",
      "name": "myapp1"
    },
    {
      "image": "myapp2:latest",
      "name": "myapp2"
    }
  ]
}

Because the pod isn't labeled for production, it's OK (per corporate policy) that its containers use the latest image. Therefore, the pod is compliant!

What’s next?

Congratulations! You've made it through our three-part blog series about writing Rego.

If you'd like to learn more, here are some useful resources:

If you’re interested in using Rego to write custom rules for Snyk IaC, check out our documentation here. In addition to Snyk’s built-in security and compliance-mapped rulesets, IaC+ custom rules enable you to set customized security controls across your SDLC.

IaC+ gives you a single view and controls for your configuration issues from code to cloud with an issues UI, ruleset, and policy engine spanning IDE, SCM, CLI, CI/CD, Terraform Cloud, and deployed cloud environments such as AWS, Azure, and Google Cloud.
