String interpolation in YAML with Python
String interpolation is not a feature of YAML. In this post, I present a quick way to perform string interpolation in configuration files written in YAML. I use Jinja syntax to define placeholders in the values and process the .yaml files with Python.
Goal
My goal is to
- merge several .yaml configuration files into a single configuration object. The files are processed in order, and later files can overwrite the values of keys defined in earlier ones,
- use placeholders in values that reference the values of other keys in the same or a different .yaml file.
Example
Let's see an example. Suppose I have 2 .yaml files that are loaded in the following order:
# first file
project:
  name: project
  environment: dev
storage:
  bucket: "{{ project.name }}-{{ project.environment }}-{{ aws.account_id }}"

# second file
project:
  name: yaml-interpolation
aws:
  account_id: "123456789"
user:
  username: "codiply"
  user_arn: "arn:aws:iam::{{ aws.account_id }}:user/{{ user.username }}"
  storage_path: "s3://{{ storage.bucket }}/{{ user.username }}"
I want the final result to be a Python dictionary (specifically a benedict dictionary) holding the same configuration as this YAML file:
aws:
  account_id: '123456789'
project:
  environment: dev
  name: yaml-interpolation
storage:
  bucket: yaml-interpolation-dev-123456789
user:
  storage_path: s3://yaml-interpolation-dev-123456789/codiply
  user_arn: arn:aws:iam::123456789:user/codiply
  username: codiply
The code
For the implementation, I am using benedict and Jinja2, specifically the following versions
python-benedict==0.32.1
Jinja2==3.1.2
The imports are
import os  # needed by os.path.isfile when loading the files
import re
import typing

from benedict import benedict
from jinja2 import BaseLoader, Environment
I work with two representations:
- A list of strings, each string containing the content of a YAML file. The order of this list is important when there are duplicate keys.
- A merged nested dictionary with all settings combined. This will serve as the "context" for doing the string interpolation.
For loading the YAML files and merging them into a single dictionary I use benedict, which already gives me the functionality for loading and merging dictionaries. The code is
def _merge_configs_to_dict(yaml_texts: typing.List[str]) -> benedict:
    merged_config = benedict()
    for text in yaml_texts:
        config = benedict.from_yaml(text)
        # Later texts win on duplicate keys; lists are concatenated.
        merged_config.merge(config, overwrite=True, concat=True)
    return merged_config
Notice that the contents are processed in the order they are passed in and, because of the setting overwrite=True, duplicate keys are overwritten. The setting concat=True controls the behaviour for values that are lists. Here I append the elements when a list appears in multiple configs, but you can choose to replace the whole list with the new one instead.
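To make the overwrite and concat semantics concrete, here is a minimal plain-dict sketch of the merge rules described above. It is not benedict's implementation; the deep_merge helper and the tags key are made up for illustration.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], new: Dict[str, Any],
               overwrite: bool = True, concat: bool = False) -> Dict[str, Any]:
    """Merge `new` into `base` in place and return `base` (hypothetical helper)."""
    for key, value in new.items():
        current = base.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Nested sections are merged key by key.
            deep_merge(current, value, overwrite=overwrite, concat=concat)
        elif concat and isinstance(current, list) and isinstance(value, list):
            # With concat, list elements from later configs are appended.
            current.extend(value)
        elif overwrite or key not in base:
            # Otherwise the later value replaces the earlier one.
            base[key] = value
    return base


first = {"project": {"name": "project", "environment": "dev"}, "tags": ["a"]}
second = {"project": {"name": "yaml-interpolation"}, "tags": ["b"]}

merged = deep_merge(first, second, overwrite=True, concat=True)
print(merged)

# Without concat, the later list replaces the earlier one.
replaced = deep_merge({"tags": ["a"]}, {"tags": ["b"]}, overwrite=True, concat=False)
print(replaced)
```

The nested project section keeps environment from the first config while name comes from the second, which is the behaviour the example at the top of the post relies on.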
Once I have a context object loaded, I can attempt to render each one of the YAML texts with Jinja
def _render_jinja(text: str, context: benedict) -> str:
    template = Environment(loader=BaseLoader(), autoescape=False).from_string(text)
    return template.render(context)


def _render_yaml_texts(yaml_texts: typing.List[str], context: benedict) -> typing.List[str]:
    return [_render_jinja(yaml_text, context) for yaml_text in yaml_texts]
To tell if there are more placeholders left in the YAML, it is easier to work with the text representation.
def _exists_string_to_interpolate(yaml_texts: typing.List[str]) -> bool:
    for text in yaml_texts:
        if "{{" in text:
            return True
    return False
The idea is to go back and forth between the two representations (YAML text and dictionary/context) making string interpolations until there are no more interpolations to be made. If there are cyclic dependencies, the stopping condition will never be met. For that reason, I set a maximum number of iterations and I stop after the maximum number of passes. I raise an exception if at that point there are still placeholders left.
def _combine_configs_with_string_interpolation(ordered_yaml_texts: typing.List[str], max_passes: int = 8) -> benedict:
    yaml_texts = ordered_yaml_texts
    pass_number = 1
    # Alternate between the merged context and the rendered texts until no placeholders remain.
    while pass_number <= max_passes and _exists_string_to_interpolate(yaml_texts):
        context = _merge_configs_to_dict(yaml_texts)
        yaml_texts = _render_yaml_texts(yaml_texts, context)
        pass_number += 1
    if _exists_string_to_interpolate(yaml_texts):
        remaining_expressions = _find_all_remaining_placeholders(yaml_texts)
        raise Exception(
            f"Unable to interpolate all strings after {max_passes} passes. "
            "Check for cyclic references. "
            f"Remaining expressions are {', '.join(remaining_expressions)}."
        )
    return _merge_configs_to_dict(yaml_texts)
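To see why more than one pass can be needed, here is a small standard-library sketch of the same back-and-forth on the values from the example. It is only an illustration: a regular expression stands in for Jinja, the configuration is flattened to dotted keys, and render_once is a made-up helper.

```python
import re
from typing import Dict

# Matches placeholders such as "{{ storage.bucket }}" and captures the dotted key.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_once(values: Dict[str, str]) -> Dict[str, str]:
    """Replace each placeholder with the current (possibly unresolved) value of its key."""
    def substitute(match: "re.Match[str]") -> str:
        return values[match.group(1)]

    return {key: PLACEHOLDER.sub(substitute, text) for key, text in values.items()}


values = {
    "project.name": "yaml-interpolation",
    "project.environment": "dev",
    "aws.account_id": "123456789",
    "user.username": "codiply",
    "storage.bucket": "{{ project.name }}-{{ project.environment }}-{{ aws.account_id }}",
    "user.storage_path": "s3://{{ storage.bucket }}/{{ user.username }}",
}

passes = 0
while any("{{" in text for text in values.values()):
    values = render_once(values)
    passes += 1
    print(passes, values["user.storage_path"])
```

In the first pass, storage_path receives the still unrendered text of storage.bucket, so it takes a second pass to resolve it completely. In general, the number of passes needed follows the longest chain of references.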
For better debugging of cyclic dependencies, I find and report all placeholders that have not been replaced with a value. This function is below
def _find_all_remaining_placeholders(yaml_texts: typing.List[str]) -> typing.List[str]:
    remaining = set()
    for text in yaml_texts:
        # Non-greedy match, so that two placeholders on one line are reported separately.
        remaining.update(re.findall(r"\{\{.*?\}\}", text))
    return list(remaining)
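One detail worth checking when extracting placeholders with a regular expression is greediness. A greedy .* spans from the first {{ to the last }} on a line, while a non-greedy .*? reports each placeholder on its own. A quick check on a line taken from the example:

```python
import re

line = 'user_arn: "arn:aws:iam::{{ aws.account_id }}:user/{{ user.username }}"'

# Greedy: one match covering everything between the first "{{" and the last "}}".
greedy = re.findall(r"\{\{.*\}\}", line)

# Non-greedy: one match per placeholder.
lazy = re.findall(r"\{\{.*?\}\}", line)

print(greedy)
print(lazy)
```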
To load the YAML texts, given some paths, the code is
def _load_yaml_texts(ordered_paths: typing.List[str]) -> typing.List[str]:
    yaml_texts = []
    for path in ordered_paths:
        # Paths that do not point to a file are skipped silently.
        if os.path.isfile(path):
            with open(path, "r") as file:
                yaml_texts.append(file.read())
    return yaml_texts


def load_config(ordered_yaml_paths: typing.List[str]) -> benedict:
    yaml_texts = _load_yaml_texts(ordered_yaml_paths)
    config = _combine_configs_with_string_interpolation(yaml_texts)
    return config
Finally, to create the dictionary for a set of config filenames, I do
config = load_config(["/some/path/a.yaml", "/some/path/b.yaml"])
Most likely this config will be used in your Python code, so a dictionary is a good representation. Or you can pass the dictionary to a constructor of a more typed object.
If you wish to get the rendered config as a single YAML file, you can simply do
config.to_yaml()
and store the result in a file.
Limitations
There are a few limitations:
- You cannot reference a key that contains a list. This is not exactly a limitation, because the goal is to do string interpolation. If you are here because you need to reference a list, then you should most likely be looking into anchors and aliases, which are part of the YAML specification.
- Depending on how deep the graph of references is, 8 passes might not be sufficient. You can raise the maximum number of passes.
- If you have a cyclic dependency and a high maximum number of passes, the code is going to construct very large strings.
To demonstrate the last point, with the simplest cyclic dependency, this is what happens at each step
# Original
section:
  key1: "{{ section.key2 }}-a"
  key2: "{{ section.key1 }}-b"
# 1st pass
section:
  key1: "{{ section.key1 }}-b-a"
  key2: "{{ section.key2 }}-a-b"
# 2nd pass
section:
  key1: "{{ section.key1 }}-b-a-b-a"
  key2: "{{ section.key2 }}-a-b-a-b"
# 3rd pass
section:
  key1: "{{ section.key1 }}-b-a-b-a-b-a-b-a"
  key2: "{{ section.key2 }}-a-b-a-b-a-b-a-b"
# 4th pass
section:
  key1: "{{ section.key1 }}-b-a-b-a-b-a-b-a-b-a-b-a-b-a-b-a"
  key2: "{{ section.key2 }}-a-b-a-b-a-b-a-b-a-b-a-b-a-b-a-b"
and the strings for these 2 values grow exponentially in size.
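The growth can be reproduced with a few lines of standard-library Python. This is only a sketch: a regular expression stands in for Jinja, the two keys are written as flat dotted names, and render_once is a made-up helper.

```python
import re
from typing import Dict, List

# Matches placeholders such as "{{ section.key1 }}" and captures the dotted key.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_once(values: Dict[str, str]) -> Dict[str, str]:
    """Replace each placeholder with the current (possibly unresolved) value of its key."""
    def substitute(match: "re.Match[str]") -> str:
        return values[match.group(1)]

    return {key: PLACEHOLDER.sub(substitute, text) for key, text in values.items()}


values = {
    "section.key1": "{{ section.key2 }}-a",
    "section.key2": "{{ section.key1 }}-b",
}

suffix_lengths: List[int] = []
for _ in range(8):  # 8 is the default max_passes in the code above
    values = render_once(values)
    # Length of the text that follows the leftover placeholder in key1.
    suffix = values["section.key1"].split("}}", 1)[1]
    suffix_lengths.append(len(suffix))

print(suffix_lengths)
```

The text after the placeholder doubles on every pass, so with the default of 8 passes the damage is small, but each additional pass doubles it again.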