Prometheus Alertmanager — Custom email Alert configuration & templating

Cloud Guy
May 2, 2022

References:

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40214

https://prometheus.io/docs/alerting/latest/configuration/

https://github.com/prometheus/alertmanager

Alertmanager config:

To integrate email alerts with Prometheus Alertmanager, the Alertmanager config file should look like the following:

global:
  resolve_timeout: 1m
route:
  receiver: 'email-notifications'
templates:
  - '/etc/config/email.tmpl'
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: cloudguy2020@gmail.com
        from: cloudguy2020@gmail.com
        smarthost: smtp.gmail.com:587
        auth_username: cloudguy2020@gmail.com
        auth_identity: cloudguy2020@gmail.com
        auth_password: xxxxxxxxxxxxxxxxx
        send_resolved: true

The email addresses mentioned above are to be updated along with your SMTP details.
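Note that a custom body template is only picked up if the receiver references it; otherwise Alertmanager falls back to its built-in HTML email (the “__subject” definition shown in the next section usually takes effect on its own, because it redefines a name the default subject template already uses). Below is a minimal sketch, assuming the body template is named “email.custom.txt” as defined later in this article:

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: cloudguy2020@gmail.com
        from: cloudguy2020@gmail.com
        smarthost: smtp.gmail.com:587
        auth_username: cloudguy2020@gmail.com
        auth_identity: cloudguy2020@gmail.com
        auth_password: xxxxxxxxxxxxxxxxx
        send_resolved: true
        # Render the plain-text body of the email from the custom template.
        text: '{{ template "email.custom.txt" . }}'
        # Optionally drop the default HTML part so only the custom text is sent.
        html: ''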

Custom email template:

In the alertmanager.yaml file above, the email template file is referenced with the following attribute:

templates:
  - '/etc/config/email.tmpl'

The custom email template (including the subject line) depends on the Prometheus alert configuration & its parameters.

{{ define "__subject" }} [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}-{{ .CommonLabels.alertname }}-{{ .CommonLabels.severity }}-{{ .CommonLabels.instance }}{{ end }}]{{ end}}{{ define "email.custom.txt" }}{{ if gt (len .Alerts.Firing) 0 -}}
Total firing alerts: {{ .Alerts.Firing | len }}
Total resolved alerts: {{ .Alerts.Resolved | len }}
{{ end }}
## Firing Alerts
{{ range .Alerts.Firing }}
-----
Time: {{ .StartsAt }}
Summary: {{ if .Annotations.summary}} {{.Annotations.summary}} {{end}}
Description: {{ if .Annotations.description}} {{.Annotations.description}} {{end}}
-----
{{ end }}
## Resolved Alerts
{{ range .Alerts.Resolved }}
-----
Time: {{ .StartsAt }} -- {{ .EndsAt }}
Summary: {{ if .Annotations.summary}} {{.Annotations.summary}} {{end}}
Description: {{ if .Annotations.description}} {{.Annotations.description}} {{end}}
-----
{{ end }}
{{end}}

So whenever the email contents & subject lines need to change, the required “labels” have to be used accordingly. In the case above, the “alertname”, “severity” & “instance” labels have been used. If we want to enrich the email alerts further, the extra data should be generated through Prometheus alerting rules (covered later).

Now the custom email alert looks like this:

De-grouping of alerts:

The default configuration of Alertmanager groups all the alerts generated by Prometheus & sends them in a single email, and in that case the cluster admins or support team will have a tough time concentrating on the critical issue. So it is sometimes required to de-group the alerts so that an individual email is triggered for each & every alert.

De-grouping of alerts (email/Slack) can be achieved by adding the “group_by” property under “route”.

For more details about the other “group_by” options, please refer to https://prometheus.io/docs/alerting/latest/configuration/#route

route:
  receiver: 'email-notifications'
  group_by: ['...']
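The special value ['...'] groups by all labels, which effectively disables grouping, so every distinct alert produces its own notification. If that turns out to be too noisy, a middle ground is to group on a small set of labels instead; below is a sketch, with timing values that are assumptions to be tuned for your environment:

route:
  receiver: 'email-notifications'
  # Group only alerts that share the same alertname and instance;
  # alerts differing in either label are sent separately.
  group_by: ['alertname', 'instance']
  group_wait: 30s       # wait before the first notification of a new group
  group_interval: 5m    # wait before sending updates for an existing group
  repeat_interval: 4h   # re-send notifications for still-firing alerts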

Extra/Custom Labels in alerts:

Extra/custom labels can be added through the Prometheus alerting rules & the same information can then be included in the alert messages (Slack/email etc.).

Sample alerting configuration in Prometheus (alerting-rules.yaml)

- alert: AppABCDown
  expr: absent(up{app="app-ABC"}) == 1
  for: 0m
  labels:
    severity: critical
    service: app-ABC-service
    application: ABC
    messagecode: PodDown
  annotations:
    summary: app ABC service missing (instance {{ $labels.instance }})
    description: "ABC POD/service has disappeared. An app service component might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: AppXYZDown
  expr: absent(up{app="app-XYZ"}) == 1
  for: 0m
  labels:
    severity: critical
    service: app-XYZ-service
    application: XYZ
    messagecode: PodDown
  annotations:
    summary: app XYZ service missing (instance {{ $labels.instance }})
    description: "XYZ POD/service has disappeared. An app service component might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

In the above configuration snippet, you can see an extra label “messagecode” has been added.
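As a reminder, Prometheus only evaluates these rules if the file is listed under rule_files in its main configuration; the path below is an assumption, adjust it to wherever your deployment mounts the rule file:

# prometheus.yml (fragment)
rule_files:
  - /etc/config/alerting-rules.yaml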

Once added, the same information can be found in the Prometheus dashboard (under Alerts):

Alerting configuration

Extra/Custom Labels in email alerts:

As some extra/custom labels (messagecode etc.) have been added, the same information can be provided to the Ops/concerned team through the automated alerts.

Update the email.tmpl file to accommodate these custom labels:

{{ define "__subject" }} [{{ .CommonLabels.application }}-{{ .CommonLabels.severity }}-{{ .CommonLabels.messagecode }}-{{ .CommonLabels.service }}{{ end }}]{{ define "email.custom.txt" }}{{ if gt (len .Alerts.Firing) 0 -}}
Total firing alerts: {{ .Alerts.Firing | len }}
Total resolved alerts: {{ .Alerts.Resolved | len }}
{{ end }}
## Firing Alerts
{{ range .Alerts.Firing }}
-----
Time: {{ .StartsAt }}
Summary: {{ if .Annotations.summary}} {{.Annotations.summary}} {{end}}
Description: {{ if .Annotations.description}} {{.Annotations.description}} {{end}}
-----
{{ end }}
## Resolved Alerts
{{ range .Alerts.Resolved }}
-----
Time: {{ .StartsAt }} -- {{ .EndsAt }}
Summary: {{ if .Annotations.summary}} {{.Annotations.summary}} {{end}}
Description: {{ if .Annotations.description}} {{.Annotations.description}} {{end}}
-----
{{ end }}
{{end}}

Now the email alerts contain these custom labels:

email alerts
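The custom labels are not limited to templating; they can also be used for routing. As a sketch (not part of the setup above), a child route could match on the “messagecode” label and send those alerts to a dedicated receiver; the receiver name 'poddown-notifications' here is hypothetical:

route:
  receiver: 'email-notifications'
  routes:
    # Alerts carrying messagecode="PodDown" go to a dedicated receiver;
    # everything else falls through to the default receiver above.
    - matchers:
        - messagecode = "PodDown"
      receiver: 'poddown-notifications'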

Using default label values in custom labels:

To keep “email.tmpl” consistent, it is recommended to reuse the values of default labels in the custom labels. This helps keep the email template uniform.

This can be achieved as below:

service: '{{ $labels.instance }}'

Sample:

- alert: AppABCDown
  expr: absent(up{app="app-ABC"}) == 1
  for: 0m
  labels:
    severity: critical
    service: '{{ $labels.app }}-service'
    application: '{{ $labels.app }}'
    messagecode: PodDown
  annotations:
    summary: app ABC service missing (instance {{ $labels.instance }})
    description: "ABC POD/service has disappeared. An app service component might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: AppXYZDown
  expr: absent(up{app="app-XYZ"}) == 1
  for: 0m
  labels:
    severity: critical
    service: '{{ $labels.app }}-service'
    application: '{{ $labels.app }}'
    messagecode: PodDown
  annotations:
    summary: app XYZ service missing (instance {{ $labels.instance }})
    description: "XYZ POD/service has disappeared. An app service component might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

In this example, the custom labels “service” & “application” are using the value of the system-generated label “app”.

Alerting configuration
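For example (a sketch with a hypothetical third application “app-DEF”), adding another alert now needs no change in email.tmpl, because the “service” & “application” labels are derived from the “app” label:

- alert: AppDEFDown
  expr: absent(up{app="app-DEF"}) == 1
  for: 0m
  labels:
    severity: critical
    service: '{{ $labels.app }}-service'
    application: '{{ $labels.app }}'
    messagecode: PodDown
  annotations:
    summary: app DEF service missing (instance {{ $labels.instance }})
    description: "DEF POD/service has disappeared. An app service component might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"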

Happy learning Prometheus & stay tuned for more articles on the same topic.
