Amazon EMR, ALB & Me.
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. To secure traffic to the EMR web interfaces with Transport Layer Security (TLS), this post puts an AWS Application Load Balancer with an HTTPS listener in front of the cluster.
My preferred way to deploy Amazon EMR (covered in another post), along with the Application Load Balancer and any dependencies, is with Terraform. The Terraform code below was tested and deployed using:
Terraform v0.12.29
+ provider.aws v3.15.0
+ provider.random v3.0.0
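If you want Terraform to enforce roughly the same versions, a minimal sketch of a version constraint block follows. It is not part of the original code; the constraint ranges are my assumption based on the versions listed above, so loosen or tighten them as needed.

```hcl
terraform {
  # Stay on the 0.12 series, at or above the tested 0.12.29 release.
  required_version = "~> 0.12.29"

  required_providers {
    # Terraform 0.12 accepts plain version-constraint strings here.
    aws    = "~> 3.15" # any 3.x release from 3.15 onward
    random = "~> 3.0"  # any 3.x release
  }
}
```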
I’ve deployed EMR with the following applications and ports:
- hadoop-hdfs-namenode - 50070
- hadoop-hdfs-datanode - 50075
- hbase - 16010
- hue - 8888
- jupyterhub - 9443
- livy - 8998
- spark - 18080
- tez - 8080
- yarn-node-manager - 8042
- yarn-resource-manager - 8088
- zeppelin - 8890
Prerequisites
- Terraform v0.12.29
- An AWS account in good standing
- An issued AWS Certificate Manager (ACM) certificate that covers the DNS namespace (see line 6 of the code)
Manifest
- AWS Application Load Balancer
- AWS security group for ingress traffic for AWS Application Load Balancer
- AWS Application Load Balancer listener (HTTPS)
- AWS Application Load Balancer listener rules (see above applications created)
- AWS Application Load Balancer target groups (see above applications created)
Terraform AWS Provider
- AWS Application Load Balancer (aws_lb)
- AWS security group for ingress traffic for AWS Application Load Balancer (aws_security_group)
- AWS Application Load Balancer listener (HTTPS) (aws_lb_listener)
- AWS Application Load Balancer listener rules (aws_lb_listener_rule)
- AWS Application Load Balancer target groups (aws_lb_target_group)
Variables
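Perform the following to set the variables correctly.
- Retrieve the code below, or adapt it however you see fit.
- Set the domain of your previously issued AWS ACM certificate in line #7, for example domain = "*.troydieter.com". The certificate must cover the per-application hostnames built from the domain in line #13 (for example hue.emr.troydieter.com). A wildcard matches only one DNS label, so *.troydieter.com alone does not cover them unless the certificate also lists *.emr.troydieter.com as an additional name.
- Set the top level domain in line #13, for example default = "emr.troydieter.com"
- Set the aws-profile variable (line #50) to the credentials profile to use. It has no default in the code below, so Terraform prompts for a value if none is supplied.
- Set the aws_region variable (line #55) if you are not deploying to us-east-1
- Set the environment variable (line #61) accordingly
- Set the vpc_id variable (line #67), which has no default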
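Instead of editing the defaults in place, the variables can also be supplied from a terraform.tfvars file next to the code, which Terraform loads automatically. The sketch below is only an illustration: every value is a placeholder, and the TLS 1.2 policy is my suggestion rather than the default used in the code.

```hcl
# terraform.tfvars
# All values are placeholders; replace them with your own.
aws-profile = "my-profile"
aws_region  = "us-east-1"
environment = "dev"
vpc_id      = "vpc-0123456789abcdef0"
domain      = "emr.example.com"

# Optional overrides of the defaults in the code
cidr_block   = "10.0.0.0/8"                        # restrict ingress instead of 0.0.0.0/0
elbsecpolicy = "ELBSecurityPolicy-TLS-1-2-2017-01" # require TLS 1.2 or newer
```

The certificate domain in line #7 lives in a data source rather than a variable, so it still has to be edited in the code itself.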
Apply
- Use terraform init to download the aws and random providers
- Use terraform plan to ensure the variables are set correctly
- Use terraform apply to apply
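After an apply it helps to see where the load balancer lives and which hostnames it answers on. The output blocks below are a small sketch of that, assuming they are added to the same configuration as the code in the next section. The output names are my own; they reference the aws_lb.emr_lb resource and the emr-app and domain variables defined there.

```hcl
# DNS name of the load balancer; DNS records for each hostname should point here.
output "emr_lb_dns_name" {
  value = aws_lb.emr_lb.dns_name
}

# One HTTPS URL per application, mirroring the host_header listener rules.
output "emr_app_urls" {
  value = { for app in keys(var.emr-app) : app => "https://${app}.${var.domain}" }
}
```

The code in this post does not create DNS records, so each hostname still needs a record that resolves to the load balancer before the URLs work.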
Terraform Code
1# EMR Load Balancer Creation
2# www.troydieter.com
3
4# Certificate and domain
5
6data "aws_acm_certificate" "wildcard-cert" {
7 domain = "*.example.com"
8 statuses = ["ISSUED"]
9}
10
11variable "domain" {
12 type = string
13 default = "emr.example.com"
14 description = "The top level domain used for EMR"
15}
16
17resource "random_id" "lb-rand" {
18 byte_length = 2
19}
20
21provider "aws" {
22 profile = var.aws-profile
23 region = var.aws_region
24}
25
26# Tags
27
28locals {
29 emr-tags = {
30 "parent_app" = var.application
31 "environment" = var.environment
32 }
33}
34
35# Data sources
36# Used for the default target group, send traffic to the NameNode
37
38data "aws_lb_target_group" "emr-namenode" {
39 name = "hadoop-hdfs-namenode-${random_id.lb-rand.hex}"
40 depends_on = [ aws_lb_target_group.emr-tg ]
41}
42
43# Variables
44
45variable "application" {
46 type = string
47 default = "EMR"
48}
49
50variable "aws-profile" {
51 type = string
52 description = "AWS Profile used to deploy with"
53}
54
55variable "aws_region" {
56 type = string
57 default = "us-east-1"
58 description = "Region"
59}
60
61variable "environment" {
62 type = string
63 default = "dev"
64 description = "Environment you're deploying with"
65}
66
67variable "vpc_id" {
68 type = string
69 description = "The VPC ID that the load balancer deploys to"
70}
71
72variable "cidr_block" {
73 type = string
74 default = "0.0.0.0/0"
75 description = "CIDR Block of allowed ingress traffic"
76}
77
78variable "elbsecpolicy" {
79 type = string
80 default = "ELBSecurityPolicy-TLS-1-1-2017-01"
81 description = "Applied AWS ELB policy"
82}
83
84# Example list (map) of AWS EMR applications used
85
86variable "emr-app" {
87 type = map(string)
88 default = {
89 hadoop-hdfs-namenode = "50070"
90 hadoop-hdfs-datanode = "50075"
91 hbase = "16010"
92 hue = "8888"
93 jupyterhub = "9443"
94 livy = "8998"
95 spark = "18080"
96 tez = "8080"
97 yarn-node-manager = "8042"
98 yarn-resource-manager = "8088"
99 zeppelin = "8890"
100 }
101}
102
103# Import subnets
104
105data "aws_subnet_ids" "alb-subnets" {
106 vpc_id = var.vpc_id
107}
108
109# AWS Security Group
110resource "aws_security_group" "lb_sg01" {
111 name = "${var.application}-${lower(var.environment)}-lb-sg01"
112 description = "Allow inbound traffic to the ${upper(var.application)} load balancer"
113 vpc_id = var.vpc_id
114 ingress {
115 description = "LB"
116 from_port = 443
117 to_port = 443
118 protocol = "tcp"
119 cidr_blocks = [var.cidr_block]
120 }
121
122 egress {
123 from_port = 0
124 to_port = 0
125 protocol = "-1"
126 cidr_blocks = ["0.0.0.0/0"]
127 }
128
129 lifecycle {
130 create_before_destroy = true
131 }
132
133}
134
135# EMR Load Balancer
136
137resource "aws_lb" "emr_lb" {
138 name = "${lower(var.application)}-${lower(var.environment)}-lb-${random_id.lb-rand.hex}"
139 load_balancer_type = "application"
140 subnets = data.aws_subnet_ids.alb-subnets.ids
141 security_groups = [aws_security_group.lb_sg01.id]
142 lifecycle {
143 ignore_changes = [
144 tags,
145 access_logs
146 ]
147 }
148 depends_on = [ aws_lb_target_group.emr-tg ]
149 tags = local.emr-tags
150}
151
152resource "aws_lb_listener" "emr-443" {
153 load_balancer_arn = aws_lb.emr_lb.arn
154 port = 443
155 protocol = "HTTPS"
156 ssl_policy = var.elbsecpolicy
157 certificate_arn = data.aws_acm_certificate.wildcard-cert.arn
158 default_action {
159 type = "forward"
160 target_group_arn = data.aws_lb_target_group.emr-namenode.arn
161 }
162 depends_on = [ aws_lb_target_group.emr-tg ]
163}
164
165resource "aws_lb_listener_rule" "host_based_emr_routing" {
166 for_each = var.emr-app
167 listener_arn = aws_lb_listener.emr-443.arn
168
169 action {
170 type = "forward"
171 target_group_arn = try(aws_lb_target_group.emr-tg[each.key].arn, "")
172 }
173
174 condition {
175 host_header {
176 values = ["${each.key}.${var.domain}"]
177 }
178 }
179}
180
181resource "aws_lb_target_group" "emr-tg" {
182 for_each = var.emr-app
183 name = "${each.key}-${random_id.lb-rand.hex}"
184 port = each.value
185 target_type = "instance"
186 protocol = "HTTP"
187 vpc_id = var.vpc_id
188 tags = local.emr-tags
189}