Utilize random_shuffle to improve AWS availability zone spread when deploying with Terraform

In my repository, event-driven-msk (shown here), an Amazon VPC is deployed along with private and public subnets. Part of that requires selecting a region (defined in your provider.tf file) and a set of availability zones.
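For context, here is a minimal sketch of what that region selection might look like. The variable name aws_region is taken from the azs expression further down; the provider.tf layout and the default region are assumptions, not the repository's actual code:

variable "aws_region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "eu-west-1" # example default, not the repo's value
}

provider "aws" {
  region = var.aws_region
}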

History

Prior to discovering random_shuffle - I used this:

# Provisions the VPC for MSK
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "msk-vpc"
  cidr = "172.16.16.0/20"

  azs             = ["${var.aws_region}a", "${var.aws_region}b", "${var.aws_region}c"]
  private_subnets = ["172.16.16.0/25", "172.16.17.0/25", "172.16.18.0/25"]
  public_subnets  = ["172.16.16.128/25", "172.16.17.128/25", "172.16.18.128/25"]

  enable_nat_gateway = true
  enable_vpn_gateway = true

  tags = local.common-tags
}

As you can see, I am defining the azs argument in the module with an interpolation expression that appends a letter to the region name. This isn't a desirable approach: the list is static rather than dynamic, and it assumes every region exposes zones a, b and c.

Improvement

Here comes random_shuffle - Terraform Docs here

# Provisions the VPC for MSK
data "aws_availability_zones" "available" {
  state = "available"
}

resource "random_shuffle" "az" {
  input        = data.aws_availability_zones.available.names
  result_count = 3
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "msk-vpc"
  cidr = "172.16.16.0/20"

  azs             = [element(random_shuffle.az.result, 0), element(random_shuffle.az.result, 1), element(random_shuffle.az.result, 2)]
  private_subnets = ["172.16.16.0/25", "172.16.17.0/25", "172.16.18.0/25"]
  public_subnets  = ["172.16.16.128/25", "172.16.17.128/25", "172.16.18.128/25"]

  enable_nat_gateway     = true
  single_nat_gateway     = true
  one_nat_gateway_per_az = false
  enable_vpn_gateway     = true
  enable_ipv6            = true

  tags = local.common-tags

  public_subnet_tags = {
    connectivity = "public"
  }

  private_subnet_tags = {
    connectivity = "private"
  }
}

As you can see above, the aws_availability_zones data source lists the region's available zones, random_shuffle randomises their order and keeps three of them (result_count = 3), and the element function reads each entry of its result attribute to populate azs.
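Because result_count = 3 already trims the shuffled list down to three names, the azs argument can also be written more compactly. A minimal sketch of two equivalent drop-in replacements for the azs line in the module block above (not the exact code in the repository):

# Hands the whole three-element shuffled result to the module:
azs = random_shuffle.az.result

# Or explicitly takes the first three entries of the result:
azs = slice(random_shuffle.az.result, 0, 3)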

Gotchas

Amazon MSK (Managed Streaming for Apache Kafka) is not available in all availability zones. In my case, deploying to us-east-1, the us-east-1e zone does not support MSK:

╷
│ Error: error creating MSK Cluster (data-platform-dev-48fd): BadRequestException: One or more subnets belong to unsupported availability zones: [us-east-1e].
│ {
│   RespMetadata: {
│     StatusCode: 400,
│     RequestID: "56571475-52e0-44d6-abdd-3acaa4e7b1ca"
│   },
│   InvalidParameter: "brokerNodeGroupInfo",
│   Message_: "One or more subnets belong to unsupported availability zones: [us-east-1e]."
│ }
│
│   with aws_msk_cluster.data_platform,
│   on data_platform_msk.tf line 96, in resource "aws_msk_cluster" "data_platform":
│   96: resource "aws_msk_cluster" "data_platform" {
│
╵

How do we work around this? The best way I’ve found is to exclude the offending zone in the aws_availability_zones data source. Since us-east-1e is a zone name rather than a zone ID, the matching argument is exclude_names (exclude_zone_ids expects IDs such as use1-az3):

data  "aws_availability_zones"  "available"  {

state =  "available"

exclude_zone_ids =  ["${var.aws_region}e"]

}
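An alternative, since AWS maps AZ names to physical zones differently in each account, is to exclude by zone ID, which refers to the same physical zone everywhere. This is only a sketch: use1-az3 is an assumed value, so confirm which zone ID is unsupported in your account first (for example with aws ec2 describe-availability-zones --region us-east-1):

data "aws_availability_zones" "available" {
  state = "available"

  # Zone IDs (use1-az1, use1-az2, ...) are consistent across accounts,
  # unlike zone names. use1-az3 is an assumption for the zone that rejects
  # MSK; verify the name-to-ID mapping in your own account before excluding.
  exclude_zone_ids = ["use1-az3"]
}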

Result

Users who deploy this will now land in three random availability zones within their AWS region, instead of always deploying to AZ-A, AZ-B & AZ-C. Future improvements will include better handling of MSK-unsupported availability zones.