Thursday, 11 September 2025

Mastering Terraform Provisioners & Data Sources: Extending Your Infrastructure Code (Part 8)

Standard

So far in Previous Blog Series, we’ve built reusable Terraform projects with variables, outputs, modules, and workspaces. But sometimes you need more:

  • Run a script after a server is created
  • Fetch an existing resource’s details (like VPC ID, AMI ID, or DNS record)

That’s where Provisioners and Data Sources come in.

What Are Provisioners?

Provisioners let you run custom scripts or commands on a resource after Terraform creates it.

They’re often used for:

  • Bootstrapping servers (installing packages, configuring users)
  • Copying files onto machines
  • Running one-off shell commands

Example: local-exec

Runs a command on your local machine after resource creation:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  provisioner "local-exec" {
    command = "echo ${self.public_ip} >> public_ips.txt"
  }
}

Here, after creating the EC2 instance, Terraform saves the public IP to a file.

Example: remote-exec

Runs commands directly on the remote resource (like an EC2 instance):

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  connection {
    type     = "ssh"
    user     = "ec2-user"
    private_key = file("~/.ssh/id_rsa")
    host     = self.public_ip
  }

  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo yum install -y nginx",
      "sudo systemctl start nginx"
    ]
  }
}

This automatically installs and starts Nginx on the server after it’s created.

⚠️ Best Practice Warning:
Provisioners should be used sparingly. For repeatable setups, use configuration management tools like Ansible, Chef, or cloud-init instead of Terraform provisioners.

What Are Data Sources?

Data sources let Terraform read existing information from providers and use it in your configuration.

They don’t create resources—they fetch data.

Example: Fetch Latest AMI

Instead of hardcoding an AMI ID (which changes frequently), use a data source:

data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t2.micro"
}

Terraform fetches the latest Amazon Linux 2 AMI and uses it to launch the EC2 instance.

Example: Fetch Existing VPC

data "aws_vpc" "default" {
  default = true
}

resource "aws_subnet" "my_subnet" {
  vpc_id     = data.aws_vpc.default.id
  cidr_block = "10.0.1.0/24"
}

This looks up the default VPC in your account and attaches a new subnet to it.

Case Study: Startup with Hybrid Infra

A startup had:

  • A few manually created AWS resources (legacy)
  • New resources created via Terraform

Instead of duplicating legacy resources, they:

  • Used data sources to fetch existing VPCs and security groups
  • Added new Terraform-managed resources inside those

Result: Smooth transition to Infrastructure as Code without breaking existing infra.

Case Study: Automated Web Server Setup

A small dev team needed a demo web server:

  • Terraform created the EC2 instance
  • A remote-exec provisioner installed Apache automatically
  • A data source fetched the latest AMI

Result: One command (terraform apply) → Fully working web server online in minutes.

Best Practices

  • Use data sources wherever possible (instead of hardcoding values).
  • Limit provisioners—prefer cloud-init, Packer, or config tools for repeatability.
  • Keep scripts idempotent (safe to run multiple times).
  • Test provisioners carefully—errors can cause Terraform runs to fail.

Key Takeaways

  • Provisioners = Run custom scripts during resource lifecycle.
  • Data Sources = Fetch existing provider info for smarter automation.
  • Together, they make Terraform more flexible and powerful.

What’s Next?

In Blog 9, we’ll dive into Terraform Best Practices & Common Pitfalls—so you can write clean, scalable, and production-grade Terraform code.

Bibliography