Terraform with Google Cloud
Terraform is an infrastructure-as-code tool that works with AWS, GCP, Azure, and other providers. This article discusses using Terraform to set up a private cloud with a bastion host on Google Cloud Platform (GCP), and the problems faced along the way. The private cloud architecture is shown in the figure below: a private k8s cluster containing a node pool, accessed via the bastion, which is connected to two different subnets.
Terraform code:
Terraform uses HashiCorp Configuration Language (HCL) to create concise descriptions of resources.
In the Terraform code, we create modules and workspaces. A Terraform workspace is a collection of configurations that represents a single environment, while Terraform modules are components that can be re-used by one or more modules or workspaces. The folders are organised in the following way:
root_folder → workspaces → modules
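As an illustration, a possible layout (the folder names here are just examples, matching the workspaces and modules used later in this article) could look like this:
root_folder/
├── modules/
│   ├── vpc/
│   ├── bastion/
│   └── k8s/
└── workspaces/
    ├── network/
    ├── compute/
    ├── access/
    └── <project_name>/
        ├── stg/
        └── prd/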
Now we look at the actual Terraform code. In the network workspace, we create a main.tf and add a reference to the vpc module:
module "core_vpc" {
  source  = "../modules/vpc"
  name    = "core-vpc-dev"
  project = var.project
  env     = var.env
  region  = var.region
}
In the same network workspace, we add a reference to the bastion module.
module "cloud_bastion" {
  source           = "../modules/bastion"
  name             = "bastion-vm"
  project          = var.project
  env              = var.env
  region           = var.region
  zone             = "${var.region}-c"
  vpc_self_link    = module.core_vpc.vpc_self_link
  subnet_self_link = module.core_vpc.subnet_external_self_link
  members          = ["myemail@myemail.com"]
}
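The workspace refers to variables such as var.project, var.env and var.region, which are not shown in this article. A minimal variables.tf for the network workspace could look like the sketch below (the types and descriptions are assumptions):
variable "project" {
  type        = string
  description = "GCP project ID"
}

variable "env" {
  type        = string
  description = "Environment name, e.g. stg or prd"
}

variable "region" {
  type        = string
  description = "GCP region, e.g. asia-southeast1"
}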
In the modules folder, add a vpc module: create the module's main.tf and add a VPC resource using the Terraform resource type google_compute_network.
resource "google_compute_network" "vpc" {
  name                    = var.name
  project                 = var.project
  routing_mode            = "REGIONAL"
  auto_create_subnetworks = false
}
Create an output.tf; the output value vpc_self_link referenced in the workspace main.tf is defined here.
output "vpc_self_link" {
  value = google_compute_network.vpc.self_link
}

output "subnet_external_self_link" {
  value = google_compute_subnetwork.external.self_link
}
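The second output refers to a subnetwork named external that the vpc module is expected to define. That resource is not shown in this article; a hedged sketch (the name suffix and CIDR range are assumptions) could be:
resource "google_compute_subnetwork" "external" {
  name          = "${var.name}-external"
  project       = var.project
  region        = var.region
  network       = google_compute_network.vpc.self_link
  ip_cidr_range = "10.10.0.0/24" # example range, adjust to your own design
}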
We set up a Terraform remote state in a Google Cloud Storage (GCS) bucket. With remote state, Terraform writes the state data to a remote data store, which can then be shared among all members of a team. Remote state is implemented by a backend; on Google Cloud, the backend is a GCS bucket.
In each workspace's main.tf, add the Terraform backend:
terraform {
  backend "gcs" {
    bucket = "for-my-project"
    prefix = "<project_name>/state"
  }
}
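Presumably each workspace uses its own prefix so their state files do not collide in the bucket; for example, the compute workspace might use a block like the following (the prefix value is an assumption):
terraform {
  backend "gcs" {
    bucket = "for-my-project"
    prefix = "compute/state"
  }
}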
Next, in the modules folder, we create a bastion module: add a main.tf and reference the Terraform bastion-host module for Google Cloud.
module "iap_bastion" {
  source  = "terraform-google-modules/bastion-host/google"
  version = "2.9.0"

  project                    = var.project
  zone                       = var.zone
  network                    = var.vpc_self_link
  subnet                     = var.subnet_self_link
  members                    = var.members
  name                       = var.name
  service_account_name       = "bastion-vm"
  fw_name_allow_ssh_from_iap = "allow-ssh-to-tunnel"
}
After that, we add more workspaces and modules to create the k8s cluster and its node pools.
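The cluster module itself is not shown in this article. Judging from the error messages later on, it wraps the terraform-google-modules private-cluster module; a hedged sketch of such a module call (the version, ranges and variable names below are assumptions) might look like:
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "~> 17.0" # example version, pin to whatever your setup uses

  project_id              = var.project
  name                    = "k8s-default-${var.env}"
  region                  = var.region
  network                 = var.vpc_name
  subnetwork              = var.subnet_name
  ip_range_pods           = "pods-range"     # secondary range for pods
  ip_range_services       = "services-range" # secondary range for services
  enable_private_nodes    = true
  enable_private_endpoint = true
  master_ipv4_cidr_block  = "172.16.0.0/28"
}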
For the final product, we will have network, compute, and access workspaces, as well as a <project_name> workspace. The workspaces re-use the vpc, bastion, and k8s modules, among others.
When we make changes to the Terraform files, such as changing a module version or adding new modules, we need to run "terraform init" again. After init, we run "terraform plan -out=terra.plan", and then "terraform apply terra.plan".
Note 1: the Terraform commands are executed in the workspaces/<project>/stg or workspaces/<project>/prd folder.
Note 2: when executing Terraform commands in the workspaces, there may be an order you need to follow. For example, if you receive the error message below when running "terraform plan" in the <project_name> workspace, it means you first need to execute the Terraform commands in the compute workspace. Only after that can you execute the Terraform commands in the <project_name> workspace.
│ Error: Unsupported attribute
│
│ on main.tf line 46, in module "<project_name>":
│ 46: k8s_cluster_name = data.terraform_remote_state.compute.outputs.k8s_cluster_name
│ ├────────────────
│ │ data.terraform_remote_state.compute.outputs is object with 5 attributes
│
│ This object does not have an attribute named "k8s_cluster_name".
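Here, data.terraform_remote_state.compute is a terraform_remote_state data source in the <project_name> workspace that reads the compute workspace's state from the GCS backend. A sketch of such a data source (the prefix is an assumption, following the backend block shown earlier) is:
data "terraform_remote_state" "compute" {
  backend = "gcs"

  config = {
    bucket = "for-my-project"
    prefix = "compute/state"
  }
}
Its outputs only exist after a successful apply in the compute workspace, which is why the compute workspace has to be applied first.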
If we have to delete all the resources, run "terraform destroy". Do not manually delete the resources using the Google Cloud console.
Problems:
One problem I faced when running "terraform plan" is described below:
Error: Invalid count argument

on .terraform/modules/k8s_online_cluster.gke.gcloud_delete_default_kube_dns_configmap/main.tf line 63, in resource "null_resource" "module_depends_on":

63: count = length(var.module_depends_on) > 0 ? 1 : 0

The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first apply only the resources that the count depends on.
To solve this problem, I deleted the state file in Google Cloud Storage and re-ran terraform plan. The problem happened because I had manually deleted some resources in the Google Cloud console, so the state file was no longer in sync with the actual infrastructure resources.
Another problem I faced: after creating the private cloud, I could not SSH to the bastion. I kept getting the error below:
$ gcloud compute ssh bastion-vm
External IP address was not found; defaulting to using IAP tunneling.
ERROR: (gcloud.compute.start-iap-tunnel) Error while connecting [4003: 'failed to connect to backend']. (Failed to connect to port 22)
kex_exchange_identification: Connection closed by remote host
I had set up the firewall rule that opens TCP port 22 to the IP range 35.235.240.0/20, and assigned the IAP-secured tunnel permission to the user. (The 35.235.240.0/20 range contains all IP addresses that Google IAP uses for TCP forwarding.)
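For reference, a standalone firewall rule roughly equivalent to the one the bastion module creates (named "allow-ssh-to-tunnel" via fw_name_allow_ssh_from_iap above) would look like the sketch below; in this setup the module already manages it, so this is only illustrative:
resource "google_compute_firewall" "allow_ssh_from_iap" {
  name    = "allow-ssh-to-tunnel"
  project = var.project
  network = var.vpc_self_link

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  # Google IAP's TCP forwarding source range
  source_ranges = ["35.235.240.0/20"]
}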
Eventually, I found out that the bastion host was not booting up correctly, and the SSH daemon was not started at all. The bastion VM boot error is shown below.
To see the bastion VM boot messages, we can click "Serial port 1 (console)" as in the figure below.
The solution is to disable the Secure Boot and vTPM settings of the VM.
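If the bastion instance were managed directly with Terraform, the equivalent change would go through the shielded_instance_config block of google_compute_instance. A hedged sketch (the machine type, image and other values are placeholders, not this article's actual configuration) is:
resource "google_compute_instance" "bastion" {
  name         = "bastion-vm"
  project      = var.project
  zone         = var.zone
  machine_type = "e2-small"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    subnetwork = var.subnet_self_link
  }

  # Disable the Shielded VM features that prevented the bastion from booting
  shielded_instance_config {
    enable_secure_boot          = false
    enable_vtpm                 = false
    enable_integrity_monitoring = false
  }
}
In this article the bastion comes from the bastion-host module, so check whether your module version exposes a toggle for the Shielded VM settings before falling back to the console.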
And voilà, we can SSH to the bastion using "gcloud compute ssh bastion-vm".
Access the private k8s cluster:
We want to obtain access to the k8s cluster. This is how we do it. First, we set the project and fetch the cluster credentials.
gcloud config set project <project_name>
gcloud container clusters get-credentials <cluster-name>
Then check the ~/.kube/config file; a cluster entry will have been generated. In the cluster entry, we modify the server value to https://kubernetes:8543. For the hostname kubernetes to resolve to the local end of the tunnel, add an entry such as 127.0.0.1 kubernetes to /etc/hosts.
- cluster:
certificate-authority-data: LS0tLS1....tLS0tCg==
server: https://kubernetes:8543
name: gke_<project-name>_<region>_<cluster-name>
We can set up a tunnel via the bastion to the k8s cluster with:
gcloud compute ssh bastion-vm -- -L 8543:<ip_addr_of_cluster>:443
(there is another firewall rule that opens port 443 for the bastion host, so we tunnel to remote port 443)
Keep the command running. Open another terminal, and we can access the k8s cluster with:
kubectl -n <namespace> get pods
That is it.
Note 3:
When debugging the SSH-to-bastion problem, I used a startup script to open port 22 using ufw.
module "iap_bastion" {
  …
  startup_script = data.template_file.startup_script.rendered
  …
}
The startup script is defined below. You can put it in the same .tf file as iap_bastion.
data "template_file" "startup_script" {
  template = <<EOF
sudo ufw allow 22/tcp
EOF

  vars = {
    cluster_zone = var.zone
    project      = var.project
  }
}
It turned out not to be required at all in my situation. Anyway, it is worth noting in case we want to create a startup script in the future.
Note 4:
While running terraform apply, the network went down halfway through. When I ran terraform apply again, I encountered this problem:
│ Error: googleapi: Error 409: Already exists: projects/airasia-coeblockchain-stg/locations/asia-southeast1/clusters/k8s-default-stg., alreadyExists
│
│ with module.k8s_default_cluster.module.gke.google_container_cluster.primary,
│ on .terraform/modules/k8s_default_cluster.gke/modules/private-cluster/cluster.tf line 22, in resource "google_container_cluster" "primary":
│ 22: resource "google_container_cluster" "primary" {
│
The solution was that I had to manually delete the k8s-default-stg cluster in the Google Cloud console.