Automation is only as good as its reliability. In a production pipeline, a silent cloud-init failure can lead to "zombie" VMs that are up but completely unreachable or misconfigured. Understanding how to handle advanced networking and, more importantly, how to parse logs when things go wrong is what separates a lab setup from a production-grade infrastructure.
If you missed part 1 of this series, you can catch up here. We previously focused on how to automate KVM provisioning with cloud-init and NoCloud.
Example 1 – Execute custom script and enable proxy
This time, instead of assigning the hostname manually, the task is delegated to a script that runs only once, during the first boot. Additionally, a proxy is configured for the newly created user so that internet access persists across sessions. At the end, a summary message is printed to the console.
EXECUTION COMMAND:
virt-install --name cloud-init-001 \
--memory 4000 --os-variant ubuntu22.04 \
--disk=size=10,backing_store="$(pwd)/image/ubuntu-22.04-minimal-cloudimg-amd64.img" \
--cloud-init user-data="$(pwd)/user-data",meta-data="$(pwd)/meta-data",network-config="$(pwd)/network-config" \
--network network=default
META-DATA:
instance-id: 66710171-d878-482f-8172-9a4a88053033
USER-DATA:
#cloud-config
password: password
chpasswd:
  expire: False
ssh_pwauth: True
users:
  - default
  - name: usr1
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    plain_text_passwd: password
write_files:
  - path: /var/lib/cloud/scripts/per-instance/set_hostname.sh
    permissions: '0755'
    owner: root:root
    content: |
      #!/bin/bash
      # Generate a random 2-byte hex string and prepend "pl-wro-"
      NEW_HOSTNAME="pl-wro-$(openssl rand -hex 2)"
      hostnamectl set-hostname "$NEW_HOSTNAME"
      echo "Hostname set to $NEW_HOSTNAME" >> /var/log/hostname-change.log
  - path: /home/usr1/.bashrc.d/proxy.sh
    permissions: '0644'
    owner: usr1:usr1
    defer: true
    content: |
      export http_proxy="http://proxy.esl.cisco.com:80"
      export https_proxy="http://proxy.esl.cisco.com:80"
      export no_proxy="localhost,127.0.0.1,::1,192.168.122.0/24,192.168.100.0/24"
runcmd:
  - echo 'source ~/.bashrc.d/proxy.sh' >> /home/usr1/.bashrc
  - chown usr1:usr1 /home/usr1/.bashrc
final_message: |
  cloud-init has finished
  version: $version
  timestamp: $timestamp
  datasource: $datasource
  uptime: $uptime
NETWORK-CONFIG:
__EMPTY__
Whenever a dedicated script needs to be executed (e.g., during instance creation, on every boot, or only once), it should be placed in the appropriate subdirectory under /var/lib/cloud/scripts/. The available options are per-boot, per-instance, per-once, and vendor.
Another consideration to keep in mind is the execution order of modules. For example, file creation (the cc_write_files module) runs BEFORE user creation (cc_users_groups). Without the defer: true keyword, cloud-init would report an error, because the file's owner (usr1) and its home directory do not yet exist at that point. This is not easy to debug, because by the end of the boot both the user and the directory exist.
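The ordering problem described above can be caught before booting a VM. The following is a minimal sketch, not part of cloud-init: it assumes the user-data has already been loaded into a Python dict (for example with a YAML parser), and the function name find_undeferred_entries is hypothetical. It flags every write_files entry that is owned by a user declared in the same file but lacks defer: true.

```python
def find_undeferred_entries(cloud_config):
    """Return paths of write_files entries owned by a user from `users`
    that are not marked with `defer: true`."""
    # Users declared as mappings (the bare "default" entry is a string).
    declared = {
        user["name"]
        for user in cloud_config.get("users", [])
        if isinstance(user, dict) and "name" in user
    }
    risky = []
    for entry in cloud_config.get("write_files", []):
        # owner has the form "user:group"; fall back to root:root if unset.
        owner = str(entry.get("owner", "root:root")).split(":")[0]
        if owner in declared and not entry.get("defer", False):
            risky.append(entry["path"])
    return risky


# Hand-written dict mirroring the user-data from Example 1,
# with `defer: true` deliberately left out of the proxy file.
example = {
    "users": ["default", {"name": "usr1", "shell": "/bin/bash"}],
    "write_files": [
        {"path": "/var/lib/cloud/scripts/per-instance/set_hostname.sh",
         "owner": "root:root"},
        {"path": "/home/usr1/.bashrc.d/proxy.sh", "owner": "usr1:usr1"},
    ],
}

print(find_undeferred_entries(example))
```

With the sample dict, only the proxy file is reported, which matches the failure discussed above. The check is intentionally narrow: it knows nothing about users that already exist in the base image.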
Example 2 – Configure extra interface and notify external server
In the previous examples, network settings were left at their default values. It is time to change that. An additional network interface will be configured and used to notify an external HTTP server that the cloud-init process has completed.
EXECUTION COMMAND:
virt-install --name cloud-init-002 \
--memory 4000 --os-variant ubuntu22.04 \
--disk=size=10,backing_store="$(pwd)/image/ubuntu-22.04-minimal-cloudimg-amd64.img" \
--cloud-init user-data="$(pwd)/user-data",meta-data="$(pwd)/meta-data",network-config="$(pwd)/network-config" \
--network network=default \
--network network=nodhcp-net
NOTE: Whenever the network configuration is modified, it is important to verify whether related settings, such as the number of network interfaces attached to the VM, need to be adjusted accordingly (in this example, by adding a second --network option).
META-DATA:
instance-id: 28862d34-74e1-4ebf-a832-ba2b322540d2
USER-DATA:
#cloud-config
password: password
chpasswd:
  expire: False
ssh_pwauth: True
write_files:
  - path: /var/lib/cloud/scripts/per-instance/set_hostname.sh
    permissions: '0755'
    owner: root:root
    content: |
      #!/bin/bash
      # Generate a random 2-byte hex string and prepend "pl-wro-"
      NEW_HOSTNAME="pl-wro-$(openssl rand -hex 2)"
      hostnamectl set-hostname "$NEW_HOSTNAME"
      echo "Hostname set to $NEW_HOSTNAME" >> /var/log/hostname-change.log
phone_home:
  post: [instance_id, hostname]
  url: http://192.168.100.1/
  tries: 3
final_message: |
  cloud-init has finished
  version: $version
  timestamp: $timestamp
  datasource: $datasource
  uptime: $uptime
NETWORK-CONFIG:
version: 2
ethernets:
  enp1s0:
    dhcp4: true
    dhcp6: true
  enp2s0:
    dhcp4: false
    dhcp6: false
    addresses:
      - 192.168.100.10/24
To make use of the phone home functionality, a simple HTTP server needs to be set up - for example, using a Python module.
user@kvm $ sudo python3 http/server.py
=== Phone Home Received ===
Headers: Host: 192.168.100.1
User-Agent: Cloud-Init/25.1.4-0ubuntu0~22.04.1
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 69
Content-Type: application/x-www-form-urlencoded
Body:
instance_id=28862d34-74e1-4ebf-a832-ba2b322540d2
hostname=pl-wro-365d
192.168.100.10 - - [01/Apr/2026 18:36:14] "POST / HTTP/1.1" 200 -
As can be seen from the output above, the amount of information received is quite minimal, but it eliminates the need to constantly monitor the terminal of the newly created machine. By using another monitoring tool, it becomes easy to detect machines that are already up and running and proceed with further deployment steps.
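For reference, a receiver for the phone home request could look roughly like the sketch below. It is not the http/server.py script used above, whose contents are not shown here. It relies only on the Python standard library and assumes the listener address 192.168.100.1 from the user-data. The names parse_phone_home, PhoneHomeHandler, and serve are illustrative.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_phone_home(body):
    """Decode a form-encoded phone_home body into a flat dict."""
    fields = parse_qs(body.decode("utf-8"))
    # parse_qs returns a list per key; each key is expected once here.
    return {key: values[0] for key, values in fields.items()}


class PhoneHomeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the client announced.
        length = int(self.headers.get("Content-Length", 0))
        data = parse_phone_home(self.rfile.read(length))
        print("=== Phone Home Received ===")
        for key, value in data.items():
            print(f"{key}={value}")
        self.send_response(200)
        self.end_headers()


def serve(address="192.168.100.1", port=80):
    """Listen until interrupted; binding to port 80 requires root."""
    HTTPServer((address, port), PhoneHomeHandler).serve_forever()
```

Calling serve() at the end of the file and starting it with sudo would make the host listen on the address configured in phone_home. Splitting the parsing into its own function keeps it easy to reuse, for example to record which instances have reported in.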
Example 3 – Use a seed ISO with NoCloud
Cloud-init can retrieve the required parameters from various sources: for example, from ISO files, from an HTTP/S or FTP server, or directly from the local disk of the machine itself. In this final example, an ISO file is used for this purpose. To make this work, the file needs to be prepared properly, as it must adhere to a specific format.
# Method #1
user@kvm $ cloud-localds seed.iso user-data meta-data -N network-config
# Method #2
user@kvm $ genisoimage -o seed.iso -volid cidata -joliet -rock user-data meta-data network-config
NOTE: It is important to strictly follow the expected file naming convention: user-data, meta-data, and network-config. These are the exact filenames that cloud-init searches for when reading from an ISO file. In the previous examples the configuration files could have had different names, because their paths were passed explicitly to virt-install --cloud-init.
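Because a misnamed file is silently ignored, it can help to validate the names before building the image. The sketch below assumes, as the NoCloud datasource expects, that user-data and meta-data must be present while network-config is optional. It only assembles the genisoimage argument list from Method #2 and does not run it. The function name build_seed_command is hypothetical.

```python
from pathlib import Path

REQUIRED = ("user-data", "meta-data")
OPTIONAL = ("network-config",)


def build_seed_command(source_dir, output="seed.iso"):
    """Return the genisoimage argument list for a NoCloud seed ISO."""
    source = Path(source_dir)
    missing = [name for name in REQUIRED if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing seed files: {', '.join(missing)}")
    # Optional files are added only when they actually exist.
    files = list(REQUIRED) + [n for n in OPTIONAL if (source / n).is_file()]
    return [
        "genisoimage", "-o", output,
        "-volid", "cidata",  # volume label used in Method #2
        "-joliet", "-rock",
        *[str(source / name) for name in files],
    ]
```

The returned list can be passed to subprocess.run(..., check=True) on a host where genisoimage is installed.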
EXECUTION COMMAND:
virt-install --name cloud-init-003 \
--memory 4000 --os-variant ubuntu22.04 \
--disk size=10,backing_store="$(pwd)/image/ubuntu-22.04-minimal-cloudimg-amd64.img",bus=virtio \
--disk path=./seed.iso,bus=virtio,readonly=on \
--network bridge=virbr0,model=virtio \
--noautoconsole \
--channel unix,target_type=virtio,name=org.qemu.guest_agent.0 \
--import
META-DATA:
instance-id: 93d7ce5d-c3b2-40c0-8c58-bb382c517c56
USER-DATA:
#cloud-config
password: password
chpasswd:
  expire: False
ssh_pwauth: True
users:
  - default
  - name: usr1
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    plain_text_passwd: password
NETWORK-CONFIG:
__EMPTY__
In this case, the process does not differ from any of the previous examples: cloud-init mounts the ISO file as an additional disk and reads its contents. One advantage of using ISO files is that whenever the disk is detected with a new instance-id (one that differs from the previously recorded value), the cloud-init process is triggered again. As a result, if the machine needs to be extended with new configuration, it is sufficient to provide new input data in this way.
Resolving issues - tips & tricks
To conclude, here are a few approaches and tips on how to handle the troubleshooting process. Cloud-init is not the easiest tool to debug. Error messages most often only indicate what has failed, without providing much guidance on why. Troubleshooting cloud-init often requires correlating module output, logs, and datasource behavior, so a methodical approach matters.
Logs
The most fundamental source of information and the first place to start looking for answers is a command that has already appeared multiple times throughout this article: cloud-init status --long.
ubuntu@pl-wro-ff34:~$ cloud-init status --long
status: error
extended_status: error - done
boot_status_code: enabled-by-generator
last_update: Thu, 01 Jan 1970 00:00:51 +0000
detail: DataSourceNoCloud [seed=dmi,/dev/sr0][dsmode=local]
errors:
- ('write_files', OSError('Unknown user or group: "getpwnam(): name not found: \'usr1\'"'))
recoverable_errors:
WARNING:
- Running module write_files (<module 'cloudinit.config.cc_write_files' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_write_files.py'>) failed
- Configuration entry 'tries' is not an integer, using 10 instead
- Failed to post phone home data to http://192.168.122.1/ in 10 tries
It provides a few key pieces of information:
- status - overall cloud-init result (e.g. error, done, disabled),
- extended_status - cloud-init result and current state (e.g. running, done),
- detail - information about the data source (e.g. NoCloud),
- errors - errors in the order they occurred, with the name of the failing module (`write_files`, implemented in `cc_write_files.py`),
- recoverable_errors - warnings in the order they occurred (important: not all warnings cause cloud-init to fail).
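When many machines are provisioned at once, these fields can be checked by a script instead of by eye. The sketch below is one possible approach under simple assumptions: it reads only the unindented key: value lines of the text output and ignores the nested error lists. The names parse_status and needs_attention are illustrative, and the sample is an abbreviated copy of the output shown above.

```python
import re

# Top-level fields start at column 0 and use lowercase names.
FIELD = re.compile(r"^([a-z_]+):\s*(.*)$")


def parse_status(text):
    """Collect the top-level `key: value` lines of `cloud-init status --long`."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def needs_attention(fields):
    """True when the run failed outright or finished with errors."""
    return (fields.get("status") == "error"
            or "error" in fields.get("extended_status", ""))


# Abbreviated sample based on the output above.
sample = """\
status: error
extended_status: error - done
boot_status_code: enabled-by-generator
detail: DataSourceNoCloud [seed=dmi,/dev/sr0][dsmode=local]
errors:
  - ('write_files', OSError('Unknown user or group'))
"""

fields = parse_status(sample)
print(fields["status"], needs_attention(fields))
```

On a real machine the text would come from running the command, for example through subprocess.run with capture_output=True.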
Next, it is worth looking directly at the cloud-init log itself (/var/log/cloud-init.log). It contains all the log entries generated by each module individually. To make browsing through it easier, it is helpful to search for entries matching either the module name or the error message visible in the cloud-init status --long output (each module's file name starts with the prefix `cc_`).
[...] handlers.py[DEBUG]: start: init-local/config-write_files: running config-write_files with frequency once-per-instance
[...] util.py[DEBUG]: Writing to /var/lib/cloud/instances/66710171-d878-482f-8172-9a4a88053033/sem/config_write_files - wb: [644] 24 bytes
[...] helpers.py[DEBUG]: Running config-write_files using lock (<FileLock using file '/var/lib/cloud/instances/66710171-d878-482f-8172-9a4a88053033/sem/config_write_files'>)
[...] util.py[DEBUG]: Writing to /var/lib/cloud/scripts/per-instance/configure_hostname.sh - wb: [755] 221 bytes
[...] util.py[DEBUG]: Changing the ownership of /var/lib/cloud/scripts/per-instance/configure_hostname.sh to 0:0
[...] handlers.py[DEBUG]: finish: init-local/config-write_files: FAIL: running config-write_files with frequency once-per-instance
[...] log_util.py[WARNING]: Running module write_files (<module 'cloudinit.config.cc_write_files' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_write_files.py'>) failed
[...] log_util.py[DEBUG]: Running module write_files (<module 'cloudinit.config.cc_write_files' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_write_files.py'>) failed
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1641, in chownbyname
uid = pwd.getpwnam(user).pw_uid
KeyError: "getpwnam(): name not found: 'usr1'"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
[... stack trace truncated ...]
OSError: Unknown user or group: "getpwnam(): name not found: 'usr1'"
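Searching for failed modules can be automated with a small filter. The sketch below assumes the "finish: <stage>/config-<module>: FAIL" line format visible in the excerpt above. The helper name failed_modules is hypothetical, and the pattern may need adjusting for other cloud-init versions.

```python
import re

# Matches e.g. "finish: init-local/config-write_files: FAIL: ..."
FAILED = re.compile(r"finish: (\S+)/config-(\S+): FAIL")


def failed_modules(log_text):
    """Return (stage, module) pairs for every module that finished with FAIL."""
    matches = (FAILED.search(line) for line in log_text.splitlines())
    return [match.groups() for match in matches if match]


# Two lines taken from the excerpt above.
sample = (
    "[...] handlers.py[DEBUG]: start: init-local/config-write_files: "
    "running config-write_files with frequency once-per-instance\n"
    "[...] handlers.py[DEBUG]: finish: init-local/config-write_files: FAIL: "
    "running config-write_files with frequency once-per-instance\n"
)

print(failed_modules(sample))
```

Feeding it the contents of /var/log/cloud-init.log narrows the search to the modules whose surrounding entries and tracebacks deserve a closer look.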
In some cases, the output of the commands that cloud-init runs is more informative than the cloud-init log itself. In the example below, cloud-init tries to download packages from the repositories; the missing connectivity may be caused by a firewall, gateway, or proxy issue.
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Ign:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Ign:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
[... output truncated ...]
Err:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Cannot initiate the connection to archive.ubuntu.com:80 (2a06:bc80:0:1000::16). - connect (101: Network is unreachable)
Cannot initiate the connection to archive.ubuntu.com:80 (2620:2d:4002:1::101). - connect (101: Network is unreachable)
Manual triggering
Every machine configured using cloud-init stores a copy of all configuration files for each initialized instance.
root@pl-wro-2a88:~# ls -l /var/lib/cloud/instances/
total 8
drwxr-xr-x 5 root root 4096 Apr 2 19:02 4bb43851-ed40-4e7b-885e-520635e7fc6f
drwxr-xr-x 5 root root 4096 Apr 2 19:02 93d7ce5d-c3b2-40c0-8c58-bb382c517c56
root@pl-wro-2a88:~# ls -1 /var/lib/cloud/instances/4bb43851-ed40-4e7b-885e-520635e7fc6f/
boot-finished
cloud-config.txt
datasource
handlers
network-config.json
obj.pkl
scripts
sem
user-data.txt
user-data.txt.i
vendor-data.txt
vendor-data.txt.i
vendor-data2.txt
vendor-data2.txt.i
Once an error has been identified and a fix needs to be verified, there is no need to provision the entire machine from scratch. It is possible to force cloud-init to re-run at different levels of granularity: for a single module or for an entire stage.
Single module
As a quick approach, the user-data file can be modified directly in the local copy on the current instance, and then the module can be forced to re-run.
root@pl-wro-2a88:~# cloud-init single --name cc_ntp --frequency always -f /var/lib/cloud/instances/4bb43851-ed40-4e7b-885e-520635e7fc6f/user-data.txt
It is important to note that modules have an assigned execution frequency - instance, always, or once - and in order to bypass this limitation, the frequency needs to be manually overridden to always.
Single stage
Again, it is enough to modify the local copy of the user-data file and re-run the stage.
root@pl-wro-2a88:~# cloud-init modules --mode config -f /var/lib/cloud/instances/4bb43851-ed40-4e7b-885e-520635e7fc6f/user-data.txt
[...] modules.py[DEBUG]: Running module ntp (<module 'cloudinit.config.cc_ntp' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_ntp.py'>) with frequency once-per-instance
[...] handlers.py[DEBUG]: start: modules-config/config-ntp: running config-ntp with frequency once-per-instance
[...] helpers.py[DEBUG]: config-ntp already ran (freq=once-per-instance)
When re-running all modules, the same principle applies. If a module has already been executed, it needs to be forced to run again. Otherwise, cloud-init will consider the execution unnecessary and skip it.
root@pl-wro-2a88:~# cloud-init modules --mode config --frequency always -f /var/lib/cloud/instances/4bb43851-ed40-4e7b-885e-520635e7fc6f/user-data.txt
[...] modules.py[DEBUG]: Running module ntp (<module 'cloudinit.config.cc_ntp' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_ntp.py'>) with frequency once-per-instance
[...] handlers.py[DEBUG]: start: modules-config/config-ntp: running config-ntp with frequency once-per-instance
[...] util.py[DEBUG]: Writing to /var/lib/cloud/instances/4bb43851-ed40-4e7b-885e-520635e7fc6f/sem/config_ntp - wb: [644] 25 bytes
[...] helpers.py[DEBUG]: Running config-ntp using lock (<FileLock using file '/var/lib/cloud/instances/4bb43851-ed40-4e7b-885e-520635e7fc6f/sem/config_ntp'>)
[...] cc_ntp.py[DEBUG]: Selected NTP client "systemd-timesyncd", already installed
[...] cc_ntp.py[DEBUG]: service_name: systemd-timesyncd
[...] cc_ntp.py[DEBUG]: servers: ['ntp.acme.com']
[...] cc_ntp.py[DEBUG]: pools: []
[...] cc_ntp.py[DEBUG]: allow: []
[...] cc_ntp.py[DEBUG]: peers: []
In the logs, the frequency will still appear as once-per-instance, but the process simply treats the module as if it were being executed for the very first time.
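The "already ran" decision is tied to the semaphore files that appear in the log lines above (for example sem/config_ntp inside the instance directory). As a rough illustration, and assuming that naming pattern holds for every per-instance module, the sketch below lists the modules that have left such a file. The function name modules_already_run is hypothetical.

```python
from pathlib import Path


def modules_already_run(instance_dir):
    """List module names that left a semaphore file under <instance>/sem."""
    sem_dir = Path(instance_dir) / "sem"
    prefix = "config_"
    return sorted(
        path.name[len(prefix):]
        for path in sem_dir.iterdir()
        if path.is_file() and path.name.startswith(prefix)
    )
```

Pointing it at /var/lib/cloud/instances/<instance-id> gives a quick view of which modules would be skipped unless their frequency is overridden.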
Best practices summary
To wrap up, it is worth mentioning a few simple best practices to follow when working with cloud-init. Some of these have already been discussed throughout the article, but it is helpful to have them gathered in one place.
- First, the instance-id should be generated randomly rather than set using simple keywords. If cloud-init detects the same value as in a previous run, it treats the machine as already configured and skips the per-instance configuration.
- Second, close attention should be paid to the execution order of modules. Some modules offer dedicated flags that defer their work to a later stage (for example, defer: true makes cc_write_files create the file only after the users have been created).
- Third, it is highly recommended to make use of the official documentation, which contains numerous examples and frequently highlights edge cases and exceptions.
- Fourth, when trying to identify and fix an error, the analysis should be done module by module.
- Last but most importantly, the methodology is key. Parameters should not be changed blindly. The status should be checked first using cloud-init status, followed by a thorough analysis of the output logs in cloud-init-output.log.
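The first recommendation is easy to automate. A minimal sketch, assuming the meta-data file needs nothing beyond the instance-id key used in the examples of this article, with render_meta_data as an illustrative name:

```python
import uuid


def render_meta_data():
    """Return NoCloud meta-data text with a freshly generated instance-id."""
    # uuid4() is random, so every call yields a new instance-id.
    return f"instance-id: {uuid.uuid4()}\n"


print(render_meta_data(), end="")
```

Writing the result to a file named meta-data before each virt-install run ensures that cloud-init sees a new instance every time.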
If you want to read more about policy-based validation in CI/CD pipelines, please check our previous publication.
Conclusion
Cloud-init is a powerful and versatile tool that significantly simplifies the process of provisioning and configuring Linux/Unix virtual machines in a consistent, repeatable manner. While the learning curve and debugging experience may require some patience, the time invested pays off by enabling fully automated and reliable infrastructure deployment.
By moving from manual setup to declarative NoCloud configurations, you ensure that every VM in your environment is consistent and documented by design. For more hands-on infrastructure advice, check out our guide to hands-on infrastructure configuration.


