Testing storage performance
Testing disk I/O performance
In the previous post we saw how to set up a basic two-tiered storage backend for our Proxmox server. Today we’re going to see how well these storage backends perform relative to their hardware (SSD vs HDD). This will help me manage my expectations down the line and monitor performance as we move forward. It will also help us compare performance from the host operating system with performance from within a virtual machine, so we can diagnose any QEMU/Proxmox-related issues.
But first things first, our environment at the moment does not have any virtual machines to work with, so let’s start there.
Creating and preparing a virtual machine
For this case, I am going to allow myself to create a virtual machine manually, but I promise this is an exception. My next post is going to be all about automating VM creation.
Let’s start by creating a virtual machine with two disks attached:
- /dev/sda: Runs the operating system (14 GB)
- /dev/sdb: Unpartitioned (32 GB)
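I did this step manually, but for reference, a roughly equivalent invocation of Proxmox’s qm CLI would look something like the sketch below. The VM ID, ISO path and the SSD-tier storage name (local-lvm) are placeholders on my part; hdd_storage is the ZFS pool from the previous post, and I’m assuming one disk on each tier, which is what the later benchmarks suggest.

```
# Sketch only: a VM with a 14 GB OS disk on the SSD tier
# and a 32 GB blank disk on the HDD-backed ZFS pool.
qm create 101 --name storage-bench \
  --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0 \
  --scsihw virtio-scsi-pci \
  --scsi0 local-lvm:14 \
  --scsi1 hdd_storage:32 \
  --ide2 local:iso/ubuntu-22.04-live-server-amd64.iso,media=cdrom \
  --boot order=scsi0
```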
The first thing we need to do here is to partition the disk /dev/sdb, which as mentioned earlier does not contain anything at this point. No magic here, using parted does the job quite well. The reason I went with parted instead of fdisk here is because parted makes things easier to automate.
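A minimal non-interactive sketch, assuming a single GPT partition spanning the whole disk:

```
# Create a GPT label and one partition covering the whole disk
parted --script /dev/sdb mklabel gpt
parted --script /dev/sdb mkpart primary ext4 0% 100%
```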
Once that part is done, we now need to create a filesystem on the created partition (/dev/sdb1) using mkfs:
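For example, assuming ext4 (nothing in these tests depends on the particular filesystem):

```
# Create an ext4 filesystem on the new partition
mkfs.ext4 /dev/sdb1
```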
And finally we just need to mount the created partition by running:
```
mount /dev/sdb1 /mnt
```
And there we have it, a fully functional Ubuntu virtual machine with accessible storage from both storage tiers.
Testing disk I/O performance
Before we start just testing things and running benchmarking tools, let’s first take a look at what kind of metrics we should be looking at when it comes to disk performance.
There are mainly 2 metrics I want to collect here:
- Read throughput: Measured using hdparm.
- Write throughput: Measured by writing a file using dd.
I might expand on the definitions of some metrics in the future, but for now I’m going to stick to the bare minimum. If you wish to take a more in-depth look at what these metrics mean, here are a few links:
Now that we know what we want to test and have the environments to do so, let’s get started.
When it comes to pretty much anything related to Linux system performance benchmarking, I like to refer to this helper made by Brendan Gregg (I highly recommend checking out his website and books).
According to the tool map above, dd and hdparm should be more than enough to collect the metrics we need, so let’s get started.
Hypervisor benchmarking
Let’s see how things look from within the hypervisor itself.
Since the operating system itself is provisioned on one of the SSD disks, we can test SSD performance quite easily. Let’s pick the following path: /root/testfile
When it comes to the HDDs, they can only be accessed through the ZFS pool, so we first have to find out how to access them from the host operating system.
The following command tells us where the ZFS pool is mounted locally:
```
root@ghima-node01:~# zfs get all | grep mountpoint
hdd_storage  mountpoint  /hdd_storage  default
```
So for the HDD part, we are going to use /hdd_storage/testfile as the path to test our disk performance. Now let’s get started.
We start by writing a single block of 1G to test the throughput.
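Something along these lines, pointed at each of the two test paths (the exact flags aren’t critical as long as a single 1G block gets written):

```
# SSD tier (hosts the root filesystem)
dd if=/dev/zero of=/root/testfile bs=1G count=1

# HDD tier (ZFS pool mountpoint)
dd if=/dev/zero of=/hdd_storage/testfile bs=1G count=1
```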
The SSD part seems quite normal and corresponds to what was promised by the hardware vendor at 205 MB/s.
The HDD throughput seems surprising at first, at around four times that of the SSD: 847 MB/s. But once we account for the fact that ZFS uses RAM for caching, things make a bit more sense.
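If you’re curious how much RAM ZFS has claimed for that cache (the ARC), the kernel exposes it under /proc:

```
# Current ARC size in bytes (ZFS's in-RAM cache)
awk '/^size / {print $3}' /proc/spl/kstat/zfs/arcstats
```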
So how can we test the actual HDD performance without the ZFS cache? Let’s give it a try.
We can try using the sync command, which makes sure that all cached write operations are immediately written to disk, and time it using the time command so we can compare the results.
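One way to do that, though not necessarily the exact invocation used here, is to wrap the write and the flush in a single timed shell:

```
# Time the 1G write plus the flush to disk, for each tier
time sh -c "dd if=/dev/zero of=/root/testfile bs=1G count=1 && sync"
time sh -c "dd if=/dev/zero of=/hdd_storage/testfile bs=1G count=1 && sync"
```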
As we can see, adding sync to the SSD throughput test did not really change the time taken, which implies that the output of dd here reflects the actual performance of the hardware without any caching.
The HDD’s results, on the other hand, do seem to confirm that the speed gain experienced here is mainly due to ZFS’ caching. After adding sync, it now takes more than 10 times the time it took in the previous iteration.
Now let’s take a look at the read benchmarking results:
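For reads, hdparm’s timed buffered read test is enough; the device names below are placeholders, so check lsblk on your host to see which device belongs to which tier:

```
# -t: timed buffered disk reads, without prior caching of the data
hdparm -t /dev/sdX   # an SSD device (placeholder)
hdparm -t /dev/sdY   # an HDD device from the ZFS pool (placeholder)
```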
Virtual machine benchmarking
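Inside the VM, the idea is simply to repeat the same write and read tests against the two disks we attached earlier; a sketch, assuming the second disk is still mounted on /mnt:

```
# OS disk (/dev/sda, root filesystem)
dd if=/dev/zero of=/root/testfile bs=1G count=1
hdparm -t /dev/sda

# Second disk (/dev/sdb1 mounted on /mnt)
dd if=/dev/zero of=/mnt/testfile bs=1G count=1
hdparm -t /dev/sdb
```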
The results are not exactly what I expected but that’s good. It gives me something to dive deeper into in the future.
For now, the results seem satisfactory even if the HDDs outperform the SSDs in this setup by a large margin. This is most likely due to the differences between ZFS and LVM thin pools.
Wrapping up
Considerations
This experiment opens way too many doors for comfort, but it would be interesting to see how this unfolds. It seems I underestimated the performance of a RAID-1 ZFS setup, so maybe in the future I will convert the whole storage backend to a single ZFS pool with the SSDs just for caching? Who knows.
Immediate next steps
With this out of the way, I believe we are now ready to start deploying virtual machines! And that is going to be our next blog post.