Now we've explored the benefits and limitations of ZFS, let's create a pool and try it out!
This guide will not explore how to use ZFS as root, as it is very distro/OS-dependent. View guides about this topic on your distro/OS's own Wiki or Handbook.
Concepts
Unlike filesystems that operates on a single disk, partition or logical volume (i.e. ext4 and NTFS), ZFS can span across multiple storage volumes. In ZFS, we call the storage pool we can actually store stuff zpool
. Inside a zpool, there are a series of VDEV (Virtual DEVices)
. Data written to the zpool
will be separated on all of the VDEVs
, similar to RAID01.
If you are familiar with RAID, this may sound like a pretty bad idea, since failure of a single VDEV would bring down the whole pool. However, ZFS allows redundancy inside a VDEV to make sure they don't fail easily. There are quite a few types of VDEV2:
- single: just a single disk, no parity
-
mirror: multiple volumes with the same data, VDEV won't fail as far as any single volume is good
- similar to RAID 1
-
RAIDz1, RAIDz2, RAIDz3: multiple volumes with parity data, allows 1/2/3 bad volumes before VDEV fails
- requires 3/4/5 disks at minimum
- uses parity data, similar to RAID 5/6
Building redundancy on VDEV level makes zpool
very versatile. You can expand a storage pool via simply adding a new VDEV with any redundancy setup you wish, rather than reconstructing the whole array of volumes. Doing data parity in software also means that the Write hole issue that plagued other RAID5 systems can be mitigated.
Space Efficiency and Fault Tolerance
For mirror
, it is rather simple: you would need at least two volumes. For 2 volumes, you get 50% space efficiency; for 3 volumes, you get 33.3% efficiency. As far as any of the drives inside a mirror VDEV is still alive, the VDEV is good.
For RAIDz
, you will need at least \( 2+p \) volumes, where p is the number of parity volumes (1 for RAIDz, 2 for RAIDz2, 3 for RAIDz3). For example, assuming all volumes has the same capacity:
-
2 storage volumes + 1 parity volume (RAIDz1)
- \( \frac{2}{2+1} \approx 66.7\% \) space efficiency, allow 1 in 3 volumes to fail before data loss
-
4 storage volumes + 1 parity volume (RAIDz1)
- \( \frac{4}{4+1} = 80\% \) space efficiency, allow 1 in 5 volumes to fail before data loss
-
4 storage volumes + 2 parity volume (RAIDz2)
- \( \frac{4}{4+2} \approx 66.7\% \) space efficiency, allow 2 in 6 volumes to fail before data loss
Keep in mind that if you are mixing volumes with different capacity in a VDEV, ZFS would check the smallest size and assume all volumes has the same size (the minimum).
Planning
Once a volume is added to a ZFS pool, there's no way to shrink it (unlike ext3/4 or NTFS). Also, if you've assembled a VDEV with redundancy (mirror or RAIDz), there's no safe way to convert to other types of VDEV. So take your time and plan ahead!
Planning is important when designing storage system. An adequately designed system will make sure you get the most out of your hardware. Moreover, it will make life easier when bad things happen. There're three things to consider:
- Performance
- Space Efficiency
- Fault Tolerance
We've already talked about space efficiency and fault tolerance, so we won't reiterate here. Performance only comes into consideration if there will be heavy upon the storage pool (i.e. you will be running databases on it). If you do need good performance, here're some rule of thumbs:
- If budget allows, use
mirror
. - If
RAIDz
(at any level) is used, it is recommended to use \( 2^n + p \) drives to balance performance and space efficiency.
It is also very beneficial to tune ZFS to suit your workload, but this topic is beyond the scope of this article.
Create the pool
You should have a general plan on what the pool should look like. Now, let's create the pool.
Steps shown here assumes you are using ZFS on Linux. Although the general steps should be the same, FreeBSD may require different flags in the command, so check its manual and documentations before proceeding.
For storage pool creation and maintenance, we use the zpool
command. This command accepts subcommands, similar to git
. To create a ZFS pool, we use the create
subcommand:
|
|
-
-f
in order to mitigate Does not contain an EFI label error.- This may be ZFS on Linux specific. Check your manual.
-
-o ashift=12
use AF (Advanced Format) 4k sector size to improve performance. If your pool is made up of SSDs, use-o ashift=13
since SSD uses 8k sector size.- See more detail about this topic on OpenZFS docs.
-m <mountpoint>
specify where the pool would be mounted.<pool_name>
the name of the new pool.[raidz|raidz2|raidz3|mirror] <volumes>
specify the type of VDEV to create and volumes to use
Note that generally, you should use volume ID rather than non-persistent volume location like sdX
since volume IDs are invariant among reboots and hardware changes. You can find volume IDs by checking symlink in /dev/disk/by-id/
.
Example: Create a simple pool with only one volume
Creating pool with only one volume is rather straightforward:
|
|
Here, we create a pool with /dev/disk/by-id/ata-VOLUME-ID
and mount it to /mnt/data
.
Example: Create a pool with mirror VDEVs
|
|
This creates a pool with a single mirror VDEV consists of ata-VOLUME-1
and ata-VOLUME-2
.
You can also create multiple VDEVs:
|
|
Example: Create a pool with RAIDz1 VDEVs
|
|
Checking pool status
Now, our pool has been created and we can check its status:
|
|
Although there's not much going on here right now, this would be a very useful command later. We will check pool scrub/rebuild progress, check volume state and, if we are unlucky, check which files are affected by a data loss.
Adding and Removing volumes
In ZFS, we can modify the pool layout after the pool has been created.
Adding a new VDEV
We can expand a storage pool by adding a new VDEV:
|
|
Note that if your VDEV has a different replication level (for example, have different number of volumes inside a mirror or RAIDz VDEV), then ZFS would warn you about this. However, there shouldn't be a huge difference unless performance is critical.
See zpool-add.8 for more detail on this operation.
Transforming single volume VDEV to mirror
Note that you can also add more volumes to an existing mirror VDEV.
|
|
See zpool-attach.8 for more on this operation.
Remove VDEV from pool
Currently, OpenZFS only supports removing top-level, single
or mirror
VDEVs. Also, the pool cannot have top-level RAIDz. This operation will copy all data to the remaining of the pool and shrink the size of the pool.
|
|
See zpool-remove.8 for more detail on this operation.
Importing and Exporting zpool
If you want to use the pool on other devices, you will need to export the pool:
|
|
Be caution when importing the pool. By default, ZFS would use the non-persistent volume naming, which will cause issue when disk arrangement changes. Instead, you should use:
|
|
Although it is intuitive to consider zpool to be a glorified RAID0 (that is, striping the data across VDEVs), ZFS doesn't simply segment the data and record them across the VDEVs. Rather, ZFS has a pretty complex allocator to determine when and where to write those data.
There're actually some other types of VDEVs, but they are either used for caching or used internally by ZFS. So these are all we need to know to store actual data.