File system fragmentation sounds like a dreary topic, but nothing is as tedious as waiting for a sluggish system. Because disks are the slowest components in the system by a large margin, careful attention to file system layout and fragmentation reduction can yield large gains.
“Newer is better” applies in several ways, so let’s look at how updates can improve system performance.
Newer disks may have better I/O performance. You have surely made the transition from PATA to SATA, and moved to 6 Gbps SATA interfaces. But newer disks may include more and faster cache memory.
Of course, solid-state disks (or SSDs) provide an enormous improvement in performance but their cost per gigabyte is still too high for most of us to use for our full data collections. Some newer and faster disks contain an SSD “front end”, basically a much larger and faster cache – greatly improving I/O performance.
An individual file can become fragmented, meaning that its data blocks aren’t all located contiguously, next to each other. Separately, whole files can become scattered, so that files in the same directory and frequently used together aren’t located next to each other on the disk media. That second situation amounts to fragmentation of directories.
File fragmentation is especially likely to happen when a file is created and then appended to gradually — log files would be the worst examples but many applications create files gradually. Directory and file fragmentation happens when files are created, then deleted, and then more files are added.
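You can watch per-file fragmentation with the filefrag utility from the e2fsprogs package, which reports how many extents a file occupies. A minimal sketch, with an arbitrary path and sizes chosen purely for illustration:

```shell
# Grow a file through several separate appends, the way a log file grows.
# (The path and sizes here are arbitrary, for illustration only.)
demo=/tmp/applog.demo
rm -f "$demo"
for i in 1 2 3 4 5; do
    dd if=/dev/zero bs=1M count=4 >> "$demo" 2>/dev/null
done

# filefrag reports how many extents the file occupies on disk;
# more extents means more fragmentation.
filefrag "$demo"
```

On an otherwise idle file system the allocator may still manage a single extent; on a busy system, appends interleave with other writes and the extent count climbs.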
Modern file systems such as Ext4, XFS, and Btrfs include some pre-emptive methods for limiting fragmentation: delayed allocation and allocate-on-flush. They hold data in memory longer before flushing it to the disk, grouping the blocks (Ext4) or extents (XFS, Btrfs) into larger groups and reducing fragmentation. As a nice side benefit, they also reduce CPU load.
Btrfs adds another feature called automatic on-line defragmentation, although “fragmentation avoidance” seems like a more accurate description. You need to mount the file system with the autodefrag option. Be careful: this can make things either better or worse, depending on your workload.
If your workload consists of many small write operations on many small files, the autodefrag option will help performance. But if you just learned about full system virtualization in Learning Tree’s Linux virtualization course, you do not want to use this option for the host’s file system! Nor would you want to use it if your system handles large databases. In those cases the databases or virtual machine disk images have many small writes within one large file, and the autodefrag option will slow down I/O.
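Enabling the option is a mount-time setting. A sketch, assuming a Btrfs volume on the placeholder device /dev/sdb1 mounted at /srv/data; these commands need root:

```shell
# Turn on autodefrag for an already-mounted Btrfs volume (placeholder path):
mount -o remount,autodefrag /srv/data

# Or make it persistent with an /etc/fstab entry:
# /dev/sdb1  /srv/data  btrfs  defaults,autodefrag  0  0

# Verify the active mount options:
findmnt -no OPTIONS /srv/data
```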
One of my consulting clients has data archiving as their primary mission. They collect new data as large files and then store them indefinitely, deleting and replacing very little. Until they start getting their file systems over about 90% full, they have almost no fragmentation.
That’s unusual! Those of us who are always deleting files and adding more in their place run into significant levels of fragmentation. We need some retroactive technique to clean things up.
The easiest and fastest way to get a significantly less fragmented file system is to back up all your data to another volume, create a new empty file system on the old volume, and copy your data back into place. This allows the OS to plan ahead as it lays out files and directories of known sizes. You get the same directory tree structure, of course, but the arrangement of data on the disk will be much better.
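As a sketch of that cycle, assuming the data lives on the placeholder device /dev/sdb1 (Ext4) mounted at /srv/data, with scratch space at /mnt/backup, run as root during a maintenance window:

```shell
# 1. Copy everything off, preserving permissions, hard links, ACLs, and xattrs.
rsync -aHAX /srv/data/ /mnt/backup/data/

# 2. Recreate an empty file system on the old volume (destroys its contents!).
umount /srv/data
mkfs.ext4 /dev/sdb1
mount /dev/sdb1 /srv/data

# 3. Copy the data back; the allocator can now lay out each file contiguously.
rsync -aHAX /mnt/backup/data/ /srv/data/
```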
If you want to measure fragmentation and manually call for defragmentation, look into the e4defrag command on Ext4, xfs_fsr on XFS, and the do-it-all btrfs tool on Btrfs.
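For example, against a placeholder mount point /srv/data (the report mode is read-only; the defragmentation passes need root and a mounted file system of the matching type):

```shell
# Ext4: score current fragmentation without changing anything, then defragment.
e4defrag -c /srv/data
e4defrag /srv/data

# XFS: reorganize files on a mounted XFS file system, verbosely.
xfs_fsr -v /srv/data

# Btrfs: recursively defragment a directory tree.
btrfs filesystem defragment -r /srv/data
```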
Also check out Learning Tree’s Linux optimization and troubleshooting course for more performance tips and tricks.