Disclaimer: What you are about to read may contain inaccuracies. Feel free to discuss them somewhere else. This is also my opinion and as such it may change through time, maybe tomorrow, next month, next year, next decade or never. I also review very little (if anything) of what I write here, so this article won’t be polished by any means and it is coming out of my mind and gut pretty raw.
Linus Torvalds, the creator and head of the Linux kernel which powers many of the systems we use, has recently given his opinion on the ZFS file system. If you are reading these lines, quite probably, you already know who he is and what ZFS is about. For the uninitiated not willing to visit Google, all I can say is that Linux is an operating system kernel (the nucleus) that resembles UNIX, and ZFS is a file system that broke with some past problems by solving them brilliantly.
In Linus Torvalds’ defense.
I have to say, writing about something Linus has said and misinterpreting what he wrote is quite easy, especially considering the position he is in as a technologist. Media sites such as Phoronix and ZDNet, among others, have already published some ‘news’ on their sites to ‘inform’ the public about his ‘bold’ claims.
Linus was just replying to a question on the mailing list (Linux and most open source projects run under public scrutiny) from a user who had seen ZFS break after some changes were applied to the Linux kernel. However, some of his thoughts on things such as ZFS slipped out in a notoriously frank way.
The license issue.
Despite the wishes of many GNU/Linux users, the ZFS file system in its open and free form, called OpenZFS, is not integrated into or supported by the Linux kernel releases. This is because ZFS was created at Sun Microsystems (since acquired by Oracle) and released in 2005 on OpenSolaris under the CDDL license. Many claim that license is not compatible with the GPL, which is precisely what the Linux kernel is licensed under. More specifically, some claim ZFS cannot be combined with Linux.
The GPL allows anyone to change code and make fixes, however this has to be reciprocal if you intend to publish the results of such changes. You can’t improve GPL code using proprietary fixes; they also have to be published under the GPL. This fact has allowed companies to stop the fierce competition they were in back in the UNIX wars and collaborate on what they were best at. And in fact this has been very good for them, since one kernel rules them all and there is no need to develop one for each company. Multiply this cost reduction by the rest of the components of an operating system and on top add the lack of competition. It’s not just a win-win situation. It’s a win raised to the third power.
There are other licenses, such as the BSD, the Apache or the MIT ones, which are called permissive (or liberal), and the CDDL, inspired by the MPL, which only asks for changes to the CDDL-covered files themselves to be shared. In contrast to the strong copyleft nature of the GPL, these licenses allow developers and companies to build proprietary code on top of an open source code base. This allows them to choose which parts can be freely available and which parts can’t. Netflix can open source what they develop on FreeBSD because their business is based on selling access to films, not software. Apple, on the other hand, publishes the code of the Darwin operating system, the base of macOS (which in turn borrows heavily from FreeBSD), but keeps closed the proprietary bits that make it graphical and therefore useful for the non-techies.
Visions of what purpose software serves and how it should be written are also visions of the world and how it should operate.
Mind that Sun Microsystems not only had a great name, it truly was the light brought to computing. Together with AT&T they developed the standard form of UNIX known as System V Release 4, and they also brought us the Java programming language, on which, for example, Android is heavily based. Sun also released NFS, a way for computers to share files over networks easily and without tying users to Sun, so NFS was available on their rivals’ platforms too. OpenOffice was developed there for many years, until Oracle came into the equation, as was MySQL (which Sun had bought not long before), now forked into MariaDB and still developed under Oracle.
Oracle is after IBM and everybody should notice it. Larry Ellison, Oracle’s CTO and chairman, likes being number one. This is one of the reasons why the database giant agreed to acquire Sun in 2009. Buying Sun was buying a broad set of technologies, even hardware with the SPARC series of machines, able to compete with the master of them all, the IBM mainframe. Add to the mix the fact that Red Hat, nowadays an IBM company, is one of the main contributors to Linux, which has had IBM’s blessing (and contributions too) for many years. Linus is right to be more than cautious about embedding ZFS in Linux.
Developers and companies choose what licenses their software falls under and they have every right to do so. Furthermore, for some, the share-alike nature of the GPL is one of the biggest, if not the biggest, values they find in the license and the reason they choose it. However there are others whose priorities differ, and for whom the freedom to choose what to share and how to share it ranks higher than the share-alike nature of the GPL. And both are ok.
Being ok, as expressions of individuals and mindsets, doesn’t mean rivalry or even conflict doesn’t happen. And what we’ve recently read of what Linus himself has to say about ZFS is a clear example.
And in short, ZFS, with its ‘incompatible license’, is an out-of-tree kernel module, not maintained on Linux by the Linux developers but by others, mainly from the FreeBSD community and the ZoL (ZFS on Linux) one. The alternative file system Btrfs, aimed at covering the same features, has failed to achieve them and has corrupted data in some configurations. The fact that Linus’s team can’t support ZFS due to licensing issues is just one of the problems.
Torvalds’ licensing choice.
Torvalds has held Linux under the GPL version 2 for many long years and, hopefully, will continue to do so for much longer. He has explained the reasons for his choice many many times. But in today’s case we can highlight the following words of his:
“The fact is, the whole point of the GPL is that you’re being “paid” in terms of tit-for-tat: we give source code to you for free, but we want source code improvements back. If you don’t do that but instead say “I think this is _legal_, but I’m not going to help you” you certainly don’t get any help from us.”
Google will offer those interested in this particular topic a much broader and more decent selection of Linus’s words on the matter. That said, for today’s particular case on ZFS one must remember, as Linus has pointed out, that Oracle is a litigious company by nature.
Torvalds’ demands on Oracle.
“And honestly, there is no way I can merge any of the ZFS efforts until I get an official letter from Oracle that is signed by their main legal counsel or preferably by Larry Ellison himself that says that yes, it’s ok to do so and treat the end result as GPL’d.”
Press conferences and public meetings belong to another era and another frame of public relations. So Torvalds wasn’t able to ask Oracle to change the ZFS license to a GPL friendly one, or GPL it straight away. A more ‘subtle’ way was to answer a question on the mailing list and spill out some gut feeling.
Many may think Linux based systems have enough traction on the market for this to happen, and indeed this could be plausible given the fact Oracle has ditched serious Solaris development and evolution for the x86 server market. However, it seems Oracle is continuing Solaris development for use on SPARC, to compete with the mainframe. And that is a closed, very proprietary market.
Oracle has been a parasite on Linux, especially on Red Hat’s free of charge version, CentOS, which is a debranded version of the paid subscription RHEL. Yes, Oracle’s Linux offers some specific tools and has helped develop the creature. Oracle is also pushing out a ZFS competitor, Btrfs, which has failed to get the traction, the feature parity and the reliability of ZFS. Oracle Linux also adds value when operating Oracle Databases, however that is not comparable to the effort of developing Solaris on its own. Although one might say Oracle has had to invest little to get two operating systems under their umbrella.
Users’ demands on Linus when developing Linux.
This can be summarized as: ‘please let us have the goodies we need and want, we understand the issues and incompatibilities, but please don’t break things without a valid reason’. The fact the whole issue came after some symbol changes is just hilarious.
My position on licensing.
I will have to read some literature on how it is possible to run a binary blob on the Linux kernel to understand why a CDDL component such as OpenZFS is treated the way it is.
That said, I often find myself more inclined towards ‘permissive’ licenses such as the BSD, the Apache or the MIT ones, but that doesn’t rule out the GPL. I think it is a good license that has allowed the computing world to evolve and has brought technology to the masses in a significant way. I would choose the GPL over the BSD or Apache ones depending on the nature of the project.
Despite all the good the GPL has brought to the open source community, there have been ways to circumvent some of the restrictions it implies. For example, Google is known to have modified substantial parts of the Linux kernel but kept them for internal operations only. Since they do not release or share binaries of such software, there is no obligation to give the changes back to the source or the community.
I have no problem or issues using GPL software. Not only do I have some instances of GNU/Linux running at home, I also maintain a few servers at work and enjoy that very much.
A few technicalities and facts.
Boot Environments. This is a technology many GNU/Linux users would benefit from but still can’t. It is basically a way to create references to the essential files of the operating system and switch between them depending on the OS version one wants to run. This makes the upgrade process almost trivial, since a bad or unsatisfactory upgrade can be rolled back with a system reboot. Bork a Linux instance when upgrading and good luck restoring from backups. Instead of a mere five minute window you will probably spend, at least, half an hour.
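To make the idea less abstract, here is a toy model of boot environments in Python. It is only a sketch of the concept, not how ZFS or any boot environment tool is implemented: the class and method names are made up for this illustration, and a real copy-on-write file system shares unchanged blocks between environments instead of copying dictionaries.

```python
class BootEnvironments:
    """Toy model: each boot environment is a named set of system files."""

    def __init__(self, files):
        # The running system starts as a single environment called "default".
        self.envs = {"default": dict(files)}
        self.active = "default"

    def create(self, name):
        # Clone the active environment under a new name.
        # (Real CoW file systems share blocks here instead of copying.)
        self.envs[name] = dict(self.envs[self.active])

    def activate(self, name):
        # Choose which environment the next "boot" will use.
        if name not in self.envs:
            raise KeyError(name)
        self.active = name

    def upgrade(self, changes):
        # An upgrade only modifies the currently active environment.
        self.envs[self.active].update(changes)

    def files(self):
        return self.envs[self.active]


if __name__ == "__main__":
    be = BootEnvironments({"kernel": "12.0"})
    be.create("pre-upgrade")                # keep a reference to the known-good state
    be.upgrade({"kernel": "12.1-broken"})   # the upgrade goes wrong
    be.activate("pre-upgrade")              # roll back by switching environments
    print(be.files())
```

The rollback is just a change of pointer to the environment created before the upgrade, which is why it costs a reboot rather than a restore from backups.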
Checksums and integrity validations. The nature of ZFS is to checksum and validate what is being written onto disk. When using several disks (as happens with servers) the system can check each copy of a block against its checksum and, if one of them differs, correct it from a good copy. It also benefits from ECC RAM, which protects the data from being corrupted in memory before it is written. These checksums and constant validations made the system RAM heavy, but it has been gaining optimizations year after year. The trade between having a functioning system with unparalleled reliability and the performance cost is in favor of the former. A fast system that corrupts data is not a useful system. Traditional file systems aren’t corrupting data left and right (luckily for us) but the advantages of CoW file systems are notable, useful and cheap.
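The detect-and-repair idea can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not ZFS code: the MirroredBlock class is hypothetical, SHA-256 is used only because it is at hand in the standard library (ZFS supports several checksum algorithms), and the two "disks" are just in-memory byte arrays.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption made for this sketch.
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """One block stored on two 'disks' plus the checksum recorded at write time."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        # Find the first copy that still matches the recorded checksum.
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            # Corruption is detected, but there is nothing to repair from.
            raise IOError("all copies fail their checksum")
        # Self-heal: overwrite any bad copy with the good one.
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) != self.expected:
                self.copies[i] = bytearray(good)
        return good


if __name__ == "__main__":
    block = MirroredBlock(b"important data")
    block.copies[0][0] ^= 0xFF      # simulate silent corruption on one disk
    print(block.read())             # served from the healthy copy
    print(bytes(block.copies[0]))   # the damaged copy has been rewritten
```

With a single copy the same check would still notice the damage; it just could not fix it, which is the difference redundancy makes.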
Send/Receive datasets. A dataset in ZFS terms can be a snapshot of a file system, a clone, a volume or a file system. In short this means you can snapshot a dataset (a path in your file system) containing whatever information you have on it, compress it, send it over the network and recreate it on a remote system. Creating and restoring snapshots is a matter of seconds. The most expensive part of this operation is sending them through the network link. This technology is very interesting for developers, since they can snapshot the development on their local machine and send it to production in a matter of minutes. But not just developers can benefit from this: hosting companies can move customers’ instances (or applications) around in a matter of seconds when combining this with Zones/Containers, in Oracle’s or the open source illumos’s parlance, or Jails on FreeBSD.
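As a rough sketch of what that workflow looks like, the Python helper below only builds the two shell command lines involved (take a snapshot, then pipe zfs send over ssh into zfs receive); it does not run anything. The function name, the dataset names and the host are hypothetical placeholders, and real setups usually add options (incremental sends, compression, permissions) that are left out here.

```python
import shlex


def replication_commands(dataset, snapshot, remote_host, remote_dataset):
    """Return the shell commands to snapshot a dataset and replicate it."""
    snap = f"{dataset}@{snapshot}"
    # Step 1: take a snapshot of the local dataset.
    take = f"zfs snapshot {shlex.quote(snap)}"
    # Step 2: stream the snapshot over ssh and recreate it remotely.
    remote = f"zfs receive {shlex.quote(remote_dataset)}"
    send = (
        f"zfs send {shlex.quote(snap)} | "
        f"ssh {shlex.quote(remote_host)} {shlex.quote(remote)}"
    )
    return [take, send]


if __name__ == "__main__":
    # All names below are made up for the example.
    for command in replication_commands(
        "tank/app", "release-1", "prod.example.org", "backup/app"
    ):
        print(command)
```

Building the commands as strings keeps the sketch safe to run anywhere; on a machine with ZFS you would review them and execute them yourself.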
Single disk use. ZFS may seem expensive to operate on a desktop or laptop device given those typically run on a single disk. Benefits such as RAID or mirrored disks aren’t available there, so corruption on that single disk can be detected by ZFS but not repaired from a second copy. However, given the nature of the file system, writes are checked in a more conscious way. And one needs to remember boot environments, cloning, and sending and receiving datasets, which still make it worth using on a single disk instance.
Performance. ZFS isn’t as fast as Linux’s Ext4 or XFS file systems when writing to disk out of the box. That has been the case for quite some time. The point is that stored data has to be kept consistent throughout long periods of time. If it’s accessible quickly enough, writing it down faster or not isn’t the main priority. It can be number two priority, certainly, but consistency and reliability are number one.
Torvalds’ invitation to not use ZFS.
“Don’t use ZFS. It’s that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me.”
“Note that “we don’t break users” is literally about user-space applications, and about the kernel I maintain.
If somebody adds a kernel module like ZFS, they are on their own. I can’t maintain it, and I can not be bound by other peoples kernel changes.”
“So things that are outside the kernel tree simply do not matter to us. They get absolutely zero attention. We simply don’t care. It’s that simple.
And things that don’t do that “give back” have no business talking about us being assholes when we don’t care about them.”
Linus is quite sincere and has a very valid reason not to use the ZFS file system. Although he seems confused about whether the original question on the mailing list was about ZFS or OpenZFS, his claims and feelings can be applied to both, but especially the latter, since this is the one everyone is using in the Linux world and outside Oracle’s.
He has the right to make these claims and to choose the licensing as well as the direction of the project. And I want to remark he has the right to do so and to choose what he thinks is best.
My invitation to use ZFS.
If reliability is around the top of your demands checklist, ZFS is a must when talking about file systems. Especially if the need to cover is big amounts of data, something becoming a typical necessity, not just a rarity for those dedicated to mass storage. If UNIX is a must and containers are just part of the daily operations, whether you are a developer or a sysadmin makes no difference, there is an alternative called FreeBSD.
The FreeBSD operating system is not just a kernel but a whole system. Support for ZFS has been available since version 7.0, back in early 2008. It has become the default file system and the performance has only increased over time.
I wrote a few articles on how one could abandon Linux but still be on a free open source unix-like system. These are the titles:
Abandon Linux. How to install iocage to manage FreeBSD Jails. * Jails are the FreeBSD containers.
However, you may not trust what an admin-by-accident has to say and prefer a true admin. Then read the Jails and ZFS books of the FreeBSD Mastery series from Michael W. Lucas. The links go to his website, no referral, I am just a good netizen.
All the above said, it is quite strange to me that another CDDL based piece of software called DTrace (which not only interacts with the kernel but probes it inside and out) is running smoothly and without a hassle on the GPL based Linux kernel, particularly Oracle’s Linux. Aren’t Linus’s licensing claims a bit off now? Oracle published DTrace for Linux with some bits of code concerning the kernel under the GPLv2. Specifically the commit states:
This changeset integrates DTrace module sources into the main kernel
source tree under the GPLv2 license. Sources have been moved to
appropriate locations in the kernel tree.
In addition a new RPM package is introduced: kernel-headers-dtrace.
This package is responsible for installation of DTrace related header
files for its userspace component.
However DTrace for Linux seems to maintain many bits under the original CDDL license:
Unless otherwise noted, all files in this distribution are released
under the Common Development and Distribution License (CDDL),
Version 1.0 only. Exceptions are noted within the associated
It may seem not too relevant for Linus since he claimed ZFS is not that big of a deal. Again:
“Don’t use ZFS. It’s that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me.
The benchmarks I’ve seen do not make ZFS look all that great.”
And I think he is wrong, quite wrong indeed, when stating ZFS has been a buzzword more than anything else. Many companies are leveraging ZFS, and many users are benefiting from it. Everyone is waiting for Linux to get the niceties other operating systems got in the past, and ZFS is one of those things. In fact, as I wrote in another article, Linux has been catching up with what UNIX had. That said, there is a valid point in what he says, since many organizations are relying on cloud instances, and virtual machines are the norm. The thing is, for storage nothing is better than ZFS nowadays, and hosting companies would benefit greatly if they could use ZFS combined with Docker, as Solaris, illumos based systems (SmartOS) and FreeBSD with jails are already enjoying.
On the other hand Linus is justifiably ‘scared’ of Oracle suing Linux. They did it in the past for a technology they had bought and they even wanted to copyright an API.
“And I’m not at all interested in some “ZFS shim layer” thing either that some people seem to think would isolate the two projects. That adds no value to our side, and given Oracle’s interface copyright suits (see Java), I don’t think it’s any real licensing win either.”
For those on Linux using ZFS in its open form, called OpenZFS, there is just one thing they can do: wait and fix whatever troubles Linux causes to the independent module. Alternatively they can test FreeBSD and see if it fits their purpose. Netflix is making heavy use of FreeBSD for video streaming and pushing the boundaries of the network stack. NetApp is also using FreeBSD as a base for their products, as this security advisory states. There is a quite long and significant list of products based on FreeBSD.
Edit: This article was written in just 4 hours straight after a short Saturday nap, publishing it before going for dinner and a drink with friends. Take it easy if there is something off. Sunday morning has added a few corrections here and there, especially in the conclusion section (notice the underlined text).