Admin... by accident!

You may have chosen to be an admin. I didn't!

Abandon Linux. Rolling back the entire OS is possible.

February 18, 2019 by Albert Valbuena

When I was writing an article on updating FreeBSD from version 11.2 to the new major release, 12, I wanted to add something extra for those who may read some of the information I publish. FreeBSD, as a UNIX operating system, has functionality similar to the old school UNIX ones such as AIX, Solaris and the like. Of course they differ in some ways, sometimes weird ways, but they share common concepts. The GNU/Linux distributions also share those concepts and sometimes add some more weirdness to the mix. We, the humans, the weirdo species.

That little extra, not seen in many other upgrade guides, is the rollback chapter. I was reading that article back and decided to write this one to make the case for rollback out loud. While my eyes were scrutinizing my own words (and my bad spelling joined by my ‘English’), memories of my stay as a UNIX operator at a major European bank struck me. Operators do ‘simple’ tasks and deal with the day to day operations. You do not need to be a mastermind or a proficient ‘C’ programmer to do that job. Indeed, if you were such a person you would quit that job very quickly.

Dealing with application changes, configuration tweaks and their validation is a regular task for an operator. Oftentimes everything has already been planned beforehand by systems engineers and developers. Approvals have been granted, and the changes have passed some tests on the development and integration environments before moving up to production, where the real deal sits. So little room is left for improvisation, and the operator becomes the man feeding the machine. A monkey playing keystrokes that some other, intelligent being has already laid out.

Many configuration changes are trivial and last less than five minutes. Sometimes the preparation phase takes longer than the execution itself. Only a few of the regular configuration and application changes last longer than an hour. A few last hours, and a selection of one or two stand out above all. Those last so long, and span so many teams, that tasks are passed from one shift of operators to the next. Spending two hours of one hot sunny afternoon sitting at a ‘terminal’ just to make a backup copy of the configuration files says a lot about the volume of the change. And you’d better be careful with those backups. You’d better read the checksum and compare it after the copy has been made. You’d better not miss a single step of the play. Otherwise you may ruin hours of work from others.
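The "copy, then compare the checksum" habit can be sketched in a few lines of Python. This is only an illustration, not the procedure used at the bank: the function names, the `.bak` suffix and the choice of SHA-256 are my own assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_checksum(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy against the original.

    The ".bak" suffix is a hypothetical naming convention for this sketch.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / (source.name + ".bak")
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise RuntimeError(f"checksum mismatch for {source}")
    return target


if __name__ == "__main__":
    # Demonstration on a throwaway file, so nothing real is touched.
    with tempfile.TemporaryDirectory() as scratch:
        config = Path(scratch) / "app.conf"
        config.write_text("listen_port = 8080\n")
        copy = backup_with_checksum(config, Path(scratch) / "backup")
        print("backup verified:", copy.name)
```

If the two digests differ, the function refuses to return a path, so a later rollback step cannot silently pick up a damaged copy.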

Hours go by and changes are applied rigorously, without a hassle. Sometimes it almost seems like a dance happening before your eyes, but nobody is really moving. Keystrokes here and there and some confirmation calls. The minutes go by at a hugely boring pace. Until something breaks, breaks in the middle of a big application change, and it is production stuff. Then, oh Lord, the bell rings. Alarms pop up and late calls are made. Coordination teams are set up and expectation levels rise. Operators are now the engineers’ fingers and eyes. Some traces here, some conversations there, a bit of panic, a bit of a call, a bit of another call, the whole lot. Some engineer suggests something, but it is not a good solution in the eyes of others, and then the operator is just waiting for the call to end to apply whatever he or she is commanded.

I’ve seen something I thought I’d never see in my entire life. You know, in the past I was a ski bum who made it to the ski instruction realm and trained kids, built races, skied big mountains of powder, crud snow, icy slopes, everything. And when you are out there you can sometimes see things you never expected to see, like a chair from a chairlift going around the cable, upside down, because of the high speed winds, and the very next day the same cable is completely derailed and has fallen from the pylons, chairs are broken, pulleys bent,… I’ve seen huge avalanches. I’ve been ‘trapped’ on top of a chairlift for a few hours, with more than twenty customers, because of a sudden and dangerous weather change. That same day people died a few miles away. Shit happens in nature, there is no way around it. So when I was working at the bank, with central heating for the winter, air conditioning for the summer, vending machines and hordes of clever and smart people, I thought I’d never see a bank stop. Until the mainframe went down. Oh boy.

All banking operations ceased on a Friday afternoon. I believe almost nobody is buying stuff at the mall at that particular time. There was nothing my team or I could do but wait and see. The operations chief made his appearance in the room and a small team was assembled to resolve the issues. The order was clear: shut down the whole thing, stop the mainframe. Other services had to stop too, so you can imagine what kind of situation that was. Once the engine was stopped, step by step, with very carefully thought-out commands, operations resumed little by little. A bad combination of issues had made a ‘job’ jam and make no progress. This ‘small’ incident, combined with other events, was traced back as the root of the incident in the mainframe a couple of days later, although it was a suspect from the very beginning.

Yes, I had seen it with my own eyes. A bank stopped because the mainframe stopped. Not in my wildest dreams did I think I would see that, live, in the operations room. It was a pity I couldn’t smell the big machine, or at least know it was sitting in the next room. But yes, as a witness I had seen a major incident, something I believe happens once every ten years or less. Something that is not supposed to happen. But it did.

Back to our operator. Back to your particular day to day routine. We all have to do updates. For God’s sake, Windows does this all the time. You are very familiar with this. Let’s say there is a twelve hour intervention on a major application for a bank, and in the seventh hour something breaks. Validations of intermediate steps are giving unexpected errors. Conference calls are arranged, and people are out of the office but dealing with a serious problem. Technicians of all sorts are trying to figure out what is happening. The operator is sending the traces he is asked for and applying some other steps that engineers believe will work. Despite all the efforts time is running out, normal operations will resume soon on Monday morning, and there is not much time left. The rollback order comes in.

The operator now has to move down to the ‘darkest’ part of the document, where the rollback instructions are placed. Configuration files have to be restored, whole folders have to be replaced with other folders, and sometimes backup copies of particular file systems have to be requested and put back in place. All these pieces have to come in order, and validations are also necessary to restore the system as it was before. With luck, on Monday morning things will be as they were, after an intense weekend or a middle of the week night, as if nothing had happened. But it has.

I’ve seen, and you have too, very old kernels up and running. I know for a fact that big companies and governmental institutions are still running some services on Red Hat version 5 boxes. Who wants to run the risk of upgrading a huge custom application or, worse yet, upgrading the operating system version underneath without touching the app substantially? I guess you know the answer. And that explains, partially, why there is still a set of boxes running old UNIX or Linux versions in every big corporation’s basement. With ZFS and boot environments you can safely try out OS upgrades without being too worried. You know you can safely roll back the upgrade. Your OS will run again and your application too. There are commercial proprietary implementations of this technology, but there are open source ones too, like FreeBSD and the Illumos derivatives such as SmartOS. So yes, in conclusion, you can roll back anytime now. Ooooh yes! I know you are already doing it with hypervisors, and some other shiny red and shifty software. Is that cheating?
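To make the boot environment idea concrete, here is a minimal sketch of the two command sequences involved: one before the upgrade, one for the rollback. It assumes the `bectl` utility that ships in the FreeBSD 12 base system (on older releases the `beadm` port plays a similar role). The boot environment name is hypothetical, and the script only prints the commands instead of running them.

```python
import shlex
from typing import List


def upgrade_plan(new_be: str) -> List[List[str]]:
    """Commands to run BEFORE upgrading: keep a copy of the current system."""
    return [
        ["bectl", "list"],            # show the existing boot environments
        ["bectl", "create", new_be],  # clone the running system as a new BE
    ]


def rollback_plan(previous_be: str) -> List[List[str]]:
    """Commands to run when the upgrade has to be undone."""
    return [
        ["bectl", "activate", previous_be],  # boot from the saved BE next time
        ["shutdown", "-r", "now"],           # reboot into it
    ]


if __name__ == "__main__":
    saved_be = "pre-upgrade"  # hypothetical name, pick your own
    print("Before the upgrade:")
    for command in upgrade_plan(saved_be):
        print("  " + shlex.join(command))
    print("If the rollback order comes in:")
    for command in rollback_plan(saved_be):
        print("  " + shlex.join(command))
```

The point of the sketch is how short the rollback half is compared with restoring files and folders one by one: activate the saved environment and reboot.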

Filed Under: FreeBSD, Politics
