Especially a server accessible only by SSH…
I can’t be bothered to walk down to the basement, so practically my server is also only accessible by SSH
Especially after age 40 and a knee surgery… I’m tired boss! 😩
In the old days some of the servers took an hour to reboot. That was stressful when you couldn’t ping it for an hour.
Don’t say stuff like that. You’re gonna give me a heart attack.
The more disks you had, the longer it took. It walked the SCSI bus, which took forever, so every extra disk made it take even longer.
Since everything was remote, you’d have to call hands and they weren’t technical. Also no cameras since it was the 90’s.
Now when I restart a VM or container, I panic if it’s not back up in 10 minutes.
I get annoyed if my pc isn’t restarted in 30 seconds now.
I think mine takes like 2 minutes. It’s ten years old. I’ve been putting off upgrading due to the cost of video cards.
I got an M.2 drive last year after having a motherboard capable of it for 3-4 years, and naturally named it “Plash Speed”.
I will never not laugh at this video.
I like how POSTing got fairly fast. Then we started putting absurd amounts of RAM into servers, so now they’re back to slow.
Like, we have a high clock speed dual 32-core AMD server with 1TB of RAM that takes at least 5 minutes to do its RAM check. So every time you need to reboot you’re just sitting there twiddling your thumbs, waiting anxiously.
I will date myself. These machines had a lot of memory as well which added to the slow reboot. I think it was 16 gigs.
The r series for IBM took forever. The p series was faster but was still slow
I’ll date myself. My first PC had 500MB of STORAGE
My first pc had a tape drive.
I had a friend with one of those while I had an Atari. The Atari game would come up within a minute, but the tape took like 15 min to start.
Using a tape drive is crazy when you think about it. It was slow… These weren’t the big tape cartridges. It was a standard audio tape. Not sure how much they could store, but it was all sequential.
Never ask an engineer why lol
Source: am engineer
Initializing VPC…
Configuring VPC…
Constructing VPC…
Planning VPC…
VPC Configuration…
Step (31/12)…
Spooling up VPC…
VPC Configuration Finished…
Beginning Declaration of VPC…
Declaring Configuration of VPC…
Submitting Paperwork for VPC Registration with IANA…
Redefining Port 22 for official use as our private VPC…
Recompiling OpenSSH to use Port 125…
Resetting all open SSH connections…
Your VPC declaration has been configured!
Initializing Declared VPC…
When you make a potentially system-breaking change and forget to make a snapshot of the VM beforehand…
There’s always backups… Right?
… Right?
oh there is. from 3 years ago, and some
Someone set up a script to automatically create daily backups to tape. Unfortunately, it’s still the first tape that was put in there 3.5 years ago, and every backup since that tape filled up has failed. It might as well have failed silently, because everyone who received the email with the error message filtered those emails to a folder they generally ignored.
And no one ever tried to restore it.
Happened to me as well. After a year I learned our incremental DB backups were wrongly offset by the GMT difference, so we were losing hours every time. Fun.
Luckily we never needed them.
And now we have Postgres with WAL archiving and I sleep so much better.
Just had to restart our main MySQL instance today. Had to do it at 6am since that’s the lowest traffic point, and boy howdy this resonates.
2 solid minutes of the stack throwing 500 errors until the db was back up.
If you have the bandwidth… it is absolutely worth it to invest in a maintenance mode for your system. Just check some flat file on disk for a flag before loading up a router or anything, and if it’s engaged, send back a static HTML file with ye olde “under construction” picture.
Bonus points if your static site sends a 503 with a Retry-After header.
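For anyone who wants to see roughly what that looks like, here is a minimal sketch as a Python WSGI wrapper. The flag path, the page contents, the 120-second retry hint, and the names `maintenance_middleware` and `real_app` are all made up for illustration; this is not anyone's actual setup, just the flag-file idea written down.

```python
import os

# Hypothetical flag file: create it to enter maintenance mode, delete it to leave.
FLAG_PATH = "/tmp/maintenance.flag"

# Stand-in for the static "under construction" page.
MAINTENANCE_PAGE = b"<html><body><h1>Under construction</h1></body></html>"


def real_app(environ, start_response):
    """Placeholder for the real application (router, DB access, and so on)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from the real app\n"]


def maintenance_middleware(app, flag_path=FLAG_PATH, retry_after_seconds=120):
    """Wrap a WSGI app so it answers 503 while the flag file exists."""

    def wrapped(environ, start_response):
        # Check the flag before touching the real app at all.
        if os.path.exists(flag_path):
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/html; charset=utf-8"),
                    ("Content-Length", str(len(MAINTENANCE_PAGE))),
                    # Tells well-behaved clients when to try again.
                    ("Retry-After", str(retry_after_seconds)),
                ],
            )
            return [MAINTENANCE_PAGE]
        return app(environ, start_response)

    return wrapped


application = maintenance_middleware(real_app)
```

With something like this in place, `touch /tmp/maintenance.flag` before the restart and `rm` it afterwards. In practice you might prefer to do the same check in the reverse proxy so it still works when the app itself is down.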
That’s not really… possible at this point. We have thousands of customers (some very large ones, like A——n and G—-e and Wal___t) with tens or hundreds of millions of users, and even at lowest traffic periods do 60k+ queries per second.
This is the same MySQL instance I wrote about a while ago that hit the 16TiB table size limit (due to ext4 file system limitations) and caused a massive outage; the worst I’ve been involved in during my 26-year career.
Every day I am shocked at our scale, considering my company is only like 90 engineers.
Is that the same database my user couldn’t connect to today?
I have more than once typed shutdown instead of reboot when working on a remote machine… always fun
Make an alias so that typing shutdown does a restart, and if you really want to shut down, make an alias that goes like
Yesireallywanttoshutdown
Networking, we had a remote office in Europe (I’m in the US) and wanted to reset a phone. Phone was on port 10 of the Cisco switch, port 1 went to the firewall (not my design, already in place).
Helping my coworker, I tell her to shut port 10.
Shut port 1, enter.
Ok… office is offline and on another continent…
Not sure if this will help you, but I always do shutdown and then think about whether I want to do -r or -h. I’m sure it won’t help 🙂
IPMI is your friend.
Tbh there is nothing more taxing on my mental health than doing maintenance on our production servers.
When it was the wrong server and you’re hoping it comes back up within 5 minutes, before Nagios starts sending alerts.
If a tree falls in the woods…
I install molly-guard on important machines for this reason. It’s so easy to fire off a reboot in the wrong SSH session.
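If you can’t install it somewhere, the basic idea (make the human type the hostname before anything reboots) is easy to approximate. The sketch below is a rough Python imitation of that idea only, not molly-guard’s actual implementation; `guarded_reboot` and its parameters are hypothetical names, and it assumes `shutdown -r now` is the command you would want to run.

```python
import socket
import subprocess


def guarded_reboot(ask=input, run=subprocess.run, hostname=None):
    """Reboot only if the operator types this machine's hostname.

    `ask` and `run` are injectable so the check can be exercised
    without prompting a real terminal or rebooting anything.
    Returns True if the reboot command was issued, False otherwise.
    """
    hostname = hostname or socket.gethostname()
    typed = ask("Type the hostname of the machine you want to reboot: ")

    # Compare case-insensitively and ignore stray whitespace.
    if typed.strip().lower() != hostname.strip().lower():
        print("Hostname does not match, not rebooting.")
        return False

    # Assumption: a plain immediate restart is what is wanted here.
    run(["shutdown", "-r", "now"], check=True)
    return True
```

Save it as a script that calls `guarded_reboot()` and point your reboot alias at it. Typing the wrong hostname in the wrong SSH session then costs you a moment of embarrassment instead of an outage.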
I work with IBM i/AS400 servers and those are not exactly the quickest thing to “reboot” (technically an IPL). Especially the old ones. I have access to the HMC/console but even this sometimes takes several minutes (if not dozens) just to show what’s going on.
It’s always a bit stressful to see the codes passing one after the other and then it stops on one and seems to get stuck there for a while before continuing the IPL process. Maybe it’s applying PTFs (updates) or something, and you just have to wait while even the console is blank.
I’ve been monitoring those servers for years and I still sometimes wonder whether it hung during the IPL or is just doing its thing, because this part, even with codes, is not very verbose.
Fortunately it’s also very stable so it pretty much always comes back a few minutes after you start wondering why the hell it’s taking so long.
Y’all need high availability in your lives.
Plot twist, reboot takes 11 minutes and you didn’t test for it
Ubuntu server just asked me if I want to upgrade to V24, I don’t know when I’ll take time to do that :p







