TLDR: How do you solve problems/what is your thought process to get through them? Do you take breaks or do you keep going until you complete the 6 steps of debugging?

Today I overcame a minor challenge that I initially thought was a major problem. We had just finished setting up a cluster, and all the servers merged successfully except for the darned Ubuntu VM. Now I’m over here panicking: that VM, while not strictly necessary for production, just so happens to hold a large amount of our knowledge base. I could have spent oodles of time ripping out the DB entries and parsing XML for most of the items it contains, but part of me ultimately needed to figure out why the VM wasn’t booting.

This is an odd environment, as the prior sysadmin never gave me admin rights to the VM, and it wasn’t documented. So I did what most would probably do and booted the VM into recovery mode so I could change the root password. Except… mount -o remount,rw / wasn’t working. The other recovery tool features weren’t working either, all throwing some kind of error. I was lost at this point, and my Google-fu kept going in the wrong direction until I finally decided to let it boot again and check journalctl -r and dmesg. Sure enough, something about orphaned entries came up, and a quick bit of Google-fu later taught me to just run “fsck -f (Partition in Question)”, essentially chkdsk for Linux (though seemingly quicker and better?). Yes, the partition I was checking was mounted (I know I shouldn’t, but it wasn’t working, and it was a VM, so rebuilding wasn’t challenging). After running that command and restarting the VM, everything came back up flawlessly. This wound up being a 7-hour endeavor overall, with random issues in between that needed to be resolved.
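A minimal sketch of the two checks described above, written in Python: scanning log output for filesystem trouble, and confirming whether a partition is mounted before pointing fsck at it. The function names, the keyword list, and the sample lines are all assumptions made up for illustration. Real kernel messages vary by filesystem and version, so treat the keywords as a starting point only.

```python
from typing import Iterable, List, Sequence

# Assumed keywords, not an exhaustive or authoritative list.
FS_HINTS = ("orphan", "i/o error", "read-only")


def find_fs_hints(lines: Iterable[str], hints: Sequence[str] = FS_HINTS) -> List[str]:
    """Return the log lines that mention any hint (case-insensitive)."""
    lowered = [h.lower() for h in hints]
    return [
        line.rstrip("\n")
        for line in lines
        if any(h in line.lower() for h in lowered)
    ]


def is_mounted(device: str, mounts_text: str) -> bool:
    """True if `device` is the first field of a line in /proc/mounts-style text."""
    for row in mounts_text.splitlines():
        fields = row.split()
        if fields and fields[0] == device:
            return True
    return False


# Made-up sample data, not captured from a real system.
sample_log = [
    "kernel: EXT4-fs (sda1): orphan cleanup on readonly fs\n",
    "systemd: Started Session 1\n",
]
sample_mounts = "/dev/sda1 / ext4 rw 0 0\n"

for hit in find_fs_hints(sample_log):
    print("possible filesystem problem:", hit)

if is_mounted("/dev/sda1", sample_mounts):
    print("/dev/sda1 is mounted; check it from recovery or live media instead")
```

On a real system the inputs would come from the output of journalctl -r or dmesg and from the contents of /proc/mounts. The mounted check is there because repairing a mounted filesystem is the risky part of the story above.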

My takeaways: Like physical drives, virtual drives are subject to their own kinds of issues (in our case, we shifted entire file structures, so it’s kind of on us). Checking a disk isn’t a huge problem, and I should probably include it in my initial troubleshooting whenever a partition won’t mount. Occam’s razor is a complete pain. Finally, I need to start approaching things with the mentality that I entered this with, but I find myself constantly unable to apply it universally, as it seems my will to troubleshoot certain items is quite limited. So, how do my fellow Spiceworks community members overcome challenges in their day-to-day life? Is there a special thought process you follow? March unto death? “If I don’t get this project done I’ll be fired”? “This issue is more irritating than my significant other”?

How do you avoid burning out with troubleshooting a specific issue? When you don’t know an environment, how do you approach it?

Link to the TEDx talk by Mark Rober: https://www.youtube.com/watch?v=9vJRopau0g0

@sean-spiceworks

51 Spice ups

Hey Sean, I tagged you because I wasn’t really sure where to put this…

1 Spice up

Haha, thanks! I was trying to figure out why at first, until I saw your reply. I added this to Best Practices with a little dash of Careers… I figure this will bring the topic to the attention of those who would be interested.

3 Spice ups

Thank you. I meant to tag you in the comments first rather than in the actual post… Apologies!

1 Spice up

OSI? I mean depends on the problem lol

4 Spice ups

@alexw – Is OSI how you approach things most commonly? Have there been unique situations in your line of work where the issue should’ve happened at a certain layer but was in an entirely different layer? Perhaps T1 escalates a computer not booting and it is simply the user not having it plugged in? How do you recalibrate, if at all, after something like that?

Edit: Okay, maybe a crappy example; this can be ruled out in a few different ways, but I have seen T1 escalate a ticket because they hated the end user.

[Image: meme]

18 Spice ups

Love it, thank you for the meme.

I don’t understand your question?

Yes, that’s why you take whatever T1 ‘escalated’ and still start from Layer 1: is it plugged in?

Back when I used to work for an MSP, we literally had a tech drive 45 minutes because the users could not figure out what ‘plugged in’ means.

Was it plugged in? Yes but no, it was this:
[Image: the power strip in question]

Some cleaning person unplugged the strip and the user plugged it ‘back in’.

14 Spice ups

I guess what I mean is: say you overcomplicated something (which it sounds like you definitely try not to do). After you find the solution to the overcomplication, how do you avoid facepalming again? Or is following the OSI model from start to finish the result of that?

The OSI model helps you ‘not to overcomplicate’ as it goes layer by layer.

And you go bottom up.

Does it work in every scenario and situation? No, but it at least gives someone like T1 something they can follow, which should lead to less confusion.

I mean, you can’t. You can try to educate the user, but that usually does not work. It might be better to do both: try to educate the user and at the same time educate the tech to ask better questions and give better instructions.
If a T1 does not recognize that the computer is not plugged in and labels it as a no-boot, it sounds like the T1 needs some training or some framework so they can troubleshoot better. The OSI model helps with that.
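As a rough sketch of the bottom-up idea in Python: run the checks in layer order and stop at the first one that fails. The check names and the `facts` dictionary are made up for illustration. This is not a real diagnostic tool, just the shape of the process.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A check is a label plus a function that returns True when that layer looks fine.
Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks lowest layer first; return the label of the first one that fails."""
    for name, check in checks:
        if not check():
            return name
    return None


# Hypothetical answers gathered from the user or the machine.
facts: Dict[str, bool] = {"plugged_in": False, "link_up": True, "has_ip": True}

checks: List[Check] = [
    ("Layer 1: is it plugged in?", lambda: facts["plugged_in"]),
    ("Layer 2: is the link up?", lambda: facts["link_up"]),
    ("Layer 3: does it have an IP address?", lambda: facts["has_ip"]),
]

print(first_failure(checks))  # prints "Layer 1: is it plugged in?"
```

Because the list is ordered, a ‘no boot’ ticket never reaches the higher layers until the power question has been answered.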

4 Spice ups

You nailed my process:

  1. check the logs

  2. google the log entries

The only way to improve the process would be reading the manual. :wink:

4 Spice ups

I start with what I know and work down. Pretty much like the others said: OSI it until I work out what is wrong.

Some of the time you can take an educated guess, but a lot of the time you are working blind, especially if you are the only person ever to have the issue, or the software community is so small that you may as well be on your own :frowning:

2 Spice ups

All I want to know is how the light on the power strip is on.

13 Spice ups

When I’m struggling with a new issue I’ll usually do as much research as I can and try the top 3 solutions.

I will sometimes use the OSI model to troubleshoot.

With this I have so far only been defeated twice - and a fresh reinstall fixed the issues, so technically a win!

1 Spice up

The usual, “No one in the office touched it. Must have been housekeeping/maintenance, etc. Our users would never do something like that…”

@alexw

2 Spice ups

I wait for it to start speaking to me. I then work from there. I will take breaks depending on the severity of what it is I am working on. The most important thing to do is not panic.

3 Spice ups

[GIF: umm, wait]

One has to wonder how exactly these people make it through life every day if their thought process (or rather, lack thereof) works like this.

2 Spice ups

Does this really need to be fixed?

What are the symptoms? Are they normal? Google them

What is the box whining about? Google it

Who was the last guy to touch this? Ask them

Is there something terribly obvious that I’m missing here?

What possessed me to choose this career? : )

2 Spice ups

  1. Rule 1 - Never assume anything; check it all yourself. And by check, I mean check. Just because the power cord goes toward the outlet and there are ends of cords that look right doesn’t mean it is plugged in. As best you can, follow the cord. Check connections on power bricks; check everything you can.
  2. Rule 2 - Keep it simple. It is easy to get caught up in the complex world of this, that, or the other thing.
  3. Rule 3 - Should be much higher on the list, but breathe. The more stressed out and panicked we become, the less effectively our brain operates.
  4. Rule 4 - Take a moment and focus. For me that is praying. For others, meditation. For others still, something else. Take a moment to pause and get focused.
  5. Rule 5 - One step at a time.

I have found this to work for problem solving just about anything.

3 Spice ups