Confessions of an IT Director

When Murphy Strikes.

You can’t plan for everything. You can plan contingencies, keep a backup, a backup for your backup, documentation stacked to the ceiling… but something is going to go wrong.

This has been my experience over the last few weeks/months. As noted before, I was brought into my current position about a year ago to “wear the cape” and be Superman for the organization. I needed a job, so I relished the opportunity. “This is a target-rich environment,” the CEO said to me in my final interview, where he kept me for nearly three hours giving me a tour of the enterprise. “Plenty to do here,” said the CFO (who has since left the company). They were right, of course. But the truth of the matter is that they didn’t know what they didn’t know.

Ultimately, that’s the fundamental difference between the business and IT, and it’s what I’ve been trying to define and disseminate, through networking and this blog, to anyone who will listen. The organization doesn’t know what it doesn’t know, and it expects IT (and in my case, the IT Director) to cover that gap. But we in IT are just like everyone else in the organization, only with a different core proficiency. A blurring of the lines occurs because the IT organization’s function is never explicitly defined. Scope creep sets in quickly, and as the organization begins to understand that every operation it performs (especially in these times) depends at least on the backbone of an IT infrastructure, IT begins to permeate every aspect of the business. Operations, Finance, Reporting, Marketing, Administration, HR… IT has a hand in them all.

Furthermore, organizations tend to see IT costs as a burden on the bottom line, and the expectation placed on IT leadership is that we will find ways to save money through consolidation, cuts, or otherwise. So how do we simultaneously get leaner and prepare for Murphy? How do we cut costs yet advance the company technologically?

I wrote about this in my last post – and this is nothing if not an expansion of that topic. Whatever can go wrong, will go wrong. This is especially true in IT. This is why we have jobs. This is why we are compensated. This is why Gartner reports consistent upward growth in IT spending.

(https://www.gartner.com/en/newsroom/press-releases/2019-10-07-gartner-says-global-it-spending-to-grow-06-in-2019)

However, I’d wager that much of this spending is reactionary rather than precautionary… at least that’s been my experience in the enterprises I’ve been a part of in my career. Capital (IT) expenditures are, in my experience, a direct result of a system-wide failure that forces an enterprise to make an investment (or change), not a direct result of planning for the future. I’d also wager that the companies that invest in their IT preemptively are better prepared for when (not if) Murphy strikes.

Case in point: I’ve been running an ERP upgrade (we’re five iterations behind the latest version, so this “upgrade” is more of a re-implementation) over the last three months. Despite the organization’s mandate that it be completed in 30 days, a lack of resources pushed it to 90. I’ve created the project plans and testing schedule, coordinated with the resources doing the testing, and sat in on the pilot group’s work. From hardware failures to unknown compatibility issues to other fires to put out, an ERP implementation is not without its challenges.

So you document, you plan, you make backups, you convert, you pivot, and you report. You try to foresee every possible thing that could go wrong, and you plan for it.

Until you don’t.

I had spent weeks working with a pilot group in the new system, having them spend hours of their day running transactions through the new ERP to verify that they could do their work and that everything functioned properly. Day after day another user would meet with me and spend their time working in the system to make sure there would not be a problem come go-live. For almost three months we kept this up, and I documented, planned, tested, and retested, all while maintaining my regular duties managing and providing direction for the IT organization.

In retrospect, however, there was a (pretty big) gap I didn’t count on. This is a learning opportunity – so pay attention.

I’ve come to realize that end users are typically worker bees. They aren’t paid enough to think critically about a system, and they approach their tasks almost robotically, following the work instructions. Where I was testing functionality with end users, I failed in the analysis. I took for granted that if an end user (and their supervisor) signed off that something was functioning, it was, without verifying that the cross-functionality between tasks worked properly. In a cleanroom environment, Person A’s task might function properly and Person B’s task might function properly, but Person B doesn’t know how to analyze Person A’s work and see how it affects them.

Ultimately, I trusted… and that was my biggest failure. This is why IT folks are cynics.

After a myriad of delays, I finally got sign-off from executive leadership and we turned up the new system. We went live on a Sunday, because our Australia office runs a day ahead of us, and everything seemed smooth. I celebrated, already picturing a virtual ticker-tape parade in the office.

In fairness – the Monday after go-live started that way… I was hailed as a hero as everyone lauded the speed, consistency, and efficiency of the new system. They didn’t know what they didn’t know. I threw my hands up in celebration, and the VP of Ops gave me a pat on the back for how great things had gone.

Then Murphy showed up, in the form of our Planning Manager. She came to work around 8am (I had been there since 5:40am to account for our operations in other parts of the country), and within an hour of work she had discovered a critical problem.

Here’s the thing, and the thing I keep having to learn: worker bees don’t know WHY there is a problem, they just know they aren’t able to complete their task. They don’t know what they don’t know.

To spare you the technical details: a portion of an add-on we use with our ERP was purchased in the last year by another vendor from the original developer. The new vendor didn’t know that this piece of the add-on wasn’t compatible with the 2019 version of our ERP, and released it anyway. Since only our Planning Manager uses this portion of the add-on, we didn’t know it was a problem until she came in and began trying to release work orders. The ERP didn’t recognize any of the locations where material was being held, so while she could release work orders, the workers on the floor would have no way of knowing where any of the material actually was. It was a mess.
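In hindsight, this is exactly the kind of mismatch a simple data-reconciliation check against the migrated environment could have surfaced before go-live. Here’s a minimal sketch of the idea, assuming a copy of the migrated data sitting in a database; the table and column names (work_order_lines, location_master, and so on) are hypothetical stand-ins, not our ERP’s actual schema:

```python
# Hypothetical pre-go-live reconciliation: compare the inventory locations
# referenced by open work orders against the location master in the upgraded
# environment. Table and column names are illustrative only.
import sqlite3  # stand-in for whatever database the ERP actually uses


def find_orphaned_locations(conn: sqlite3.Connection) -> list[str]:
    """Return location codes used by open work orders that the new
    environment's location master does not recognize."""
    cursor = conn.execute(
        """
        SELECT DISTINCT wo.location_code
        FROM work_order_lines AS wo
        LEFT JOIN location_master AS loc
               ON loc.location_code = wo.location_code
        WHERE wo.status = 'OPEN'
          AND loc.location_code IS NULL
        """
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    # Point this at a copy of the migrated data, not production.
    conn = sqlite3.connect("upgraded_erp_copy.db")
    orphans = find_orphaned_locations(conn)
    if orphans:
        print(f"{len(orphans)} location(s) unknown to the new system: {orphans}")
    else:
        print("All work-order locations resolve in the new environment.")
```

A check like this costs an afternoon to write and run against the converted data; it would have flagged every location code the new system couldn’t resolve before anyone released a single work order.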

A quick meeting with the VP of Ops (who had previously patted me on the back) and the consultant ended with a decision to revert to our previous environment, essentially flushing 17 hours of operations down the drain (there was no way to recover anything done in the new system). And as Director of IT, it was my responsibility to let everyone, enterprise-wide, know not only that the go-live was a massive failure and that we had to go back, but that everything everyone had done in the last 17 hours, every product made, every transaction, invoice, sales order… everything… would have to be redone.

Persona Non Grata. 

Seconds after hearing of this catastrophic failure, my boss (the VP of Finance) called my personal cell… he wasn’t in the office yet and apparently someone had called him to tell him that “all hell was breaking loose”.

Snitches end up in ditches, I thought.

Needless to say, he was not happy, and I got an earful for the next couple of days.

Thing is, this function was tested… in a clean environment. The Planning Manager signed off on this function. Why, oh why, then, did it fail in production? Because I tested everything individually, but not collectively. In checklist style, I had every department go through their functions and sign off, and then, like a conveyor belt, brought in the next department to test, never testing how work would flow THROUGH the system.
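If I had to illustrate the difference, it would look something like this. Everything below is a toy stand-in (a fake in-memory ERP), not our actual ERP’s API; it’s only meant to show how a checklist-style test passes while a flow-through test catches the gap:

```python
# A rough sketch of the gap: each department's task passes in isolation, but
# only the flow-through test checks whether downstream can act on upstream's
# output. The FakeERP class is a hypothetical stand-in for a test environment.
from dataclasses import dataclass, field


@dataclass
class FakeERP:
    """Toy stand-in for the upgraded ERP test environment."""
    known_locations: set = field(default_factory=lambda: {"WH1-A01", "WH1-A02"})
    work_orders: dict = field(default_factory=dict)

    def release_work_order(self, wo_id: str, item: str, location: str) -> None:
        # Planning's step: releasing succeeds even if the location is bad,
        # which is exactly what happened to us in production.
        self.work_orders[wo_id] = {"item": item, "location": location}

    def pick_material(self, wo_id: str) -> str:
        # The floor's step: picking needs a location the system recognizes.
        location = self.work_orders[wo_id]["location"]
        if location not in self.known_locations:
            raise LookupError(f"Unknown location {location!r} on {wo_id}")
        return location


def test_planning_in_isolation():
    # Checklist-style test: planning's task "works", so planning signs off.
    erp = FakeERP()
    erp.release_work_order("WO-1001", item="WIDGET-100", location="OLD-LOC-9")
    assert "WO-1001" in erp.work_orders


def test_work_flows_through_to_the_floor():
    # Flow-through test: the floor has to be able to act on what planning
    # released. This is the test I never ran, and the step that failed live.
    erp = FakeERP()
    erp.release_work_order("WO-1001", item="WIDGET-100", location="OLD-LOC-9")
    erp.pick_material("WO-1001")  # raises LookupError -- the gap shows up here
```

Run under something like pytest, the first test passes and the second blows up immediately, which is exactly the signal I never gave myself a chance to see before go-live.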

The fault could lie with several people. The end user didn’t recognize the potential problem during testing. I didn’t catch the discrepancy in inventory after the work-order completion. The vendor released a piece of software that was incompatible (and didn’t know it)… but it doesn’t matter. While all those smaller issues will be addressed, the responsibility for a failed implementation (and 17 hours of lost production) lies on my shoulders.

If you don’t fail, however, you don’t learn. 

To make a longer story even shorter, we pivoted. We shifted gears when Murphy took out one of them. I’m in the midst of deploying a stable release, setting an aggressive testing schedule, and targeting a go-live date two weeks from now. All is not lost – much of the legwork done in the three months leading up to the (failed) go-live is still useful and will be reused.

My armor is blemished, but I’m still standing. My leash is shorter and eyes are watching me more closely, but I’m still standing. I could be writing this from a Starbucks as I search for the next fork in my career path, and while that may still come, for now I’m pushing forward and learning from my mistakes.

Because if there’s anything I’ve learned to this point, it’s that, just like the rest of the organization, IT doesn’t know what it doesn’t know. My failure was in assuming that others actually knew what they thought they knew.


I won’t make that mistake again.

4 thoughts on “When Murphy Strikes.”

  1. Reblogged this on The Strategic Sync by Fahim Moledina and commented:
    Great post on the struggles of implementation and working through testing. Having the right users is very important, as is getting correct feedback so you can reach your end-user goals. The lessons learned, the pivot, and the agile way of working are great examples of that here.


  2. Brigham, I appreciate your candor, honesty and humility. I have found these blogs have a much wider application outside the so-called “IT world.” This one in particular! But so much of the impact of your writing is because of your honesty and humility. Know that I am praying for you as you “push forward” in your job/career. – Carl


  3. I was just talking to a friend of mine the other day regarding how IT capital expenditures are almost always reactive to a disaster.

    Many IT certifications now include test objectives on how to effectively communicate risk to non-technical, cost-averse leaders – which indicates just how common this problem is. The kicker is that it’s always IT’s fault; if the CEO decided not to heed your advice, then you didn’t do a good enough job of communicating the risk of inaction. Many of these test objectives include obtaining sign-off from stakeholders, which is probably the thing that could save your job in this case.

