How to break out of the thread of doom

Have you ever been stuck in an online conversation that’s going nowhere? Here are three techniques to break out of the ‘thread of doom’: the rollup summary, asking obvious questions and increasing the bandwidth of the conversation.
July 28, 2021

Have you ever been stuck in an online conversation that’s going nowhere?

Nothing derails productivity like a long email thread. I’m not even talking about the ones where everyone hits ‘reply all’ to tell each other to stop hitting ‘reply all’ and the company slowly descends into lawlessness and chaos. I’m talking about the quiet erosion of motivation that occurs when people don’t quite understand each other, and nobody is getting what they need from the conversation.

How it starts

We’ve all been there. You need some information, so you send someone an email. They reply with the wrong thing and also an obtuse comment that you don’t understand but you feel that you should. You ask for clarification and they reply with a wall of text and a question that doesn’t seem exactly relevant. It takes time and mental energy to parse what they’re saying. You wring some content out of their treatise, you respond to their question, you try to phrase your need a different way. And they reply with a different question. What is happening?

It’s not just email. I’ve witnessed similar conversations happen over Slack, in group DMs and even in the comments box of bug tracking tickets. Despite their best intentions, people talk past each other. One person insists on speaking in a passive voice that drops context, another isn’t comfortable admitting (maybe even to themselves) that they don’t know how a piece of technology works. Two different teams use different words to describe the same system (one team’s backend can be another’s frontend), and several terms have completely different meanings in different domains. This results in everyone talking past each other like the tech equivalent of the Who’s On First sketch.

An example

I was incident commander once for an escalation where the oncallers for two teams appeared to be willfully misunderstanding each other. Their desks were maybe 15 ft apart, but they were talking on Slack, so the rest of us could witness the confusion unfold. An internal service had been down for a few hours. Nothing customer-visible was happening, but engineers wanted to use the service; the cause of the outage was murky and, as the incident commander on call, I’d offered to help coordinate getting it back online.

But what was happening? It took some time to understand. An infrastructure team had been upgrading one of their components but the upgrade had hit unexpected snags and they’d asked for help reducing load on it while it recovered. The civic-minded owners of the internal service knew that 30 minutes of downtime wouldn’t be a big deal, so they’d sent out a notice to their users and switched off their service. That had been two hours ago.

The team running the internal service was still waiting for the all clear but, unusually for them, the infrastructure team appeared to have no sense of urgency. When asked on Slack about the upgrade, the infrastructure lead went into deep technical detail about the parts that had worked, the parts that had been surprising, the performance optimizations they’d enabled, and the interesting lessons that the team had learned as they improved the system.

“But why did the internal service need to be off?”

“Well, it’s easier to do upgrades when the service is quiescent.”

“But isn’t the upgrade finished now?”

“Mostly, thanks for asking, though the post-upgrade sync is still underway.”

“Should that affect the internal service?”

“Interesting question: yes it should benefit from the upgrade because the new version has improved latency, if–” and here the domain-specific details of the infrastructure system began to fill up Slack.

Back and forth it went, with the internal service still down and other engineers starting to ask pointed questions, and you can see how this began to feel ridiculous.

There are three techniques that can help break out of the thread of doom. It’s probably no surprise that they’re all about communication.

The art of the rollup

Denise Yu wrote a great Twitter thread a while back where she said that an engineering leader’s most useful skill is to create clarity and reduce chaos. She described a tool for achieving this, which she called ‘the art of the rollup’: read all of the backscroll and then update the thread with the clearest possible summary of what’s going on. Her example: “To summarize: the problem is X. Possible paths forward are A, B, C. Sounds like we’re leaning towards A. Have I missed anything?”

The rollup resets the thread at a new point, so that anyone who has not read it won’t need to go through the entire backscroll. It also provides a call to action and a next step forward to unblock the discussion. When you say “we’re leaning towards A”, this encourages everyone to move in a similar direction, and there’s some chance of getting out of the conversation alive.

Maybe this is a stupid question but…

When the conversation is really murky, another superpower can be asking questions – and sometimes they need to be embarrassingly basic ones. When a good incident commander comes on the scene of an outage, their first question tends to be, “what’s the customer impact?” Some engineers might be indignant about that. “I already told you: the foobar database is down and P95s to the bazqux service are at 800ms!” It can feel awkward to admit that you don’t know how bad that is but, chances are, other people watching don’t know either.

Sometimes the only way out of the thread of doom is by being the person who is willing to admit they don’t know all of the systems. It is often much ‘safer’ for senior people to admit this kind of ignorance than it is for anyone else. With great power comes great responsibility: the responsibility to ask the obvious questions!

Change the channel

As Mark Dunne says, we’ve gotten really good at snarking about the meetings that should have been emails, but we’re still terrible at noticing that we’re in a never-ending email thread that could have been a 15-minute meeting. Sometimes the secret is starting the conversation again in a different place.

That was the solution for the outage I described above. On the day of that particular thread of doom I was sitting upstairs in the same building as these two engineers, watching on Slack as they talked past each other. Eventually, I slammed shut my laptop, walked down the stairs, stood in the space between them and… I’d like to say I used a calm, collected tone but I expect my exasperation leaked out… asked “WHAT IS HAPPENING?” The two engineers restarted the same conversation they’d been having for an hour, but this time they were face to face. The higher bandwidth made it easier to realise that the infrastructure lead didn’t know the internal system was still off. From his point of view, he’d told everyone hours ago that the sticky part of the upgrade was done, so all of the other teams should have known that they could turn their services back on. From everyone else’s point of view, he’d delivered a bunch of technology-specific facts that didn’t tell them anything. Context is everything.

The increased bandwidth of a face-to-face conversation can often add the clarity that is needed to quickly resolve the problem. This is particularly true when emotions are running high: it’s very easy to misread tone, especially during the stress of an incident, and a heated interaction can quickly become friendly again once everyone is talking openly to one another.

Changing the channel can take other forms too. It might mean moving a conversation from a Jira ticket or a comment thread onto Slack, for example. It can be paired with a rollup: spilling a thread into a set of bullet points in a document that multiple people can edit at once. And if you ask a few more obvious questions along the way, that can be even more effective.

In conclusion

Whether it’s rollups, asking obvious questions or increasing the bandwidth, if you can bring clarity to a conversation you can give everyone back hours of their day. It’s a good leadership skill to be able to recognise when the group is mired in a thread of doom and do whatever it takes to set people free.

Thanks to Katrina Sostek and Anne (juniper) Cross