
IT outage that paralysed Southern Trust electronic care record was down to human error

A second-stage review, described as “a second pair of eyes”, will help minimise the risk of human error disrupting the new encompass electronic care record again

The root cause of the major IT outage that paralysed the new ‘encompass’ electronic care record last September has finally been revealed, with Southern HSC Trust board members being told at their January 29 meeting that it was essentially down to human error.

The new encompass system, which had been operational for just over four months (since May 8), went down for one working day on September 17, meaning that appointments for about 1,600 patients had to be rescheduled.

Addressing the board of the Southern Health and Social Care (HSC) Trust last Thursday, January 29, director of Planning, Performance and Informatics Elaine Wilson explained that she was now in a position to outline what had gone wrong on September 17.

She commented: “The paper presented is the final report of the findings and recommendations of the Incident Review Group that was set up to review the major incident that occurred on September 17, 2025, when we had a major outage that lasted for one working day.

“Professor [Graham] Evans has independently chaired the Incident Review Group. He’s going to run us through his findings.

“I’d like to reiterate the Trust’s regrets in relation to the impact the outage had on our patients and service users, due to the cancellation of appointments.

“All these appointments have been rescheduled.”

Professor Evans, who was attending the meeting remotely, told board members: “I’m going to focus on the technical root cause, and perhaps the reason why that happened.

“Digital transformation is not just about technology, it’s about people, process and technology.

“The main technical root-cause was reported as a human process error. An individual was working on some critical infrastructure, undertaking a process that had happened several times previously, but on this particular occasion was working on the active data centre, rather than the inactive data centre.

“As a result of that, when the software change was applied, we had the resulting system outage.

“Human error does take place, but what we can do is put some checks and balances, some safety nets within our process to minimise the chance of these things occurring again.

“I’m very careful using my words to ‘minimise’ rather than ‘prevent’, because it’s very difficult to completely eliminate human error, but the process and the recommendations that we’ve come up with is to implement a second-stage review, a second pair of eyes where another person within that process can double-check that the actions that are about to be taken are the correct actions.

“And we believe, in this particular instance, working on critical infrastructure, that this step will minimise the risk of a recurrence.

“In addition, what we’ve done, working with vendors within the supply chain, is to implement better and strengthened change controls.

“So, rather than reaching out to the supply partners during this process after the event, we want to be more proactive and involve them in the planning and preparation of critical infrastructure change.

“So, the second pair of eyes, [as well as] being more proactive, are really the key recommendations that have now been implemented, and will strengthen the learning as we go forward.

“We’ve also considered strengthening the Trust’s governance.”

Professor Evans went on to praise all involved for being so focused on restoring the new IT system so quickly: “From my experience of these types of situations, to get critical infrastructure back up and running within a business day was no mean feat. It was a real Herculean effort from all of the Trust and its partners, and during that time there was a laser-like focus on patient safety and patient care.

“So, I would like to commend the Trust and its partners in their response during that particular event. It was highly professional and patient-focused at all times.

“The report has taken quite a number of months to conclude, but what I would also stress is that we haven’t waited until the report is concluded.

“When we’ve identified issues, we’ve implemented corrective actions in real time to strengthen governance, technical systems, and so on.

“So, I’m extremely impressed with the professionalism, and the focus of the Trust and its stakeholders.

“I would like to put on record my thanks to everyone that’s been involved in this review.

“All the workstream leads and the colleagues that have worked alongside me to help me understand what happened and why it happened, have made my task a little more palatable for all things concerned.”

The Trust’s director of Planning, Performance and Informatics explained that lessons would be learned from the incident, and that the recommended measures would be fully implemented.

She told board members: “While we have started with the implementation of the recommendations, they’re not all complete, so we will be bringing an action plan to our senior leadership team to set out all the recommendations that were made, where we are in terms of the progress in implementing those, and what else we still need to do.

“Some of them will take a little bit more time, but all the recommendations we will be taking forward.

“One of the things to mention in relation to the technical side of things, the technical recommendations, we have now implemented all of those.

“As well as having that second pair of eyes, if something does happen, we have the team there already proactively engaged.

“We are also proactively informing staff. Since September 17, we’ve had three occasions where we’ve had to do maintenance to that core infrastructure.

“There will be a maintenance window. The purpose of that maintenance window is to make sure that we remind people about their business continuity arrangements, and to remind the staff that there could be an impact on the running of that system.

“We had a maintenance window on Tuesday evening. The system was running a little bit slow on Tuesday evening, and we had informed the staff that this was likely to happen, to ensure that they got their business continuity arrangements in place.

“Because we are so much more reliant on digital systems now, it has a bigger impact when the system starts to run a bit slow.

“In terms of the work that was done on the day [of the outage], [there was] an absolutely significant effort to get things running back up again.

“The general findings of the review group were very positive in terms of how the Trust responded.”
