Incidents and Operational Resiliency - Why it Matters and What to Consider


News category: Programming
Source: dev.to

Written by Thomas Hoffmann

Utilizing technology and new work methods to save money

Introduction

Are you prepared for an IT incident? If a core component of your infrastructure suddenly failed or data were lost, whom would you contact? About two-thirds of German mid-size companies have no answer to these questions [1] - even though being prepared could save enormous costs.

Facts and Data

According to a recent study, an average IT outage in a mid-sized company (200-5,000 employees) costs about 25,000 euros per hour. On average, German companies experience up to four such outages per year, each lasting about 3.8 hours - causing annual economic damage of roughly 380,000 euros! [1]
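The annual figure follows from multiplying the three numbers quoted from [1]. A minimal sketch of that arithmetic is shown here; the function name `annual_outage_cost` is purely illustrative, and the model assumes every outage has the same duration and hourly cost.

```python
def annual_outage_cost(cost_per_hour: float,
                       outages_per_year: float,
                       hours_per_outage: float) -> float:
    """Estimate the yearly outage cost in the currency of cost_per_hour."""
    return cost_per_hour * outages_per_year * hours_per_outage


# Figures quoted in the text from [1]: 25,000 Euro/hour, 4 outages, 3.8 hours each.
estimate = annual_outage_cost(cost_per_hour=25_000,
                              outages_per_year=4,
                              hours_per_outage=3.8)
print(f"Estimated annual damage: {estimate:,.0f} Euro")
# prints "Estimated annual damage: 380,000 Euro"
```

Plugging in your own hourly cost and outage history gives a first rough number to weigh against the cost of preparation.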

There are many reasons for the high level of damage: even though only one third of registered outages have any impact on customer operations [2], internal disruptions can also lead to widespread loss of productivity.

Causes and Mitigation

A process review can offer the greatest remedy: a full 20% of outages can be traced to poor process adherence [2]. Here, the root cause must be examined carefully: the reason for the process deviation is an important clue as to what can be improved. With hybrid and remote work environments, the requirements for processes change as well. "People before process" is a helpful mantra that keeps the focus on adjusting processes to the needs of your staff. This does not mean that "sensitivities" should dictate workflows, but that processes should support employees in their work as much as possible and not hinder them.

Technology can help as well: hyperscalers such as AWS in particular offer a plethora of ways to react to outages and errors - be it through smart monitoring and alerting or even automatic error resolution, for example by restarting a particular service or machine.
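The idea of monitoring, alerting and automatic restart can be sketched in a provider-independent way. The following is a minimal illustration, not an AWS API: `check_health`, `restart` and `alert` are hypothetical callables you would wire to your own tooling, and the thresholds are assumed values. A real loop would also wait between checks.

```python
from typing import Callable


def watchdog(check_health: Callable[[], bool],
             restart: Callable[[], None],
             alert: Callable[[str], None],
             checks: int,
             failure_threshold: int = 3,
             max_restarts: int = 2) -> int:
    """Run `checks` health checks and restart after repeated failures.

    Returns the number of automatic restarts that were performed.
    """
    consecutive_failures = 0
    restarts = 0
    for _ in range(checks):
        if check_health():
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures < failure_threshold:
            continue  # tolerate short glitches before acting
        if restarts >= max_restarts:
            # Automation has done what it can; hand over to a human.
            alert("automatic restarts exhausted, escalating to on-call")
            break
        alert("service unhealthy, restarting")
        restart()
        restarts += 1
        consecutive_failures = 0
    return restarts


# Simulated health results: one healthy check, three failures, then recovery.
results = iter([True, False, False, False, True])
messages: list[str] = []
print(watchdog(lambda: next(results), lambda: None, messages.append, checks=5))
# prints 1
```

Capping the number of restarts is a deliberate choice in this sketch: a service that keeps failing after being restarted usually needs a person to look at the root cause rather than another restart.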

The choice of cloud provider is therefore the first factor in a resilient infrastructure: AWS is one of the few cloud providers that has guaranteed the physical separation of its availability zones since its early years, thereby establishing geographic redundancy. Microsoft Azure only established mandatory physical separation in 2018 [3], and Google Cloud Platform still does not guarantee significant physical separation of its zones, even though a 2023 incident of its own famously showed why such separation makes sense [4].

The services and technologies used are also a key factor in achieving resiliency: smart monitoring and logging, properly configured autoscaling and established error management already go a long way.

Finally, a missing or unknown disaster recovery concept is another reason for long-lasting outages. While a prepared company can ideally restore basic operation at the push of a button, unprepared ones often have to take inventory first to see what actually needs to be worked on before operations can be restored.

A well-known scenario in which preparation pays off directly is a ransomware attack. Such an attack not only affects your applications, but also renders all company data inaccessible. A well-architected cloud infrastructure with protected (tamper-proof) backups can save significant amounts of time and money in this case: affected applications can be quickly terminated and well-rehearsed data recovery operations can reduce any data loss to an acceptable level.
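What counts as an "acceptable level" of data loss can be made concrete by comparing the age of the last intact backup with a tolerance you define in advance. The sketch below assumes such a tolerance exists; the function names, timestamps and tolerance values are hypothetical and only serve as illustration.

```python
from datetime import datetime, timedelta, timezone


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Time span of data that is lost when restoring from last_backup."""
    return incident - last_backup


def within_tolerance(last_backup: datetime,
                     incident: datetime,
                     max_acceptable_loss: timedelta) -> bool:
    """True if restoring the backup keeps data loss within the tolerance."""
    return data_loss_window(last_backup, incident) <= max_acceptable_loss


# Hypothetical example: backup taken six hours before the attack was noticed.
incident = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_backup = datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc)

print(within_tolerance(last_backup, incident, timedelta(hours=24)))  # True
print(within_tolerance(last_backup, incident, timedelta(hours=1)))   # False
```

Running a check like this regularly, before any incident, shows whether the backup schedule actually matches the loss the business says it can tolerate.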

Conclusion

Incidents and outages are expensive, even more so with a growing number of employees and/or customers. Dealing with an outage ad hoc is error-prone and takes a lot of time. Being prepared, by drawing on current data from industry and research and by establishing change, incident and recovery procedures, helps to avoid incidents or at least keep them short. To prepare, all parts, processes, infrastructure and threat models of your production chain should be considered.

Make our Expertise Your Own!

As an AWS strategic partner, kreuzwerker can call on many years of experience to support you on your journey to resiliency: from best-practice and process reviews and optimization, to expert knowledge of various AWS technologies, observability and Elasticsearch, to orchestrating your microservice deployments on Kubernetes. We are happy to support you to the best of our ability.

Don't hesitate to get in touch if you wish to review your infrastructure and processes with regard to resilience - we look forward to working with and supporting you on your resiliency journey!

[1] https://digitalisationworld.com/news/27800/hp-studie-it-systemausf-auml-lle-kosten-deutsche-mittelst-auml-ndler-im-durchschnitt-fast-400000-euro-pro-jahr

[2] Uptime Institute, Annual Outage Analysis 2023

[3] https://azure.microsoft.com/en-us/blog/azure-availability-zones-now-available-for-the-most-comprehensive-resiliency-strategy/

[4] https://status.cloud.google.com/incidents/dS9ps52MUnxQfyDGPfkY
