The term cyberresilience is defined as “[T]he ability of a system to continue to operate under adverse conditions or stress, even if in a degraded or debilitated state, while maintaining essential operational capabilities and recover to an effective operational posture in a time frame consistent with mission needs.”1 The term is also frequently used to refer to the overall organizational ability “…to protect electronic data and systems from cyberattacks, as well as to resume business operations quickly in case of a successful attack.”2 Understanding the best practices for enhancing cyberresilience outlined in the US National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF)3 and COBIT® enables organizations to better protect critical enterprise applications and helps limit potential damage from cyberbreaches.
The NIST CSF groups cybersecurity processes and activities into 5 high-level categories (functions) that can aid organizations in creating a structured approach for securing IT systems.4 The categories are identify, protect, detect, respond and recover. NIST further defines 8 cyberresilience objectives that can be attributed to these categories.5 These objectives can be achieved using various techniques, including relevant COBIT management practices and activities.
1. Understand the Context, IT Systems Criticality and Risk Factors (Identify)
The organizational, architectural, operational and threat contexts drive resilience requirements and help reveal potential attack vectors and cybersecurity risk exposure. COBIT governance practice Evaluate, Direct and Monitor (EDM) EDM03.01 Evaluate Risk Management suggests that an organization determine its risk appetite (the level of risk that the organization is willing to take to achieve its objectives) and its risk tolerance levels (temporarily acceptable deviations from the risk appetite). Several factors increase the level of risk for the organization and its IT systems, including the sensitivity and volume of processed data, the criticality of provided services, the number of users, connectivity to public networks and reliance on third parties. Overall, the criticality of an IT system is driven by its importance for the continuity of the business processes or services it supports. The more critical the system, the more resilience measures should be considered for its protection. In practice, the criticality of a system can be assessed by projecting the financial losses that may be incurred as a result of system outages. When assessing the criticality of a system, it is also important to understand the impact of its failure on neighboring applications located upstream or downstream in the corresponding business workflow.
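The criticality assessment described above can be illustrated with a minimal Python sketch. It is not a COBIT or NIST method: the system names, the hourly loss figures and the assumption that every downstream application is unavailable for the full duration of the outage are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    name: str
    hourly_loss: float  # hypothetical projected direct loss per hour of outage
    downstream: list[str] = field(default_factory=list)  # systems that depend on this one


def projected_outage_loss(systems: dict[str, System], name: str, outage_hours: float) -> float:
    """Sum the hourly loss of `name` and of every system reachable downstream of it.

    Simplifying assumption: a downstream system is down for the whole outage.
    """
    seen: set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # count each affected system only once
        seen.add(current)
        stack.extend(systems[current].downstream)
    return sum(systems[s].hourly_loss for s in seen) * outage_hours


# Hypothetical business workflow: payments feeds the ledger, both feed reporting.
systems = {
    "payments": System("payments", 50_000.0, ["ledger", "reporting"]),
    "ledger": System("ledger", 20_000.0, ["reporting"]),
    "reporting": System("reporting", 5_000.0),
}

# Rank systems by projected loss for a four-hour outage, most critical first.
ranking = sorted(
    systems,
    key=lambda n: projected_outage_loss(systems, n, 4.0),
    reverse=True,
)
print(ranking)
```

A ranking of this kind is only a starting point; real assessments would also weigh regulatory, reputational and safety impact, which a single loss figure does not capture.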
2. Prevent the Execution of a Cybersecurity Attack (Protect)
COBIT defines a range of conventional measures focused on deterrence and prevention of cyberattacks, such as protecting against malicious software (Deliver, Service and Support [DSS] DSS05.01), managing network and connectivity security (DSS05.02), managing endpoint security (DSS05.03), managing user identity and logical access (DSS05.04), managing physical access to IT assets (DSS05.05), managing sensitive documents and output devices (DSS05.06) and managing vulnerabilities and monitoring infrastructure for security-related events (DSS05.07).6 Nevertheless, the complexity of modern systems increases the chances that a highly motivated attacker can find and use weaknesses such as unpatched vulnerabilities or misconfigurations. Under this assumption, resilient design plays an important role in limiting the spread of the attack and reducing the incurred damage. For example, for Internet of Things (IoT) systems, a cyberresilience trait might be the adaptive isolation of compromised endpoint devices so that the core control system can continue its safe operation, ignoring devices lost on the periphery.
Effective cyberresilience measures preventing propagation of cyberattacks include:
- The reduction of the attack surface, such as removing or disabling system functionality that is not used or is nonessential for the supported business activities (e.g., limiting the number of default system services on critical servers)
- The use of integrity checks to prevent execution of compromised binary files
- The analysis and elimination of single points of failure and sources of fragility. Chaos engineering tools such as Gremlin7 can help automate the identification of weak points in IT systems.
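As an illustration of the integrity check mentioned in the list above, the following Python sketch compares the SHA-256 digest of a file with a value recorded in an allowlist before the file would be permitted to run. The allowlist structure, the file name `agent.bin` and the function names are hypothetical; production environments typically rely on code signing or operating system application control rather than a script of this kind.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted(path: Path, allowlist: dict[str, str]) -> bool:
    """Allow a file only if its digest matches the recorded one (deny by default)."""
    expected = allowlist.get(path.name)
    if expected is None:
        return False  # unknown binaries are rejected
    return hmac.compare_digest(sha256_of(path), expected)


# Demonstration with a throwaway file standing in for a deployed binary.
with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "agent.bin"
    binary.write_bytes(b"original build")
    allowlist = {"agent.bin": sha256_of(binary)}  # digest recorded at release time

    trusted_before = is_trusted(binary, allowlist)
    binary.write_bytes(b"tampered build")  # simulate a compromised file
    trusted_after = is_trusted(binary, allowlist)

print(trusted_before, trusted_after)
```

The check is only as trustworthy as the allowlist itself, so the recorded digests would need to be stored where an attacker who can modify binaries cannot also modify them.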
3. Limit the Impact of Security Breaches (Protect)
After the attacker gains a foothold in the compromised computer environment, they will try to get full administrative access or at least subordinate other vulnerable systems and servers. Such resilience traits as a secure modular design of IT systems (microservices, containers and discrete components) and well-designed segregation of corporate subnetworks, including the use of network buffer zones, lock the attacker in isolated segments and, in many cases, force them to start from scratch. Other important measures include limiting the use of administrative accounts across system components, such as assigning limited privileges to IT operations teams or avoiding the use of admin accounts embedded in the source code. In addition, COBIT practice DSS05.04 Manage User Identity and Logical Access specifies that all users must have information access rights assigned in accordance with business requirements. Encrypting data at rest and in motion helps to preserve confidentiality.
4. Detect Abnormal Behavior and Realized Damage (Detect)
Effective logging and detection measures provide situational awareness to the responding cybersecurity team and management, thus facilitating an effective response strategy. Security information and event management (SIEM) system, intrusion detection system (IDS), and intrusion prevention system (IPS) tools centralize, protect and correlate security events, preventing attackers from hiding their activities while preserving an audit trail for a later post-mortem and forensic analysis. COBIT management practice DSS05.07 Manage Vulnerabilities and Monitor the Infrastructure for Security-Related Events stresses that it is important that security tools, technologies and detection are integrated with general event monitoring and incident management.
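The idea of correlating security events can be shown with a deliberately small Python sketch. It flags a source address that produces several failed logins within a short window and then succeeds, a pattern that may indicate password guessing. The event format, the threshold, the five-minute window and the addresses (taken from a documentation-only range) are assumptions made for this example and do not reflect the rule syntax of any particular SIEM product.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def brute_force_suspects(events, threshold=3, window=timedelta(minutes=5)):
    """Return sources with at least `threshold` failures in `window` before a success.

    `events` is an iterable of (timestamp, source, outcome) tuples, where
    outcome is "failure" or "success" (a hypothetical, simplified log format).
    """
    failures = defaultdict(list)
    suspects = set()
    for timestamp, source, outcome in sorted(events):  # process in time order
        if outcome == "failure":
            failures[source].append(timestamp)
        elif outcome == "success":
            recent = [t for t in failures[source] if timestamp - t <= window]
            if len(recent) >= threshold:
                suspects.add(source)
    return suspects


start = datetime(2024, 1, 1, 9, 0, 0)
events = [
    (start, "192.0.2.10", "failure"),
    (start + timedelta(seconds=30), "192.0.2.10", "failure"),
    (start + timedelta(seconds=60), "192.0.2.10", "failure"),
    (start + timedelta(seconds=90), "192.0.2.10", "success"),
    (start + timedelta(seconds=10), "192.0.2.20", "failure"),  # a single mistyped password
    (start + timedelta(seconds=20), "192.0.2.20", "success"),
]

suspects = brute_force_suspects(events)
print(suspects)
```

Centralizing the events before applying such rules matters as much as the rule itself: if the logs remain only on the compromised host, an attacker can delete them before any correlation takes place.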
5. Follow a Predefined Incident Response Plan (Respond)
Once an incident occurs, it is important to have a viable response plan in place. The plan should consider probable scenarios (e.g., partial loss of control over IT infrastructure) and acceptable response strategies (e.g., disconnecting IT network segments to prevent further spread of the attack). Main stakeholders should validate the response plan, its activation criteria and communication protocols and also should provide sufficient authority to the response team. A tabletop exercise with the main stakeholders can help identify potential gaps in the response strategy. COBIT activities for DSS02.05 Resolve and Recover From Incidents include selecting and applying the most appropriate incident resolutions, recording whether workarounds were used for incident resolution, performing recovery actions, documenting incident resolution and assessing if the resolution can be used as a future knowledge source.
6. Guarantee Timely Recovery of Essential Components and Services (Recover)
The primary measures to ensure adequate recovery in case of security breaches are regular backups of applications, databases and components of the IT infrastructure (including configurations of directory services, virtual infrastructure and network equipment) and periodic test exercises to reconstruct critical IT systems and data. Many types of ransomware have built-in functionality to detect the locations of backups and encrypt them as well. Putting critical backups in read-only archives can protect against this scenario. An additional measure to keep vital organizational information safe is to store copies in an isolated off-site location (data vault).8 It is also important not to forget about recovery scenarios that depend on external third parties. COBIT management practice Align, Plan and Organize (APO) APO10.04 Manage Vendor Risk recommends identifying and managing risk relating to vendors’ ability to continually provide secure, efficient and effective service delivery. This also includes the subcontractors or upstream vendors that are relevant to the service delivery of the direct vendor.
7. Adjust the System Architecture to Prevent Breach Recurrence (Respond)
In line with COBIT management practice APO12.02 Analyze Risk, it is important to estimate the likelihood of recurrence and the magnitude of loss associated with cyberbreach scenarios, and to compare the related loss exposure to risk appetite and tolerance to identify unacceptable or elevated risk. The organization should then propose responses for scenarios exceeding risk tolerance levels; specify high-level requirements and expectations for appropriate key controls for risk mitigation responses; confirm that the analysis aligns with enterprise requirements; verify that estimations were properly calibrated and scrutinized for bias; and analyze the cost/benefit of potential risk response options such as avoid, reduce/mitigate, transfer/share and accept. In practice, post-incident analysis can drive adjustments of the system architecture, such as the avoidance of technologies that led to breaches. Measures may include replacing software components (e.g., vulnerable database, middleware technology), adding extra nodes to improve performance or create redundancy, changing communication protocols, or simplifying the overall design (e.g., use of a centralized messaging system instead of meshed connections between systems).
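The comparison of loss exposure with risk appetite and tolerance can be sketched in a few lines of Python. This is an illustrative simplification rather than the APO12.02 procedure: exposure is assumed to be annual likelihood multiplied by loss per occurrence, tolerance is treated as an allowed deviation added on top of appetite (following the definitions given earlier in this article), and all scenario names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    annual_likelihood: float  # estimated probability of recurrence per year
    loss_magnitude: float     # estimated loss per occurrence

    @property
    def exposure(self) -> float:
        """Simplified expected annual loss: likelihood times magnitude."""
        return self.annual_likelihood * self.loss_magnitude


def classify(scenario: Scenario, appetite: float, tolerance: float) -> str:
    """Compare exposure with appetite and with appetite plus tolerated deviation."""
    if scenario.exposure <= appetite:
        return "acceptable"
    if scenario.exposure <= appetite + tolerance:
        return "elevated"
    return "unacceptable"


# Hypothetical scenarios and thresholds, in the same currency unit.
scenarios = [
    Scenario("phishing", 0.25, 40_000.0),
    Scenario("API abuse", 0.5, 100_000.0),
    Scenario("ransomware", 0.1, 2_000_000.0),
]
results = {s.name: classify(s, appetite=25_000.0, tolerance=50_000.0) for s in scenarios}
print(results)
```

Scenarios classified as unacceptable are the ones for which response options (avoid, reduce/mitigate, transfer/share, accept) would be evaluated first, and the estimates feeding the calculation are where calibration and bias checks apply.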
8. Adjust Operational Processes to Prevent Recurrence of the Breach (Respond)
In some cases, changing the IT system architecture may not be enough or may be too costly. Instead, the organization may choose to partially change or suspend a business process to avoid the corresponding risk. For example, in the past some social networks have decided to restrict their public application programming interfaces (APIs) as they were frequently abused by their legitimate end users.9 The solution was to implement a rigorous review process prior to granting third parties access to their APIs.
Conclusion
As the complexity of IT systems grows, so does the probability of successful cybersecurity breaches. The ability to recover the organizational IT infrastructure and business operations after their full or partial compromise can become a question of overall survivability for some organizations. Building systems with cyberresilience in mind helps reduce the cost of protective measures, limits the impact of a breach and enables quicker reconstruction of IT systems and services. The use of the NIST CSF enables a structured approach that simplifies the assessment of enterprise cyberresilience. In turn, relevant COBIT management practices and activities help identify the best response measures and can be effectively used to strengthen the overall cyberresilience posture of an organization.
Endnotes
1 Ross, R.; V. Pillitteri; R. Graubart; D. Bodeau; R. McQuaid; Developing Cyber Resilient Systems: A Systems Security Engineering Approach, National Institute of Standards and Technology (NIST) Special Publication (SP) 800-160 vol. 2, USA, 2019
2 European Central Bank, “What Is Cyber Resilience?” 2021
3 National Institute of Standards and Technology (NIST), Cybersecurity Framework, Version 1.1, USA, 2019
4 Ibid.
5 Ibid.
6 ISACA®, COBIT 2019 Framework: Governance and Management Objectives, USA, 2019
7 Gremlin
8 An example is Sheltered Harbor
9 Constine, J.; “Facebook Restricts APIs, Axes Old Instagram Platform Amidst Scandals,” TechCrunch, 4 April 2018
Alexander Obraztsov, CISA, CCAK, CISSP, PMP
Is an IT audit director in an investment banking organization based in New York, USA. Obraztsov is an experienced IT risk and assurance and information security professional with more than 12 years of experience in the financial industry. He has completed a significant number of projects enhancing management of IT risk and data protection, increasing effectiveness of information security measures and improving compliance with regulatory requirements for IT in the financial sector in Europe and the United States. He has been an active volunteer and contributor to the ISACA® community since 2017. He is a recent ISACA New York Metro Top Four Under 40 winner in the IT audit, cybersecurity and GRC space. He can be reached at http://www.linkedin.com/in/alexander-obraztsov-6ab54928/.