Domain-7 Security Operations
- Allowed/Blocked listing: a register of entities that are granted (or denied) a particular privilege, service, mobility, access, or recognition, including web addresses, IP addresses, geolocations, hardware addresses, and files/programs; entities on the allowed list are accepted, approved, and/or recognized; the terms whitelist/blacklist are deprecated; systems can also alert IT security personnel when an access attempt involves a resource not on a pre-approved list, and can incorporate anti-malware
- Alternate site: contingency or Continuity of Operations (COOP) site used to assume system or org operations, if the primary site is not available
- Backup: copies of files or programs to facilitate recovery
- Baseline: total inventory of all of a system’s components (e.g. hardware, software, data, admin controls, documentation, user instruction); types of baselines include enumerated (inventory lists generated by system cataloging, discovery, or enumeration), build security (the minimal set of security controls for each CI, see below), modification/update/patch baselines (subsets of the total system baseline), or configuration baselines (which should include a revision/version identifier associated with each CI)
- Bastion host: a special-purpose computer on a network specifically designed and configured to withstand attacks; it is typically placed in a demilitarized zone (DMZ) or exposed network segment, and its primary function is to act as a gateway between an internal network and external, potentially untrusted networks (like the internet); key characteristics of a Bastion host include:
- Hardened security: minimizing the number of running services and apps, which reduces the potential attack surface
- Publicly accessible: exposed to the internet or an untrusted network, acting as the first point of contact for external users
- Logging and monitoring: include extensive logging and monitoring to detect suspicious activity
- Limited network access: typically has limited access to the internal network
- Clipping: one of two main methods of choosing records from a large pool for further analysis; clipping uses threshold values to select those records exceeding a predefined threshold (also see sampling)
- Configuration Item (CI): aggregation of information system components designated for configuration management and treated as a single entity in the config management process
- Cyber forensics: gathering, retaining, analyzing data for investigative purposes, while maintaining the integrity of that data
- Disruption: unplanned event that causes a system to be inoperable for a length of time
- DPI (Deep Packet Inspection): a method used by firewalls and other network security devices to examine the data portion (or payload) of packets as they pass through the firewall; DPI goes beyond traditional packet filtering by not only inspecting the header information (such as source/destination IP addresses and port numbers) but also analyzing the content within the packet to identify and respond to security threats
- Egress monitoring: monitoring the flow of info out of an org’s boundaries
- Entitlement: refers to the privileges granted to users when an account is first provisioned
- Entity: any form of a user including hardware device, software daemon, task, processing thread or human, which is attempting to use or access system resources; e.g. endpoint devices are entities that human (or non-human) users use to access a system; should be subject to access control and accounting
- Event: observable occurrence in a network or system
- Hackback: actions taken by a victim of hacking to compromise the systems of the alleged attacker
- Heuristics: a machine learning method that identifies patterns of acceptable activity, so that deviations from those patterns can be identified
- Incident: an event which potentially or actually jeopardizes the CIA of an information system or the info the system processes, stores, transmits
- Indicator: technical artifact or observable occurrence suggesting that an attack is imminent, currently underway, or has already occurred
- Indicators of Compromise (IoC): a signal that an intrusion, malware, or other predefined hostile or hazardous set of events has occurred or is occurring
- Information Security Continuous Monitoring (ISCM): maintaining ongoing awareness of information security, vulnerabilities, and threats to support organizational risk management decisions; ongoing monitoring sufficient to ensure and assure effectiveness of security controls
- Information Sharing and Analysis Center (ISAC): entity or collab created for the purposes of analyzing critical cyber and related info to better understand security problems and interdependencies to ensure CIA
- Log: record of actions/events that have taken place on a system
- Motion detector types: wave pattern motion detectors transmit ultrasonic or microwave signals into the monitored area, watching for changes in the returned signals bouncing off objects; infrared heat-based detectors watch for unusual heat patterns; capacitance detectors work based on electromagnetic fields
- MTBF: mean time between failure is an estimation of time between the first and any subsequent failures
- MTTF: mean time to failure is the expected typical functional lifetime of the device given a specific operating environment
- MTTR: mean time to repair is the average length of time required to perform a repair on the device
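- these metrics combine into the standard steady-state availability formula, Availability = MTBF / (MTBF + MTTR); a minimal sketch with illustrative numbers (not from the source):

```python
# Steady-state availability from MTBF and MTTR (standard formula;
# the numbers below are illustrative)
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a device is expected to be operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# a device failing every 2,000 hours on average, taking 4 hours to repair:
print(f"{availability(2000, 4):.4%}")  # -> 99.8004%
```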
- NetFlow: data that contains info on the source, destination, and size of all network communications and is routinely saved as a matter of normal activity
- Precursor: signal from events suggesting a possible change of conditions, that may alter the current threat landscape
- Regression testing: testing of a system to ascertain whether recently approved modifications have changed its performance, or if other approved functions have introduced unauthorized behaviors
- Request For Change (RFC): documentation of a proposed change in support of change management activities
- Root Cause Analysis: principle-based systems approach for the identification of underlying causes associated with a particular risk set or incidents
- RTBH (Remote Triggered Black Hole): a network security technique used in conjunction with firewalls and routers to mitigate Distributed Denial of Service (DDoS) attacks or unwanted traffic by dropping malicious or unwanted traffic before it reaches the target network; RTBH works by creating a “black hole route”, where packets destined for a specific IP address are discarded or “dropped” by the network equipment, effectively isolating malicious traffic
- Sampling: one of two main methods of choosing records from a large pool for further analysis; sampling uses statistical techniques to choose a sample that is representative of the entire pool (also see clipping)
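- a minimal sketch contrasting clipping and sampling (record structure and threshold are illustrative):

```python
# Clipping vs. sampling over the same record pool (illustrative data)
import random

records = [{"id": i, "failed_logins": random.randint(0, 20)} for i in range(1000)]

# clipping: keep only records exceeding a predefined threshold
THRESHOLD = 10
clipped = [r for r in records if r["failed_logins"] > THRESHOLD]

# sampling: choose a statistically representative subset of the whole pool
sampled = random.sample(records, k=50)
```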
- SCCM: System Center Configuration Manager is a Microsoft systems management software product that provides the capability to manage large groups of computers providing remote control, patch management, software distribution, operating system deployment, and hardware and software inventory
- Security Incident: Any attempt to undermine the security of an org or violation of a security policy is a security incident
- SWG (Secure Web Gateway): a security solution that filters and monitors internet traffic for orgs, ensuring that users can securely access the web while blocking malicious sites, preventing data leaks, and enforcing web browsing policies; while it is not a traditional firewall, it complements firewall functionality by focusing specifically on web traffic security
- TCP Wrappers: a host-based network access control system used in Unix-like operating systems to filter incoming connections to network services; allows administrators to define which IP addresses or hostnames are allowed or denied access to certain network services, such as SSH, FTP, or SMTP, by controlling access based on incoming TCP connections; TCP Wrappers relies on two config files: /etc/hosts.allow, and /etc/hosts.deny
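- a hedged example of the two config files (addresses are illustrative); rules use a `daemon: client_list` syntax, and `hosts.allow` is consulted before `hosts.deny`:

```
# /etc/hosts.allow -- checked first; first matching rule wins
sshd: 192.168.1.0/255.255.255.0

# /etc/hosts.deny -- applies to anything not matched above
ALL: ALL
```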
- Trusted Computing Base (TCB): the collection of all hardware, software, and firmware components within an architecture that is specifically responsible for security and the isolation of objects that forms a trusted base
- TCB is a term that is usually associated with security kernels and the reference monitor
- a trusted base enforces the security policy
- a security perimeter is the imaginary boundary that separates the TCB from the rest of the system; TCB components communicate with non-TCB components using trusted paths
- the reference monitor is the logical part of the TCB that confirms whether a subject has the right to use a resource prior to granting access
- the security kernel is the collection of the TCB components that implement the functionality of the reference monitor
- Tuple: in this context, a tuple usually refers to a collection of values that represent specific attributes of a network connection or packet; these values are used to uniquely identify and manage network flows as part of a state table or rule set in a firewall; as an example, a 5-tuple is a bundle of five values that identify a specific connection or network session: the source IP address, source port number, destination IP address, destination port number, and the specific protocol in use (e.g. TCP or UDP)
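- a minimal sketch of a 5-tuple used as the key of a firewall-style state table (values are illustrative):

```python
# A 5-tuple as the key of a (simplified) firewall state table
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"

state_table: dict[FiveTuple, str] = {}  # flow -> connection state

flow = FiveTuple("10.0.0.5", 49152, "93.184.216.34", 443, "TCP")
state_table[flow] = "ESTABLISHED"

def is_known_flow(pkt: FiveTuple) -> bool:
    """A stateful device admits packets belonging to a tracked flow."""
    return pkt in state_table
```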
- View-Based access controls: access control that allows the database to be logically divided into components like records, fields, or groups, allowing sensitive data to be hidden from non-authorized users; admins can set up views by user type, granting access only to assigned views
7.1 Understand and comply with investigations (OSG-9 Chpt 19)
- Investigation: a formal inquiry and systematic process that involves gathering information to determine the cause of a security incident or violation
- Investigators must be able to conduct reliable investigations that will hold up in court; securing the scene is an essential and critical part of every investigation
- securing the scene might include any/all of the following:
- sealing off access to the area or crime scene
- taking images of the scene
- documenting evidence
- ensuring evidence (e.g. computers, mobile devices, portable drives etc) is not contacted, tampered with, or destroyed
- general principles:
- identify and secure the scene
- protect evidence – proper collection of evidence preserves its integrity and the chain of custody
- identification and examination of the evidence
- further analysis of the most compelling evidence
- final reporting of findings
- Locard exchange principle: whenever a crime is committed something is taken, and something is left behind
- The purpose of an investigation is to:
- identify the root cause of the incident
- prevent future occurrences
- mitigate the impact of the incident on the organization
- Types of investigations:
- administrative: an investigation that is focused on policy violations
- criminal: conducted by law enforcement, this type of investigation tries to determine if there is cause to believe (beyond a reasonable doubt) that someone committed a crime
- the goal is to gather evidence that can be used to convict in court
- the job of a security professional is to preserve evidence, ensure law enforcement has been contacted, and assist as necessary
- civil: non-criminal investigation for matters such as contract disputes
- the goal of a civil investigation is to gather evidence that can be used to support a legal claim in court, and is typically triggered from an imminent or on-going lawsuit
- the level of proof is much lower for a civil compared to a criminal investigation
- regulatory: investigation initiated by a government regulator when there is reason to believe an organization is not in compliance
- this type of investigation varies significantly in scope and could look like any of the other three types of investigation depending on the severity of the allegations
- as with criminal investigations, it is key to preserve evidence, and assist the regulator’s investigators
- 7.1.1 Evidence collection and handling
- Evidence collection is complex, should be done by professionals, and can be thrown out of court if incorrectly handled
- It’s important to preserve original evidence
- International Organization on Computer Evidence (IOCE) six principles for media, network and software analysis:
- all general forensic and procedural principles must be applied to digital evidence collection
- seizing digital evidence shouldn’t change the evidence
- accessing original digital evidence should only be done by trained professionals
- all activity relating to seizure, access, storage, or transfer of digital evidence must be fully documented, preserved, and available for review
- a person in possession of digital evidence is responsible for all actions taken with respect to that evidence
- any agency that is responsible for seizing, accessing, storing, or transferring digital evidence is responsible for compliance with these principles
- Scientific Working Group on Digital Evidence (SWGDE) developed principles for standardized recovery of computer-based evidence:
- legal system consistency
- use of a common language
- durability
- ability to cross international and state boundaries
- instill confidence in evidence integrity
- forensic evidence applicability at the individual, agency, and country levels
- ISO/IEC 27037: Guidelines for Identification, Collection, Acquisition, and Preservation of Digital Evidence: the international standard on digital evidence handling, with four phases:
- identification
- collection
- acquisition
- preservation
- Types of evidence:
- primary evidence:
- most reliable and used at trial
- original documents (e.g. legal contracts), no copies or duplicates
- secondary evidence:
- less powerful and reliable than primary evidence (e.g. copies of originals, witness oral evidence etc)
- if primary evidence is available, secondary evidence of the same content is not valid
- real evidence: this type of evidence includes physical objects, such as computers, hard drives, and other storage devices, that can be brought into a court of law
- direct evidence: this type of evidence is based on the observations of a witness or expert opinion and can be used to prove a fact at hand (with backup evidence support)
- circumstantial evidence: this type of evidence is based on inference and can be used to support a conclusion, but not prove it
- corroborative evidence: this type of evidence is used to support other evidence and can be used to strengthen a case
- hearsay evidence: type of evidence that is based on statements made by someone outside of court and is generally not admissible; rule says that a witness cannot testify about what someone else told them; courts have applied it such that attorneys may not introduce system logs into evidence unless they are authenticated by a system admin
- best evidence rule: states that the original evidence should be presented in court, rather than a copy or other secondary evidence
- parol evidence rule: determines whether extra/additional evidence can be used to alter or explain a written contract, stating that a written contract takes precedence over any oral negotiations or stipulations that relate to it; the rule generally prohibits the introduction of parol (extra) evidence that contradicts or varies the contract’s terms
- It is important to note that evidence should be collected and handled in a forensically sound manner to ensure that it is admissible in court and to avoid any legal issues
- The chain of custody: focuses on having control of the evidence – who collected and handled what evidence, when, and where
- think about establishing the chain of custody as:
- tag
- bag and
- carry the evidence
- Five rules of evidence: five evidence characteristics providing the best chance of surviving legal and other scrutiny:
- authentic: evidence is not fabricated or planted, and can be proven through crime scene photos, or bit-for-bit copies of storage
- accurate: evidence that has integrity (not been modified)
- complete: evidence must be complete, and all parts available and shared, whether they support the case or not
- convincing: evidence must be easy to understand, and convey integrity
- admissible: evidence must be accepted as part of a case
- 7.1.2 Reporting and documentation
- Each investigation should result in a final report that documents the goals of the investigation, the procedures followed, the evidence collected, and the final results
- Preparing formal documentation prepares for potential legal action, and even internal investigations can become part of employment disputes
- Identify in advance a single point of contact who will act as your liaison with law enforcement, providing a go-to person with a single perspective, potentially improving the working relationship
- Participate in the FBI’s InfraGard program
- 7.1.3 Investigative techniques
- Whether in response to a crime or incident, an organizational policy breach, troubleshooting a system or network issue etc, digital forensic methodologies can assist in finding answers, solving problems, and in some cases, help in successfully prosecuting crimes
- The forensic investigation process should include the following:
- identification and securing of a crime scene
- proper collection of evidence that preserves its integrity and the chain of custody
- examination of all evidence
- further analysis of the most compelling evidence
- final reporting
- Sources of information and evidence:
- oral/written statements: given to police, investigators, or as testimony in court by people who witnessed a crime or who may have pertinent information
- written documents: checks, printed contracts, handwritten letters/notes
- computer systems: components, local/portable storage, memory etc
- visual/audio: visual and audio evidence pertinent to a security investigation could include photographs, video, taped recordings, and surveillance footage from security cameras
- Several investigative techniques can be used when conducting analysis:
- media analysis: examining the bits on a hard drive, including data that remains intact despite no longer being indexed (e.g. deleted files)
- software analysis: focuses on applications and malware, determining how they work and what they’re trying to do, with a goal of attribution
- 7.1.4 Digital forensics tools, tactics, and procedures
- Digital forensics: the scientific examination and analysis of data from storage media so that the information can be used as part of an investigation to identify the culprit or the root cause of an incident
- Live evidence: data stored in a running system e.g. random access memory (RAM), cache, and buffers
- Examining a live system can change the state of the evidence
- small changes like interacting with the keyboard, mouse, loading/unloading programs, or of course powering off the system, can change or eliminate live evidence
- Whenever a forensic investigation of a storage drive is conducted, two identical bit-for-bit copies of the original drive should be created first
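- a minimal sketch of why copies must be verified: hashing proves a working copy matches the original before any analysis touches it (paths are illustrative):

```python
# Verify a forensic image against the original via cryptographic hash
# (paths are illustrative; analysis is then done on the verified copy)
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so drives of any size can be hashed."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of("/evidence/original.img") != sha256_of("/evidence/copy1.img"):
    raise RuntimeError("copy does not match original -- not forensically sound")
```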
- eDiscovery: the process of identifying, collecting, and producing electronic evidence in legal proceedings
- 7.1.5 Artifacts (e.g. computer, network, mobile device)
- Forensic artifacts: remnants of a system or network breach/attempted breach, which may or may not be relevant to an investigation or response
- Artifacts can be found in numerous places, including:
- computer systems
- web browsers
- mobile devices
- hard drives, flash drives
7.2 Conduct logging and monitoring activities (OSG-9 Chpts 17,21)
- 7.2.1 Intrusion detection and prevention
- Intrusion: a security event, or a combination of multiple security events that constitutes an incident; occurs when an attacker attempts to bypass or can bypass/thwart security mechanisms and access an organization’s resources without the authority to do so
- Intrusion detection: a specific form of monitoring events, usually in real time, to detect abnormal activity indicating a potential incident or intrusion
- Intrusion Detection System (IDS): (AKA burglar alarm) a security service that monitors and analyzes network or system events for the purpose of finding and providing real-time or near-real-time warnings of unauthorized attempts to access system resources; automates the inspection of logs and real-time system events to detect intrusion attempts and system failures
- an IDS is intended as part of a defense-in-depth security plan
- Intrusion Prevention Systems (IPS): a security service that uses available info to determine if an attack is underway, alerting on and also blocking attacks from reaching the intended target; because an IPS includes detection capabilities, you’ll also see them referred to as intrusion detection and prevention systems (IDPSs)
- NIST SP 800-94 Guide to Intrusion Detection and Prevention Systems provides comprehensive (albeit outdated) coverage of both IDS and IPS
- 7.2.2 Security Information and Event Management (SIEM)
- Security Information and Event Management (SIEM): systems that ingest logs from multiple sources, compile and analyze log entries, and report relevant information
- SIEM systems are complex and require expertise to install and tune
- require a properly trained team that understands how to read and interpret info, and escalation procedures to follow when a legitimate alert is raised
- SIEM systems represent technology, process, and people, and each is important to overall effectiveness
- a SIEM includes significant intelligence functionality, allowing large amounts of logged events and analysis and correlation of the same to occur very quickly
- SIEM capabilities include (a normalization sketch follows this list):
- Aggregation
- Normalization
- Correlation
- Secure storage
- Analysis
- Reporting
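- a minimal sketch of the normalization capability, assuming a syslog-like input (pattern and schema are illustrative):

```python
# Normalize raw log lines from different sources into a common schema
import re

SSH_FAIL = re.compile(r"Failed password for (?P<user>\S+) from (?P<src_ip>\S+)")

def normalize(raw: str, source: str) -> dict | None:
    """Map a raw event into the common schema, or None if unmatched."""
    m = SSH_FAIL.search(raw)
    if m:
        return {"source": source, "event": "auth_failure", **m.groupdict()}
    return None

print(normalize("Failed password for root from 203.0.113.7 port 22", "sshd"))
# -> {'source': 'sshd', 'event': 'auth_failure', 'user': 'root', 'src_ip': '203.0.113.7'}
```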
- 7.2.3 Continuous monitoring
- After a SIEM is set up, configured, tuned, and running, it must be routinely updated and continuously monitored to function effectively
- Effective continuous monitoring encompasses technology, processes, and people
- Continuous monitoring steps are:
- Define
- Establish
- Implement
- Analyze/report
- Respond
- Review/update
- Monitoring: the process of reviewing information logs, looking for something specific
- necessary to detect malicious actions by subjects as well as attempted intrusions and system failures
- can help reconstruct events, provide evidence for prosecution, and create reports for analysis
- continuous monitoring ensures that all events are recorded and can be investigated later if necessary
- Log analysis: a detailed and systematic form of monitoring where logged info is analyzed for trends and patterns as well as abnormal, unauthorized, illegal, and policy-violating activities
- log analysis isn’t necessarily in response to an incident, it’s a periodic task
- 7.2.4 Egress monitoring
- It’s important to monitor traffic exiting as well as entering a network, and Egress monitoring refers to monitoring outgoing traffic to detect unauthorized data transfer outside the org (AKA data exfiltration)
- Common methods used to detect or prevent data exfiltration are data loss prevention (DLP) techniques and monitoring for steganography
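- a minimal sketch of the pattern-matching side of DLP (patterns and payload are illustrative; real products also fingerprint documents and inspect attachments):

```python
# Flag outbound payloads containing patterns that look like sensitive data
import re

DLP_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def inspect_egress(payload: str) -> list[str]:
    """Return the names of all patterns matched in outbound data."""
    return [name for name, pat in DLP_PATTERNS.items() if pat.search(payload)]

hits = inspect_egress("invoice attached, SSN 123-45-6789")
if hits:
    print("egress flagged:", hits)  # -> egress flagged: ['ssn']
```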
- 7.2.5 Log management
- Log management: refers to all the methods used to collect, process, and protect log entries (see SIEM definition above)
- rollover logging: allows admins to set a maximum log size, when the log reaches that max, the system begins overwriting the oldest events in the log
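- a minimal sketch using Python’s standard library, which implements size-based rollover in the same spirit (file name and sizes are illustrative):

```python
# Size-based log rollover: when app.log reaches maxBytes it rotates,
# and only backupCount old files are kept, so the oldest events age out
import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler("app.log", maxBytes=1_000_000, backupCount=5)
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.info("audit event recorded")
```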
- 7.2.6 Threat intelligence (e.g. threat feeds, threat hunting)
- Threat intelligence: an umbrella term encompassing threat research and analysis and emerging threat trends; gathering data on potential threats, including various sources to get timely info on current threats; information that is aggregated, transformed, analyzed, interpreted, or enriched to provide the necessary context for the decision-making process
- Kill chain: military model (used for both offense and defense):
- find/identify a target through reconnaissance
- get the target’s location
- track the target’s movement
- select a weapon to use on the target
- engage the target with the selected weapon
- evaluate the effectiveness of the attack
- Orgs have adapted this model for cybersecurity: Lockheed Martin created the Cyber Kill Chain framework including seven ordered stages of an attack:
- reconnaissance: attackers gather info on the target
- weaponize: attackers identify an exploit that the target is vulnerable to, along with methods to send the exploit
- delivery: attackers send the weapon to the target via phishing attacks, malicious email attachments, compromised websites, or other common social engineering methods
- exploitation: the weapon exploits a vulnerability on the target system
- installation: code that exploits the vulnerability then installs malware with a backdoor allowing attacker remote access
- command and control: attackers maintain a command and control system, which controls the target and other compromised systems
- actions on objectives: attackers execute their original goals such as theft of money or data, destruction of assets, or installing additional malicious code (e.g. ransomware)
- 7.2.7 User and Entity Behavior Analytics (UEBA)
- UEBA (aka UBA): focuses on the analysis of user and entity behavior as a way of detecting inappropriate or unauthorized activity (e.g. fraud, malware, insider attacks etc); analysis engines are typically included with SIEM solutions or may be added via subscription
- Behavior-based detection: AKA statistical intrusion, anomaly, and heuristics-based detection, starts by creating a baseline of normal activities and events; once enough baseline data has been accumulated to determine normal activity, it can detect abnormal activity (that may indicate a malicious intrusion or event)
- Behavior-based IDSs use the baseline, activity statistics, and heuristic evaluation techniques to compare current activity against previous activity to detect potentially malicious events
- Static code scanning techniques: the scanner scans code in files, similar to white box testing
- Dynamic techniques: the scanner runs executable files in a sandbox to observe their behavior
7.3 Perform Configuration Management (CM) (e.g. provisioning, baselining, automation) (OSG-9 Chpt 16)
- Configuration Management (CM): collection of activities focused on establishing and maintaining the integrity of IT products and info systems, via the control of processes for initializing, changing, and monitoring the configurations of those products/systems through their lifecycle; CM is the process of identifying, controlling, and verifying the configuration of systems and components throughout their lifecycle
- CM is an integral part of secure provisioning and relates to the proper configuration of a device at the time of deployment
- CM helps ensure that systems are deployed in a secure, consistent state and that they stay in a secure, consistent state throughout their lifecycle
- Provisioning: taking a particular config baseline, making additional or modified copies, and placing those copies into the environment in which they belong; refers to installing and configuring the operating system and needed apps on new systems
- new systems should be configured to reduce vulnerabilities introduced via default configurations; the key is to harden a system based on its intended usage
- Hardening a system: process of applying security configurations and locking down various hardware, communications systems, and software (e.g. OS, web/app server, apps etc); normally performed based on industry guidelines and benchmarks like those from the Center for Internet Security (CIS)
- makes it more secure than the default configuration and includes the following (a verification sketch follows this list):
- disable all unused services
- close all unused logical ports
- remove all unused apps
- change default passwords
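- a minimal verification sketch for the hardening steps above, confirming that ports for unneeded services are actually closed (host and port list are illustrative):

```python
# Check that ports for unneeded services are closed on a new host
import socket

UNNEEDED = {21: "ftp", 23: "telnet", 25: "smtp"}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0  # 0 means the connect succeeded

for port, service in UNNEEDED.items():
    if port_open("192.0.2.10", port):
        print(f"hardening gap: {service} (port {port}) is still listening")
```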
- Baseline: in the context of configuration management, it is the starting point or starting config for a system
- an easy way to think of a baseline is as a list of services; an OS baseline identifies all the settings to harden specific systems
- many organizations use images to deploy baselines; baseline images improve the security of systems by ensuring that desired security settings are always configured correctly; they also reduce the amount of time required to deploy and maintain systems, reducing overall maintenance costs
- Automation: it’s typical to create a baseline, and then use automated methods to add additional apps, features, or settings for specific groups of computers
- note that admins can create/modify Group Policy settings to create domain-level standardization or to make security-related Windows registry changes
7.4 Apply foundational security operations concepts (OSG-9 Chpt 16)
- Security operations encompasses the day-to-day tasks, practices, and processes involved in securing and maintaining the operational integrity of an organization’s information systems and assets; it includes security monitoring, incident response, and security awareness and training
- The primary purpose of security operations practices is to safeguard assets such as information, systems, devices, facilities, and apps, helping organizations to detect, prevent, and respond to security threats
- Implementing common security operations concepts, along with performing periodic security audits and reviews, demonstrates a level of due care and due diligence
- 7.4.1 Need-to-know/least privilege
- Need-to-Know: principle that restricts access to information or resources to only those individuals who require it to perform their specific tasks or duties
- focus: protects sensitive information by limiting what someone can access
- Least Privilege: principle that limits the access rights of users, processes, or systems to the minimum level necessary to perform their job functions; states that subjects are granted only the privileges necessary to perform assigned work tasks and no more
- focus: restricts how much access a user or system has (permissions)
- privilege in this context includes both permissions to data and rights to perform systems tasks
- limiting and controlling privileges based on this concept protects confidentiality and data integrity
- principle relies on the assumption that all users have a well-defined job description that personnel understand
- least privilege is typically focused on ensuring that user privileges are restricted, but it also applies to apps or processes (e.g. if an app or service is compromised, the attacker can assume the service account’s privileges)
- Need to know and least privilege principle are two standard IT security principles implemented in secure networks; they limit access to data and systems so users and other subjects can access only what they require; this limited access helps prevent security incidents and helps limit the scope of incidents when they occur; when not followed, security incidents result in far greater damage to an org
- 7.4.2 Separation of Duties (SoD) and responsibilities
- Separation of Duties (SoD): ensures that no single person has total control over a critical function or system
- SoD policies help reduce fraud by requiring collusion between two or more people to perform unauthorized activity
- an example of how SoD can be enforced is dividing security or admin capabilities and functions among multiple trusted individuals
- Two-person control: (AKA two-man rule) requires the approval of two individuals for critical tasks
- using two-person controls within an org ensures peer review and reduces the likelihood of collusion and fraud
- ex: privileged access management (PAM) solutions that create special admin accounts for emergency use only; perhaps a password is split in half so that two people need to enter the password to log on
- Split knowledge: combines the concepts of separation of duties and two-person control into a single solution; the info or privilege required to perform an operation is divided among two or more users, ensuring that no single person has sufficient privileges to compromise the security of the environment; M of N control is an example of split knowledge
- Principles such as least privilege and separation of duties help prevent security policy violations, and monitoring helps to deter and detect any violations that occur despite the use of preventive controls
- Collusion: an agreement among multiple people to perform some unauthorized or illegal actions;
- implementing SoD, two-person control, or split knowledge policies help prevent fraud by limiting actions individuals can do without colluding with others
- 7.4.3 Privilege account management
- Privileged entities are trusted, but they can abuse privileges, and it’s therefore essential to monitor all assignments of privileged operations
- The goal is to ensure that trusted employees do not abuse the special privileges that are granted; monitoring these operations can also detect many attacks, because attackers commonly use special privileges during an attack
- Advanced privileged account management practices can limit the time users have advanced privileges
- Privileged Account Management (PAM): solutions that restrict access to privileged accounts or detect when accounts use any elevated privileges (e.g. admin accounts)
- in Microsoft domains, this includes local admin accounts and members of the Domain Admins and Enterprise Admins groups
- in Linux, this includes root and sudo accounts
- PAM solutions should monitor actions taken by privileged accounts, such as creating new user accounts, adding new routes to a router table, altering the config of a firewall, and accessing system log and audit files
- 7.4.4 Job rotation
- Job rotation: (AKA rotation of duties) means that employees rotate through jobs or rotate job responsibilities with other employees
- using job rotation as a security control provides peer review, reduces fraud, and enables cross-training
- job rotation policy can act as both a deterrent and a detection mechanism
- 7.4.5 Service Level Agreements (SLA)
- Service Level Agreement (SLA): an agreement between an organization and an outside entity, such as a vendor, where the SLA stipulates performance expectations and often includes penalties if the vendor doesn’t meet these expectations
- Memorandum of Understanding (MOU): documents the intention of two entities to work together toward a common goal
7.5 Apply resource protection (OSG-9 Chpt 16)
- Media management should consider all types of media as well as short- and long-term needs and evaluate:
- Confidentiality
- Access speeds
- Portability
- Durability
- Media format
- Data format
- For the test, data storage media should include any of the following:
- Paper
- Microforms (microfilm and microfiche)
- Magnetic (HD, disks, and tapes)
- Flash memory (SSD and memory cards)
- Optical (CD and DVD)
- Mean Time Between Failure (MTBF) is an important criterion when evaluating storage media, especially where valuable or sensitive information is concerned
- Media management includes the protection of the media itself, which typically involves policies and procedures, access control mechanisms, labeling and marking, storage, transport, sanitization, use, and end-of-life
- 7.5.1 Media management
- Media management: refers to the steps taken to protect media (i.e. anything that can hold data) and the data stored on that media; includes most portable devices (e.g. smart phones, memory/flash cards etc)
- media is protected throughout its lifetime and destroyed when no longer needed
- As above, OSG-9 also refers to tape media, as well as “hard-copy data”
- 7.5.2 Media protection techniques
- If media includes sensitive info, it should be stored in a secure location with strict access controls to prevent loss due to unauthorized access
- any location used to store media should have temperature and humidity controls to prevent losses due to corruption
- Media management can also include technical controls to restrict device access from computer systems
- When media is marked, handled, and stored properly, it helps prevent unauthorized disclosure (loss of confidentiality), unauthorized modification (loss of integrity), and unauthorized destruction (loss of availability)
7.6 Conduct incident management (OSG-9 Chpt 17)
- Incident response: the mitigation of violations of security policies and recommended practices; the process to detect and respond to incidents and to reduce the impact when incidents occur; it attempts to keep a business operating or restore operations as quickly as possible in the wake of an incident
- Incident management is usually conducted by an Incident Response Team (IRT), which comprises individuals with the required expertise and experience to manage security incidents; the IRT is accountable for implementing the incident response plan, which is a written record that defines the processes to be followed during each stage of the incident response cycle
- The main goals of incident response:
- Provide an effective and efficient response to reduce impact to the organization
- Maintain or restore business continuity
- Defend against future attacks
- An important distinction needs to be made to know when an incident response process should be initiated: events take place continually, and the vast majority are insignificant; however, events that lead to some type of adversity can be deemed incidents, and those incidents should trigger an org’s incident response process steps:
- Preparation: includes developing the IR process, assigning IR team members, and everything related to what happens when an incident is identified; preparation is critical, and will anticipate the steps to follow
- Analysis: Gathering and analyzing information about the incident to determine its scope, impact, and root cause (e.g., by interviewing witnesses, collecting and analyzing evidence, and reviewing system logs)
- Containment: Limiting the impact of the incident and preventing further damage (e.g., by isolating affected systems, changing passwords, and implementing security controls)
- Eradication: Removing the cause of the incident from the environment (e.g., by removing malware, patching vulnerabilities, and disabling compromised accounts)
- Recovery: Restoring systems and data to their normal state (e.g., by restoring from backups, rebuilding systems, and re-enabling compromised accounts)
- Lessons Learned: Documenting the incident and learning from it to improve future responses (e.g., by identifying areas where the incident response process can be improved and by sharing lessons learned with other organizations)
- The following steps (Detection, Response, Mitigation, Reporting, Recovery, Remediation, and Lessons Learned) are on the exam
- After detecting and verifying an incident, the first response is to limit or contain the scope of the incident while protecting evidence; based on governing laws, an org may need to report an incident to official authorities, and if PII is affected, individuals need to be informed; the remediation and lessons learned stages include root cause analysis to determine the cause and recommend solutions to prevent recurrence
- 7.6.1 Detection
- Detection: the identification of potential security incidents via monitoring and analyzing security logs, threat intelligence, or incident reports; as above, understanding the distinction between an event and an incident, the goal of detection is to identify an adverse event (an incident) and begin dealing with it
- Common methods to detect incidents (a log-scanning sketch follows this list):
- intrusion detection and prevention systems
- antimalware
- automated tools that scan audit logs looking for predefined events
- end users sometimes detect irregular activity and contact support
- Note: receiving an alert or complaint doesn’t always mean an incident has occurred
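- a minimal sketch of the log-scanning method referenced above (patterns and path are illustrative):

```python
# Scan an audit log for predefined events (an automated detection method)
import re

PREDEFINED_EVENTS = [
    re.compile(r"authentication failure"),
    re.compile(r"account locked"),
    re.compile(r"Invalid user"),
]

def scan_log(path: str) -> list[str]:
    """Return all log lines matching any predefined event pattern."""
    with open(path) as f:
        return [line.rstrip() for line in f
                if any(p.search(line) for p in PREDEFINED_EVENTS)]

# as noted above, a hit is a lead to verify, not proof of an incident
for hit in scan_log("/var/log/auth.log"):
    print(hit)
```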
- 7.6.2 Response
- After detecting and verifying an incident, the next step is to activate an Incident Response (IR) or CSIRT team
- An IR team is AKA computer incident response team (CIRT) or computer security incident response team (CSIRT)
- Among the first steps taken by the IR Team will be an impact assessment to determine the scale of the incident, how long the impact might be experienced, who else might need to be involved etc.
- The IR team typically investigates the incident, assesses the damage, collects evidence, reports the incident, performs recovery procedures, and participates in the remediation and lessons learned stages, helping with root cause analysis
- it’s important to protect all data as evidence during an investigation, and computers should not be turned off
- 7.6.3 Mitigation
- Mitigation: the attempt to contain an incident; in addition to conducting an impact assessment, the IR Team will attempt to minimize or contain the damage or impact from the incident
- The IR Team’s job at this point is not to fix the problem; it’s simply to try and prevent further damage
- Note this may involve disconnecting a computer from the network; sometimes responders take steps to mitigate the incident, but without letting the attacker know that the attack has been detected
- 7.6.4 Reporting
- Reporting occurs throughout the incident response process
- Once an incident is mitigated, formal reporting occurs because numerous stakeholders often need to understand what has happened
- Jurisdictions may have specific laws governing the protection of personally identifiable information (PII), and orgs must report if it has been exposed
- Additionally, some third-party standards, such as the Payment Card Industry Data Security Standard (PCI DSS), require orgs to report certain security incidents to law enforcement
- 7.6.5 Recovery
- At this point, the goal is to start returning to normal
- Recovery is the next step, returning a system to a fully functioning state
- The most secure method of restoring a system after an incident is completely rebuilding the system from scratch, including restoring all data from the most recent backup
- effective configuration and change management will provide the necessary documentation to ensure the rebuilt systems are configured properly
- According to the OSG, you should check these areas as part of recovery:
- access control lists (ACLs), including firewall or router rules
- services and protocols, ensuring the unneeded services and protocols are disabled or removed
- patches
- user accounts, ensuring they have changed from default configs
- known compromises have been reversed
- 7.6.6 Remediation
- Remediation: changes to a system’s config to immediately limit or reduce the chance of recurrence of an incident
- Remediation stage: personnel look at the incident, identify what allowed it to occur, and then implement methods to prevent it from happening again
- Remediation includes performing a root cause analysis (which examines the incident to determine what allowed it to happen), and if the root cause analysis identifies a vulnerability that can be mitigated, this stage will recommend a change
- 7.6.7 Lessons Learned
- Lessons learned stage: an all-encompassing view of the situation related to an incident, where personnel, including the IR team and other key stakeholders, examine the incident and the response to see if there are any lessons to be learned
- the output of this stage can be fed back to the detection stage of incident management
- It’s common for the IR team to create a report when they complete a lessons learned review
- based on the findings, the team may recommend changes to procedures, the addition of security controls, or even changes to policies
- management will decide what recommendations to implement and is responsible for the remaining risk for any recommendations they reject
- NOTE: Incident management DOES NOT include a counterattack against the attacker
- Incident Response Summary:

| Step | Stage | Action/Goal |
|------|-------|-------------|
| Preparation | | |
| Detection | Triage | |
| Response | Triage | activate IR team |
| Mitigation | Investigate | containment |
| Reporting | Investigate | |
| Recovery | Recovery | return to normal |
| Remediation | Recovery | prevention |
| Lessons Learned | Recovery | improve process |
7.7 Operate and maintain detective and preventative measures (OSG-9 Chpts 11,17)
- As noted in Domain 1, a preventive or preventative control is deployed to thwart or stop unwanted or unauthorized activity from occurring
- Examples:
- fences
- locks
- biometrics
- separation of duties policies
- job rotation policies
- data classification
- access control methods
- encryption
- smart cards
- callback procedures
- security policies
- security awareness training
- antivirus software
- firewalls
- intrusion prevention systems
- A detective control is deployed to discover or detect unwanted or unauthorized activity; detective controls operate after the fact
- Examples:
- security guards, guard dogs
- motion detectors
- recording and reviewing of events captured by security cameras
- job rotation policies
- mandatory vacation policies
- audit trails
- honeypots or honeynets
- intrusion detection systems
- violation reports
- supervision and reviews of users
- incident investigations
- Some preventative measures:
- Keep systems and applications up to date
- Remove or disable unneeded services and protocols
- Use intrusion detection and prevention systems
- Use up-to-date antimalware software
- Use firewalls
- Implement configuration and system management processes
- 7.7.1 Firewalls (e.g. next generation, web application, network)
- Firewalls are preventive and technical controls
- Types of firewalls:
- Static Packet Filtering: inspects individual packets based on predefined rules (such as IP address, port number, and protocol) without considering the connection state or the content of the data; simple and fast, but lacks context awareness
- Application-Level: functions at the application layer (OSI:Layer 7), acts as an intermediary or proxy, inspecting traffic between the user and the service; can perform deep packet inspection, meaning it can analyze the contents of data packets to identify malicious content or enforce rules for specific applications (e.g., web, email); example: a web application firewall (WAF) inspects traffic going to a web server and can block malicious traffic such as SQL injection attacks and cross-site scripting (XSS) attacks
- Circuit-Level Gateway Firewall: works at the session layer (OSI:Layer 5), and monitors TCP handshakes (i.e., the connection establishment process) to ensure the validity of the session; once the session is validated, it allows the traffic to pass without further inspection of the content; circuit-level gateway firewalls have lower processing overhead, but lacks deep packet inspection
- Stateful Inspection Firewall: operates at the network and transport layers (Layers 3 and 4) but maintains a record of active connections (i.e., it tracks the state of traffic streams across the network); checks whether a packet belongs to an active, legitimate connection before allowing it through; offers better security than static packet filtering; lacks the ability to inspect data at the application layer
- Next-Generation Firewall (NGFW): functions as a unified threat management (UTM) device and combines the features of traditional firewalls (like stateful inspection) with additional features such as deep packet inspection, intrusion prevention systems (IPS), and the ability to detect and block threats at the application layer; often incorporates advanced threat detection using techniques such as sandboxing and behavioral analysis; an NGFW inspects traffic at both the application and network layers, providing comprehensive security, including the ability to identify and block sophisticated threats, but is more expensive and resource-intensive
- Internal Segmentation Firewall (ISFW): used within a network to segment internal traffic and control access between different parts of an org; an ISFW monitors and filters traffic between network segments (such as between the finance department and HR), preventing lateral movement of threats within the network; provides internal protection by monitoring east-west traffic, reduces the risk of an insider threat or lateral movement, and can enforce micro-segmentation, but can be complex to configure and manage
| Firewall Type | OSI Layers | Key Features | Strengths | Weaknesses |
|---|---|---|---|---|
| Static Packet Filtering | Layer 3 (Network) | Basic filtering on source/destination IPs and ports | Fast, low overhead | No context awareness, can’t inspect data payload |
| Application-Level | Layer 7 (Application) | Inspects application-level data | Deep inspection, blocks specific applications | High processing overhead, slower performance |
| Circuit-Level | Layer 5 (Session) | Validates session establishment | Low overhead, monitors session validity | No payload inspection, can’t detect deeper threats |
| Stateful Inspection | Layers 3-4 (Network, Transport) | Tracks connection states across sessions | Better security than static filtering | Doesn’t inspect data at the application layer |
| NGFW | Layers 3-7 | Combines stateful inspection with deep packet inspection, IPS, and app control | Comprehensive threat detection, application-aware | Expensive, high resource usage |
| ISFW | Internal segmentation | Filters traffic between internal network segments | Prevents lateral movement, enforces micro-segmentation | Complex configuration, typically for internal use |
- 7.7.2 Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS)
- Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are two methods organizations typically implement to detect and prevent attacks
- Intrusion detection: a specific form of monitoring events, usually in real time, to detect abnormal activity indicating a potential incident or intrusion
- Intrusion Detection System (IDS) automates the inspection of logs and real-time system events to detect intrusion attempts and system failures
- IDSs are an effective method of detecting many DoS and DDoS attacks
- an IDS actively watches for suspicious activity by monitoring network traffic and inspecting logs
- an IDS is intended as part of a defense-in-depth security plan
- knowledge-based detection: AKA signature-based or pattern-matching detection, the most common method used by an IDS
- behavior-based detection: AKA statistical intrusion, anomaly, and heuristics-based detection; behavior-based IDSs use baseline, activity stats, and heuristic eval techniques to compare current activity against previous activity to detect potentially malicious events
- Because an IPS includes detection capabilities, you’ll see them referred to as intrusion detection and prevention systems (IDPSs)
- an IPS includes all the capabilities of an IDS but can also take additional steps to stop or prevent intrusions
- IDS/IPS should be deployed at strategic network locations to monitor traffic, such as at the perimeters, or between network segments, and should be configured to alert for specific types of scans and traffic patterns
- See NIST SP 800-94
- 7.7.3 Whitelisting/blacklisting
- The method used to control which applications can and can’t run is via allow lists and deny lists (AKA whitelists and blacklists); a minimal enforcement sketch follows this list
- Allow list: identifies a list of apps authorized to run on a system and blocks all other apps
- Deny list: identifies a list of apps that are not authorized to run on a system
- Allow and deny lists are used for applications to help prevent malware infections
- Important to note: a system would only use one list, either allow or deny
- Apple iOS running on iPhones/iPads is an example of an extreme version of an allow list; users are only able to install apps from the App Store
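- a minimal sketch of allow-list enforcement by executable hash (digests are placeholders; real solutions hook process creation and often verify publisher signatures or paths):

```python
# Allow-list check: everything not explicitly approved is blocked
import hashlib

APPROVED_SHA256 = {
    "0123abcd...",  # placeholder digests of authorized executables
}

def is_authorized(exe_path: str) -> bool:
    with open(exe_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return digest in APPROVED_SHA256  # default deny
```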
- 7.7.4 Third-party provided security services
- Some orgs outsource security services such as auditing and penetration testing to third party security services
- Some outside compliance entities (e.g. PCI DSS) require orgs to ensure that service providers comply
- OSG also mentions that some SaaS vendors provide security services via the cloud (e.g. next-gen firewalls, UTM devices, and email gateways for spam and malware filtering)
- 7.7.5 Sandboxing
- Sandboxing: refers to a security technique where a separate, secure environment is created to run and analyze untested or untrusted programs or code without risking harm to the host device or network; this isolated environment, known as a sandbox, effectively contains the execution of the code, allowing it to run and behave as if it were in a normal computing environment, but without the ability to affect the host system or access critical resources and data
- Confinement: restriction of a process to certain resources, or reading from and writing to certain memory locations; bounds are the limits of memory a process cannot exceed when reading or writing; isolation is using bounds to create/enforce confinement
- Sandboxing provides a security boundary for applications and prevents the app from interacting with other apps; can be used as part of development, integration, or acceptance testing, as part of malware screening, or as part of a honeynet
- 7.7.6 Honeypots/honeynets
- Honeypots: individual computers created as a trap or a decoy for intruders or insider threats
- Honeynet: two or more networked honeypots used together to simulate a network
- They look and act like legit systems, but they do not host data of any real value for an attacker; admins often configure honeypots with vulnerabilities to tempt intruders into attacking them
- In addition to keeping the attacker away from a production environment, the honeypot allows administrators to observe an attacker’s activity without compromising the live environment
- 7.7.7 Anti-malware
- Malware: program inserted into a system with the intent of compromising the CIA of the victim’s data, applications, or OS; malicious software that negatively impacts a system
- The most important protection against malicious code is the use of antimalware software with up-to-date signature files and heuristic capabilities
- multi-pronged approach with antimalware software on each system in addition to filtering internet content helps protect systems from infections
- following the principle of least privilege, ensuring users do not have admin permissions on systems means they won’t be able to install apps that may be malicious
- These are the characteristics of each malware type:
- virus: software written with the intent/capability to copy and disperse itself without direct owner knowledge/cooperation; the defining characteristic is that it’s a piece of malware that has to be triggered in some way by the user; program that modifies other programs to contain a possibly altered version of itself
- worm: software written with the intent/capability to copy and disperse without owner knowledge/cooperation, but without needing to modify other programs to contain copies of itself; malware that can self-propagate and spread through a network or a series of systems on its own by exploiting a vulnerability in those systems
- companion: helper software that is not malicious on its own; it could be something like a wrapper that accompanies the actual malware
- macro: associated with Microsoft Office products, and is created using a straightforward programming language to automate tasks; macros can be programmed to be malicious and harmful
- multipartite: means the malware spreads in different ways (e.g. Stuxnet)
- polymorphic: malware that can change aspects of itself as it replicates to evade detection (e.g. file name, file size, code structure etc)
- trojan: a Trojan horse is malware that looks harmless or desirable but contains malicious code; trojans are often found in easily downloadable software; a trojan inserts backdoors or trapdoors into other programs or systems
- bot: an emerging class of mobile code; benign bots employ limited machine learning capabilities to assist with user requests for help or assistance, automation of or assistance with workflows, data input quality validation, etc.; in a malware context, a bot is code that lets an attacker remotely control an infected system
- botnet: many infected systems that have been harnessed together and act in unison
- boot sector infectors: pieces of malware that can install themselves in the boot sector of a drive
- hoaxes/pranks: not actually software, they’re usually part of social engineering—via email or other means—that intends harm (hoaxes) or a joke (pranks)
- logic bomb: malware inserted into a program which will activate and perform functions suiting the attacker when some later date/conditions are met; code that will execute based on some triggering event
- stealth: malware that uses various active techniques to avoid detection
- ransom attack: any form of attack which threatens the destruction, denial, or unauthorized public release/remarketing of private information assets; usually involves encrypting assets and withholding the decryption key until a ransom is paid
- ransomware: type of malware that typically encrypts a system or a network of systems, effectively locking users out, and then demands a ransom payment (usually in the form of a digital currency) to gain access to the decryption key
- rootkit: Similar to stealth malware, a rootkit attempts to mask its presence on a system; malware that embeds itself deeply in an OS; term is derived from the concept of rooting and a utility kit of hacking tools; rooting is gaining total or full control over a system; typically includes a collection of malware tools that an attacker can utilize according to specific goals
- zero-day: is any type of malware that’s never been seen in the wild before, and the vendor of the impacted product is unaware (or hasn’t issued a patch), as are security companies that create anti-malware software intended to protect systems; previously unreported vuln which can be potentially exploited without risk of detection or prevention until system owner/developer detects and corrects vuln; gets name from the “zero time” being the time at which the exploit or vuln is first identified by the systems’ owners or builders; AKA zero-hour exploit, zero-day attack
- 7.7.8 Machine learning and Artificial Intelligence (AI) based tools
- AI: gives machines the ability to do things that a human can do better or allows a machine to perform tasks that we previously thought required human intelligence
- Machine Learning: a subset of AI and refers to a system that can improve automatically through experience
- an ML system starts with a set of rules or guidelines
- an AI system starts with nothing and progressively discovers the rules on its own, creating its own algorithms and applying ML techniques based on the rules it learns
- Behavior-based detection is one way ML and AI can apply to cybersecurity
- an admin creates a baseline of normal activities and traffic on a network; this baseline is similar to the set of rules given to an ML system
- during normal operations, it detects anomalies and reports them; if the detection is a false positive (incorrectly classifying a benign activity, system state, or configuration as malicious or vulnerable), the ML system learns
- An AI system starts without a baseline, monitors traffic and slowly creates its own baseline based on the traffic it observes
- as it creates the baseline it also looks for anomalies
- an AI system also relies on feedback from admins to learn if alarms are valid or false positives
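A toy sketch of behavior-based detection along these lines (the traffic numbers and the z-score threshold are invented for illustration):

```python
import statistics

class BehaviorMonitor:
    """Toy behavior-based detector: starts from an admin-supplied
    baseline of 'normal' request rates, flags large deviations, and
    folds confirmed false positives back into the baseline."""

    def __init__(self, baseline_samples, threshold=3.0):
        self.samples = list(baseline_samples)  # the initial "set of rules"
        self.threshold = threshold             # z-score cutoff

    def is_anomalous(self, value):
        mean = statistics.mean(self.samples)
        stdev = statistics.stdev(self.samples) or 1.0  # avoid divide-by-zero
        return abs(value - mean) / stdev > self.threshold

    def report_false_positive(self, value):
        # Admin feedback: the flagged value was benign, so learn it.
        self.samples.append(value)

monitor = BehaviorMonitor([100, 110, 95, 105, 102])  # requests/minute
print(monitor.is_anomalous(500))    # True -> raise an alert
monitor.report_false_positive(500)  # admin marks it benign; baseline adapts
```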
7.8 Implement and support patch and vulnerability management (OSG-9 Chpt 16)
- Vulnerability Management: activities necessary to identify, assess, prioritize, and remediate information systems weaknesses
- Vulnerability management includes routine vuln scans and periodic vuln assessments
- vuln scanners can detect known security vulnerabilities and weaknesses, like the absence of patches or weak passwords
- vuln scanners generate reports that indicate the technical vulns of a system and are an effective check for a patch management program
- vuln assessments extend beyond just technical scans and can include reviews and audits to detect vulnerabilities
- Patch and vulnerability management processes work together to help protect an org against emerging threats; patch management ensures that appropriate patches are applied, and vuln management helps verify that systems are not vulnerable to known threats
- Patch: (AKA updates, quick fixes, or hot fixes) a blanket term for any type of code written to correct a bug or vulnerability or to improve existing software performance; when installed, a patch directly modifies files or device settings without changing the version number or release details of the related software component
- in the context of security, admins are primarily concerned with security patches, which are patches that affect a system’s vulns
- Patch Management: systematic notification, identification, deployment, installation and verification of OS and app code revisions known as patches, hot fixes, and service packs
- an effective patch management program ensures that systems are kept up to date with current patches by evaluating, testing, approving, and deploying appropriate patches
- Patch Tuesday: several big-tech orgs (e.g. Microsoft, Adobe, Oracle etc) regularly release patches on the second Tuesday of every month
- Patch management is often intertwined with change and configuration management, ensuring that documentation reflects changes; when an org doesn’t have an effective patch management program, it can experience outages and incidents from known issues that could have been prevented
- There are three methods for determining patch levels:
- agent: update software (agent) installed on devices
- agentless: remotely connect to each device
- passive: monitor traffic to infer patch levels
- Deploying patches can be done manually or automatically
- Common steps within an effective program:
- evaluate patches: determine if they apply to your systems
- test patches: test patches on an isolated, non-production system to determine if the patch causes any unwanted side effects
- approve the patches: after successful testing, patches are approved for deployment; it’s common to use Change Management as part of the approval process
- deploy the patches: after testing and approval, deploy the patches; many orgs use automated methods to deploy patches, via third-party or the software vendor
- verify that patches are deployed: regularly test and audit systems to ensure they remain patched
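As a sketch of the verification step, the snippet below compares installed package versions against an approved patch baseline (the host names, packages, and versions are all invented; real tools also parse version numbers structurally rather than comparing strings):

```python
approved_baseline = {"openssl": "3.0.13", "nginx": "1.24.0"}

# In practice an agent, an agentless scan, or passive monitoring would
# collect this inventory; here it is hard-coded for illustration.
installed = {
    "web01": {"openssl": "3.0.11", "nginx": "1.24.0"},
    "web02": {"openssl": "3.0.13", "nginx": "1.24.0"},
}

def verify_patch_levels(baseline, hosts):
    """Report every host/package pair that trails the approved baseline."""
    findings = []
    for host, pkgs in hosts.items():
        for pkg, required in baseline.items():
            found = pkgs.get(pkg, "missing")
            if found != required:
                findings.append((host, pkg, found, required))
    return findings

for host, pkg, found, required in verify_patch_levels(approved_baseline, installed):
    print(f"{host}: {pkg} at {found}, baseline requires {required}")
# web01: openssl at 3.0.11, baseline requires 3.0.13
```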
- Vulnerability Management: regularly identifying vulns, evaluating them, and taking steps to mitigate risks associated with them
- it isn’t possible to eliminate all risks, nor is it possible to eliminate all vulnerabilities
- a vuln management program helps ensure that an org is regularly evaluating vulns and mitigating those that represent the greatest risk
- one of the most common vulnerabilities within an org is an unpatched system, and so a vuln management program will often work in conjunction with a patch management program
7.9 Understand and participate in change management processes (OSG-9 Chpt 16)
- Change management: formal process an org uses to transition from the current state to a future state; typically includes mechanisms to request, evaluate, approve, implement, verify, and learn from changes; ensures that the costs and benefits of changes are analyzed and changes are made in a controlled manner to reduce risks
- Change management processes allow various IT experts to review proposed changes for unintended consequences before implementing
- Change management controls provide a process to control, document, track, and audit all system changes
- The change management process includes multiple steps that build upon each other:
- Change request: a change request can come from any part of an org and pertain to almost any topic; companies typically use some type of change management software
- Assess impact: after a change request is made, however small the request might be, the impact of the potential change must be assessed
- Approval/reject: based on the requested change and related impact assessment, common sense plays a big part in the approval process
- Build and test: after approval, any change should be developed and tested, ideally in a test environment
- Schedule/notification: prior to implementing any change, key stakeholders should be notified
- Implement: after testing and notification of stakeholders, the change should be implemented; it’s important to have a roll-back plan, allowing personnel to undo the change
- Validation: once implemented, senior management and stakeholders should again be notified to validate the change
- Document the change: documentation should take place at each step; it’s critical to ensure all documentation is complete and to identify the version and baseline related to a given change
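One way to picture how these steps constrain each other is a small state machine; this sketch (the states and transitions are a simplification of the list above, not a standard) refuses any out-of-order transition and keeps an audit trail as a by-product:

```python
from enum import Enum, auto

class ChangeState(Enum):
    REQUESTED = auto()
    ASSESSED = auto()
    APPROVED = auto()
    REJECTED = auto()
    TESTED = auto()
    SCHEDULED = auto()
    IMPLEMENTED = auto()
    VALIDATED = auto()

# Legal transitions mirror the steps above; anything else is refused,
# which is the "controlled manner" that change management enforces.
TRANSITIONS = {
    ChangeState.REQUESTED: {ChangeState.ASSESSED},
    ChangeState.ASSESSED: {ChangeState.APPROVED, ChangeState.REJECTED},
    ChangeState.APPROVED: {ChangeState.TESTED},
    ChangeState.TESTED: {ChangeState.SCHEDULED},
    ChangeState.SCHEDULED: {ChangeState.IMPLEMENTED},
    ChangeState.IMPLEMENTED: {ChangeState.VALIDATED},
}

class ChangeRequest:
    def __init__(self, summary):
        self.summary = summary
        self.state = ChangeState.REQUESTED
        self.history = [self.state]  # documentation at every step

    def advance(self, new_state):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

cr = ChangeRequest("open TCP/443 on the edge firewall")
cr.advance(ChangeState.ASSESSED)
cr.advance(ChangeState.APPROVED)  # jumping straight to IMPLEMENTED would raise
```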
- When a change management process is enforced, it creates documentation for all changes to a system, providing a trail of info if personnel need to reverse the change, or make the same change on other systems
- Change management control is a mandatory element for some security assurance requirements (SARs) in the ISO Common Criteria
7.10 Implement recovery strategies (OSG-9 Chpt 18)
- Recovery strategy: a plan for restoring critical business components, systems, and operations following a disruption
- Disaster recovery (DR): set of practices that enable an organization to minimize loss of, and restore, mission-critical technology infrastructure after a catastrophic incident
- Business continuity (BC): set of practices that enables an organization to continue performing its critical functions through and after any disruptive event
- 7.10.1 Backup storage strategies
- Backup strategies are driven by org goals and objectives and usually focus on backup and restore time as well as storage needs
- Archive bit: technical detail (metadata) that indicates the status of a backup relative to a given backup strategy
- 0 = no changes to the file or no backup required
- 1 = file has been modified or backup required
- Different backup strategies handle the archive bit differently; in particular, incremental and differential backups do not treat it the same way
- once a full backup is complete, the archive bit on every file is reset, turned off, or set to 0
- Three types of backups:
- Full backup: store a complete copy of the data contained on the protected device or backup media; full backups duplicate every file on the system regardless of the setting of the archive bit
- Incremental backup: changes since the last incremental backup
- only files that have the archive bit turned on, enabled, or set to 1 are duplicated
- once an incremental backup is complete, the archive bit on all duplicated files is reset, turned off, or set to 0
- Differential backup: changes since the last full backup
- only files that have the archive bit turned on, enabled, or set to 1 are duplicated
- unlike full and incremental backups, the differential backup process does not change the archive bit
- the most important difference between incremental and differential backups is the time needed to restore data in the event of an emergency
- a combination of full and differential backups will require only two backups to be restored: the most recent full backup and the most recent differential backup
- a combination of full backups with incremental backups will require restoration of the most recent full backup as well as all incremental backups performed since that full backup
- differential backups don’t take as long to restore, but they take longer to create than incremental
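A toy model of how the three backup types interact with the archive bit (file names are illustrative; each file is reduced to just its bit):

```python
# Each file is modeled only by its archive bit (1 = modified since capture).
files = {"a.txt": 0, "b.txt": 0, "c.txt": 0}

def modify(name):
    files[name] = 1  # the OS sets the archive bit on every write

def full_backup():
    captured = list(files)          # copies everything, regardless of bit...
    for name in files:
        files[name] = 0             # ...and resets every archive bit
    return captured

def incremental_backup():
    captured = [n for n, bit in files.items() if bit == 1]
    for name in captured:
        files[name] = 0             # incremental also resets the bit
    return captured

def differential_backup():
    # Same selection rule, but the bit is left untouched, so each
    # differential keeps growing until the next full backup.
    return [n for n, bit in files.items() if bit == 1]

full_backup()                  # Monday: everything, all bits cleared
modify("a.txt")
print(differential_backup())   # Tuesday:   ['a.txt']
modify("b.txt")
print(differential_backup())   # Wednesday: ['a.txt', 'b.txt'] (cumulative)
```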
- Note: Grandfather/Father/Son, Tower of Hanoi, and Six Cartridge Weekly are all different approaches to rotating backup media, balancing media reuse with data retention concerns
- Grandfather/Father/Son (GFS): three or more backup cycles, such as daily (son), weekly (father), and monthly (grandfather); the daily backups are rotated on a three-month basis using a FIFO system, the weekly backups are similarly rotated on a twice-yearly basis, and the monthly backups on a yearly basis
- Tower of Hanoi: based on the puzzle of the same name; the first media set is overwritten every other day, the second every fourth day, the third every eighth day, and so on, so each successive set retains progressively older backups
- Six Cartridge Weekly: a method that uses six different media (cartridges, tapes, drives, etc.) across each week; many small businesses that do not need to back up high volumes of data use this type of rotation schedule; it typically uses four media for daily (incremental or differential) backups Monday through Thursday, with the remaining two alternating as weekly full backups
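For Tower of Hanoi specifically, the rotation follows a "ruler sequence"; the sketch below (the set count and the capping behavior are one common interpretation of the scheme) computes which media set to overwrite on a given day:

```python
def hanoi_set(day, num_sets=3):
    """Media set (0 = A, 1 = B, ...) to overwrite on a 1-indexed day.
    Set k is normally reused every 2**(k+1) days; the highest set
    absorbs the longer intervals, so with three sets the repeating
    pattern is A, B, A, C."""
    trailing_zeros = (day & -day).bit_length() - 1  # 2-adic valuation of day
    return min(trailing_zeros, num_sets - 1)

schedule = [chr(ord("A") + hanoi_set(d)) for d in range(1, 13)]
print(schedule)  # ['A','B','A','C','A','B','A','C','A','B','A','C']
```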
- Backup storage best practices include keeping copies of the media in at least one offsite location to provide redundancy should the primary location be unavailable, incapacitated, or destroyed; common strategy is to store backups in a cloud service that is itself geographically redundant
- Two common backup strategies:
- 1) full backup on Monday night, then run differential backups every other night of the week; if a failure occurs Saturday morning, restore Monday’s full backup and then only Friday’s differential backup
- 2) full backup on Monday night, then run incremental backups every other night of the week; if a failure occurs Saturday morning, restore Monday’s full backup and then each incremental backup since Monday in original chronological order
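A small sketch that derives the restore chain for either strategy from a chronological list of backups (the labels are illustrative):

```python
def restore_chain(strategy, backups):
    """Return the backups to restore, oldest first.
    'backups' is a chronological list of (kind, label) tuples,
    where kind is 'full', 'incremental', or 'differential'."""
    last_full = max(i for i, (kind, _) in enumerate(backups) if kind == "full")
    chain = [backups[last_full]]
    later = backups[last_full + 1:]
    if strategy == "differential":
        diffs = [b for b in later if b[0] == "differential"]
        if diffs:
            chain.append(diffs[-1])  # only the most recent differential
    else:
        chain += [b for b in later if b[0] == "incremental"]  # all, in order
    return chain

week = [("full", "Mon"), ("incremental", "Tue"), ("incremental", "Wed"),
        ("incremental", "Thu"), ("incremental", "Fri")]
print(restore_chain("incremental", week))
# [('full','Mon'), ('incremental','Tue'), ('incremental','Wed'),
#  ('incremental','Thu'), ('incremental','Fri')]
```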
| Feature | Full Backup | Incremental Backup | Differential Backup |
| --- | --- | --- | --- |
| Description | A complete copy of all selected data | Only backs up data that has changed since the last backup (regardless of type) | Backs up all changes made since the last full backup |
| Storage Space | Requires the most storage space | Requires the least storage space | Requires more space than incremental but less than full |
| Backup Speed | Slowest, as it copies all data | Fastest, as it only copies changed data since the last backup | Faster than full but slower than incremental, as it copies all changes since the last full backup |
| Recovery Speed | Fastest, as all data is in one place | Slowest, as it may require multiple incremental backups to restore to a specific point | Faster than incremental, since it requires only the last full backup and the last differential backup |
| Complexity | Simplest, with no dependency on previous backups | Complex, as it depends on a chain of backups from the last full backup to the most recent incremental backup | Less complex than incremental; requires only the last full backup and the last differential backup for restoration |
| Best Use Case | When backup time and storage space are not issues; ideal for less frequent backups | Suitable for environments where daily changes are minimal and quick backups are necessary | Ideal for environments where storage space is a concern but restoration time needs to be relatively quick |

- Three main techniques used to create offsite copies of DB content: electronic vaulting, remote journaling, and remote mirroring
- electronic vaulting: where database backups are moved to a remote site using bulk transfers
- remote journaling: data transfers are performed in a more expeditious manner, with transaction logs transferred to the remote site far more frequently than bulk backups; like electronic vaulting, the transferred logs are not applied to a live database server but are maintained on a backup device
- remote mirroring: the most advanced db backup solution, and the most expensive, with remote mirroring, a live db server is maintained at the backup site; the remote server receives copies of the db modifications at the same time they are applied to the production server at the primary site
- 7.10.2 Recovery site strategies
- Non-disaster: service disruption with significant but limited impact
- Disaster: event that causes an entire site to be unusable for a day or longer (usually requires alternate processing facility)
- Catastrophe: major disruption that destroys the facility altogether
- For disasters and catastrophes, an org has 3 basic options:
- use a dedicated site that the org owns/operates
- lease a commercial facility (hot, warm, cold site)
- enter into a formal agreement with another facility/org
- When a disaster interrupts a business, a disaster recovery plan should kick in nearly automatically and begin providing support for recovery operations
- in addition to improving your response capabilities, purchasing insurance can reduce the impact of financial losses
- Recovery site strategies consider multiple elements of an organization, such as people, data, infrastructure, and cost, as well as factors like availability and location
- When designing a disaster recovery plan, it’s important to keep your goal in mind — the restoration of workgroups to the point that they can resume their activities in their usual work locations
- sometimes it’s best to develop separate recovery facilities for different work groups
- To recover your business operations with the greatest possible efficiency, you should engineer the disaster recovery plan so that those business units with the highest priority are recovered first
- 7.10.3 Multiple processing sites
- One of the most important elements of the disaster recovery plan is the selection of alternate processing sites to be used when the primary sites are unavailable
- cold sites: standby facilities large enough to handle the processing load of an organization and equipped with appropriate electrical and environmental support systems
- a cold site has NO COMPUTING FACILITIES (hardware or software) preinstalled
- a cold site has no active broadband comm links
- advantages:
- a cold site is the LEAST EXPENSIVE OPTION and perhaps the most practical
- disadvantages:
- tremendous lag to activate the site, often measured in weeks, which can yield a false sense of security
- difficult to test
- warm sites: a warm site is better than a cold site because, in addition to the shell of a building, basic equipment is installed
- a warm site contains the data links and preconfigured equipment necessary to begin restoring operations, but no usable data or information
- unlike hot sites, however, warm sites do not typically contain copies of the client’s data
- activation of a warm site typically takes at least 12 hours from the time a disaster is declared
- hot sites: a fully operational offsite data processing facility equipped with hardware and software; a backup facility that is maintained in constant working order, with a full complement of servers, workstations, and comm links
- a hot site is usually a subscription service
- the data on the primary site servers is periodically or continuously replicated to corresponding servers at the hot site, ensuring that the hot site has up-to-date data
- advantages:
- unsurpassed level of disaster recovery protection
- disadvantages:
- extremely costly, likely doubling an org’s budget for hardware, software and services, and requires the use of additional employees to maintain the site
- has (by definition) copies of all production data, and therefore increases your attack surface
- Mobile sites: non-mainstream alternatives to traditional recovery sites, usually configured as cold or warm sites; if your DR plan depends on a workgroup recovery strategy, mobile sites are an excellent way to implement that approach
- Cloud computing: many orgs now turn to cloud computing as their preferred disaster recovery option
- some companies that maintain their own datacenters may choose to use these IaaS options as backup service providers
- Note: A hot site is a subscription service, while a redundant site, in contrast, is a site owned and maintained by the org (and a redundant site may be “hot” in terms of capabilities)
- the exam differentiates between a hot site (a subscription service) and a redundant site (owned by the organization)
- 7.10.4 System resilience, High Availability (HA), Quality of Service (QoS), and fault tolerance
- System resilience: the ability of a system to maintain an acceptable level of service during an adverse event
- High Availability (HA): the use of redundant technology components to allow a system to quickly recover from a failure after experiencing a brief disruption
- Clustering: refers to a group of systems working together to handle workloads; often seen in the context of web servers that use a load balancer to manage incoming traffic, and distributes requests to multiple web servers (the cluster)
- Redundancy: unlike a cluster, where all members work together, redundancy typically involves a primary and secondary system; the primary system does all the work, and the secondary system is in standby mode unless the primary system fails, at which time activity can fail over to the secondary
- Both clustering and redundancy include high availability as a by-product of their configuration
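The difference between the two configurations can be sketched in a few lines (the node names and the health model are invented for illustration):

```python
import random

class Cluster:
    """All healthy members share the work; losing one node degrades
    capacity, not availability."""
    def __init__(self, nodes):
        self.nodes = nodes

    def handle(self, request):
        healthy = [n for n in self.nodes if n["up"]]
        node = random.choice(healthy)  # trivial stand-in for a load balancer
        return f"{node['name']} served {request}"

class RedundantPair:
    """The primary does all the work; the secondary idles until failover."""
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def handle(self, request):
        active = self.primary if self.primary["up"] else self.secondary
        return f"{active['name']} served {request}"

web = Cluster([{"name": "web1", "up": True}, {"name": "web2", "up": True}])
db = RedundantPair({"name": "db-primary", "up": False},
                   {"name": "db-standby", "up": True})
print(web.handle("GET /"))    # either web1 or web2
print(db.handle("SELECT 1"))  # db-standby: automatic failover
```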
- Quality of Service (QoS): controls that protect the availability of data networks under load
- many factors contribute to the quality of the end-user experience and QoS attempts to manage all of these factors to create an experience that meets business requirements
- factors contributing to QoS:
- bandwidth: the network capacity available to carry communications
- latency: the time it takes a packet to travel from source to destination
- packet loss: some packets may be lost between source and destination, requiring re-transmission
- interference: electrical noise, faulty equipment, and other factors may corrupt the contents of packets
- Fault tolerance: the ability of a system to suffer a fault but continue to operate
- Redundant array of independent disks (RAID): refers to multiple drives being used in unison in a system to achieve greater speed or availability; the most well-known RAID levels are:
- RAID 0—Striping: provides significant read and write speed advantages, but no fault tolerance
- RAID 1—Mirroring: uses redundancy to provide reliable availability of data
- RAID 10—Mirroring and Striping: requires a minimum of four drives and provides the benefits of striping (speed) and mirroring (availability) in one solution; this type of RAID is typically one of the most expensive
- RAID 5—Parity Protection: requires a minimum of three drives and provides a cost-effective balance between RAID 0 and RAID 1; RAID 5 utilizes a parity bit, computed from an XOR operation, for purposes of storing and restoring data
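The XOR parity behind RAID 5 is easy to demonstrate: XOR-ing the surviving blocks with the parity block reconstructs a lost block (the byte values below are arbitrary):

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks; this is RAID 5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

d1 = b"\x0f\xf0\xaa"           # data block on drive 1
d2 = b"\x01\x10\x55"           # data block on drive 2
parity = xor_blocks([d1, d2])  # parity block on drive 3

# If drive 1 dies, XOR the survivors to rebuild its data, which is
# what a degraded RAID 5 array does during reconstruction.
rebuilt_d1 = xor_blocks([d2, parity])
assert rebuilt_d1 == d1
```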
| Backup Method | Cost Implications | Time Implications for RPO |
| --- | --- | --- |
| Incremental | Lower cost due to reduced storage requirements, as only changes are backed up | Longer recovery time, as it requires the last full backup plus all subsequent incremental backups up to the RPO |
| Differential | Moderate cost; more storage is needed than incremental, but less than full, as it stores all changes since the last full backup | Faster recovery than incremental, as it requires only the last full backup and the last differential backup up to the RPO |
| Replication | Higher cost due to the need for a duplicate environment ready to take over at any time; continuous data replication can also increase bandwidth costs | Minimal recovery time, as the data is continuously updated, allowing near-instant recovery up to the latest point before failure |
| Clustering | Highest cost, because it involves multiple servers (a cluster) working together to provide high availability and redundancy | Minimal recovery time, as the system is designed for immediate failover without data loss, ensuring the RPO can be met instantaneously |

| Site Recovery Method | Cost Implications | Time Implications for RTO |
| --- | --- | --- |
| Cold Site | Lowest cost option; facilities and infrastructure are available, but equipment and data need to be set up post-disaster | Longest recovery time, as systems and data must be configured and restored from backups; suitable for non-critical applications with more flexible RTOs |
| Warm Site | Moderate cost; a compromise between cold and hot sites that includes some pre-installed hardware and connectivity which can be quickly activated | Faster recovery than a cold site, as the infrastructure is partially ready, but data and systems might still need updates to be fully operational |
| Hot Site | High cost; a duplicate of the original site with full computer systems and near-real-time data replication, ready to take over operations immediately | Minimal recovery time; designed for seamless takeover with data and systems up to date, allowing critical operations to continue with little to no downtime |
| Redundant Site | Highest cost; essentially operates in an active-active configuration where both sites run simultaneously, fully mirroring each other | Instantaneous recovery, as the redundant site is already running in parallel with the primary site, ensuring no interruption in service |
7.11 Implement Disaster Recovery (DR) processes (OSG-9 Chpt 18)
- Business Continuity Management (BCM): the process and function by which an organization is responsible for creating, maintaining, and testing BCP and DRP plans
- Business Continuity Planning (BCP): focuses on the survival of the business processes when something unexpected impacts it
- Disaster Recovery Planning (DRP): focuses on the recovery of vital technology infrastructure and systems
- BCM, BCP, and DRP are ultimately used to achieve the same goal: the continuity of the business and its critical and essential functions, processes, and services
- The key BCP/DRP steps are:
- Develop contingency planning policy
- Conduct BIA
- Identify controls
- Create contingency strategies
- Develop contingency plan
- Ensure testing, training, and exercises
- Maintenance
- Four key measurements for BCP and DRP procedures:
- RPO (recovery point objective): max tolerable data loss measured in time
- RTO (recovery time objective): max tolerable time to recover systems to a defined service level
- WRT (work recovery time): max time available to verify system and data integrity as part of the resumption of normal ops
- MTD (maximum tolerable downtime): max time a critical system, function, or process can be disrupted before unacceptable/irrecoverable consequences to the business
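These measurements constrain one another: recovery plus verification must fit within the tolerable downtime. A worked example with invented figures:

```python
# Hypothetical figures, in hours, for one business process.
MTD = 8   # business tolerates at most 8 hours of disruption
RTO = 5   # systems restored to the defined service level within 5 hours
WRT = 2   # 2 more hours to verify integrity and resume normal ops

# The plan is viable only if RTO + WRT <= MTD.
assert RTO + WRT <= MTD, "recovery plan cannot meet the MTD"
print(f"slack: {MTD - (RTO + WRT)} hour(s)")  # 1 hour of margin here
```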
- 7.11.1 Response
- A disaster recovery plan should contain simple yet comprehensive instructions for essential personnel to follow immediately upon recognizing that a disaster is in progress or imminent
- Emergency-response plans are often put together in the form of checklists provided to responders; arrange the checklist tasks in order of priority, with the most important task first!
- The response plan should include clear criteria for activation of the disaster recovery plan, define who has the authority to declare a disaster, and then discuss notification procedures
- 7.11.2 Personnel
- A disaster recovery plan should contain a list of personnel to contact in the event of a disaster
- usually includes key members of the DRP team as well as critical personnel
- Businesses need to make sure employees are trained on DR procedures and that they have the necessary resources to implement the DR plan
- Key activities involved in preparing people and procedures for DR include:
- develop DR training programs
- conduct regular DR drills
- provide employees with the necessary resources and tools to implement the DR plan
- communicate the DR plan to all employees
- 7.11.3 Communications
- Ensure that response checklists provide first responders with a clear plan to protect life and property and ensure the continuity of operations
- the notification checklist should be supplied to all personnel who might respond to a disaster
- 7.11.4 Assessment
- When the DR team arrives on site, one of their first tasks is to assess the situation
- this normally occurs in a rolling fashion, with the first responders performing a simple assessment to triage the situation and get the disaster response under way
- as the incident progresses more detailed assessments will take place to gauge effectiveness, and prioritize the assignment of resources
- 7.11.5 Restoration
- Note that recovery and restoration are separate concepts
- Restoration: bringing a business facility and environment back to a workable state
- Recovery: bringing business operations and processes back to a working state
- System recovery includes the restoration of all affected files and services actively in use on the system at the time of the failure or crash
- 7.11.6 Training and awareness
- As with a business continuity plan, it is essential that you provide training to all personnel who will be involved in the disaster recovery effort
- When designing a training plan consider the following:
- orientation training for all new employees
- initial training for employees taking on a new DR role for the first time
- detailed refresher training for DR team members
- brief awareness refreshers for all other employees
- 7.11.7 Lessons learned
- A lessons learned session should be conducted at the conclusion of any disaster recovery operation or other security incident
- The lessons learned process is designed to provide everyone involved with the incident response effort an opportunity to reflect on their individual roles and the team’s overall response
- Time is of the essence: conduct the lessons learned session before memories fade
- Usually a lessons learned session is led by trained facilitators
- NIST SP 800-61 offers a series of questions to use in the lessons learned process:
- exactly what happened and at what times?
- how well did staff and management perform in dealing with the incident?
- were documented procedures followed?
- were the procedures adequate?
- were any steps or actions taken that might have inhibited the recovery?
- what would the staff and management do differently the next time a similar incident occurs?
- how could information sharing with other organizations have been improved?
- what corrective actions can prevent similar incidents in the future?
- what precursors or indicators should be watched for in the future to detect similar incidents?
- what additional tools or resources are needed to detect, analyze, and mitigate future incidents?
- The team leader should document the lessons learned in a report that includes suggested process improvement actions
7.12 Test Disaster Recovery Plans (DRP) (OSG-9 Chpt 18)
- Every DR plan must be tested on a periodic basis to ensure that the plan’s provisions are viable and that it meets an org’s changing needs
- Five main test types:
- checklist tests
- structured walk-throughs
- simulation tests
- parallel tests
- full-interruption tests
- 7.12.1 Read-through/tabletop
- Read-through test: one of the simplest to conduct, but also one of the most critical; copies of a DR plan are distributed to the members of the DR team for review, accomplishing three goals:
- ensure that key personnel are aware of their responsibilities and have that knowledge refreshed periodically
- provide individuals with an opportunity to review and update plans, removing obsolete info
- helps identify situations in which key personnel have left the company and the DR responsibility needs to be re-assigned (note that DR responsibilities should be included in job descriptions)
- 7.12.2 Walkthrough
- Structured walk-through: AKA tabletop exercise, takes testing one step further, where members of the DR team gather in a large conference room and role-play a disaster scenario
- the team refers to their copies of the DR plan and discuss the appropriate responses to that particular type of disaster
- 7.12.3 Simulation
- Simulation tests: similar to the structured walk-throughs, where team members are presented with a scenario and asked to develop an appropriate response
- unlike read-throughs and walk-throughs, some of these response measures are then tested
- this may involve the interruption of noncritical business activities and the use of some operational personnel
- 7.12.4 Parallel
- Parallel tests: represent the next level, and involve relocating personnel to the alternate recovery site and implementing site activation procedures
- the relocated employees perform their DR responsibilities just as they would for an actual disaster
- operations at the main facility are not interrupted
- 7.12.5 Full interruption
- Full-interruption tests: operate like parallel tests, but involve actually shutting down operations at the primary site and shifting them to the recovery site
- these tests involve significant risk (shutting down the primary site, transferring recovery operations, and then reversing the process), and therefore are extremely difficult to arrange (management resistance is likely)
7.13 Participate in Business Continuity (BC) planning and exercises (OSG-9 Chpt 3)
- Business continuity planning addresses how to keep an org in business after a major disruption takes place
- It’s important to note that the scope is much broader than that of DR
- A security leader will likely be involved, but not necessarily lead the BCP effort
- The BCP life cycle includes:
- Developing the BC concept
- Assessing the current environment
- Implementing continuity strategies, plans, and solutions
- Training the staff
- Testing, exercising, and maintaining the plans and solutions
7.14 Implement and manage physical security (OSG-9 Chpt 10)
- Physical access control mechanisms deployed to control, monitor and manage access to a facility
- Sections, divisions, or areas within a site should be clearly designated as public, private, or restricted with appropriate signage
- 7.14.1 Perimeter security controls
- A fence is a perimeter-defining device and can consist of:
- stripes painted on the ground
- chain link fences
- barbed wire
- concrete walls
- invisible perimeters using laser, motion, or heat detection
- Perimeter intrusion detection and assessment system (PIDAS): an advanced form of fencing that has two or three fences used in concert to optimize security
- Gate: controlled exit and entry point in a fence or wall
- turnstile: form of gate that prevents more than one person at a time from gaining entry and often restricts movement in one direction
- access control vestibule: (AKA mantrap) a double set of doors that is often protected by a guard or other physical layout preventing piggybacking and can trap individuals at the discretion of security personnel
- Security bollards: a key element of physical security, which prevent vehicles from ramming access points and entrances
- Barricades: in addition to fencing, are used to control both foot traffic and vehicles
- Lighting is the most commonly used form of perimeter security control, providing the security benefit of deterrence (its primary purpose is to discourage casual intruders, trespassers, etc.)
- Security guards are able to adapt and react to conditions and situations; guard dogs can be an alternative for perimeter control, functioning as both detection and deterrent
- All physical security controls ultimately rely on personnel to intervene and stop actual intrusions and attacks
- KPIs (key performance indicators) of physical security are metrics or measurements of the operation of or failure of key security aspects; they should be monitored, recorded, and evaluated
- 7.14.2 Internal security controls
- In all circumstances and under all conditions, the most important aspect of security is protecting people
- Internal security controls include locks, badges, protective distribution systems (PDSs), motion detectors, intrusion alarms, and secondary verification systems
- If a facility is designed with restricted areas to control physical security, a mechanism to handle visitors is required
- Visitor logs: manual (or automated) list of nonemployee entries or access to a facility/location
- physical access logs can establish context for interpretation of logical logs
- Locks: designed to prevent access without proper authorization; a lock is a crude form of an identification and authorization mechanism
7.15 Address personnel safety and security concerns (OSG-9 Chpt 16)
- 7.15.1 Travel
- Training personnel on safe practices while traveling can increase their safety and prevent security incidents:
- sensitive data: devices traveling with the employee shouldn’t contain sensitive data
- malware and monitoring devices: possibilities include physical devices being installed in a hotel room of a foreign country
- free wi-fi: sounds appealing, but can be used to capture a user’s traffic
- VPNs: employees should have access to VPNs that they can use to create secure connections while traveling
- 7.15.2 Security training and awareness
- Orgs should add personnel safety and security topics to their training and awareness program and help ensure that personnel are aware of duress systems, travel best practices, emergency management plans, and general safety and security best practices
- Training programs should stress the importance of protecting people
- 7.15.3 Emergency management
- Emergency management plans and practices help an organization address personnel safety and security after a disaster
- Safety of personnel should be a primary consideration during any disaster
- 7.15.4 Duress
- An example of a duress system is a button that sends a distress call
- Duress systems allow guards to raise alarms in response to emergencies, while emergency management plans help the org respond to disasters
- Duress systems are useful when personnel are working alone
- If a duress system is activated accidentally, a code word can be used to assure responding personnel that it was an accident; omitting the code word signals an actual emergency