Lesson 4.2: Incident Response Planning
Incident Response Planning is one of most important aspects of your organization’s cybersecurity program. As a DoD contractor, it’s virtually inevitable that your organization will be the target of a cyberattack. Seriously. And most small businesses don’t survive a major cyberattack.
Even if your organization isn’t targeted from the outside, the threat of unintentional insider misuse of your IT system is just as real. For example, the 2019 Verizon Data Breach Investigative Report (DBIR) found around 8% of all data breaches are the result of sending an email with sensitive information to an unintended recipient. The root cause behind this is silly—our reliance on email software autofill to populate the “To:” line, and not doublechecking the autofill suggestion. “Only 8%” you say? What if one of those 8% is a Controlled Defense Information (CDI) breach that goes undiscovered until we realize the CDI is in the hands of the Chinese? As we discussed in the last lesson, a data breach is most certainly an incident—a safeguard failed, and we lost control of sensitive information. We are required to report incidents involving CDI to the DoD.
The DoD recognizes the very real risk of its sensitive information—CDI—being compromised through its contractors, so its DFARS 7012 clause requires us to be able to identify, respond to, and report cyber incidents. Part (c)(1) of that clause specifically says:
Note the “when”, not “if”. The clause then goes on to list what the contractor is expected to do (we’ll paraphrase here to spare you the legalese):
- Perform incident response
- Rapidly report (within 72 hours) incidents to the DoD (we’ll cover this in detail in the next lesson)
- Submit malicious software discovered during incident response to the DoD Cyber Crime Center (DC3): [email protected]; Hotline: (410) 981-0104; Toll Free: (877) 838-2174
- Capture and protect “images” of affected IT components and relevant monitoring/packet capture data for at least 90 days after the report is filed (supports possible DoD investigation)
Not so bad right? (Yes, we are being facetious). This clause requires us to have a very robust incident response capability. To be honest, for the smallest of you, this capability may in fact be entirely beyond your means. You may have to outsource incident response. But no matter what, none of this will get done if we don’t have a plan.
The DFARS 7012 clause requirements are reiterated in the NIST 800-171 Incident Response control family, which requires us to develop an Incident Response Plan (IRP). In this lesson we’ll cover the basics of a good IRP and introduce you to some resources that can facilitate execution of the plan when the time comes.
At a minimum, NIST 800-171 control 3.6.1 requires your organizaiton to:
This means your IRP should have the following sections or address the following response activities:
This list of steps mirrors the NIST Incident Response (IR) guidance in their 800-61 standard. At Totem.Tech we advocate the similar but slightly more comprehensive PICERL response model, with the following phases:
- Lessons Learned
Your IRP should cover at least the basics of how your organization will execute each PICERL phase. It’s impossible and impractical to try to explain in detail how your organization is expected to respond to every conceivable type of incident. Instead, the IRP should provide high-level guidelines for how the Information Security Officer (ISO) and Computer Security Incident Response Team (CSIRT) should approach each phase of IR. The IRP should be concise, readable, free of legalese, and indexable to allow the team to rapidly locate necessary information. The point is not for the team to memorize the IRP, but to consult it as necessary. Just like an NFL quarterback may consult a playbook at his wrist, you should think of the IRP as the incident handling “playbook” for the CSIRT. Let’s get into some details about what each PICERL section should contain.
Preparation is the second most important phase in IR, behind only Containment. The preparation section should at a minimum contain the following information:
- Contact information for the ISO
- Contact information for the CSIRT
- List of responsibilities for the ISO, e.g.:
- Implementation of audit logging and threat hunting strategies the organization will use to identify incidents
- Communication with external entities, including the media
List of responsibilities for the CSIRT, e.g.:
- Preparation of tools used to collect, store, and communicate incident evidence (recall the requirement to store “images” and logs for at least 90 days)
- IR tactics the CSIRT should practice
- List of recovery metrics: MTD, RTO, and RPO, so the team understands the time parameters they need to meet during IR.
- The MTD is the maximum amount of time the organization shall tolerate the IT system being unusable. For example, an organization may suffer catastrophic business loss if it cannot access its system + information for more than 48 hours. The MTD is 48 hours.
- The RTO is the maximum time the organization shall take to recover/restore the IT system components. For example, say a server taken offline to respond to an incident must be brought back online in no more than 12 hours. The RTO for the server is 12 hours. The RTO for the organization is the equal to the shortest RTO for any individual component. If it appears that an incident will affect IT components for a longer period than the RTO, the organization must exercise its Disaster Recovery Plan (DRP).
- The RPO is the maximum allowable data loss, in terms of time, the organization shall tolerate. For example, an organization may tolerate one-day of data loss; the RPO in this case is one (1) day, and backup strategies are developed accordingly.
- MTD = RTO + time it takes to restore data to RPO
- List of individuals within the organization IR information needs to be shared with
- Contact information for external entities IR information needs to be shared with, certainly the DoD, and perhaps other US government officials, e.g. the FBI
- Instructions for exercising the IRP—exercising is crucial to honing the IRP. You’ll want to train on real-world scenarios to discover and fix weakness in the IRP. See the IRP Exercise sidebar for more information
The identification section should contain general instructions for how to:
- Assess the alert and classify it as an official “incident”, if necessary
- First, rule out the possibility that the alert is the result of an authorized change
- Draw the team’s attention to Indicators of Compromise (IOC) and attack tactics, techniques, and procedures (perhaps include these lists in this section or in an appendix)
- Memorialize and share information, including
- Format of notes, collaboration tools, encryption requirements
- Collect and store evidence
- use of Digital Forensics and Incident Response (DFIR) tools, including “packet capture”. Packet capture is the process of collecting all the individual IP packets that make up a network traffic flow. You can think of flow as “conversation”. Say an attacker has remote command and control (C&C) of a server on your network—because they must issue commands as part of their attack, there will be a flow of network traffic between your server and the attacker’s C&C center. You’ll have to do your best to record, or capture, this flow. Flows can be captured into digital files using free tools such as tcpdump or Wireshark.
There are lots of other good free resources to help you identify and collect IR information. See the Resources section at the end of this lesson for some links.
As our partners at GuardSight, Inc. like to say: Containment is the most important COA (Course of Action) during incident response. The primary focus of IR is to limit damage and prevent further escalation or breaches. Think of your priority during a kitchen fire: put the fire out! Then you can sort through the damage. Activities related to containment are going to be very specific to the actual incident, so it’s hard to start off specific in this section, but IRP exercises will help flesh things out. The contain section must provide some general guidelines though:
- Reminders to the team on effective containment strategies: inventory assets, detect, deny, disrupt, degrade, deceive, and destroy
- Instructions on team relief, as containing the damage may mean long days for the team. Team members need to be sharp, so some rest periods are mandatory.
- If it appears containing the problem will take longer than the RTO (see above), leadership must be informed so they can prepare alternate means of meeting the organization’s IT needs
The eradication phase is similar to containment in that it’s highly dependent on the specific incident. But this section of the plan is a good place to remind the team that the threat must be completely removed before any affected components are restored. Also, this SANS resource has some good hints on how to create decision-tree type playbooks for the contain and eradicate phases.
This section is a good place to refer to the IT component baselines (as required by control 3.4.1), so the IT team can help the CSIRT ensure components are returned to a known state.
This section should also have a list of criteria to determine if a component is not salvageable. For instance, if the incident involves a rootkit on a server, you may be better off completely replacing that server than trying to remove all traces of the infection.
In this section include procedures for allowing a compromised IT asset to return to the system. You can use a checklist to guide an authorized individual (the ISO is a good candidate) through the steps necessary to declare the compromised asset “clean”. Another good checklist to have is the steps to bring portions of the IT system back online, what we call a restoration priority checklist. For instance, in many architectures, an authoritative domain controller must be up and running before workstation users can login with domain accounts. This is especially important for multi-factor authentication (MFA) environments: the authenticating mechanisms must be working for MFA to provide its protections.
This section should also include procedures for ensuring the IT system and data are back to normal operations, i.e. a “system checkout”.
During and after every IR, whether real-world or an exercise, all participants should be looking for opportunities for improvement. Soon after the dust has settled, the team should gather and have a formal session to share lessons learned. The ISO should take minutes and prepare an after-action report. Of course, if the incident involved CDI or a system that processes CDI, an official report will have to be filed with the DoD. It’s during the lessons learned phase that the information for such a report will be collated and packaged for the DoD.
Records of every IRP exercise should be cataloged as compelling evidence that the organization exercise the plan in accordance with control 3.6.3:
We suggest tracking the IRP exercises in the Lessons Learned section in a simple table like below:
Good Incident Response Resources
|Phase||Description + Link||Cost?|
|Prepare||How to Obtain an ECA Certificate||~100/yr|
|Prepare||List of Breach Notification Laws||Free|
|US-CERT Incident Scoring System||Free|
|Identify||SANS Digital Forensics Cheat Sheets||Free|
|Identify||Various Incident Response Cheat Sheets||Free|
|Identify||Information sharing US-CERT Traffic Light Protocol||Free|
|Identify||Comparison of Endpoint Detection and Response (EDR) Tools||Cost|
|Identify||Comparison of Free Digital Forensics Tools||Free|
|Identify||SANS SIFT Forensics Workstation||Free|
|Identify||AccessData Forensics Tool Kit (FTK) Imaging and Investigation||Cost|
|Identify||Wireshark Packet Capture Tool||Free|
|MITRE ATT&CK Threat Hunting Framework||Free|
|Multiple||SANS IR Playbook Creation||Free|
|Multiple||French CERT IR Playbooks||Free|
|Lessons Learned||MITRE IR Exercise Playbook||Free|
|Lessons Learned||CIS Six Tabletop Exercise Scenarios||Free|
|Lessons Learned||More Incident Response Scenarios||Free|