Study Guide: CompTIA CySA+ Cybersecurity Analyst Certification: Security Operations and Monitoring - Analyze Data As Part Of Security Monitoring Activities
Source: https://www.fatskills.com/comptia-cysa-cybersecurity-analyst-certification/chapter/comptia-cysa-cybersecurity-analyst-certification-security-operations-and-monitoring-analyze-data-as-part-of-security-monitoring-activities

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

Domain Objectives
3.1  Given a scenario, analyze data as part of security monitoring activities.
3.2  Given a scenario, implement configuration changes to existing controls to improve security.
3.3  Explain the importance of proactive threat hunting.
3.4  Compare and contrast automation concepts and technologies.

Objective 3.1  Given a scenario, analyze data as part of security monitoring activities
To understand what is going on in our network infrastructures, we must collect data. It is not enough to simply collect this data and let it sit; we must analyze it. But data collection and analysis are not random efforts in which we collect everything possible and are then faced with analyzing terabytes of data, although it can sometimes seem that way. Data analysis should be a carefully planned process that involves collecting the right data from the right sources and using both automated and manual means to present and analyze that data in search of specific patterns.

In this module, we will discuss the particulars of data collection and analysis, and how you can use that analysis to improve the security of your infrastructure. We will look at the various ways we collect data, from both hosts and the network, and how we can review logs and other sources of data to determine the security impact on the organization. We will also look at security information and event management (SIEM) systems, including the basics of writing queries and rules for those systems. Finally, we will discuss e-mail analysis and how it affects organizational security, since e-mail is a very commonly used avenue of attack for malicious entities.

Heuristics
We have different ways to analyze some of the data we get. Signature- or pattern-based analysis is used when we have a known pattern of behavior that can be judged against previously recorded patterns for comparison. One example of something we monitor using signature-based patterns is, of course, malware. Malware can have patterns, or signatures, and whenever a piece of malware is encountered, it is compared to previously recorded patterns to see if there is a match. Another example is network traffic patterns. A specific type of network attack, for example, can match a previously known pattern so that it can be conclusively stated that we are encountering that attack pattern again.
Another form of analysis is called behavior analysis, which is what we use when we encounter behaviors of systems, data, or even people that are outside the norm. To conduct behavior analysis, however, we must start with a baseline of what normal behavior consists of. Once we have established that baseline, usually by monitoring systems over time, we can easily detect when behavior does not follow the normal baseline.
Heuristic analysis is a form of behavior analysis, in that it looks for behaviors or characteristics that are not part of a normal behavior pattern. However, rather than simply flagging something as abnormal behavior and alerting on it (which most behavior-based systems do), it attempts to match those unusual behaviors with characteristics that could indicate an attack, whether it is abnormal network traffic, potential malware, or other possible malicious activity. Heuristic analysis, however, only offers possible matches of characteristics with abnormal behaviors; it is not as accurate as signature-based analysis.

Key Terms: The difference between ordinary behavior analysis and heuristic analysis is that behavior analysis simply alerts you when an event occurs that is outside the normal behavior baseline, whereas heuristic analysis attempts to match that abnormal behavior to known characteristics of an attack. Note that heuristic analysis does not compare the behavior to an attack signature; it simply looks at possible characteristics of an attack. This makes it less accurate than signature-based analysis.
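
To make the distinction concrete, the following minimal Python sketch applies all three approaches side by side. Everything in it is a hypothetical illustration: the signature set, the 50 percent tolerance, and the trait names and weights are assumptions chosen for the example, not values taken from any real product.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-bad files.
KNOWN_BAD_HASHES = {hashlib.sha256(b"known-malware-sample").hexdigest()}

# Hypothetical traits that loosely suggest an attack, with made-up weights.
SUSPICIOUS_TRAITS = {
    "writes_to_startup_folder": 3,
    "disables_logging": 4,
    "opens_many_outbound_connections": 2,
}


def signature_match(file_bytes):
    """Signature-based: exact match against previously recorded patterns."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_HASHES


def behavior_alert(observed, baseline, tolerance=0.5):
    """Behavior-based: alert when a value strays too far from the baseline."""
    return abs(observed - baseline) > tolerance * baseline


def heuristic_score(traits):
    """Heuristic: score observed traits against attack-like characteristics.

    A higher score means a more likely (but never certain) match.
    """
    return sum(SUSPICIOUS_TRAITS.get(trait, 0) for trait in traits)
```

Note that `signature_match` gives a yes-or-no answer, `behavior_alert` only says that something is abnormal, and `heuristic_score` returns a likelihood-style number that an analyst still has to interpret.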

Trend Analysis
Trend analysis is looking at a broad range of data to determine causes, patterns, and behaviors. Two important types of analysis are temporal analysis and spatial analysis. In temporal analysis, we collect and examine data along a spectrum of time. It could be over a short or long period of time or from specific points in time. This type of analysis is often evident in historical analysis, which looks for patterns in events we have already encountered so we can derive information about those events and explain them. We can also use temporal analysis, combined with statistical analysis, to use data from past events to predict future behavior or trends. If we encounter a series of events, such as abnormal bandwidth spikes, we can look at the relevant data and attempt to explain what caused the bandwidth spikes. We can further take that data and extrapolate patterns that may take place in the future, thus predicting future bandwidth spikes.
Spatial analysis looks at data relevant to physical or logical spaces (locations), such as geographical locations (for example, a specific regional office), IP subnets, or even specific hosts. This analysis can tell us about events occurring in those spaces. Consider data that points toward a specific IP address or even country as the origination point of an attack, for instance, or a specific web server that has repeatedly received unusual traffic.

Exam tip: The two types of trend analysis you will be expected to know are temporal (time-related) analysis and spatial (logical or physical location) analysis. Historical analysis is a form of temporal analysis and seeks to explain historical events using available data.
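
As a rough illustration of both types of trend analysis, the sketch below fits a straight line to past bandwidth samples to extrapolate a future value (temporal) and counts events per IP subnet (spatial). The function names and the choice of a simple least-squares line are assumptions made for this example; real tools typically use more robust statistical models.

```python
from collections import Counter
from ipaddress import ip_address, ip_network
from statistics import mean


def linear_forecast(samples, steps_ahead=1):
    """Temporal: fit a least-squares line to evenly spaced samples and
    extrapolate it steps_ahead intervals past the last sample."""
    n = len(samples)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)


def count_by_subnet(source_ips, subnets):
    """Spatial: count how many events originate from each listed subnet.

    Addresses outside every listed subnet are ignored.
    """
    networks = [ip_network(s) for s in subnets]
    counts = Counter()
    for ip in source_ips:
        address = ip_address(ip)
        for network in networks:
            if address in network:
                counts[str(network)] += 1
                break
    return counts
```

For example, `linear_forecast` could be fed daily peak bandwidth figures, while `count_by_subnet` could be fed the source addresses from a set of alerts to see which office or network segment they cluster in.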

Endpoint Data
An endpoint is typically a host on the network, and it is often where user processing occurs. Keep in mind that an endpoint does not necessarily have to be a user device; it could be a workstation, a server, a mobile device, a network device, an Internet of Things (IoT) device, or even a sensor connected to a network. Regardless of what it is, an endpoint generates data that lends itself to ready analysis if it is collected, aggregated, and analyzed properly. In the next few sections, we’ll discuss the different types of endpoint data and how to use specific data analysis techniques to determine the security posture of the endpoint, as well as how that data aggregates with other data in the enterprise and affects the overall risk posture of the entire infrastructure.

Known-Good vs. Anomalous Behavior Analysis
In every infrastructure, there is a “known good.” This means there are known, predictable patterns of behavior, such as network traffic, user behavior, file access, data transfer, and so on, that are considered normal and acceptable. Cybersecurity analysts know what these normal, acceptable patterns are because they have observed them over time. Even unusual events that only occur infrequently might also be known and predictable, if there is a valid explanation for them. Regular peaks in processing, such as during certain times of the year (for example, fiscal year closeout or tax season), as well as increased network usage, which might immediately precede a new product release, are both events that are easily predictable and explainable.
Cybersecurity analysts must baseline the behaviors of the network, hosts, and users in order to understand what exactly is “known good” and what is not. Baselining refers to the practice of observing behavior over time to understand which behaviors are normal, routine, expected, and predictable. Baselining can also tell analysts which behaviors are not normal or at least are unacceptable. Baselining is not a simple exercise that happens once and then is done; it constantly occurs over time in mature environments. Data is constantly being collected and fed into analysis tools, such as SIEM systems, and the baseline is always being updated, refined, analyzed, and adjusted as the operating environment changes. What the baseline was one day may change the next, but hopefully due to explainable circumstances, such as a new company acquisition or product line development. In these cases, the changes are predictable, and the change is accounted for and becomes part of the baseline.
It is the anomalous behaviors that can indicate a problem with security in the infrastructure—unexplained spikes in bandwidth, a larger volume of traffic exiting the network bound for an external host, or even an increase in account lockouts one day. These can all be indicators of compromise (IOCs) that may be the first alert you have to an attack or other malicious activity on the network. The key here is unexplained behaviors. If you know what behaviors should be exhibited on the network, then any unusual behaviors that can’t be readily explained are the ones worth investigating.
There are both manual and automated means of detecting and analyzing unusual behaviors from a host, a user, or devices on the network, such as host-based intrusion detection and prevention systems, SIEMs, and so on. These should be used to fine-tune the detection of unusual behavior as well as analyze it to see if it is explainable and potentially malicious. Of course, if you lack data, you cannot easily analyze behaviors, so it is vitally important that endpoint data be collected and analyzed as you would with other data. Endpoints have logs, of course, that must be collected and analyzed, but they also contain other data that analysts should obtain. Examples of that data, which we will discuss shortly, include memory and file system data.

Exam tip: To have a valid notion of known-good behavior, you must first establish a baseline over time.
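
The baselining idea can be sketched in a few lines of Python. This is a minimal illustration that assumes a single numeric metric (for example, outbound megabytes per hour), a simple mean and standard deviation baseline, and an arbitrary threshold of three standard deviations; the function names and window size are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize observed 'known good' values as (mean, standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, z_threshold=3.0):
    """Flag a value that lies more than z_threshold deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


def update_history(history, value, window=30):
    """Keep the baseline current by retaining only the most recent samples."""
    return (history + [value])[-window:]
```

Because the baseline is rebuilt from a rolling window, an explained change (such as a new acquisition) gradually becomes part of what is considered normal, while a sudden unexplained spike is flagged for investigation.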

Malware Analysis and Reverse Engineering
For the purposes of this objective, you should know that malware analysis is more than simply detecting malware on a system and eradicating it. Malware analysis involves somehow obtaining a copy of the malware code, either through the executable located on the host or some other means, and recording its characteristics as much as possible. This includes the processes it spawns, memory locations it resides in, any registry entries it changes for a Windows host, files it accesses, actions it performs, and any network traffic it generates. Recording these characteristics can help you analyze the malware. Obviously, the best way to analyze it is to obtain a copy of its executable and execute it in a controlled environment, such as a sandbox environment.

Memory Analysis
Memory analysis is one of those processes that can be extremely difficult to complete. Almost any analysis of data resident in memory runs the risk of changing that data in some way. Additionally, most often, if you are going to analyze memory contents, you are going to do it during a live forensic analysis of the host.

You should understand that memory can be imaged just as a hard drive can be, although the process is somewhat different. In terms of data analysis, you are looking for processes in memory that shouldn’t be there, processes that may match known malware, traces of processes connected to an executable that could be malicious in nature, and so on. Ideally, you would like to obtain an idea of what data was occupying all memory locations in use at the time you took the snapshot of the memory. Again, memory analysis (like hard drive and file system analysis) is really a function of deep forensic analysis and likely not something you would do unless you have firm evidence that some sort of system-level compromise has taken place.

File System Analysis
File system analysis can tell you a lot about a potentially compromised host. There are file system–related indicators of compromise that you should look for on an endpoint. First, you may have to do a complete forensics analysis of the hard drive or media involved. You should look for the following:
- Potentially altered files, identified by comparing their current hashes against previously known hashes
- Time and date stamps to see when they were last accessed or modified
- Possible malicious files disguised as legitimate system files, indicating a possible rootkit
Obviously, you would want to check the media for malware, but you would also want to look for traces of deleted files that might indicate an attacker had tools on the box and attempted to hide or delete them to avoid detection.
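
The hash comparison mentioned in the list above can be sketched as follows. This is a minimal example using SHA-256 from the Python standard library; the function names and the `known_hashes` mapping are hypothetical, and in a real investigation you would normally run such a check against a read-only forensic image rather than the live file system.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_altered_files(known_hashes):
    """Compare current hashes against previously recorded ones.

    known_hashes maps a file path to its expected hex digest. Returns a
    mapping of path to "missing" or "altered"; unchanged files are omitted.
    """
    findings = {}
    for path, expected in known_hashes.items():
        if not Path(path).is_file():
            findings[path] = "missing"
        elif sha256_of(path) != expected:
            findings[path] = "altered"
    return findings
```

A hash mismatch tells you only that a file changed, not why; it should be read together with the time and date stamps and any change records for the system.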

System and Application Behavior
In developing your baselines, you should also develop a baseline of both individual systems and the applications that run on them. You should determine what is considered normal behavior and what is not. Operating systems exhibit behaviors that are easily identified, tracked, recorded, and analyzed. When any of these behaviors are outside the normal baseline, they should also be easy to detect. The same applies to the applications that run on these systems. Applications should follow a normal baseline behavior; there are certain files that they may access frequently, a certain amount of memory that they occupy, and a certain percentage of CPU time that they use. If any of these or other factors fall outside the normal baseline, then you should shift your attention to the abnormal behavior to discover what is causing it. It could be a malicious event, but it could also be a malfunctioning application or system. It could also be a hardware problem or even an end-user issue. Any of these things are likely to cause operating systems and applications to behave abnormally, but once they are ruled out, you should start looking for potential security issues.

User and Entity Behavior Analytics (UEBA)
If you’re sensing that analyzing unusual patterns of behavior is a key focus of this objective, you are correct. This is true with almost any data you can collect and analyze. It is also true with a technique called user and entity behavior analytics (UEBA). UEBA is not simply about looking at logs of user actions; it is about looking for patterns of unusual behavior. Understand that there may not be malicious intent involved, but sometimes user behavior can lead to security issues on the network and can also be malicious in nature. These are the things we are looking for with this technique.
As with other types of behavioral analysis, we must first collect data and baseline it as normal user behavior. You could collect data, aggregate it by groups of users, and develop some very useful pattern analysis. For example, you could determine that web traffic typically spikes during lunchtime hours because people surf the Web a little while eating lunch at their desks. That is an obvious result of behavior analysis. You could also look at specific groups of users, such as administrators. If you see that one or two administrators are using their administrator accounts for tasks that do not require privileges, such as day-to-day e-mail use, that would be an indicator that you need to enforce policies for those accounts. A more in-depth use of this process, however, is to look at normal user behavior and develop a baseline so you can determine when there are unusual, unexplainable behaviors across the entire infrastructure. This is the key to much of the analysis discussion we have had so far; you must look for unusual and unexplainable behaviors that may indicate a security problem.

Note: The word “entity” is included in this technique because it isn’t simply users that have accounts; remember that services or daemons can also possess accounts, so you must determine what their normal baseline behaviors are as well so you can detect any unusual behaviors exhibited by these accounts.
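
A very simplified picture of UEBA is a per-account profile built from past activity, against which new activity is compared. The sketch below assumes events arrive as `(account, hour_of_day, action)` tuples; that event shape, the function names, and the three flag reasons are all hypothetical and far cruder than what commercial UEBA tools do.

```python
from collections import defaultdict


def build_profiles(events):
    """Record which hours and actions are normal for each account.

    Accounts may belong to people or to services and daemons.
    """
    profiles = defaultdict(lambda: {"hours": set(), "actions": set()})
    for account, hour, action in events:
        profiles[account]["hours"].add(hour)
        profiles[account]["actions"].add(action)
    return profiles


def unusual_events(profiles, new_events):
    """Return new events that fall outside the account's recorded profile."""
    flagged = []
    for account, hour, action in new_events:
        profile = profiles.get(account)
        if profile is None:
            flagged.append((account, hour, action, "no baseline"))
        elif action not in profile["actions"]:
            flagged.append((account, hour, action, "new action"))
        elif hour not in profile["hours"]:
            flagged.append((account, hour, action, "unusual hour"))
    return flagged
```

A flagged event is only a starting point for investigation; as noted earlier, many unusual behaviors turn out to have a valid explanation.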

Analysis of Endpoint Exploitation Techniques
We are not going to discuss specific exploitation techniques in this module; what is important here is how you detect these exploitation techniques, collect the data on them, and analyze them. Even prior to a potential attack, you should be familiar with the various techniques that attackers use to exploit your network and systems. You should apply this knowledge when threat hunting, since it tells you what to look for when trying to detect potential compromises. Some of the more advanced machine learning systems, as well as other data analysis tools, such as SIEM systems, also include exploitation and attack techniques in their libraries so these patterns can be looked for in data. These include network traffic attack signatures, malware signatures, and so on, but signature-based analysis can take you only so far. It is important that potential attacks be analyzed using behavior-based and heuristic techniques as well. Knowledge of various attack methodologies, such as those cataloged in the MITRE ATT&CK framework, would also help you identify and analyze potential attack and exploitation behaviors.
It takes a good amount of experience and analysis to recognize patterns of exploitation versus normal behavior, but as long as you have a good solid baseline of what known-good behavior is, anomalous behaviors that could indicate a system compromise will be more obvious to you.

Network
Like endpoint-related data, network data is of critical importance in your analysis. In addition to point-in-time analysis of specific network traffic, you also must examine network traffic over a period of time, looking for aggregated traffic or specific traffic-related events that may tell you what happened during an incident or to predict possible future trends. There are several aspects of network data analysis to consider. The primary, and most obvious one, is network traffic, which is captured and examined. There are also, however, other aspects of network data you should examine. Uniform Resource Locators and DNS data analysis, as well as network-related malware analysis, are all important. We will briefly describe each of these in the following sections.

Uniform Resource Locator (URL) and Domain Name System (DNS) Analysis
The domain name system is critical to the operation of a network infrastructure. Unfortunately, it can also be the source of many different avenues of attack. These include false entries in DNS servers, false entries in a host’s DNS cache, and diverting DNS queries to a false or malicious source. There are also attacks directly on a DNS server that take advantage of insufficient permissions, allowing zone transfers from an organization to an unauthorized server.
As with all infrastructure servers, DNS allows for advanced logging that can be fed into a centralized log server, or, preferably, a SIEM system. Combined with logs that come from a host, DNS logs can indicate if there is a potential problem with name resolution both on the internal network and to external domains. Sometimes troubleshooting a DNS security issue can be problematic, in that queries can be passed from one DNS server to another. If there is a false or inaccurate entry in a DNS server, it must be traced back to the source server that has a bad entry. Often, the trouble lies either in the host’s resolver cache or through DNS redirection from a malicious or compromised website.
Uniform Resource Locators (URLs) can also be problematic, in that the issue may not be a DNS entry; it could be a misleading or obfuscated URL. Many end users may not recognize that the URL has been changed slightly from a legitimate one, particularly if the URL contains different characters or is many levels deep. Network security devices, including intrusion detection systems, next-generation firewalls, application-level firewalls, and so on, can analyze URLs to determine if they are legitimate or not, but they will not always catch everything.
Sometimes specific URLs must be configured to redirect traffic to a sinkhole on the network. This means that a user, or even an automated malicious bot, will not be able to communicate with the URL since its traffic will be diverted to a server on the internal network. This is to prevent users and hosts from communicating with malicious websites. Attempts to reach URLs placed in a deny list can be logged for later analysis, and when combined with host logs regarding user actions as well as DNS server logs, this can be useful in determining exactly how false or malicious URLs are being used to target users and hosts on the network.
One technique of determining if the URL could be malicious or not is to use reputation-based services. Reputation-based services employ online databases where known malicious URLs have been reported and recorded. Many of these databases can be accessed through a subscription feed service and are compatible with different network security devices such as application proxies, WAFs, and next-generation firewalls.
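
The two URL checks described in this section, a deny list lookup and detection of slightly altered URLs, can be sketched as follows. The deny list and trusted domains here are hypothetical placeholders standing in for a real reputation feed, and the string-similarity threshold is an arbitrary assumption; production devices use considerably more sophisticated checks.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Hypothetical stand-ins for a reputation feed and an organization's own domains.
DENY_LIST = {"malicious.example"}
TRUSTED_DOMAINS = {"example.com", "example.org"}


def classify_url(url, similarity_threshold=0.85):
    """Classify a URL by its host name only (path and query are ignored)."""
    host = (urlsplit(url).hostname or "").lower()
    if host in DENY_LIST:
        return "deny-listed"
    if host in TRUSTED_DOMAINS:
        return "trusted"
    # A host that is nearly, but not exactly, a trusted name may be a
    # lookalike intended to mislead users.
    for trusted in sorted(TRUSTED_DOMAINS):
        if SequenceMatcher(None, host, trusted).ratio() >= similarity_threshold:
            return "possible lookalike of " + trusted
    return "unknown"
```

For instance, a host that swaps the letter "l" for the digit "1" in a trusted name would be reported as a possible lookalike, which is exactly the kind of change many end users fail to notice.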

Domain Generation Algorithm
It is useful to know about domain generation algorithms (DGAs) and how they fit into all of this. DGAs are used to dynamically generate domain names and URLs. Unfortunately, this technique is used to enable malware to communicate with external command-and-control servers. Once a security analyst determines a malicious domain that malware may be communicating with, they can, of course, block this domain or send it to a sinkhole. However, malware writers use a DGA to enable malware to dynamically switch to new domains, even after administrators have blocked other target domains. DGAs will generate what appear to be random or nonsensical-looking domain names, which are hard to develop rules for. This is one instance where an allow list of only permissible domains would be of more value than a deny list, which only denies specific domains but may allow others generated by a DGA to slip through.
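
One common, admittedly crude, way to spot random-looking names is to measure the character entropy of a domain label. The sketch below is a minimal illustration of that idea; the length and entropy thresholds are arbitrary assumptions, only the leftmost label is examined, and DGAs that build names from dictionary words would not be caught by it.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_generated(domain, min_length=12, entropy_threshold=3.5):
    """Flag a domain whose leftmost label is long and high in entropy."""
    label = domain.lower().split(".")[0]
    return len(label) >= min_length and shannon_entropy(label) >= entropy_threshold
```

A check like this produces both false positives and false negatives, which is one reason the allow list approach described above is often the more dependable control.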

Flow Analysis
Flow analysis is the name given to a technique of analyzing network traffic. An example of flow analysis is a technology originally developed by Cisco called NetFlow. NetFlow works by grouping traffic into flows based on common characteristics. This way, all traffic originating from or destined to a specific IP address, for example, and containing specific ports or protocols can be examined in aggregate to look for patterns. These patterns include all traffic entering your network from a particular source, or all traffic destined for a particular destination, or all traffic containing a specific protocol. The objective is to look for patterns, particularly when any element of that traffic is considered unusual. Flow analysis is not very useful for singular or one-time communications sessions, as it would be difficult to discern a pattern; it is far more useful when looking at aggregated traffic.
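
The grouping idea behind flow analysis can be illustrated with a short sketch. It assumes each packet is represented as a dictionary with `src`, `dst`, `sport`, `dport`, `proto`, and `bytes` keys and groups on a simplified five-field key; this is a hypothetical data shape for illustration, not the actual NetFlow record format.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets that share source, destination, ports, and protocol."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for packet in packets:
        key = (
            packet["src"],
            packet["dst"],
            packet["sport"],
            packet["dport"],
            packet["proto"],
        )
        flows[key]["packets"] += 1
        flows[key]["bytes"] += packet["bytes"]
    return dict(flows)


def top_talkers(flows, n=3):
    """Return the n source addresses that sent the most bytes overall."""
    totals = defaultdict(int)
    for (src, *_rest), stats in flows.items():
        totals[src] += stats["bytes"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

Once traffic is summarized this way, questions such as "which internal host sent the most data to external addresses?" become simple aggregate queries rather than packet-by-packet inspection.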

Packet and Protocol Analysis
Traffic analysis can take many different forms. In the previous section, we discussed the technique of NetFlow analysis, but there are also ways of analyzing traffic that focus specifically on protocol, TCP/UDP port, or network service. For example, you could analyze all network traffic inbound to various chat services on your network, or traffic using a specific source port. To analyze traffic, you should first capture it in a file that can be dissected and analyzed. There are different packet capture/traffic analysis tools available, with the most popular being the open-source Wireshark tool and the command-line utility tcpdump, which is widely available on Linux/UNIX systems.
Captured (saved) traffic is much more useful for forensic analysis after the fact. Real-time analysis is possible but difficult; this type of analysis usually involves simply alerting an analyst that unusual or potentially malicious traffic has been detected. This can help you by alerting you to an attack so defensive measures can be activated and the attack stopped. Unfortunately, this type of analysis is not very useful for post-incident forensics.
Regardless of which traffic analyzer you use, you should develop filters that enable you to separate out the traffic of interest that you wish to view. There are capture filters and display filters; capture filters only capture and retain the specific traffic you build into the filter and do not store any other traffic. With a display filter, you are capturing and storing all traffic, but only displaying to the analyst the traffic of interest that you specify for the filter. You can, however, always change the display filter to show any or all traffic of interest to you.

Here are some of the elements you may want to filter and analyze:
- Source address
- Destination address
- Source or destination port
- Protocol
- Network service
- Plaintext or encrypted traffic
- Volume of traffic (for example, the volume of traffic above a certain level in a certain amount of time)
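
The practical difference between capture filters and display filters can be shown with a small sketch. Here packets are plain dictionaries and filters are ordinary Python functions; this is a conceptual model only and does not reproduce the filter syntax of Wireshark, tcpdump, or any other tool.

```python
def capture(packets, capture_filter=None):
    """Store only packets that pass the capture filter.

    Anything rejected here is never stored and cannot be recovered later.
    """
    return [p for p in packets if capture_filter is None or capture_filter(p)]


def display(stored, display_filter=None):
    """Show a subset of what was stored; the stored packets are unchanged,
    so the filter can be altered at any time to show other traffic."""
    return [p for p in stored if display_filter is None or display_filter(p)]
```

If you capture with a filter for DNS traffic only and later need to see web traffic from the same period, it simply is not there; if you capture everything and apply a display filter instead, you can switch views freely, at the cost of much more storage.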

When analyzing traffic, you are looking for patterns that could indicate potentially malicious network traffic, of course. This could be a specific communications session or an aggregate of all communications sessions over a period of time entering and exiting from your network. Traffic analysis is extraordinarily useful during incident response and for a forensics analysis after an incident. When combined with other infrastructure data sources in context, traffic analysis can give you an accurate picture of what has occurred on the network, enabling you to determine the root cause or even predict potential trends.

Note: You should carefully plan for capturing traffic on a routine basis. Traffic captures require a great deal of storage space; you should capture and retain only the traffic that could be of forensic value to you and set data retention policies for it.

Network-Based Malware Analysis
In addition to reverse engineering malware (that is, looking at it from a code perspective), it is also useful to look at malware from a network point of view. Many pieces of malware are designed to initiate network-based attacks. This includes automated bots that communicate with other hosts on the network to further spread the malware, as well as communicating with a central command-and-control server on the Internet. Part of your malware detection and eradication strategy should be to monitor your network traffic for unusual ports, protocols, services, external destination addresses, and volume of traffic. These are all telltale signs that malware could be on your network and attempting a network-based attack. Of course, there is also malware that can attempt to use allowed protocols, such as HTTP, to engage in an attack and send command-and-control messages to other hosts on the network as well as external addresses. This is where packet inspection and content filtering come into play. Using tools such as NetFlow, DNS analysis, and DGA inspection will assist you in detecting network-based malware.

Log Review
Almost all operating systems, applications, and devices, when properly configured, generate log data. This is essentially an audit trail of events that have happened on a system or in an application. Log data can tell you what happened, who caused it to happen, when it happened, and, when aggregated with other types of data, can sometimes tell you how the event of interest happened and even give you enough information to hypothesize as to why it happened. There are many different sources of log data, even from a single system. For example, on Windows devices, you can have system logs, which detail events related to the operating system and its hardware resources; the security log, which gives you information on security events; and application logs. There are also service and application logs that you could put in the proper context to determine what exactly is going on with those services or applications. In the next few sections, we will cover only a brief list of log sources you should examine that could be key to an investigation or an audit, as well as assist you in passing the exam. These include event logs, syslog, firewall, web application firewall, proxy, and IDS/IPS logs.

Event Logs
Event log is a generic term for logs that could cover a broad range of areas, including system events, security events, application or service events, user events, and so on. The keyword here is event, which is an occurrence of an activity or happening. An event is something that is considered on a singular basis and has defined characteristics.

The basic information regarding an event in a log should include the following:
- Definition of the event
- System or resource affected by the event
- IP or MAC address
- Entity causing or associated with the event
- Start and end time and date of the event
- Action that caused the event (for example, account creation, file deletion, or privilege escalation or use)
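
The fields in the list above map naturally onto a simple record type. The sketch below is a hypothetical structure for illustration; the class and field names are assumptions and do not correspond to the log schema of any particular operating system or SIEM.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogEvent:
    description: str         # definition of the event
    affected_resource: str   # system or resource affected
    address: str             # IP or MAC address
    actor: str               # entity causing or associated with the event
    start: datetime          # start time and date
    end: Optional[datetime]  # end time and date, or None if still ongoing
    action: str              # e.g. "account creation", "file deletion"

    def duration_seconds(self):
        """Return the event duration, or None if no end time was recorded."""
        if self.end is None:
            return None
        return (self.end - self.start).total_seconds()
```

Normalizing events from different sources into one common shape like this is what allows them to be aggregated and compared across systems.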

A particular activity on a system may be broken down into smaller, individual activities; for example, the creation of a new user account likely isn’t a singular event in itself; it is likely made up of several smaller events that can be decomposed into individual singular actions. Normally, events have codes associated with them that indicate the event type and identification. These are usually unique to the operating system or application logging the event. Most operating systems and applications have built-in facilities for reviewing events, but not all of them function equally well. Also, most of them do not look at aggregated events to see how they relate to each other.

Syslog
Syslog is a type of log typically found on a UNIX or Linux system. It is the core event log for those types of systems. Syslogd is the name of the service that performs event logging on those operating systems. A syslog server is a dedicated server to which events from various hosts are forwarded and collected for further aggregation and analysis. There is also a considerable number of utilities that are useful for examining syslogs from different hosts to discern patterns and facilitate analysis.

Note: Although syslog is a specific type of logging service that runs on UNIX/Linux hosts, it has become a more generic term in the cybersecurity community to indicate a centralized logging server that collects logs from all types of hosts and operating systems.
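
When syslog messages are forwarded over the network, each one conventionally begins with a priority value in angle brackets, calculated as the facility number multiplied by 8 plus the severity number. The sketch below decodes that prefix; the function name is hypothetical, and the rest of the message is returned unparsed because its layout varies between syslog implementations.

```python
import re

# Standard syslog severity levels, indexed by severity number 0 through 7.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def decode_priority(line):
    """Split a syslog line into (facility number, severity name, remainder)."""
    match = PRI_PATTERN.match(line)
    if not match:
        raise ValueError("line does not start with a syslog priority value")
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity], match.group(2)
```

Decoding the severity this way makes it straightforward to filter a centralized log stream down to, for example, only messages of severity "error" or worse.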

Firewall Logs
Firewall logs are useful in that they can give you information on all the traffic entering or leaving from a network interface. These logs can tell you basic details and characteristics about traffic that includes time/date, ports, protocols, services, and source and destination IP addresses. More advanced firewall logs can also give you detailed information about the application protocol content itself. This can help you analyze, for example, the characteristics of an Internet-based attack. Firewall logs, like most other types of logs, can be sent to a centralized syslog server or, preferably, fed into a SIEM system for better aggregation and analysis.

Web Application Firewall (WAF)
A web application firewall is dedicated to protecting the web application server. It serves as an intermediary between users who access the web application through their browsers and the server containing the web application and its data. It is useful in preventing attacks such as cross-site scripting, injection, and so on.
WAF logs can be helpful in determining what type of potentially malicious actions transpired in a particular user session that may have compromised the web application. Normally these types of logs can give you the same types of information as other, more general firewall logs but can also include detailed information on the HTTP traffic that was passed back and forth between the client and the server.

Proxy
Proxy logs can tell you details about communication sessions between users or clients and Internet destinations and applications. You can view information from these logs concerning the types of traffic that pass between the user and the web application, what type of content was transferred and its characteristics, and which content or sites users attempted to access but were denied.

Exam tip: Remember that a proxy intercepts Internet requests from clients and forwards them on to Internet hosts, and in turn passes their replies back to the client. Proxies can be used to prevent malformed HTTP and other web-based protocol requests from compromising clients, as well as to prevent internal users from violating policy by accessing prohibited websites.

Intrusion Detection System (IDS)/Intrusion Prevention System (IPS)
Logs from intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are among the most valuable types of network logs you can analyze to gain information on a potential malicious event. In addition to the data that other network device sources provide, such as network traffic information, content, and so on, you can also get information from these logs that may detail how an attacker attempted or succeeded in attacking the network. You may get information on what type of traffic was allowed, denied, or dropped. You should also get information on the different attack techniques malicious entities may employ.

Impact Analysis
When examining all the data we discussed, one of the key pieces of information you need to develop out of all this analysis is the impact. This could be an impact on the organization, a specific system, or even a user. Sometimes you are looking at the impact of a single event for a single system. Other times you are looking at what is called aggregate impact, which is the combination of all the different measures of impact, throughout the organization or a system, that an event may cause. For example, a malicious attack from the Internet may impact individual systems differently than it does the entire organization. A system could be restored or replaced, but the impact to the organization could be far worse in terms of reputation, fines, or other liability. Remember that impact could come in several forms, including damage to a system or data, financial impact, legal liability, and impact that can be difficult to measure, such as reputation or loss of market share. In the next two sections, we will talk about organizational impact versus localized impact, which may be specific to a system. We will also discuss the difference between immediate and total impact caused by an event.

Organization Impact vs. Localized Impact
The organizational impact could be considered the aggregate impact. If an attack compromises several different systems, for example, or even results in a data breach, the organizational impact is likely at least equal to, but possibly even greater than, the measures of impact for individual systems. Individual systems can be restored, repaired, or replaced, based on their value and criticality to the organization. But the organization may suffer far greater harm than the sum of the total impact for each system; it can suffer severe legal liability, particularly in the event of a breach, when it may incur fines, loss of customer confidence, lawsuits, and so on. It can also suffer longer-term effects in that it may have to make major changes to its infrastructure to prevent similar attacks in the future. This can mean upgrading equipment, hiring additional personnel, and so on.
A localized impact is typically measured for an individual system or set of resources. It can be an individual host, a network segment, or a group of systems that support a specific line of business. Localized impact could also center around a specific geographic area, perhaps a branch or department or regional location versus the entire organization. In risk management, localizing impact can also mean minimizing impact in the event of an incident. Segmenting systems and networks, both logically and physically, can help to localize the impact to a single system or network segment, for instance, minimizing potential impact.

Immediate vs. Total
Another way of looking at impact is based on the immediate impact versus the total impact. The immediate impact again can be compared to a more localized or minimal level of impact. It can also be the impact measured while an attack is in progress, for example, versus the total impact, which may not be apparent until after the event. The immediate impact could include the cost to repair or replace a server or other equipment, labor hours, software applications, and so on. The total impact could also include measurements beyond those, including revenue lost, the total cost to mitigate the incident, and even elements that are not as easily measurable, such as loss of consumer confidence or reputation, for example.
In any case, measuring the impact, whether it is organizational or localized, and immediate versus total, is not a point-in-time process. It is more of a constant, long-term process that can be performed at various points during an incident. Understanding the impact along those various points of the spectrum can help an organization focus its resources and energy on specific pressure points, such as critical servers or processes, to minimize the overall total impact.
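The distinctions above can be made concrete with simple arithmetic. The Python below is only a sketch: every figure is a made-up placeholder, and the split into immediate costs, follow-on costs, and organization-level costs is an assumption chosen to mirror the localized versus organizational and immediate versus total categories.

```python
from dataclasses import dataclass

@dataclass
class SystemImpact:
    name: str
    immediate_cost: int   # repair or replacement, labor during the incident
    follow_on_cost: int   # costs that surface later, such as lost revenue

def localized_impact(system):
    """Impact confined to one system: everything attributable to it."""
    return system.immediate_cost + system.follow_on_cost

def immediate_impact(systems):
    """Impact measurable while the incident is still in progress."""
    return sum(s.immediate_cost for s in systems)

def total_impact(systems, organizational_costs):
    """Aggregate impact: all localized impacts plus organization-level costs."""
    return sum(localized_impact(s) for s in systems) + sum(organizational_costs.values())

# Placeholder figures for illustration only.
systems = [
    SystemImpact("web server", immediate_cost=5000, follow_on_cost=12000),
    SystemImpact("database server", immediate_cost=8000, follow_on_cost=20000),
]
organizational_costs = {"regulatory fines": 50000, "customer notification": 15000}

print("immediate:", immediate_impact(systems))
print("total:", total_impact(systems, organizational_costs))
```

In this made-up case the total is several times the immediate figure, which is the pattern the text describes: the organization-level costs dominate once the event is over.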

Note: Consideration of organizational impact versus localized impact, as well as immediate versus total impact, should be part of your risk management strategy.

Security Information and Event Management (SIEM) Review
Collecting all the data that we discussed so far does not do your organization any good at all unless you do something with it. In this case, that means categorizing, aggregating, correlating, and analyzing it. Beyond analysis, it also means taking meaningful action based on that information once you have it and are using it to secure your network. Remember that data is a singular piece of knowledge; by itself, it may or may not mean anything. But when combined with other data, and given context, it becomes information. Information is what is useful to you; it adds dimension to all the otherwise disparate collections of nonrelated data. Therefore, collecting the right data is important, as well as aggregating it into a useful form so that it can be analyzed in different ways. As we have previously mentioned, there are many ways to do this. 
With most of the data we have talked about in this objective, it’s impractical to visit every system and go through all the built-in utilities, limited as they may be, to try to make sense out of the volumes of data you will have. It is more efficient and effective to send all the data you possibly can to a centralized security information and event management (SIEM) system. The system should be your go-to, one-stop solution for viewing all the disparate data in your infrastructure and making sense of it. Remember that the goal is not simply to collect and analyze it; it is to gain meaning from it that will help you make intelligent, risk-based decisions about the security posture of your network. While some of these critical and seemingly insurmountable tasks require a human touch, many of these data collection and analysis steps can be automated, delivering concise, relevant data to an analyst for further action. Depending on the complexity of the analysis engine, the SIEM system can be programmed to perform any of the data aggregation, signature, and behavior analysis for you. Of course, this is dependent on the rules you create for the system. Like all security devices, SIEM systems require well-written rulesets to process data. Each SIEM system may also have its own query language, enabling you to write detailed, complex query sets so you can further automate data transformation and analysis actions. In the next few sections, we will discuss some of the particulars of writing rules and queries for these systems, as well as discuss topics such as the usefulness of the SIEM dashboards.

Dashboard
The dashboard is the nerve center of a security analysis tool for a cybersecurity analyst. It puts the information the analyst needs at their fingertips and provides the interface for managing the data that can be manipulated and viewed. Analysts can create queries, execute them, and view the results from the dashboard, as well as manage the different functions and options of the analysis software itself. This makes using the tool much simpler and more efficient and effective. Otherwise, an analyst may have to use multiple different tools in different areas or systems, using different interfaces. This could significantly slow down the work of an analyst and undermine their efforts.

Rule and Query Writing
All network devices, whether routers, firewalls, intrusion detection/prevention systems, or SIEM systems, have their own built-in facilities for writing rulesets and queries. Rulesets are more typical of network devices that actually take action, such as routers and firewalls. Queries are more often found in more advanced devices where you need to analyze information coming from those devices, such as an IDS or SIEM system. We discuss firewall rules in more depth in Objective 3.2, so in this objective we will focus on the rules and queries that are inherent to analysis devices, such as a SIEM system. Most popular SIEM systems, such as Splunk and the open-source Elasticsearch, Logstash, Kibana (ELK) stack, have their own unique rule and query languages. Splunk has the Search Processing Language (SPL), while ELK uses its Kibana Query Language (KQL). The two languages come from different vendors; however, there are enough similarities that you can learn either of them easily if you are already proficient in the other. Many SIEM systems can also import rules and queries created in other languages, such as Python and Perl.
The key to query writing is knowing exactly which data, under what circumstances, you want to derive from a large collection of aggregated data the SIEM system has at its disposal. Queries can be very simple or quite complex, depending on what type of specific data you are looking for, the volume of data stores you must search through, and the complexity of the criteria you are imposing on the search. Typical search parameters for queries include the following:
- IP or MAC addresses
- Domain names
- Protocol
- TCP/UDP port
- Inbound versus outbound traffic
- Traffic classified as allowed, denied, or dropped
- Username
- Action (for example, account creation, file deletion, or website accessed)
- File name, size, or contents
- Dates and times
- Bandwidth usage or volume of traffic
Note that these are only a few examples of basic criteria included in queries or rules. Many other, more detailed criteria and data characteristics could be included in the query to fine-tune the results returned.
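To show how several of the search parameters listed above combine into a single query, here is a small Python sketch that filters a list of event records by exact field values and a time window. It is not SPL or KQL; the event fields, sample records, and the query function are hypothetical stand-ins for what a SIEM query language does against its data store.

```python
from datetime import datetime

# Hypothetical normalized events, as a SIEM might store them after ingestion.
EVENTS = [
    {"time": datetime(2024, 5, 1, 9, 15), "src_ip": "10.0.0.21", "dst_port": 443,
     "protocol": "tcp", "direction": "outbound", "action": "allowed", "user": "jsmith"},
    {"time": datetime(2024, 5, 1, 9, 20), "src_ip": "203.0.113.50", "dst_port": 22,
     "protocol": "tcp", "direction": "inbound", "action": "denied", "user": None},
    {"time": datetime(2024, 5, 1, 23, 5), "src_ip": "203.0.113.50", "dst_port": 3389,
     "protocol": "tcp", "direction": "inbound", "action": "denied", "user": None},
    {"time": datetime(2024, 5, 2, 1, 30), "src_ip": "10.0.0.34", "dst_port": 53,
     "protocol": "udp", "direction": "outbound", "action": "allowed", "user": "mlee"},
]

def query(events, since=None, until=None, **criteria):
    """Return events inside the time window whose fields equal every criterion."""
    results = []
    for event in events:
        if since is not None and event["time"] < since:
            continue
        if until is not None and event["time"] > until:
            continue
        if all(event.get(field) == value for field, value in criteria.items()):
            results.append(event)
    return results

# Inbound traffic that was denied after noon on 1 May.
hits = query(EVENTS, since=datetime(2024, 5, 1, 12, 0),
             direction="inbound", action="denied")
for hit in hits:
    print(hit["time"], hit["src_ip"], hit["dst_port"])
```

Each added criterion narrows the result set, which is why knowing exactly which data you want matters more than the syntax of any one query language.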
Rulesets and queries can also take other devices’ rules and query results as input into their own criteria. An example of this is the ubiquitous known-bad IP address rule. Almost all network security devices have a rule of this sort, either in the form of a deny rule or a deny list. In this type of rule, the administrator adds a list of IP addresses that are known by reputation sources or by experience to be malicious or undesirable in nature. These known-bad IP addresses are processed as part of deny lists or as part of a ruleset. While a SIEM is not a security device in the sense that it takes action on traffic originating from or destined to one of these addresses, you can still use these known-bad IP address lists as part of your rules and queries to search for and identify any traffic that matches one of these addresses.
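A known-bad IP address search of this kind might be sketched as follows in Python, using the standard ipaddress module. The deny list entries are documentation-range addresses chosen for illustration, and the event layout is a hypothetical one.

```python
import ipaddress

# Hypothetical deny list; entries may be single hosts or whole networks.
KNOWN_BAD = [ipaddress.ip_network(entry)
             for entry in ("203.0.113.0/24", "198.51.100.77/32")]

def is_known_bad(address):
    """True if the address falls inside any deny-list network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in KNOWN_BAD)

def matching_events(events):
    """Return events whose source or destination is on the deny list."""
    return [e for e in events if is_known_bad(e["src"]) or is_known_bad(e["dst"])]

# Hypothetical traffic records.
events = [
    {"src": "10.0.0.21", "dst": "203.0.113.9"},     # internal host calling out
    {"src": "198.51.100.77", "dst": "10.0.0.5"},    # listed host calling in
    {"src": "10.0.0.34", "dst": "192.0.2.10"},      # no match
]
for event in matching_events(events):
    print(event)
```

Note that the search only identifies matching traffic; blocking it remains the job of the firewall or IPS that holds the corresponding deny rule.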

String Search
One of the key elements of a query is the string search. Queries can look for characteristics of certain data elements, such as an IP address, that are frequently found in defined fields. However, sometimes elements you may want to search for may be, for example, alphanumeric strings that do not conform to any particular field or format. This is where string search is valuable. You can essentially search for any user-defined value or string, regardless of how the data is formatted or defined in fields. Using wildcards in your searches, as well as regular expressions (called regex), can greatly enhance and simplify your string and character searches.
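The difference between a wildcard search and a regex search over unstructured text can be sketched in Python with the standard fnmatch and re modules. The log lines and the two search patterns below are illustrative assumptions.

```python
import fnmatch
import re

# Hypothetical raw log lines with no consistent field structure.
LINES = [
    "GET /index.html HTTP/1.1 200 user-agent=Mozilla/5.0",
    "POST /login.php HTTP/1.1 401 user-agent=sqlmap/1.7",
    "cmd=powershell -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQA",
]

def wildcard_search(lines, pattern):
    """Match whole lines against a shell-style wildcard pattern (* and ?)."""
    return [line for line in lines if fnmatch.fnmatchcase(line, pattern)]

def regex_search(lines, pattern):
    """Return lines in which the regular expression matches anywhere."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]

# Wildcard: any line that mentions a particular tool name.
print(wildcard_search(LINES, "*sqlmap*"))

# Regex: an -enc flag followed by a long Base64-looking string.
print(regex_search(LINES, r"-enc\s+[A-Za-z0-9+/=]{20,}"))
```

The wildcard is enough when you know a literal fragment; the regex is needed when you only know the shape of what you are looking for.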

Scripting and Piping
As part of the defined objectives for the exam, you should also know how to use techniques such as scripting and piping in your data collection and analysis efforts. As previously mentioned, scripting can take the form of built-in scripting languages that come with the security analysis tool, or various scripting languages such as Python and Perl can be used for creating rule sets and queries. Piping is a simple but useful concept and involves using the output from one command or script as the input for another. A script can execute certain tasks or manipulate data and produce an output that can be used as the parameters or input for another script. This is useful in chaining scripts together without making them monolithic and allows a more modular approach to scripting.
While scripting has practical limitations, it is perfect for ad hoc or smaller tasks that are repetitive and inefficient to perform manually on a routine basis. For larger, more complex tasks, although scripting could be used, it may be more efficient to use the powerful built-in query languages that are included as part of SIEM systems and Security Orchestration, Automation, and Response (SOAR) tools.
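The modular, chained approach that piping allows can be sketched inside a single Python script with generators, where each stage consumes the output of the previous one, similar in spirit to joining commands with the shell pipe operator. The log excerpt and the stage names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical authentication log excerpt.
LOG = """\
May  1 10:00:01 web01 sshd[311]: Failed password for root from 203.0.113.50 port 40022 ssh2
May  1 10:00:03 web01 sshd[311]: Accepted password for jsmith from 10.0.0.21 port 51514 ssh2
May  1 10:00:07 web01 sshd[312]: Failed password for admin from 203.0.113.50 port 40031 ssh2
May  1 10:00:09 web01 sshd[313]: Failed password for root from 198.51.100.23 port 38110 ssh2
"""

def read_lines(text):
    """Stage 1: emit non-empty lines."""
    for line in text.splitlines():
        if line.strip():
            yield line

def only_failures(lines):
    """Stage 2: keep failed logins only."""
    for line in lines:
        if "Failed password" in line:
            yield line

def source_ips(lines):
    """Stage 3: extract the address that follows the word 'from'."""
    for line in lines:
        words = line.split()
        yield words[words.index("from") + 1]

# Chain the stages: the output of each one is the input of the next.
failures_per_ip = Counter(source_ips(only_failures(read_lines(LOG))))
print(failures_per_ip.most_common())
```

Because each stage does one small job, any of them can be swapped or reused without rewriting the others, which is the benefit over a single monolithic script.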

E-mail Analysis
E-mail is one of the most commonly used services in the organization. It is also one of the most critical, since so much of the organization’s communication, both internally and externally, takes place over e-mail. In addition to providing communications, e-mail is also a popular avenue of attack for malicious entities. The Simple Mail Transfer Protocol (SMTP), which most e-mail servers use, is older and inherently insecure. There are several methods used to secure e-mail, with varying degrees of effectiveness. Several e-mail attack vectors can be used against an organization, such as impersonation, malicious attachments, links, and phishing. We will talk about those in the upcoming sections. We will also look at analyzing e-mails for key elements that may indicate whether the e-mail is legitimate and whether it has any malicious intent behind it. These include the digital signature, the e-mail header, and even the e-mail signature block. Finally, we will discuss three important security technologies recently implemented in e-mail systems that can assist in increasing e-mail security. These include DomainKeys Identified Mail (DKIM), the Sender Policy Framework (SPF), and the Domain-based Message Authentication, Reporting, and Conformance (DMARC) specification.

Impersonation
One of the more obvious ways e-mail can be used against an organization is through impersonation. E-mail is commonly used to attempt to impersonate a person or organization. More often than not, this takes the form of untargeted phishing attacks, but it can also be part of a concerted effort by an advanced persistent threat (APT) to compromise an organization. Impersonation can be implemented using fake or slightly altered domain names and IP addresses, obfuscated URLs, and other spoofing techniques. Normally the impersonator will make the e-mail seem like it came from a legitimate source and is from an entity known to the user. Often, impersonation is not the end goal of the e-mail; instead, it is used to further attack the organization through other means, such as carrying malicious payloads, links, or phishing attempts. The impersonation part is simply to get the user to trust the sender of the e-mail so they will open it and click a potentially malicious link or follow instructions that may be part of a phishing attack.

Malicious Payload
E-mail can carry malicious payloads in the form of attachments that the user may be tricked into opening, or even code that is embedded in the e-mail itself. The user does not necessarily have to click the payload to execute it; simply opening the e-mail could execute a malicious payload. If the organization has implemented good security measures at the perimeter and on the e-mail server, most malicious payloads should be detected and stripped from the e-mail, or an administrator should otherwise be alerted. Many times, these malicious payloads are forms of malware, which can be detected through enterprise-level antimalware software.
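One very simple attachment check can be sketched in Python with the standard email package: list the attachments of a message and flag those whose file extension is on a watch list. The extension list, the sample message, and the function name risky_attachments are assumptions, and a file extension alone is only a weak indicator compared with real antimalware scanning.

```python
import os
from email.message import EmailMessage

# Hypothetical watch list of extensions that often carry executable content.
RISKY_EXTENSIONS = {".exe", ".js", ".vbs", ".scr", ".docm", ".iso"}

def risky_attachments(message):
    """Return the file names of attachments with a watch-listed extension."""
    flagged = []
    for part in message.iter_attachments():
        name = part.get_filename() or ""
        extension = os.path.splitext(name.lower())[1]  # last extension only
        if extension in RISKY_EXTENSIONS:
            flagged.append(name)
    return flagged

# Build a made-up message with one disguised and one ordinary attachment.
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("See attached.")
msg.add_attachment(b"placeholder bytes", maintype="application",
                   subtype="octet-stream", filename="invoice.pdf.exe")
msg.add_attachment(b"placeholder bytes", maintype="application",
                   subtype="pdf", filename="terms.pdf")

print(risky_attachments(msg))
```

The double extension in the first file name is a common trick: the user sees what looks like a PDF, while the operating system treats the file as an executable.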

Embedded Links
An embedded link is one that could be hidden in the body of the e-mail, behind a person’s name or address, for instance. If the user clicks the address thinking they are going to open it in Google Maps, for example, it could actually redirect the user to another site of the attacker’s choosing. As mentioned previously with malicious payloads, e-mail security mechanisms should strip or otherwise render harmless embedded links in e-mail messages.
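One narrow case of a misleading embedded link can be detected mechanically: the visible text looks like a web address, but the underlying target points somewhere else. The Python sketch below uses the standard html.parser module to collect every link and flag that mismatch. The class and function names and the sample HTML are assumptions; links hidden behind ordinary words, such as a name, are collected but cannot be judged by this comparison alone.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every anchor in an HTML body."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def hostname_of(value):
    """Best-effort host name of a URL-like string, or None for ordinary text."""
    value = value.strip()
    if not value or " " in value:
        return None
    if "://" not in value:
        value = "//" + value  # lets urlparse treat a bare host as a network location
    return urlparse(value).hostname

def mismatched_links(html_body):
    """Return (visible text, real target) pairs where the two hosts differ."""
    collector = LinkCollector()
    collector.feed(html_body)
    flagged = []
    for href, text in collector.links:
        shown, actual = hostname_of(text), hostname_of(href)
        if shown and "." in shown and actual and shown != actual:
            flagged.append((text, href))
    return flagged

# Made-up e-mail body using documentation domains.
BODY = ('<p>Sign in at <a href="http://evil.example.net/login">www.mybank.example.com</a> '
        'or visit <a href="https://maps.example.org/?q=1">our office</a>.</p>')
print(mismatched_links(BODY))
```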

Phishing
Phishing is an attack technique used through e-mail to trick the user into performing various actions, which may include clicking harmful links or even replying to an e-mail with a user’s personal information, such as a password, credit card information, and so on.
Phishing is a form of social engineering that is simply executed through e-mail. There are various forms of phishing, such as whaling and spear phishing, but essentially they are all focused on gaining the trust of an unsuspecting user and tricking them into performing actions that could result in compromised data or the installation of malware on the organization’s network.
A phishing e-mail is typically designed to look exactly like a legitimate e-mail from a person or entity the user trusts. It may even contain embedded pictures or stationery that resembles that of a trusted organization, such as a bank. If the e-mail can establish some level of user trust, then it can use social engineering techniques to entice users to perform potentially harmful actions, including clicking links, downloading attachments, or even replying to the e-mail with sensitive information. While there are many technical security controls that can help prevent or detect phishing attacks, ultimately user-level training is the best method for stopping them. If users are educated on various phishing techniques and their ramifications, they are better able to avoid these types of attacks.

Forwarding
E-mail forwarding can be troublesome in many different respects. First, users who continually forward potential spam or hoax e-mails are doing nothing more than using up network bandwidth and server storage space. However, this is the least of the organization’s worries; at best such forwarding is annoying and troublesome, but at worst it can inadvertently spread malware throughout an organization. E-mail and acceptable use policies can normally reduce this type of forwarding, provided they are enforced when users violate them. Some e-mail servers have configuration settings that can prevent the forwarding of potentially malicious attachments as well.
Another aspect of forwarding, albeit a bit older in nature, is a security misconfiguration that allows an attacker to use an organization’s e-mail server as a forwarder (commonly called an open relay). If configured improperly, the e-mail server could accept requests from an external attacker and forward e-mails on to yet another organization, becoming part of a potential attack. For this reason, external SMTP forwarding (relaying) should be disabled on the organization’s e-mail servers.

Digital Signatures
Digital signatures are implemented as part of a larger public key infrastructure (PKI) and are designed to verify that an e-mail is authentic and that the person who claims that they have sent it is actually the person who did. Digital signatures ensure authenticity and nonrepudiation. In other words, if an e-mail is digitally signed, a user cannot later deny that they sent it, assuming their public/private key pair has not been compromised.
Digital signatures can be verified using various means, including a certificate revocation list (CRL) published by the certificate issuer. If the e-mail sender’s public/private key pair has been compromised, this, in turn, means that the certificate is compromised, and the issuer should publish an entry on the CRL that states such. This renders the key pair invalid, also invalidating any digital signatures used to sign e-mails. This is one way an organization can verify that e-mails are authentic.

Header
As part of our discussion on data analysis, e-mail headers provide useful information when analyzing an e-mail to determine whether it is authentic and whether it has malicious intent. The e-mail header can be easily viewed in most e-mail client applications or as part of e-mail server analysis tools. The header is supposed to indicate the identity of the user who sent the e-mail, the recipient, the originating e-mail server and organization, the destination e-mail server and organization, and possibly any intermediate hops the e-mail took on its way to the recipient. It may also include information on digital signatures as well as e-mail attachments.
E-mail headers cannot be fully trusted to indicate the authenticity of an e-mail. Headers can be forged, and they can also include information that is erroneous. The sender’s e-mail client and server can be configured to put a variety of incorrect or false information into an e-mail header. However, when analyzing an e-mail to determine its authenticity, header analysis is usually the starting point.
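As a starting point for header analysis, the Python sketch below uses the standard email package to count the Received hops and to compare the domain in the From header with the domains in Return-Path and Reply-To. The raw message, addresses, and the function name summarize_headers are made up for illustration, and a mismatch is only a reason to look closer, since legitimate bulk mail often differs in these fields too.

```python
from email import message_from_string
from email.utils import parseaddr

# Made-up raw message using documentation domains and addresses.
RAW = """\
Received: from mail.example.org (mail.example.org [192.0.2.25])
    by mx.corp.example with ESMTP id abc123; Wed, 01 May 2024 10:02:11 +0000
Received: from unknown (host-203-0-113-50.example.net [203.0.113.50])
    by mail.example.org with SMTP id xyz789; Wed, 01 May 2024 10:02:05 +0000
Return-Path: <bounce@bulk-sender.example.net>
From: "IT Help Desk" <helpdesk@corp.example>
Reply-To: attacker@freemail.example
To: jsmith@corp.example
Subject: Password expiry notice

Please reset your password.
"""

def domain_of(address):
    """Return the lowercased part after the @ sign."""
    return address.rpartition("@")[2].lower()

def summarize_headers(raw):
    """Extract a few header facts and note domain mismatches worth a closer look."""
    msg = message_from_string(raw)
    from_addr = parseaddr(msg.get("From", ""))[1]
    return_path = parseaddr(msg.get("Return-Path", ""))[1]
    reply_to = parseaddr(msg.get("Reply-To", ""))[1]
    hops = msg.get_all("Received", [])  # the most recent hop is listed first

    warnings = []
    if return_path and domain_of(return_path) != domain_of(from_addr):
        warnings.append("Return-Path domain differs from From domain")
    if reply_to and domain_of(reply_to) != domain_of(from_addr):
        warnings.append("Reply-To domain differs from From domain")
    return {"from": from_addr, "hops": len(hops), "warnings": warnings}

summary = summarize_headers(RAW)
print(summary)
```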

E-mail Signature Block
Uninformed users often confuse the digital signature with an e-mail signature block.
It is our job as cybersecurity analysts to make sure they understand the difference. Whereas a digital signature is the result of signing an e-mail with a private key, which is then validated by the corresponding public key, an e-mail signature block contains only whatever the sender chooses to put in it; this is usually the sender’s name, title, and other relevant information. However, this information should not be trusted to verify the authenticity of an e-mail since it is so easily forged, and users need to understand that.

Domain Keys Identified Mail (DKIM)
DomainKeys Identified Mail (DKIM) is a standard developed to detect e-mail spoofing, thereby helping to prevent phishing and other impersonation attacks.
DKIM uses a digital signature linked to an organization’s domain name. The organization generates a public/private key pair, and outgoing e-mail is digitally signed by the organization with the private key, in addition to any user’s digital signature included in the e-mail. The receiving organization can verify the digital signature because the public key has been published in the sending organization’s DNS records. This provides assurance that the e-mail was, in fact, sent by the originating organization and is authentic. DKIM is an Internet standard and was published in September 2011 in RFC 6376, with later updates included in RFC 8301 and 8463.
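To make the DNS lookup step concrete, the Python sketch below parses the tag list of a DKIM-Signature header and builds the DNS name under which a receiver would look for the public key, which takes the form selector._domainkey.domain. The header value is a made-up example with placeholder hash and signature values, and the sketch does not perform the DNS query or the cryptographic verification.

```python
def parse_tag_list(value):
    """Split a 'tag=value; tag=value' list into a dict, dropping folding whitespace."""
    tags = {}
    for part in value.split(";"):
        name, separator, tag_value = part.partition("=")  # split on the first = only
        if separator:
            tags[name.strip()] = "".join(tag_value.split())
    return tags

def dkim_key_record_name(signature_header):
    """Build the DNS name that holds the signer's public key."""
    tags = parse_tag_list(signature_header)
    return f"{tags['s']}._domainkey.{tags['d']}"

# Made-up header value; bh= and b= hold placeholders, not real hashes or signatures.
SIGNATURE = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=selector1; "
             "h=from:to:subject:date; bh=PLACEHOLDER_BODY_HASH=; b=PLACEHOLDER_SIGNATURE=")

tags = parse_tag_list(SIGNATURE)
print("signing domain:", tags["d"])
print("key published at:", dkim_key_record_name(SIGNATURE))
```

During analysis, comparing the signing domain in the d= tag with the domain in the visible From address is a quick check on whether the signature actually vouches for the apparent sender.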

Sender Policy Framework (SPF)
Yet another e-mail authentication method is the Sender Policy Framework (SPF). It is not as effective as DKIM and works in a different way. Whereas DKIM uses digitally signed e-mails from the organization, SPF simply checks to ensure that e-mail has been sent from an authorized IP address, as published in the organization’s DNS records. This is not as effective as using digitally signed e-mail, since the organization’s SPF record (a DNS TXT record) could be compromised and a false mail server IP address inserted into it. Additionally, since IP addresses can be spoofed, an attacker could send an e-mail from a spoofed, yet technically valid, IP address, which the receiving organization may be able to verify as one listed in the sending organization’s DNS records.
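The core SPF check, whether the sending IP address is on the published list, can be sketched in Python as below. This is a deliberately reduced evaluator: it handles only the ip4, ip6, and all mechanisms and skips those that need further DNS lookups (such as a, mx, and include). The record and addresses are made-up documentation values.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def spf_result(record, sender_ip):
    """Evaluate ip4/ip6/all mechanisms of an SPF record against a sender address."""
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        qualifier = "+"                      # the default qualifier is pass
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        name, _, argument = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            if ip in ipaddress.ip_network(argument, strict=False):
                return RESULTS[qualifier]
        elif name == "all":
            return RESULTS[qualifier]
        # Other mechanisms and modifiers are ignored in this sketch.
    return "neutral"                         # nothing matched

# Made-up record: two authorized sources, everything else fails.
RECORD = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.10 -all"
print(spf_result(RECORD, "192.0.2.25"))
print(spf_result(RECORD, "203.0.113.50"))
```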

Domain-Based Message Authentication, Reporting, and Conformance (DMARC)
Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an e-mail authentication protocol designed to ensure the authenticity of e-mail sent from the organization.
Rather than being an authentication method itself, it relies on either or both of the previous two methods (DKIM and SPF). A DMARC entry is published in the organization’s DNS records and states the organization’s policy for e-mail that claims to come from its domain. The entry allows the receiving organization to check for authenticity using the DKIM and SPF results for that e-mail. DMARC can also control what happens to the e-mail if it fails its authenticity check; the DMARC entry can request that a failing e-mail be delivered without special handling, quarantined, or rejected, and that reports about failures be sent back to the sending organization.
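A DMARC entry is a short tag list published as a DNS TXT record under _dmarc followed by the domain name. The Python sketch below parses such a record and pulls out the requested policy and the reporting addresses. The record shown is a made-up example, and treating a missing p tag as none is a simplifying assumption of this sketch.

```python
def parse_dmarc(record):
    """Parse a DMARC TXT record into its policy, percentage, and report addresses."""
    tags = {}
    for part in record.split(";"):
        name, separator, value = part.partition("=")
        if separator:
            tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return {
        # Requested handling for failing mail: none, quarantine, or reject.
        "policy": tags.get("p", "none"),      # assumption: default to none if absent
        # Share of failing mail the policy should be applied to.
        "percent": int(tags.get("pct", "100")),
        # Where aggregate reports about results should be sent.
        "aggregate_reports": [uri.strip() for uri in tags.get("rua", "").split(",")
                              if uri.strip()],
    }

# Made-up record as it might be published at _dmarc.example.com.
RECORD = "v=DMARC1; p=quarantine; pct=100; rua=mailto:dmarc-reports@example.com"
policy = parse_dmarc(RECORD)
print(policy)
```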

Exam tip: Understand the differences between DKIM, SPF, and DMARC. DKIM uses organizational digital signatures to authenticate any e-mails sent from the organization and can be verified using the public key entered in the organization’s DNS records. SPF verifies that the IP address of the mail server used to send the e-mail is a valid e-mail server address authorized by the organization. DMARC is a policy entry in DNS that dictates which of the other two authentication methods should be used, and how the e-mail should be handled should it fail authenticity checks. All three of these authentication methods rely on an organization’s DNS.

REVIEW
Objective 3.1: Given a scenario, analyze data as part of security monitoring activities
In this module, we discussed how to analyze the volumes of data that you will encounter on your infrastructure. We discussed identifying, collecting, storing, aggregating, and analyzing data from all the sources you have in your organization, such as hosts, networks, applications, and so on. We looked at different data analysis techniques, including signature and pattern analysis, behavior analysis, and heuristics. We also discussed the different types of trend analysis that data can help us perform.
Endpoint data is one of the key sources of data that has long been neglected but has recently become more important. You learned that we should create a baseline of known-good behaviors for both devices and users so that we have a basis for comparison when anomalous behavior is encountered. We discussed malware analysis and reverse engineering, as well as memory and file system analysis. We also discussed the particulars of system and application behavior. User and entity behavior analytics (UEBA) is a form of analysis concerned with user patterns beyond simply looking at user logs. We also briefly described the analysis of endpoint exploitation techniques.
Beyond endpoints, the network is the other massive source of valuable data. We discussed different types of traffic analysis, as well as URL and DNS analysis. We talked about how malware often uses domain generation algorithms to confuse security analysts by creating random and unique domain names. We discussed flow analysis, which looks at traffic in aggregate, as well as how to decompose traffic down to packet and protocol analysis. We also talked about the network characteristics that malware could exhibit, allowing you to further detect and track malware actions.
We discussed the value in reviewing the different types of logs you will find on hosts and network devices. These include generic event logs that cover a wide variety of events on different hosts, as well as syslog, firewall logs, WAF, proxy, and intrusion detection/intrusion prevention logs.
We also discussed e-mail analysis, since e-mail is a critical system in most infrastructures but is also an avenue for many malicious activities, such as phishing attacks, embedded links, data loss, impersonation, and so on. When analyzing e-mail, there is much content to review, including the e-mail signature block, headers, and so on. We discussed the different characteristics of e-mail that are worth scrutiny, including malicious payloads that can be embedded in an e-mail. We discussed different technologies involved in securing e-mail, such as digital signatures, DomainKeys Identified Mail (DKIM), Domain-based Message Authentication, Reporting, and Conformance (DMARC), and the Sender Policy Framework (SPF).
Impact is one of the key pieces of information you should be trying to ascertain from all your analysis. You want to know not only the individual measures of impact on data and systems, but also the total impact on the entire organization, both point-in-time and over a long period of time. Impact can take the form of damage to the system, loss of data, financial losses, legal liability, or even loss of market share, consumer confidence, and reputation.
Security information and event management (SIEM) systems serve as a central collection point for all types of data across the infrastructure. Their job is to collect, aggregate, correlate, and analyze data that would likely be too much for human analysts to process in a practical timeframe. They can discern different patterns of behavior and produce actionable information that you can use to make risk-informed security decisions. SIEMs provide a centralized dashboard that serves as the interface for cybersecurity analysts, allowing them to run queries, monitor systems, and manage functions of the SIEM itself. SIEMs can use built-in or external query languages to assist a cybersecurity analyst in locating and producing specific data. You can use these query languages to search on specific predefined data elements, such as an IP address, or free-form strings that might not be in defined fields. In addition to the built-in query languages, you can also use scripting languages that are better suited for smaller, routine, repetitive tasks. Piping allows you to use the results from one script as the input for another script.