By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Human opportunism is often seen as a paradigm of natural selection, or survival of the fittest, as a biological mechanism to facilitate evolutionary development in the animal kingdom. The idea is that our evolutionary state requires us to find opportunities to survive without thinking of the consequences, even if our actions are selfish in nature and at the expense of others. This form of survival is not responsible for the most revolutionary developments in human biology but is more of a reflexive mechanism to improve survival. As the human mind evolved through the years, our opportunism took natural selection beyond biological survival, pairing it with our intellect to find more clever and innovative ways to gain advantages over other humans or the environment around us. Humans use opportunism to gain advantage over others—in business, finance, sports, relationships, and beyond.
Inherent within opportunity is the concept of risk. Your intention is to achieve a reward, but if you haven’t fully thought through the consequences, you don’t understand the consequences, or you haven’t accepted the consequences, you face more risk. If you take no risks, you’ll never win, and if you take too much risk, you’ll lose. A core focus of security is to manage the risk you face when you take advantage of new opportunities. How can you minimize your risk while maximizing your reward? In cloud architecture, when you’re designing a system, your reward is not the security of your system. Instead, your reward is the success of the business outcome you’re trying to achieve. Take a look at your threat landscape. How can you maximize the reward while protecting your assets to prevent loss? It would be great for productivity if your employees could access your network environment any time, anywhere, and from any machine, but when you factor in the likelihood of IP theft and other malicious activity, you realize that you have to sacrifice some of your employee productivity for a more secure system. Unfortunately, companies are oftentimes reluctant to invest in security solutions until they face a breach and make the front page of the Wall Street Journal, which results in a few key executives being fired. Then, suddenly, security budgets miraculously become available. As with any investment, however, it is often less costly to reduce your risks early on than to try to retrofit and patch up security following a breach.
Risks in the cloud
As you and/or your enterprise move toward building innovative and morally positive solutions that will hopefully drive opportunism, you need to realize that others will see your investment as an opportunity for their own benefit. Your competitors may want to steal your ideas to remove any “first mover advantage” you may have. Your employees may see an opportunity to make money by stealing your data or your customers’ data and selling it to your competition. Others may just want to steal your data for bragging rights! The point is that every time you make an investment, there are people out there who will try to abuse the opportunity to take advantage of you. So what do you do when the world around you is so dangerous? You look to reduce the risks in your environment. You are threat modeling, whether you know it or not, everywhere you go. For many of us, the landscape is both foreign and evolving, which introduces new risks. Solutions that used to keep us safe stop being effective as new threats and vulnerabilities present themselves. The role of a cloud architect is to be well versed in the threats involved in our designs and to work hard at protecting and securing our assets against those threats. The most advanced and morally positive human beings have applied risk management by using their knowledge, rationality, and logic to enable opportunism while minimizing the risks and the consequences of their outcomes. Focusing on being morally positive is important, because even in the most developed countries, where we are equipped to understand the consequences of opportunities, people will prey on others for their own benefit. So how do we create a future that is inherently more secure and susceptible to less risk?
As a cloud architect, you should never discount the power you hold for shaping the future of our world.
The Social Dilemma is a phenomenal documentary that outlines some of the unintended consequences of opportunism in the tech industry. Although this documentary focuses on tech companies, the same theme applies to every single industry. Understanding the intended and unintended impacts of your work is your moral responsibility. I recommend giving this documentary a watch.
Security Fundamentals
Security has traditionally been a massive practice that consists of multiple domains, where each domain depends on very different people with very different objectives. Governance and risk management, compliance, architecture, engineering, policy, operations, disaster recovery, legal, marketing: in all of these domains, security is not merely a technical aspect; in fact, security involves many roles and responsibilities.
Security as a philosophy still hasn’t gone through a full DevOps-like revolution. Security professionals are often like the organization’s boogeymen. There seems to be an overarching fear of them: they have traditionally been viewed as people who don’t approve anything and who don’t understand anything about your business application. They also do extremely technical work involving scary things and have the power to shut down your sprint cycle and give you busy work that you’ll never finish patching. You know, this sounds like the old Development, Testing, and IT Ops teams that were in place prior to DevOps. But these silos crush the souls of people who want to do good work. Before we dive into the details of modernizing security in the cloud, let’s get some quick fundamentals out of the way to clarify the security lingo.
CIA Triad
The CIA triad is the most foundational information security model, providing three principles of strong security: confidentiality, integrity, and availability (CIA). Together, these principles form the basis of any security infrastructure and should be the guiding rubric for achieving security goals and objectives for an architecture.
Confidentiality is a protection mechanism that ensures that your data has not been improperly accessed by the wrong people. Before someone can access your data, they need to be granted specific access to it. Encryption, which has been an important consideration for a very long time, is one way that you can ensure confidentiality. Julius Caesar created a form of encryption (today known as the Caesar Cipher) that essentially substituted characters to form a message that was unreadable to anyone who did not know the cipher. Given that this encryption simply involved shifting characters forward by three places, it wasn’t much of a difficult code to crack, but hey, life was simpler then. These days, we store a lot of intellectual property, sensitive customer data, and algorithms, and all of these things need to be protected from opportunistic individuals. Confidentiality is the principle of protecting them.
Integrity ensures that your data can be protected against unauthorized modifications. Suppose, for example, you put your drink down at the bar, and when you next lift the glass, the taste is funky, so you know you should stop drinking it. It would be nice if there were a way to detect whether it had been modified before you started drinking it again. That’s basically what hashing does for data: it tells you whether your data has been tampered with. Hashing calculates a fixed-length value, known as a message digest or hash, from your data. You store that digest with your data file so that you can later recompute it and validate that it is the same one recorded the last time the file was opened. If your data has been tampered with, the recomputed hash (assuming you’re using a hashing algorithm that is not outdated) will not match the original. If the hash value has changed, you know that your data has probably been compromised.
Availability ensures that your product or service is available to its consumer as needed. Imagine calling 911 only to get a busy signal. Sorry buddy. Try again next time! Availability is typically dictated by a service level agreement (SLA) that defines clear service level objectives (SLOs) and service level indicators (SLIs) that you will be providing to the consumer of your service. The availability aspect of the triad is commonly associated with network architectures, performance of systems, and outages. Distributed Denial of Service (DDoS) attacks, geographical redundancy, application bottlenecks, and suboptimal network designs are all things that can impact your availability. If a hacker is trying to attack you, stealing your data may be only part of what they’re after. What if your data doesn’t matter and they just want to cause you financial harm by rendering your business unavailable?
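To make the confidentiality and integrity ideas above concrete, here is a minimal Python sketch. It shows a three-place Caesar shift (illustrative only, since it is trivially breakable) and a SHA-256 digest comparison for tamper detection. The function names, sample strings, and the choice of SHA-256 are assumptions made for this example and do not come from the text.

```python
import hashlib


def caesar_shift(text: str, shift: int = 3) -> str:
    """Shift each ASCII letter by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # spaces, digits, and punctuation pass through
    return "".join(out)


def sha256_digest(data: bytes) -> str:
    """Return the hex digest that would be recorded alongside the data."""
    return hashlib.sha256(data).hexdigest()


# Confidentiality (toy example): shift forward by three, then back.
secret = caesar_shift("ATTACK AT DAWN")
recovered = caesar_shift(secret, -3)

# Integrity: record a digest, then compare it after the data may have changed.
original = b"quarterly-report-v1"   # hypothetical file contents
recorded = sha256_digest(original)
tampered = b"quarterly-report-v2"
print("unchanged:", sha256_digest(original) == recorded)
print("tampered :", sha256_digest(tampered) == recorded)
```

A bare digest only detects changes if the attacker cannot also replace the recorded digest, so in practice the digest is usually stored or signed separately from the data.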
Control Categories
When it comes to controls, there are three categories: administrative controls, technical controls, and physical controls. Administrative controls describe the policies, procedures, and guidelines for humans to follow in accordance with the organization’s security controls. Think of your organization’s documented security policies, security training, incident response procedures, security principles, and so on. Technical controls are logical mechanisms you implement to protect your assets. Think of your firewall, identity and access management (IAM) configuration, network restrictions, and antivirus software. These are essentially the things you’ve configured and deployed in the cloud to ensure that bad things can’t happen. Physical controls are physical mechanisms used to secure your environment. These include badges for access to the facility, surveillance cameras, two-factor hardware tokens, the twisty road and gates at your data center that keep people from driving a car into your server racks, and so on.
Control Functions
Aside from the category of controls, the controls themselves can perform three different types of security functions: they can be preventive, detective, or corrective in nature. Preventive controls are security controls designed to stop unauthorized activity from materializing. Think of how you design your IAM policies and bindings to prevent unauthorized access, your firewall and network patterns, data security controls, virtual machine (VM) configuration and image hardening, and so on. When it comes to designing your cloud infrastructure, you can imagine that you’re designing the infrastructure to prevent most security threats from materializing by minimizing the attack surface. An attack surface is the sum of all your possible security risk exposures. Typically, preventive controls may be documented by security architecture and policy teams, but they’re implemented by the development or operations team. Detective controls are security controls designed to detect unauthorized activity that has materialized. In this case, your security operations detection team is usually doing the work, looking for indicators of compromise in your environment, automating detection capabilities, building signatures, and receiving alerts on malicious activity. An indicator of compromise is an artifact or piece of data that gives your security team high confidence that a user or machine is compromised. If an intern is performing a privileged action, that could be an indicator of compromise. If your antivirus software detected malware on a VM, that’s a detective control doing its job. Corrective controls are security measures to correct unauthorized activity or resolve detected threats and vulnerabilities. Patching your VMs after a critical security flaw, responding to an incident, and rolling back or getting back to a known good state of your system are all examples of corrective controls.
Preventive controls are usually a joint effort by the information security and architecture teams, but the development teams are usually required to implement them. Detective and corrective controls are usually handled by the security operations detection and response teams.
Asset × Threat × Vulnerability = Risk
What is the difference between a risk, a threat, and a vulnerability? People use these terms interchangeably so often that it makes me want to punch the air out of the sky and then repent to Mother Nature for the failed attempt. When it comes to security, our goal is to protect our assets. Assets are people, property, and data, whether tangible or intangible. A threat is anything that can exploit a vulnerability, whether it be intentionally or unintentionally, to obtain, damage, or destroy an asset. Vulnerabilities are weaknesses or gaps in a system that can be exploited by a threat to gain unauthorized access to an asset. The next factor, which tends to be forgotten, is understanding the risk probability and impact. Risk probability is the likelihood of a risk materializing. The risk impact is the effects and consequences of that risk event. Going back to the skydiving use case, let’s say that the probability of a parachute malfunction when you pack your own chute is 1 in 100. We know that the impact of a parachute malfunction is near certain death. Not many die-hard skydivers would still be around if 1 in 100 jumps ends in death, so obviously that is not happening. It is generally best practice to jump with two parachutes: a main chute and a backup. So when you use two self-packed chutes, your risk probability goes from 1 in 100 to 1 in 10,000, because both chutes have to fail on the same jump. Using a professionally packed chute substantially improves your odds, and this is why most skydivers use a professionally packed backup parachute, which they hope never needs to be deployed. The second chute is a risk mitigation, a mechanism to reduce the probability or likelihood of a risk materializing. You can’t eliminate all risks. All you can do is reduce their probability and/or their impact. In Figure 11-1, you can see this idea. A threat can’t materialize against an asset without a vulnerability. A vulnerability is meaningless without an asset to attack. An asset with a vulnerability has no risk if there is no threat. The idea is this: you need all three, and that intersection is the risk.
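The parachute arithmetic above can be checked in a few lines of Python. This is a small sketch that assumes the two chute failures are independent, which is what makes the 1-in-100 figure multiply out to 1 in 10,000; the helper names are made up for this illustration.

```python
from fractions import Fraction


def probability_all_fail(failure_probabilities):
    """Chance that every layer fails, assuming the failures are independent."""
    result = Fraction(1)
    for p in failure_probabilities:
        result *= p
    return result


def has_risk(asset: bool, threat: bool, vulnerability: bool) -> bool:
    """Risk exists only where an asset, a threat, and a vulnerability intersect."""
    return asset and threat and vulnerability


self_packed = Fraction(1, 100)  # figure used in the skydiving example
one_chute = probability_all_fail([self_packed])
two_chutes = probability_all_fail([self_packed, self_packed])
print(one_chute, two_chutes)  # prints: 1/100 1/10000
```

The same multiplication is the intuition behind layering independent security controls: each added layer that must also fail lowers the chance of the overall risk materializing.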
Security Modernization
Traditionally, and in most organizations, there are a variety of security teams: security architecture, infrastructure security, data security, application security, security operations, risk management, compliance, and so on. Usually these represent very distinct roles, and oftentimes security gates are in place to require development teams to go through these individuals to get their applications risk assessed and approved. I use the term “gates” loosely. A security gate is like a baby gate you use to prevent your baby from going into the kitchen, but it falls over within seconds when that two-year-old hulk-smashes into it. Like babies, most development teams smash right through them, a practice commonly referred to as Shadow IT. It’s really difficult to get someone in a distinct role to be proactive in approving your design and managing your application when the person doesn’t know the full context of your business application or doesn’t even know what type of work you do aside from security requirements and best practices. On the flip side, you can’t trust developers who just got out of college to ship code that hosts your most critical IP to the world and expect it to be secure. This exact issue leaves security operations teams (your firefighters) consistently under water, stressed out and burnt out, constantly trying to respond to threats and incidents. There is no time for automation, because they still haven’t figured out how to build manual detection for all of their stack, and they’re too busy getting hacked by 14-year-olds in Russia all day to have downtime. This model simply does not work. The human element is the weakest element. With the cloud, however, the shared responsibility matrix removes a massive burden from organizations, as they no longer own their entire security portfolio end-to-end.
Depending on what element of the Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS) stack you’re looking at, Google will own a varying level of security for a variety of administrative, technical, and physical controls that traditionally were managed by the organization itself. The controls assumed could be preventive, detective, or corrective in nature. Your organization still has a significant responsibility here owning the residual security controls—and that is the responsibility of everyone in your organization, not just the security professionals. Basically, it’s your responsibility to understand the shared responsibility matrix!
Most incidents that happen in the cloud are the result of a cloud customer not following their responsibilities in the shared responsibility matrix. Even with a massive burden of the security elements of a tech stack offloaded from an organization, organizations still can’t get security right. That suggests that a cloud breach is usually not the cloud provider’s fault. Organizations that migrate to the cloud are often strong believers in DevOps and automation, and this plays a major role in their overall security posture. In DevOps-oriented organizations, much of the security work related to the infrastructure and application stack is added into the pipeline, automated where possible, and shifted left (moved to the earliest possible point in the development process) when possible. People use the term DevSecOps often; regardless of your lingo, security should be seen as a fundamental requirement for everything. There is still a never-ending and always-growing need for dedicated security professionals and a dedicated security organization. We can envision most world-class security teams operating in the same way DevOps teams are augmented with site reliability engineering (SRE). Security teams should spend 50 percent of their time supporting the work that other people are already doing and the other 50 percent of the time automating and innovating. When you think about modernizing the security of your organization, think about how you can continue driving toward automation. Balance out the security responsibilities across your organization and minimize your threat surface by providing built-in security specifications in the architectures that you design and implement. There’s so much more to unpack about how we can improve the various domains and functions of your security organization, but that’s for another book.
Just remember that the only way we’ll get to a well-managed state is if we all take the responsibility, along with the security team, to build in the necessary security controls across our technology stack.
Security Automation in Practice
The solution was a tool that assigns risk scores to firewall rule changes, automating the approval of low-to-moderate-risk changes, flagging high-risk patterns for security review, and automatically closing clear high-risk mistakes. By automating the bulk of firewall rule changes and removing security from being directly involved in every change, this solution sped up the time to deploy firewall changes and increased developer productivity throughout the organization. Key takeaway: think about how you can lead with kindness and assume the best intentions from your developers to shift security monitoring left without slowing down the pace of innovation.
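As a rough illustration of how such a risk-scoring tool might triage changes, here is a minimal Python sketch. Everything in it is hypothetical: the `RuleChange` fields, the score weights, the thresholds, and the list of sensitive ports are invented for this example and are not the tool described above.

```python
import ipaddress
from dataclasses import dataclass

SENSITIVE_PORTS = {22, 3389}  # SSH and RDP; an illustrative choice


@dataclass(frozen=True)
class RuleChange:
    """A proposed ingress firewall rule (hypothetical shape)."""
    source_range: str  # CIDR block, for example "10.0.0.0/8"
    port: int


def risk_score(change: RuleChange) -> int:
    """Score a change from 0 to 100 using made-up weights."""
    network = ipaddress.ip_network(change.source_range)
    score = 0
    if network.prefixlen == 0:
        score += 60  # open to the entire internet
    elif not network.is_private:
        score += 30  # a specific public range
    if change.port in SENSITIVE_PORTS:
        score += 40  # administrative access port
    return score


def decide(change: RuleChange) -> str:
    """Map a score to one of the three outcomes described in the text."""
    score = risk_score(change)
    if score >= 100:
        return "auto-close"        # clear high-risk mistake
    if score >= 60:
        return "security-review"   # high-risk pattern, needs a human
    return "auto-approve"          # low-to-moderate risk


for change in (RuleChange("10.0.0.0/8", 443),
               RuleChange("0.0.0.0/0", 443),
               RuleChange("0.0.0.0/0", 22)):
    print(change, "->", decide(change))
```

A real implementation would draw its patterns from the organization’s documented, risk-assessed network policies rather than from hard-coded weights.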
Compliance
Compliance is every security engineer’s least favorite word and every security risk manager’s favorite word. Compliance regulators can understand that system design is incredibly complex. If you can demonstrate to your own teams that a risk is mitigated, you’re making it easier for them to work with the actual regulators and protect the business. In Google Cloud, GCP and G Suite provide a variety of compliance offerings. We’ll discuss a few of them. For some of these compliance offerings, Google may self-attest to the controls required. For others, Google has independent third parties, such as EY and Coalfire, audit its environment to ensure that it meets the criteria to be certified compliant or to provide the ability for customers to build a compliant environment.
The Payment Card Industry Data Security Standard (PCI-DSS) provides companies that interact with credit card data the minimum security requirements they must meet to store, process, and transmit cardholder data. One common mechanism companies employ to descope all of these controls is tokenizing credit card data. Tokenization is the process of substituting nonsensitive data, referred to as a token, for sensitive data. In the context of credit cards, imagine substituting your credit card numbers with a theoretical, nonreversible string of characters. If that credit card database gets exposed, that’s too bad for the hacker, because they can’t do anything with the tokens. Credit card breaches are very common and very costly, and they can cause massive amounts of damage to individuals whose cards are exposed. It’s often easier simply to reframe your thought process to assume a 100 percent probability that you will be breached or something will fail. If that happens, how can you reduce the impact of that risk? Rather than just think, “no way will my parachute not deploy,” it’s easier to go at your problem with, “I know for a fact that in the next six months of skydiving, this parachute will not deploy. What can I do to avoid meeting the Grim Reaper and enjoy my exciting hobby of jumping out of perfectly good airplanes?” In the credit card case, just assume that your database will be accessed or copied via some unauthorized path, because there are just too many people who have access. What will you do to secure it? Easy: take the sensitive data out of the picture. Then ask yourself, “What is the impact of the credit card database being breached if there is nothing left in the data that can provide an opportunity to a bad actor?”
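The tokenization idea above can be sketched as follows. This is a toy, in-memory illustration under stated assumptions: the `TokenVault` class, the `tok_` prefix, and the sample card number are hypothetical, and a real deployment would keep the mapping in a separately secured, access-controlled service rather than a Python dictionary.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a hardened token vault (illustration only)."""

    def __init__(self) -> None:
        self._token_to_pan = {}

    def tokenize(self, pan: str) -> str:
        # The token is random, not derived from the card number, so it
        # cannot be reversed without access to the vault's mapping.
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_pan[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_pan[token]


vault = TokenVault()
card_number = "4111111111111111"  # a well-known test number, not a real card
token = vault.tokenize(card_number)

# The application database stores only the token.
order_record = {"order_id": 1001, "card": token}
print(order_record)
```

If the application database holding `order_record` leaks, the attacker gets tokens only; the card numbers stay behind the vault’s much smaller attack surface.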
Capital One’s Industry-Leading Breach Response
The 2019 Capital One data breach exposed millions of credit card numbers, and this was incredibly costly for the organization. It’s never about if a security incident will occur; it’s about being prepared when an incident occurs. From the point at which the incident was discovered, Capital One led one of the fastest breach responses in technology history. Media research company CyberScoop called this “Light Speed” recognition (for more, see the “Additional References” section at the end of the chapter). In 2019, the average time to discover a breach was 297 days; Capital One discovered the breach and an arrest was made by the FBI within 12 days. This was a major organizational response, but in particular, it demonstrated a massive effort by the incident response team at Capital One. The company was not only able to assist the FBI in making an arrest quickly, but the response also came fast enough to prevent the millions of exposed data records from being routed across the Internet. The breach resulted in an $80 million civil penalty, which is relatively small compared to the $700 million penalty levied for the 2017 Equifax breach. If I had to guess, I’d say that the lower fine resulted from Capital One’s ability to minimize the data exposure. I’d love to give major recognition to my brother, Iraj Ghanizada, who ran the incident response program at Capital One at the time of the breach. Big shout out to his team for leading a historic breach response. The moral of the story is that your organization will get pwned one day. Prepare; push your executives to provide budgets to minimize the impact; and ensure that your security operations teams have adequate funding to be able to detect, respond to, and recover from incidents as effectively as possible.
The Health Insurance Portability and Accountability Act (HIPAA) was signed into law in 1996 by President Bill Clinton and mandates how the flow of protected health information (PHI) should be managed. If you break HIPAA laws, you can go to jail. Protecting patient records is incredibly important. Healthcare companies typically align to a compliance framework known as HITRUST, a combination of more than 40 different compliance frameworks that provides a very thorough set of security requirements across all security domains for an organization to adhere to.
HIPAA and other compliance frameworks may set a mandatory retention period for logs. For HIPAA, it’s six years. How would you design a cost-effective solution to long-term, archival-based storage that needs to retain six years’ worth of logs?
General Data Protection Regulation (GDPR) is a regulation in the European Union that describes how to protect user data and ensure user privacy for any data residing in the European Union or the European Economic Area. It describes how to manage user consent, minimize data collection, provide users the right to erase their data, and more. For organizations in the United States that think GDPR does not apply to them, think again. You are not allowed to store European citizen data outside of the European Union unless you follow GDPR rules. Furthermore, the great state of California requires personal information management in the United States as well, as described in the California Consumer Privacy Act (CCPA) of 2018. If you store a California resident’s private data anywhere, you need to follow most of the GDPR compliance rules, even within the United States. Failure to do so is a criminal violation, not just a slap on the wrist and a big fine. Executives could end up in jail for knowingly not playing by the rules. Let’s face it. Many enterprises have done a poor job of protecting our private data over the years. With the speed and growth of big data today, self-regulation just didn’t work. Hopefully, when a CISO or CIO realizes that they could go to jail for not doing their due diligence and due care in protecting their clients’ sensitive data, they may come up with the budgets needed to safeguard their customers’ personal data properly.
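Returning to the six-year log retention question raised above, one possible answer is a Cloud Storage bucket with an object lifecycle policy that moves logs to a colder storage class and deletes them only after the retention window. The Python sketch below simply builds and prints such a policy as JSON. The 30-day transition age and the leap-year rounding are assumptions, and the JSON shape reflects the Cloud Storage lifecycle configuration format as best understood here, so check it against the current documentation before using it.

```python
import json

# Six years, rounded up so that leap years cannot shorten the window.
RETENTION_DAYS = 6 * 366

lifecycle = {
    "rule": [
        {
            # Assumption: logs are rarely read after the first month.
            "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
            "condition": {"age": 30},
        },
        {
            # Delete only once the mandatory retention period has passed.
            "action": {"type": "Delete"},
            "condition": {"age": RETENTION_DAYS},
        },
    ]
}

print(json.dumps({"lifecycle": lifecycle}, indent=2))
```

A lifecycle rule controls cost but does not by itself stop someone from deleting logs early, so a design like this would normally be paired with a control that enforces the retention period.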
Oftentimes, when you see a compliance-related question on the exam, you won’t need deep subject matter expertise in the compliance framework itself. Instead, a question will present a requirement that you need to fulfill, and you’ll need to define how to architect that requirement.
Infrastructure Security Highlights
This section is going to be very straightforward: no fluff. It highlights some of the key security principles of each layer of the infrastructure. There are probably five more layers of depth when you’re doing actual engineering and implementation work beyond these line items. But keep these as guiding principles for building out your GCP infrastructure.
Identity Security
Cloud Identity handles your user authentication. This can include directory synchronization from Lightweight Directory Access Protocol (LDAP) or Active Directory (AD), as well as single sign-on (SSO) and integration with any of your third-party identity providers. Some of the best practices to follow for protecting your user authentication are as follows:
- Use fully managed Google accounts tied to your corporate domain name.
- Employ SSO and directory synchronization to simplify the end-user experience and prevent misuse of credentials by end users.
- Consider using reputable identity management solutions such as Okta.
- Apply strong password policies. Leverage two-step verification by using hardware or software tokens to validate that a user is who they say they are before being approved to access your GCP resources.
- Migrate unmanaged accounts (or conflict accounts) so they can be managed through Cloud Identity.
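To show what a software token for two-step verification typically computes, here is a short Python sketch of a time-based one-time password (TOTP) in the style of RFC 6238, built on the HOTP construction from RFC 4226. The text above does not say which algorithm a given token uses, so treating the software token as TOTP is an assumption, and the function names are chosen for this example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based code: HOTP where the counter is the current time window."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str, at: Optional[float] = None) -> bool:
    """Compare in constant time to avoid leaking how many digits matched."""
    return hmac.compare_digest(totp(secret, at), submitted)


shared_secret = b"12345678901234567890"  # test secret from the RFC examples
print(totp(shared_secret))
```

Production verifiers usually also accept the adjacent time window to tolerate clock drift and rate-limit attempts; both are omitted here for brevity.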
Resource Management Security
Google Cloud Resource Manager provides resource containers such as organizations, folders, and projects and enables you to group and hierarchically organize GCP resources into those containers to simplify governance, manage billing, and manage your organization effectively. Some of the best practices to follow for managing your resource hierarchy are as follows:
- Automate your project creation and the naming convention associated with your resource containers.
- Leverage a many-projects approach with clear organization through the use of a folder hierarchy to separate and organize your projects based on the structure of your organization.
- Factor in how your organization wants to manage billing across the various teams and functions, and leverage labels to ensure that you can govern your organization and manage your costs effectively.
- Use the Organization Policy Service to define restrictions on certain resources at various layers of the resource hierarchy.
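Automating a naming convention, as the first practice above suggests, can start with something as small as the following Python sketch. The `<business-unit>-<app>-<env>` convention, the label keys, and the helper names are hypothetical. The pattern encodes the project ID constraints as understood here (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen).

```python
import re

# Project ID constraints as understood at the time of writing.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def build_project_id(business_unit: str, app: str, env: str) -> str:
    """Build an ID using a hypothetical <business-unit>-<app>-<env> convention."""
    project_id = f"{business_unit}-{app}-{env}".lower()
    if not PROJECT_ID_PATTERN.fullmatch(project_id):
        raise ValueError(f"invalid project ID: {project_id!r}")
    return project_id


def build_labels(business_unit: str, env: str, cost_center: str) -> dict:
    """Labels (hypothetical keys) that billing exports can later be grouped by."""
    return {
        "business-unit": business_unit.lower(),
        "env": env.lower(),
        "cost-center": cost_center.lower(),
    }


print(build_project_id("fin", "ledger", "prod"))
print(build_labels("fin", "prod", "cc-1234"))
```

In a project-creation pipeline, helpers like these would run before any API call so that nonconforming names are rejected early.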
IAM Security
Cloud Identity and Access Management (Cloud IAM) handles your IAM policies, including your role bindings. This is managed in GCP. Role-based access control enables you to define roles that your users can fit into, providing a simplified and scalable solution to manage users and teams en masse. Further, attribute-based access control enables you to define context-aware conditions in which your users can access an environment. Some of the best practices to follow for managing access to your environment are as follows:
- Follow the principle of least privilege to ensure that your roles are privileged with only the minimum set of permissions that allow a user to do their job if they fill that role.
- Follow the principle of separation of duties to ensure that your users don’t play many different roles, preventing insider threat, minimizing the attack surface, and minimizing the likelihood of collusion.
- Don’t directly assign permissions; assign roles.
- Leverage groups and service accounts to delegate responsibilities in an efficient manner.
- Do not download service account keys; use short-lived tokens. If you need to download a service account key, use a key management solution such as HashiCorp Vault.
- Protect your application secret keys by using a key management solution such as HashiCorp Vault or Google Secret Manager to protect, rotate, revoke, and manage your secret keys.
Understand the predefined role naming convention. It’s pretty self-explanatory, but some questions on the exam may ask you which role is appropriate for a certain set of permissions. Knowing the general role naming convention of Viewer, Browser, Admin, and Owner will help you be successful here. Also, remember that predefined roles should be leveraged where they can; use custom roles only if necessary.
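As one way to put the least-privilege and group-based practices above into an automated check, the Python sketch below scans an IAM policy expressed as a plain dictionary in the `bindings` / `role` / `members` shape. The two audit rules (flag the broad Owner, Editor, and Viewer basic roles, and flag roles bound directly to individual users) and the sample policy are assumptions made for this illustration, not rules stated in the text.

```python
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def audit_bindings(policy: dict) -> list:
    """Return human-readable findings for a policy's role bindings."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding["role"]
        for member in binding.get("members", []):
            if role in BASIC_ROLES:
                findings.append(
                    f"{member} holds broad role {role}; "
                    "consider a narrower predefined role"
                )
            if member.startswith("user:"):
                findings.append(
                    f"{member} is bound directly to {role}; "
                    "consider granting the role to a group"
                )
    return findings


# Hypothetical policy used only to exercise the checks.
sample_policy = {
    "bindings": [
        {"role": "roles/owner", "members": ["user:alice@example.com"]},
        {"role": "roles/storage.objectViewer",
         "members": ["group:analysts@example.com"]},
    ]
}

for finding in audit_bindings(sample_policy):
    print(finding)
```

A check like this fits naturally into a deployment pipeline, so that overly broad bindings are caught before they reach production.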
Network Security
The design of your network is a fundamental element that protects you against compromised users and insider threats. Opening your network to billions of people across the globe is very dangerous. Even if you don’t open your network to the world, small configuration mistakes can trigger massive compromises. Some of the best practices to follow for protecting your network are as follows:
- Google traffic is encrypted in transit by default, as the Google Front End (GFE) applies authentication and Transport Layer Security (TLS) to all traffic that hits the reverse proxy. After traffic is proxied, you’re trusting Google to route your traffic on its fully owned private network. Google provides many assurances to protect your traffic’s confidentiality, integrity, and availability.
- Leverage firewalls to monitor and control incoming and outgoing network traffic. Store documented and risk-assessed network patterns in a central policy repository, and use automation to deploy firewall patterns across your stack.
- Avoid using public IPs as much as possible.
- Protect your public endpoints with Cloud Armor and other web application firewall (WAF) solutions to defend against common attacks such as cross-site request forgery (CSRF), cross-site scripting (XSS), SQL injection, and so on.
- Use VPC firewall rules to allow or deny traffic to and from your VMs.
- Leverage Private Google Access to minimize the need for external IP addresses on services that need to reach out to public Google endpoints.
- Put your VMs behind external load balancers when you need to expose them to the outside, so that you mitigate availability attacks and avoid exposing your VMs directly.
- Use VPC Service Controls to constrain public Google endpoints to your Virtual Private Clouds (VPCs) to avoid data exfiltration by malicious insiders or compromised users. That way, any data in a Google endpoint falls under the same enforcement boundaries that you use to protect your VPCs.
- Use the Identity-Aware Proxy (IAP) to control user access to your applications and eliminate the need for a VPN, as contextually aware solutions provide stronger security and simplify the end-user experience.
- Ensure that all of your network logs are being monitored by your security teams to detect malicious activity within your VPC.
- Do resilience testing by introducing faults in your environment and determining how they affect your architecture.
- Document all network patterns and have your enterprise architecture team assess their risk. Developers should not build their own network patterns on the fly: they either build around existing patterns or get new patterns approved by your architects before those patterns are deployed in the organization.
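The idea of keeping approved network patterns in a central repository and checking proposed rules against them can be sketched in a few lines of Python. This is only an illustration: the pattern records, their field names, and the `matching_pattern` helper are hypothetical and do not correspond to any Google Cloud API.

```python
from ipaddress import ip_network

# Hypothetical approved patterns, as they might be stored in a central
# policy repository. Names and fields are assumptions for this sketch.
APPROVED_PATTERNS = [
    {"name": "internal-ssh", "source": "10.0.0.0/8", "ports": {22}},
    {"name": "lb-health-checks", "source": "35.191.0.0/16", "ports": {80, 443}},
]

def matching_pattern(rule, patterns=APPROVED_PATTERNS):
    """Return the name of the first approved pattern that fully covers the rule, else None."""
    source = ip_network(rule["source"])
    for pattern in patterns:
        allowed = ip_network(pattern["source"])
        # The version check comes first because subnet_of() raises TypeError
        # when an IPv4 network is compared with an IPv6 network.
        if (source.version == allowed.version
                and source.subnet_of(allowed)
                and set(rule["ports"]) <= pattern["ports"]):
            return pattern["name"]
    return None

# A rule inside an approved range passes; a rule open to the world does not.
print(matching_pattern({"source": "10.1.2.0/24", "ports": [22]}))
print(matching_pattern({"source": "0.0.0.0/0", "ports": [22]}))
```

A check like this could run in a deployment pipeline so that a rule with no matching pattern is sent to the architects for review instead of being deployed.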
Application Layer Security
The various IaaS, PaaS, and SaaS computing offerings are associated with varying best practices based on your responsibilities in the shared responsibility model. Some of these offerings don’t fit exactly into a category and may be hybrids (think Kubernetes, for example). Some of the best practices to follow for protecting your application layer are as follows:
- Follow IAM best practices to protect access to your resources.
- Avoid unencrypted communication between application services. Use TLS or Secure Shell (SSH) to ensure that traffic never flows in cleartext. Yes, this likely means managing internal certificates and certificate authorities, but there are practices that can simplify this operational overhead.
- Follow networking best practices to ensure that only approved traffic flows between resources.
- Harden the configuration of your computing resources so that unnecessary services and configuration items are not enabled on them.
- Centralize the configuration and monitor your resources for deviations from this configuration.
- Leverage an image repository, approved images, and a trusted deployment process.
- Leverage managed services where possible. If a managed service meets your needs, it should strengthen your security, minimize the operations needed, and simplify the end-user experience.
- Use OS Login to simplify SSH access management to VMs by linking access to users’ Google identities, so that you can monitor and govern privileged access.
- Leverage automatic upgrades where possible and follow a strong patch management process.
- Use templates whenever possible to programmatically manage your infrastructure.
- Manage your APIs through an API management platform such as Apigee or Cloud Endpoints to protect access to your application interfaces.
Data Security
Your data is the most critical asset to your organization. Protecting your data is protecting your business. A data exposure could cause serious harm to your business: fines, customer churn, imprisonment, and a complete loss of your business. Exposure does not necessarily mean a hacker stole your data; data may be exposed accidentally, it may be stolen, it may be mistakenly revealed by an insider, or it may involve a loss of availability. In the 21st century, data is our most valuable asset. Some of the best practices to follow for protecting your data are as follows:
- Data in GCP is encrypted at rest on the server side by default using AES-256. Depending on your perspective, default encryption is the most secure option for server-side encryption, because it minimizes the operational burden and the risk of mismanaging your own encryption keys.
- Use an external key manager if you must fully own the keys to your data.
- For organizational requirements, use Cloud Key Management Service (KMS) and Cloud HSM to manage and protect your key encryption keys.
- Consider using client-side encryption if you really want the added layer of protection.
- Follow IAM best practices to protect against unauthorized access to data.
- When transferring data to storage solutions in GCP, or between them, you must have appropriate security measures in place for your transfer mechanism to ensure confidentiality and to validate integrity upon completion.
- Leverage managed services where possible to minimize the operational overhead for your teams.
- Label your data environments and data according to their classification model to ensure that you can properly govern and monitor your data.
- Leverage Cloud Data Loss Prevention (DLP) to discover, classify, and protect your sensitive data either on ingestion or at rest.
- Consider tokenizing personally identifiable information (PII) and limiting access to, and unnecessary copies of, this data. Each copy increases your threat surface.
- Leverage additional database security features where applicable and properly design your table schemas.
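To make the tokenization bullet concrete, here is a minimal sketch that replaces a PII field with a keyed, one-way token using HMAC-SHA256 from the Python standard library. It is a stand-in technique, not Cloud DLP’s de-identification feature, and it cannot be reversed the way a vault-based tokenization service can. The `tokenize_record` helper, the `tok_` prefix, and the demo key are all hypothetical; a real key would come from a key management service.

```python
import hashlib
import hmac

def tokenize(value: str, key: bytes) -> str:
    """Replace a PII value with a deterministic, keyed token (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]

def tokenize_record(record: dict, pii_fields: set, key: bytes) -> dict:
    """Return a copy of the record with the listed PII fields tokenized."""
    return {
        field: tokenize(str(val), key) if field in pii_fields else val
        for field, val in record.items()
    }

# Hypothetical key for demonstration only; never hardcode a real key.
DEMO_KEY = b"demo-key-do-not-use-in-production"

customer = {"id": 42, "email": "ada@example.com", "plan": "pro"}
safe = tokenize_record(customer, {"email"}, DEMO_KEY)
print(safe)
```

Because the token is deterministic for a given key, downstream systems can still join or count on the tokenized field without ever holding a copy of the original value.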
DevOps Security
Continuous integration and deployment is the most effective way to build, ship, and manage code. The continued emphasis on automation enables organizations to focus on building great products rather than managing complex processes. There are many opportunities for adding security gates through this automation process, and organizations should strive to automate everything they can. Some of the best practices to follow for protecting your pipelines are as follows:
- Ensure that security considerations are integrated through the entire product life cycle, from design to development, delivery, operations, and all over again.
- Do threat modeling during your planning phases to identify threat vectors that need to be protected against.
- Perform code reviews/peer reviews during code development.
- Track and remediate vulnerabilities at various stages of the life cycle through static code analysis, dynamic code analysis, and penetration testing.
- Use infrastructure as code (IaC) to deploy your infrastructure.
- Eliminate embedded credentials in code by using secret key management solutions.
- Control, monitor, and audit access to privileged accounts.
- Protect your service accounts that require highly privileged access to deploy IaC.
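One of the simplest security gates to automate is a check for embedded credentials before code is merged. The sketch below is a toy scanner, assuming just two illustrative regular expressions; real secret scanners ship far larger rule sets, and the pattern names and sample source here are hypothetical.

```python
import re

# Illustrative patterns only; these are assumptions, not a complete rule set.
PATTERNS = {
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "hardcoded password": re.compile(
        r"(?i)(?:password|passwd|secret)\s*=\s*['\"][^'\"]+['\"]"
    ),
}

def scan(source: str):
    """Return (line_number, finding) pairs for lines that look like embedded credentials."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings

# Line 2 hardcodes a credential; line 3 reads one from the environment instead.
SAMPLE = '''import os
db_password = "hunter2"
api_token = os.environ["API_TOKEN"]
'''

print(scan(SAMPLE))
```

In a pipeline, a non-empty result would fail the build, which pushes developers toward the environment-variable or secret-manager style shown on the third line of the sample.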
Security Operations
There are two separate focus tracks handled by two different teams or functions: security of the infrastructure and security operations. Security of the infrastructure is typically handled by architects, developers, operations, and engineers. These are mostly preventive security items such as designing your networks properly, ensuring that your secrets are managed properly, designing your data ingestion pipelines and storage, and so on. Think about it: all the best practices described in the previous chapters focused on designing your overall infrastructure to prevent bad activity from occurring in the first place. Usually these requirements are driven by your central security team, but the actual work is federated to the development side of the house, depending on how your organization operates.
Now what happens when bad activity has occurred? Who is best equipped to detect and respond to this activity? The answer is not always black-and-white and usually depends on the size of your organization. The more mature an organization, the more likely it will have a security operations center (SOC) or a detection and response (D&R) team (whatever you want to call it) that handles all of this type of work. For some smaller or less-equipped organizations, this may be federated to the actual developers (which is a bad practice because developers don’t have the skills that a dedicated security operations team would have) or to a managed security services provider (MSSP), which has its own pros and cons. Yes, having an MSSP is better than not having a security monitoring and response capability. But, as you can imagine, it is challenging to have people who are not employees of the business integrate with the development side of the house to drive change. They are contractors after all, and they don’t really have much skin in the game.
Whether the organization uses an SOC or a D&R team, the end goal is the same: to detect bad activity, respond to bad activity, and provide feedback to the business to mitigate and prevent this activity from occurring again. The skillset for this type of work is very technical, but it is not the skillset of a typical software engineer. For that reason, it never makes sense to have your development teams owning this function.
At a high level, here’s the 10,000-foot view of security operations:
- The logging architecture must be able to provide all relevant security logs to this team. Typically these logs will be ingested into a security information and event management (SIEM) tool, such as Splunk, which enables security teams to analyze, enrich, alert on, and respond to potential incidents.
- Security operations teams will usually include an engineer who handles tooling and works with the cloud architecture team to build the logging architecture. The team will also identify all of the relevant log sources.
- The log architecture is built, the logs are aggregated, and the logs are ingested in the SIEM.
- The detection team, usually a team of analysts of varying levels, will look to correlate data and identify signatures (or patterns) of bad behavior that they will want to generate alerts on.
- Structured alerts are created to trigger based on these indicators. Throughout this time, the team is continually seeking ways to enrich the data to provide better analytical insight so that they can minimize false positives and improve the detection capabilities.
- When an alert triggers, it gets analyzed first. Upon analysis, it goes through handoffs to investigate any indicators of compromise. Then it may be formally determined that an incident is occurring or has occurred. This triggers the incident response team to take over the case.
- When the incident response team takes over, a formal incident response workflow is kicked off. Through this workflow, a team is identified to take on varying roles depending on the size and scope of the incident. These roles can include communications lead, incident manager, subject matter experts, and many more.
- The goals of the incident response activities are to contain, eradicate, and recover from the incident.
- Finally, the teams will join together for a postmortem, where they can capture the lessons learned and share them back through the feedback loop to prevent this type of incident from occurring again.
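The detection step in the workflow above (correlating log data into a signature that raises a structured alert) can be illustrated with a small rule written in Python. This is a sketch under stated assumptions: the event fields, the threshold of three failures, and the 60-second window are all hypothetical, and a real SIEM would express the rule in its own query language.

```python
from collections import defaultdict, deque

def detect_bruteforce(events, threshold=3, window_seconds=60):
    """Return an alert each time a principal reaches `threshold` failed logins within the window."""
    recent = defaultdict(deque)  # principal -> timestamps of recent failures
    alerts = []
    for event in sorted(events, key=lambda e: e["ts"]):
        if event["action"] != "login" or event["outcome"] != "failure":
            continue
        times = recent[event["principal"]]
        times.append(event["ts"])
        # Drop failures that fell out of the sliding window.
        while times and event["ts"] - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            alerts.append({"principal": event["principal"],
                           "ts": event["ts"], "count": len(times)})
            times.clear()  # avoid re-alerting on the same burst
    return alerts

# Hypothetical log events; timestamps are seconds since some reference point.
EVENTS = [
    {"ts": 0, "principal": "alice", "action": "login", "outcome": "failure"},
    {"ts": 10, "principal": "alice", "action": "login", "outcome": "failure"},
    {"ts": 20, "principal": "alice", "action": "login", "outcome": "failure"},
    {"ts": 30, "principal": "alice", "action": "login", "outcome": "success"},
    {"ts": 0, "principal": "bob", "action": "login", "outcome": "failure"},
    {"ts": 100, "principal": "bob", "action": "login", "outcome": "failure"},
    {"ts": 200, "principal": "bob", "action": "login", "outcome": "failure"},
]

print(detect_bruteforce(EVENTS))
```

Tuning the threshold and window is the kind of enrichment work the detection team does to keep false positives down: bob’s three failures are spread too far apart to trigger the rule, while alice’s burst does.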
Various tools can automate different tasks throughout the workflow, and automation should always be the goal. But as of 2020, most of this work is still heavily manual, and hackers are incredibly elusive, so it’s not always easy to automate mitigations for their behaviors. This gap will be closed through the use of machine learning and the work of some of the newer security companies out there. The industry is headed toward a state of automation, similar to the DevOps transformation, but that is many years out. To get there, we need to focus on training security operations teams in automation and on pushing leadership to allocate budget for additional headcount. In turn, that will reduce how heavily each team member is utilized, giving them time to focus on creating automation and getting closer to the DevOps teams.
Security operations is a massive, technical practice, and it’s important that you understand what your peers do so that you can build a better bridge between your teams. You also have to remember that, as the firefighters of your organization, they are consistently under heavy stress. It’s hard for them to focus on automation and the like, because they are usually understaffed and overallocated, constantly responding to incidents. Be patient when working with your security operations partners.
Cloud Asset Inventory
Cloud Asset Inventory (CAI) is the metadata inventory service that enables users to keep track of all of their assets in Google Cloud. CAI is incredibly important for governance. Think about the purpose of an asset management system. You can’t manage what you can’t see, but if you have visibility of your entire infrastructure and things are properly tagged, it becomes easy to manage and deploy changes across the entire infrastructure. Oftentimes you would want to leverage asset inventory systems to scan your infrastructure for changes, to deploy certain policies across the stack, or to audit for compliance. Imagine, for example, that a security incident occurs and you notice that one of your firewall rules is open. You can analyze the asset history in CAI to understand when the firewall rule change was made, investigate that exact event to determine who made it, and then drill down to a root cause. Or let’s say you want to deploy an organizational policy that prevents VMs from having external IP addresses. When that organization policy is deployed at a layer of the resource hierarchy, CAI knows where all your VMs are and will update to reflect the new policy changes. Asset management is super important for many teams outside of security, and when you ask people in on-premises environments what their biggest challenges are, you’ll most likely hear asset management named as one of the top issues they face.
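The firewall investigation described above boils down to walking an asset’s history and finding the moment its state changed. The sketch below shows that logic on plain Python data. The snapshot records and their field names are hypothetical and are not the response format of the CAI API; they only stand in for time-ordered versions of one firewall rule.

```python
def first_change(history, predicate):
    """Return the first snapshot where predicate flips from False to True, else None."""
    previous = False
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    for snapshot in sorted(history, key=lambda s: s["update_time"]):
        current = predicate(snapshot["resource"])
        if current and not previous:
            return snapshot
        previous = current
    return None

# Hypothetical history of one firewall rule, oldest first.
HISTORY = [
    {"update_time": "2020-03-01T09:00:00Z",
     "resource": {"name": "allow-ssh", "source_ranges": ["10.0.0.0/8"]}},
    {"update_time": "2020-03-07T22:15:00Z",
     "resource": {"name": "allow-ssh", "source_ranges": ["0.0.0.0/0"]}},
    {"update_time": "2020-03-08T08:00:00Z",
     "resource": {"name": "allow-ssh", "source_ranges": ["0.0.0.0/0"]}},
]

opened = first_change(HISTORY, lambda r: "0.0.0.0/0" in r["source_ranges"])
print(opened["update_time"])
```

The timestamp of the returned snapshot gives investigators a narrow window in which to search the audit logs for who made the change.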
Security Command Center
Security Command Center (SCC) is a security and risk management platform that provides visibility into various security elements of your organization. It does this by being connected directly to the CAI service and continuously analyzing for changes. Those changes are analyzed by the various tools that make up the SCC suite, namely Cloud Threat Detection, Security Health Analytics, Web Security Scanner, and more. When a flaw is identified, it’s reported as a finding. These findings provide insight into who triggered the change, what it was, and where and when it happened. Security professionals can work directly out of the SCC interface or can have these findings exported to a SIEM of their choice, where they can analyze them in their native security tools. SCC is the start of a journey toward security modernization, and it is very quickly becoming a powerhouse tool in Google Cloud.
Cloud Threat Detection
One of the logic engines within SCC is the Cloud Threat Detection (CTD) tool, which provides automated detection of certain threats in GCP through subtools that each focus on different threats. Event Threat Detection (ETD) and Container Threat Detection (KTD) are currently the main tools included in CTD. ETD focuses on log-based threats, and KTD focuses on threats in the container runtime. Remember the difference between threats and vulnerabilities? Most security teams build threat signatures in their SIEM so that they can easily identify where there may be a compromise in the organization. Building these threat signatures manually is cumbersome, takes a ton of time, and requires continual iteration to be effective. The idea with CTD is that Google is doing the work for you, using its threat analysis data and its engineers to create various detections. This frees up some of your team’s time to focus on other detections. As the prebuilt detection list continues to grow and automation improves, security operations teams can spend more of their time on innovation and on automating other areas of their jobs, rather than only on day-to-day work. The results of these detectors, or findings, can be exported to the SIEM. This enables your SOC or D&R teams to manage all threats using a single tool, such as Splunk. For smaller organizations that may not have a budget for Splunk or other SIEMs, using only the SCC user interface may be sufficient.
You probably won’t find any information about Cloud Threat Detection, per se, in the public documentation. You can look up Event Threat Detection and Container Threat Detection separately. These likely won’t be on the PCA exam.
Security Health Analytics
The Security Health Analytics (SHA) tool is focused on vulnerabilities within GCP. SHA looks for vulnerabilities in your GCP configuration, not host-based vulnerabilities such as those that Rapid7 or Qualys would look for within VMs. SHA will scan the CAI every 12 hours looking for things like open firewall rules or open GCS buckets. It’s based on a system of scanners: the tool leverages its prebuilt scanners to detect certain vulnerabilities. The findings can be exported to a SIEM. At most organizations, the SOC team does not focus on vulnerabilities; instead, a dedicated vulnerability management team focuses on remediating them. The vulnerability management team may be colocated alongside the SOC or D&R teams, but their job functions are very different.
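As a rough illustration of what a configuration scanner of this kind checks, the sketch below flags buckets whose IAM policy grants a role to the public. It is not how SHA is implemented. The bucket names and the `scan_buckets` helper are hypothetical; the policy shape (a list of bindings, each with a role and members) and the special members `allUsers` and `allAuthenticatedUsers` follow the standard IAM policy format.

```python
# IAM members that represent anyone on the internet or any signed-in Google account.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}

def public_bindings(policy):
    """Return (role, member) pairs in an IAM policy that grant access to the public."""
    return [
        (binding["role"], member)
        for binding in policy.get("bindings", [])
        for member in binding.get("members", [])
        if member in PUBLIC_MEMBERS
    ]

def scan_buckets(buckets):
    """Map each bucket name to its public bindings, omitting buckets with none."""
    findings = {}
    for name, policy in buckets.items():
        exposed = public_bindings(policy)
        if exposed:
            findings[name] = exposed
    return findings

# Hypothetical inventory of bucket policies.
BUCKETS = {
    "marketing-site-assets": {"bindings": [
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
    ]},
    "payroll-exports": {"bindings": [
        {"role": "roles/storage.objectViewer",
         "members": ["group:finance@example.com"]},
    ]},
}

print(scan_buckets(BUCKETS))
```

A finding is not automatically a problem (a public website bucket may be intentional), which is why findings like these go to a vulnerability management team for triage rather than straight to remediation.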
Web Security Scanner
Web Security Scanner (WSS) is focused on scanning your public GCP endpoints for web-based vulnerabilities. For example, if you have a public IP address on one of your Google Compute Engine (GCE), Google Kubernetes Engine (GKE), or Google App Engine (GAE) instances hosting a web server, WSS will look for things like XSS vulnerabilities or cleartext password fields, based on the OWASP Top Ten list of web application security risks.
OWASP, the Open Web Application Security Project, is a highly trusted community that determines the top web-based security issues that organizations need to protect against. For more information, see the “Additional References” section at the end of the chapter.
Additional References
If you’d like more information about the topics discussed in this chapter, check out these sources:
- What Capital One’s Cybersecurity Team Did (and Did Not) Get Right: https://www.cyberscoop.com/capital-one-cybersecurity-data-breach-what-went-wrong/
- OWASP Top Ten: https://owasp.org/www-project-top-ten/