AWS Certified Solutions Architect: Associate - Study Guide

With my AWS Certified Solutions Architect: Professional exam scheduled for late September 2019, I figured I’d finally compile all of my notes and gathered content for the AWS Certified Solutions Architect: Associate.

As a reference, here is the certification learning path for the AWS Certified Solutions Architect:

cert path

General preliminary reminders:

  • Ensure you have hands-on experience with AWS prior to the exam
  • The content below will likely change as AWS releases new services. This was up-to-date when I took the exam in December 2018.
  • Some of the content is mixed up; I am still working to get it all properly organized.
  • Portions of this have been pieced together from various sources. I bring it here to you, to help! :)

The Basics (101)

AWS Global Infrastructure

You will never be tested on numbers (e.g. number of regions/availability zones).

Region -> A geographical area (Brazil, Europe, Asia, etc.). Each region consists of two or more availability zones (AZs). The AZs are not in the same physical space, so if one data center floods, another can still answer.

Availability zone -> One or more data centers

North America Regions:

  • US East (Northern Virginia)
  • US East (Ohio) Region
  • US West (Oregon) Region
  • US West (Northern California) Region
  • AWS GovCloud (US-West Region)
  • Canada (Central)

Edge Locations -> CDN (content delivery network) endpoints. AWS adds new ones all the time; there were over 100 when these notes were written. There are many more Edge Locations than Regions.

Main services provided by Amazon (based on late 2017/early 2018):

Route53: DNS service

EC2: Virtual machines and/or compute

ECS: Virtual machines + docker

Elastic Beanstalk: Deploy apps and don’t worry about infrastructure. Good for starting users.

Lambda: Serverless. Upload code, no need to configure any server/virtual machines. Used by Amazon Echo

Lightsail: out of the box cloud. Virtual servers with fixed configs

STORAGE

S3: Virtual disk in the cloud. Object based storage

Glacier: Archive - low cost, but access is not immediate

EFS: Elastic File System, mount disks without a specific size, automatically (elastically) grows

Storage Gateway: VM on premises with S3 support.

Cache Gateway: cache information from S3 on premises

DATABASES

RDS: Mysql, MariaDB, Microsoft SQL, Oracle, Aurora, etc.

DynamoDB: Non-relational database (NoSQL database)

Redshift: Data warehouse. Copy your own data to create reports

ElastiCache: Cache (can use two technologies: Redis or Memcached)

MIGRATION

Snowball: Portable disk that Amazon sends you, you fill it and send it back, and they import it. Several flavors (depends on how much data you need to move)

DMS: Database Migration Service: migrate database to AWS

SMS: Server migration Service: migrate Virtual Machines to AWS

ANALYTICS

Athena: Run SQL queries on S3. CSV, JSON, etc. Turn flat files into a database

Elastic Map Reduce (EMR): Big data processing. Large amount of data. Uses Hadoop/Apache Spark in the back

ElasticSearch: Uses Lucene. Open source

Cloud Search: Fully managed by AWS. Same technology as ES

Kinesis: Stream and analyze stream data. Big data

Sentiment Analysis: Social media streams

Data Pipeline: move data from place to place. Move S3 to DynamoDB, or vice versa

Quicksight: Analyze data, create dashboards, etc.

SECURITY AND IDENTITY

IAM: Authenticate, permissions, etc.

Inspector: Inspects virtual machines (status, etc)

Certificate Manager: Manage SSL Certificates

Directory Service: Active Directory Management

WAF (Web Application Firewall): App net protection against DDOS, hacking. On top of the Network firewall

Artifact: Compliance documents

MANAGEMENT TOOLS

Cloudwatch: Get information on VM, CPU, memory, etc. Stores different kind of logs

Cloud Formation: Turn servers into code. Templates to build entire networks/servers

Cloud Trail: Audit AWS Resources

OpsWorks: configuration management service, provides instances of Chef and Puppet. Automated deploys

Service Catalog: Manage images (VM) and authorized servers in the org

Trusted Advisor: Automated tips and performance optimizations. Automatically scans the environment for problems/security issues

APPLICATION SERVICES

Step functions: visualize steps inside app/microservices. Serverless orchestration

SWF: Simple Workflow Service. Coordinates tasks, both automated (jobs) and human (e.g., picking something from storage)

API Gateway: Door to AWS services, backend services or your own code, in AWS. Can call Lambda functions, for example

AppStream: Stream desktop apps to users

Elastic Transcoder: video tools. Change format, resize, etc.

DEVELOPER TOOLS

Code commit: Store code in the cloud. GIT

Code build: Pay by minute, compile code in different environments

Code deploy: Deploy code to EC2. Automatic building, etc.

Code Pipeline: Keep track of code differences between environments, build pipeline (IE, trigger compile when committing code, run unit test, etc)

ARTIFICIAL INTELLIGENCE

Polly: Service that transforms text into speech (e.g., MP3)

Machine Learning: dataset, analyze data, predict

Rekognition: Image recognition and processing

MESSAGING

SNS: Simple notification service - alert system, via SMS, HTTP endpoint, email, etc.

SQS: Simple Queue Service: message queue

SES: Simple Email Service: SMTP

API Gateway

  • caches responses from endpoint for set period of time (TTL)
  • cache can be encrypted, it can be flushed and you can define size
  • you can’t regenerate the cache
  • low cost, efficient
  • scales effortlessly
  • throttle requests to prevent attacks
  • connect to CloudWatch to log requests and troubleshoot
  • contains default DDOS protection
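The TTL-based response caching described above can be sketched in a few lines. This is only an illustrative in-memory model, not how API Gateway is implemented; the `TTLCache` class, its `get`/`flush` methods, and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory response cache with a fixed TTL (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (time the response was stored, cached response)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached response, calling fetch() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value

    def flush(self) -> None:
        """Drop every cached response, so the next request hits the backend."""
        self._entries.clear()
```

With a 300-second TTL, repeated requests for the same key within five minutes reach the backend once; after a flush or an expiry the next request goes to the backend again.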

Cross-Origin Resource Sharing (CORS)

  • server on other end can relax same-origin policy
  • mechanism that allows restricted resources on web page to be from another domain

AWS CLI

Command line to configure AWS

You don’t use your console user and password; you use the access key ID and secret access key that were provided when you added the IAM user. If you lose these values, you have to regenerate them; they are only shown once, right after the user is added.

AWS Config

Detailed view of configuration resources

  • Evaluate config with desired settings
  • Get snapshots of current config in AWS account
  • Retrieve historical configurations
  • Receive a notification when resources are created, modified or deleted
  • View relationships between resources

Compute

  • EC2 - Elastic Compute Cloud, virtual/dedicated machines in AWS
  • Lightsail - Virtual Private Server, simple server with a fixed IP and SSH/RDP access, not fully utilizing AWS services
  • Elastic Container Service - Run and manage Docker containers at scale
  • Lambda - Code uploaded to the cloud, you control when it executes (no worrying about machines underneath)
  • Batch - Used for batch computing in the cloud
  • Elastic Beanstalk - Developers can upload apps and AWS auto-configures

Storage

  • S3 - Simple Storage Service, object-based storage. Files are uploaded into AWS buckets
  • EFS - Elastic File System, network-attached storage (NFS)
  • Glacier - Data archival, cheap storage for data that isn’t accessed very often
  • Snowball - Way to bring large amounts of data into AWS; a physical Snowball is sent to your data center, you load it, and it is sent back to AWS where they import it for you
  • Storage Gateway - Virtual appliances, virtual machines you install in your office and data is replicated back to Amazon

Databases

  • RDS - Relational Database Service (MySQL, MSSQL, PostgreSQL, Aurora, Oracle, etc.)
  • DynamoDB - Non-relational (NoSQL) database
  • ElastiCache - Caching commonly-queried things from the database server (top 10 products, etc.), using Redis or Memcached
  • Redshift - Data warehousing/business intelligence, complex queries (doing P&L analysis, time-intensive queries, etc.)

Migration

  • AWS Migration Hub - Tracking service to track applications as they are migrated to AWS
  • Application Discovery Service (ADS) - Automated tools that detect application types and related dependencies
  • Database Migration Service (DMS) - Easy way to migrate databases from on-premises to AWS
  • Server Migration Service - Easy way to migrate virtual/physical servers into AWS Cloud
  • Snowball - Similar to Storage, this helps to migrate large amounts of data into the cloud

Networking & Content Delivery

  • VPC (Virtual Private Cloud) - Basically a virtual data center (configure firewalls, AZs, address ranges, network ACLs, route tables, etc.). NEED TO UNDERSTAND THIS INSIDE AND OUT TO PASS!
  • CloudFront - Content Delivery Network by Amazon, delivers media assets to users (video files, audio files, etc) by storing close to users
  • Route53 - Amazon’s DNS Service
  • API Gateway - Amazon’s way to create API’s for services to talk to
  • Direct Connect - Amazon’s way of running dedicated line from your business into AWS

Developer Tools

  • CodeStar - Project managing of code for developers
  • CodeCommit - Place to store code (source control), private git repository
  • CodeBuild - Compiles, tests code and build packages ready for deployment
  • CodeDeploy - Deployment services that will deploy applications to EC2, Lambda, on-premise
  • CodePipeline - Continuous Delivery to Model/Visualize/Automate steps for software release
  • X-Ray - Used to debug/analyze serverless applications by showing traces
  • Cloud9 - IDE Environment to develop code inside AWS console

Management Tools

  • CloudWatch - Monitoring service, need to know for SysOps Admin exam
  • CloudFormation - Automated way to deploy servers/services in AWS, agnosticizing aspects to make deployments faster everywhere
  • CloudTrail - Everything that happens in AWS is recorded and logged in CloudTrail and makes for easy tracking of things happening in your environment
  • Config - Monitors configuration of entire AWS environment and keeps snapshots of entire AWS environment (visualize AWS environment)
  • OpsWorks - Similar to Elastic Beanstalk, used to automate configuration of environments (covered in SysOps Admin test)
  • Service Catalog - Way of managing a catalog of IT-approved services in AWS. Typically used for governance/compliance in big organizations
  • Systems Manager - Managing AWS resources (EC2 patch maintenance, for example). Resources can be grouped by department or application
  • Trusted Advisor - Will give advice across many different disciplines. Make sure to know the difference between Trusted Advisor and Inspector
  • Managed Services - Amazon will take care of EC2, auto-scaling

Media Services

Elastic Transcoder

  • MediaConvert - File-based video transcoding with broadcast-grade features
  • MediaLive - Broadcast-grade live video processing services (video streams)
  • MediaPackage - Prepares and protects videos for delivery over the internet
  • MediaStore - Place to store media (storage optimized for media)
  • MediaTailor - Allows to do targeted-advertising into video streams without sacrificing broadcasting quality

Machine Learning

  • SageMaker - Makes it easy for developers to use deep learning when coding for their environments
  • Comprehend - Sentiment analysis around data
  • DeepLens - Artificially-aware camera (can understand what it’s looking at; detection runs locally on the physical hardware, not in the cloud)
  • Lex - Powers Amazon Alexa service, artificial intelligence
  • Machine Learning - Different to deep learning, entry-level. AWS will analyze data sets and predict outcomes
  • Polly - Takes text and turns it into speech
  • Rekognition - Upload a file and it will do file analysis (upload pic of dog on beach, it will tell you “dog”, “beach”, etc.)
  • Amazon Translate - Machine translation service, like google translate but from Amazon
  • Amazon Transcribe - Used for those that are hard of hearing–takes audio and creates text

Analytics

  • Athena - Run SQL queries against items in S3 buckets (serverless)
  • EMR - Elastic Map Reduce, used for processing large amounts of data
  • CloudSearch - Search service for AWS
  • ElasticSearch Service - Elastic Search service for AWS
  • Kinesis - Way of ingesting large amounts of data into AWS (e.g., social media feeds for a specific hashtag)
  • Kinesis Video Streams - Allows you to ingest large amounts of data on people streaming your media
  • QuickSight - BI tool, significantly cheaper than competitors
  • Data Pipeline - Way of moving data between different AWS services
  • Glue - Used for ETL (extract, transform, load), glue is optimized to achieve this

Security & Identity & Compliance

  • IAM - Identity Access Management
  • Cognito - Way of doing device authentication (2-factor auth)
  • GuardDuty - Monitors for malicious activity on AWS account
  • Inspector - Agent installed on EC2 instances, run tests against it to check for vulnerabilities; these can be scheduled
  • Macie - Scan S3 buckets for any personally identifiable information (PII) and alert you
  • Certificate Manager - SSL certificates for free if registered through Route53
  • CloudHSM - Cloud Hardware Security Module, Dedicated hardware used to store your private/public keys or other keys, can also use keys to encrypt objects on AWS
  • Directory Service - Incorporate Microsoft Active Directory with AWS
  • WAF - Web Application Firewall (7-layer firewall), monitoring application layer
  • Shield - DDoS Mitigation
  • Artifact - Used for audit and compliance, download audit and compliance reports

Mobile Services

  • Mobile Hub - Management console for mobile services
  • Pinpoint - Targeted push notifications for increased mobile engagement
  • AWS AppSync - Automatically updates data in web/mobile applications and will update offline users when they reconnect
  • Device Farm - Testing apps on real, live devices
  • Mobile Analytics - Analytics service for mobile devices

AR/VR

Sumerian - Used for AR/VR 3D app design

Application Integrations

  • Step Functions - Way to manage lambda functions and steps to go through it
  • Amazon MQ - Message queues (like RabbitMQ)
  • SNS - Notification service when triggers are hit
  • SQS - Way to decouple infrastructure, take messages in, and allow EC2 instances to poll data
  • SWF - Create a workflow to be modeled after a process that you have

Customer Engagement

  • Amazon Connect - Contact Center as a Service (CCaaS)
  • Simple Email Service - Easy way to send large amounts of emails

Business Productivity

  • Alexa For Business - use to dial into rooms, inform IT of problem–Alexa in the workplace
  • Amazon Chime - Video conferencing by Amazon
  • Work Docs - Like Dropbox for AWS
  • WorkMail - Like O365/Gmail for Amazon

Desktop & App Streaming

  • Workspaces - VDI (Virtual Desktop) that can be accessed in the cloud
  • AppStream 2.0 - Stream application live to device (like Citrix)

Internet of Things

  • IoT - Can have devices sending back information
  • IoT Device Management - Used to manage AWS IoT devices
  • Amazon FreeRTOS - Real-time OS by Amazon
  • Greengrass - Software to run local compute services in a secure way

Game Development

GameLift - Service to help develop game services in AWS

Shared Responsibility:

AWS is responsible for:

  • Base hypervisor
  • Zones
  • Network
  • Region
  • Operating system and patches in RDS databases
  • etc

YOU are responsible for:

  • Encryption
  • Operating system in your EC2 instances
  • Firewalls
  • Customer data
  • etc.

AWS Trusted Advisor

Application that learns from existing AWS customers. Inspects your AWS environment and makes recommendations for:

  • Saving money
  • Improving performance
  • Closing security gaps

Beanstalk

Deploy, monitor and scale apps quickly. Highly abstracted from the infrastructure: it simplifies infra management and uses a GUI to configure things. Good for people with little AWS experience. Uses CloudFormation in the background. Can be used for workers/jobs.

Pre-configured Instance support:

  • NodeJS, Python, PHP, Ruby, Tomcat, .NET (Win IIS), Java, Go, Packer
  • Docker images
  • Generic docker

Can have multiple versions of your app. Can be split into tiers (Web / App / DB tier / Front end / Backend, etc). You can update the configs after creation.

Updates can be

  • 1 instance at a time
  • % of instances
  • Immutable (launches all apps from 0 again)
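To make the first two update styles concrete, here is a small sketch that splits a fleet into update batches. The function names are hypothetical, and the rounding rule for percentage batches (round down, but never below one instance) is an assumption for illustration, not documented Elastic Beanstalk behavior.

```python
import math
from typing import List


def rolling_batches(instances: List[str], batch_size: int) -> List[List[str]]:
    """Split the fleet into consecutive batches that are updated one after another."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]


def percent_batch_size(instance_count: int, percent: int) -> int:
    """Batch size for a percentage-based update (assumed: round down, minimum one)."""
    return max(1, math.floor(instance_count * percent / 100))
```

For four instances, a batch size of 1 gives four batches, while 50% gives two batches of two; the rest of the fleet keeps serving traffic while each batch is updated.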

If Beanstalk creates your RDS instance, it will be deleted if/when you delete the Elastic Beanstalk environment

Business Benefits of Cloud

  • Almost zero upfront infrastructure investment
  • Just-in-time infrastructure
  • More efficient resource utilization
  • Usage-based costing
  • Reduced time to market

Technical Benefits of Cloud

  • Automation (Scriptable Infrastructure)
  • Auto-scaling
  • Proactive Scaling
  • More efficient development lifecycle
  • Improved testability
  • Disaster recovery and Business Continuity
  • “Overflow” traffic to the Cloud

Cloudformation

Allows you to turn infrastructure into code. Easy way to create/manage AWS resources. Can apply versioning to AWS infrastructure (like code).

Template –> Diagram

Stack –> Result of the diagram

Format: JSON or YAML

Template Elements:

  • Required: list of AWS resources
  • Optional:
    • Version, file format
    • Template parameters (up to 60)
  • Output
    • Public IP, ELB addresses (up to 60)

Naming

  • You can assign local names, and they are used partially when creating resources.
  • Names are not fixed/enforced to avoid conflicts. Some exceptions exists (IE bucket names).

You can install software with a set of bootstrapping scripts.

Includes integrations with Chef and Puppet

Supports tagging; EBS volumes are automatically tagged

Once provisioned, you have control of the resources

  • Automatic rollback on error is ON by default (everything is deleted if an error occurs). Keep in mind you are charged for resources created before the error, but usage of CloudFormation itself is free
  • Stacks can wait for an app to be provisioned using WaitCondition
  • Route53 is supported
  • IAM Role creation is also supported
  • Can define deletion policies for resources, so that when you delete the stack, those resources are not deleted
  • 200 stacks max, can request more

If you want to hide a parameter value from Cloudtrail/Cloudwatch, mark the parameter with NoEcho
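A minimal template pulling these pieces together might look like the sketch below, built as a Python dictionary and printed as JSON. The logical names `DbPassword`, `NotesBucket` and `BucketName` are hypothetical, and the parameter is declared only to show `NoEcho`; the section and property names (`Parameters`, `Resources`, `Outputs`, `DeletionPolicy`, `Ref`) are standard template syntax.

```python
import json


def build_template() -> dict:
    """Return a small illustrative CloudFormation template as a dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack with one S3 bucket (hypothetical names)",
        # Optional: template parameters. NoEcho masks the value when it is displayed.
        "Parameters": {
            "DbPassword": {"Type": "String", "NoEcho": True},
        },
        # Required: the list of AWS resources that make up the stack.
        "Resources": {
            "NotesBucket": {
                "Type": "AWS::S3::Bucket",
                # Keep the bucket when the stack is deleted.
                "DeletionPolicy": "Retain",
            },
        },
        # Optional: values reported back once the stack has been created.
        "Outputs": {
            "BucketName": {"Value": {"Ref": "NotesBucket"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

The same structure can be written directly in JSON or YAML; generating it from code is just one convenient way to keep it under version control.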

Difference with Elastic Beanstalk?

  • CloudFormation and Elastic Beanstalk complement each other
  • Beanstalk deploys and runs apps in the cloud, is integrated with dev tools, and manages the life cycle of apps
  • CloudFormation is a mechanism to provision AWS resources: a template to build the entire infrastructure, including Beanstalk apps

Content Delivery Network (CDN)

Edge Location - Location where content will be cached; separate to an AWS Region/AZ

  • Origin - Origin of all the files that the CDN will distribute. This can be an S3 Bucket, EC2 Instance, Elastic Load Balancer, or Route53; the origin can also be outside AWS
  • Web Distribution - Typically used for websites
  • RTMP - Used for media streaming (adobe flash)
  • Edge Locations are not just read only, you can write to them, too
  • Objects are cached for the life of the TTL
    • Default: 24 hours
    • Max: 365 days
  • You can clear objects from the Cloudfront cache, but you will be charged
  • Restrict viewer access
    • Signed URLs
    • Signed Cookies
  • Geo restriction

Cloudtrail is used to log all the API calls made internally on AWS, mostly for audit

Since all the settings you change via the console are actually API calls made to the internal AWS API, if you enable Cloudtrail you can get all the information about everything that was done via the console or via specific API calls.

You can turn on a trail across all regions for your AWS account. Cloudtrail will deliver log files from all regions to a S3 bucket and an optional Cloudwatch log group you specify.

  • Standard Monitoring = 5 minutes
  • Detailed Monitoring = 1 Minute

You have to pay if you want Detailed Monitoring

In Cloudwatch you can

  • Create dashboards
  • Create alarms
  • Create events (state changes for AWS resources for example)
  • Logs (aggregate, monitor and store logs)

Aurora

  • MySQL-compatible
  • combines speed and availability of high-end commercial databases
  • has simplicity and cost-effectiveness of open source databases
  • five times better performance than MySQL at 1/10th price of commercial databases
  • storage starts with 10GB, scales in 10GB increments up to 64TB
  • compute scales up to 32 vCPU’s and 244GB memory
  • 2 copies of data in each AZ, minimum of 3 AZ’s
  • designed to transparently handle loss of 2 copies without affecting DB write availability
  • designed to transparently handle loss of 3 copies without affecting read availability
  • self-healing; data blocks/disks are continuously scanned for errors and repaired automatically
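The copy arithmetic in the list above (2 copies in each of 3 AZs, writes survive the loss of 2 copies, reads survive the loss of 3) can be written down as a small check. The function name and return shape are hypothetical; the thresholds are taken straight from the bullets.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
TOTAL_COPIES = COPIES_PER_AZ * AZ_COUNT  # 6 copies in total


def aurora_availability(lost_copies: int) -> dict:
    """Report whether writes and reads stay available after losing some copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError("lost_copies must be between 0 and 6")
    return {
        "writes": lost_copies <= 2,  # tolerates the loss of up to 2 copies
        "reads": lost_copies <= 3,   # tolerates the loss of up to 3 copies
    }
```

Losing a whole AZ removes 2 copies, so writes and reads both continue; losing an AZ plus one more copy leaves the database readable but not writable.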

Aurora Replica Features

  • Aurora Replicas
    • Up to 15
  • MySQL Read Replicas
    • Up to 5

DynamoDB

  • NoSQL database for consistent, single-digit millisecond latency at any scale
  • Stored on SSD storage
  • Spread across 3 geographically distinct data centers
  • Eventually Consistent Reads
    • Consistency across all copies of data is usually reached within a second. Repeating a read after a short time will return updated data (best read performance)
  • Strongly Consistent Reads
    • Returns a result that reflects all writes that received a successful response prior to the read
  • Autoscaling supported (% target utilization, min/max)

DynamoDB Pricing

  • Provisioned Throughput Capacity
    • Write Throughput $0.0065 per hour for every 10 units
    • Read Throughput $0.0065 per hour for every 50 units
  • Storage Costs
    • First 25 GB –> Free
    • $0.25 per GB/month after that
  • Free tier: 25 units read / 25 units write
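As a worked example of the prices listed above, the sketch below estimates a monthly bill. It assumes a 730-hour month, that throughput is billed in whole blocks of 10 write units and 50 read units (rounded up), and it ignores the free-tier units; those are simplifying assumptions for illustration, and the prices are the ones quoted in these notes, not current AWS pricing.

```python
import math

HOURS_PER_MONTH = 730          # assumption: average number of hours in a month
PRICE_PER_BLOCK_HOUR = 0.0065  # per 10 write units or per 50 read units
FREE_STORAGE_GB = 25
PRICE_PER_GB_MONTH = 0.25


def monthly_cost(write_units: int, read_units: int, storage_gb: float,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Estimate the monthly cost of provisioned throughput plus storage."""
    write_cost = math.ceil(write_units / 10) * PRICE_PER_BLOCK_HOUR * hours
    read_cost = math.ceil(read_units / 50) * PRICE_PER_BLOCK_HOUR * hours
    storage_cost = max(0.0, storage_gb - FREE_STORAGE_GB) * PRICE_PER_GB_MONTH
    return round(write_cost + read_cost + storage_cost, 2)
```

Under these assumptions, 100 write units, 500 read units and 100 GB of storage come to about $113.65 per month: $47.45 for writes, $47.45 for reads and $18.75 for the 75 GB above the free allowance.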

DynamoDB Streams

  • Capture changes to DynamoDB tables for 24 hours, like an audit trail (add, change (before and after), delete). Use Lambda if you want to store the data for more than 24 hours

  • Max size of each item with attributes: 400KB
  • BatchWriteItem: 25 items, 16MB
  • BatchGetItem: 100 items, 16MB

Scan: eventually consistent by default; add the ConsistentRead parameter for strongly consistent reads

Iterator: returns 1MB and LastEvaluatedKey (to paginate)
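The pagination pattern with LastEvaluatedKey can be sketched without touching AWS at all. Here `fake_scan` is a hypothetical stand-in for a Scan call over an in-memory list, and its key is just a list index; in real DynamoDB responses the key is a map of primary-key attributes, and you pass it back as the exclusive start key of the next request.

```python
from typing import Any, Dict, Iterator, List, Optional


def fake_scan(items: List[Any], page_size: int,
              exclusive_start_key: Optional[int] = None) -> Dict[str, Any]:
    """Stand-in for one Scan call: returns a page plus a key if more data remains."""
    start = 0 if exclusive_start_key is None else exclusive_start_key + 1
    page = items[start:start + page_size]
    response: Dict[str, Any] = {"Items": page}
    last_index = start + len(page) - 1
    if last_index < len(items) - 1:
        # More items remain, so tell the caller where to resume.
        response["LastEvaluatedKey"] = last_index
    return response


def scan_all(items: List[Any], page_size: int) -> Iterator[Any]:
    """Keep requesting pages until the response carries no LastEvaluatedKey."""
    key: Optional[int] = None
    while True:
        response = fake_scan(items, page_size, key)
        yield from response["Items"]
        key = response.get("LastEvaluatedKey")
        if key is None:
            break
```

The loop stops only when the key is absent, not when a page is short, which mirrors how a 1MB page limit can end a page before the table is exhausted.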

Data types: Number, string, binary, boolean, NULL

JSON: stored as document, can create keys and filter by attribute, can update a sub-element, can use document SDK as wrapper (JS)

Indexes:

  • Global Secondary Index: can add up to 5 per table
  • Local Secondary Index: can add up to 5 per table, AT CREATION (can’t add them later), 10GB PER PARTITION

Security

Fine-grained access control lets you allow or deny IAM users access to information (tables, items or even attributes)

Reserved capacity can be bought at discounted price. Limited to a single region.

Triggers are supported (uses DynamoDB w/Lambda)

Can specify TTL on tables. Needs to have a timestamp

DAX: In-memory cache (in SDK Node.js & Java)

ElastiCache


  • Memcached
    • No Multi-AZ support
  • Redis
    • Multi-AZ support

When asked which service to use to alleviate stress/load on database:

  • Elasticache is a good choice if the database is read heavy and not prone to frequent changes
  • Redshift is a good choice if the database is stressed because management keeps running OLAP transactions on it

Automated Backups

  • Allow you to recover database to any point in time within retention period (1-35 days)
  • Take full, daily snapshot
  • Store transaction logs
  • Enabled by default
  • Stored in S3; free storage space equal to the size of the database
  • Deleted when RDS instance is deleted

Snapshots

  • Database snapshots are done manually (stored even after RDS instance is deleted)

Restoring Backups

  • When using either restore option, restored version of database will be a new RDS instance with new DNS endpoint

Encryption

  • Done using AWS Key Management Service (KMS)
  • Encrypting existing RDS is not currently supported

Multi-AZ RDS

  • Used for Disaster Recovery (DR) only
  • Availability
    • SQL Server
    • Oracle
    • MySQL
    • PostgreSQL
    • MariaDB

Read Replicas

  • Used for scaling, not DR
  • Must have automatic backups turned on to deploy a Read Replica
  • You can have up to 5 Read Replicas of any database
  • Allow you to have a read-only copy of your database
  • Achieved using asynchronous replication
  • Used for performance improvements, read-heavy database workloads
  • Each read replica will have its own DNS end point
  • You can have read replicas that have Multi-AZ
  • You can create read replicas of Multi-AZ source databases
  • Read Replicas can be promoted to be their own databases (breaks replication)
  • You can have a read replica in another Region

Redshift

  • Datawarehousing
  • Column Data. Aggregation
  • Single Node (160GB)
  • Multi-Node
    • Leader Node, manages client connections and receives queries
    • Compute Node, store data and perform queries and computations
      • Up to 128 Compute Nodes
  • Columnar Data Storage
  • Massively Parallel Processing (MPP) - Automatic distribution of data and query loads across all Nodes
  • Currently only available in one AZ
  • Can restore snapshots to new AZ in event of outage

Costs

  • Compute Node Hours
  • Backup
  • Data Transfer (within a VPC, not outside of)

Encryption

  • Encrypted in transit using SSL
  • Encrypted at rest using AES-256
  • Redshift takes care of key management

Availability:

  • SINGLE AZ
  • Can restore snapshots to other AZ
  • Enable Cross-Region snapshot for recovery

VPC: Turn on Enhanced VPC routing for VPC endpoints (so data doesn’t leave your own VPC)

AWS Database Types

Maximum size: 16 TB. If larger, consider Redshift

RDS - OLTP (Online Transaction Processing)

  • SQL Server
  • Oracle
  • MySQL
  • PostgreSQL
  • Amazon Aurora
  • MariaDB
  • DynamoDB
  • RedShift - OLAP (Online Analytics Processing, Datawarehousing)
  • Elasticache

Non-Relational Database Structure

  • Database
    • Collection (table)
      • Document (row)
        • Key/Value Pairs (fields)

Data Warehousing

  • Used for Business Intelligence (Cognos, Jaspersoft, etc.)
  • OLTP vs. OLAP
    • OLTP (Online Transaction Processing)
      • Order number 2120121
      • Pulls up a row of data (name, date, address, status)
    • OLAP (Online Analytics Processing, used for Datawarehousing)
      • Pulls in a large number of records
      • Uses a different type of architecture for database and infrastructure
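The difference between the two query shapes is easy to see with Python's built-in sqlite3 module. The `orders` table and its rows are made up for this example (only the order number comes from the notes), and SQLite here stands in for both kinds of database; a real warehouse such as Redshift would use columnar storage for the second query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_number INTEGER PRIMARY KEY, name TEXT, region TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (2120121, "Alice", "EU", 40.0),
        (2120122, "Bob", "US", 25.0),
        (2120123, "Carol", "EU", 60.0),
    ],
)

# OLTP-style: look up one order by its key and pull back a single row.
row = conn.execute(
    "SELECT name, region, total FROM orders WHERE order_number = ?", (2120121,)
).fetchone()

# OLAP-style: read many records and aggregate them for a report.
totals = conn.execute(
    "SELECT region, SUM(total) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(row)     # ('Alice', 'EU', 40.0)
print(totals)  # [('EU', 100.0), ('US', 25.0)]
```

The first query touches one row through the primary key; the second scans the whole table, which is the kind of load you would move off the transactional database.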

Elasticache

  • Web service that makes it easy to deploy, operate, and scale in-memory cache in the cloud
  • Types
    • Memcached
    • Redis

Summary Database Types

  • RDS (OLTP)
    • SQL Server
    • MySQL
    • PostgreSQL
    • Oracle
    • Aurora
    • MariaDB
  • DynamoDB (NoSQL)
  • RedShift (OLAP)
  • Elasticache (in-memory)
    • Memcached
    • Redis

Multi-AZ

  • Used for DR
  • Not used for performance gains

Read Replicas

  • Used for scaling, performance gains
  • You can have up to 5 Read Replicas
  • You can have replicas of replicas (higher latency)
  • Can be in a different region

Aurora scaling

  • 2 copies of data in each AZ, 3 AZs minimum (total of 6 copies)
  • Designed to handle losses transparently
  • Self-healing storage

Aurora Replicas

  • Up to 15 Replicas

MySQL Replicas

  • Up to 5 Replicas

DynamoDB vs RDS

  • DynamoDB offers “push button” scaling
  • RDS requires bigger instance size or to add Read Replica

DynamoDB

  • stored on SSD storage
  • spread across 3 geographically distinct data centers
  • Types
    • eventually consistent reads (default)
    • strongly consistent reads

Redshift Configuration

  • Single Node (160GB)
  • Multi-Node
    • Leader Node (manages client connections)
    • Compute Node (stores data, performs queries, up to 128 nodes)

Elasticache

  • Memcached
    • Multi-AZ NOT available
  • Redis
    • Multi-AZ available

Two types of backup:

  • Automated -> retention period: between 1 and 35 days
  • Database snapshots -> stored in S3. Not deleted when the RDS instance is deleted

Encryption at rest: KMS. Can’t enable encryption on an existing DB. You must create a copy and enable encryption on the restored copy.

SOA record stores:

  • name of server that supplied data for the zone
  • admin of the zone
  • current version of the data file
  • number of seconds a secondary name server should wait before checking for updates
  • number of seconds a secondary name server should wait before retrying a failed zone transfer
  • maximum number of seconds a secondary name server can use data before it must refresh or expire
  • default number of seconds for the time-to-live (TTL) file on resource records

NS Records (Name Server Record)

  • used by top level domain server to direct traffic to the Content DNS server which contains authoritative DNS records

A Records (Address Record)

  • used to translate domain name to IP address

TTL Record (Time-To-Live Record)

  • The length of time that a record is cached on either the resolving server or the user’s local PC

CName Record (Canonical Name Record)

  • can be used to resolve one domain name to another

Alias Record

  • works like CName record in that you can map one DNS name to another
  • CName can’t be used for naked domain names, can’t have CName for violetfamily.com, it must be either A Record or Alias
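A toy resolver makes the relationship between A and CNAME records concrete. The zone contents, the `resolve` function and the hop limit are all hypothetical; real resolvers also deal with TTLs, caching and many more record types.

```python
from typing import Dict, Tuple

Record = Tuple[str, str]  # (record type, value)

# Hypothetical zone data for illustration (203.0.113.0/24 is a documentation range).
ZONE: Dict[str, Record] = {
    "example.com": ("A", "203.0.113.10"),
    "www.example.com": ("CNAME", "example.com"),
    "blog.example.com": ("CNAME", "www.example.com"),
}


def resolve(name: str, zone: Dict[str, Record], max_hops: int = 8) -> str:
    """Follow CNAME records until an A record yields an IP address."""
    for _ in range(max_hops):
        record_type, value = zone[name]
        if record_type == "A":
            return value  # A record: the domain name maps to an IP address
        name = value      # CNAME: one domain name resolves to another
    raise RuntimeError("too many CNAME hops")
```

Note that the naked domain `example.com` holds the A record here; as the last bullet says, it could not be a CNAME itself.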

EBS Vs Instance Store

  • All AMI’s are categorized as either backed by Amazon EBS or backed by instance Store
  • EBS Volumes: — The root device for an instance launched from the AMI is an Amazon EBS volume created from an Amazon EBS snapshot
  • Instance Store Volumes: — The root device for an instance launched from the AMI is an instance store volume created from a template stored in Amazon S3 — Sometimes called Ephemeral Storage — If host fails, you lose your data
  • Only EBS backed instances can be stopped
  • Both instances types can be rebooted
  • By default, ROOT volumes will be deleted on termination, but with EBS volumes you can tell AWS to keep the root device volume

AutoScaling

Launch configuration on Autoscaling group -> Choose AMI. You can’t change the AMI ID afterwards; it’s chosen on creation.

Grace period: time that an instance takes to warm up. Health checks start after this period.

You can find load logs related to autoscaling in

  • Cloudwatch (metrics)
  • Access logs
  • Request tracing
  • Cloud trail logs

How to register a LB group?

  • Instance Id
  • IP Address of the instance

3 ways to scale the servers

  • Manual Scaling
  • Dynamic scaling
    • Target tracking scaling: you select a metric and set a target value, and EC2 Auto Scaling sets the CloudWatch alarms to trigger the scaling based on the metric that you set (or as close as possible)
    • Step scaling: allows you to “step up” the number of servers (e.g., add 2, add another 2, add another 2, etc.), depending on the size of the alarm breach
    • Simple scaling: increases the current capacity of the group based on a single scaling adjustment. If you can, use step scaling even if you have a single metric
  • Scheduled scaling
    • You can predict the load changes and how long you need it to run (e.g., add 2 more servers between 9am and 12pm from Monday to Friday)
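Step scaling can be sketched as a lookup from the size of the alarm breach to a capacity adjustment. The step boundaries and amounts below are made-up values for illustration, and the function names are hypothetical; in AWS the steps are defined in the scaling policy attached to the group.

```python
from typing import List, Tuple

# (how far above the threshold the metric is, instances to add): hypothetical steps
STEPS: List[Tuple[float, int]] = [(0, 2), (10, 4), (20, 6)]


def step_adjustment(metric: float, threshold: float,
                    steps: List[Tuple[float, int]] = STEPS) -> int:
    """Return how many instances to add for the current size of the breach."""
    breach = metric - threshold
    if breach < 0:
        return 0  # alarm not breached, nothing to add
    adjustment = 0
    for lower_bound, instances_to_add in steps:
        if breach >= lower_bound:
            adjustment = instances_to_add  # the highest matching step wins
    return adjustment


def desired_capacity(current: int, metric: float, threshold: float, max_size: int) -> int:
    """Apply the step adjustment without exceeding the group's maximum size."""
    return min(max_size, current + step_adjustment(metric, threshold))
```

With a 70% CPU threshold, a reading of 75% adds 2 instances and 95% adds 6, capped at the group's maximum; simple scaling would apply one fixed adjustment regardless of how large the breach is.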

Volumes & Snapshots

  • Volumes exist on EBS — Virtual Hard Disk
  • Snapshots exist on S3
  • Snapshots are point in time copies of Volumes
  • Snapshots are incremental - only blocks that have changed since your last snapshot are saved
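The incremental behavior can be modeled with dictionaries of block number to contents. This is a simplified sketch with hypothetical function names; it ignores deleted blocks and says nothing about how EBS stores snapshots internally.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block number -> block contents


def incremental_snapshot(volume: Blocks, previous_state: Blocks) -> Blocks:
    """Save only the blocks that differ from what earlier snapshots captured."""
    return {
        number: data
        for number, data in volume.items()
        if previous_state.get(number) != data
    }


def restore(snapshots: List[Blocks]) -> Blocks:
    """Rebuild the volume by applying the snapshots from oldest to newest."""
    state: Blocks = {}
    for snapshot in snapshots:
        state.update(snapshot)
    return state
```

If a two-block volume has one block changed and one block added after the first snapshot, the second snapshot holds just those two blocks, yet restoring from both snapshots reproduces the full three-block volume.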

Snapshots of Root Device Volumes

  • Can create AMI’s from Volumes and Snapshots
  • Can change volume sizes on the fly, including size and storage type. If you change a Volume on the fly, you have to wait 6 hours to change it again. Can’t change volume type of Magnetic Std HD
  • Volumes MUST BE in the same AZ as the EC2 instance
  • If you need to restore/move a EBS Volume to another AZ, you need to create a snapshot of the volume, and create a new volume based on that snapshot, in the other AZ

Encryption

  • To encrypt root volume, you need to create an AMI image of your boot disk first, OR use a third party software to encrypt

Volumes Vs Snapshots - Security

  • Snaps of encrypted volumes are encrypted automatically
  • Volumes restored from encrypted snaps are encrypted automatically
  • You can share snapshots only if they are unencrypted — Snaps can be shared with other AWS accounts or made public

Default option is to delete volume when instance is terminated. Can be turned off in EC2 settings

EBS Volumes only scale up; can’t shrink in size

Elastic File System (EFS) - file storage service for EC2 instances.

  • Supports NFSv4
  • only pay for storage used
  • can scale up to petabytes
  • supports thousands of concurrent NFS connections
  • multiple EC2 can point to the same EFS
  • data is stored across multiple AZ’s within a region.
  • read after write consistency
  • can restrict permission to file level or directory level
  • can’t mount an EFS file system in multiple VPCs; only one at a time
  • uses port 2049 (NFS)
  • file system and VPC must be in the same REGION
  • Two performance modes:
    • General Purpose: low latency
    • Max I/O: higher latency, but useful for big data

Application Load Balancers

  • best suited for load balancing http and https traffic
  • operate at layer 7
  • application-aware

Network Load Balancers

  • best suited for load balancing TCP traffic
  • operate at layer 4
  • can handle millions of requests per second with low latency

Classic Load Balancers

  • legacy Elastic Load Balancer
  • load balance HTTP/https
  • operates at layer 7
  • can use strict layer 4 load balancing
  • if application stops responding, ELB responds with 504
  • X-Forwarded-For header can pass on the user’s public IP address

Is the load balancer not answering?

  • Internet-facing load balancer is attached to a PRIVATE SUBNET (it should be in a public one)
  • Security group or network ACL does not allow the traffic

Placement Groups:

  • Only certain types of instances can be launched in placement group (compute, GPU, Memory, Storage)
  • AWS recommends homogeneous instances
  • Can’t merge placement Groups
  • Can’t move existing instance into a placement group
  • Can create AMI from instance and launch that into placement group

Clustered Placement Group:

  • Grouping of instances within a single AZ. Recommended for applications that need low network latency, high network throughput, or both
  • Can’t span multiple AZs

Spread Placement Group:

  • Group of instances that are each placed on distinct underlying hardware. Recommended for applications that have small number of critical instances that should be kept separate from each other
  • Can span multiple AZs

Exam Notes

  • Security Group updates are applied immediately
  • Security Groups are stateful (return traffic for an allowed inbound request is automatically allowed back out, regardless of outbound rules)
  • All inbound traffic is blocked by default
  • All outbound traffic is allowed
  • Any number of EC2 instances within a security Group
  • You can have multiple security groups attached to an instance
  • Security Groups are STATEFUL and Network Access Control Lists are STATELESS
  • Cannot block specific IP addresses using Security Groups
  • You can specify allow rules but not deny rules

EC2 - web service that provides resizable compute capacity in the cloud

On Demand

  • for users that want low cost and flexibility
  • applications with short term, spiky, unpredictable workloads
  • initial testing on EC2

Reserved Instances

  • apps with steady or predictable usage
  • applications that require reserved capacity
  • users can make up-front payments to reduce total cost
  • Standard Reserved Instance — up to 75% off on-demand cost
  • Convertible Reserved Instance — up to 54% off on-demand cost — capability to change attributes of Reserved Instance as long as exchange results in creation of Reserved Instances of equal or greater value
  • Scheduled Reserved Instance — available to launch within time window you reserved — allows you to match capacity reservation to a predictable, recurring schedule

Spot Instances

  • applications with flexible start and end times
  • applications that are only feasible at very low compute prices
  • users with urgent need for large amounts of additional computing capacity

Dedicated Hosts

  • useful for regulatory requirements that may not support multi-tenant virtualization
  • useful for licensing that doesn’t support cloud deployments
  • can be purchased on-demand
  • can be purchased as reservation for up to 70% off on-demand price

Instance Types

  • F - FPGA (Field Programmable Gate Array)
  • I - IOPS
  • G - Graphics
  • H - High Disk Throughput
  • T - cheap general purpose (think T3 micro)
  • D - Density
  • R - RAM
  • M - Main choice for general purpose apps
  • C - Compute
  • P - Graphics (think Pics)
  • X - Extreme Memory

EBS - Elastic Block Storage

  • Attach block storage to EC2 instances
  • placed in a specific AZ, automatically replicated to protect you from single-component failure
  • if windows/linux installed on disk, it’s called the “root device volume”
  • Can’t mount 1 EBS volume on multiple EC2 instances. Use EFS instead

EBS Volume Types

  • General Purpose SSD (GP2)
    • General purpose, price/performance balance
    • 3 IOPS/GB baseline, up to 10,000 IOPS (reached by volumes of 3,334GB and above); smaller volumes can burst up to 3,000 IOPS for extended periods
    • Less than 500 MiB/s
  • Provisioned IOPS SSD (IO1)
    • Designed for IO intensive applications or NoSQL databases
    • Used when needing more than 10,000 IOPS
    • Can provision up to 20,000 IOPS/Volume
    • More than 500 MiB/s
  • Throughput Optimized HDD (ST1)
    • Big Data
    • Data warehouses
    • Log processing
    • Cannot be boot volume
  • Cold HDD (SC1)
    • Lowest cost storage, infrequently accessed
    • Typically a file server
  • Magnetic (Standard)
    • Lowest cost per gigabyte of all volume types that are bootable
    • Start dev here and move up when you’re ready
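
A quick worked example of the GP2 numbers above, as a Python sketch. The 3 IOPS/GB ratio and the 10,000 IOPS ceiling come from these notes (AWS has changed limits over time, so treat them as assumptions); the 100 IOPS floor for very small volumes is the documented GP2 minimum. The function name is made up.

```python
GP2_IOPS_PER_GB = 3       # ratio from the notes above
GP2_MIN_IOPS = 100        # floor applied to very small volumes
GP2_MAX_IOPS = 10_000     # ceiling as of these notes; assumption, may have changed


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a GP2 volume of the given size."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GB * size_gb, GP2_MAX_IOPS))
```

So a 100GB volume gets a 300 IOPS baseline, and anything from 3,334GB upwards is capped at the ceiling, which is why you move to IO1 when you need more.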

Instance Metadata

  • http://169.254.169.254/latest/meta-data/

Status Checks:

  • System Status Checks: underlying layer (TCP, etc., to see if the instance receives network packets)
  • Instance Status Checks: software and network

Termination protection is OFF by default

ECS (Elastic Container Service)

  • ECS: Elastic Container Service
  • ECR: Elastic Container Registry
  • Task definition: blueprint
  • Service: Launches and maintains copies of a task definition
  • Cluster: Where tasks run. A logical grouping of container instances
  • Task: instance of a task definition

ECS Cluster

  • Container instances: the EC2 instances in the cluster that tasks run on
  • Service: places and maintains tasks across the container instances

10,000-foot Overview

Know EC2 Pricing Models

  • On Demand — Pay by the second or hour
  • Reserved — Reserve capacity, contracts are from 12-36 months
  • Spot — Set a bid price and if spot price meets your bid it will be provisioned — Instances terminated when spot price goes out of range — won’t be charged if AWS terminates instance, but you will be charged if you terminate it
  • Dedicated Hosts — Used when licensing or multi-tenant is an issue

Know EC2 Instance Types

  • (FIGHT DR MCPX)

Know EBS

  • Storage Types
    • SSD, General Purpose GP2 (up to 10,000 IOPS, less than 500 MiB/sec)
    • SSD, Provisioned IOPS IO1 (more than 10,000 IOPS, more than 500 MiB/sec)
    • HDD, Throughput Optimized ST1 (frequently accessed workloads)
    • HDD, Magnetic Standard (cheap, infrequently accessed storage)
  • Cannot mount EBS Volume to multiple EC2 instances; use EFS instead
  • Termination Protection is turned off by default, you must turn it on
  • On EBS-backed instance, default action is for the root EBS volume to be deleted when the instance is terminated
  • EBS Root volumes of your DEFAULT AMI’s cannot be encrypted (but third party tools can be used to encrypt)
  • EBS Volumes can also be copied and then encrypted at that time
  • Additional volumes can be encrypted

Know Volumes Vs Snapshots

  • Volumes exist on EBS, virtual hard disk in the cloud
  • Snapshots exist on S3
  • You can take a snapshot of a volume, the snapshot will be stored on S3
  • Snapshots are point-in-time copies of volumes
  • Snapshots are incremental
  • First snapshot takes a while
  • Security — Snapshots of encrypted volumes are encrypted automatically — Volumes restored from encrypted snapshots are encrypted automatically — Snapshots can be shared, but only if they are not encrypted
  • Snapshots of ROOT Device Volumes — Stop instances before taking snapshot of ROOT volume
  • EBS Vs. Instance Store (Ephemeral Storage) — Instance Store volumes cannot be stopped — If underlying host in Instance Store fails, you lose your data — EBS can be stopped — Both can be rebooted, you won’t lose your data — Both will be deleted on termination, but EBS offers option to keep
  • Snapshotting RAID Array: stop all writes and flush caches first, using one of these options: freeze the file system, unmount the RAID array, or shut down the EC2 instance

Know Amazon Machine Images (AMI)

  • Regional, but can be copied to other regions

Know CloudWatch (monitoring)

  • Standard monitoring (5 minutes)
  • Detailed monitoring (1 minute)
  • CloudWatch is for performance monitoring, unlike CloudTrail, which is for auditing AWS
  • Dashboards
  • Alarms
  • Events
  • Logs

Know Roles

  • More secure than storing access key and secret access keys on instances
  • Easier to Manage
  • Can be assigned to EC2 instance AFTER provisioning
  • Universal: roles are not tied to a region

Know Instance Meta-data

  • Used to get information about an instance
  • curl http://169.254.169.254/latest/meta-data/
  • curl http://169.254.169.254/latest/user-data/
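
The same lookups from Python, as a rough sketch using only the standard library. The helper names are mine. `fetch` only works from inside an EC2 instance, and instances that require session tokens (IMDSv2) need an extra token request that is not shown here.

```python
from urllib.request import urlopen

METADATA_BASE = "http://169.254.169.254/latest/"


def metadata_url(path: str = "", category: str = "meta-data") -> str:
    """Build the URL for a metadata (or user-data) lookup."""
    return f"{METADATA_BASE}{category}/{path.lstrip('/')}"


def fetch(path: str = "", category: str = "meta-data", timeout: float = 2.0) -> str:
    """Fetch a value from the instance metadata service (EC2 only)."""
    with urlopen(metadata_url(path, category), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("instance-id")` would return the instance ID when run on an instance, and `fetch(category="user-data")` the bootstrap script.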

Know EFS Features

  • Supports NFSv4.1 protocol
  • Only pay for storage used
  • Can scale to petabytes
  • Can support thousands of concurrent NFS connections
  • Data stored across multiple AZ’s
  • Read After Write Consistency

Know Lambda

  • Event-driven compute service
  • Compute service, run code in response to requests

Know Placement Groups (assume clustered is implied, if not mentioned)

  • Clustered Placement Groups — Always in one AZ, used for Big Data (low latency, high throughput)
  • Spread Placement Groups — Important EC2 instance on separate hardware

KNOW Elastic Container Service (ECS)

S3 Exam Tips

  • S3 is object-based
  • Files can be 0B to 5TB
  • Unlimited storage
  • Files are stored in Buckets
  • Universal namespace, names must be unique
  • Read after Write consistency for PUTS of new objects — Immediately able to read object
  • Eventual Consistency for overwrite PUTS and DELETES (take time to propagate) — If you update object and then try to read you may get the old object
  • Writing to S3 returns HTTP200 for successful write
  • Loading files is faster when multipart upload is enabled

Route53 exam tips

  • You can only resolve an ELB by going to its DNS name
  • ELB never has a predefined IPv4 address, only a DNS name
  • Understand difference between Alias Record and CName Record
  • Given the choice, always choose Alias Record over a CName Record
  • Understand routing policies and their use cases

VPC exam tips

  • Think of VPC as logical datacenter in AWS
  • Consists of IGW’s (or virtual private gateways), route tables, NACL’s, Subnets, Security Groups
  • 1 subnet = 1 AZ
  • Security Groups are stateful; NACL’s are Stateless
  • NO TRANSITIVE PEERING
  • If you need to access resources from another AWS account, you need to perform a VPC peering between both accounts.

Load balancer tips

  • 3 types of Load Balancers — Application Load Balancers (layer 7) — Network Load Balancers (layer 4) — Classic Load Balancers (layer 7 and layer 4)
  • 504 means the gateway has timed out. This means the application is not responding within the idle timeout period. Troubleshoot: is it the web server or the database server?
  • If you need IPv4 address of end user, look for the X-Forwarded-For header
  • Instances monitored by ELB are reported as InService or OutOfService
  • Health checks test an instance by talking to it
  • ELBs have their own DNS name
  • Read FAQ

Exam Tips

  • ELB do not have pre-defined IPv4 addresses, must resolve using DNS name
  • Understand the difference between Alias Record and CName Record
  • Given the choice, always choose an Alias Record over CName

Roles are not tied to a specific region (neither are users). You can apply roles to running instances. If you apply a role to an instance, there’s no need to configure the Access Keys / Secret keys to get permissions to use AWS Services (e.g. to access a private S3 bucket), which is MORE SECURE.

Identity Access Management (IAM) - Allows you to manage users and their level of access to the AWS Console.
  • Centralized control of AWS account
  • Shared access to AWS account
  • Granular permissions
  • Identity Federation (AD, FB, LinkedIn, etc.)
  • Multifactor Authentication
  • Provide temporary access for users/devices/services
  • Allows you to setup password rotation policy
  • Integrates with many services
  • Supports PCI DSS Compliance

Critical Terms

  • Users - End users
  • Groups - Collection of users under one set of permissions (Admins, HR, etc.)
  • Roles - Create roles and assign them to AWS resources (e.g. giving an EC2 instance a role for writing to S3)
  • Policies - Document that defines one or more permissions. Apply policies to users, groups, and roles
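
To show what a policy document looks like, here is a small Python sketch that builds one as JSON. The bucket name is hypothetical and the actions are just an example of read/write access to objects; attaching the policy to a user, group or role is a separate step not shown here.

```python
import json

BUCKET = "example-study-notes-bucket"  # hypothetical bucket name

# A policy is a JSON document with one or more statements.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Attached to a role on an EC2 instance, a policy like this would let the instance read and write objects in that one bucket without storing any keys on it.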

IAM does not use region concept.

You can create cross-account roles (ie, you hire a company to perform audit, the user that you provide to the auditor can be cross-account)

Never use your root account for daily use. ALWAYS create new users

Remember: The Add user confirmation window (where the access key ID and secret access key are shown) is only displayed ONCE. If you lose the keys, you will have to regenerate them.

Kinesis

  • Kinesis Stream
  • Kinesis Firehose
  • Kinesis Analytics

Kinesis Streams

  • data stored for 24 hours by default
  • data stored in shards
  • data consumers (ec2 instances) turn shards into data to analyze
  • Per shard: 5 transactions per second for reads, up to a maximum total read rate of 2 MB/second; up to 1,000 records per second for writes (maximum 1 MB/second)

Kinesis Firehose

  • Automated
  • no dealing with shards

Kinesis Analytics

  • Way of analyzing data in Kinesis using SQL-like queries

Exam Tips
  • Lambda scales out (not up) automatically
  • Lambda functions are independent
  • Lambda is serverless
  • Know which AWS services are serverless
  • Lambda functions can trigger other lambda functions
  • Lambda is event driven (runs code in response to events)
  • Architecture can get complicated, AWS X-ray helps to debug
  • Lambda can do things globally
  • Know your triggers: API Gateway, Alexa Skills Kit, IoT, S3, DynamoDB, CloudWatch, CloudFront, etc.
  • Languages supported: JavaScript (Node.js), Java, Python, C#, C++
  • Pricing: first 1 million requests per month are free, 0.20 USD per million after that
  • Duration: can’t run more than 15 mins (was recently raised, it was 5 mins before)
  • The more memory and the longer the duration the function needs, the higher the cost

Load Balancers

Types of Load Balancers:

  • Application load balancer (http / https level)
  • Classic load balancer (HTTP/HTTPS and TCP)
  • Network load balancer (TCP/UDP)

Can be external (accessible via internet) or internal (balancing backend instances behind a subnet, for example)

Performs health checks

  • Unhealthy threshold: number of consecutive failed checks before an instance is considered unhealthy
  • Healthy threshold: number of consecutive OK checks before an instance is considered healthy
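
A small Python sketch of how the two thresholds interact. The class and its starting state are assumptions made for illustration (it starts InService here); it is not how the load balancer is implemented.

```python
class HealthTracker:
    """Track one instance's state from a stream of health check results."""

    def __init__(self, healthy_threshold: int, unhealthy_threshold: int) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.state = "InService"  # assumed starting state for this sketch
        self._streak = 0          # consecutive results that contradict the state

    def record(self, check_passed: bool) -> str:
        """Record one check result and return the resulting state."""
        in_service = self.state == "InService"
        if check_passed == in_service:
            self._streak = 0  # result agrees with the state, reset the streak
            return self.state
        self._streak += 1
        needed = self.unhealthy_threshold if in_service else self.healthy_threshold
        if self._streak >= needed:
            self.state = "OutOfService" if in_service else "InService"
            self._streak = 0
        return self.state
```

With an unhealthy threshold of 3, a single failed check changes nothing; only three failures in a row take the instance out of service, and it then needs the healthy threshold of consecutive passes to come back.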

Load balancers only have HOSTNAMES, not IP addresses. This is because if an AZ goes down, the load balancer can move to another without problems

Routing Policies:

  • Simple – default routing policy when you create a new record set. Most commonly used when you have a single resource that performs a given function for your domain (e.g. one web server that serves content for violetfamily.com). The record can point to an ELB that then balances the load between N servers, but it’s still pointing to a single item
  • Weighted – allows you to split your traffic based on different weights assigned
  • Latency – allows you to route your traffic based on the lowest latency for your end user (region with fastest response time)
  • Failover — used when you want to create an active/passive setup (e.g. use primary site in US-EAST-1 and secondary DR Site in US-WEST-1).
  • Geolocation – allows you to route your traffic based on the geographic location of your end users
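
To get a feel for the weighted policy, here is a toy Python simulation. The record names and weights are hypothetical, and this only mimics the idea (each record is chosen with probability weight / sum of weights); it is not Route53's actual resolver logic.

```python
import random
from typing import Dict, Optional


def pick_weighted(weights: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick one record name with probability proportional to its weight."""
    rng = rng or random.Random()
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Hypothetical split: roughly 75% of queries to blue, 25% to green.
    records = {"blue.example.com": 3, "green.example.com": 1}
    sample = [pick_weighted(records) for _ in range(1000)]
    print(sample.count("blue.example.com") / len(sample))
```

A record with weight 0 receives no traffic, which is a handy way to take an endpoint out of rotation without deleting the record.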

Aliases can point to:

  • ELB
  • cloudfront
  • S3 buckets

CNAME: Charged $$$

ALIASES: Free

S3 - Simple Storage Service, provides developers and IT teams with secure, durable, highly-scalable, flat object storage.

  • Object-based storage: — Key — Value — Version ID — Metadata — Subresources: — — Access Control Lists — — Torrent
  • Unlimited storage
  • Files can be 0 Bytes to 5 Terabytes
  • Files stored in buckets (basically just folders/logical separation)
  • Bucket names have to be unique globally
  • Successful upload will receive HTTP200 code
  • Read after Write consistency for PUTS of new objects
  • Eventual Consistency for overwrite PUTS and DELETES (can take some time)
  • Built for 99.99% availability
  • Amazon guarantees 99.999999999% durability (unlikely to ever lose a file) (11 9s)
  • Tiered storage available
  • Lifecycle management
  • Versioning
  • Encryption
  • Secure your data using Access Control Lists and Bucket Policies
  • Bucket tags are not inherited to files

S3 Storage Tiers

  • S3 Standard: 99.99% available, 99.999999999% durable
  • S3 IA (Infrequently Accessed): Data that is accessed less often, but requires rapid access. Lower fee than S3 but charged retrieval fee.
  • S3 One Zone IA: Lower-cost option for infrequently accessed data that doesn’t require multiple AZ resilience
  • Glacier: Super cheap, used for archival only. Retrieval takes 3-5 hours

S3 Charges

  • Storage
  • Requests
  • Storage Management Pricing (tags)
  • Data Transfer Pricing (cross-region replication)
  • Transfer Acceleration - fast transfers over long distances using CloudFront
  • Can configure a bucket as Requester Pays if you use multiple AWS accounts and multiple buckets that transfer info between them

S3 Versioning

  • Stores all versions of an object (even if you delete an object)
  • Once enabled, versioning cannot be disabled, only suspended
  • Integrates with Lifecycle rules
  • Versioning’s MFA Delete capability can be used to provide an additional layer of security

S3 Cross Region Replication

  • Versioning must be enabled on both the source and destination buckets
  • Source and destination regions must be different
  • Files in an existing bucket are not replicated automatically, all new and updated files will be replicated automatically
  • You cannot replicate multiple buckets or daisy chain replication
  • Delete markers are replicated
  • Deleted individual versions or markers will not be replicated

S3 Lifecycle Management

  • Can be used with versioning
  • Can be applied to current and previous versions
  • Transition to IA after 30 days is possible, if the file is larger than 128 KB
  • Archive to Glacier after 30 days is possible
  • Can permanently delete after N days
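
A sketch of what such a lifecycle rule can look like, written as a Python dict in the shape the S3 lifecycle API takes (as far as I know it). The rule ID, prefix and day counts are hypothetical, and `check_rule` is just a sanity check I made up, not an AWS function.

```python
from typing import Any, Dict, List

lifecycle_configuration: Dict[str, Any] = {
    "Rules": [
        {
            "ID": "archive-then-expire",      # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},    # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},      # permanently delete after N days
        }
    ]
}


def check_rule(rule: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one lifecycle rule (empty if none)."""
    problems: List[str] = []
    days = [t["Days"] for t in rule.get("Transitions", [])]
    if days != sorted(days):
        problems.append("transitions are not in ascending order of days")
    expiry = rule.get("Expiration", {}).get("Days")
    if expiry is not None and days and expiry <= max(days):
        problems.append("expiration happens before the last transition")
    return problems
```

The example moves objects to IA after 30 days, archives them to Glacier after 60 and deletes them after a year.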

S3 Security & Encryption

  • All newly created buckets are private by default
  • You can setup access control for buckets using bucket policies and ACL
  • Buckets can be configured to create access logs which log requests made to the bucket.
  • Methods of Encryption
    • In Transit: SSL/TLS
    • At Rest, Server Side:
      • S3 Managed Keys (SSE-S3)
      • AWS Key Management Service, Managed Keys (SSE-KMS)
      • Server Side Encryption with Customer Provided Keys (SSE-C)
    • At Rest, Client Side Encryption

S3 Transfer Acceleration

  • Uses CloudFront Edge Network to accelerate your uploads to S3

S3 Static Website Hosting

  • [bucketname].s3-website-[region].amazonaws.com
  • CORS: you can enable cors on the bucket to allow other sites to get the files from the bucket
  • If you want to host a static website in S3, create a bucket named after the URL (e.g. if you want to host something.com, create a bucket with that name) and create an alias record pointing to that bucket.
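
As a tiny illustration of the endpoint pattern above, a Python helper (the function name is mine). Note that some regions use a dot instead of a dash after `s3-website`, so treat the exact format as an assumption and check the endpoint shown in the bucket's properties.

```python
def website_endpoint(bucket: str, region: str) -> str:
    """Build the static website endpoint using the pattern from the notes."""
    return f"{bucket}.s3-website-{region}.amazonaws.com"


if __name__ == "__main__":
    # The bucket is named after the domain it will serve.
    print(website_endpoint("something.com", "us-east-1"))
```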

Dualstack: support for IPV4 and IPV6

Storage Tiers

  • S3 Standard: 99.99% available, 99.999999999% durable, designed to sustain loss of 2 facilities concurrently
  • S3 IA (Infrequently Accessed): accessed less frequently, requires rapid access when needed. Lower fee than S3 but charged for retrieval.
  • S3 One Zone IA: lower-cost option for infrequently accessed data that doesn’t require multiple AZ resiliency
  • Glacier: cheap, used for archival only. 3-5 hour retrieval time

Core Fundamentals of S3: — Key — Value — Version ID — Metadata — Subresources —- Access Control Lists —- Torrent file

  • Versioning
    • Object-based storage (files only, not OS or db)
    • All versions of an object are stored, including all writes and deletes
    • Once enabled, versioning cannot be disabled, only suspended
    • Integrates with Lifecycle rules
    • Versioning’s MFA Delete capability can be used to provide an additional layer of security
    • Cross Region Replication requires versioning on source and destination buckets

  • Lifecycle Management — Can be used in conjunction with versioning — Can be applied to current/previous versions — Actions that can be done: — — Transition to Standard S3 IA after 30 days — — Archive to Glacier after 30 days — — Permanently Delete

CloudFront — Edge Location - location where content will be cached — Origin - Origin of all files that CDN will distribute — Distribution - name given to CDN which consists of collection of Edge Locations — — Web Distribution - typically used for websites — — RTMP Distributions - media streaming/flash files — Edge locations are not just read only, you can write to them too — Objects are cached for life of TTL (default 24 hours)

Securing Buckets — Newly created buckets are private by default — You can setup access control using: — — Bucket Policies — — Access Control Lists — Buckets can be configured to create access logs

Encryption

  • In Transit: SSL/TLS
  • At Rest, Server Side Encryption:
    • S3 Managed Keys (SSE-S3)
    • AWS Key Management Service, Managed Keys (SSE-KMS)
    • Server Side Encryption with Customer Provided Keys (SSE-C)

Storage Gateways

  • File Gateway - flat files, stored directly on S3
  • Volume Gateway
    • Stored Volumes - Entire dataset stored on site, async backed up to S3. Stores data as Amazon EBS snapshots in S3
    • Cached Volumes - Entire dataset stored in S3, most recent data stored onsite
  • Gateway Virtual Tape Library (VTL) - Used for backup and uses popular backup applications like NetBackup, Backup Exec, Veeam, etc.
  • Network requirements: Port 443, 80 (activation only), 3260 (iSCSI targets), UDP 53 (DNS)

Snowball — Import to S3 or Export from S3 — Snowball — — 80TB, no compute — Snowball Edge — — 100TB, has compute — Snowmobile — — 100PB, semi-truck, only available in USA

S3 Transfer Acceleration — Speed up transfers to S3 using S3 transfer acceleration. Costs extra, great impact for people in distant locations.

S3 Static websites

  • Serverless
  • Cheap, scales automatically
  • Static only, no compute

Security Token Service (STS)

Grants users limited and temporary access to AWS services

  • Federation (typically Active Directory) – No need to create IAM accounts. Single sign on the AWS console. Combine users from 1 domain with users from another domain
  • Federation with mobile apps - FB, AMZ, Google or other OpenID providers
  • Cross Account Access: users from other AWS accounts

Identity Broker: Service that allows to take identity from A and join it (federate it) to B. Impersonate.

Identity Store: FB, Google, Active Directory, etc, all store the identity.

Identity: the user itself.

Identity broker needs to be programmed. The temp token returned is valid for between 1 and 36 hours.

STS returns 4 values if successful:

  1. access key
  2. secret access key
  3. token
  4. duration

Steps:

  1. Develop an identity broker to communicate with LDAP & AWS
  2. Identity broker (IB) ALWAYS authenticates with LDAP first, then AWS STS
  3. App gets temporary access to AWS Resources

SAML:

Security Assertion Markup Language

Web Identity Federation with mobile apps: you can auth app using things like FB, you need to code it of course

ARN:

Amazon Resource Name

AssumeRoleWithWebIdentity: You need to call this method after authenticating with FB. After that you can access the AWS resources.

Snowball - Petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Addresses challenges with large-scale data transfers including high network costs, long transfer times, and security concerns.

  • Storage only, up to 80TB

Snowball Edge - Snowball but with onboard compute functionality

  • Storage up to 100TB

Snowmobile - Snowball for petabyte/exabyte amounts of data

  • Storage up to 100PB

Exam Notes

  • Understand what Snowball is

  • Understand what Import Export is (old name for Snowball)

  • Snowball can: import to S3, export from S3

SNS

  • Notification service. Data type: JSON

  • Pub/Sub paradigm

  • PUSH mechanism. Instant

  • Push notification

  • Deliver SMS or Email, SQS, or any HTTP endpoint

  • Messages stored redundantly across multiple AZs

  • Topic: access point

  • Pay as you go – 0.50 per 1 million request – 0.75 per 100 sms – 0.06 per 100,000 HTTP requests – 2 usd per 100,000 emails

  • Can use different format for different protocols (http/s, email, email json, SQS, lambda)

  • Each message contains: – Name – Type – Value

Topic -> Subscriber 1 (http), Subscriber 2 (email), Subscriber 3 (SQS)

You can apply a filter policy to a topic subscription (e.g. only send critical errors to the managers)
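
A minimal Python sketch of the filter policy idea, limited to exact string matching. The attribute names and values are made up, and real SNS filter policies support more operators than shown here.

```python
from typing import Dict, List


def matches(filter_policy: Dict[str, List[str]], attributes: Dict[str, str]) -> bool:
    """A message is delivered only if every key in the policy has an allowed value."""
    return all(attributes.get(key) in allowed for key, allowed in filter_policy.items())


# Hypothetical subscription for the managers: only critical errors.
managers_policy = {"severity": ["critical"]}

print(matches(managers_policy, {"severity": "critical", "app": "billing"}))
print(matches(managers_policy, {"severity": "warning", "app": "billing"}))
```

A subscription without a filter policy (an empty policy in this sketch) receives every message published to the topic.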

SNS Vs SQS

  • SNS is push
  • SQS is poll

SQS

  • Message queue system. Message retained up to 14 days
  • 256K of text. Billed at 64K chunks
  • Does NOT guarantee first in, first out. If you need that, use a FIFO SQS queue
  • SQS is pull-based: consumers poll the queue for messages
  • Supports auto scaling
  • Visibility timeout window: 12 hours max, default is 30 secs
  • “At least once” – each message is delivered at least once but maybe more. Keep that in mind
  • First million SQS hits: free. 0.50 USD per million, per month, after that
  • Single request can have from 1 to 10 messages.
  • Change visibility timeout with the “ChangeMessageVisibility” method
  • Enable LONG POLLING (up to 20 secs) to wait for a message to become available. The call returns as soon as a message arrives, good for saving money/requests.
  • Has SNS integration: an SQS queue subscribes to an SNS topic. When an SNS message arrives, it is distributed to the subscribed SQS queues. 1 SNS -> N SQS

Storage Gateway - Service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s IT environment and AWS storage infrastructure.

Types of Storage Gateways

  • File Gateway (NFS)
  • Volumes Gateway (iSCSI) — Stored Volumes — Cached Volumes
  • Tape Gateway (VTL)

File Gateway - Files are stored as objects in your S3 buckets, accessed through an NFS mount point. Once files are transferred they can be managed as native S3 objects.

Volume Gateway - Presents your applications with disk volumes using the iSCSI block protocol.

  • Can be asynchronously backed up as point-in-time snapshots of your volumes
  • Stored in cloud as EBS snapshots
  • Stored Volumes: store entire copy locally, asynchronously back up to AWS. Complete copy of data kept on-site.
  • Cached Volumes: S3 is primary data storage and only the most recent data is stored locally. Don’t need large storage arrays locally.

Tape Gateway - Durable, cost-effective solution to use existing tape-based backup solution, virtual tapes are sent to and stored in S3.

Exam Study Notes

File Gateway - for flat files, stored directly on S3

Volume Gateway:

  • Stored Volumes - Entire dataset stored onsite and async backed up to S3
  • Cached Volumes - Entire dataset stored in S3 and most recent data cached onsite

Gateway Virtual Tape Library (VTL):

  • Used for backup and uses popular backup applications like NetBackup, Backup Exec, Veeam, etc.

SWF (Simple WorkFlow)

Workflow system

Actor: application that starts/initiates a workflow; could be a website or mobile app, for example

Worker: program/person that interacts with the workflow

  • Gets a task
  • Processes the received task
  • Returns the result

Decider: controls the coordination of tasks (ordering, concurrency, scheduling)

A task is assigned once and never duplicated

Domain: container where your workflow runs. Isolates a set of types, executions, and task lists from others in the same account. Format: JSON

A workflow can run for up to ONE YEAR (measured in seconds)

Difference between SQS and SWF

SWF:

  • Task-oriented API
  • A task is assigned only once (never duplicated)
  • Keeps track of tasks and events
  • Human interaction if needed (e.g. “Pick item from the storage”)

SQS:

  • Message-oriented API
  • Message might be duplicated
  • You implement tracking manually in your app

Solutions Architect - Associate (Understand VPC’s inside and out)

  • Analytics
  • Management Tools
  • Migration
  • Compute
  • Desktop & App Streaming
  • Application Integration
  • Security, Identity & Compliance
  • Networking & Content Delivery
  • Storage
  • Databases

Security Group

  • operates at the instance level
  • supports allow rules only
  • stateful
  • evaluate all rules before deciding whether to allow traffic
  • applies to an instance only

Exam Tips
  • Cannot enable flow logs for VPC’s that are peered with your VPC unless the peer VPC is in your account
  • you cannot tag a flow log
  • after a flow log is created, you cannot change its configuration
  • not all IP traffic is monitored
  • traffic from 169.254.169.254 not monitored
  • DHCP traffic not monitored
  • traffic to reserved IP address for VPC router is not monitored

Exam Tips

  • NAT is used to provide internet traffic to EC2 instances in private subnets
  • Bastion is used to securely administer EC2 in private subnets (jump box)

NAT gateways
  • Preferred by the enterprise
  • Scale automatically up to 10Gbps
  • No need to patch
  • Not associated with security groups
  • Automatically assigned a public ip address
  • Remember to update route tables and point to NAT Gateways
  • No need to disable source/destination checks
  • More secure than a NAT instance

VPC

  • Logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define.
  • Virtual Data Center in the cloud
  • Amazon provides you a default VPC in every region when you create your account
  • Can create a hardware VPN between your corporate datacenter and your VPC, making AWS an extension of your datacenter

What can be done:

  • Launch instances into subnet of choice
  • Assign custom IP address range to each subnet
  • Configure route tables between subnets
  • Create internet gateway and attach it to our VPC, one per VPC
  • Better security control over AWS resources
  • Instance security groups
  • Subnet Access Control Lists (ACLs)

Route Tables determine if a subnet is REACHABLE

Network ACLs determine if traffic CAN ENTER a subnet

Default VPC Vs. Custom VPC

  • Default VPC is user friendly, allows you to immediately deploy
  • All subnets in default VPC have a route out to the internet
  • Each EC2 instance has public and private IP address

VPC Peering

  • Allows you to connect one VPC with another via a direct network route using private IP Addresses
  • Instances behave as if they were on the same private network
  • You can peer VPC’s with other AWS accounts as well as with other VPC’s in same account
  • Peering is in a star configuration: 1 central VPC peers with 4 others; there is no transitive peering
  • Use C5 or M5 instances for VPC peering
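
The "no transitive peering" rule is easy to get wrong, so here is a toy Python check. The VPC names are hypothetical; the only point is that traffic flows between two VPCs only when a direct peering connection exists between them.

```python
from typing import FrozenSet, Set

# Hypothetical star: the hub is peered with a and b, but a and b are not peered.
PEERINGS: Set[FrozenSet[str]] = {
    frozenset({"vpc-hub", "vpc-a"}),
    frozenset({"vpc-hub", "vpc-b"}),
}


def can_route(src: str, dst: str, peerings: Set[FrozenSet[str]] = PEERINGS) -> bool:
    """True only for a direct peering; paths through a third VPC do not count."""
    return src == dst or frozenset({src, dst}) in peerings
```

Here `vpc-a` can reach `vpc-hub`, but it cannot reach `vpc-b` through the hub. For that you would need a separate peering between `vpc-a` and `vpc-b`.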

Bastion hosts

Bastion hosts are used to securely administer EC2 instances via SSH or RDP. Can also be called a jump box. More secure than opening all your servers to the world for SSH/RDP

NAT Instances

  • When creating a NAT instance, disable source/destination check on the instance

  • NAT instances must be in a public subnet

  • There must be a route out of the private subnet to the NAT instance for this to work

  • The amount of traffic that NAT instances can support depends on the instance size

  • You can create HA using:

    • Autoscaling Groups
    • Multiple subnets in different AZs
    • Script to automate failover
    • Behind Security groups

Difference between NAT Gateway and Internet Gateway

  • Both are highly available architectures
  • Both are used to give instances a route to the internet or other AWS services: an Internet Gateway for public subnets, a NAT Gateway for private subnets
  • An Internet Gateway (IGW) allows resources within your VPC to access the internet, and vice versa.
    • In order for this to happen, there needs to be a routing table entry allowing a subnet to access the IGW.
  • NAT Gateway is only from instance to internet (you can download things from the EC2, but internet can’t access the server).
    • The internet at large cannot get through your NAT to your private resources unless you explicitly allow it.
  • NAT Gateway bandwidth can only scale up to 45 Gbps. Keep in mind if bandwidth is an issue.
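
In practice the difference shows up in the route tables. Below is a small Python sketch of route lookup for a public and a private subnet. The CIDR ranges and the target names (`igw-example`, `nat-example`) are placeholders, and the lookup is a simplified version of "most specific route wins".

```python
import ipaddress
from typing import Dict

# Hypothetical route tables: destination CIDR -> target.
PUBLIC_SUBNET_ROUTES: Dict[str, str] = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
PRIVATE_SUBNET_ROUTES: Dict[str, str] = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}


def next_hop(route_table: Dict[str, str], destination_ip: str) -> str:
    """Return the target of the most specific route that matches the destination."""
    ip = ipaddress.ip_address(destination_ip)
    matching = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matching:
        raise LookupError(f"no route to {destination_ip}")
    network, target = max(matching, key=lambda m: m[0].prefixlen)
    return target
```

Traffic to the internet from the public subnet goes to the Internet Gateway, while the same traffic from the private subnet goes to the NAT Gateway, which itself sits in a public subnet.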

Network ACL’s

  • VPC automatically comes with default network ACL and by default allows all in/outbound traffic

  • You can create custom network ACL’s. By default each custom network ACL denies all in/outbound traffic until you add rules

  • Each subnet in VPC must be associated with a network ACL, uses default ACL by default

  • You can associate network ACL with multiple subnets, however subnet can only associate with one ACL at a time

  • Adding subnet to a second ACL will automatically remove it from the previous ACL

  • network ACL contains numbered list of rules that is evaluated in order, starting with lowest number

  • network ACL always have separate inbound and outbound rules

  • network ACL’s are stateless

  • VPC Interface Endpoints – API Gateway, Cloudwatch, Config, Kinesis, SNS, etc.

  • VPC Gateway Endpoints – S3 – DynamoDB. Endpoints are used so the traffic does not go out to the Internet and back in (it remains in the private network, faster and more secure)
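
Going back to network ACLs, the numbered-rule evaluation described above can be sketched in a few lines of Python. The rule numbers, addresses and the simplified single-port match are assumptions for illustration only.

```python
import ipaddress
from typing import List, NamedTuple


class Rule(NamedTuple):
    number: int   # lower numbers are evaluated first
    cidr: str     # source range the rule applies to
    port: int     # single port, to keep the sketch small
    allow: bool   # True = ALLOW, False = DENY


def nacl_allows(rules: List[Rule], source_ip: str, port: int) -> bool:
    """Evaluate rules in ascending number order; the first match decides."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and ip in ipaddress.ip_network(rule.cidr):
            return rule.allow
    return False  # nothing matched: traffic is denied


# Hypothetical inbound rules: block one address, allow SSH from everyone else.
INBOUND = [
    Rule(200, "0.0.0.0/0", 22, True),
    Rule(100, "203.0.113.7/32", 22, False),
]
```

Because rule 100 is evaluated before rule 200, the single blocked address is denied even though a broader allow rule exists. This is how you block a specific IP with a network ACL, which a security group cannot do.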