Cloud Architecture
Formal Metadata
Part Number | 4
Number of Parts | 10
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/62902 (DOI)
Transcript: English (auto-generated)
00:00
Who is a solution architect? A solution architect is a role in a technical organization that architects a technical solution
00:22
using multiple systems via research, documentation, and experimentation. And who is a cloud architect? A cloud architect is a solution architect who is focused solely on architecting technical
00:40
solutions using cloud services. A cloud architect needs to understand the following terms and factor them into their designed architecture based on business requirements. First, availability: your ability to ensure a service remains available,
01:04
also known as a highly available service or system. Then scalability: your ability to grow rapidly or unimpeded.
01:20
Then elasticity: your ability to shrink and grow to meet the demand. Then fault tolerance: your ability to prevent a failure. And disaster recovery: your ability to recover from a failure, also known as being highly durable.
01:43
A solution architect always needs to consider the following business factors. First, security. Security is the most important thing in any architecture and in any application.
02:02
So, how secure is the solution? And cost. Because we pay as we go, only for what we use, and we are billed by the second, by the minute, by the hour, or by the day, we must always know the cost of our system.
02:29
So, how much is this going to cost? Let's dive deeper into each term. First, high availability.
02:45
High availability is your ability to keep your service available by ensuring there is no single point of failure and/or ensuring a certain level of performance.
03:04
Let me show you this diagram. Here we have Route 53. After Route 53, we have an Elastic Load Balancer, and we have some compute, some instances, in
03:20
Availability Zone 1, Availability Zone 2, and Availability Zone 3. The Elastic Load Balancer is a load balancer that allows you to evenly distribute traffic to multiple servers in one or more data centers. If a data center or server becomes unavailable or unhealthy, the load balancer
03:44
will route the traffic only to the data centers that still have available servers, or only to the available servers within a data center. Running your workload across multiple availability zones ensures that if one or two availability
04:04
zones become unavailable, your service or application remains available. High scalability. High scalability is your ability to increase your capacity based on the increasing demand
04:23
of traffic, memory, and computing power. We have two types of scaling: vertical scaling and horizontal scaling. Vertical scaling, or scaling up, means upgrading to a bigger server.
04:44
For example, you have one server with two gigabytes of memory, and you upgrade this server to four gigabytes of memory. This is called vertical scaling. Horizontal scaling is called scaling out.
05:02
Here you have one server with two gigabytes of memory, and now you create three servers with two gigabytes of memory each. So you add more servers of the same size.
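The difference between the two kinds of scaling can be sketched in a few lines of Python. This is only an illustration using the numbers from the example; the `Server` class and the function names are hypothetical and do not correspond to any AWS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    """A hypothetical server described only by its memory size."""
    memory_gb: int


def scale_up(server: Server, new_memory_gb: int) -> Server:
    """Vertical scaling: swap the server for a bigger one."""
    if new_memory_gb <= server.memory_gb:
        raise ValueError("scaling up must increase memory")
    return Server(memory_gb=new_memory_gb)


def scale_out(fleet: List[Server], extra: int) -> List[Server]:
    """Horizontal scaling: add servers of the same size as the existing ones."""
    if not fleet:
        raise ValueError("need at least one server to copy")
    size = fleet[0].memory_gb
    return fleet + [Server(memory_gb=size) for _ in range(extra)]


def total_memory_gb(fleet: List[Server]) -> int:
    """Total memory available across the whole fleet."""
    return sum(server.memory_gb for server in fleet)
```

Calling `scale_up(Server(2), 4)` gives one four-gigabyte server, while `scale_out([Server(2)], 2)` gives three two-gigabyte servers.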
05:20
High elasticity. High elasticity is your ability to automatically increase or decrease your capacity based on the current demand of traffic, memory, and computing power.
05:41
To achieve this in AWS, we have Auto Scaling groups. This is an AWS feature that will automatically add or remove servers based on scaling rules
06:02
that you define based on metrics. Scaling can be scheduled or unscheduled. Unscheduled means it is based on metrics that, for example, another resource gives you.
06:20
Scaling out means adding more servers of the same size; scaling in means removing underutilized servers of the same size. Vertical scaling is generally hard for traditional architectures, so you will usually only see horizontal scaling described with elasticity.
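A metric-based scaling rule can be sketched as a small function. This is a simplified proportional rule written for illustration, not the algorithm that AWS Auto Scaling actually uses; the function name, the CPU metric, and the default target and size limits are all assumptions.

```python
import math


def desired_capacity(
    current: int,
    cpu_percent: float,
    target_percent: float = 50.0,  # assumed target utilization
    min_size: int = 1,             # assumed lower bound of the group
    max_size: int = 10,            # assumed upper bound of the group
) -> int:
    """Return how many same-size servers the group should run.

    Scales out when average CPU is above the target and scales in when
    it is below, always staying within [min_size, max_size].
    """
    if current < 1 or target_percent <= 0:
        raise ValueError("current must be >= 1 and target_percent > 0")
    wanted = math.ceil(current * cpu_percent / target_percent)
    return max(min_size, min(max_size, wanted))
```

With two servers at 90 percent CPU the rule asks for four servers (scaling out); with four servers at 10 percent CPU it asks for one (scaling in).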
06:47
Fault tolerance. In fault tolerance, we have two definitions. First, let's say what a fault domain is. A fault domain is a section of a network that is vulnerable to damage if a critical
07:05
device or system fails. The purpose of a fault domain is that if a failure occurs, it will not cascade outside the domain, limiting the possible damage.
07:23
We can have fault domains nested inside fault domains. Second definition: fault level. A fault level is a collection of fault domains.
07:41
The scope of a fault domain could be specific servers in a rack, an entire rack in a data center, an entire room in a data center, or the entire data center building.
08:01
It's up to the cloud service provider to define the boundaries of a domain. In terms of AWS, an AWS region would be a fault level, for example, the us-east-1 region. An availability zone would be a fault domain.
08:22
For example, the us-east-1a availability zone and the us-east-1b availability zone are fault domains.
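The relationship between a fault level, its fault domains, and nested fault domains can be modeled as a small tree. The sketch below is illustrative only: the class names, the `blast_radius` helper, and the rack and server names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class FaultDomain:
    """A fault domain that may contain servers and nested fault domains."""
    name: str
    servers: List[str] = field(default_factory=list)
    children: List["FaultDomain"] = field(default_factory=list)

    def all_servers(self) -> Set[str]:
        """Every server inside this domain, including nested domains."""
        found = set(self.servers)
        for child in self.children:
            found |= child.all_servers()
        return found

    def find(self, name: str) -> Optional["FaultDomain"]:
        """Look up this domain or a nested domain by name."""
        if self.name == name:
            return self
        for child in self.children:
            match = child.find(name)
            if match is not None:
                return match
        return None


@dataclass
class FaultLevel:
    """A collection of fault domains, for example a region."""
    name: str
    domains: List[FaultDomain]

    def all_servers(self) -> Set[str]:
        found: Set[str] = set()
        for domain in self.domains:
            found |= domain.all_servers()
        return found

    def blast_radius(self, failed: str) -> Set[str]:
        """Servers lost if the named domain fails; nothing outside it is affected."""
        for domain in self.domains:
            match = domain.find(failed)
            if match is not None:
                return match.all_servers()
        raise KeyError(failed)
```

For a region built from two availability zones, a failure of one zone takes out only the servers in that zone, and a failure of a rack nested inside it takes out only that rack.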
08:41
Each AWS region is designed to be completely isolated from the other AWS regions. This achieves the greatest possible fault tolerance and stability. Each availability zone is isolated, but the availability zones in a region are connected
09:02
through low-latency links. Each availability zone is designed as an independent failure zone. Failure zone is the AWS term for a fault domain.
09:20
Availability zones are physically separated within a typical metropolitan region and are located in lower-risk flood plains. They have discrete uninterruptible power supply, UPS, and on-site backup generation facilities.
09:44
Data centers located in different availability zones are designed to be supplied by independent substations to reduce the risk of an event on the power grid impacting more than one
10:03
availability zone. Availability zones are all redundantly connected to multiple tier 1 transit providers. Multiple availability zones for high availability.
10:21
If an application is partitioned across availability zones, companies are better isolated and protected from issues such as power outages, lightning strikes, tornadoes, earthquakes, and more.
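How a load balancer keeps an application available when one availability zone fails can be sketched as follows. This is a minimal in-memory simulation, not the Elastic Load Balancer itself; the `Target` and `LoadBalancer` classes, the round-robin policy, and the zone names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Target:
    """A hypothetical server registered with the load balancer."""
    name: str
    zone: str
    healthy: bool = True


class LoadBalancer:
    """Round-robin over healthy targets only."""

    def __init__(self, targets: Iterable[Target]) -> None:
        self.targets: List[Target] = list(targets)
        self._next = 0

    def fail_zone(self, zone: str) -> None:
        """Simulate an outage: every target in the zone becomes unhealthy."""
        for target in self.targets:
            if target.zone == zone:
                target.healthy = False

    def route(self) -> str:
        """Return the name of the target that receives the next request."""
        healthy = [t for t in self.targets if t.healthy]
        if not healthy:
            raise RuntimeError("no healthy targets")
        target = healthy[self._next % len(healthy)]
        self._next += 1
        return target.name
```

With one target in each of three zones, failing one zone leaves two targets that still receive all the traffic; only when every zone has failed does `route` raise an error.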
10:44
High durability. High durability is your ability to recover from a disaster and to prevent the loss of data. Solutions that recover from a disaster are known as disaster recovery solutions.
11:06
Disaster recovery answers these questions. Do you have a backup? How fast can you restore that backup? Does your backup still work? How do you ensure current live data is not corrupt?
11:27
In a disaster recovery plan, it is very important to restore your backups from time to time, because a backup can be broken. The application that makes the backup can break it, or even your data itself can be broken,
11:44
and then this data cannot be restored. So it is very important to restore backups from time to time to be sure you
12:01
don't have broken backups, and that you can restore your backup and bring your system back to life. In the AWS cloud, we have CloudEndure Disaster Recovery, which continuously replicates your machines
12:21
into a low-cost staging area in your target AWS account and preferred region, enabling fast and reliable recovery in case of IT data center failures. Business continuity plan.
12:43
A business continuity plan, BCP, is a document that outlines how a business will continue operating during an unplanned disruption in service. In a BCP, we have two definitions. First, recovery point objective, RPO: the maximum acceptable
13:09
amount of data loss after an unplanned data loss incident, expressed as an amount of time.
13:21
Second definition, recovery time objective, RTO: the maximum amount of downtime your business can tolerate without incurring a significant financial loss.
13:40
So let's look at the image. We have the lifetime of our application, and at some point in time a disaster appears. First, the recovery point, RPO.
14:03
On a schedule, we make recovery points, that is, we make backups, for example, once an hour.
14:23
But some period of time after our last backup was made, a disaster occurs. So we have data loss here. This data we have already lost.
14:43
After the disaster, we have some period of time, called downtime, before we recover our system. The time from the disaster until our system is live again is called the recovery time.
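The timeline just described can be turned into a small calculation: the data loss window is the time between the last backup and the disaster, and the downtime is the time between the disaster and the restored system. The sketch below compares both against a recovery point objective and a recovery time objective; the function names and the example timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable


def data_loss_window(backups: Iterable[datetime], disaster: datetime) -> timedelta:
    """Time between the last backup taken before the disaster and the disaster."""
    earlier = [b for b in backups if b <= disaster]
    if not earlier:
        raise ValueError("no backup exists before the disaster")
    return disaster - max(earlier)


def downtime(disaster: datetime, restored: datetime) -> timedelta:
    """Time from the disaster until the system is live again."""
    if restored < disaster:
        raise ValueError("system cannot be restored before the disaster")
    return restored - disaster


def meets_objectives(
    loss: timedelta, down: timedelta, rpo: timedelta, rto: timedelta
) -> bool:
    """True if the actual data loss and downtime stay within the objectives."""
    return loss <= rpo and down <= rto
```

For hourly backups at 10:00, 11:00 and 12:00, a disaster at 12:40 and a restore at 14:10, the data loss window is 40 minutes and the downtime is 1 hour 30 minutes. That meets a one-hour recovery point objective and a two-hour recovery time objective, but not a one-hour recovery time objective.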
15:07
Disaster recovery options. Here we have a graph, and in general we have four options for recovery.
15:24
First, the backup and restore option. RPO and RTO here are measured in hours.
15:40
It is for lower-priority use cases. You provision all AWS resources after the event and restore backups after the event. The advantage of this option is that it is very low cost.
16:01
Then pilot light. RPO and RTO are tens of minutes. Data is live, services are idle, and you provision some AWS resources and scale after the disaster event.
16:20
This solution costs more than backup and restore. Then we have the warm standby solution. RPO and RTO are minutes. We have an always-running system, but smaller; for example, the machines for the production
16:52
version have double the memory of the standby machines. This is used for business-critical services.
17:05
We scale AWS resources after the event. And this costs more than pilot light.
17:21
Then we have multi-site active-active. RPO and RTO here are real time, because we have zero downtime and near-zero data loss. This is for mission-critical services.
17:46
And this is a very expensive solution, because in general you have two identical applications running on two identical architectures.
18:02
So let's look at the image for backup and restore. We have the AWS cloud, with region one and region two. We have only a cross-region backup, so in the other region we have just data.
18:25
When a disaster occurs, we have to redeploy all the infrastructure. For example, here we have EBS volumes, instances, and an RDS instance. DB snapshots and EBS snapshots are actually saved into Amazon Simple Storage Service.
18:45
So for the instances and the application, we have to use an infrastructure-as-code approach and redeploy everything in the other region, then just grab the data that was backed up previously.
19:08
Then we have the pilot light solution. On top we have Route 53, and Route 53 always routes traffic to the Elastic Load Balancer in region one.
19:25
The application works fine there with Auto Scaling groups. In region two, the Auto Scaling resources for the front-end server and application server
19:46
are scaled to zero. After a disaster, Route 53 will route traffic to the other Elastic Load Balancer in the second
20:02
region, and we just scale up our instances, our front-end server and application server. And the data is always asynchronously replicated across regions.
20:24
Then we have warm standby. It's very similar to the previous solution, except for one thing: the Auto Scaling groups in the second region don't scale to zero.
20:43
They are already up and running, but at very small capacity, for example, on smaller instance types, or with the number of instances scaled down to just one instance
21:06
that can accept connections immediately. And then we have multi-site active-active.
21:21
We have two identical infrastructures and two identical applications in two different regions. Why is it named active-active? Because Route 53 balances traffic at the same time to the Elastic Load Balancer in region
21:40
one and the Elastic Load Balancer in region two. And in the end, each Elastic Load Balancer balances across the front-end servers and application servers. This is the most effective solution, but it is also the most expensive one.
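The routing behavior of the standby options and of active-active can be contrasted in a short simulation. This is a sketch under simplifying assumptions, not Route 53 configuration: the `Region` class, the function names, and the idea of tracking capacity as a plain instance count are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A hypothetical region with a health flag and a running instance count."""
    name: str
    healthy: bool = True
    capacity: int = 0


def failover_route(primary: Region, standby: Region, full_capacity: int) -> str:
    """Standby-style routing with a primary and a secondary region.

    Traffic goes to the primary while it is healthy. After a disaster the
    standby is scaled up to full capacity and receives the traffic.
    """
    if primary.healthy:
        return primary.name
    standby.capacity = max(standby.capacity, full_capacity)
    return standby.name


def active_active_route(regions: List[Region], request_id: int) -> str:
    """Active-active routing: spread requests over all healthy regions."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region")
    return healthy[request_id % len(healthy)].name
```

A standby region that starts with `capacity=0` models the first standby option, where resources are scaled to zero, and a small non-zero capacity models the second one. In both cases the secondary region only takes traffic after the primary fails, whereas active-active uses both regions all the time.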
22:07
RTO and RPO revisited. Let's review. RTO, recovery time objective, is the maximum acceptable delay between the interruption
22:23
of service and restoration of service. This objective determines what is considered an acceptable time window when service is unavailable and is defined by the organization. RPO, recovery point objective, is the maximum acceptable amount of time since the last data
22:47
recovery point. This objective determines what is considered an acceptable loss of data between the last recovery point and the interruption of service and is defined by the organization.
23:06
So how do we choose which recovery option to take? We have an image: on the y-axis we have cost and complexity, and on the x-axis we have time to recover.
23:28
When we have a request from the business about the acceptable recovery cost and the recovery time objective, we can see which type of recovery we should take.
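The choice can be sketched as picking the cheapest option whose typical recovery time still fits the recovery time objective requested by the business. The numeric recovery times and cost ranks below are illustrative assumptions that only reflect the rough ordering of the four options (hours, tens of minutes, minutes, real time); they are not figures published by AWS.

```python
from datetime import timedelta
from typing import List, Tuple

# (name, relative cost rank, assumed typical recovery time).
# All numbers are illustrative assumptions, ordered from cheapest to most expensive.
STRATEGIES: List[Tuple[str, int, timedelta]] = [
    ("backup and restore", 1, timedelta(hours=24)),
    ("pilot light", 2, timedelta(hours=1)),
    ("warm standby", 3, timedelta(minutes=10)),
    ("multi-site active-active", 4, timedelta(0)),
]


def choose_strategy(required_rto: timedelta) -> str:
    """Return the cheapest strategy whose assumed recovery time fits the RTO."""
    if required_rto < timedelta(0):
        raise ValueError("RTO cannot be negative")
    for name, _cost, typical in sorted(STRATEGIES, key=lambda s: s[1]):
        if typical <= required_rto:
            return name
    # Unreachable with the table above, because active-active has zero recovery time.
    raise RuntimeError("no strategy satisfies the requested RTO")
```

In a microservices system the same function could be applied per service, so that different parts of the system end up with different recovery options.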
23:45
So here, on this image, we should take something between pilot light and backup and restore. If this is a microservices infrastructure, some parts of the system might use the pilot light approach and some parts might use backup and restore.
24:10
If it is not possible to divide the system between two types of recovery, we can choose one,
24:21
but this decision, too, is made by the organization. Let's look at an architecture diagram of a simple microservices website with a front
24:46
end, where the back end of the website runs on AWS Lambda functions. Here we have the client side. The client requests the front end from Amazon CloudFront and
25:07
sends API requests via Amazon API Gateway. From Amazon API Gateway, we send the request to AWS Lambda. AWS Lambda can read data from or put
25:27
data into DynamoDB. If Lambda requests data from DynamoDB, it sends it to Amazon API Gateway, and Amazon API Gateway returns this data to the front end.
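The back-end part of this flow can be sketched as a Lambda-style handler. To keep the example runnable without an AWS account, a plain dictionary stands in for the DynamoDB table, and the event is assumed to follow the API Gateway REST proxy shape (`httpMethod`, `pathParameters`, `body`). The `id` path parameter and the item layout are hypothetical.

```python
import json
from typing import Any, Dict, Optional

# In-memory stand-in for a DynamoDB table (assumption, so the sketch runs locally).
ITEMS: Dict[str, Dict[str, Any]] = {}


def _response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response in the shape API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Read an item on GET, store an item on PUT."""
    method = event.get("httpMethod")
    item_id = (event.get("pathParameters") or {}).get("id")
    if not item_id:
        return _response(400, {"error": "missing id"})
    if method == "PUT":
        ITEMS[item_id] = json.loads(event.get("body") or "{}")
        return _response(200, ITEMS[item_id])
    if method == "GET":
        if item_id not in ITEMS:
            return _response(404, {"error": "not found"})
        return _response(200, ITEMS[item_id])
    return _response(405, {"error": "method not allowed"})
```

In a real deployment the dictionary would be replaced by calls to DynamoDB, and API Gateway would pass the handler's return value back to the front end.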
25:40
AWS Lambda writes logs and has its own metrics, and all of these are saved in Amazon CloudWatch. In Amazon CloudWatch, we can set up a CloudWatch alarm with a notification.
26:04
If an error appears in AWS Lambda, this error will be written to Amazon CloudWatch. Then the alarm will be triggered and send a message that Lambda has failed, first to an AWS Lambda,
26:29
which sends the message to a Slack channel, and also to Amazon SNS, Simple Notification Service, which will send an email about the Lambda status.
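The notification step can be sketched as a function that turns an alarm event into a message and hands it to every configured notifier. The event layout used here (`alarmData`, `alarmName`, `state`) is a simplified assumption about what the alarm delivers, and the notifiers are plain callables standing in for the chat and email integrations, so nothing in this sketch calls a real AWS or chat API.

```python
from typing import Any, Callable, Dict, Iterable


def format_alarm(event: Dict[str, Any]) -> str:
    """Build a short human-readable message from an alarm event.

    The event shape is an assumption made for this sketch.
    """
    alarm = event.get("alarmData", {})
    name = alarm.get("alarmName", "unknown alarm")
    state = alarm.get("state", {})
    value = state.get("value", "UNKNOWN")
    reason = state.get("reason", "no reason given")
    return f"{name} is {value}: {reason}"


def notify_all(event: Dict[str, Any], notifiers: Iterable[Callable[[str], None]]) -> str:
    """Send the formatted alarm message to every notifier and return it."""
    message = format_alarm(event)
    for send in notifiers:
        send(message)
    return message
```

In practice one notifier would post to the chat channel and another would publish to the notification topic that sends the email; here they can be any function that accepts a string, which also makes the logic easy to test.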
26:47
Thank you.