YouTube transcript

Fundamentals of Backend Architecture - How to Design Scalable Software

[0:00]So software engineering has been [0:01]changing very fast, especially in the [0:04]last few months, and I think we're at the [0:07]point where we need to slow down and [0:08]understand what is happening. We have [0:11]all of these machines writing code for [0:13]us. We talk to them and they write code, [0:16]and I use them myself every day. I think [0:18]it's wonderful. There are definitely [0:20]some productivity boosts, but sometimes [0:23]we also see these kinds of messages and [0:25]outages. These errors on [0:28]massive platforms are happening [0:30]more and more, so I wanted to [0:33]make a video to bring some [0:35]knowledge about [0:38]software architecture. I think it's more [0:40]important than ever to understand what [0:43]we're building and not how to build it. I [0:46]have a lot of videos on my YouTube [0:48]channel, and even before AI I always [0:50]showed you the theory, the pros and the [0:53]cons of every decision I've made. [0:56]This is really core to software [0:58]engineering, and I wanted to give [1:00]you an introduction, for those that might [1:02]not be so tech-savvy, to software [1:05]architecture. So in this video we're [1:07]going to first [1:09]try to define what software [1:10]architecture is, and then we're going to [1:12]build a Google Drive kind of clone, just [1:14]on the ideation level, step by step. [1:17]We're going to start very small, just [1:20]with a server, and then we're going to [1:21]get into a microservice architecture. [1:23]For every decision along the way I'm [1:25]going to try to distill and show you [1:28]my reasoning behind it. 
Now for me, when it [1:31]comes to software design I always [1:33]reference Martin Fowler, and I want to just [1:36]show you a phrase from this [1:38]article. You can get it in the [1:40]description as always. I want to try [1:42]to define architecture first and why [1:44]it's useful, and then what we're going to [1:46]do for the rest of the video is that we're [1:48]going to architect ourselves a Google [1:50]Drive clone. So we're going to be able [1:51]to upload files, download files, and [1:55]we're going to start very small, [1:56]just one server, data on the server, and [1:59]then we're going to gradually get into a [2:01]monolith architecture. We're going to [2:03]talk about rate limiting, caching, [2:06]horizontal scaling, vertical scaling, a [2:08]lot of stuff. The goal is [2:11]to show you the decisions, the gradual [2:14]evolution, and the multiple ways that [2:17]you can think about designing software. [2:19]It's not just about the code, it's [2:20]about thinking in systems. And then [2:23]we can also think about code in [2:24]systems as well. But yeah, without [2:27]further ado, if you want to [2:29]get the article, I'm going to leave it in [2:30]the description. But let me just [2:33]highlight a bit of this. So a good [2:37]architecture is important; otherwise [2:40]it becomes slower and more expensive to [2:43]add new capabilities in the future. [2:46]This is really the [2:48]experience that most of us have had [2:50]building production software, right? [2:52]Which is that if we don't build a good [2:54]architecture, the business is going to [2:56]suffer. And this is really why [2:58]architecture is important. It's to [3:00]support the business so that revenue can [3:02]continue, the business can stay [3:04]running smoothly, and we don't send [3:06]these kinds of messages to our clients.

[3:08]Right? Of course, I'm not putting the [3:11]blame on these guys. These are companies [3:13]that know what they're doing. So [3:16]to start off, let's understand how to [3:18]design the software. Basically this is [3:20]going to be kind of a crash course for [3:21]that, and as I said we're [3:23]going to do a gradual [3:25]evolution. Of course, no architecture is [3:28]strictly better than the previous one. [3:30]I don't want to [3:32]overengineer, and I want to teach [3:34]you the intentional process of decision [3:37]making. An architecture might be the [3:39]best for you at the current stage of the [3:42]company you're working at, or it might [3:45]not. So this is really where decision making [3:47]comes in and where you need to [3:48]understand the pros and cons. [3:51]There are whole books written about this, [3:53]like this one here. It's one of my [3:55]favorite ones. You probably know about [3:56]it. So of course it's a heavy topic to [3:59]talk about. I'm going to stay mostly on the [4:01]ideation level. But let's get started. [4:04]Let's start very simple. So I'm going to [4:06]have here a server. This is how [4:09]your first program might have [4:11]started. You have just a server and [4:14]then you have users. [4:17]What happens is that your users are [4:18]going to make requests to your server.

[4:20]So very simple. This is going to be our [4:21]Google Drive. [4:25]All the data is inside of the server. So this is [4:27]all one instance, and this [4:31]works well. But I'm going to start [4:33]asking some questions: what [4:36]happens if the server dies, for example, [4:39]and what happens when you want to scale? [4:42]So let's start with the first one, [4:43]because as you can see our data is coupled [4:47]to the server. The data is actually [4:48]inside of the server. So we could [4:50]imagine this in Go would be [4:52]something like this. So this is a [4:54]very simple Golang struct where we [4:57]have our data inside of the server. So [5:00]the files, this is a map of ID to the [5:03]location of the file, and of course if [5:05]the server dies the data is also going [5:08]to be lost. [5:10]Basically this has a very simple solution, [5:13]and of course you probably already know how [5:14]to solve it: just add a database. [5:16]But the idea here is not just adding a [5:18]database; it's that you are going to [5:20]decouple the server, which is going to be [5:23]stateless, from the database. There you go. [5:26]So basically now we have our server and [5:30]our database, and whenever we make [5:32]requests we're going to get the data [5:34]from the database, so that our users [5:37]don't lose their data if [5:39]our server goes down. So [5:42]now we have data persistence and [5:44]our first problem is [5:46]actually solved. So very simple. We're [5:48]going to get there, this is going to become [5:50]more interesting, I promise you, but it's [5:51]also important to understand [5:54]why you want to add the database at [5:56]all. So it's also important to [5:57]understand these little details, and what [6:01]happens as well when you want to scale. 
[6:03]This is our second question, which is: if [6:06]you actually scale this instance here [6:07]with the data, what's going to happen is [6:09]that you're going to have two servers, [6:11]for example. So we're going to [6:12]scale horizontally. We're going to get [6:13]there. But if you want to have multiple [6:16]instances, basically we're going to have [6:18]duplicated data. And of course, this is [6:20]not desirable. It will work, [6:23]because you're still going to have the [6:25]data there. If the request comes to the [6:27]second server, the data is still going [6:29]to be there, right? [6:31]But the problem [6:33]comes when you need data [6:35]synchronization. If the user writes [6:36]to this machine, the data is going to be [6:39]written here, but the other one is not [6:41]going to know. So this is another [6:43]problem that we have to solve, and it is [6:44]solved by having a decoupled [6:47]instance right here. This means that [6:51]we can actually scale our server, and [6:54]whatever instance the user hits, it [6:56]doesn't matter, because we have one [6:59]single relational database. So a [7:01]database that is going to hold the user [7:03]tables, the file tables. Everything [7:05]is going to be held here. What we can [7:08]learn from this is that this is the [7:10]classic separation of concerns in [7:12]software engineering: the server [7:14]is concerned with serving the user, while [7:17]the database is concerned with storing [7:19]the data. This means that if the server [7:22]dies or is removed, the database is [7:25]not concerned with that, because it [7:27]doesn't know about the server. The [7:28]server knows about the database. But if [7:30]the database goes down, of [7:33]course, this is [7:35]different, because if there's no [7:36]database, there's no way to serve the [7:38]user. 
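The two designs described above can be sketched in Go. This is not the code shown in the video, only a minimal illustration: `StatefulServer` mirrors the struct with the map inside the server, while `StatelessServer` talks to a hypothetical `FileStore` interface. `MemoryStore` is an assumed stand-in for a real relational database so the sketch needs no external dependency.

```go
package main

import (
	"errors"
	"sync"
)

// StatefulServer keeps file locations in process memory.
// If the process dies, the map (and therefore the data) is gone.
type StatefulServer struct {
	files map[string]string // file ID -> location of the file
}

// ErrNotFound is returned when a file ID is unknown.
var ErrNotFound = errors.New("file not found")

// FileStore is a hypothetical abstraction over the external database.
type FileStore interface {
	Put(id, location string) error
	Get(id string) (string, error)
}

// MemoryStore stands in for a real database in this sketch.
type MemoryStore struct {
	mu    sync.RWMutex
	files map[string]string
}

// NewMemoryStore returns an empty store.
func NewMemoryStore() *MemoryStore {
	return &MemoryStore{files: make(map[string]string)}
}

// Put records where a file lives.
func (s *MemoryStore) Put(id, location string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.files[id] = location
	return nil
}

// Get looks up a file location by ID.
func (s *MemoryStore) Get(id string) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	loc, ok := s.files[id]
	if !ok {
		return "", ErrNotFound
	}
	return loc, nil
}

// StatelessServer holds no file data itself; every instance
// reads and writes through the same shared store.
type StatelessServer struct {
	store FileStore
}

// Upload records a file location in the shared store.
func (s *StatelessServer) Upload(id, location string) error {
	return s.store.Put(id, location)
}

// Locate reads a file location from the shared store.
func (s *StatelessServer) Locate(id string) (string, error) {
	return s.store.Get(id)
}
```

With two `StatelessServer` values built on the same store, a write through one instance is visible through the other, which is the synchronization property the duplicated in-memory map cannot give you.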
But in some cases, the requests [7:40]don't need to go to the database. So [7:42]there's some decoupling here. This is [7:44]the key takeaway from this [7:47]first lesson. So here is the concept: [7:49]data coupled to the machine equals no [7:51]scale and no resilience. Let's take [7:54]things a bit further. So here is our [7:56]simplistic architecture. Now let's [7:58]imagine the following. Our application [8:00]is doing well. We're getting a lot of [8:02]users. It's getting popular on the app [8:05]store, but we are getting complaints that [8:07]our app is slow to respond. So basically [8:10]the server is not handling all of these [8:12]requests as it should. Of course, let's [8:15]actually scale it. So we're going to [8:16]scale to a second instance. Now what [8:20]happens here, and we have already seen [8:22]this before, is that we can scale [8:23]our server independently. That is good, [8:26]but there's a whole lot of problems when [8:29]we add more than one server. We're starting [8:31]to get into a distributed architecture [8:34]per se, because we have multiple servers. [8:36]In this case they don't need to talk [8:37]to one another, but we need to have some [8:40]way in the middle to know which server is [8:43]going to handle the request. So imagine that most [8:44]of the users are actually going to this [8:46]one, and this one has a lot of free [8:49]resources to use. The users are going to [8:51]get a degraded experience because this [8:53]server is not being routed to. So we [8:56]have to have a piece in the middle here [8:59]that is going to decide which server is [9:02]going to be the one serving the user. [9:04]This is usually called a load balancer. [9:07]Do we actually have it here? Yep, we do. [9:10]Awesome. So this is usually called a [9:12]load balancer, and basically let's put it in [9:14]our architecture here. Let me actually [9:16]move our servers and duplicate them. 
[9:20]Now we have another piece in the middle [9:22]that is responsible for just [9:24]redirecting to a healthy machine. So [9:28]whenever you hear load balancer, it's just this [9:29]piece in the middle that is going to [9:31]redirect to a healthy machine. [9:34]It might do that, or [9:36]it might just redirect using [9:38]a round-robin algorithm. What this means [9:41]is that it's an algorithm [9:44]on the load balancer: if user A [9:46]comes in, it's going to redirect to [9:48]server one. If another user comes in, [9:50]it's going to redirect to the [9:52]other one. So sequentially [9:54]alternating between the servers so that [9:55]they both get the same traffic. But is [9:59]the traffic actually the same? [10:02]Let's start [10:04]designing our API as well. [10:06]Say user A [10:09]starts uploading a file whilst user B is [10:12]just getting a file. So he's just [10:14]consuming [10:16]a file, right? He's getting file [10:18]123. Is this actually the same [10:20]traffic? Is this actually the same [10:22]request cost? Of course not. The upload is [10:25]going to be more storage [10:28]expensive. It might be more CPU [10:30]expensive. So there are different [10:33]costs here. Of course, round robin is [10:34]pretty good already and it solves our [10:38]problem. But our load balancer could [10:40]also be smarter and use health checks [10:42]to understand how our servers are [10:45]doing. If we have this server [10:47]doing much better than the other one, [10:49]because a file is being [10:52]uploaded to the other, then the load balancer, this [10:55]piece here, knows to send to this [10:58]one. So just to show you that [11:00]there are multiple load balancers out [11:02]there that use different techniques. 
And [11:04]an interesting thing about load [11:06]balancers as well is that they can do [11:08]much more. For example, we're not [11:10]getting into microservice architecture yet, [11:12]but this kind of [11:15]solves the problem that I have just [11:16]described: we're going to have a [11:18]service called files that is just going [11:20]to handle files. [11:23]Let me actually put this a bit below [11:25]here so we have more space. If we know [11:27]that this user is going to upload a file [11:29]and it's going to make our servers [11:30]slower, we can actually just redirect [11:33]him to a specialized service [11:36]that just knows how to do file uploads, [11:38]whilst the other ones are handling [11:41]traffic normally and they are healthy.
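Round robin combined with a health flag can be sketched in a few lines of Go. The `Backend` and `RoundRobin` types are hypothetical names for illustration; a real load balancer would set `Healthy` from periodic health checks rather than by hand.

```go
package main

import "sync"

// Backend is one server behind the load balancer.
// Healthy would normally be updated by a background health check.
type Backend struct {
	Name    string
	Healthy bool
}

// RoundRobin hands out backends in rotation, skipping unhealthy ones.
type RoundRobin struct {
	mu       sync.Mutex
	backends []*Backend
	next     int
}

// NewRoundRobin builds a balancer over the given backends.
func NewRoundRobin(backends ...*Backend) *RoundRobin {
	return &RoundRobin{backends: backends}
}

// Pick returns the next healthy backend in rotation,
// or nil when no backend is currently healthy.
func (r *RoundRobin) Pick() *Backend {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i := 0; i < len(r.backends); i++ {
		b := r.backends[r.next]
		r.next = (r.next + 1) % len(r.backends)
		if b.Healthy {
			return b
		}
	}
	return nil
}
```

Note that this treats every request as equal in cost, which is exactly the limitation raised above: an upload and a small read each count as one turn in the rotation.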
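Routing uploads to a specialized pool comes down to a rule on the request method and path. The sketch below is an assumption-laden illustration: the pool names and the `/api/upload` prefix are hypothetical, not taken from the video.

```go
package main

import (
	"net/http"
	"strings"
)

// Hypothetical pool names: one general pool and one pool
// dedicated to file uploads.
const (
	GeneralPool = "general"
	FilesPool   = "files"
)

// Route picks a backend pool from the HTTP method and path.
// POST requests under /api/upload (an assumed path) go to the
// files pool; everything else goes to the general pool.
func Route(method, path string) string {
	if method == http.MethodPost && strings.HasPrefix(path, "/api/upload") {
		return FilesPool
	}
	return GeneralPool
}
```

For example, `Route("POST", "/api/upload")` selects the files pool, while a `GET` for an existing file stays on the general pool.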

[11:43]So load balancers can actually do [11:46]routing on path names as well. So if [11:50]your API request is a POST and it's for [11:52]an upload, you can actually just [11:54]redirect to this server. Now, to not [11:57]spend much more time on load [11:59]balancers, let me just show you that [12:01]there are three other ways to do this. [12:03]There's weighted round robin. [12:05]There's sticky sessions. I think this one [12:07]is actually pretty interesting. I've [12:09]actually worked on something similar [12:10]pretty recently. [12:12]With sticky sessions, [12:14]what happens is that if user A comes [12:16]to the load balancer, he's going to be [12:18]redirected to a server, and the next time [12:20]the load balancer is going to make a [12:22]best effort to redirect him to the same [12:24]server. Why do we want to do this? Mostly [12:27]because if the server has some state [12:29]for this user, he's doing some work, it [12:32]might be interesting to redirect him to [12:33]the same server. [12:35]And then we have least connections. So [12:38]basically, hey, if this server has [12:40]fewer connections, let's actually use that [12:42]one because it's more available. [12:45]Okay, to exemplify what I'm talking about, [12:48]I think it makes sense to show you some [12:49]code and a web server actually running [12:52]in the real world. I [12:54]think this is going to click for you. So [12:56]I have here three Golang services. So [12:59]just three servers that are exactly the [13:01]same. They have the exact same code. [13:04]It's pretty simple. So just a simple [13:06]HTTP server. It has some session [13:08]handling that I'm going to show you for [13:10]the cookies and the sticky sessions. [13:12]Just some HTML page that I generated, and [13:14]then this is just a very simple [13:16]server that gets cookies and creates [13:18]sessions. 
The interesting part is here in [13:20]the nginx config, because here we have two [13:22]upstream blocks. These are going to be two [13:25]groups of servers. This basically means [13:27]that our nginx proxy is [13:30]going to be able [13:32]to load balance depending on whether the URL has rr, [13:35]which basically means round robin, or the [13:37]sticky session. So just to show you how [13:40]it's going to work in practice, let [13:42]me actually do a docker compose up. [13:45]I think the server was not running. And [13:48]there we go. So if I open the [13:51]browser at /rr and refresh the page, [13:54]you can see that every time I'm going to [13:57]get one, two, and three. So you can see [14:00]that round robin is doing its work. [14:02]It's going to server one, server [14:04]two and server three. This is what it's [14:06]doing. However, if I apply the session [14:09]cookie, I have just switched here [14:11]the location, so you can [14:13]see it's /sticky. Let me actually [14:16]refresh. Now we get server three. I'm [14:18]going to refresh again, and we always [14:20]get server three. So you can see how this [14:22]works. You can see here the number of [14:24]hits. This is the session cookie [14:26]identification. This could be your [14:28]authentication token, for example. And [14:30]every time, the load balancer is [14:32]redirecting me to the same server. So [14:34]this is session affinity [14:38]load balancing. Now just to show you again, I [14:40]have just cleared the cookies. Let me [14:41]actually refresh and let's see what [14:42]happens. So I got server one. I'm going [14:44]to refresh again. I got server three. [14:46]And as you can see, it really likes [14:48]server three for some reason. This is [14:51]exactly how the sticky session [14:54]works. 
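The behaviour seen in the demo (rotation without a cookie, the same server once a session cookie exists) can be approximated in Go. This is a simplified sketch and not how the demo's proxy implements affinity: `StickyBalancer` is a hypothetical type that remembers the first server chosen for each session ID and falls back to round robin otherwise.

```go
package main

import "sync"

// StickyBalancer pins each session ID to the server that
// handled its first request. Requests without a session ID
// are spread round robin and are not pinned.
type StickyBalancer struct {
	mu       sync.Mutex
	servers  []string
	next     int
	sessions map[string]string // session cookie value -> server
}

// NewStickyBalancer builds a balancer over the given server names.
func NewStickyBalancer(servers ...string) *StickyBalancer {
	return &StickyBalancer{
		servers:  servers,
		sessions: make(map[string]string),
	}
}

// Pick returns the server for this request. It returns "" when
// the balancer has no servers configured.
func (b *StickyBalancer) Pick(sessionID string) string {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.servers) == 0 {
		return ""
	}
	if sessionID != "" {
		if s, ok := b.sessions[sessionID]; ok {
			return s // known session: same server as last time
		}
	}
	s := b.servers[b.next]
	b.next = (b.next + 1) % len(b.servers)
	if sessionID != "" {
		b.sessions[sessionID] = s // remember the assignment
	}
	return s
}
```

A production setup would also need to handle a pinned server going away, which is why affinity is usually described as best effort.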
So now that you know how to [14:56]coordinate traffic across our servers, [14:58]we can actually start scaling like crazy, [15:00]because we know that we have [15:03]a new building block, which is this guy, [15:05]the load balancer, and he can redirect [15:07]traffic. So we have the power of the [15:09]load balancer unlocked, and this means [15:11]that we can have as many servers as we [15:13]want, as long as we can pay for them. [15:16]But I want [15:18]to make the distinction between two [15:20]types of scaling that we can do. We can [15:21]do horizontal scaling, which is what we [15:23]have been doing. So adding more machines [15:25]to your cluster, to your business, and [15:28]this makes sense because if you have [15:30]more machines you have more resources, [15:32]more power, because each machine is [15:34]independent from one another. So we [15:36]can do some computations here while this [15:38]one is free to serve another user or [15:40]even another cluster of users. Right? [15:44]There's another way that you can do this, [15:45]and most of the time you're going to use [15:47]both of them. So there's not one way [15:50]to go about it. This one is a bit like [15:53]throwing money at the problem. You [15:55]might hear this expression. It means [15:58]increasing the server. So we have one [15:59]server, but it's a beefy machine. It has [16:02]more RAM, more CPU, more [16:06]storage. So you are increasing the [16:08]compute power of your machine. [16:11]It makes sense to have both: [16:13]beefier machines, but horizontally [16:15]scaling them as well. This is [16:17]usually called vertical scaling, [16:22]and this is horizontal. Okay, so this is [16:24]just the distinction that I wanted to [16:26]make sure you understand. 
And this [16:28]basically allows us to make the [16:30]load balancer a bit smarter, [16:32]because more often than not, you [16:35]know that having more machines equals [16:37]more money spent. Beefier machines [16:39]equals more money spent. So we want to [16:41]have as few machines as we [16:43]can. The minimum amount is perfect. [16:45]And autoscalers, [16:48]or load balancers with [16:49]autoscaling, allow this. You can have a minimum of [16:52]two, for example. But if you're getting a [16:54]lot of traffic you can actually increase [16:56]to 10. This is how you usually do it. [16:59]Kubernetes does autoscaling. [17:01]I know Google Cloud does as well, [17:03]either with servers or even serverless [17:06]solutions. [17:08]Serverless solutions [17:10]basically means you don't have to care [17:11]about the server, about [17:13]provisioning it. The provider does [17:15]that for you. Just put a Docker image or [17:18]a Docker container there and the provider is [17:21]going to take care of that. But that's [17:23]for another time. Basically, what this [17:25]means is that you can have our load [17:26]balancer actually knowing how much is [17:30]too much traffic, and then he's going to [17:32]provision more [17:34]instances for you. So, pretty [17:36]interesting. Now, let's say that our [17:38]application is profitable. It's working. [17:41]Our customers are happy. We are scaling [17:43]very well. We're making a lot of [17:44]revenue, and we are starting to hire [17:48]more people. So we have more members, [17:51]more engineers, and we are also getting [17:53]reports that some endpoints are slow. [17:55]So what [17:58]is happening here? We kind of open [17:59]the doors to a whole new world, a [18:01]dangerous world, which is microservices. 
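The "minimum of two, up to 10" idea can be written as a small decision function. The proportional rule below is an assumption chosen for illustration; the video does not say which policy its autoscaler uses, and the load metric (for example average CPU utilization per instance) is left abstract.

```go
package main

import "math"

// DesiredInstances returns how many instances to run, using a
// simple proportional rule: scale the current count by the ratio
// of observed load to target load, round up, then clamp the
// result to [minInstances, maxInstances].
func DesiredInstances(current int, observedLoad, targetLoad float64, minInstances, maxInstances int) int {
	if current < 1 || targetLoad <= 0 {
		return minInstances // nothing sensible to scale from
	}
	desired := int(math.Ceil(float64(current) * observedLoad / targetLoad))
	if desired < minInstances {
		return minInstances
	}
	if desired > maxInstances {
		return maxInstances
	}
	return desired
}
```

With two instances at 90% load and a 50% target, the rule asks for four instances; under very light load it never drops below the configured minimum, and under a spike it never exceeds the maximum you are willing to pay for.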
[18:05]And we started that journey with our [18:06]file service, which came about because [18:08]some customers mentioned that the [18:10]uploads were getting slower. So whenever [18:13]we did uploads of files they were [18:16]getting slow, because most of the servers [18:18]were handling everything. Now what we do [18:21]with microservices is that we're going to [18:22]solve this pain by having multiple [18:25]specialized services that know how to do [18:28]one thing and do that one thing very well. So [18:30]let's [18:32]start thinking about our business. [18:34]Microservices is a whole new world of a [18:36]topic, and I want to mention that I [18:39]actually have a 20-hour course on [18:42]just microservices alone. We use Golang, [18:45]of course. It's the first link in the [18:47]description if you are interested. [18:48]The first three hours are completely [18:50]free, so you can actually watch them. [18:52]There's a lot of theory, and it's really [18:54]important to understand that [18:55]microservices is a complex topic. Most of [18:58]the time you might want to use [18:59]microservices when you're [19:01]working on a big business, and when I [19:04]mentioned that we were hiring in our [19:06]hypothetical case, it's because it's [19:08]really interesting to have, for example, a [19:10]team just working on the files and another [19:13]team just working on notifications. And [19:16]since we're talking about services, let's [19:18]actually try to think about our [19:20]domain, our Google Drive clone, and let's [19:23]try to create services. So we have our [19:26]file service, which basically handles [19:28]uploads and gets of the [19:31]files; notifications, which handles, [19:33]for example, push [19:35]notifications for web and desktop; and we [19:38]might have, for example, the auth service, [19:40]which just handles authentication; and, for [19:43]example, the real time service. So real time could [19:46]be 
anything that you might have to do in [19:48]real time. For example, you upload a file [19:50]on your computer that gets synced to the [19:52]cloud, and the cloud also syncs to [19:55]your other personal computers. So just [19:58]some separation here that we can [20:00]start working with. So these are four [20:02]domains, and four teams could actually [20:05]work on these microservices. This [20:08]kind of reveals a question, which is how [20:10]does the load balancer know where to route, and I think you [20:13]should know the answer by now [20:14]because I've touched on this previously. [20:17]For example, [20:18]this user is going to authenticate, so [20:20]let's make, for example, POST [20:22]/api/login, [20:25]right? So clearly we're going to [20:28]hit the authentication service, but [20:30]how does the load balancer know how to [20:33]handle this? If you think about [20:34]the flow, the user calls our endpoint, [20:38]our service, and the load balancer is [20:40]going to redirect to the auth service [20:42]because it knows that this is for the auth [20:45]service. Some load balancers can do this, [20:48]some load balancers cannot. So it's [20:51]also important to understand that maybe [20:53]the load balancer is not the guy for [20:56]this job. Not only that, if we add auth, it [20:59]means that we're adding authentication. [21:01]Where does the authentication live? If [21:03]we make this request, can we actually [21:05]call the notification service directly? [21:07]Can we actually bypass the load balancer [21:10]and its authentication and go to the file service? [21:13]Where does authentication live? So I [21:16]have just restructured everything, [21:17]because of course the load balancer is [21:19]not going to be able to handle this. [21:21]Let's add a new piece here, which [21:23]is going to be the gateway. This is [21:25]going to be called the API gateway. 
[21:28]Basically what the API gateway is going [21:30]to do is that he's going to receive a [21:32]request, analyze it and [21:34]redirect to the right service. Not only [21:37]that, he can also aggregate [21:39]responses. For example, to get [21:42]your authentication, we have to go to [21:44]the auth service, but we might also [21:46]have to go to, for example, another [21:48]service like the profile service, [21:51]where we actually get the user [21:53]profile. So just a hypothetical case, [21:56]and basically what this gateway does is [21:58]that he's going to coordinate that [22:00]distribution and he's going to return [22:02]to the user a response aggregated [22:05]from all of those requests. Not only [22:08]that, this is going to solve the [22:11]second problem I mentioned, which is that we [22:12]don't want users to communicate [22:14]directly with services. So we're going [22:16]to make all of this a private network. [22:20]This is what you usually see as a [22:22]VPC, and basically what we have here [22:25]is that the gateway is going to be the [22:27]only entry point to communicate with our [22:30]system. All of these servers are not [22:32]going to be exposing their ports to the [22:34]outside. Their IPs are going to be just [22:36]inside of this virtual network, and the [22:39]only way for customers to communicate [22:41]with our business is via the gateway. So [22:43]the gateway is going to handle [22:44]authentication, routing and all of that. [22:48]I have kept the load balancer here, though [22:49]I'm going to remove it in [22:51]the next diagrams, because we might also [22:54]want to scale the gateway. For [22:56]example, if the gateway now becomes a [22:58]single point of failure, it also might [23:01]be interesting to have a load balancer [23:03]before the gateway and also inside of [23:06]our cluster. 
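The aggregation role of the gateway, calling several internal services and returning one combined response, can be sketched as a fan-out in Go. Everything here is hypothetical: `Fetcher` stands in for a client of one internal service, and the service names in the usage note are only examples.

```go
package main

import (
	"errors"
	"sync"
)

// ErrUnavailable is a sentinel a hypothetical service client
// can return when its service cannot be reached.
var ErrUnavailable = errors.New("service unavailable")

// Fetcher is a hypothetical client for one internal service.
// It returns that service's part of the response for a user.
type Fetcher func(userID string) (string, error)

// Aggregate calls every service concurrently and merges the
// results into one map keyed by service name. If any call
// fails, it returns one of the errors and no partial result.
func Aggregate(userID string, services map[string]Fetcher) (map[string]string, error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
	)
	out := make(map[string]string, len(services))
	for name, fetch := range services {
		wg.Add(1)
		go func(name string, fetch Fetcher) {
			defer wg.Done()
			value, err := fetch(userID)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			out[name] = value
		}(name, fetch)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return out, nil
}
```

A gateway handler could call `Aggregate` with an `"auth"` and a `"profile"` fetcher and serialize the resulting map as the single response sent back to the user. Failing the whole request on any error is a design choice of this sketch; returning partial data is another valid option.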
So we went to the to know [23:10]which services need more resources [23:14]and then we also need to scale our [23:16]gateway because our gateway can actually [23:18]scale vertically right or in this case [23:20]horizontally as well but I'm going to [23:22]kind of leave it for now but in the next [23:24]ones I'm going to always omit the load [23:26]balancer but you can imagine that there [23:28]could be a one bouncer here. So now that [23:30]we have our kind of distributed [23:32]architecture and we need authentication [23:35]and it makes sense that of course we [23:37]always had authentication but let's kind [23:39]of deep dive on it. Let's actually [23:41]understand how we can actually do it on [23:44]this distributed architecture. It's [23:46]pretty much the same if it's not if it's [23:48]just a monolith service uh just a one [23:51]server uh instance like we had basically [23:54]here what we have is if we for example [23:56]think about how the files endpoint is [23:58]going to work. So we're going to upload [23:59]some files. The user is going to send [24:01]this pay by load, right? What's going to [24:03]happen is that the gateway is going to [24:05]verify that the user has a JT token. So [24:09]a kind of a token. I talk about this in [24:11]the channel and in all of my courses, [24:13]but basically just a token, a signature, [24:15]meaning that you are authenticated. [24:18]It carries an expiry date. So the [24:20]gateway is going to kind of verify that [24:23]you might want to go to the [24:25]authentication service or not. But this [24:27]is going to make a request trip. What [24:30]you might also want to do is just add [24:32]here a security layer on the gateway or [24:35]even before the gateway to verify this. [24:37]So I'm not going to go into much details [24:39]if this is a service, if this is the [24:41]gateway. Each service is different. Each [24:44]architecture is different. 
You might [24:46]actually have this or just the gateway [24:49]handling the authentication or at least [24:51]the basic part authentication just [24:52]validating if the token is valid. So the [24:55]user now sends this authorization header [24:58]with the token. This is how you usually [25:00]do it with the cookies. So there's no [25:02]network call authentication in our in [25:04]our architecture because we're just [25:06]going to verify if the files um this [25:09]request actually has a valid token. If [25:12]it does, we're just going to trust it [25:13]that it's not expired. We're going to [25:15]verify that and then we're going to [25:16]redirect to the files. So right on the [25:19]edge we can actually verify and throw [25:21]401 status code if this is unauthorized. [25:25]But we also have the authentication [25:27]service. So what does it do? How do we [25:28]actually get a token? This is where we [25:30]get to generating a token and somewhere [25:33]in every website is going to be a login [25:35]page. And this is kind of how it works. [25:37]You sends your credentials. So it could [25:39]be email password or just an email with [25:41]a single sign on token. But basically [25:44]the user is going to request for a token [25:46]with valid credentials. So the user [25:48]might might has to be existing already [25:51]in your architecture in your database [25:54]and what's going to happen is that the [25:56]gateway is just going to redirect to the [25:58]authentication and the authentication is [25:59]going to use kind of a private key to [26:01]sign your token make sure it's valid and [26:04]then it's going to actually redirect. 
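Verifying a signed token at the edge, without a network call to the auth service, can be illustrated with an HMAC-signed token in Go. This is a deliberately simplified stand-in and not a real JWT: the token format `userID.expiry.signature` is invented for this sketch, it uses a shared secret rather than the private key mentioned above, and the user ID is assumed to contain no dots. A real system would use a maintained JWT library.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"strconv"
	"strings"
)

// Errors the gateway can map to a 401 response.
var (
	ErrMalformed = errors.New("malformed token")
	ErrSignature = errors.New("bad signature")
	ErrExpired   = errors.New("token expired")
)

// mac computes a URL-safe HMAC-SHA256 signature of the payload.
func mac(secret []byte, payload string) string {
	h := hmac.New(sha256.New, secret)
	h.Write([]byte(payload))
	return base64.RawURLEncoding.EncodeToString(h.Sum(nil))
}

// Sign is what the auth service would do after checking
// credentials: issue "userID.expiryUnix.signature".
func Sign(secret []byte, userID string, expiresUnix int64) string {
	payload := userID + "." + strconv.FormatInt(expiresUnix, 10)
	return payload + "." + mac(secret, payload)
}

// Verify is what the gateway would do on every request: check
// the signature and the expiry locally and return the user ID.
func Verify(secret []byte, token string, nowUnix int64) (string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", ErrMalformed
	}
	payload := parts[0] + "." + parts[1]
	if !hmac.Equal([]byte(mac(secret, payload)), []byte(parts[2])) {
		return "", ErrSignature
	}
	expires, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return "", ErrMalformed
	}
	if nowUnix >= expires {
		return "", ErrExpired
	}
	return parts[0], nil
}
```

The point of the design is that `Verify` needs only the secret and the clock, so the gateway can reject bad or expired tokens right on the edge and only forward trusted requests to the file service.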
on [26:06]the authentication layer on the gateway [26:09]layer sorry you might actually verify if [26:11]the user exists first by calling for [26:13]example [26:15]uh our service in this case we have this [26:18]thumbnail I have created different [26:19]services here so don't worry about that [26:21]but for example the user service might [26:24]actually be just responsible for [26:25]creating users or validating the users [26:28]and then only then if the user exists [26:30]we're going to actually call the [26:32]authentication layer okay so they might [26:35]be together. They might not be together. [26:36]This is kind of up to you. But just to [26:38]show you that kind of the gateway is [26:40]going to be this guy in the middle of [26:42]those decisions. Now just one more thing [26:44]on O in general is that this is [26:47]authentication. So this basically means [26:49]do you have access to the application [26:52]and then there's another thing which is [26:54]authorization which basically means do [26:56]you have permissions to upload files. So [26:59]authorization is different than [27:00]authentication and it's completely uh [27:02]it's related. You could even put it on [27:04]the same service but you can run [27:06]authorization on all of the services. [27:09]They can have their own authorization [27:11]level basically like permissions. So now [27:14]that we have our microser architecture [27:16]let's actually understand how we can [27:17]upload files because there's a new piece [27:19]here in the puzzle which is this object [27:21]storage. So so far we have actually kind [27:23]of ignored how to upload files which is [27:25]our business. But I wanted to also put [27:28]all of the pieces in the table so that [27:29]you understand how this works because it [27:32]really starts to get in complex once [27:34]multiple services start communicating [27:36]with one another which is what we're [27:38]going to start uh working now. 
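The difference between authentication and authorization maps neatly onto two HTTP status codes. The sketch below uses hypothetical `User` and `Permission` types; the permission names are invented for illustration.

```go
package main

import "net/http"

// Permission names an action a user may be allowed to perform.
type Permission string

// Hypothetical permissions for the file service.
const (
	PermUpload   Permission = "files:upload"
	PermDownload Permission = "files:download"
)

// User is a caller whose identity has already been
// authenticated (for example by a verified token).
type User struct {
	ID          string
	Permissions map[Permission]bool
}

// Can reports whether the user is authorized for an action.
func (u User) Can(p Permission) bool {
	return u.Permissions[p]
}

// StatusFor separates the two checks: no user at all means
// unauthenticated (401); a known user without the permission
// means unauthorized for this action (403); otherwise 200.
func StatusFor(u *User, p Permission) int {
	switch {
	case u == nil:
		return http.StatusUnauthorized
	case !u.Can(p):
		return http.StatusForbidden
	default:
		return http.StatusOK
	}
}
```

Each service can run its own version of the `Can` check against the permissions it cares about, which is the per-service authorization level described above.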
So just [27:41]going back a bit: how do file uploading [27:43]and file serving work? This is kind [27:45]of a recurring problem that you might [27:46]find in your career, and you might have [27:48]to implement something like this. I have [27:50]done it multiple times. It's an interesting [27:52]problem, and there's always this pattern, [27:54]which is you're going to upload a file, [27:56]right? We have already talked about this. [27:58]And the thing is, very important: [28:01]we're not going to [28:03]actually do a direct upload to your [28:05]server. So let's imagine here our file [28:07]service, very simple, and our database. [28:10]This right here is not going to happen. [28:12]So you're not going to upload the file directly [28:14]to your database, to your [28:17]relational database, because you might be [28:18]uploading a 20 GB movie or a 200 [28:23]megabyte image, and relational [28:26]databases are not meant for this. Not [28:28]only that, you should not be streaming [28:30]all of this data to your server, because [28:32]it might actually not allow for that, and [28:34]we should not, because avoiding it is going to [28:36]help you prevent some malicious [28:38]attacks from users. So you should not [28:40]upload directly to your server. And how do [28:44]you actually upload files then? So what [28:46]usually is done is that, [28:48]and yeah, you might also have for [28:49]example a timeout, because it's [28:51]going to be a lot of data for the server, [28:54]what usually happens is that you're [28:56]going to have an object storage. Imagine [28:58]for example an S3 bucket, a Google [29:00]Cloud bucket. So an object storage that [29:03]is just dedicated to holding files, static [29:05]files that don't change often or at all.

[29:09]The way this works is that you're going [29:11]to usually do the request to your API, [29:14]so to your gateway, with metadata, so [29:16]just a file name, the size of the file, [29:19]and also like a blob, so for example [29:21]your actual file, the image. But what [29:24]happens is that you're going to just [29:26]say, I want to upload this file. The [29:29]gateway is going to go to the file [29:30]service. It's going to go to the [29:32]relational database. It's going to say, [29:34]hey, the user wants to store a file. So [29:36]let's generate some metadata, [29:40]which would look something like [29:41]this. So for example, this would be [29:43]the files table. We're going to have [29:45]a newly generated ID for this file. We're [29:48]going to have the file name. We're going [29:49]to store the size. You might also want [29:51]to store, for example, the type, [29:54]in this case a PNG, something like this, [29:59]but not the file. This is the important [30:01]part, because what's going to happen is [30:02]that, okay, we have the metadata of the [30:04]file stored, so we can now check if [30:07]a file exists. What we should do is call [30:10]the object storage, for example Google [30:12]Cloud or something like that, and say, hey, [30:14]give me a link that my users can use to [30:17]upload directly to the bucket. So [30:19]basically it's going to return you a [30:22]link, which is going to be authenticated [30:24]or not. It's important that you allow [30:26]a very small window of upload, so this [30:29]is going to have an expiry date, [30:32]and it might or might not be [30:33]authenticated for uploading. I think [30:35]it's fine to leave it [30:36]unauthenticated, as long as it has a [30:39]small window of uploading, [30:42]and it's also going to restrict the size.
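A rough Python sketch of this step: store a metadata row, then hand back a short-lived, size-capped upload link. In practice the object storage provider generates and checks the link itself (for example S3 presigned URLs or Google Cloud Storage signed URLs). Here the signing is hand-rolled so that the example is self-contained, and `BUCKET_HOST`, `SIGNING_KEY`, `FILES_TABLE`, `request_upload`, `bucket_accepts`, the five-minute window and the size cap are all assumptions for illustration.

```python
import hashlib
import hmac
import time
import uuid
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SIGNING_KEY = b"demo-bucket-key"           # hypothetical key shared with the bucket
BUCKET_HOST = "https://bucket.example.com"  # hypothetical bucket address
MAX_UPLOAD_BYTES = 50 * 1024 * 1024         # illustrative size cap
FILES_TABLE: dict = {}                      # stands in for the relational files table


def _sign(file_id: str, expires: int, max_bytes: int) -> str:
    message = f"{file_id}:{expires}:{max_bytes}".encode("utf-8")
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def request_upload(name: str, size: int, content_type: str,
                   now: Optional[float] = None, ttl_seconds: int = 300) -> dict:
    """File service: save metadata only, then return a time-limited upload URL."""
    if size > MAX_UPLOAD_BYTES:
        raise ValueError("file is larger than the allowed upload size")
    file_id = str(uuid.uuid4())
    FILES_TABLE[file_id] = {"name": name, "size": size, "type": content_type}
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({
        "expires": expires,
        "max_bytes": MAX_UPLOAD_BYTES,
        "sig": _sign(file_id, expires, MAX_UPLOAD_BYTES),
    })
    return {"file_id": file_id, "upload_url": f"{BUCKET_HOST}/uploads/{file_id}?{query}"}


def bucket_accepts(upload_url: str, body_size: int, now: Optional[float] = None) -> bool:
    """Bucket side: accept the upload only if the link is genuine, fresh and small enough."""
    parsed = urlparse(upload_url)
    params = parse_qs(parsed.query)
    file_id = parsed.path.rsplit("/", 1)[-1]
    expires = int(params["expires"][0])
    max_bytes = int(params["max_bytes"][0])
    genuine = hmac.compare_digest(params["sig"][0], _sign(file_id, expires, max_bytes))
    current = time.time() if now is None else now
    return genuine and current < expires and body_size <= max_bytes
```

The file bytes never pass through `request_upload`; only the name, size and type do, which is the point of the pattern.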
[30:43]So we can say, for example, just 50 [30:45]megabytes of uploads through this link. [30:50]And with this solution, [30:52]we're going to respond with [30:54]the metadata, we're going to say, hey, it's [30:56]a 200, it's okay, and we're going to also [30:59]return this link right here, and the [31:02]user, or the front end, is going to [31:04]automatically start uploading the file, [31:06]or streaming the file, to this bucket. So [31:09]you're going to have the file [31:10]uploaded directly to the bucket. And [31:14]if you think about this solution, it's [31:15]actually pretty interesting, because [31:16]we're completely bypassing our [31:19]architecture. We're actually going [31:21]directly to the bucket and our servers [31:24]are completely untouched. [31:26]So if we go back to the start of [31:28]the video, we were saying that our file [31:30]service was getting bombarded with [31:32]upload requests and it was making our [31:35]servers really slow. We have just solved [31:37]that with the object storage. So [31:39]probably I should have introduced you [31:41]to this piece before. However, I also [31:45]want to show you that this is actually a [31:47]multiple-service architecture here. Let [31:50]me show you the problem. [31:52]Imagine now that we have actually [31:54]uploaded the file. So the file is [31:56]uploaded. But when the user uploads the [31:59]file, so when it hits the bucket, [32:01]the bucket is going to send an event to [32:04]our internal system saying, "Hey, the [32:07]file has been uploaded. Do some [32:09]actions." Some actions might be [32:11]generating a thumbnail, or calling the [32:14]real-time service to actually update all [32:16]of your personal computers with this [32:18]file. It might also be, I don't know, [32:20]the notification service sending [32:23]a push notification to your mobile [32:24]phone that a file has been uploaded.
So [32:28]this is kind of the service-to-service [32:29]communication that you're going to have. [32:31]And for this it doesn't make sense for [32:33]the object [32:36]storage to call, or even to know about, all of your [32:38]services. This is unrealistic. This is [32:40]not how it works. It just needs to know [32:41]about one service, about your API. How [32:44]does this work? We need of course to [32:46]have a new piece here. And this is [32:49]usually known as the broker, or as [32:52]the queue, Kafka, RabbitMQ, [32:55]whatever you want to have, the pub/sub [32:56]system. This is going to be basically [32:58]a broker, someone in the middle that is [33:00]going to receive messages, and then it's [33:03]going to just care about delivering [33:05]these messages to your services.
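To make the broker idea concrete, here is a tiny in-memory publish/subscribe sketch in Python. It is not how Kafka or RabbitMQ work internally (no network, no persistence); the `Broker` class, the `file.uploaded` topic name and the three subscribers are assumptions for illustration. The publisher only knows the broker and the topic, never the individual services.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class Broker:
    """In-memory stand-in for a message broker with topic-based fan-out."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict) -> int:
        """Deliver a copy of the event to every subscriber; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(event))  # each subscriber receives its own copy
        return len(handlers)


# Hypothetical services reacting to the same upload event.
actions: List[str] = []
broker = Broker()
broker.subscribe("file.uploaded", lambda e: actions.append(f"thumbnail:{e['file_id']}"))
broker.subscribe("file.uploaded", lambda e: actions.append(f"realtime:{e['file_id']}"))
broker.subscribe("file.uploaded", lambda e: actions.append(f"notify:{e['file_id']}"))

# The object storage publishes one event and does not know who is listening.
delivered = broker.publish("file.uploaded", {"file_id": "123"})
```

Adding a fourth service later is one more `subscribe` call; the publisher does not change.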

[33:08]So hopefully that wasn't complex. I [33:09]think it was a bit, but let me [33:12]break the problem down more and [33:15]show you why you want the [33:17]broker. This is kind of the whole [33:19]start of the microservices course, but [33:22]this is the idea. So we have our object [33:24]storage, which could be another service. It's [33:27]going to fan out multiple events, or just one [33:29]event, to these subscribers. Okay. So the [33:32]thumbnail service is going to generate a [33:33]thumbnail from the video, and [33:37]the real-time service wants to also update the [33:39]other machines that you have on your [33:42]network with that new file.

[33:45]Okay. So the flow is the following. [33:48]The client hits the API gateway [33:50]with a token. Authentication verifies. [33:52]All good. The metadata service creates a [33:54]row. It returns the upload URL. The [33:56]client is going to automatically [33:59]upload this to the object storage [34:01]directly. So we know that we should not [34:03]upload to our architecture directly. And [34:06]then the client uploads the bytes to the [34:08]bucket. All good. So far we have seen [34:10]everything. And then the thumbnail [34:12]service is going to consume the event, [34:14]generate a preview and write back. So [34:17]this is where the problem starts to [34:19]happen. [34:22]Let's break the problem down a bit [34:25]more. What happens if we just make [34:27]this synchronous call to the thumbnail [34:29]service after a video has been created? [34:31]So the video, or the image, is created, is uploaded, [34:34]and we need to create a [34:35]thumbnail, some side effect. And if the [34:39]object storage called our service directly, what [34:42]happens if, for example, our thumbnail [34:45]service was down? What happens if our [34:47]service was slow and the request [34:51]timed out? So for example if we got a 404 [34:55]or a 503 Service Unavailable. [34:58]Basically what happens is that our video [35:01]that we have just uploaded would be [35:03]without a thumbnail and we would not [35:05]know why. So of course platforms like [35:07]YouTube, Google Drive, they have [35:09]solutions for this. They have [35:11]reliability and durability of the [35:12]messages, of the events, and of course [35:15]this direct communication is a [35:18]no-go, and this is exactly why we don't want [35:21]to have this synchronous call. We want [35:23]to have someone in the middle that only [35:25]receives messages, these events, and is [35:27]highly durable and is able to deliver [35:31]them to the services.
So the second part [35:34]is, yeah, this is basically what I've [35:35]said, what if the service is down, and what [35:37]if we want to also fan out, or send [35:40]multiple copies of the same event, to [35:43]other services? So basically what we [35:45]have right here. So basically the [35:50]sending service would have to know to [35:53]notify all of these guys, which of course [35:55]doesn't make sense. It doesn't scale [35:57]well. A better solution would be of [36:00]course to have this guy in the middle, [36:03]which is our broker. So instead of the [36:07]object storage directly talking to the [36:09]service, it would call this broker, and [36:12]this broker would basically be highly [36:15]available. So it would make sure that [36:17]it's very good at staying up. It's not [36:20]crashing. The messages, even if it [36:21]crashes, are durable. They could be [36:23]stored in a separate database. It has [36:26]retry functionality. So if a message [36:29]is not delivered, for example, to the [36:31]file service, the broker knows it was not [36:34]acknowledged and it's going to [36:36]redeliver it after a certain period [36:39]of time. If the message is still not [36:41]delivered, we're going to put it into a [36:43]graveyard of messages, usually called [36:46]the dead letter queue. And [36:49]we're going to have to set up some [36:51]alerts in our system, to Slack, to [36:53]Discord, and this is going [36:56]to be a way for us to know that, hey, [36:57]message X was not delivered, so let's [37:00]actually take a look at what is [37:02]happening. So I'm getting into a [37:04]lot of topics, but this is really the [37:06]nature of distributed systems, of [37:08]microservices, and really the main thing [37:10]that I want to show you is that this [37:13]is again just like the beginning, with a [37:15]server and a database. [37:18]This is again separation of concerns, but at a [37:21]bigger scale.
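The retry and dead letter queue behavior described above can be sketched as follows. This is a simplified Python illustration, not the delivery logic of any real broker: `deliver`, `FlakyThumbnailService` and the limit of three attempts are assumptions, and a real broker would wait between attempts instead of retrying immediately.

```python
from typing import Callable, Dict, List


def deliver(handler: Callable[[Dict], None], message: Dict,
            dead_letters: List[Dict], max_attempts: int = 3) -> bool:
    """Try to deliver a message; park it in the dead letter queue if every attempt fails."""
    for _attempt in range(max_attempts):
        try:
            handler(message)
            return True  # the consumer finished without error: treat as acknowledged
        except Exception:
            continue  # not acknowledged: a real broker would back off before retrying
    dead_letters.append(message)  # give up and keep the message for inspection and alerts
    return False


class FlakyThumbnailService:
    """Hypothetical consumer that fails a fixed number of times before recovering."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success
        self.processed: List[str] = []

    def handle(self, message: Dict) -> None:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("thumbnail service unavailable")
        self.processed.append(message["file_id"])
```

A service that recovers within the retry budget still gets its message; one that stays down leaves the message in `dead_letters`, which is where an alert to Slack or Discord would hook in.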
The thumbnail service is [37:23]only responsible for generating [37:25]thumbnails, the real-time service for [37:27]real-time notifications, files the [37:30]same, and authentication the same as [37:32]well. All right, so one of the last [37:34]topics, because I could keep this video [37:36]going forever and scaling up the [37:38]problem. I want to finish just with [37:41]caching, CDNs and rate limiting, because [37:44]our application is scaling a lot. We're [37:46]getting now, for example, 10,000 [37:49]monthly active users. This is a lot, [37:51]right? [37:52]This means also that the business is [37:53]doing well. But there's another set of [37:56]problems that we're having. So we're [37:57]able to scale infinitely. We can just [37:59]throw money at a problem. We can [38:01]vertically and horizontally scale. Our [38:03]architecture looks nice and distributed. [38:06]But now we're starting to [38:10]get into the territory of optimizations, [38:12]and I wanted to leave this for the end [38:14]because you should not prematurely [38:16]optimize. This is a whole over-engineering [38:19]problem on its own. So I [38:21]leave it for the end, and this is where [38:24]we start saving resources, start [38:27]trying to shave those milliseconds off [38:29]requests. This is where caching comes in. So let's [38:33]imagine this classic problem: 500 [38:35]users trying to open the same file every [38:38]day, for example on the home screen of [38:39]your application. If they open the same [38:41]file every time, we're going to [38:44]always waste compute, [38:46]basically from the gateway to the [38:48]file service to the relational database to [38:50]the object storage and streaming back. So [38:52]the whole process we have seen for getting a [38:54]file: gateway, file service, relational database [38:58]for the metadata.
We get the metadata, [39:00]we go to the object storage, and the [39:01]object storage returns to the gateway, [39:03]and the gateway returns the file [39:05]enriched with all of the data. Okay, so [39:09]this is kind of the whole flow that we [39:11]want to optimize, and there are multiple [39:13]ways we can optimize here. We can first [39:14]optimize the image, the asset itself. [39:18]The image is another beast of its own, [39:20]because it's usually like 1 GB or 200 [39:24]megabyte images, and the metadata is [39:27]another thing. So let's start with the [39:28]metadata, right? Caching [39:31]basically is going to be a layer that [39:33]we're going to add here. So it's going [39:34]to be, for example, Redis. If you're [39:36]familiar with Redis, Redis is a key-value [39:38]store. It's a way to [39:40]cache values in RAM. [39:43]The reason why you want to cache in the RAM [39:45]of your computer is because it's fast [39:47]to access. It's much faster to get data [39:50]from RAM than to go to disk. So this is [39:52]kind of computer sciency, but this is [39:55]really why cache is nice, because we're [39:56]using RAM. However, as you can see [39:59]recently with AI and all of that, RAM is [40:01]very expensive. It's a scarce resource, and [40:04]so we want to use it just for small [40:06]things. So, you don't [40:08]put files in RAM. This is the first [40:10]thing: you might be tempted to [40:12]say, hey, let's just put our whole [40:14]file in RAM. This is usually not how you [40:17]do it. It's very expensive. And I say [40:20]usually because there are businesses [40:21]that do it, because it really provides [40:23]them a better service. I think that [40:25]Netflix does something like this. For [40:27]example, if a new movie or a series [40:30]is going to be very popular, they might [40:32]just put this in RAM just for that [40:34]initial burst of new users.
I'm getting [40:38]sidetracked, but basically we're going [40:40]to start by caching our file metadata. [40:43]So just the file name, the [40:46]properties, because this rarely, if [40:48]ever, changes. Okay. And [40:52]caching the file bytes in the cache, so [40:54]in Redis for example, is usually wrong. [40:56]I say usually because sometimes it might [40:57]be correct. A 20-megabyte video [41:00]in Redis is expensive memory, and Redis [41:03]isn't built to stream large blobs. So [41:06]how do we actually stream [41:09]cached videos? Okay. So how do we [41:11]actually cache our videos, our images, [41:14]our assets? This is where [41:16]another piece of the puzzle comes in, which is [41:18]the CDN. So the CDN is going to be here [41:21]at the edge, outside of our cluster. [41:24]The CDN is [41:26]basically a cache of its own, but it's [41:28]used for assets, big things. But [41:31]before I actually talk about the CDN, [41:33]let's understand how the cache works, [41:34]how [41:36]the algorithm goes. What's [41:38]going to happen is that whenever a [41:40]request comes in, actually, let me show [41:42]you. So we're going to get this image. [41:44]We're going to call the gateway, and the [41:46]gateway is going to call the file [41:47]service, and for example the file service [41:50]might go to the cache directly. And if [41:54]the file is in the cache, warm and [41:57]ready to serve, we can actually just [41:59]return this file from the cache instead [42:02]of going directly to the relational [42:04]database, to the object storage, and then [42:07]redirecting it. Well, we [42:09]need to actually go to the object [42:11]storage in this case, because we have to [42:13]get the file. But just to get the URL of [42:15]the file from the ID of the file, we [42:19]need to go to the relational database. [42:21]So there's another loop here.
This is [42:23]where the cache is going to come in, because [42:24]we're going to directly go to the cache [42:26]and the cache is going to avoid the [42:28]database, because the file metadata is in [42:31]the cache. So the algorithm is: does [42:33]the key, for example file 123, exist [42:37]in the cache? [42:39]If this is true, [42:42]we're just going to return it, right? So [42:44]we're just going to send it.

[42:47]But if this is the first time that this [42:49]image is getting requested and it's not [42:51]in the cache, what we're going to do is [42:52]of course go to the [42:54]database. So get it from the DB. And [42:58]once we have it, we can actually save it [43:00]to the cache so that the next request is [43:02]going to be cached. It's going to be [43:03]faster. And finally, we're going to of [43:05]course return to the user. So this is a [43:09]pattern that you see a lot whenever [43:11]you implement some cache mechanisms in [43:13]your back end. So this is a good pattern [43:16]to be familiar with. Now going back to [43:18]the CDN, [43:20]it's a bit the same as the cache, and so [43:22]I have added here some notes to make it [43:24]easier. So let's imagine that the CDN [43:26]has hundreds of small servers, or small [43:28]points of presence, around the world. So [43:30]that's the good thing about a [43:32]CDN provider, that they have more [43:36]presence around the world, because this [43:37]means that the user, when making the [43:39]request, instead of going to your server, [43:41]is going directly to the CDN. This is [43:44]the power of the CDN, because it [43:46]completely avoids this whole computation [43:47]we have done to just get the image. [43:50]Okay. So the first user, for example in [43:52]Lisbon, tries to fetch the file [43:54]123 and pulls it from the [43:57]database. He does everything we have [43:59]done before. But then the CDN is going [44:01]to cache this image, so that the next time [44:03]someone, and in this case even him or [44:06]anyone in Lisbon, is going to get that [44:08]image from the CDN. So the next 499 [44:12]users, for example, are going to get that [44:14]image way faster, for example in 20 [44:18]milliseconds instead of for example 1 [44:20]second, because that CDN server is [44:23]much closer to the users.
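The cache lookup walked through earlier (check the key, fall back to the database on a miss, save the result, return it) is commonly called the cache-aside pattern. A minimal Python sketch, where a plain dictionary stands in for Redis and another for the relational database; `MetadataStore` and the `file:123` key format are assumptions for illustration:

```python
from typing import Dict, Optional


class MetadataStore:
    """Cache-aside lookup for file metadata."""

    def __init__(self, database: Dict[str, Dict]) -> None:
        self.database = database          # stands in for the relational database
        self.cache: Dict[str, Dict] = {}  # stands in for Redis
        self.database_reads = 0           # counts how often we had to hit the database

    def get(self, key: str) -> Optional[Dict]:
        # 1. Does the key exist in the cache? If so, return it right away.
        if key in self.cache:
            return self.cache[key]
        # 2. Cache miss: go to the database.
        self.database_reads += 1
        row = self.database.get(key)
        # 3. Save it to the cache so the next request is faster.
        if row is not None:
            self.cache[key] = row
        # 4. Return to the caller.
        return row
```

With 500 users opening the same file, only the first request reaches the database. A CDN applies the same idea to the file bytes themselves, at a server close to the user. A production cache would also need an expiry or invalidation rule, which this sketch leaves out.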
For example, [44:26]if you have other users in the US, [44:29]it's like 300 milliseconds of [44:31]latency to Lisbon. So, it's nice [44:34]to have the images cached [44:37]near where your users are. So, this is where [44:39]the CDN comes in, and for our business it is [44:42]actually a very important building [44:44]block. And just to wrap up this whole [44:47]lesson, let me just go over rate [44:49]limiting, because it's also important. [44:51]Whenever you start scaling, [44:54]it's very important to have rate [44:55]limiting in place, otherwise malicious [44:58]users might try to exhaust the resources [45:00]of your infrastructure, and it's going to [45:02]scale up and waste money and resources and [45:05]affect your other users' experience, just [45:08]because they want to be malicious. [45:11]How do we work with this? [45:13]We already have all of the building [45:14]blocks in place. We have our cache, and [45:16]rate limiting is going to work with the [45:18]cache.
What's going to happen is that [45:19]we're going to have some layer, even on [45:22]the gateway itself, or even [45:25]the cloud provider might actually give [45:27]you that for free, but basically we're [45:29]going to have somewhere a rate [45:31]limiter, some piece of technology, [45:34]some block, you can imagine it could be [45:36]on the gateway, could be another service, [45:38]that is going to basically see the [45:40]request that is coming from your user, [45:42]identify him from the IP or any other [45:45]identifiable information, and then it's [45:47]going to write to the cache something [45:49]like this. And because the cache is a key-value [45:51]store, it's fast because it's in [45:53]RAM, you can say for example user [45:55]123 has made, I don't know, like [45:58]five requests in the last minute, and [46:02]we're going to block him if he gets to [46:03]10 requests. He's going to get a 429 [46:07]status code in HTTP, so he's going to be [46:10]rate limited. This is very simplistic. I [46:12]actually have a video on rate limiting [46:14]on my YouTube channel, if you want to [46:15]take a look at it. We actually go over [46:17]multiple algorithms, because there are [46:19]different ways to rate limit. But this [46:21]is kind of the idea: we're going [46:22]to have a layer that is going to count [46:24]the requests from each user, and we're [46:27]going to of course try to bucket them [46:29]into some total amounts. Now you [46:33]actually might be familiar with this [46:34]idea in a different way. For example, if [46:36]you use Claude Code or ChatGPT, they have [46:39]this idea of limits. You can just use [46:41]x amount of tokens per day. This is very [46:44]similar. It's a bit different because [46:45]it's kind of a currency, but it's [46:48]the same idea behind it: you are rate [46:52]limited on certain resources because [46:54]they are expensive to compute.
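One simple way to implement the counting described above is a fixed-window counter, sketched here in Python. The talk does not say which algorithm is meant, so the fixed window is an assumption, as are the class name, the limit of 10 requests and the 60-second window. A dictionary stands in for the key-value cache.

```python
import time
from typing import Dict, Optional, Tuple


class FixedWindowRateLimiter:
    """Count requests per user per time window and reject the ones over the limit."""

    def __init__(self, limit: int = 10, window_seconds: int = 60) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # Stands in for the cache; a real store would also expire old windows.
        self.counters: Dict[Tuple[str, int], int] = {}

    def check(self, user_id: str, now: Optional[float] = None) -> int:
        """Return 200 if the request is allowed, 429 if the user is rate limited."""
        current = time.time() if now is None else now
        window = int(current // self.window_seconds)  # which minute we are in
        key = (user_id, window)
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        return 200 if count <= self.limit else 429
```

The user could be identified by IP or by the user id from the token. A fixed window allows short bursts around the window boundary, which is one reason other algorithms such as sliding windows or token buckets exist.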
And [46:55]again, the cache is also pretty good at [46:58]this, at pre-calculating expensive [47:01]computations and storing them here in [47:03]memory. For example, getting [47:05]the image is not that expensive, though. [47:08]But another expensive computation might [47:09]be compiling some code. You might want [47:11]to store that in an object storage and [47:13]get that from cache, for example. [47:17]This is another example. So [47:19]throughout your career all of these [47:21]patterns are going to become basically [47:23]your friends. You're going to see them. [47:25]You can see the journey we [47:27]have made. And I really wanted to [47:31]show you in this video that you can go [47:32]from a very simplistic [47:34]architecture to something complex, and [47:37]along the way understand the pros [47:39]and the cons of every decision we [47:41]make in our system. And again, [47:43]architecture is not just about [47:44]infrastructure. As you saw with caching, [47:47]we were talking about algorithms, [47:49]product decisions, what database type [47:52]we're going to use. So it's [47:53]understanding the problem, the domain, [47:55]and then building some software solution [47:57]that is going to solve that in the best [47:59]way and that is going to scale in the [48:01]future. So hopefully [48:04]you got something from this video, you liked it. [48:07]Let me know in the comments and let's [48:08]continue the discussion there. Thank you [48:10]for watching.
