Software engineering has been changing very fast, especially in the last few months, and I think it's time we slow down and understand what is happening. We have all of these machines writing code for us: we talk to them and they write code. I use them myself every day, and I think it's wonderful. There are definitely some productivity boosts, but we're also seeing more and more error messages and outages on massive platforms. So I want to make a video that brings some knowledge about software architecture, because I think it's more important than ever to understand what we're building, not just how to build it. I have a lot of videos on my YouTube channel, and even before AI I always showed you the theory and the pros and cons of every decision I made. This goes deep into software engineering, and I want to give an introduction to software architecture for those who might not be so tech-savvy. In this video, we're first going to define what software architecture is, and then we're going to design a Google Drive clone, at the ideation level, step by step. We'll start very small, with just a server, and work our way up to a microservice architecture. Along the way, I'll try to distill every decision and show you the reasoning behind it. When it comes to software design, I always reference Martin Fowler, and I want to show you a phrase from one of his articles (the link is in the description, as always). But first, let's define architecture and why it's useful; the rest of the video is us architecting a Google Drive clone.
We're going to be able to upload and download files. We'll start very small, with just one server and the data on that server, and then gradually evolve the architecture. We'll talk about rate limiting, caching, horizontal scaling, vertical scaling, and a lot more. The goal is to show you the decisions, the gradual evolution, and the multiple ways you can think about designing software. It's not just about the code; it's about thinking in systems, and then thinking about code within those systems. Without further ado, let me highlight a bit of the article (again, it's linked in the description). Good architecture is important; otherwise, it becomes slower and more expensive to add new capabilities in the future. That matches the experience most of us have had building production software: if we don't build a good architecture, the business suffers. That's really why architecture matters. It supports the business so that revenue keeps coming in, the business keeps running smoothly, and we don't end up showing error messages like these to our clients.
Of course, I'm not blaming these companies; they know what they're doing. So, to start off, let's understand how to design software. This is going to be a crash course, and as I said, we'll evolve the design gradually, only moving to the next architecture when it's actually better than the previous one. I don't want to overengineer; I want to teach you an intentional decision-making process. An architecture might be the best one for the current stage of the company you're working at, or it might not. That's where decision-making comes in, and where you need to understand the pros and cons. There are whole books written about this, like this one here, which is one of my favorites; you probably know it. It's a heavy topic, so I'm going to stay mostly at the ideation level. Let's start very simple. I'm going to have a server here. This is how your first program might have started: you have just a server and you have users, and your users make requests to your server.
Very simple. This is going to be our Google Drive, and all the data lives inside the server. It's all one instance, and this works well. But I'm going to start asking some questions: what happens if the server dies, and what happens when you want to scale? Let's start with the first one. Notice that our data is coupled to the server; it actually lives inside the server. In Go, you could imagine a very simple struct where the data sits inside the server: a map of file ID to the location of the file. If the server dies, that data is lost too. The solution is probably something you already know: add a database. But the point isn't just adding a database; it's decoupling the server, which becomes stateless, from the database. So now we have our server and our database, and whenever a request comes in we get the data from the database, so our users don't lose their data when the server goes down. Now we have data persistence, and our first problem is solved. It's simple, and it will get more interesting, I promise, but it's important to understand why you'd want a database at all, and to understand these little details. That brings us to our second question: what happens when you want to scale? If you scale this instance with the data inside it, you'll end up with, say, two servers (horizontal scaling, which we'll get to). With multiple instances, each one holds its own copy of the data, which is not desirable.
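Here's a minimal Go sketch of the two designs just described, assuming a file record is simply an ID mapped to a location string. `StatefulServer` keeps that map in process memory; `StatelessServer` delegates to a `FileStore` interface standing in for the database. All type and method names, including the in-memory `MemStore` used to make the sketch runnable, are hypothetical.

```go
package main

import (
	"errors"
	"sync"
)

// ErrNotFound is returned when no record exists for a file ID.
var ErrNotFound = errors.New("file not found")

// StatefulServer keeps its data inside the process: a map of file ID to file
// location. If the process dies, the map is gone, and every extra copy of
// this server has its own, separate map.
type StatefulServer struct {
	mu    sync.Mutex
	files map[string]string // file ID -> file location
}

func NewStatefulServer() *StatefulServer {
	return &StatefulServer{files: make(map[string]string)}
}

func (s *StatefulServer) Save(id, location string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.files[id] = location
}

func (s *StatefulServer) Lookup(id string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	loc, ok := s.files[id]
	if !ok {
		return "", ErrNotFound
	}
	return loc, nil
}

// FileStore stands in for the database (hypothetical interface).
type FileStore interface {
	Put(id, location string) error
	Get(id string) (string, error)
}

// StatelessServer holds no file data of its own; every instance talks to the
// same FileStore, so any instance can serve any request.
type StatelessServer struct {
	Store FileStore
}

func (s *StatelessServer) Save(id, location string) error { return s.Store.Put(id, location) }

func (s *StatelessServer) Lookup(id string) (string, error) { return s.Store.Get(id) }

// MemStore is an in-memory FileStore for trying the sketch out; in production
// this role is played by a relational database shared by all instances.
type MemStore struct {
	mu   sync.Mutex
	rows map[string]string
}

func NewMemStore() *MemStore { return &MemStore{rows: make(map[string]string)} }

func (m *MemStore) Put(id, location string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.rows[id] = location
	return nil
}

func (m *MemStore) Get(id string) (string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	loc, ok := m.rows[id]
	if !ok {
		return "", ErrNotFound
	}
	return loc, nil
}
```

Two `StatefulServer` values each have their own map, so a write on one is invisible to the other: that's the synchronization problem. Two `StatelessServer` values sharing one store always see the same records.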
It will work to some extent, because the data is still there: if a request goes to the second server, it can be served from that server's copy. The problem is data synchronization. If the user writes to one machine, the data is written there, but the other machine won't know about it. We solve this with a decoupled instance right here: a single database. Now we can scale our servers, and it doesn't matter which instance the user hits, because there's one relational database holding the user tables, the file tables, everything. What we learn from this is the classic separation of concerns in software engineering: the server is concerned with serving the user, while the database is concerned with storing the data. If a server dies or is removed, the database doesn't care, because it doesn't know about the server; the server knows about the database. If the database goes down, though, that's different, because without the database there's no way to serve most user requests. Still, some requests don't need to touch the database at all, so there is some decoupling here. That's the key takeaway from this first lesson: data coupled to the machine means no scaling and no resilience. Let's take things a bit further. Here's our simple architecture. Now imagine our application is doing well: we're getting a lot of users and it's popular on the app store, but we're getting complaints that the app is slow to respond. The server isn't handling all of these requests as it should. So let's scale it to a second instance.
We've already seen that we can scale our servers independently, which is good, but adding more than one server brings a whole set of problems. We're starting to get into a distributed architecture, because we have multiple servers. In this case they don't need to talk to one another, but we need something in the middle that decides which server handles each request. Imagine most users are going to one server while the other has plenty of free resources: users get a degraded experience because traffic isn't being routed to the idle server. The piece in the middle that decides which server serves each user is usually called a load balancer. Let me add one to our architecture and duplicate our servers. Now we have a new piece in the middle whose job is to redirect requests to a healthy machine. Whenever you hear "load balancer", think of this piece in the middle that redirects traffic to healthy machines. It might use information about each server's load, or it might just use a round-robin algorithm. Round robin means that when user A comes in, the load balancer sends them to server one; when another user comes in, it sends them to server two, alternating sequentially between the servers so they both get the same amount of traffic. But is the traffic actually the same? Let's start designing our API as well. Say user A starts uploading a file while user B is just downloading files 1, 2, and 3. Is that the same kind of traffic?
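Round robin as described fits in a few lines. This is a minimal Go sketch, assuming backends are identified by plain name strings (the names are hypothetical); a shared atomic counter keeps the rotation correct when many requests arrive concurrently.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// RoundRobin hands out backends in a fixed rotation: 1, 2, 3, 1, 2, 3, ...
// It ignores how busy each backend is, which is exactly its limitation.
type RoundRobin struct {
	backends []string
	next     atomic.Uint64
}

func NewRoundRobin(backends ...string) (*RoundRobin, error) {
	if len(backends) == 0 {
		return nil, errors.New("round robin needs at least one backend")
	}
	return &RoundRobin{backends: backends}, nil
}

// Pick returns the next backend in the rotation; safe for concurrent use.
func (r *RoundRobin) Pick() string {
	n := r.next.Add(1) - 1
	return r.backends[n%uint64(len(r.backends))]
}
```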
Is it the same amount of work per request? Of course not. The upload is more expensive in storage and possibly in CPU, so the two requests put very different loads on the servers. Round robin is already pretty good and solves our immediate problem, but a load balancer can be smarter and use health checks to understand how our servers are doing. If one server is doing much better than the other, because the other is busy receiving an upload, the load balancer knows to send the next request to the healthier one. There are multiple load balancers out there that use different techniques. Load balancers can also do much more. We're not getting into microservice architecture yet, but this solves the problem I just described: we could have a service called files that handles only files (let me move this down a bit so we have more space). If we know a user is about to upload a file and that this would slow our servers down, we can redirect them to a specialized service that only handles file uploads, while the other servers keep handling normal traffic and stay healthy.
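A smarter picker can combine health checks with load. The Go sketch below assumes some health checker already reports, per backend, whether it's healthy and how many requests it's currently handling; the `Backend` fields are hypothetical. It skips unhealthy servers and picks the least busy one, which is essentially the "least connections" strategy.

```go
package main

import "errors"

// Backend is a hypothetical snapshot of what a health check might report.
type Backend struct {
	Name     string
	Healthy  bool
	InFlight int // requests currently being handled (e.g. uploads in progress)
}

var ErrNoHealthyBackend = errors.New("no healthy backend")

// PickLeastLoaded skips backends that failed their health check and returns
// the healthy one with the fewest in-flight requests. Ties go to the backend
// that appears first in the slice.
func PickLeastLoaded(backends []Backend) (string, error) {
	best := -1
	for i, b := range backends {
		if !b.Healthy {
			continue
		}
		if best == -1 || b.InFlight < backends[best].InFlight {
			best = i
		}
	}
	if best == -1 {
		return "", ErrNoHealthyBackend
	}
	return backends[best].Name, nil
}
```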
Load balancers can also route on paths: if a request is a POST to an upload endpoint, you can send it straight to that server. Rather than spending much more time on load balancers, let me show you three other strategies. There's weighted round robin. There are sticky sessions, which I find pretty interesting; I worked on something similar recently. With sticky sessions, when user A hits the load balancer, they get sent to a server, and on the next request the load balancer makes a best effort to send them to the same server. Why would we want that? Mostly because if the server holds some state for this user, for example work in progress, it makes sense to keep sending them there. And then there's least connections: if this server has fewer open connections, use that one, because it's more available. To make this concrete, I think it helps to show you some code and a real web server running; I think this is going to make it click. I have three Go services here, three servers with exactly the same code. It's a simple HTTP server with some session handling, which I'll show you, for the cookies and sticky sessions, plus a generated HTML page. The server just reads cookies and creates sessions. The interesting part is the NGINX config, because it has two upstream blocks, which are two groups of servers. This means our NGINX reverse proxy load balances differently depending on the URL: /rr uses round robin, and /sticky uses sticky sessions.
Let's see how it works in practice. I run docker compose up (the servers weren't running), and there we go. If I open the browser at /rr and refresh the page, I get server one, two, and three in turn, so round robin is doing its job. Now I switch to the /sticky location, where the session cookie applies. I refresh and get server three. I refresh again, and I always get server three. You can see the number of hits here, and this is the session cookie that identifies me; in a real app, it could be your authentication token. Every time, the load balancer sends me to the same server: that's session affinity load balancing. Now I clear the cookies and refresh to see what happens. I got server one. I refresh again and get server three, and as you can see, it really likes server three for some reason. That's exactly how sticky sessions work. Now that we know how to coordinate traffic across our servers, we can scale like crazy, because we have a new building block, the load balancer, that can redirect traffic. With the load balancer unlocked, we can run as many servers as we want, as long as we can pay for them. This is a good point to distinguish between the two types of scaling. The first is horizontal scaling, which is what we've been doing.
Horizontal scaling means adding more machines to your cluster. This makes sense because more machines means more resources and more power, and each machine is independent of the others: while one is busy with some computation, another is free to serve another user, or even another group of users. The other way to scale is to make the server itself bigger: you have one server, but it's a beefy machine with more RAM, more CPU, more storage. You're increasing the compute power of the machine, and you'll sometimes hear this described as throwing money at the problem. This is called vertical scaling, as opposed to horizontal scaling. Most of the time you'll use both: beefier machines, scaled horizontally. There isn't just one way to go about it; I just wanted to make sure the distinction is clear. This also lets us make the load-balancing setup smarter, because more often than not, more machines means more money spent, and beefier machines also mean more money spent. So we want to run as few machines as we can get away with. Autoscalers, often working together with the load balancer, let you do exactly that: you can keep a minimum of, say, two instances, and if you get a lot of traffic, scale up to ten. Kubernetes does this, and Google Cloud does as well, either with servers or with serverless solutions. Serverless means you don't have to care about the server or about provisioning it; the provider does that for you. You just hand over a Docker image, and the platform runs the containers for you. But that's for another time.
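To make "scale from two up to ten" concrete, here's a Go sketch of the core rule the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric), clamped to a minimum and maximum. The real HPA also applies a tolerance band and stabilization windows, which this sketch leaves out; the function name and the CPU figures in the usage note are illustrative assumptions.

```go
package main

import "math"

// DesiredReplicas follows the shape of the HPA scaling rule,
// ceil(current * currentMetric / targetMetric), clamped to
// [minReplicas, maxReplicas]. Tolerance and stabilization are omitted.
func DesiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	if targetMetric <= 0 {
		return current // no meaningful target; leave the fleet as is
	}
	desired := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}
```

For example, with a 60% CPU target, two instances running at 90% would be scaled to three, while two instances at 30% stay at the minimum of two rather than dropping to one.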
In other words, the load-balancing layer can know how much traffic is too much and provision more instances for you. Pretty interesting. Now let's say our application is profitable. It works, our customers are happy, we're scaling well, and we're making a lot of revenue, so we start hiring more people: more team members, more engineers. At the same time, some endpoints are getting slow and we're getting reports about it. So what's happening here? This opens the door to a whole new, and dangerous, world: microservices. We already started that journey with our file service, because some customers mentioned that uploads were getting slower. Whenever files were uploaded, things slowed down, because every server was handling everything. With microservices, we solve this pain by having multiple specialized services, each of which does one thing and does it very well.
Microservices are a whole topic of their own, and I want to mention that I have a 20-hour course on microservices alone, using Go of course. It's the first link in the description if you're interested, and the first three hours are completely free, so you can watch them there. There's a lot of theory in it, and it's important to understand that microservices are a complex topic. Most of the time, you'd reach for microservices when you're working on a big business; that's why I mentioned hiring in our hypothetical case. It's useful to have, for example, one team working only on files and another team working only on notifications. Since we're talking about services, let's think about our domain, our Google Drive clone, and split it into services. We have our file service, which handles uploading and serving files. Then notifications, which handles things like push notifications for web and desktop. We might have an auth service, which just handles authentication. And a real-time service, which covers anything that has to happen in real time: for example, you upload a file from your computer, it gets synced to the cloud, and the cloud syncs it to your other personal computers. That gives us a first separation to start working with.
These are four domains, and four teams could each work on one of these microservices. That raises a question, and you should know the answer by now because I touched on it earlier. Say this user wants to authenticate, so they send POST /api/login. Clearly that should hit the authentication service, but how does the load balancer know to send it there? If you follow the flow, the user calls our endpoint and the load balancer forwards the request to the auth service because it knows this path belongs to auth. Some load balancers can do this and some can't, so the load balancer might not be the right piece for this job. Not only that: once we add auth, where does authentication live? Can a request call the notification service directly? Can it bypass authentication and reach the file service through the load balancer? So I've restructured everything, because the load balancer won't be able to handle this on its own. Let's add a new piece: the gateway, usually called the API gateway. The API gateway receives a request, analyzes it, and forwards it to the right service. It can also aggregate responses. For example, to get your authentication details we go to the auth service, but we might also want to call another service, like a profile service, to get the user's profile.
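The gateway's routing decision boils down to "which service owns this path?". Here's a minimal Go sketch using longest-prefix matching; the route table, path prefixes, and internal hostnames are all hypothetical. In a real Go gateway you'd hand the chosen address to something like `httputil.NewSingleHostReverseProxy` to actually forward the request.

```go
package main

import "strings"

// routes maps path prefixes to internal service addresses (hypothetical).
var routes = map[string]string{
	"/api/login":         "http://auth-service:8080",
	"/api/files":         "http://file-service:8080",
	"/api/notifications": "http://notification-service:8080",
	"/api/realtime":      "http://realtime-service:8080",
}

// Route returns the backend for a path using longest-prefix matching, so a
// more specific prefix wins over a shorter one. ok is false when nothing
// matches, which a gateway would typically turn into a 404.
func Route(path string) (backend string, ok bool) {
	bestLen := -1
	for prefix, target := range routes {
		if matches(path, prefix) && len(prefix) > bestLen {
			backend, bestLen = target, len(prefix)
		}
	}
	return backend, bestLen >= 0
}

// matches requires the prefix to end on a path-segment boundary, so that
// "/api/filesystem" does not match "/api/files".
func matches(path, prefix string) bool {
	if !strings.HasPrefix(path, prefix) {
		return false
	}
	return len(path) == len(prefix) || path[len(prefix)] == '/'
}
```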
That's a hypothetical case, but the gateway coordinates those calls and returns a single, aggregated response to the user. This also solves the second problem I mentioned: we don't want users to communicate directly with our services. So we put all of the services in a private network, which you'll usually see as a VPC (virtual private cloud). The gateway becomes the only entry point into our system. None of the services expose their ports to the outside; their IPs exist only inside this virtual network, and the only way for customers to reach our business is through the gateway. The gateway handles authentication, routing, and so on. I've kept the load balancer here, and I'll drop it from the next diagrams, but we might also want to scale the gateway. If the gateway becomes a single point of failure, it makes sense to put a load balancer in front of it, and also to have load balancers inside our cluster, in front of the services that need more resources. The gateway itself can be scaled vertically, or horizontally as well. I'll leave it at that for now; in the next diagrams I'll always omit the load balancer, but you can imagine one sitting in front of these pieces. Now that we have a distributed architecture, we need authentication. Of course we always needed it, but let's dive deeper and see how to do it in a distributed architecture.
It's pretty much the same as with a monolith running on a single server, like we had before. Let's think about how the files endpoint works. We want to upload some files, so the user sends the payload. The gateway verifies that the user has a JWT, a JSON Web Token. I talk about this on the channel and in all of my courses, but basically it's a signed token that proves you're authenticated, and it carries an expiry date. The gateway could call the authentication service to verify it, but that adds a network round trip. Instead, you might add a security layer on the gateway, or even in front of it, to verify the token. I won't go into detail about whether this is a separate service or part of the gateway; every architecture is different. You might have a dedicated layer, or just the gateway handling authentication, or at least the basic part of it: checking that the token is valid. The user sends the token in the Authorization header (or in a cookie). There's no network call to the authentication service, because we only check whether this request carries a valid token: we verify the signature and make sure it hasn't expired, and if it passes, we trust it and forward the request to the file service. So right at the edge, we can reject the request with a 401 Unauthorized status code. But we also have the authentication service, so what does it do? How do we get a token in the first place? That's where token generation comes in, and somewhere on every website there's a login page. Here's how it works: you send your credentials.
That could be an email and password, or an email with a single sign-on token. Either way, the user requests a token using valid credentials, so the user must already exist in your system, in your database. The gateway forwards the request to the authentication service, which signs a token with a private key, so the token can later be proven valid, and returns it. At the gateway layer, you might first check that the user exists by calling another service. In this diagram I've added a few different services, like a thumbnail service, so don't worry about those; for example, a user service could be responsible for creating and validating users, and only if the user exists do we call the authentication service. These could be one service or separate services; that's up to you. The point is that the gateway sits in the middle of those decisions. One more thing on auth in general. What we've covered is authentication, which answers "do you have access to the application?". Then there's authorization, which answers "do you have permission to, say, upload files?". Authorization is different from authentication, although they're related. You could put both in the same service, but authorization can also run in every service, each with its own permission rules. Now that we have our microservice architecture, let's look at how we upload files, because there's a new piece in the puzzle: the object storage. So far we've ignored how to upload files, even though that's our core business.
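Authorization is a separate check that runs after authentication has established who the user is. As a minimal Go sketch, assuming a simple hypothetical role-to-permission table owned by the file service, it looks like this. A failed check here usually maps to 403 Forbidden, whereas a missing or invalid token is a 401.

```go
package main

// Permission and the role table below are hypothetical; each service
// (files, notifications, ...) could keep its own.
type Permission string

const (
	PermUploadFile Permission = "files:upload"
	PermReadFile   Permission = "files:read"
)

var rolePermissions = map[string][]Permission{
	"viewer": {PermReadFile},
	"editor": {PermReadFile, PermUploadFile},
}

// Can answers "is this already-authenticated user allowed to do this?".
// Unknown roles get no permissions.
func Can(role string, p Permission) bool {
	for _, granted := range rolePermissions[role] {
		if granted == p {
			return true
		}
	}
	return false
}
```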
But I wanted to put all the pieces on the table first, so you understand how this works, because things get complex once multiple services start communicating with one another, which is what we're going to work on now. Going back a bit: how do file uploading and file serving work? This is a recurring problem, and at some point in your career you might have to implement something like it; I've done it multiple times. It's an interesting problem, and the pattern is always the same. The important thing is that we are not going to upload the file directly to our server. Imagine our file service, very simple, and our database. What is not going to happen is the file being uploaded straight into your relational database: you might be uploading a 20 GB movie or a 200 MB image, and relational databases are not meant for this. You also shouldn't stream all of that data through your server: the server might not allow it, the request might hit a timeout because it's so much data, and keeping large uploads away from your servers also protects you from some malicious attacks by users. So you should not upload directly to your server. How do you upload files, then? What's usually done is to use an object storage, for example an S3 bucket or a Google Cloud Storage bucket: storage dedicated to holding static files that change rarely or not at all.
Here's how it works. You make a request to your API, to your gateway, with only the metadata: the file name, the file size, and so on, not the blob with the actual file bytes. You're just saying, "I want to upload this file." The gateway forwards that to the file service, which goes to the relational database and says, "the user wants to store a file," and saves some metadata. This is the files table: a newly generated ID for the file, the file name, the size, and maybe the type, for example PNG, but not the file itself. That's the important part. We now have the file's metadata stored, so we can check whether a file exists. Next, the file service calls the object storage (Google Cloud Storage or something similar) and says, "give me a link my users can use to upload directly to the bucket." It returns a link, and it's important that this link only allows a very small upload window: it has an expiry date. The link may or may not require the user's credentials for uploading. I think it's fine for it not to, as long as the upload window is small and the link also restricts the size; for example, at most 50 MB through this link. With this solution, we respond to the user with the metadata, a 200 OK, and this link, and the user's front end automatically starts uploading, or streaming, the file to the bucket.
So the file ends up uploaded directly to the bucket. If you think about it, this solution is pretty interesting, because we completely bypass our own architecture: the upload goes straight to the bucket and our servers are untouched. Going back to the start of the video, our file service was getting bombarded with upload requests, which made our servers really slow, and we've just solved that with object storage. I probably should have introduced this piece earlier. But I also want to show you that this is now a multi-service architecture, so let me show you the problem. Imagine the file has been uploaded. When it hits the bucket, the bucket sends an event to our internal system saying, "Hey, a file has been uploaded, do something about it." That could be generating a thumbnail, calling the real-time service to sync the file to all of your personal computers, or calling the notification service to send a push notification to your phone saying a file was uploaded. This is the service-to-service communication you're going to have. It doesn't make sense for the object storage to know about all of your services; that's unrealistic and not how it works. It only needs to know about one destination in your system. So how does this work? We need a new piece, usually known as the broker: a queue, Kafka, RabbitMQ, a pub/sub system, whatever you prefer. The broker sits in the middle, receives messages, and takes care of delivering them to your services.
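The core idea of the broker, one publisher fanning out to many subscribers that it doesn't know about, can be sketched in memory with Go channels. This toy `Broker` and the `file.uploaded` topic name are hypothetical; Kafka, RabbitMQ, or a managed pub/sub service add what it lacks, such as persistence, retries, and consumer groups.

```go
package main

import "sync"

// Event is a hypothetical message shape for "a file was uploaded".
type Event struct {
	Topic  string
	FileID string
}

// Broker is a toy in-memory pub/sub: publishers only know the broker, and the
// broker fans each event out to every subscriber of that topic.
type Broker struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewBroker() *Broker { return &Broker{subs: make(map[string][]chan Event)} }

// Subscribe returns a buffered channel that receives events for topic.
func (b *Broker) Subscribe(topic string, buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers ev to every subscriber of ev.Topic. It blocks while a
// subscriber's buffer is full; a real broker would persist the message instead.
func (b *Broker) Publish(ev Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[ev.Topic] {
		ch <- ev
	}
}
```

In this picture, the thumbnail, real-time, and notification services each subscribe to the upload topic, and the bucket's event is published once.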








