<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Arslan's Tech Blog]]></title><description><![CDATA[With over 2 decades of technology and leadership experience, I help startups in designing, developing, scaling, and rolling out concepts using state-of-the-art ]]></description><link>https://blog.arslanali.io</link><generator>RSS for Node</generator><lastBuildDate>Mon, 13 Apr 2026 00:09:32 GMT</lastBuildDate><atom:link href="https://blog.arslanali.io/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Deploying Applications on Google Cloud Run: Maximizing Cost Efficiency with Serverless Containers]]></title><description><![CDATA[Google Cloud Run is a fully managed compute platform that enables you to run stateless containers without worrying about the underlying infrastructure. 
It abstracts away server management, automatically scales your applications based on demand, and o...]]></description><link>https://blog.arslanali.io/deploying-applications-on-google-cloud-run-almost-free-of-cost</link><guid isPermaLink="true">https://blog.arslanali.io/deploying-applications-on-google-cloud-run-almost-free-of-cost</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[google cloud run]]></category><category><![CDATA[knative]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Arslan Ali Ansari]]></dc:creator><pubDate>Wed, 12 Mar 2025 21:45:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1741804375242/87ae3044-b1e5-4247-b95d-950e9c549d19.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Google Cloud Run is a fully managed compute platform that enables you to run stateless containers without worrying about the underlying infrastructure. It abstracts away server management, automatically scales your applications based on demand, and only charges you for the resources you use.</p>
<p>It is essential to understand how Cloud Run differs from other serverless technologies and even from the traditional way of deploying on Kubernetes.</p>
<ol>
<li><p><strong>Serverless vs. Cloud Run</strong>: While both are serverless in nature, Cloud Run specifically focuses on running containerized applications. Unlike traditional serverless platforms (e.g., AWS Lambda, Google Cloud Functions), which are limited to specific runtimes and code formats, Cloud Run allows you to deploy any containerized application, giving you more flexibility.</p>
</li>
<li><p><strong>Traditional Kubernetes vs. Cloud Run</strong>: Kubernetes (e.g., GKE) requires you to manage clusters, nodes, and scaling policies. Cloud Run, on the other hand, is fully managed, eliminating the need for cluster management. It automatically scales to zero when idle and scales up instantly during traffic spikes, making it more cost-effective and easier to use for lightweight, event-driven workloads.</p>
</li>
</ol>
<p>Before we start with the deployment of our first application on Google Cloud Run, I want to showcase the pricing plan and highlight that up to 2M requests per month are already free. Even after that, it is only $0.40 per million requests. Review the detailed pricing <a target="_blank" href="https://cloud.google.com/run/pricing">here.</a></p>
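<p>As a back-of-the-envelope check, the free-tier math can be scripted. The numbers below are assumptions for illustration (2M free requests per month and a tier-1 price of $0.40 per additional million; verify current rates on the pricing page), and they exclude CPU/memory charges:</p>

```shell
# Request-based cost estimate for 5M requests/month (illustrative numbers only).
REQUESTS=5000000
FREE_TIER=2000000
PRICE_PER_MILLION=0.40
# Requests beyond the free tier are billable; never negative.
BILLABLE=$(( REQUESTS > FREE_TIER ? REQUESTS - FREE_TIER : 0 ))
COST=$(awk "BEGIN { printf \"%.2f\", $BILLABLE / 1000000 * $PRICE_PER_MILLION }")
echo "5M requests => \$$COST in request charges (CPU/memory billed separately)"
```

<p>At 5 million requests a month, the request charges alone stay close to a dollar, which is why small projects often run effectively free on Cloud Run.</p>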
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1741805712357/25b3b779-9dcf-4ef9-b152-9497c0b81857.png" alt class="image--center mx-auto" /></p>
<p>Now that we understand the basics of Google Cloud Run and how cheap it is to host our applications there, let’s try creating a “Hello World” application.</p>
<h2 id="heading-pre-requisites">Pre-requisites:</h2>
<ol>
<li><p>Google Cloud Account</p>
</li>
<li><p>Billing Enabled</p>
</li>
<li><p>Google Cloud Run API Enabled</p>
</li>
<li><p>gcloud CLI installed on your local machine (you can also use Cloud Shell)</p>
</li>
</ol>
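<p>Before proceeding, it may help to confirm the required CLIs are actually installed. A small helper sketch (the <code>need</code> function below is just an illustration, not part of gcloud):</p>

```shell
# Pre-flight check: confirm the required CLIs are on PATH before starting.
need() { command -v "$1" >/dev/null 2>&1 || { echo "missing: $1"; return 1; }; }
need gcloud || echo "install the gcloud CLI, or use Cloud Shell"
need docker || echo "install Docker, or build with 'gcloud builds submit' instead"
```
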
<h2 id="heading-agenda">Agenda:</h2>
<p>During this hands-on exploration of deployment on <strong>Google Cloud Run</strong>, we will go through the following concepts:</p>
<ol>
<li><p>Enable Cloud Run API using CLI</p>
</li>
<li><p>Create a simple Node.js Application</p>
</li>
<li><p>Create a Dockerfile to containerize the application</p>
</li>
<li><p>Create and push the image to Google Container Registry (GCR)</p>
</li>
<li><p>Deploy the Application</p>
</li>
<li><p>Clean up by deleting the service and images.</p>
</li>
</ol>
<h3 id="heading-cloud-run-api-setup">Cloud Run API setup</h3>
<p>If you are already logged in with the gcloud CLI, list your accounts to verify that the right one is active:</p>
<pre><code class="lang-bash">gcloud auth list
</code></pre>
<p>You should see an output like:</p>
<pre><code class="lang-plaintext">Credentialed accounts:
 - myaccount@mydomain.com (active)
</code></pre>
<p>Now check the project configuration to verify that the right project is selected as the current one (create a project first if required):</p>
<pre><code class="lang-bash">gcloud config list project
</code></pre>
<p>This will give output like the following:</p>
<pre><code class="lang-plaintext">[core]
project = nodejs-gcp-66776654433
</code></pre>
<p>For more information on gcloud CLI, review the cli documentation <a target="_blank" href="https://cloud.google.com/sdk/gcloud">here</a>.</p>
<p>Now that we have selected the right account and project, let’s start by enabling the Cloud Run API. You can do this from the APIs & Services section of the Google Cloud Console, but let’s use the gcloud CLI instead; run the following command:</p>
<pre><code class="lang-bash">gcloud services <span class="hljs-built_in">enable</span> run.googleapis.com
</code></pre>
<p>Enabling the API can take a moment. Once done, let us set the compute region and the environment variables we will use later in the docker commands.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> LOCATION=<span class="hljs-string">"us-east1"</span>
<span class="hljs-built_in">export</span> GOOGLE_CLOUD_PROJECT=<span class="hljs-string">"nodejs-gcp-66776654433"</span>
gcloud config set compute/region <span class="hljs-variable">$LOCATION</span>
</code></pre>
<p>Set <code>LOCATION</code> to your preferred region.</p>
<h3 id="heading-write-a-nodejs-hello-world-application">Write a Nodejs Hello World Application</h3>
<p>To write an Express-based Node.js application, you will need two files: a <code>package.json</code> listing all the dependencies and an <code>index.js</code> file with the application logic.</p>
<ol>
<li><p>Create a <code>package.json</code> file and write the following content into it. (You can also use npm init, if you have npm installed on your machine)</p>
<pre><code class="lang-bash"> vi package.json
</code></pre>
<pre><code class="lang-json"> {
   <span class="hljs-attr">"name"</span>: <span class="hljs-string">"helloworld"</span>,
   <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Simple hello world sample in Node"</span>,
   <span class="hljs-attr">"version"</span>: <span class="hljs-string">"1.0.0"</span>,
   <span class="hljs-attr">"main"</span>: <span class="hljs-string">"index.js"</span>,
   <span class="hljs-attr">"scripts"</span>: {
     <span class="hljs-attr">"start"</span>: <span class="hljs-string">"node index.js"</span>
   },
   <span class="hljs-attr">"author"</span>: <span class="hljs-string">"Google LLC"</span>,
   <span class="hljs-attr">"license"</span>: <span class="hljs-string">"Apache-2.0"</span>,
   <span class="hljs-attr">"dependencies"</span>: {
     <span class="hljs-attr">"express"</span>: <span class="hljs-string">"^4.17.1"</span>
   }
 }
</code></pre>
</li>
<li><p>Create <code>index.js</code> file and write the following content into it:</p>
<pre><code class="lang-javascript"> <span class="hljs-keyword">const</span> express = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express'</span>);
 <span class="hljs-keyword">const</span> app = express();
 <span class="hljs-keyword">const</span> port = process.env.PORT || <span class="hljs-number">8080</span>;

 app.get(<span class="hljs-string">'/'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
   <span class="hljs-keyword">const</span> name = process.env.NAME || <span class="hljs-string">'World'</span>;
   res.send(<span class="hljs-string">`Hello <span class="hljs-subst">${name}</span>!`</span>);
 });

 app.listen(port, <span class="hljs-function">() =&gt;</span> {
   <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`helloworld: listening on port <span class="hljs-subst">${port}</span>`</span>);
 });
</code></pre>
<p> You can try running it locally by first installing the dependencies with <code>npm install</code> and then running <code>node index.js</code>.</p>
</li>
</ol>
<h3 id="heading-containerize-the-application-using-dockerfile">Containerize the Application using Dockerfile</h3>
<p>The Docker daemon uses a Dockerfile to build a Docker image, which is later used to run containers. Create a file named <code>Dockerfile</code> in the same folder with the following content:</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Use the official lightweight Node.js 12 image.</span>
<span class="hljs-comment"># https://hub.docker.com/_/node</span>
<span class="hljs-keyword">FROM</span> node:<span class="hljs-number">12</span>-slim

<span class="hljs-comment"># Create and change to the app directory.</span>
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /usr/src/app</span>

<span class="hljs-comment"># Copy application dependency manifests to the container image.</span>
<span class="hljs-comment"># A wildcard is used to ensure copying both package.json AND package-lock.json (when available).</span>
<span class="hljs-comment"># Copying this first prevents re-running npm install on every code change.</span>
<span class="hljs-keyword">COPY</span><span class="bash"> package*.json ./</span>

<span class="hljs-comment"># Install production dependencies.</span>
<span class="hljs-comment"># If you add a package-lock.json, speed your build by switching to 'npm ci'.</span>
<span class="hljs-comment"># RUN npm ci --only=production</span>
<span class="hljs-keyword">RUN</span><span class="bash"> npm install --only=production</span>

<span class="hljs-comment"># Copy local code to the container image.</span>
<span class="hljs-keyword">COPY</span><span class="bash"> . ./</span>

<span class="hljs-comment"># Run the web service on container startup.</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [ <span class="hljs-string">"npm"</span>, <span class="hljs-string">"start"</span> ]</span>
</code></pre>
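<p>One caveat with <code>COPY . ./</code>: it copies everything in the build context, including a local <code>node_modules</code> folder if you previously ran <code>npm install</code> on your machine. A <code>.dockerignore</code> file next to the Dockerfile (suggested content below) keeps such artifacts out of the image:</p>

```plaintext
node_modules
npm-debug.log
.git
```
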
<p>Now you need to build your Docker image using either the local Docker daemon or, if you are running these commands in Google Cloud Shell, the <code>gcloud</code> CLI to build it in the cloud. Here are both procedures:</p>
<ol>
<li><p>Build Docker Image using Local Docker Daemon (Docker Desktop in most cases)</p>
<pre><code class="lang-bash"> docker build . -t gcr.io/<span class="hljs-variable">$GOOGLE_CLOUD_PROJECT</span>/helloworld:1.0.0
 docker push gcr.io/<span class="hljs-variable">$GOOGLE_CLOUD_PROJECT</span>/helloworld:1.0.0
</code></pre>
<p> Remember, we already created the <code>GOOGLE_CLOUD_PROJECT</code> environment variable. In Cloud Shell, you will not need to create it, as it is set by default.</p>
<p> The first command builds the image, whereas the second pushes it to GCR (Google Container Registry).</p>
</li>
<li><p>Build and push Docker image using <code>gcloud</code> cli by running the following command</p>
<pre><code class="lang-bash"> gcloud builds submit --tag gcr.io/<span class="hljs-variable">$GOOGLE_CLOUD_PROJECT</span>/helloworld
</code></pre>
<p> The above command builds the image in the cloud and pushes it to the registry in one step.</p>
</li>
<li><p>You may list the images by using the following commands</p>
<pre><code class="lang-bash"> gcloud container images list
 docker images
</code></pre>
<p> The first command lists the images in GCR, whereas the second lists the images on your local machine.</p>
</li>
<li><p>Let’s test the application by running the container locally:</p>
<pre><code class="lang-bash"> docker run -d -p 8080:8080 gcr.io/<span class="hljs-variable">$GOOGLE_CLOUD_PROJECT</span>/helloworld
</code></pre>
<p> The above command pulls the image from GCR if it is not already available locally, then runs a container from it. To test the application, run <code>curl localhost:8080</code>; you should see a <code>Hello World!</code> response from the container.</p>
</li>
</ol>
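<p>If you script this smoke test, a small retry helper avoids racing the container startup. This is an illustrative sketch, not part of any toolchain:</p>

```shell
# Poll a URL until it responds (or retries run out), then report status.
wait_http() {
  url=$1; tries=${2:-10}; delay=${3:-1}; i=0
  while [ "$i" -lt "$tries" ]; do
    curl -sf "$url" >/dev/null 2>&1 && { echo "up: $url"; return 0; }
    i=$((i + 1)); sleep "$delay"
  done
  echo "down: $url"; return 1
}
# After `docker run -d -p 8080:8080 ...`:
# wait_http http://localhost:8080 && curl localhost:8080
```
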
<h3 id="heading-deploy-the-image-as-a-cloud-run-service">Deploy the Image as a Cloud Run Service</h3>
<p>Deploying any Docker image as a container on Google Cloud Run is fairly simple and straightforward; use the following command:</p>
<pre><code class="lang-bash">gcloud run deploy --image gcr.io/<span class="hljs-variable">$GOOGLE_CLOUD_PROJECT</span>/helloworld \
--allow-unauthenticated --region=<span class="hljs-variable">$LOCATION</span>
</code></pre>
<p>We used the <code>--allow-unauthenticated</code> flag because we want to keep the application public, and the same <code>$LOCATION</code> environment variable we created earlier.</p>
<p>On Success, you will get a service URL and other details as follows:</p>
<pre><code class="lang-bash">Service [helloworld] revision [helloworld-00001-xit] has been deployed
and is serving 100 percent of traffic.

Service URL: https://helloworld-abc1234-uc.a.run.app
</code></pre>
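<p>For repeatable deployments, the same service can also be described declaratively, since Cloud Run implements the Knative Serving API. A minimal manifest (the name and image below are placeholders matching this walkthrough) that you could apply with <code>gcloud run services replace service.yaml</code> might look like:</p>

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containers:
        - image: gcr.io/nodejs-gcp-66776654433/helloworld:1.0.0
          ports:
            - containerPort: 8080
```
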
<p><strong>Congratulations!</strong> You have successfully deployed your first Cloud Run application. You can also deploy images directly from public registries like Docker Hub and other gcr.io registries.</p>
<p>To verify your deployment, use the Service URL to open it in your browser. You can also verify and manage the deployed services by navigating to Cloud Run in your Google Cloud Console as follows:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1741811087381/4e759aaf-f48a-43f4-9d51-47371abeab27.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-cleanup">Cleanup</h3>
<p>Run the following command to delete the docker image from the GCR registry:</p>
<p><code>gcloud container images delete gcr.io/$GOOGLE_CLOUD_PROJECT/helloworld</code></p>
<p>Also, run the following commands to delete the local docker image:</p>
<pre><code class="lang-bash">docker rmi gcr.io/$GOOGLE_CLOUD_PROJECT/helloworld:1.0.0
docker image prune -f
</code></pre>
<p>Finally, delete the running Cloud Run service using the CLI as follows:</p>
<p><code>gcloud run services delete helloworld --region=$LOCATION</code></p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Deploying a container on Google Cloud Run is simple, straightforward, and cost-effective, making it an excellent choice for a wide range of applications. By leveraging its serverless architecture, you can focus on building and scaling your applications without worrying about infrastructure management. Here are some key use cases where Google Cloud Run shines, offering both convenience and cost efficiency:</p>
<ol>
<li><p><strong>Deploying a Static Website Using NGINX</strong>:<br /> Hosting a static website is a perfect fit for Cloud Run. With its ability to scale to zero, you only pay when users access your site. This is ideal for portfolios, documentation sites, or small business websites that don’t require constant uptime.</p>
</li>
<li><p><strong>Deploying a Front-End Application (React, Angular, or Vue)</strong>:<br /> Cloud Run seamlessly handles single-page applications (SPAs) built with modern frameworks. Its automatic scaling ensures your app remains responsive during traffic spikes, while the pay-as-you-go model keeps costs low during periods of low activity.</p>
</li>
<li><p><strong>Deploying an API</strong>:<br /> Whether it’s a RESTful API or a GraphQL endpoint, Cloud Run is an excellent platform for backend services. Its ability to handle concurrent requests and scale instantly makes it ideal for APIs with fluctuating traffic, such as those used in mobile apps or microservices architectures.</p>
</li>
<li><p><strong>Deploying Machine Learning Models (e.g., Deepseek LLM)</strong>:<br /> Cloud Run is a great choice for deploying machine learning models or AI-powered applications. For instance, you can containerize a large language model (LLM) like Deepseek and deploy it as a scalable, cost-effective service. Since Cloud Run scales to zero, you avoid paying for idle resources when the model isn’t in use.</p>
</li>
<li><p><strong>Event-Driven Applications</strong>:<br /> Cloud Run integrates seamlessly with event-driven architectures. For example, you can deploy a service that processes data from Pub/Sub, triggers workflows in response to Cloud Storage events, or handles webhooks from third-party services. This makes it ideal for batch processing, data pipelines, or automation tasks.</p>
</li>
<li><p><strong>Microservices and Lightweight Workloads</strong>:<br /> If you’re building a microservices-based application, Cloud Run allows you to deploy each service independently. Its lightweight nature and fast cold-start times make it perfect for small, focused services that don’t require the overhead of a full Kubernetes cluster.</p>
</li>
<li><p><strong>Prototyping and Development</strong>:<br /> For developers and startups, Cloud Run is a cost-effective way to prototype and test new ideas. You can quickly deploy and iterate on your applications without worrying about infrastructure costs or complexity.</p>
</li>
</ol>
<p>In summary, Google Cloud Run offers a versatile, cost-efficient platform for deploying a wide variety of applications. Its serverless nature, automatic scaling, and pay-as-you-go pricing make it an attractive option for everything from static websites and APIs to advanced AI models like Deepseek LLM. By choosing Cloud Run, you not only simplify deployment but also optimize costs, ensuring you only pay for what you use. Whether you’re a startup, a developer, or an enterprise, Cloud Run empowers you to focus on innovation while leaving the infrastructure management to Google Cloud.</p>
]]></content:encoded></item><item><title><![CDATA[Achieving Kubestronaut in 40 Days]]></title><description><![CDATA[My first interaction with Kubernetes was in the fall of 2017, when I planned to move my Meteor.js application away from Heroku as I wanted to implement microservices to scale individual service units.
After 3 years of playing around with Kubernetes, in 20...]]></description><link>https://blog.arslanali.io/achieving-kubestronaut-in-40-days</link><guid isPermaLink="true">https://blog.arslanali.io/achieving-kubestronaut-in-40-days</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[kubestronaut]]></category><category><![CDATA[CNCF]]></category><category><![CDATA[cncfstudents]]></category><category><![CDATA[cka]]></category><category><![CDATA[ckad]]></category><category><![CDATA[CKS]]></category><dc:creator><![CDATA[Arslan Ali Ansari]]></dc:creator><pubDate>Wed, 19 Feb 2025 07:38:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1739947908436/45bb2924-4005-4960-8215-e7a8fe52d77f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My first interaction with Kubernetes was in the fall of 2017, when I planned to move my Meteor.js application away from Heroku as I wanted to implement microservices to scale individual service units.</p>
<p>After 3 years of playing around with Kubernetes, in 2020, a recruiter wanted to discuss a consulting requirement further and said, “It would have been great if you had a Kubernetes certification”. I told him to schedule an interview with the client and assured him I would have it before then. Indeed, I passed it in less than a week, though with a score of only 71%. Scores were not important to me then; I got the offer letter, but the offer was pulled with a regret letter because of the 2020 pandemic.</p>
<p>Fast forward to December 2024: our company <a target="_blank" href="https://lowcodesol.com/">Low Code Solutions</a> got a lead to set up an on-prem OpenShift multi-cluster capability for a telco. This is when I decided to support the company’s pre-qualification in front of the customer and started preparing for the Kubestronaut certifications on Dec 13, 2024.</p>
<p>Due to my busy schedule, I was not able to spend more than a couple of hours a day, which is why I am going to share how you can smartly achieve these certifications in even less time. I will discuss each exam in the sequence I attempted them:</p>
<h2 id="heading-cka"><strong>CKA</strong></h2>
<p>I started with CKA, thinking it would be the hardest; however, it turned out to be the easiest compared to CKAD and CKS. If you have a reasonable knowledge of Linux basics, Docker, and Linux package managers, then you can prepare for it in less than a week.</p>
<h3 id="heading-important-topics"><strong>Important Topics:</strong></h3>
<p>Following are the most important topics, where you will need to spend most of your time:</p>
<ol>
<li><p>Imperative Commands to create, delete and update Kubernetes Resources, e.g: Pods, Deployments, Services etc.</p>
</li>
<li><p>Cluster Installation, Upgrade and Basic Troubleshooting</p>
</li>
<li><p>Understanding of Kubernetes Services Architecture (ETCD, Kube-Scheduler, Kube-Controller-Manager, Kube-ApiServer, etc.)</p>
</li>
<li><p>Understanding of Deployment, and Management of Nodes into a Cluster, understanding the role of Kubelet and Kube-Proxy</p>
</li>
<li><p>Understanding Static Pods and Important Config file Locations e.g: /etc/kubernetes, /var/lib/kubelet etc.</p>
</li>
<li><p>Understanding of TLS and Certificate Management.</p>
</li>
<li><p>Backup and Restore of ETCD</p>
</li>
<li><p>RBAC, how to create roles, rolebindings, clusterroles, clusterrolebindings, and service accounts. If you understand these 5 elements in depth then RBAC will be a piece of cake for you. (tip: it will also help you with your CKS exam, so spend more time with it.)</p>
</li>
</ol>
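<p>For topic 7 in particular, muscle memory matters. Here is a sketch of the backup/restore commands wrapped as shell functions (kubeadm default certificate paths are assumed; adjust for your cluster):</p>

```shell
# etcd snapshot save/restore drill for CKA (run on a control-plane node).
backup_etcd() {
  ETCDCTL_API=3 etcdctl snapshot save "$1" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
}
restore_etcd() {
  ETCDCTL_API=3 etcdctl snapshot restore "$1" --data-dir=/var/lib/etcd-restore
}
# backup_etcd /opt/etcd-backup.db
# restore_etcd /opt/etcd-backup.db   # then point the etcd static pod at the new data dir
```
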
<h2 id="heading-ckad"><strong>CKAD</strong></h2>
<p>CKAD is more about deploying and managing cloud-native applications on Kubernetes. Remember, if you have done CKA, it will help you prepare for CKAD and even CKS; all you need to do is cover some additional concepts. Here are the topics I think you should emphasize more than others:</p>
<ol>
<li>Imperative Commands as mentioned for CKA also.</li>
</ol>
<pre><code class="lang-plaintext">k create --help 
k run --help
k expose --help
k edit --help
k replace -f file.yaml -n namespace
k delete --help
k scale --help
</code></pre>
<ol start="2">
<li>Understand the concept of Ingress Resource and practice the following command:</li>
</ol>
<pre><code class="lang-plaintext">k create ingress NAME --rule=host/path=service:port[,tls[=secret]]
</code></pre>
<ol start="3">
<li><p>Practice creating, updating, and mounting secrets, configmaps, service accounts, volumes, and volume claims in Pods/containers.</p>
</li>
<li><p>Taints, Tolerations, Node Affinity, Node Selector and nodeName for controlling how we schedule pods on specific nodes. Also understand why we need node affinity when we have taints and tolerations.</p>
</li>
<li><p>Init and Sidecar Containers</p>
</li>
<li><p>Logging, Monitoring and Probes.</p>
</li>
<li><p>Blue-Green and Canary Deployments</p>
</li>
<li><p>Pod SecurityContext and RBAC as mentioned above also.</p>
</li>
<li><p>Custom Resource Definitions and HELM</p>
</li>
</ol>
<h2 id="heading-cks"><strong>CKS</strong></h2>
<p>The CKS exam is where the real challenge begins. If you excelled in the CKA and CKAD exams, it’s easy to underestimate the difficulty of the CKS. I made the same mistake: despite my extensive Kubernetes experience and proficiency with complex commands and vim, I failed on my first attempt. So before I tell you what to prepare most, let me share my mistakes:</p>
<h3 id="heading-mistake-1"><strong>Mistake #1:</strong></h3>
<p>I spent my first hour on just 3 questions, which left me no chance to attempt them all; I was not even able to view the last 3 questions. Time management is critical. Cluster Upgrade, Setting Up Audit Policy, and ImagePolicyWebhook are the most time-consuming questions, so attempt them wisely.</p>
<h3 id="heading-mistake-2"><strong>Mistake #2:</strong></h3>
<p>My core concepts were clear and vivid on all the topics, so I thought I would lean on the documentation to solve the questions. But practice is the key; you can’t keep switching to the documentation and expect to finish in time.</p>
<h3 id="heading-mistake-3"><strong>Mistake #3:</strong></h3>
<p>I did not practice troubleshooting cluster crashes. So practice troubleshooting Api-Server or Kubelet issues by utilizing docker/crictl logs, displaying logs from /var/log folders for pods and containers, and other interactive ways. There are specific scenarios of such troubleshooting in <a target="_blank" href="https://killercoda.com/">KillerCoda Playgrounds</a>.</p>
<h3 id="heading-cks-exam-tips"><strong>CKS Exam TIPs:</strong></h3>
<ol>
<li><p>Falco questions are tricky; most of my peers were unable to solve them, so do not spend too much time on them.</p>
</li>
<li><p>Make sure you know the difference between Layer 3, Layer 4, and Layer 7 Cilium policies.</p>
</li>
<li><p>Some questions have supplementary tasks at the end of the questions. Read the whole scenario, do not assume that you are done.</p>
</li>
<li><p>Verify your answers. For every scenario, practice the verification process. Spare at least 15 to 20 min for your answer verification, which leaves you with 90 to 100 minutes for solving the scenarios.</p>
</li>
</ol>
<h3 id="heading-important-topics-for-cks-which-require-extreme-practice"><strong>Important Topics for CKS which require extreme practice:</strong></h3>
<ol>
<li><p>SBOM, benchmarking, and vulnerability-scanning CLI tools, e.g., bom, kube-bench, Trivy, lsof, strace, netstat -plnt, etc.</p>
</li>
<li><p>Api-Server Audit Log, ImagePolicyWebhook, Cluster Upgrade, Securing and Encrypting ETCD data and Setting up Network Policy (both native and cilium)</p>
</li>
<li><p>Sandboxing the Containers, Immutability and Runtime Security with Falco etc.</p>
</li>
<li><p>Setting a proper Pod securityContext, AppArmor/seccomp profiles, readOnlyRootFilesystem, allowPrivilegeEscalation, etc., in the context of Pod Security Standards.</p>
</li>
<li><p>Ingress resource with TLS secrets and its important annotations. (Hint: imperative commands will help save time.)</p>
</li>
</ol>
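<p>As a concrete reference for topic 4, a pod spec hardened along those lines (names and image are illustrative) looks like this:</p>

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: nginx:1.25
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```
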
<h2 id="heading-general-exam-tips"><strong>General Exam Tips</strong></h2>
<p>KCNA and KCSA are relatively easier exams. I recommend taking them at the end: after preparing for the three exams above, you will have enough general knowledge to attempt the MCQs effectively. However, there are additional topics you may have to prepare, e.g., compliance standards like CIS, OWASP, and NIST. Here are some generic exam tips for CKAD, CKA, and CKS:</p>
<ol>
<li><p>Practice is the key. You may know and understand the concepts in depth, but remember that CKA, CKAD, and CKS are performance-based exams where you will get 16 to 18 scenarios to solve. In our consulting world, we take at least a day or two to solve even one.</p>
</li>
<li><p>Solve the <a target="_blank" href="https://killercoda.com/">KillerCoda Playgrounds</a> at least 2 to 3 times; thanks to <a target="_blank" href="https://www.linkedin.com/in/kimwuestkamp/">Kim Wustkamp</a>, the founder, for keeping them free. However, I would recommend taking at least a 1-month pro subscription, because that way you get the exam desktop, which will help you familiarize yourself with the exam environment. I wasted 5 minutes figuring out basic functionality like copy/pasting and note-keeping.</p>
</li>
<li><p>Get a <a target="_blank" href="https://kodekloud.com/">KodeKloud</a> subscription; it is a gold mine of learning material for infrastructure and DevOps.</p>
</li>
<li><p>Vim proficiency: you need to start using shortcuts in vim. The handiest are the ones for copying, pasting, deleting a line, replacing in place, indentation, set number to show line numbers, visual mode to copy or duplicate multiple lines, and finding text.</p>
</li>
<li><p>Grep proficiency: grep can help you in unimaginable ways, e.g., finding a particular vulnerability in an SBOM scan, finding the file containing a specific text within a folder of many such files (hint: grep -r), or grepping for two alternatives (hint: grep -E “one|two”).</p>
</li>
</ol>
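<p>The grep tips above can be drilled anywhere in seconds; for example (a throwaway file standing in for a vulnerability scan report):</p>

```shell
# Create a fake scan report, then practice the two grep patterns from tip #5.
dir=$(mktemp -d)
printf 'CVE-2024-0001 HIGH openssl\nINFO clean package\n' > "$dir/scan.txt"
grep -r "CVE-2024-0001" "$dir"             # find a specific vulnerability across files
grep -E "openssl|libssl" "$dir/scan.txt"   # match either of two alternatives
rm -rf "$dir"
```
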
<h2 id="heading-profession-tips"><strong>Professional Tips</strong></h2>
<p><a target="_blank" href="https://www.linuxfoundation.org/">The Linux Foundation</a> and the <a target="_blank" href="https://www.cncf.io/">CNCF</a> have done a great job crafting these exams and certification programs, making sure that only the exceptional come out shining. However, it is important to know that real-world scenarios are even more complex than the ones you will encounter in these exams. Getting the certification does not guarantee that you are suitable for a challenging role; it will help you land an interview, but to get a great offer you will need a lot more than the certifications. After hiring hundreds of individuals, I am going to share some of the most important traits you also need to land your dream job:</p>
<ol>
<li><p>Prepare for these certifications keeping your professional goals in mind. Prepare yourself for the industry challenges not only for the exam, have your previous job scenarios in mind and relate every topic with your current or upcoming role in business.</p>
</li>
<li><p>Work on your soft skills; make sure you know how to express your learning and experiences effectively. Understand and make use of industry and technology buzzwords and their concepts, e.g., cloud-native, cluster hardening, encryption at rest, zero-trust, software security compliance and standards, etc.</p>
</li>
<li><p>Make yourself vocal on platforms like <a target="_blank" href="https://linkedin.com/">LinkedIn</a> and <a target="_blank" href="https://x.com/">X</a> by sharing your thoughts on generic technology discussions. (even I am struggling to do that often)</p>
</li>
<li><p>Express your ideas in Diagrams and Visuals. Make every presentation or discussion more meaningful with your visual skills.</p>
</li>
<li><p>Learn how to explain a complex idea or architecture to multiple audiences, tech, non-tech, business or a layman.</p>
</li>
<li><p>Write blogs/articles while you are learning or even after accomplishing a goal. Focus on spreading the knowledge, tips, and tricks rather than publicity or marketing.</p>
</li>
<li><p>Start contributing to Kubernetes and/or other CNCF projects. Begin by attending the weekly or monthly meetings to understand the working-group processes until you are ready to take on an assignment.</p>
</li>
<li><p>Remember the motive, we are working to help make the world a better place.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[PEGA Customer Cloud Deployment Models]]></title><description><![CDATA[We all know that PEGA began embracing the true essence of cloud-native starting from version 8.6. Now, with the release and planning of its new version system, starting with PEGA 23, we understand that a major release is scheduled every year. All upc...]]></description><link>https://blog.arslanali.io/pega-customer-cloud-deployment-models</link><guid isPermaLink="true">https://blog.arslanali.io/pega-customer-cloud-deployment-models</guid><category><![CDATA[pega deploy]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[pega]]></category><category><![CDATA[Devops]]></category><category><![CDATA[enterprise]]></category><dc:creator><![CDATA[Arslan Ali Ansari]]></dc:creator><pubDate>Sun, 22 Sep 2024 11:51:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727005547096/ee8cf98d-7ecc-46d5-9f37-855bbe8e65f3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We all know that PEGA began embracing the true essence of cloud-native starting from version 8.6. Now, with the release and planning of its new version system, starting with PEGA 23, we understand that a major release is scheduled every year. All upcoming releases incorporate the concept of separation of concerns at their core, which means that third-party services like Kafka, Elastic Search, and Cassandra have been externalized, allowing you to share your enterprise deployments of these services. This architecture is further implemented in the core of PEGA services, with Hazelcast now serving as the clustering service that connects to your enterprise Kafka layer. Additionally, you can have a single clustering service feeding and managing the cache of multiple PEGA environments. This architecture is followed in Constellation and even for the Search and Reporting Service.</p>
<p>By establishing such an externalized architecture, PEGA can now be deployed in over a dozen different configurations. In this article, we will discuss some of the most important ones.</p>
<h2 id="heading-pega-classic-deployment">PEGA CLASSIC DEPLOYMENT</h2>
<p>A classic Pega deployment entails deploying all components for every environment. For example, consider an insurance company that requires Pega to be installed in multiple Kubernetes clusters, with each cluster representing a distinct environment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727002037063/d304d8a7-1ffa-4ec9-9757-5ea38a86b338.png" alt="Pega Deployment with Dedicated Services for Each Environment" class="image--center mx-auto" /></p>
<p>Each cluster must have its own database, Kafka, Elasticsearch, Cassandra, and more. In addition to third-party services, Pega services are also deployed individually within each cluster. This type of deployment is recommended in several situations, but it has drawbacks. On the one hand, it can be advantageous since each environment is entirely isolated from the others, eliminating the need to worry about micro-configurations for each service when managing environments simultaneously. On the other hand, it requires at least twice the hardware resources to run everything and roughly doubles the maintenance cost, as each environment must be managed separately.</p>
<h2 id="heading-pega-connect-deployment">PEGA CONNECT DEPLOYMENT</h2>
<p>A connected deployment is one in which the customer already has running clusters of one or more third-party services, such as Elasticsearch, Kafka, or Cassandra. We often encounter customer requirements mentioning existing clusters of these third-party services that they wish to utilize for PEGA as well.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727002193995/ac187981-534c-4b3e-9f0e-2d6fbbc4c00e.png" alt="Shared Services among all Pega Environments" class="image--center mx-auto" /></p>
<p>Generally, it is not recommended to share these services with other Pega applications; however, it is ultimately a design decision based on the Cloud Topology Architecture. We see no harm in implementing this kind of design, where we deploy only the PEGA platform and utilize the customer-provided clusters for third-party applications. This setup reduces friction in rolling out the PEGA platform, uses significantly fewer resources than the classic model, and is sometimes more efficient and performant as well.</p>
<h2 id="heading-pega-shared-deployment">PEGA SHARED DEPLOYMENT</h2>
<p>A shared deployment is one in which we deploy shareable PEGA services once and share them across different environments and platforms. Sharing most of these services is highly recommended, as it serves the purpose of breaking Pega down into microservices. Let's drill into this deployment in more detail by walking through different scenarios.</p>
<ol>
<li><p><strong>Shared CDN-ONPREM</strong><br /> The CDN can be shared among all of your environments without hesitation; you may even utilize the PEGA-provided CDN for this purpose (but only if your pods have access to the internet).</p>
</li>
<li><p><strong>Shared Constellation App Static</strong></p>
<p> This service keeps track of your custom components and serves them to PEGA applications that use the Constellation UI instead of the Traditional UI. It is highly recommended that you share your Constellation App Static service among your environments. A major reason is that you get consistent static custom components across all environments: once a component is deployed and tested, it never needs to be tested again, even after a feature update by Deployment Manager (DM) in the Pega environment. One scenario where I might recommend a separate Constellation App Static per environment is when you use a third-party pipeline manager like Jenkins to trigger DM pipelines alongside other deployment pipelines (React apps, mobile apps, etc.); this way, you can trigger a Constellation App Static custom component publish in each environment as a deployment pipeline task.</p>
</li>
<li><p><strong>Shared Search and Reporting Service</strong></p>
<p> SRS is a multi-tenant service, which means many environments with different environment/tenant IDs can connect to a single SRS instance. In a Kubernetes cluster, we can deploy it along with a single Elasticsearch cluster serving many different environments.</p>
</li>
<li><p><strong>Deployment Manager and PDC</strong></p>
<p> PDC and Deployment Manager are designed as shared services for multiple environments. DM requires multiple environments to perform code promotions and feature deployments from one environment to another, and PDC is likewise a multi-tenant, multi-environment service. This means you can use a single PDC for numerous Pega environments across multiple customers/business units.</p>
</li>
</ol>
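<p>As a rough sketch of wiring a shared SRS into an individual environment, the Pega Helm charts expose search settings in the values file. The key names and the service URL below are assumptions based on the public <code>pega-helm-charts</code> layout; verify them against the chart version you deploy:</p>

```yaml
# Illustrative fragment of a "pega" chart values.yaml -- key names may
# differ between chart versions; the URL is a placeholder.
pegasearch:
  # Point this environment at a shared, multi-tenant Search and Reporting
  # Service instead of an embedded search deployment.
  externalSearchService: true
  externalURL: "http://srs-service.shared-srs.svc.cluster.local"
```

<p>Each environment then differs only in its tenant configuration, while the SRS and its Elasticsearch cluster are deployed once.</p>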
<p>The following figure shows a fully shared model, where all multi-tenant applications and services of Pega as well as other shareable third-party services are shown:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727002474257/8d3fbb0c-20b6-4200-a473-a236d0e12ac9.png" alt="Pega Fully Shared Application Deployment Model" class="image--center mx-auto" /></p>
<p>Deploying Pega for enterprises has long been a challenge for infrastructure and DevOps teams. While finding skilled Infra/DevOps professionals is common, individuals with both Pega Admin and Kubernetes DevOps experience are rare. This scarcity is one of the primary reasons many Pega customers face difficulties in managing and maintaining their enterprise Pega deployments.</p>
<p>With the introduction of Pega Cloud, enterprise customers can significantly reduce the complexity and burden of implementation and ongoing maintenance once migrated to Pega Cloud. However, the transition can still be a complex and time-consuming process for organizations with multiple integrations between Pega and other on-prem applications.</p>
<p>As a consultant, I have had the privilege of working with numerous enterprise customers who faced complex application integration challenges. With over two decades of experience in software engineering and cloud technologies, I specialize in designing intricate, scalable architectures tailored to unique business needs. My expertise extends to creating comprehensive communication matrices well in advance, ensuring seamless coordination and faster deployments. I have consistently delivered projects in record time, helping organizations save millions of dollars while achieving their digital transformation goals efficiently.</p>
<p>So feel free to contact me for any design, implementation, or infrastructure management reviews or consulting requirements.</p>
]]></content:encoded></item><item><title><![CDATA[Optimizing Workload and Compute in Kubernetes using descheduler]]></title><description><![CDATA[Binding and placement of pending Pods on to respective Nodes are managed by a scheduler in Kubernetes called Kube-scheduler. Configurable scheduler policies, plugins, and extensions manage the placement decisions, often called predicates and prioriti...]]></description><link>https://blog.arslanali.io/optimizing-workload-and-compute-in-kubernetes-using-descheduler</link><guid isPermaLink="true">https://blog.arslanali.io/optimizing-workload-and-compute-in-kubernetes-using-descheduler</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[openshift]]></category><dc:creator><![CDATA[Arslan Ali Ansari]]></dc:creator><pubDate>Mon, 11 Mar 2024 08:19:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/CkZF0-etxU8/upload/b2752581674afea81a3d9a14c7654bd7.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Binding and placement of pending Pods onto their respective Nodes are managed by the Kubernetes scheduler, kube-scheduler. Configurable scheduler policies, plugins, and extensions govern the placement decisions, often called predicates and priorities. The scheduler's decision is based on the actual condition or state of the cluster at the time the Pod is scheduled. But a Kubernetes cluster changes state over time, through updates to labels, taints, and tolerations, or through the introduction of new nodes, so there may later be a desire to relocate a pod from one node to another. This is where the descheduler comes in.</p>
<p>So before we explore the descheduler, let's recap how the scheduler works. The scheduling decision passes through four stages, or extension points:</p>
<ol>
<li><p>Scheduling Queue</p>
</li>
<li><p>Filtering</p>
</li>
<li><p>Scoring</p>
</li>
<li><p>Binding</p>
</li>
</ol>
<p>Multiple plugins can be installed at these extension points, e.g., the PrioritySort plugin on the scheduling queue, and the NodeResourcesFit and NodeName plugins at the filtering extension point. The highly extensible nature of Kubernetes makes it possible to customize which plugin goes where and also allows us to write our own custom plugins. It even lets us add plugins in the pre- and post-stages of the extension points.</p>
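<p>As an illustration of this extensibility, plugins can be enabled or disabled per extension point through a <code>KubeSchedulerConfiguration</code>. The profile below is a sketch, not a recommended configuration:</p>

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      queueSort:
        enabled:
          - name: PrioritySort      # orders the scheduling queue
      filter:
        enabled:
          - name: NodeResourcesFit  # filters out nodes lacking requested resources
          - name: NodeName          # honors spec.nodeName
```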
<p>Now, since the scheduling decision is produced by multiple plugins and their placement at the extension points at the time of scheduling, it is quite possible that the original scheduling decision is no longer valid later on.</p>
<h2 id="heading-when-do-you-need-a-descheduler">When do you need a Descheduler</h2>
<p>Due to the dynamic nature of the Kubernetes Cluster, there could be several reasons why you may want to evict (deschedule) a Pod from a node:</p>
<ol>
<li><p>To improve cluster performance and availability by redistributing pods to optimize resource usage and reduce contention for resources.</p>
</li>
<li><p>To minimize downtime by automatically rescheduling pods on healthy nodes when a node fails.</p>
</li>
<li><p>To help with scaling by removing underutilized pods and redistributing them to nodes where they can be better utilized.</p>
</li>
<li><p>To improve security by ensuring that only authorized pods are running on the cluster.</p>
</li>
<li><p>To enforce policies such as inter-pod anti-affinity, where it can detect and deschedule the pods that don't conform to the policy and redistribute the pods to other nodes.</p>
</li>
<li><p>To improve cost-efficiency by reducing wastage of resources by identifying and removing duplicate pods and rescheduling them.</p>
</li>
</ol>
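<p>For reference, the descheduler itself usually runs inside the cluster as a Job, CronJob, or Deployment. The sketch below follows the upstream <code>kubernetes-sigs/descheduler</code> examples; the image tag, schedule, and ConfigMap name are placeholders to adapt:</p>

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: descheduler
  namespace: kube-system
spec:
  schedule: "*/30 * * * *"                   # re-evaluate eviction policies every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: descheduler-sa # needs RBAC to list nodes/pods and evict pods
          restartPolicy: Never
          containers:
            - name: descheduler
              image: registry.k8s.io/descheduler/descheduler:v0.29.0
              command:
                - /bin/descheduler
                - --policy-config-file=/policy-dir/policy.yaml
              volumeMounts:
                - name: policy-volume
                  mountPath: /policy-dir
          volumes:
            - name: policy-volume
              configMap:
                name: descheduler-policy-configmap  # holds the DeschedulerPolicy
```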
<p>Below is the list of different scenarios where the use of a descheduler is unavoidable:</p>
<h3 id="heading-1-a-new-node-is-introduced-in-a-cluster">1. A new node is introduced in a cluster</h3>
<p>You have just introduced a new node into the cluster and want to distribute the workload evenly. Without descheduling, your pods may reside on the original nodes indefinitely, so adding new nodes will not yield any immediate performance benefit. Descheduling pods and redistributing them onto the new nodes helps improve resource usage and ensures that resources are distributed evenly across the cluster. By spreading the pods across different nodes, the load on individual nodes is reduced, improving the performance and stability of the cluster. It also helps the default scheduler and auto-scaler adjust the number of replicas to match the new capacity and resources available in the cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673478229643/1e482dd7-36fa-4e86-823e-020149611350.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-2-node-labels-are-updated">2. Node labels are updated</h3>
<p>A node label update can affect different scenarios and the original scheduling decision may not be appropriate for certain pods. Here are some of the important ones:</p>
<ol>
<li><p><strong>Node Affinity:</strong> Node affinity allows pods to be scheduled based on the labels assigned to a node. If the labels of a node are changed, it may no longer match the node affinity rules of a pod, which can lead to an undesired state.</p>
</li>
<li><p><strong>Node Selector:</strong> A node selector lets you schedule pods on specific nodes based on node labels; if a label is updated, those decisions are no longer valid and the pods require eviction.</p>
</li>
<li><p><strong>Failure Domain:</strong> Node labels can indicate the failure domain of a node, such as a region, rack, or zone, which can be used to spread pods across multiple failure domains. An intelligent descheduler ensures high availability of services by making optimal descheduling decisions.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673478270716/92b72f6e-334c-4125-b289-f3d1830fba58.jpeg" alt class="image--center mx-auto" /></p>
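<p>When node labels change, the descheduler can evict pods whose node-affinity rules no longer hold. A minimal policy sketch (using the <code>v1alpha1</code> policy format; newer descheduler releases use a profiles/plugins layout):</p>

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
    enabled: true
    params:
      nodeAffinityType:
        # Only hard affinity rules are re-checked and enforced by eviction.
        - "requiredDuringSchedulingIgnoredDuringExecution"
```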
<h3 id="heading-3-node-failure-requires-pods-to-be-moved">3. Node failure requires Pods to be moved</h3>
<p>It is important to deschedule pods on a failed node because:</p>
<ol>
<li><p><strong>High availability:</strong> When a node fails, the pods running on that node can become unavailable, and this can have a significant impact on the availability of the applications and services running on the cluster. By descheduling the pods on a failed node, the cluster can automatically reschedule the pods on healthy nodes, which can help to minimize downtime and improve availability.</p>
</li>
<li><p><strong>Resource Utilization:</strong> A failed node can cause a significant drain on resources such as CPU and memory. By descheduling the pods on a failed node, the cluster can free up these resources, which can be used more efficiently by other pods, improving overall cluster performance.</p>
</li>
<li><p><strong>Auto Scaling:</strong> A failed node can impact the scaling of pods. By descheduling the pods running on a failed node, the auto-scaler can automatically adjust the number of pods running on healthy nodes to maintain the desired number of replicas.</p>
</li>
<li><p><strong>Networking:</strong> Descheduling pods on a failed node can help prevent networking issues, such as IP conflicts or service outages, which may be caused by pods running on a failed node.</p>
</li>
<li><p><strong>Security:</strong> Descheduling pods running on a failed node can help prevent security risks. A failed node can be compromised and running malicious pods on a compromised node can pose a significant risk to the cluster.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673529526271/a532a439-cc8e-4133-a6e7-77813c27fe6c.jpeg" alt class="image--center mx-auto" /></p>
<p>Descheduling pods on failed nodes is important to ensure that the cluster remains highly available, that resources are used efficiently, and that the auto-scaler can adjust the number of running replicas, while avoiding networking issues and maintaining security.</p>
<h3 id="heading-4-remove-duplicates">4. Remove Duplicates</h3>
<p>Duplicate pods in a Kubernetes cluster can cause several issues that can negatively impact the performance and availability of the cluster. Some reasons why it's important to remove duplicate pods from a node running in Kubernetes include:</p>
<ol>
<li><p><strong>Resource Utilization:</strong> Duplicate pods consume resources such as CPU and memory that could be used more efficiently by other pods. This can cause resource contention, which can lead to delays in container startup times and negatively impact the overall performance of the cluster.</p>
</li>
<li><p><strong>Networking:</strong> Each pod consumes network resources such as IP addresses, and having multiple pods with the same IP address can cause networking issues such as IP conflicts, which can cause communication problems between pods and services.</p>
</li>
<li><p><strong>Scalability:</strong> Duplicate pods can make it difficult to scale the number of pods running in a cluster. For example, if a Deployment controller creates multiple replicas of a pod, each replica will have a different replica number but the same pod name, which can cause confusion when trying to scale the number of replicas.</p>
</li>
<li><p><strong>Security:</strong> Having duplicate pods can make it difficult to keep track of what is running on a cluster and can open security vulnerabilities. It is important to ensure that all running pods are authorized and that no rogue pods are running.</p>
</li>
<li><p><strong>Cost:</strong> Running duplicate pods can result in a waste of resources, which can result in higher costs.</p>
<p> Removing duplicate pods from a node can help improve the overall performance and availability of a Kubernetes cluster by optimizing resource utilization, reducing network conflicts, improving scalability and security, and reducing costs.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673529214337/65edb487-c125-4ce9-ba73-4144e6039c75.jpeg" alt class="image--center mx-auto" /></p>
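<p>The descheduler ships a <code>RemoveDuplicates</code> strategy for exactly this case: it evicts extra pods of the same owner running on one node so they can be rescheduled elsewhere. A hedged sketch in the <code>v1alpha1</code> policy format (the excluded kind is illustrative):</p>

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
    params:
      removeDuplicates:
        excludeOwnerKinds:
          - "Job"   # optionally skip pods owned by certain controller kinds
```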
<h3 id="heading-5-lowhigh-node-utilization">5. Low/High Node Utilization</h3>
<p>A descheduler can help distribute load across different nodes in the cluster in several ways:</p>
<ol>
<li><p><strong>Balancing resource usage:</strong> A descheduler can identify pods that are consuming a disproportionate amount of resources on a node, such as CPU or memory, and move them to other nodes where resources are more available. This can help to balance the resource usage across the cluster and improve overall cluster performance.</p>
</li>
<li><p><strong>Reducing node overcommitment:</strong> A descheduler can identify nodes that have a high number of pods running on them and redistribute the pods to other nodes to reduce the number of pods running on the overcommitted node. This can help to reduce contention for resources and improve the overall performance of the cluster.</p>
</li>
<li><p><strong>Improving node utilization:</strong> A descheduler can help to identify and remove underutilized pods on a node, and redistribute them to nodes where they can be utilized better. This can help to improve the utilization of resources across the cluster.</p>
</li>
</ol>
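<p>The <code>LowNodeUtilization</code> strategy implements this balancing: nodes below all of the <code>thresholds</code> are considered underutilized, and pods are evicted from nodes above any of the <code>targetThresholds</code> so the scheduler can move them. The percentages below are illustrative, not recommendations:</p>

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:          # below ALL of these => node is underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:    # above ANY of these => node is a candidate for eviction
          cpu: 50
          memory: 50
          pods: 50
```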
<h3 id="heading-6-pods-violating-inter-pod-antiaffinity">6. Pods Violating Inter Pod AntiAffinity</h3>
<p>Inter-pod anti-affinity is a feature that allows you to specify rules for how pods should be scheduled in relation to one another. These rules can be used to ensure that pods that belong to the same application or service are spread across different nodes in a cluster, in order to improve availability and reduce the risk of single points of failure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673477748574/85b8bc5c-cd4d-4245-9453-e6b8e87bed3a.jpeg" alt="Descheduling Example of Anti Pod Affinity" class="image--center mx-auto" /></p>
<p>One important reason to use inter-pod anti-affinity is to ensure that pods that need to be highly available, such as database pods, are not scheduled on the same node. If multiple pods that need to be highly available are scheduled on the same node and that node goes down, multiple pods will become unavailable at the same time. Spreading these pods across different nodes can help to mitigate this risk.</p>
<p>Another reason to use inter-pod anti-affinity is to ensure that pods are spread across different zones or regions to improve resiliency in the event of a zone or region failure.</p>
<p>It is important to use a descheduler in this scenario because, despite the best efforts of Kubernetes scheduler, sometimes pods can violate inter-pod anti-affinity rules due to various reasons like over-commitment of resources or other factors. A descheduler can help identify and remove these pods, ensuring they are rescheduled to comply with the specified anti-affinity rules. This can help to improve cluster availability, reduce the risk of single points of failure, and optimize resource utilization.</p>
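<p>Tying the two pieces together: the pod spec declares the anti-affinity rule, and the descheduler's <code>RemovePodsViolatingInterPodAntiAffinity</code> strategy evicts pods that end up violating it. Both snippets below are illustrative sketches (labels are placeholders, and the policy format varies by descheduler version):</p>

```yaml
# Pod template fragment: keep database replicas on different nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: database
        topologyKey: kubernetes.io/hostname
---
# Descheduler policy (v1alpha1 format) that enforces the rule after the fact.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingInterPodAntiAffinity":
    enabled: true
```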
<h3 id="heading-10-pods-violating-topology-spread-constraint">7. Pods Violating Topology Spread Constraint</h3>
<p>Topology spread constraints allow pods to be spread evenly across different nodes, zones, regions, or racks. By descheduling pods that violate these constraints, the cluster ensures that resources are utilized more efficiently and that pods are spread out to reduce resource contention; it guarantees higher availability by running services across multiple nodes, zones, and regions; and it helps minimize the impact of an infrastructure failure through failure-domain awareness.</p>
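<p>As a sketch, a pod spec can declare the spread requirement, and the descheduler's <code>RemovePodsViolatingTopologySpreadConstraint</code> strategy evicts pods when the skew later exceeds the limit (labels below are illustrative):</p>

```yaml
# Pod spec fragment: spread replicas evenly across zones.
topologySpreadConstraints:
  - maxSkew: 1                              # max pod-count difference between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
```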
<h3 id="heading-11-pods-having-too-many-restarts">8. Pods Having Too Many Restarts</h3>
<p>When a pod has too many restarts, it can indicate that there is an issue with the pod or the node it is running on. Pods that are continuously restarting can destabilize the cluster, can delay container startup times, and negatively impact the overall performance of the cluster. Descheduling such a pod can help the operation teams to understand the issue behind the continuous restarts and stabilize the application and cluster performance.</p>
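<p>A hedged sketch of the corresponding <code>RemovePodsHavingTooManyRestarts</code> strategy in the <code>v1alpha1</code> policy format (the threshold is illustrative):</p>

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        podRestartThreshold: 100        # evict pods restarted 100+ times
        includingInitContainers: true   # count init-container restarts too
```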
<h2 id="heading-conclusion">Conclusion</h2>
<p>In summary, a descheduler plays an important role in managing and optimizing the distribution of pods within a Kubernetes cluster. It can help to ensure that the cluster is running at optimal performance, that resources are being used efficiently, and that the cluster is secure and available.</p>
]]></content:encoded></item><item><title><![CDATA[How to Host Static HTML Website on AWS Amplify]]></title><description><![CDATA[There are many low cost static website hosting options available but most of them require complicated process of uploading the website, configuring the domain names and managing the SSL certificates etc. Amazon Amplify simplifies these tasks by provi...]]></description><link>https://blog.arslanali.io/how-to-host-static-html-website-on-aws-amplify</link><guid isPermaLink="true">https://blog.arslanali.io/how-to-host-static-html-website-on-aws-amplify</guid><category><![CDATA[AWS Amplify]]></category><category><![CDATA[CI/CD]]></category><category><![CDATA[Static Website]]></category><category><![CDATA[hosting]]></category><dc:creator><![CDATA[Arslan Ali Ansari]]></dc:creator><pubDate>Thu, 12 Jan 2023 23:54:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1673567438882/3a151812-0ed7-4ca4-b667-1852d597a290.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are many low-cost static website hosting options available, but most of them require a complicated process of uploading the website, configuring domain names, managing SSL certificates, and so on. AWS Amplify simplifies these tasks by providing simple CI/CD: it lets you connect your code repository, provides automatic build pipelines for multiple environments, and, above all, provisions an SSL certificate for your custom domains out of the box.</p>
<p>Deploying a static website on AWS Amplify is a straightforward process that can be broken down into the following steps:</p>
<h2 id="heading-1-create-a-new-amplify-app">1. Create a new Amplify app.</h2>
<p>Log in to the AWS Amplify Console and create a new app by selecting the <strong>Host Web App</strong> option and by providing a name, environment, and repository for your website.</p>
<h2 id="heading-2connect-your-repository">2. Connect your repository</h2>
<p>Connect your repository to the app by linking it to your GitHub, GitLab, or Bitbucket account, or by manually uploading your website files. Here are the steps to connect your GitHub repository in AWS Amplify:</p>
<ol>
<li><p>Go to the Amplify Console and select the app that you want to connect to your GitHub repository.</p>
</li>
<li><p>Click on the "Connect branch" button.</p>
</li>
<li><p>Select "GitHub" from the list of repository providers.</p>
</li>
<li><p>Use your GitHub account to sign in, and authorize Amplify to access your GitHub repositories.</p>
</li>
<li><p>Select the repository that contains your website or app, and choose the branch that you want to connect.</p>
</li>
<li><p>Click on the "Next" button and configure the build settings, environment variables, and other options as needed.</p>
</li>
<li><p>Click on the "Save and Deploy" button to start the build and deploy process.</p>
</li>
<li><p>After the deployment is complete, you can monitor the build and deploy process, and make updates to your app through the Amplify console.</p>
</li>
</ol>
<p>Here's a screenshot of the available code repository options:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673564857018/4195e571-74d0-4a5a-bfcd-67d118b268a7.png" alt class="image--center mx-auto" /></p>
<p>It's important to note that in order to connect to your GitHub repository, you need to have a GitHub account, and your repository should be public or accessible by AWS Amplify.</p>
<h2 id="heading-3-build-and-deploy-your-app">3. Build and Deploy your App</h2>
<p>Once your repository is connected, Amplify will automatically build and deploy your app. You can also configure custom build settings and environment variables if needed. Please comment if you need any support with your custom build process.</p>
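<p>For a plain HTML site there is usually nothing to build; a minimal <code>amplify.yml</code> build spec can simply publish the repository contents. This is a sketch; adjust <code>baseDirectory</code> if your site lives in a subfolder:</p>

```yaml
version: 1
frontend:
  phases:
    build:
      commands: []        # no build step for static HTML
  artifacts:
    baseDirectory: /      # publish from the repository root
    files:
      - '**/*'
  cache:
    paths: []
```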
<h2 id="heading-4-configure-custom-domains">4. Configure custom domains</h2>
<p>Amplify will provide a default domain for your app, but you can also configure custom domains if needed. Here are the steps to configure custom domains for a deployed app on AWS Amplify:</p>
<ol>
<li><p>Go to the Amplify Console and select the app that you want to configure a custom domain for.</p>
</li>
<li><p>Click on the "Domain settings" tab and then click on the "Connect domain" button.</p>
</li>
<li><p>Enter the domain name that you want to use for your app. Amplify will automatically check the availability of the domain and suggest a domain if it's not available.</p>
</li>
<li><p>If the domain is available, Amplify will provide you with a set of instructions to verify that you own the domain. This typically involves adding a CNAME or A record to your domain's DNS settings.</p>
</li>
<li><p>Once the domain is verified, Amplify will automatically create a certificate for your custom domain.</p>
</li>
<li><p>Once the certificate is ready, you can associate the custom domain with your app by clicking on the "Associate" button.</p>
</li>
<li><p>Now, your custom domain is configured and should be active within a few minutes. You can test it by visiting the custom domain in your browser.</p>
</li>
</ol>
<p>Here is a screenshot of how I have added my custom domain and mapped the branches to each of the sub-domains:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673564771721/2bdfbb09-3b9a-4307-a5ce-6a50bdd0c94b.png" alt class="image--center mx-auto" /></p>
<p>It's important to note that you will need access to your domain's DNS settings to configure custom domains in Amplify. Also, Amplify requires a valid SSL certificate for custom domains; it will create one for you automatically, but you can also use your own certificate.</p>
<h2 id="heading-5-monitor-and-update-your-app">5. Monitor and update your app</h2>
<p>After your app is deployed, you can monitor its performance, view analytics, and make updates through the Amplify console. You can also monitor the application's access logs and set alarms based on different metrics, such as 4xx and 5xx errors.</p>
<p>Below are screenshots of the access logs and the site analytics of my <a target="_blank" href="https://arslanali.io">personal portfolio website</a>, which I deployed on Amplify earlier.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673564542800/fad163c6-8eab-4cf7-b931-d47b374bf006.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673564585656/4e20bf7d-884a-4c2c-8fd8-b39942ccb3ac.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-6-enable-cloudfront-distribution">6. Enable CloudFront distribution</h2>
<p>By default, a CloudFront distribution is enabled for your application if you are using Route 53 to configure your DNS.</p>
]]></content:encoded></item><item><title><![CDATA[Securing the Kubernetes Clusters with AI and ML]]></title><description><![CDATA[More than half of all enterprises consider security as their biggest challenge when publishing their microservice workloads in production. 50% require developers to use validated images only, around 80% want to have a DevSecOps initiative, more than ...]]></description><link>https://blog.arslanali.io/how-to-secure-kubernetes-clusters-with-ai-and-ml</link><guid isPermaLink="true">https://blog.arslanali.io/how-to-secure-kubernetes-clusters-with-ai-and-ml</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[DevSecOps]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Arslan Ali Ansari]]></dc:creator><pubDate>Wed, 19 Oct 2022 20:46:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/M5tzZtFCOfs/upload/v1666007015730/xsAT_EIBf.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>More than half of all enterprises consider <strong>security as their biggest</strong> challenge when publishing their microservice workloads in production. 50% require developers to <strong>use validated images</strong> only, around 80% want to have a <strong>DevSecOps initiative</strong>, more than 40% <strong>consider DevOps</strong> as the role most responsible for Kubernetes security, and most importantly, more than half have delayed application deployment due to security concerns.</p>
<blockquote>
<p>According to the <strong>State of Kubernetes Security Report 2022</strong>, security is one of the biggest concerns with container adoption, and security issues continue to cause delays in deploying applications into production.</p>
</blockquote>
<p>In the last year, 93% said that they have experienced at least one major security incident. More than 30% of them have experienced customer or revenue losses due to these incidents. According to a recent study, 95% of breaches were due to human error.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1666011633384/nC1T82Mde.png" alt="Screen Shot 2022-10-17 at 4.59.21 PM.png" /><br />Source (Red Hat State of Kubernetes Security Report - 2022)</p>
<h2 id="heading-impact">Impact</h2>
<p><strong>1. Data Compromise</strong><br />An attacker with access to business or infrastructure data can leak or destroy the data.</p>
<p><strong>2. Resource Hijacking</strong><br />If an attacker gets access to a node, or any compute resource in a cluster; he can easily run resource-hungry scripts like crypto mining (crypto-jacking), AI model processing, etc.</p>
<p><strong>3. Denial of Service</strong><br />DoS can be achieved using buffer overflow (by flooding general requests), ICMP flooding (by sending spoofed packets), or SYN flooding (by sending false connection requests to the server without completing the handshake). A DoS attack is meant to shut down a machine or make a network inaccessible.</p>
<p><strong>4. Ransom</strong><br />An attacker can take over the Management Layer or even remove the (important) data and ask for a ransom in return for the data or control.</p>
<p><strong>5. Loss in Customers and/or revenue</strong><br />One may incur major losses once a service is down because of any reason mentioned above.</p>
<h2 id="heading-aiops-to-the-rescue">AIOps to the Rescue</h2>
<p>KaiOps is an AI/ML SaaS-based tool that connects to your clusters using a secure connection to the KubeAPI server. You may need to update the default security profile to enable the KaiOps security agents to scan and flag different vulnerabilities as low, medium, or high threats. Once a threat is found, it sends notifications on your preferred communication channels and informs you of the possible implications, remediation, and/or prevention strategies.</p>
<p>KaiOps scans the cluster every few seconds at runtime and observes the most important Kube events related to networking, storage, and workloads. It also watches all whitelisted Kube objects for changes and runs AI inference using a GNN (Graph Neural Network) to detect misconfigurations, policy violations, and potential threats.<br />Cluster monitoring is divided into 10 classes:</p>
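<p>While KaiOps automates this observation, you can inspect the same raw signals manually. For example, the following commands (run against your own cluster) stream the warning events and object changes that such tooling typically consumes:</p>
<pre><code># Stream warning-level events cluster-wide as they occur
kubectl get events --all-namespaces --field-selector type=Warning --watch

# Watch a specific object kind, printing ADDED/MODIFIED/DELETED change events
kubectl get pods --all-namespaces --watch --output-watch-events
</code></pre>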
<p><strong>1. Access</strong><br />Exposing cluster nodes publicly can give the same exposure to containers with potential vulnerabilities and may let attackers penetrate the cluster.<br /><strong>2. Execution</strong><br />Command execution is monitored very closely, as an SSH server or a bash script running in a container can be compromised through brute-force attacks.<br /><strong>3. Persistence</strong><br />Writable hostPaths in containers or persistent volumes, backdoor containers, and exposed or compromised CronJobs are also monitored to reduce the risk of penetration.<br /><strong>4. Privilege</strong><br />The security contexts of all running containers are continuously monitored along with the RBAC context.<br /><strong>5. Defense Evasion</strong><br />Deletion of Kube objects, clearing of container logs, and connections through a proxy server are some of the defense-evasion events it monitors according to the security policy.<br /><strong>6. Credentials Leak</strong><br />Examples include leakage of the Kube CA data of an exposed ApiServer, or of cloud credential files on a hosted Kubernetes service. KaiOps continuously monitors Kube's internal network and even object-call data, which is fed into the GNN for AI model training and anomaly detection.<br /><strong>7. Discovery</strong><br />Exposed observability platforms, the K8s dashboard, the ApiServer, and similar services may help an attacker penetrate the cluster.<br /><strong>8. Lateral Movement</strong><br />Writable volumes mounted on hosts, exposed access to cloud resources, privileged service accounts assigned to a container, application configuration exposed through environment variables, CoreDNS poisoning, ARP poisoning, IP spoofing, or even public IPs assigned to a container may compromise the security of a cluster. KaiOps flags these vulnerabilities and whitelists some of them based on a user-defined security policy.<br /><strong>9. Collection</strong><br />Images from unknown or public sources and compromised container images can jeopardize cluster security. KaiOps continuously scans for image and package vulnerabilities using static image scanners and feeds the data into the AI inference engine.<br /><strong>10. Custom K-Tactics</strong><br />KaiOps uses custom, patented K-Tactics to help its GNN models detect errors, ambiguities, vulnerabilities, and anomalies. K-Tactics are also used to reduce nuisance alarms and notifications.</p>
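<p>The static image scanning mentioned under Collection can be reproduced with an open-source scanner. As an illustration (not necessarily the scanner KaiOps uses internally), Trivy flags known CVEs in an image like this:</p>
<pre><code># Scan a public image for known vulnerabilities, reporting only high/critical findings
trivy image --severity HIGH,CRITICAL nginx:latest
</code></pre>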
<h2 id="heading-conclusion">Conclusion</h2>
<p>With almost <a target="_blank" href="https://www.bleepingcomputer.com/news/security/over-900-000-kubernetes-instances-found-exposed-online/"><strong>one million</strong> Kubernetes clusters found exposed online</a>, it is evident that the security vulnerabilities above are present in the vast majority of Kubernetes clusters, making remediation difficult. KaiOps uses its patented AI/ML agents, feeding telemetry data into a GNN model to predict and detect security vulnerabilities, misconfigurations, and anomalies. KaiOps SaaS is free for a cluster with up to 3 nodes for 14 days and includes live support for setting up the service on your clusters. KaiOps currently supports native Kubernetes clusters on almost all hosted Kubernetes services. Use this <a target="_blank" href="https://portal.KaiOps.io/login/register">link to register</a> and get an extended one-month free trial.</p>
<h3 id="heading-references">References</h3>
<ul>
<li><a target="_blank" href="https://www.redhat.com/en/resources/state-kubernetes-security-report">Red Hat State of Kubernetes Security Report</a></li>
<li><a target="_blank" href="https://kaiops.io/articles/aiops-use-cases">AIOps use cases</a></li>
<li><a target="_blank" href="https://blog.arslanali.io/common-security-issues-in-kubernetes-part-1">Common Security Issues found in almost all Kubernetes Deployments </a></li>
<li><a target="_blank" href="https://www.weave.works/blog/mitre-attack-matrix-for-kubernetes">MITRE ATT&amp;CK Matrix</a></li>
<li><a target="_blank" href="https://www.youtube.com/watch?v=ka0C09CAfho">Kubernetes Security Workshop</a></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Common Security Issues in Kubernetes Clusters -- Part 1]]></title><description><![CDATA[54% of enterprises mentioned Security as their biggest challenge in Kubernetes. Almost 7 out of 10 have limited or lack resources to manage security as Kubernetes has a steep learning curve. Now it is not only the Kubernetes lacking native security s...]]></description><link>https://blog.arslanali.io/common-security-issues-in-kubernetes-part-1</link><guid isPermaLink="true">https://blog.arslanali.io/common-security-issues-in-kubernetes-part-1</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Security]]></category><category><![CDATA[DevSecOps]]></category><category><![CDATA[SecOps]]></category><dc:creator><![CDATA[Arslan Ali Ansari]]></dc:creator><pubDate>Tue, 11 Oct 2022 07:11:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/Fa9b57hffnM/upload/v1665471592921/KYgZparCN.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>54% of enterprises cite security as their biggest challenge with Kubernetes, and almost 7 out of 10 have limited or no resources to manage security, given Kubernetes's steep learning curve. The problem is not only that Kubernetes lacks native security scanning: any vulnerability in an application running on the cluster can escalate into a larger threat. Here are a few hacks from the past:</p>
<ul>
<li><a target="_blank" href="https://www.bleepingcomputer.com/news/security/over-900-000-kubernetes-instances-found-exposed-online/">Over 900,000 Kubernetes Instances are found Exposed Online</a></li>
<li><a target="_blank" href="https://www.wired.com/story/cryptojacking-tesla-amazon-cloud/">Tesla Clusters on Amazon Cloud were Hacked with Cryptojackers</a></li>
<li><a target="_blank" href="https://sysdig.com/blog/exposed-prometheus-exploit-kubernetes-kubeconeu/">Exposed Prometheus exploit Kubecon EU</a></li>
</ul>
<p>Now that we know a single container vulnerability can compromise the security of the whole cluster, let us review a few common weak points.</p>
<h2 id="heading-public-nodes">Public Nodes</h2>
<p>A public cluster is one whose nodes are accessible from the internet; technically, they have one or more network interfaces configured with an ExternalIP. You can test this with the following command:</p>
<pre><code>kubectl get node &lt;node-name&gt; -o jsonpath=<span class="hljs-string">'{.status.addresses}'</span> | grep ExternalIP
</code></pre><p>Replace <code>&lt;node-name&gt;</code> with the name of a node in your cluster. The following is a sample output:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665376504399/cP-xToki7.png" alt="Screen Shot 2022-10-10 at 9.34.50 AM.png" /></p>
<p>If any of your nodes have an ExternalIP then it is exposed, which means all your containers/pods running on this node are also exposed to the public. It is highly recommended to keep your nodes private and route the outgoing traffic using a <a target="_blank" href="https://cloud.google.com/nat/docs/overview">Cloud NAT</a>.</p>
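<p>On GKE, for example, a cluster can be created with private nodes (relying on Cloud NAT for egress) roughly as follows; the exact flags vary by provider and version, and the cluster name here is just a placeholder:</p>
<pre><code># Create a GKE cluster whose nodes get no external IPs
gcloud container clusters create my-private-cluster \
  --enable-private-nodes \
  --enable-ip-alias \
  --master-ipv4-cidr 172.16.0.0/28
</code></pre>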
<h2 id="heading-public-kubernetes-dashboard">Public Kubernetes Dashboard</h2>
<p>Kubernetes Dashboard is an open-source deployment used to manage and monitor cluster deployments and other objects. It can be installed on any cluster with a single command. Here is how you can deploy it on your cluster:</p>
<pre><code>kubectl apply -f https:<span class="hljs-comment">//raw.githubusercontent.com/kubernetes/dashboard/v2.6.1/aio/deploy/recommended.yaml</span>
</code></pre><p>Once installed, you can access it from a remote machine (one with cluster access) by running the <code>kubectl proxy</code> command.
Then open the dashboard at the following URL: <code>http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/</code></p>
<blockquote>
<p>Note: The kubeconfig authentication method does not support external identity providers or X.509 certificate-based authentication. The UI can only be accessed from the machine where the command is executed. See <code>kubectl proxy --help</code> for more options.</p>
</blockquote>
<p>This is the standard access method; if one of your vulnerable (or compromised) containers has access to the dashboard, an attacker will have access to the whole Kubernetes cluster.</p>
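<p>A quick way to verify that the dashboard itself is not published outside the cluster is to check its Service type, which should be ClusterIP rather than LoadBalancer or NodePort:</p>
<pre><code># The dashboard Service should be of type ClusterIP (internal only)
kubectl -n kubernetes-dashboard get svc kubernetes-dashboard
</code></pre>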
<h2 id="heading-exposed-observability-platform">Exposed Observability Platform</h2>
<p>If you are using Prometheus, Grafana, or Splunk, chances are that you are exposing them through an external load balancer. These and other observability platforms are often used to monitor and prevent performance, security, and other issues in a cluster. Exposing them to public traffic, whether through an ingress or through a vulnerable (or compromised) container, may invite brute-force attacks. An attacker who gains access to such a platform can take control of the whole cluster.</p>
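<p>You can audit your cluster for services exposed through an external load balancer with a one-liner such as:</p>
<pre><code># List Services exposed via an external load balancer;
# Prometheus, Grafana, or Splunk should not appear here
kubectl get svc --all-namespaces | grep LoadBalancer
</code></pre>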
<h2 id="heading-privileged-container">Privileged Container</h2>
<p>A privileged container runs with full capabilities, and through the <strong>reverse shell</strong> technique an attacker inside it can gain full control over the node that hosts it. Full access to a node exposes all the images stored on it and may lead to access to other nodes on the network, exposing every resource object in the cluster. Attackers may deploy a cron job or hijack node resources without it even being noticed from the Kubernetes platform.</p>
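<p>To find out whether any running pod requests privileged mode, you can query the securityContext of every container; one sketch of such a check:</p>
<pre><code># Print namespace, pod name, and the privileged flag of each pod's containers,
# then keep only the rows where the flag is set to true
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}' \
  | grep true
</code></pre>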
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this part, I have only scratched the surface, covering some of the most common security vulnerabilities. An attacker can use impact techniques to destroy, abuse, or disrupt the normal behavior of an environment. I will discuss more security vulnerabilities in upcoming articles.</p>
]]></content:encoded></item></channel></rss>