How Cloud-Native Engineering and AI Are Transforming Modern Software Security
Rajeev, your journey spans full-stack development, cloud engineering, serverless automation, and DevOps. What drove your evolution toward systems thinking and security leadership?
After graduating, I began as a .NET developer working primarily on single-user applications with no security or scalability concerns, as I was the primary user of these small projects.
Upon entering the industry, things changed dramatically. It quickly became apparent that there is a huge difference between developing something functional and developing something scalable, secure and fault tolerant. Now I had to consider protecting APIs and authentication workflows, data privacy, potential misconfigurations and how systems behave when used by thousands of users and subjected to real-world threats.
The change in thought process compelled me to pursue a Master's degree in Advanced Computer Science (Cloud Computing) at the University of Leeds. I wanted to learn how distributed systems operate in real-world environments with failures, high loads and uncertainty, which also pushed me into learning about cloud architectures, DevOps pipelines and automation.
A significant milestone occurred when I joined the element14 community around the same time as the mainstream emergence of AI and tools like ChatGPT. Practically overnight, AI moved from being an emerging technology to a core expectation, with teams using it to build AI-powered search, automate workflows and personalize user experiences. With this rapid adoption came a host of new security, privacy and integration challenges. That is when my interest in the intersection of cloud computing, artificial intelligence and security took hold: how to build responsible, intelligent features without losing user confidence.
With cloud, serverless, and AI entering engineering stacks, what security risks do teams often overlook?
At the time I joined element14, we were in the process of migrating from our previous (and somewhat dated) platform to a modern cloud-native architecture. Our internal DevOps was well established, and like many teams in transition, we used environment variables and stored secrets to connect to our cloud-based resources. We got away with this method, but it created many of the typical overlooked risk areas: static credentials, wide access scopes and poor audit history.
The Avnet data breach in September served as a turning point for element14. It demonstrated how rapidly exposure escalates when identity and access are not closely monitored. In response, we moved to a complete IAM model, eliminated persistent administrative access, established just-in-time permissions and now rotate keys every sprint. Not only did this make our environment more secure; it also gave us far greater visibility into who accessed what, and when.
I see a big gap in cloud teams today: they assume that keeping secrets in a vault provides all the security they need. In truth, the largest risk lies in how identities are handled and how long access is left open. Security improves dramatically when identity, automation and monitoring are treated as part of the architecture rather than as afterthoughts.
AI is being positioned as a breakthrough in security. Beyond automation, where do you see its real impact in securing cloud-native systems?
The true value of AI today lies in how much faster it can analyze cloud environments than any human could. For example, in Kubernetes, finding the source of a problem typically requires hours of manually reviewing logs filled with noise. That has changed: we can now take the entire log output from a Kubernetes cluster, run it through an AI model in under ten seconds and receive a clear diagnosis with the most probable root cause.
That alone improves the rate at which incidents are resolved and allows teams to focus on resolving the issue rather than spending their time trying to find the issue.
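As an illustration, that kind of log triage can be sketched in a few lines. The filtering rule, the prompt text and the `ask_model` callable below are assumptions for illustration, not the actual tooling; a real setup would collect logs via `kubectl logs` and send the prompt to a hosted model.

```python
import re

# Minimal sketch of AI-assisted Kubernetes log triage. `ask_model` is any
# callable that sends a prompt to an LLM and returns its answer; here it is
# injected so the logic stays testable without a live model.

NOISE = re.compile(r"\b(DEBUG|INFO)\b")  # illustrative low-signal markers

def build_triage_prompt(raw_logs: str, max_lines: int = 200) -> str:
    """Strip low-signal lines and wrap the remainder in a diagnosis prompt."""
    lines = [l for l in raw_logs.splitlines() if l.strip() and not NOISE.search(l)]
    excerpt = "\n".join(lines[-max_lines:])  # keep only the most recent events
    return (
        "You are a Kubernetes SRE. Given these cluster logs, identify the "
        "most probable root cause and suggest a fix:\n\n" + excerpt
    )

def triage(raw_logs: str, ask_model) -> str:
    """Send the filtered logs to the model and return its diagnosis."""
    return ask_model(build_triage_prompt(raw_logs))
```

The pre-filtering matters in practice: dropping routine `INFO`/`DEBUG` lines keeps the prompt inside the model's context window and focuses it on the actual failure signal.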
In addition, we are seeing the beginnings of AI being used to improve the quality of both code and infrastructure. Tools such as CodeRabbit review pull requests to identify problems before they reach production, and I believe many more AI systems are coming that will evaluate cloud configurations, IAM policies and runtime behavior in real time.
However, we must be cautious, because giving AI broad access to our systems introduces significant risk to our cloud environments - for example, malicious prompts could be injected into those systems, privileges could be escalated unexpectedly, or sensitive data could be leaked. I discussed some of the potential vulnerabilities associated with AI use at the MCP Summit, and I believe these are serious threats as AI becomes even more ingrained in operations.
In summary, AI will dramatically improve cloud security when properly governed and treated with the same level of discipline as we treat our cloud environments.
When you join a new team or project, what is your first step in assessing whether the cloud and security foundations are strong?
My first step in evaluating the risk of a system is typically to review identity and access. This comes from my experience under intense pressure at element14, where we migrated to an IAM-based access model and rotated keys each sprint; I have seen firsthand how quickly risks surface when there are no tight controls on user permissions. At a minimum, I evaluate who has access to what within the system and whether the environment follows least-privilege principles.
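A first pass at that "who has access to what" question can be automated. The assignment format and role names below are assumptions for illustration; in a real review these records would come from the cloud provider's IAM APIs.

```python
# Hypothetical sketch of a first-pass least-privilege audit: flag any
# role assignment that grants an admin-level role or a wildcard/root scope.
# Role names and the assignment shape are illustrative assumptions.

BROAD_ROLES = {"Owner", "Administrator", "*"}

def flag_overbroad(assignments):
    """Return assignments that grant admin roles or overly wide scopes.

    Each assignment is a dict with "principal", "role" and "scope" keys.
    """
    findings = []
    for a in assignments:
        if a["role"] in BROAD_ROLES or a["scope"] == "/" or "*" in a["scope"]:
            findings.append(a)
    return findings
```

Even a crude check like this surfaces the most common problem quickly: service accounts and CI identities holding owner-level roles long after the task that required them is done.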
Beyond the identity and access layer, I also examine the automation layer. For example, in our environment we integrated our build pipelines with Logic Apps, which made it easier to identify where failures occurred and allowed errors to surface quickly. Email alerts were useful early on, but as a system matures, observability and alerting need to be centralized.
If AI is present in the system, I review the boundaries around its access and the prompts being sent to it. Misconfigured AI workflows can create vulnerabilities such as data leakage or prompt manipulation.
Finally, I confirm these assumptions with automated controls (e.g., config rules, security hubs and IaC scanners). A strong cloud foundation is not just good design; it is continuous validation and enforcement.
How do you balance the need for rapid development and deployment with security, compliance, and long-term maintainability?
I believe there are many misconceptions about improving efficiency through technology adoption. Adopting a newer technology does not automatically make your organization more efficient. Migrating to a container orchestration platform such as Kubernetes can be beneficial for certain architectures, but this is not true for every organization. Many teams chase what they perceive to be the "next big thing" without assessing the operational implications, costs or maintenance burdens of a significant architectural shift. What works for one organization's scale and workload does not necessarily translate to another's. If a new technology increases complexity without a clear, measurable benefit in operational improvement, cost reduction or ease of maintenance, then maintaining the status quo usually makes the most sense. Modernization should be a deliberate process, not a reaction to the latest trend.
That being said, innovation can certainly provide benefits by augmenting and improving upon existing workflows rather than completely rebuilding them. Rather than trying to re-architect everything from the ground up, integrating AI into your current system to enhance its reliability is a viable option. For example, a Logic App could identify issues within your pipeline or production workloads, pass the necessary data to an internal AI model for analysis and return actionable insight back to the user. In doing so, organizations can realize improved response times and reduced manual effort, while at the same time minimizing disruption to their core architecture.
What principles guide your leadership when aligning engineering, security, and product teams, especially in high-stakes or fast-scaling environments?
I begin by integrating security into my workflow from day one. We run in sprints and use Function Apps and cloud automation heavily, which means that if we are not careful, errors can spread rapidly. A simple example: a missing .gitignore and app settings pushed to GitHub (even in a private repository) could expose a number of vulnerabilities. This is why we have implemented an automated check on each commit that looks for key or secret patterns. The moment something slips through, we know about it immediately and act before it becomes a vulnerability.
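A commit-time secret check of that kind reduces to pattern matching over the staged changes. The two patterns below are illustrative; production scanners such as gitleaks ship far larger rule sets.

```python
import re

# Rough sketch of a pre-commit secret scan of the kind described above.
# Patterns here are illustrative examples, not a complete rule set.

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*['\"][^'\"]{8,}"),
]

def scan_text(text: str):
    """Return the matched snippets so a commit can be blocked with context."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```

Wired into a pre-commit hook or CI step, a non-empty result fails the commit, which is exactly the "know about it immediately" behavior described above.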
This reflects my overall philosophy: security must be integrated into your development pipeline, not considered after the fact. Policy validation, CI/CD checks, secret management, environment provisioning and so on must be automated so that best practices are consistently enforced. When the pipeline takes care of governance and guardrails, teams can move rapidly while staying safe.
How do cloud automation and serverless workflows reshape engineering practices, and where do you see the biggest opportunities for innovation?
Cloud automation and serverless workflows completely transform the way engineering functions. By eliminating the need for engineers to provision infrastructure or maintain servers, they free teams to focus on behaviours, events, triggers, workflows and the business logic that drives them. This significantly reduces the operational burden of features, allowing them to scale as needed without additional maintenance.
We are already seeing this in action. We built an AI-based review-summary application within our community platform using Function Apps that communicate with OpenAI to generate meaningful insights from large volumes of user feedback. The entire pipeline is serverless, automated and cost-effective. It is a small example, but it illustrates how AI can enhance the cloud ecosystem while also reducing complexity.
The largest opportunity ahead is making cloud environments smarter by incorporating AI into their architecture. Once AI becomes part of the system's architecture, systems will be able to optimize capacity, anticipate failures, enforce security policies and ultimately lower their costs more effectively.
Can you share an example where cloud, AI, and security came together to solve a real problem or deliver measurable improvement?
A particularly effective example was the implementation of a real-time image-safety scanning workflow for our community platform. We needed a way to identify potentially harmful or illegal content (in compliance with UK online safety expectations) at the point where images are uploaded into our storage accounts.
The solution we created uses an event-driven architecture: when a new file lands in our storage account, an Event Hub trigger verifies the file's metadata to confirm that it is indeed an image. A Logic App then coordinates multiple validation processes: Microsoft's PhotoDNA, which identifies known harmful content signatures, and an OpenAI Foundry model, which determines whether the image itself represents unsafe or prohibited content. If either check identifies potential risk, the system immediately quarantines the image.
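Stripped of the cloud plumbing, the decision logic of that pipeline is small. The function names and the validator interface below are assumptions standing in for the PhotoDNA signature check and the model-based check, not the production code.

```python
# Simplified sketch of the quarantine decision in the pipeline described
# above. `checks` stands in for the two validators (signature match from a
# PhotoDNA-style service, verdict string from a content model); both
# interfaces are illustrative assumptions.

def should_quarantine(signature_match: bool, model_verdict: str) -> bool:
    """Quarantine when either validation path reports risk."""
    return signature_match or model_verdict == "unsafe"

def handle_upload(blob_name: str, content_type: str, checks) -> str:
    """Mimic the trigger step: only files identified as images proceed."""
    if not content_type.startswith("image/"):
        return "skipped"
    signature_match, model_verdict = checks(blob_name)
    if should_quarantine(signature_match, model_verdict):
        return "quarantined"
    return "published"
```

Keeping the decision as a pure function, separate from the event trigger and the orchestration, is what makes such a pipeline easy to test and to extend with further validators.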
What trends do you believe will define the next decade of cloud security and AI-driven engineering?
The future of cloud security will be driven by identity. As distributed systems become more commonplace, every aspect of an organization's infrastructure (API access, service-to-service communication, AI inference endpoints) will have strict identity governance applied to it. With the perimeter gone, identity becomes the new boundary. AI-based analysis will become commonplace not only for detecting threats but for optimizing performance, identifying misconfigurations and automating many of the decisions currently made by humans. I recently developed a custom Kubernetes scaler that adjusts workloads based on application behavior; it is one example of a natural trend toward using AI to make scalable applications smarter and more context-aware in their scaling decisions.
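The heart of a custom scaler like that is a mapping from an application-level signal to a replica count. The signal (queue depth), the per-replica capacity and the bounds below are illustrative assumptions, not the production values:

```python
import math

# Illustrative sketch of the decision logic inside a custom scaler: map an
# application signal (queue depth) to a desired replica count. Thresholds
# and names are assumptions for illustration only.

def desired_replicas(queue_depth: int, per_replica: int = 100,
                     lo: int = 1, hi: int = 20) -> int:
    """Scale so each replica handles at most `per_replica` queued items,
    clamped to a [lo, hi] range to avoid runaway scaling."""
    target = math.ceil(queue_depth / per_replica) if queue_depth > 0 else lo
    return max(lo, min(hi, target))
```

In a real controller this function would be invoked on each reconcile loop, with the result written to the workload's replica count; the clamping bounds are the safety guardrail against a faulty or adversarial signal.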
Cloud environments will increasingly operate in a self-managing mode, adjusting cost, capacity and security posture automatically based on real-time signals. The new challenge is that as AI farms become more expensive, we will need more efficient architectures, lighter inference models and more intelligent resource utilization to keep AI sustainable.
While we are much further ahead than we were two years ago, the next phase of cloud AI will not be about larger models; it will be about creating AI that is secure, sustainable and efficiently integrated directly into the cloud foundation.
If you could instil one mindset shift among engineering leaders, what would it be - and why is it crucial for the future of software development?
Leaders who embrace technology, including AI, for architectural purposes will be better positioned than those who do not. I am witnessing a growing reliance on AI-generated code produced with little thought for the long-term implications. An AI solution may meet your needs when you first implement it, but I have watched "vibe-coded" and poorly written AI solutions fail under heavy load, fall short of security requirements and produce technical debt that is very difficult to recover from.
Although I use AI to support my development activities regularly, I always do so within a sound architecture. Validating every output AI produces, aligning it with our organization's established design patterns, and evaluating its impact on the scalability, maintainability and security of our applications is a fundamental part of what I do.
The mindset I would like to see develop among leaders is that AI should assist engineers' judgment, not supplant it. Leadership has a responsibility to ensure that their engineering staff understand the architecture of their systems, the constraints of those systems, and the long-term implications of each decision. When the guardrails are in place (clear patterns, good pipelines and automated checks), AI becomes a valuable resource and partner to the engineer.
