
Case study

Modernizing Legacy .NET Infrastructure with Docker, Kubernetes, and Terraform

How we containerized and orchestrated legacy .NET applications and provisioned cloud infrastructure with modern tooling.

An e-learning platform serving 400,000 daily active users across 120 countries transformed its infrastructure from legacy Windows IIS deployments to a modern, containerized Kubernetes platform—without external consultants.

This migration eliminated deployment downtime, reduced median cycle time from weeks to 2.5-4 days, enabled multiple deployments per day, and significantly reduced infrastructure costs, while building internal engineering capabilities.

Key Results

  • Zero-downtime deployments: From hours-long manual deployments causing outages to multiple seamless deployments per day
  • 75%+ faster delivery: Median cycle time reduced from weeks to 2.5-4 days (from "in progress" to "in production")
  • Significant cost reduction: Infrastructure costs lowered by moving from Windows-based Azure App Services to Kubernetes
  • SLA compliance restored: Eliminated risk of violating customer SLAs with hefty financial penalties
  • Engineering velocity unlocked: Removed operational toil and "deployment days," freeing engineers to focus on features

The Challenge

Business Context

The company operated a hospitality e-learning platform handling terabytes of traffic daily across 120 countries. The application had been running on traditional Windows IIS infrastructure for nearly five years, serving a growing user base that had reached 400,000 daily active users. As the business scaled, the legacy infrastructure became a critical bottleneck.

Pain Points

Deployment Risk & Downtime

Every deployment was a high-risk event. The semi-manual deployment process required 1-2 engineers for several hours and regularly caused customer-facing downtime. This created a real threat of violating SLAs with major customers—agreements that included substantial financial penalties.

Operational Burden

Engineers designated entire "deployment days" or "release days" to manage the fragile deployment process. The Windows IIS architecture exhibited unpredictable behavior under load, leading to frequent production incidents that were difficult to diagnose due to poor observability.

Missed Business Opportunities

The slow, risky deployment process meant the company couldn't capitalize on time-sensitive marketing opportunities. Feature delivery was constrained by infrastructure limitations rather than engineering capacity.

Growing Reliability Concerns

A retrospective analysis of application reliability revealed concerning patterns. The engineering team (then 30 people) spent significant time on operational toil instead of innovation. The CTO and CPO recognized that without intervention, the infrastructure would continue to constrain business growth.

The Approach

Strategic Decision-Making

Working directly with the CTO, we evaluated cloud-specific options (e.g. Azure App Services), cloud-agnostic orchestration (HashiCorp Nomad, Docker Swarm, Kubernetes), and infrastructure management (Terraform vs. alternatives).

Why Kubernetes? Kubernetes emerged as the best fit: first-class Azure support through AKS, comprehensive documentation, a relatively low barrier to entry for a team new to container orchestration, and a mature ecosystem. It also provided a cloud-agnostic foundation for future flexibility.

Why Terraform? For Infrastructure as Code, Terraform offered expressive, human-readable syntax, excellent documentation, a gentle learning curve, native support for Kubernetes and Azure, and an industry-standard approach with strong job market appeal for engineers.
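
To give a sense of what this looked like in practice, here is a minimal sketch of provisioning an AKS cluster with Terraform's `azurerm` provider. The resource names, region, and node sizing are illustrative placeholders, not the company's actual configuration.

```hcl
# Illustrative only: names, region, and sizing are hypothetical.
resource "azurerm_resource_group" "platform" {
  name     = "elearning-platform-rg"
  location = "westeurope"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "elearning-aks"
  location            = azurerm_resource_group.platform.location
  resource_group_name = azurerm_resource_group.platform.name
  dns_prefix          = "elearning"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D4s_v3"
  }

  identity {
    type = "SystemAssigned"
  }
}
```

Because the cluster is declared in code, the same definition can be applied to stamp out matching non-production and production environments, which is exactly what the phased rollout below relied on.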

Building the Business Case

The business case centered on risk mitigation and opportunity cost:

  • Financial risk: SLA violations with major customers carried substantial penalties
  • Opportunity cost: Inability to deliver features quickly meant missed revenue opportunities
  • Operational cost: Engineering time spent on deployment toil rather than feature development
  • Scalability ceiling: Current infrastructure couldn't support projected growth

The project would be executed "on the job," without external consultants and without derailing the product roadmap, keeping the investment primarily to internal engineering time.

Risk Mitigation Strategy

Recognizing that the team had no prior experience with containers or Kubernetes, we designed a phased approach:

  • Phase 1 — Learning & Non-Production: Deployed a complete non-production ecosystem so engineers could gain hands-on experience in a safe environment.
  • Phase 2 — Incremental Migration: Migrated business services one by one—first containerized with Docker, then moved to Kubernetes—building confidence with each success.
  • Phase 3 — Production Rollout: Implemented proxy layers between customers and both old and new deployments, carefully routed traffic while monitoring metrics and logs, and gradually increased traffic to the new infrastructure over several hours. We only committed to a 100% traffic shift after building complete confidence.
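
The case study doesn't name the proxy technology used for Phase 3, but one common way to implement this kind of gradual traffic shift on Kubernetes is the NGINX ingress controller's canary annotations, sketched below. The hostname and service names are hypothetical; this canary ingress runs alongside a primary ingress that continues routing the remaining traffic to the legacy deployment.

```yaml
# Hypothetical sketch: route a small, adjustable share of traffic
# to the new Kubernetes-hosted service while the legacy deployment
# keeps serving the rest.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # start at 10%, raise as confidence grows
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-new
                port:
                  number: 80
```

Raising `canary-weight` in small steps while watching error rates and latency is what turns the cutover into a reversible, hours-long ramp rather than a one-shot switch.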

Technical Foundation

The .NET Core Migration Challenge

The biggest technical hurdle was containerizing the Windows .NET application. This required:

  • Migrating the application from .NET Framework to .NET Core
  • Implementing Linux-specific fixes for container compatibility
  • Gradually transitioning the engineering team from Windows to macOS to improve developer experience

The application architecture was sound enough that we could execute a lift-and-shift approach without major restructuring.
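
A containerized .NET Core service of that era would typically be built with a multi-stage Dockerfile along these lines. The project name is illustrative, not taken from the actual codebase; the base images are the official Microsoft images for .NET Core 3.1.

```dockerfile
# Illustrative multi-stage build: compile with the SDK image,
# ship only the runtime image. Project name is a placeholder.
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
WORKDIR /src
COPY . .
RUN dotnet publish MyApp.csproj -c Release -o /app/publish

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS runtime
WORKDIR /app
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "MyApp.dll"]
```

The multi-stage split keeps the production image small and free of build tooling, and the Linux base images are where Linux-specific fixes (file-path casing, line endings, native dependencies) tend to surface during a Framework-to-Core migration.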

Observability First

To ensure customer experience wouldn't degrade, we implemented comprehensive monitoring: frontend error tracking, backend error monitoring, page load speed metrics, and the full Grafana monitoring stack (among early adopters of Grafana for Kubernetes). This observability layer proved critical for both the migration and ongoing operational excellence.

The Outcome

Quantifiable Business Impact

  • Deployment: Before — semi-manual deployments taking hours, causing downtime, requiring dedicated "deployment days." After — multiple seamless deployments per day with zero customer-facing impact, including easy rollbacks for bug fixes.
  • Delivery speed: Median cycle time reduced from weeks to 2.5-4 days (from "in progress" to "in production"), unlocking time-sensitive opportunities.
  • Cost efficiency: Infrastructure costs significantly reduced by moving from Windows-based Azure App Services to Kubernetes; no external consulting costs—all capability built in-house.
  • SLA compliance: Eliminated the risk of costly SLA violations and restored confidence with major customers.
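
Zero-downtime deployments and easy rollbacks on Kubernetes typically rest on a rolling-update strategy combined with readiness probes, roughly as sketched below; all names, images, and paths here are hypothetical.

```yaml
# Illustrative: a RollingUpdate strategy with a readiness probe is
# what makes deploy-many-times-a-day safe on Kubernetes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-new
spec:
  replicas: 4
  selector:
    matchLabels:
      app: app-new
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # never drop below full serving capacity
      maxSurge: 1        # add one extra pod at a time during rollout
  template:
    metadata:
      labels:
        app: app-new
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.2.3
          readinessProbe:      # gate traffic until the pod is healthy
            httpGet:
              path: /healthz
              port: 80
```

With this in place, a bad release is undone by rolling back to the previous ReplicaSet (e.g. `kubectl rollout undo deployment/app-new`), which is what replaced the hours-long manual recovery of the IIS era.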

Organizational Transformation

The project catalyzed a cultural evolution toward "You Build It, You Run It": engineers took ownership of production reliability, operational responsibility was distributed across the team, and reliance on narrow specialization decreased while overall resilience increased.

Stable, similar non-production and production environments enabled high-confidence acceptance testing at any time, faster debugging and issue reproduction, and fewer "works on my machine" problems.

The new infrastructure became a reusable blueprint: multiple subsequent business services were deployed using the same patterns, engineers gained reliable tools for production operations, and the foundation was established for advanced patterns (Service Mesh, OpenTelemetry, etc.).

As the platform matured and operational toil decreased, the team optimized from 30 to 20 engineers while increasing output—a testament to improved efficiency and reduced operational burden.

Lessons Learned

What worked well: An empirical, iterative approach allowed the best solutions to emerge; starting with non-production gave the team safe space to learn and fail fast. The maturity, stability, and documentation of Kubernetes and its ecosystem made adoption smoother than anticipated; Azure AKS was particularly easy to master. Building expertise internally rather than relying on consultants created lasting organizational value.

Critical success factors: Ensuring sufficient in-house expertise before production deployment was crucial—the non-production phase was essential preparation. The harder work was addressing organizational concerns: building buy-in, managing change, and ensuring engineers felt supported through the learning curve. The phased approach, moving services one by one, limited risk and allowed learning to compound.

Advice for others: Invest in capability building first; start with non-production; migrate incrementally (don't attempt a big bang); prioritize observability from day one; and address organizational concerns proactively—the technical challenges are solvable; the people challenges require ongoing attention.

Looking Forward

The migration established a solid foundation for continued innovation. Following the successful production deployment, the team progressed to implementing advanced architecture and security patterns including Service Mesh and OpenTelemetry. The project demonstrated that transformative infrastructure changes are achievable without disrupting the product roadmap or requiring expensive external help. By investing in internal capabilities and taking an empirical, risk-managed approach, the company built not just a better platform but a stronger engineering organization. The platform now serves as a proven blueprint for new projects, and engineers have gained valuable production operations skills that benefit both the organization and their professional growth.

Role: Head Of Engineering reporting to CTO · Timeline: Phased over multiple quarters · Team: 30 engineers at start, optimized to 20 by completion · Technologies: .NET Core, Docker, Kubernetes (AKS), Terraform, Grafana · Industry: E-learning / Hospitality · Scale: 400,000 daily active users, 120 countries, terabytes of daily traffic