Journey to Serverless (Part 2): The Cost Crisis and Path to Change
AWS Serverless Cost Optimisation Architecture Migration
Table of Contents
- 1. Introduction
- 2. The Cost Crisis: From $13 to $73
- 3. Three Options Evaluated
- 4. The Decision: Why Serverless?
- 5. The Path Forward
- 6. Key Takeaways
- 7. Next in the Series
- Series Navigation
1. Introduction
The system was running smoothly in production. The infrastructure was stable, the team was using it regularly, and operational costs were remarkably low at just $13/month. Everything seemed sustainable.
Then the AWS Free Tier window ended.
I had set up a budget alert months earlier specifically to catch this moment. When the alert triggered, I checked the projected costs and was surprised: once the free tier benefits expired, the monthly bill would jump from $13 to approximately $73/month.
The Business Context: While $73/month is trivial for most organisations, for an internal tool serving a small team within this company, it represented a significant operational expense that could be optimised. More importantly, even though I hadn’t signed a formal maintenance contract, I felt a professional responsibility to the client. Rather than simply present them with a higher bill, I decided to proactively investigate alternatives and present multiple options before costs escalated.
This decision—to take initiative and present solutions rather than just accept the increased costs—set the stage for what would become a complete architectural redesign.
2. The Cost Crisis: From $13 to $73
2.1. Understanding the Bill
Once the free tier ended, costs were projected to jump from $13 to about $73. To understand the increase, I analysed the bill component by component.
Previous Cost Breakdown (Within Free Tier):
- EC2 Instance (t3.micro): $0 (750 hours free)
- RDS Instance (t4g.micro): $0 (750 hours free)
- Application Load Balancer: $0 (750 hours free)
- IPv4 Addresses: ~$11.00 (4 IPv4 addresses in use; AWS provides 1 free per account, charged for the remaining 3)
- Secrets Manager: $0.80
- RDS Backups: ~$0.20
- ECR Storage: ~$0.10
- Subtotal: ~$12.00
- Tax (10%): ~$1.20
- Total: ~$13-14/month
Projected Cost Breakdown (Post Free Tier):
- EC2: ~$14
- RDS: ~$22 (including Backups)
- Application Load Balancer: ~$17
- IPv4 Addresses: ~$15.00
- Secrets Manager: $0.80
- ECR Storage: ~$0.1
- Subtotal: ~$68.80
- Tax (10%): ~$6.88
- Total: ~$73-76/month (including tax)
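The two breakdowns above can be checked with a few lines of arithmetic. This is a minimal sketch that simply re-adds the rounded line items and applies the 10% tax; the dictionary names and the `monthly_total` helper are illustrative, not part of the original tooling. Because the inputs are rounded, the results land at roughly $13.30 and $75.80, within the ranges quoted above rather than matching any invoice to the cent.

```python
# Rounded line items (USD/month) copied from the two breakdowns above.
FREE_TIER = {
    "EC2 (t3.micro)": 0.00,
    "RDS (t4g.micro)": 0.00,
    "Application Load Balancer": 0.00,
    "IPv4 addresses": 11.00,
    "Secrets Manager": 0.80,
    "RDS backups": 0.20,
    "ECR storage": 0.10,
}
POST_FREE_TIER = {
    "EC2": 14.00,
    "RDS (incl. backups)": 22.00,
    "Application Load Balancer": 17.00,
    "IPv4 addresses": 15.00,
    "Secrets Manager": 0.80,
    "ECR storage": 0.10,
}
TAX_RATE = 0.10  # 10% tax, as in the breakdowns


def monthly_total(items: dict[str, float], tax_rate: float = TAX_RATE) -> float:
    """Sum the line items and apply tax, rounded to cents."""
    subtotal = sum(items.values())
    return round(subtotal * (1 + tax_rate), 2)


before = monthly_total(FREE_TIER)
after = monthly_total(POST_FREE_TIER)
print(f"Within free tier: ${before:.2f}/month")
print(f"Post free tier:   ${after:.2f}/month")
print(f"Increase:         {after / before:.1f}x")
```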
2.2. The IPv4 Problem: The Unexpected Cost
I had expected the free tier to cover almost all costs, with only minor charges. However, I discovered an unexpected cost that completely changed the picture: IPv4 addresses.
The architecture required:
- 1 public IP (an Elastic IP) for the EC2 instance
- 3 public IPs for the Application Load Balancer (one per Availability Zone)
The Surprise: I had overlooked IPv4 address costs entirely. While AWS provides 1 free IPv4 address per account, the remaining three addresses (needed for EC2 and ALB) incurred charges even during the free tier period. What I didn’t realise at the time was how much these charges dominated the bill:
- During free tier: ~$11/month (the single biggest line item, accounting for ~85% of the total bill)
- After free tier: ~$15/month (about a fifth of the projected bill, behind RDS at ~$22 and the ALB at ~$17)
The Architectural Trade-off: I had initially considered deploying to just one Availability Zone to reduce IPv4 costs, but an ALB must be attached to subnets in at least two Availability Zones. I chose to deploy across all three AZs in the region, which resulted in higher IPv4 charges than necessary for an internal tool with predictable, low-volume usage.
Hindsight: While all compute resources (EC2, RDS, ALB) stayed within the free tier, the IPv4 charges alone added up quickly. Had I deployed to two AZs instead of three, I could have saved approximately $3.50/month on IPv4 costs. It was a lesson I learned too late.
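The IPv4 figures can be roughly reproduced from AWS's per-address hourly charge. The sketch below assumes a rate of $0.005 per public IPv4 address per hour and an average month of 730 hours; neither number is stated in this post, and the function name is illustrative. Under those assumptions, three billable addresses come to about $10.95/month, four to about $14.60/month, and dropping one AZ saves about $3.65/month, in the same range as the figures above.

```python
HOURLY_RATE_USD = 0.005  # assumption: AWS public IPv4 charge per address-hour
HOURS_PER_MONTH = 730    # assumption: average month (8,760 hours / 12)


def ipv4_monthly_cost(total_addresses: int, free_addresses: int = 0) -> float:
    """Monthly charge for the public IPv4 addresses that are actually billed."""
    billable = max(total_addresses - free_addresses, 0)
    return round(billable * HOURLY_RATE_USD * HOURS_PER_MONTH, 2)


# 1 address for EC2 + 3 for the ALB (one per AZ); one address free during free tier.
during_free_tier = ipv4_monthly_cost(total_addresses=4, free_addresses=1)
after_free_tier = ipv4_monthly_cost(total_addresses=4)

# Hypothetical two-AZ deployment: 1 address for EC2 + 2 for the ALB.
two_az_after_free_tier = ipv4_monthly_cost(total_addresses=3)
saving_per_az = round(after_free_tier - two_az_after_free_tier, 2)

print(f"During free tier:       ${during_free_tier:.2f}/month")
print(f"After free tier:        ${after_free_tier:.2f}/month")
print(f"Saved by dropping 1 AZ: ${saving_per_az:.2f}/month")
```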
2.3. The Real Issue
The problem wasn’t the size of any single bill; it was its permanence. The client’s priority was clear: minimise operational expenses. A recurring cost of about $70/month for an internal system with no direct revenue couldn’t be justified.
3. Three Options Evaluated
Using AWS Pricing Calculator, I evaluated three distinct approaches to reduce costs and prepared detailed cost projections for each option to present to the client.
3.1. Option 1: Savings Plans and Reserved Instances
Approach: Use AWS Savings Plans and Reserved Instances for long-term cost reduction.
Projected Monthly Cost: ~$60/month (with 1-year commitments)
Evaluation:
- ✅ Minimal architecture changes
- ✅ Leverage existing infrastructure
- ❌ Still expensive for an internal tool
- ❌ Commitment required (1-year contracts)
- ❌ No reduction in operational management
Verdict: Better, but not good enough. Still too expensive.
3.2. Option 2: Single EC2 Instance with Embedded Database
Approach: Replace RDS with PostgreSQL running on the same EC2 instance as the application, and remove the ALB. Both the app and PostgreSQL would run in ECS.
Projected Monthly Cost: ~$18-22/month
Evaluation:
- ✅ Significant cost reduction
- ❌ Loss of RDS managed features and automatic failover
- ❌ Database management burden on developer
- ❌ Higher operational burden and reliability risk
- ❌ Single point of failure—no automatic redundancy
Verdict: Cost improves, but operational burden increases. The client has no dedicated developers, so this would lead to support issues. Savings don’t justify the added risk.
3.3. Option 3: Serverless Architecture
Approach: Migrate to Lambda and DynamoDB, paying only for actual usage.
Projected Cost: ~$5-10/month (roughly an 85-93% reduction from ~$70/month)
Evaluation:
- ✅ Scale to Zero
- ✅ Matches usage patterns perfectly
- ✅ Operational overhead nearly eliminated
- ✅ AWS-managed scaling and automatic high availability
- ❌ Cold start latency (~2 seconds)
- ❌ Complete rewrite required
- ❌ New serverless patterns to learn
Trade-offs Worth Considering:
Measured Cold Start Latency: ~2 seconds (API Gateway total latency during cold start), ~400ms after warm-up
For an internal tool, this is acceptable. Users don’t need sub-second responses, making this a reasonable trade-off for massive cost savings.
Development Effort: 2 months (backend rewrite, data migration, async redesign)
Verdict: Despite the development effort, the cost reduction, operational simplicity, and alignment with usage patterns make this the best choice.
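To put the three options side by side, the projected costs can be converted into percentage reductions against the ~$70/month baseline. This is a minimal sketch using only the figures quoted above; the `OPTIONS` table and `reduction_range` helper are illustrative names rather than anything from the AWS Pricing Calculator.

```python
BASELINE_USD = 70.0  # approximate post-free-tier monthly cost used in this post

# (low, high) projected monthly cost per option, taken from the figures above.
OPTIONS = {
    "Option 1: Savings Plans / Reserved Instances": (60.0, 60.0),
    "Option 2: Single EC2 with embedded database": (18.0, 22.0),
    "Option 3: Serverless (Lambda + DynamoDB)": (5.0, 10.0),
}


def reduction_range(
    low: float, high: float, baseline: float = BASELINE_USD
) -> tuple[float, float]:
    """Return (smallest, largest) percentage reduction relative to the baseline."""
    smallest = round((1 - high / baseline) * 100, 1)
    largest = round((1 - low / baseline) * 100, 1)
    return smallest, largest


for name, (low, high) in OPTIONS.items():
    smallest, largest = reduction_range(low, high)
    print(f"{name}: ${low:.0f}-{high:.0f}/month, "
          f"{smallest:.0f}-{largest:.0f}% below baseline")
```

Only the serverless option clears the 85% mark, even at the top of its projected range.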
4. The Decision: Why Serverless?
4.1. The Business Case
1. Cost Alignment: A Perfect Match for the Usage Pattern
The real insight wasn’t just that Serverless costs less—it was that this particular system’s usage pattern made Serverless dramatically more efficient.
The Usage Reality:
- Active hours: Business hours only (9-6, weekdays)
- Idle state: Nights, weekends, and holidays
- Users: 5 internal team members
- Traffic: Predictable and low-volume, occurring primarily during fixed peak hours.
The Cost Implications:
- Server-Based: $70/month fixed, whether the system runs or sits idle
- Serverless (Scale to Zero): Charges accrue only during active usage (sub-dollar amounts during business hours, zero cost on nights, weekends, and holidays)
For a 24/7 public-facing service, the distinction wouldn’t matter. But for an internal tool with concentrated usage in specific hours? Scale to Zero is transformative.
The system is in use roughly 40 hours per week (business hours) and idle the rest. With 168 hours in a week, that is about 24% active and 76% idle, yet a server-based setup bills for every hour:
- Server-Based: $70/month always
- Serverless: ~$5-10/month (pay per actual usage)
This 85% cost reduction directly reflected the system’s actual usage pattern, not theoretical savings.
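The idle-time claim follows directly from the weekly hours. The sketch below works it through, using the 40 active hours and $70/month fixed cost quoted above; treating the fixed cost as spread evenly across all hours is a simplifying assumption, and the function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def idle_fraction(active_hours_per_week: float) -> float:
    """Fraction of the week during which the system receives no traffic."""
    return 1 - active_hours_per_week / HOURS_PER_WEEK


ACTIVE_HOURS = 40          # business-hours usage quoted above
FIXED_MONTHLY_USD = 70.0   # server-based cost, billed whether busy or idle

idle = idle_fraction(ACTIVE_HOURS)
# Simplifying assumption: the fixed bill is spread evenly over every hour.
spend_while_idle = FIXED_MONTHLY_USD * idle

print(f"Idle share of the week: {idle:.0%}")
print(f"Fixed spend attributable to idle hours: ${spend_while_idle:.2f}/month")
```

Under that assumption, a little over $53 of every $70 pays for hours in which nobody uses the system, which is the portion Scale to Zero removes.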
2. Operational Burden
- Server-Based: Requires capacity planning and ongoing management
- Serverless: AWS manages the servers, patching, and scaling
- The client has no dedicated developers
For a small team without dedicated ops staff, operational burden becomes a bigger constraint than cost.
4.2. The Client Decision
I presented all three options with detailed analysis and recommended Serverless:
- Cost: $5-10/month vs $70/month (at least an 85% reduction)
- Operations: Server management eliminated; AWS manages provisioning, patching, and scaling
- Team reality: No dedicated developers to manage servers
- Long-term: Foundation for sustainable growth without operational overhead
The client chose Serverless.
4.3. Embracing the Challenge
The decision was rational: cost and operations both favoured Serverless. Implementation, however, would be a different matter.
The server-based system used familiar patterns—Node.js, PostgreSQL, relational models. Serverless required a complete shift:
- Event-driven thinking
- NoSQL design
- Async-first patterns
- Different debugging and testing
Despite the unfamiliar patterns, this became the right architectural choice for the client’s actual constraints. I decided to embrace it as a genuine challenge.
5. The Path Forward
With approval secured, the next phase began: complete architectural redesign.
The new architecture needed to address:
- Lambda programming model (stateless, timeout-bound)
- NoSQL data modelling (no joins, denormalisation)
- Event-driven communication
- Distributed testing
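To make "stateless, timeout-bound" concrete, here is a minimal sketch of a Python Lambda handler. It is not the handler from this project: the `items` payload shape, the `MIN_REMAINING_MS` margin, and the upper-casing "work" are all hypothetical stand-ins. The only Lambda-specific call is `context.get_remaining_time_in_millis()`, which the Python runtime's context object provides.

```python
import json

# Hypothetical safety margin: stop starting new work when less than this remains.
MIN_REMAINING_MS = 1_000


def handler(event, context):
    """Stateless handler: everything it needs arrives in `event`.

    Nothing is kept in module-level state between invocations, so any
    unfinished work is returned to the caller rather than remembered.
    """
    items = event.get("items", [])  # hypothetical payload shape
    processed = []
    for item in items:
        # The context object reports the time left before Lambda's timeout.
        if context.get_remaining_time_in_millis() < MIN_REMAINING_MS:
            break
        processed.append(str(item).upper())  # stand-in for real work
    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "processed": processed,
                "remaining": items[len(processed):],  # hand back unfinished work
            }
        ),
    }
```

Outside Lambda, the handler can be exercised by passing any object with a `get_remaining_time_in_millis()` method as `context`, which is one small example of how testing differs from a long-running server.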
The benefits were compelling:
- 85% cost reduction
- Zero infrastructure management
- Automatic scaling and high availability
- Long-term sustainability
In the next post, I’ll cover the technology choices and architectural decisions that transformed a $70/month system into a $5-10/month operation.
6. Key Takeaways
- Business context drives architecture: This was an internal tool for a small team with no dedicated developers. Low traffic, predictable usage, and the need to minimise operational burden made Serverless the obvious choice. Match actual requirements, not theoretical ideals.
- Accept meaningful trade-offs: I traded slightly slower responses (about 2 seconds at worst, on a cold start) for an 85% cost reduction, zero infrastructure management, and automatic high availability. For internal tools, this exchange is worthwhile.
The decision wasn’t just about cost—it was about matching reality: actual usage patterns, client constraints, and operational burden.
7. Next in the Series
Part 3: Technology Choices and Their Rationale examines the specific technology decisions made during the serverless migration:
- Backend Language: Why Python + FastAPI instead of TypeScript + NestJS
- Authentication: AWS Cognito for managed identity services
- Database: DynamoDB on-demand vs RDS, handling the SQL to NoSQL migration
- Messaging: SQS and EventBridge for asynchronous communication
- Object Storage: S3 lifecycle policies for cost optimisation
- API Layer: HTTP API Gateway vs REST API Gateway
- Monitoring: CloudWatch, X-Ray, and observability trade-offs
- Local Development: LocalStack for AWS service emulation
- Infrastructure as Code: Why Terraform over AWS CDK
- Frontend & Desktop: React + Vite, Electron challenges, and Playwright for browser automation
Each decision prioritised cost efficiency whilst maintaining operational simplicity.
Series Navigation
- Part 1: The Initial Server-Based Architecture
- Part 2: The Cost Crisis and Path to Change (Current)
- Part 3: Technology Choices and Their Rationale