2026-02-10 · 7 min read

Your AWS Bill Is Lying to You

The bill grows, the confusion stays

AWS bills are almost designed to obscure where money goes. Line items like `EC2-Other` contain dozens of sub-items that AWS does not surface in the default Cost Explorer view. `DataTransfer-Out-Bytes` is a single line that could represent traffic from a CDN, a mis-routed inter-AZ service call, or a data pipeline nobody knew was running. The granularity that would make the bill actionable is two or three levels below what you see by default.

Most teams have no idea which team, application, or environment owns which cost. The billing account shows $47,000 for the month and engineering leadership wants to know why it went up 12% from last month. Nobody can answer that question because the tagging is incomplete, the cost allocation groups have not been configured, and the people who know which services are running do not have access to the billing console.

This is not a technology problem — it is a process problem. The technology to answer these questions exists in AWS: Cost Explorer, Cost and Usage Reports, tag policies, cost allocation tags. But these tools require setup and discipline that most teams do not invest in until the bill is already painful.

The path forward starts with admitting that "we do not know what we are spending on" is an unacceptable state for any organization past the earliest startup phase.

The three biggest cost leaks

Idle or oversized instances are the most common source of wasted spend. Typical examples: an m5.2xlarge running at 3% CPU utilization 24 hours a day because a migration six months ago provisioned it for peak load and nobody right-sized it afterward; a fleet of instances that processes jobs only during business hours but runs through nights and weekends; RDS instances with provisioned IOPS that were set during a performance incident and never revisited. These are not edge cases: CloudWatch metrics for a typical fleet often show average EC2 CPU utilization well below 30%.

Data transfer costs are the second leak, and the most insidious because they are nearly invisible until you look for them. NAT Gateway charges a per-GB data processing fee on top of its hourly rate whenever private subnet resources reach the internet. Inter-AZ traffic (service A in AZ-a calling service B in AZ-b) is billed at $0.01/GB in each direction, so every gigabyte that crosses an AZ boundary effectively costs $0.02. For high-throughput internal services, this adds up faster than teams expect. S3 Transfer Acceleration, cross-region replication, and CloudFront origin fetches all have transfer costs that appear on the bill without a clear attribution to the workload generating them.
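To make the inter-AZ figure concrete, here is a rough back-of-the-envelope sketch. It assumes the $0.01/GB rate quoted above is charged once on each side of the transfer; the 50 TB/month traffic volume is a hypothetical input, not a measurement.

```python
# Rate quoted in the text: $0.01/GB, charged on each side of a cross-AZ transfer.
INTER_AZ_RATE_PER_GB_EACH_WAY = 0.01


def inter_az_monthly_cost(gb_per_month: float,
                          rate_each_way: float = INTER_AZ_RATE_PER_GB_EACH_WAY) -> float:
    """Estimate the monthly cost of traffic that crosses an AZ boundary.

    Each gigabyte is billed twice: once leaving the source AZ and once
    entering the destination AZ.
    """
    return gb_per_month * rate_each_way * 2


if __name__ == "__main__":
    # Hypothetical internal service moving 50 TB (50,000 GB) per month across AZs.
    print(f"${inter_az_monthly_cost(50_000):,.2f} per month")
```

Under these assumptions a single chatty service costs about $1,000 a month in transfer charges alone, none of which is attributed to that service on the default bill.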

The third category is accumulated dead weight: EBS volumes detached from instances but still incurring gp3 charges at $0.08/GB-month (the us-east-1 rate); snapshots from AMIs that are no longer in use; RDS instances in stopped state (stopped RDS still bills for storage); EC2 instances in stopped state (still billed for EBS); Elastic IPs not attached to running instances ($0.005/hour each, roughly $3.65 a month, which adds up across an account). This category is pure waste with no operational benefit. It is infrastructure that has been forgotten rather than decommissioned.
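The arithmetic behind the dead-weight category can be sketched as follows. The volume dictionaries mirror the `State`, `Size`, and `VolumeType` fields that EC2's DescribeVolumes returns; the rates are the ones quoted above, and the 730-hour month is a common billing approximation assumed here.

```python
GP3_PER_GB_MONTH = 0.08   # rate quoted in the text
EIP_PER_HOUR = 0.005      # rate quoted in the text
HOURS_PER_MONTH = 730     # assumed average month length in hours


def unattached_gp3_cost(volumes):
    """Monthly cost of gp3 volumes in the 'available' (unattached) state.

    Each item is a dict with 'State', 'Size' (GiB) and 'VolumeType',
    shaped like an entry in a DescribeVolumes response.
    """
    return sum(
        v["Size"] * GP3_PER_GB_MONTH
        for v in volumes
        if v["State"] == "available" and v["VolumeType"] == "gp3"
    )


def idle_eip_cost(count):
    """Monthly cost of `count` Elastic IPs that are not attached to anything."""
    return count * EIP_PER_HOUR * HOURS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical inventory: one forgotten 500 GiB volume, one volume in use.
    inventory = [
        {"State": "available", "Size": 500, "VolumeType": "gp3"},
        {"State": "in-use", "Size": 100, "VolumeType": "gp3"},
    ]
    print(f"unattached volumes: ${unattached_gp3_cost(inventory):.2f}/month")
    print(f"4 idle Elastic IPs: ${idle_eip_cost(4):.2f}/month")
```

Each item is small on its own, which is exactly why this category survives: nothing here is large enough to trigger an alert.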

Tagging is not optional

Without a tagging strategy enforced at the account level, cost allocation is guesswork dressed up as analysis. You can look at Cost Explorer and see that EC2 costs went up, but you cannot answer which team's EC2 costs went up, which application they are running, or whether it is dev or prod traffic. Every resource needs at minimum three tags: `env` (dev/staging/prod), `team` (the owning team), and `service` (the application or service name). More granular tags — `cost-center`, `project`, `owner` — enable more granular allocation but these three are the baseline.
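As an illustration of the three-tag baseline, a small validator might look like the sketch below. The function name `tag_problems` and the error strings are hypothetical; the required keys and the allowed `env` values are the ones listed above.

```python
REQUIRED_TAGS = {"env", "team", "service"}
ALLOWED_ENVS = {"dev", "staging", "prod"}


def tag_problems(tags: dict) -> list:
    """Return a list of human-readable problems with a resource's tags.

    An empty list means the resource meets the three-tag baseline.
    """
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - set(tags))]
    env = tags.get("env")
    if env is not None and env not in ALLOWED_ENVS:
        problems.append(f"invalid env value: {env}")
    return problems


if __name__ == "__main__":
    print(tag_problems({"env": "prod", "team": "payments", "service": "checkout"}))
    print(tag_problems({"env": "production", "team": "payments"}))
```

A check like this is useful in CI for infrastructure-as-code or in an audit script, but it only reports problems after the fact; it does not replace enforcement at the account level.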

Enforcement is the key word. Tag policies at the AWS Organizations level standardize tag keys and define which values are valid for each key. SCPs can prevent the creation of resources that lack required tags: a `Deny` statement with a `Null` condition on `aws:RequestTag/<key>` blocks the create call when the required tag is not present in the request. This is not perfect (not every service supports tagging at creation or the `aws:RequestTag` condition key), but it dramatically improves tag coverage for the resources that matter.
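A minimal sketch of such an SCP, generated in Python so the required keys live in one place. It assumes the `Null` condition operator on the `aws:RequestTag/<key>` condition key and covers only `ec2:RunInstances` on instance resources; the helper name and `Sid` values are hypothetical. One statement is emitted per tag key because multiple keys under a single `Null` operator are ANDed, which would deny only when every tag is missing.

```python
import json

REQUIRED_TAG_KEYS = ["env", "team", "service"]


def deny_untagged_run_instances(tag_keys):
    """Build an SCP document that denies ec2:RunInstances when a required tag is absent.

    Note: Sid values must be alphanumeric, so this sketch assumes tag keys
    without hyphens or other punctuation.
    """
    statements = [
        {
            "Sid": f"DenyRunInstancesWithout{key.capitalize()}Tag",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            # "true" means: the condition matches when the tag key is NOT in the request.
            "Condition": {"Null": {f"aws:RequestTag/{key}": "true"}},
        }
        for key in tag_keys
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(deny_untagged_run_instances(REQUIRED_TAG_KEYS), indent=2))
```

Treat this as a starting point: test it in a sandbox OU first, since a deny on instance creation also affects Auto Scaling groups and other automation that launches instances without tags.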

Activating tags as cost allocation tags in the billing console makes them available in Cost Explorer for filtering and grouping. This step is required and often missed: defining a tag in Organizations does not automatically make it a cost allocation tag. After activation, Cost Explorer can show you spend grouped by team, by service, by environment — the analysis that turns a $47,000 bill into something actionable.
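Once a tag is activated, the grouping described above can also be pulled programmatically. The sketch below assumes the Cost Explorer `get_cost_and_usage` API via boto3, grouped by a `TAG` dimension, and assumes group keys arrive as `<tag key>$<tag value>` with an empty value for untagged spend. The helper names are hypothetical, and pagination via `NextPageToken` is omitted for brevity.

```python
from collections import defaultdict


def spend_by_tag(response, metric="UnblendedCost"):
    """Sum cost per tag value across all periods of a get_cost_and_usage response."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            # Keys look like "team$payments"; "team$" means the resource had no such tag.
            _, _, value = group["Keys"][0].partition("$")
            totals[value or "(untagged)"] += float(group["Metrics"][metric]["Amount"])
    return dict(totals)


def fetch_spend_by_tag(tag_key, start, end):
    """Query Cost Explorer for monthly spend grouped by one cost allocation tag.

    `start` and `end` are 'YYYY-MM-DD' strings; `end` is exclusive.
    """
    import boto3  # imported lazily so spend_by_tag stays usable without AWS access

    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": tag_key}],
    )
    return spend_by_tag(response)
```

The size of the `(untagged)` bucket is a useful health metric in its own right: it is the share of the bill that nobody can currently explain.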

Retrospective tagging of existing untagged resources is painful but necessary. AWS Config can flag untagged resources, either with the managed `required-tags` rule or with a custom rule. Combined with a Lambda remediation that tags resources based on their metadata (instance type, name tag, VPC), this can accelerate cleanup for the existing estate.

The FinOps flywheel

Savings Plans and Reserved Instances are the most impactful single lever for reducing compute costs, but teams consistently misapply them. Reserved Instances commit you to a specific instance configuration (a family, and in most cases a size, in a given region) for one or three years. Savings Plans commit you to a spend rate ($/hour) instead: Compute Savings Plans apply across instance families and regions, while EC2 Instance Savings Plans are tied to one family in one region. For most teams, Compute Savings Plans (the most flexible type) are the right choice. They apply to EC2, Fargate, and Lambda spend, and they do not require predicting specific instance types a year in advance.

The key discipline is to buy Savings Plans or RIs against your baseline compute usage, not your peak. Your baseline is the compute you are always running regardless of traffic: your always-on services and your baseline ECS tasks. RDS is not covered by Compute Savings Plans, so commit to the database baseline separately, for example with RDS Reserved Instances. Buy 70-80% of the compute baseline in Savings Plans. Let the remainder run On-Demand. This approach gives you significant savings (Compute Savings Plans offer up to 66% off On-Demand) while maintaining flexibility for the variable portion of your fleet.
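The baseline rule can be sketched in a few lines. This version assumes the baseline is the lowest hourly On-Demand compute spend observed in the lookback window, which is a deliberately conservative choice; a low percentile is a reasonable alternative. The function name and sample figures are hypothetical.

```python
def savings_plan_target(hourly_on_demand_spend, coverage=0.75):
    """Return the On-Demand-equivalent $/hour to cover with a Savings Plan.

    `hourly_on_demand_spend` is a sequence of hourly compute spend figures;
    `coverage` is the share of the baseline to commit (0.70-0.80 in the text).
    """
    if not hourly_on_demand_spend:
        raise ValueError("need at least one hourly spend sample")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    baseline = min(hourly_on_demand_spend)  # assumption: baseline = quietest hour
    return baseline * coverage


if __name__ == "__main__":
    # Hypothetical hourly On-Demand spend samples in dollars.
    samples = [12.0, 10.0, 18.5, 11.2]
    print(f"cover ${savings_plan_target(samples):.2f}/hour of On-Demand usage")
```

The result is expressed in On-Demand dollars. The hourly commitment you actually purchase is lower, because Savings Plan commitments are stated at the discounted rate, so check the figure against the recommendations in the billing console before buying.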

Right-sizing without touching reliability requires data. Pull CloudWatch CPU utilization for your instances over a 30-day window, plus memory utilization if the CloudWatch agent is installed (EC2 does not publish memory metrics by default). Look for instances running below 20% average CPU with a p99 below 50%. These are candidates for downsizing or switching to a more cost-efficient instance type. Graviton3 (m7g, c7g, r7g) instances offer 10-20% better price-performance than equivalent x86 instances for most general-purpose workloads.
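The screening rule above (average CPU below 20%, p99 below 50%) is simple enough to express directly. This sketch assumes you have already exported CPU utilization samples as a list of percentages and uses a nearest-rank percentile; the function names are hypothetical.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence (pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_downsize_candidate(cpu_samples, avg_limit=20.0, p99_limit=50.0):
    """True when both the average and the p99 CPU utilization are under the limits."""
    if not cpu_samples:
        return False  # no data is not evidence of idleness
    return mean(cpu_samples) < avg_limit and percentile(cpu_samples, 99) < p99_limit


if __name__ == "__main__":
    steady_idle = [5] * 99 + [45]      # low average, modest peak
    spiky = [10] * 90 + [90] * 10      # low average, but real peaks
    print(is_downsize_candidate(steady_idle))
    print(is_downsize_candidate(spiky))
```

The p99 check is what protects reliability: the second workload averages 18% CPU but genuinely needs its headroom, so it is excluded despite the low average.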

Scheduling is the easiest win for dev and staging environments. Stopping non-production instances outside business hours (6pm to 8am on weekdays, all weekend) leaves them running only 50 of the 168 hours in a week, which can eliminate 65-70% of dev/staging compute costs overnight. Instance Scheduler on AWS automates this. There is no reason to pay for an environment that no engineer is using.
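The savings figure follows from simple arithmetic on running hours. The sketch below assumes the schedule described above (8am to 6pm, five days a week) and counts instance-hours only; the function name is hypothetical.

```python
HOURS_PER_WEEK = 24 * 7


def off_hours_savings(start_hour=8, stop_hour=18, running_days=5):
    """Fraction of weekly instance-hours eliminated by an on/off schedule."""
    running_hours = (stop_hour - start_hour) * running_days
    return 1 - running_hours / HOURS_PER_WEEK


if __name__ == "__main__":
    print(f"{off_hours_savings():.1%} of instance-hours eliminated")
```

With the default schedule this comes to roughly 70% of instance-hours. Realized savings land a little lower because EBS volumes attached to stopped instances continue to bill, which is consistent with the 65-70% range.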

What Skylynk does

Skylynk's cost optimization engagement starts with what we call a cost archaeology session: pulling the Cost and Usage Report, analyzing it at the resource level, and mapping spend back to teams and workloads. The output is a prioritized list of opportunities ranked by impact and implementation complexity, with specific recommendations for each — not a generic "right-size your instances" slide.

From there we implement: tagging enforcement via SCPs and Organizations tag policies, Cost Explorer configuration for ongoing visibility, Savings Plans recommendations based on actual usage patterns, idle resource cleanup, and data transfer optimization where warranted. Clients typically see 25-35% reduction in their monthly AWS bill within 60 days of starting the engagement, with ongoing savings from the visibility infrastructure we put in place.

If your team cannot answer "what did we spend on last month and why?" in under an hour, that is a fixable problem. The cost optimization service page has the engagement details.

