Scalable System Architecture: From 100 to 10k Concurrent Users

Every startup dreams of 'going viral', but for our client's e-commerce platform, viral traffic became a nightmare. During their Black Friday sale, the servers crashed under the load of 10,000 concurrent users. This is the story of how we re-engineered their system to handle 10x that load.

Phase 1: The Diagnosis
The application was a standard monolithic Node.js service connected to a single PostgreSQL instance. It worked perfectly at 500 concurrent users, but at 10,000 we observed severe degradation:
- Event Loop Lag: The single-threaded nature of Node.js meant that CPU-intensive tasks (like PDF invoice generation) blocked all incoming requests.
- Database Locks: Long-running analytics queries locked rows needed for new orders, causing a cascade of failures.
- Memory Leaks: Sessions that should have been stateless were quietly accumulating state in process memory, consuming RAM rapidly.
We used New Relic to identify that the database CPU was pinned at 100% continuously, while the Node.js servers sat mostly idle, waiting on I/O.
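The event-loop lag is easy to reproduce: any synchronous CPU-bound task (like rendering a PDF on the main thread) freezes every other request for its full duration. A minimal sketch:

```javascript
// Simulates a CPU-bound task (e.g. synchronous PDF rendering) hogging
// the single Node.js event-loop thread.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // busy-wait: no other callback can run
}

const t0 = Date.now();
blockFor(200);                 // every pending request waits right here
const blockedMs = Date.now() - t0;
console.log(`event loop blocked for ~${blockedMs}ms`);
```

In production the fix is to move such work off the main thread, to worker threads or a job queue, which is exactly what Phase 2 does.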
Phase 2: Decoupling with Event-Driven Architecture
The first step wasn't to rewrite everything, but to offload. We identified that the checkout process was doing too much: validating inventory, charging cards, sending emails, and updating analytics synchronously.
We introduced RabbitMQ to handle these side effects asynchronously. This pattern is often called 'Fire and Forget' for non-critical path items.
```javascript
// ❌ OLD: Synchronous Bottleneck
app.post('/checkout', async (req, res) => {
  await inventory.check(req.body);
  await payment.process(req.body); // Waits ~2s
  await email.sendConfirm(req.body); // Waits ~0.5s
  res.send('Order Placed');
});
```
```javascript
// ✅ NEW: Asynchronous & Non-Blocking
app.post('/checkout', async (req, res) => {
  const isValid = await inventory.check(req.body);
  if (!isValid) return res.status(400).send('Out of Stock');
  // Push to queue and respond immediately
  await rabbitMQ.publish('orders', req.body);
  res.status(202).json({ status: 'Processing', id: req.body.id });
});
```

Phase 3: Database Optimization and Sharding
The database is often the hardest thing to scale. Vertical scaling (buying a bigger server) worked for a while, but it hit a ceiling.
1. Read Replicas
We set up three read replicas on AWS RDS and routed all generic `GET` requests to them, keeping the primary dedicated to write operations (INSERT/UPDATE).
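The routing logic reduces to a small read/write splitter. A sketch with round-robin replica selection (the pool objects are stand-ins for real connection pools, and the regex-based read detection is a deliberate simplification):

```javascript
// Route reads to replicas (round-robin), writes to the primary.
// Pool objects here are placeholders for real pg connection pools.
const primary = { name: 'primary' };
const replicas = [{ name: 'replica-1' }, { name: 'replica-2' }, { name: 'replica-3' }];

let rr = 0;
function pickPool(sql) {
  if (!/^\s*select/i.test(sql)) return primary; // writes go to the primary
  rr = (rr + 1) % replicas.length;              // spread reads across replicas
  return replicas[rr];
}
```

One caveat: replication lag means a read issued immediately after a write may not see it, so read-your-own-writes paths (like the order confirmation page) should stay on the primary.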
2. Horizontal Sharding
We implemented application-level sharding based on `tenant_id`. User data for European customers was routed to `DB_EU`, while US customers went to `DB_US`. This roughly halved the load on any single instance.
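Shard routing is just a deterministic lookup. A sketch (the region mapping and default are hypothetical, illustrating the idea):

```javascript
// Map a tenant to its shard. Here the shard key is the customer's
// region; unknown regions fall back to the US shard.
const SHARDS = { EU: 'DB_EU', US: 'DB_US' };

function shardFor(tenant) {
  return SHARDS[tenant.region] || SHARDS.US;
}
```

Hashing `tenant_id` works too when there is no natural geographic key, but region-based routing has the side benefit of keeping EU data inside the EU for compliance.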
Phase 4: The Caching Layer (Redis)
We realized 80% of database hits were for the same 'Top Selling Products'. By caching these queries in AWS ElastiCache (Redis), we reduced DB load by 70%.
"There are only two hard things in Computer Science: cache invalidation and naming things."
We used a cache-aside strategy with explicit invalidation to keep data consistent: whenever a product was updated, its cache key was immediately deleted, so the next read repopulated it from the database.
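The write path pairs with the read path shown next. Sketched here with in-memory Maps standing in for Postgres and Redis:

```javascript
// Invalidation-on-write, with Maps standing in for Postgres and Redis.
const db = new Map([[1, { id: 1, name: 'Widget', price: 10 }]]);
const cache = new Map();

async function getProduct(id) {
  if (cache.has(id)) return cache.get(id);   // cache hit
  const product = db.get(id);                // miss: read from the DB
  cache.set(id, product);
  return product;
}

async function updateProduct(id, fields) {
  db.set(id, { ...db.get(id), ...fields });  // 1. write to the DB
  cache.delete(id);                          // 2. drop the stale cache entry
}
```

Deleting the key (rather than overwriting it) avoids caching a value that a concurrent write could immediately make stale.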
```javascript
const getProduct = async (id) => {
  const cacheKey = `product:${id}`;
  // 1. Check Redis cache
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);
  // 2. Fetch from DB on a miss
  const { rows } = await db.query('SELECT * FROM products WHERE id = $1', [id]);
  const product = rows[0];
  // 3. Store in cache (TTL 1 hour)
  await redis.setex(cacheKey, 3600, JSON.stringify(product));
  return product;
};
```

Monitoring and Observability
You can't improve what you can't measure. We deployed a robust monitoring stack:
- Prometheus: To scrape metrics (request count, latency, error rates).
- Grafana: To visualize these metrics in real-time dashboards.
- ELK Stack: For centralized logging, allowing us to trace a request ID across multiple microservices.
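Prometheus works by scraping a plain-text metrics endpoint, and the exposition format is simple enough to sketch without a client library (in a real service you would use a library such as `prom-client` instead):

```javascript
// Minimal Prometheus-style counters, rendered in the text exposition format.
const counters = { http_requests_total: 0, http_errors_total: 0 };

function recordRequest(statusCode) {
  counters.http_requests_total++;
  if (statusCode >= 500) counters.http_errors_total++;
}

// What a GET /metrics scrape would return.
function renderMetrics() {
  return Object.entries(counters)
    .map(([name, value]) => `${name} ${value}`)
    .join('\n');
}
```

A scrape after two requests, one of them a 5xx, would show `http_requests_total 2` and `http_errors_total 1` — exactly the counters Grafana turns into rate and error-ratio panels.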


Final Results and Lessons Learned
After a 2-month migration effort, the results spoke for themselves during the Christmas sale:
- 99.99% uptime
- 40ms average response time
- $3k in monthly cloud savings
Scalability isn't just about adding more servers. It's about designing systems that fail gracefully and handle pressure intelligently. The biggest lesson? Don't microservice too early. The monolith served its purpose for 2 years. Only optimize when you have data proving the bottleneck.
Is your infrastructure ready for growth?
Book a System Audit
