Relational Data at the Edge

PostgreSQL Architecture at Cloudflare

  • Cloudflare processes over 46 million HTTP requests per second, and its busiest PostgreSQL database cluster handles 55 million row operations per second.
  • Cloudflare runs its software and service stack on its own hardware for flexibility and control.
  • The database architecture is designed for high availability, with an SLO of 99.999% ("five nines") availability, which allows only about five and a half minutes of downtime per year.
  • The system handles a high rate of read and write requests with minimal latency and maintains fault tolerance.
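
The downtime budget implied by an availability SLO is simple arithmetic. The sketch below converts a few common SLO percentages into minutes of allowed downtime per year. The function name is ours, not from the talk, and a 365-day year is assumed. A budget of about five and a half minutes corresponds to 99.999%, while 99.95% would allow roughly 263 minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(slo_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability SLO."""
    return MINUTES_PER_YEAR * (1 - slo_percent / 100)


if __name__ == "__main__":
    for slo in (99.9, 99.95, 99.99, 99.999):
        print(f"{slo}% -> {downtime_budget_minutes(slo):.1f} min/year")
```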

PostgreSQL Cluster Topology

  • A typical PostgreSQL deployment contains a single primary instance that replicates data to multiple replicas, supporting a high read rate.
  • The primary database handles all writes, while asynchronous and synchronous replicas handle reads.
  • PostgreSQL cluster topologies are managed by Stolon, backed by the etcd distributed key-value store.
  • An active-standby model is adopted for cross-region data redundancy: the primary region serves all inbound queries, and a standby region is ready to take over traffic in case of evacuation or failure.
  • The primary PostgreSQL cluster replicates data across regions to the standby.

PostgreSQL Replication

  • PostgreSQL is a monolithic database management system that achieves durability using write-ahead logs.
  • Streaming replication in PostgreSQL opens a TCP connection to a replica and streams the log entries produced by the operations within each transaction to it.
  • Logical replication in PostgreSQL replicates data at the SQL level, allowing for more flexibility in terms of replication options, but it has limitations such as not replicating schema changes and not being performant at scale.
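
The idea behind log-based streaming replication can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL's actual implementation: the `Node`, `Primary`, and `Replica` classes and the tuple-shaped log records are all hypothetical. The primary appends every write to an append-only log before applying it, and a replica catches up by replaying the records it has not yet seen.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy database node: a key-value table plus an append-only log."""
    table: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)

    def apply(self, record):
        # Append to the log first, then update the table.
        _seq, key, value = record
        self.wal.append(record)
        self.table[key] = value

    def last_seq(self):
        """Sequence number of the newest log record, or 0 if the log is empty."""
        return self.wal[-1][0] if self.wal else 0


class Primary(Node):
    def write(self, key, value):
        record = (self.last_seq() + 1, key, value)
        self.apply(record)
        return record

    def records_after(self, seq):
        """Return log records newer than `seq`, in log order."""
        return [r for r in self.wal if r[0] > seq]


class Replica(Node):
    def catch_up(self, primary):
        # Replay only the records this replica has not applied yet.
        for record in primary.records_after(self.last_seq()):
            self.apply(record)
```

Until `catch_up` runs, the replica serves stale data, which is the toy equivalent of replication lag on an asynchronous replica.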

PostgreSQL High Availability

  • Stolon is an open-source cluster management tool for PostgreSQL that provides failover, supports multi-site redundancy, and has a stable failover mechanism.
  • Connection pooling in PostgreSQL involves managing a finite number of connections to the database server to optimize resource utilization and minimize the overhead of opening and closing connections.
  • PgBouncer is used as a connection pooler, shielding clients from database switchovers and failovers.
  • PgBouncer operates as a lightweight single-process server that handles network I/O asynchronously, allowing for a higher number of concurrent client connections.
  • Load balancers like HAProxy are used to distribute incoming database queries across multiple Postgres servers, providing high availability and fault tolerance.
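
The core of connection pooling, reusing a fixed set of connections instead of opening one per request, can be sketched in a few lines. This is an in-process toy under stated assumptions, not how PgBouncer itself is built (PgBouncer is a separate proxy process). The `ConnectionPool` class and the `connect` factory passed to it are hypothetical.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hand out connections from a fixed-size set and take them back after use."""

    def __init__(self, connect, size):
        # Open all connections up front; `connect` is any zero-argument factory.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)
```

A caller would write `with pool.connection() as conn: ...`, so many short-lived requests share the same few server connections and the database never sees more than `size` of them.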

Replication Lag and Failure Scenarios

  • Replication lag can become pronounced under heavy traffic, especially for applications with auxiliary data structures and schemas.
  • To minimize the impact of replication lag, applications can batch SQL writes into smaller chunks and, when they need to read data right after writing it, either cache it or read directly from the primary or a synchronous replica.
  • A network switch failure caused a cascading failure where the primary database failed over, leading to a second failure and no more synchronous replicas to promote.
  • A hardware failure caused a synchronous replica to take a long time to resynchronize with the new primary, leading to downtime.
  • The root cause was identified as PostgreSQL spending excessive time copying log files during resynchronization.
  • The issue was resolved by optimizing PostgreSQL to only copy files from the last merge point, reducing replica rebuild time from 2+ hours to 5 minutes.
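
The advice above about batching writes into smaller chunks can be sketched as a small helper. This is a minimal illustration, assuming the application already has its rows in an iterable. The `chunked` helper is hypothetical, and how each batch is written (for example, one transaction per batch) is left to the caller.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Writing each batch separately keeps individual transactions short, so replicas receive and replay the log in smaller increments rather than one large burst.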

Future of Distributed PostgreSQL

  • The future of distributed PostgreSQL involves smart failovers based on factors like latency, capacity, and regulatory requirements, as well as exploring embedded data at the edge using SQLite.
  • The speaker discusses various projects and ideas related to data management and storage.
  • One project involves keeping the Postgres wire protocol while using SQLite as the storage engine, so that applications can keep interacting with it as if it were Postgres.
  • Another idea is to bring more data to the edge, closer to client-facing compute such as Cloudflare Workers.
  • Data localization challenges, particularly in the EU, are discussed, along with how logical replication can provide flexibility by replicating data at the row or column level.
  • Multi-tenant resource isolation is mentioned as a separate problem that needs to be addressed to ensure good citizenship and avoid noisy neighbor issues.
