From Open Source to SaaS: the Journey of ClickHouse

ClickHouse

  • ClickHouse is a fast, open-source OLAP database that supports replication, sharding, multi-master writes, and cross-region deployments.
  • ClickHouse is column-oriented, making it faster than row-oriented databases for aggregation queries.
  • ClickHouse has many low-level, bottom-up optimizations that make it faster than other columnar databases as well.
  • ClickHouse is used for real-time data processing, business intelligence, logging, metrics, and machine learning.
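
The column-oriented point above can be illustrated with a toy sketch. This is not how ClickHouse stores data internally; the field names and values are made up for illustration. It only shows why an aggregation over one field needs less data in a columnar layout than in a row layout.

```python
# Hypothetical toy data: three events with three fields each.
rows = [
    {"user_id": 1, "country": "DE", "duration_ms": 120},
    {"user_id": 2, "country": "US", "duration_ms": 300},
    {"user_id": 3, "country": "DE", "duration_ms": 180},
]

# The same data in a column-oriented layout: one contiguous list per field.
columns = {
    "user_id": [1, 2, 3],
    "country": ["DE", "US", "DE"],
    "duration_ms": [120, 300, 180],
}


def sum_duration_row_store(records):
    """Row layout: each whole record is visited to read a single field."""
    total = 0
    for record in records:
        total += record["duration_ms"]
    return total


def sum_duration_column_store(cols):
    """Column layout: only the one column the query needs is scanned."""
    return sum(cols["duration_ms"])


row_total = sum_duration_row_store(rows)
column_total = sum_duration_column_store(columns)
print(row_total, column_total)
```

Both functions return the same total. The difference is that the columnar version never looks at `user_id` or `country`, which is where the advantage for aggregation queries over wide tables comes from.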

ClickHouse Cloud

  • ClickHouse Cloud is a service that offers ClickHouse as a managed database in the cloud.
  • The guiding principles for ClickHouse Cloud were:
    • Serverless experience
    • Performance
    • Separation of compute and storage
    • Tenant isolation and security
    • Multicloud support
  • Kubernetes was chosen as the compute platform because it supports a serverless experience, separation of compute and storage, and multicloud deployment.
  • The ClickHouse Cloud architecture consists of a control plane and a data plane.
  • The control plane handles customer-facing tasks such as cluster and user management, authentication, user communication, and billing.
  • The data plane hosts the ClickHouse clusters and provides features such as auto-scaling, metrics, and a Kubernetes operator.
  • Users connect to their clusters through a shared load balancer per region, which hands off requests to Istio based on routing rules.
  • ClickHouse interacts with data stored in S3, which serves as persistent and durable storage for all customer clusters' data.
  • On AWS, moving data from local disks to S3 introduced network latency.
  • EBS volumes and SSDs are used as caches to bring performance close to that of self-hosted ClickHouse.
  • A shared load balancer is used instead of dedicated load balancers per cluster to improve user experience and reduce costs.
  • Cilium, a Kubernetes network plugin, is used for network policies and logical isolation between clusters.
  • Vertical auto-scaling adjusts the size of individual replicas but can be disruptive and cause cache loss.
  • Horizontal auto-scaling adjusts the size of the cluster but can lead to data integrity issues and communication problems.
  • The beta launch included only vertical auto-scaling; horizontal auto-scaling was still in development.
  • Vertical auto-scaling is automated by publishing usage metrics to a central metric store and using those metrics to make scaling decisions.
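
The metrics-driven vertical auto-scaling described above can be sketched as follows. This is a minimal illustration, not the ClickHouse Cloud implementation: the replica sizes, the thresholds, and the names `UsageSample` and `recommend_size` are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical replica sizes, in GiB of memory, ordered from small to large.
SIZES_GIB = [8, 16, 32, 64]


@dataclass
class UsageSample:
    """One usage data point, as it might be read back from a metric store."""
    memory_used_gib: float


def recommend_size(current_gib, samples, scale_up_at=0.8, scale_down_at=0.3):
    """Return the replica size to use next, moving at most one step.

    The thresholds are assumed values: scale up when average utilization
    exceeds 80%, scale down when it falls below 30%.
    """
    if not samples:
        return current_gib  # no data, so make no change
    utilization = mean(s.memory_used_gib for s in samples) / current_gib
    index = SIZES_GIB.index(current_gib)
    if utilization > scale_up_at and index < len(SIZES_GIB) - 1:
        return SIZES_GIB[index + 1]
    if utilization < scale_down_at and index > 0:
        return SIZES_GIB[index - 1]
    return current_gib


busy = [UsageSample(14.0), UsageSample(15.0)]
quiet = [UsageSample(2.0), UsageSample(3.0)]
print(recommend_size(16, busy), recommend_size(16, quiet))
```

Moving only one size step per decision is a deliberate choice in this sketch: because resizing a replica can be disruptive and can lose its cache, a conservative policy limits how often and how far a replica changes.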

Development and Milestones

  • Development milestones included a private preview in May 2022, a public beta in October 2022, and general availability (GA) in December 2022.
  • Private preview focused on basic cloud offerings, security, and self-service capabilities.
  • Public beta introduced autoscaling, metering, enhanced security features, and rigorous testing.
  • GA addressed customer feedback, enhanced the cloud console, introduced developer-friendly features, and prioritized reliability and security to support an uptime SLA and compliance requirements.

Success Factors

  • Success factors included milestone-driven development, respecting timelines while adjusting priorities, and emphasizing reliability and security as core features from the start.
  • Gathering user feedback early and often is crucial for building the right product, one that addresses customer pain points.
  • Customer feedback during the private preview and public beta phases led to enhancements in security features, console, and Developer Edition, demonstrating the team's ability to respond quickly and delight customers.
  • The introduction of SSDs resulted in a significant performance increase, eliminating network latency and matching the performance of self-hosted ClickHouse.
  • The team uses Istio not only for load balancing and traffic management but also for idling instances during periods of inactivity, which reduces costs for both the company and its customers.
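
The effect of a local SSD cache in front of object storage, mentioned above, can be sketched as a read-through cache with least-recently-used eviction. This is an illustrative assumption, not the cache ClickHouse Cloud uses; `ReadThroughCache` and `fake_remote_fetch` are hypothetical names, and the remote store is simulated by a plain function.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve reads locally when possible; fall back to the slow remote store."""

    def __init__(self, fetch_remote, capacity):
        self._fetch_remote = fetch_remote  # stands in for a read from S3
        self._capacity = capacity          # max number of cached entries
        self._entries = OrderedDict()      # ordered oldest to most recent
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._fetch_remote(key)     # pays the network latency
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


remote_calls = []


def fake_remote_fetch(key):
    """Simulated remote read that records each call it receives."""
    remote_calls.append(key)
    return key.upper()


cache = ReadThroughCache(fake_remote_fetch, capacity=2)
for key in ["a", "a", "b", "c", "a"]:
    cache.get(key)
print(cache.hits, cache.misses, remote_calls)
```

Only cache misses reach the remote store, so repeated reads of hot data avoid the network round trip. The sketch also shows why losing the cache, for example when a replica is resized, temporarily degrades performance: every read becomes a miss until the cache warms up again.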
