SITE RELIABILITY ENGINEER - CORE C++ TEAM

ClickHouse
Full-time
United States (remote)
$130,000 - $180,000
Posted 5 months ago

Job Description

As a Site Reliability Engineer at ClickHouse, you will be responsible for ensuring and improving the reliability, availability, scalability, and performance of ClickHouse. You will collaborate with different teams and own areas like escalation management, investigations, post-mortem analysis, and continuous improvement of how ClickHouse is run and optimized in the cloud.

Responsibilities

  • Continuously improve the reliability and performance of ClickHouse core
  • Improve and create metrics and alerts for ClickHouse
  • Identify the root cause of problems and submit bug fixes
  • Enhance and refine incident response processes and post-mortem analysis
  • Plan, enable, and drive chaos engineering initiatives
  • Manage on-call processes

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field
  • 8+ years of experience in Reliability Engineering, QA, or customer-facing engineering
  • Experience operating ClickHouse or other SQL databases in production
  • Understanding of distributed database internals and SQL
  • Scripting experience with Shell or Python, and the ability to read and understand C++ code
  • Knowledge of cloud computing platforms
  • Strong problem-solving and production debugging skills
  • Excellent communication skills

Benefits

  • No benefits