LEAD SITE RELIABILITY ENGINEER - FEDERAL TEAM

Saviynt
Full-time
Los Angeles / Atlanta
$135,000 - $180,000
Posted 3 months ago

Job Description

Saviynt is an identity authority platform focused on providing visibility, control, and intelligence to defend against threats. As a Lead Site Reliability Engineer, you will be responsible for customer deployments, cloud infrastructure management, automation, and collaboration with development teams to improve deployment processes. You will also maintain compliance with security standards and design solutions for cloud environment provisioning.

Responsibilities

  • Perform customer deployments, migrations, and upgrades in the cloud environment
  • Install and configure Saviynt products
  • Troubleshoot and resolve incidents, collaborating with development and IT teams
  • Manage and maintain cloud infrastructure on AWS, Azure, or Google Cloud
  • Automate manual tasks during deployments
  • Troubleshoot cloud-related infrastructure incidents
  • Develop and maintain CI/CD pipelines
  • Automate infrastructure setup and maintenance using Infrastructure as Code (IaC)
  • Collaborate with development, operations, and QA teams to improve deployment processes
  • Maintain compliance with security and quality standards
  • Create and maintain technical documents for cloud infrastructure
  • Design and implement solutions to automate cloud-environment provisioning
  • Develop automation scripts to reduce repetitive tasks
  • Configure and deploy monitoring tools

Requirements

  • U.S. Citizenship
  • 8+ years of experience in observability, SRE, or cloud platform roles
  • 4+ years of hands-on cloud experience (AWS, Azure)
  • Proven track record of operating highly available systems in public cloud environments
  • 3+ years of experience in software development using Python, NodeJS, or Java
  • Expertise in container orchestration platforms (Kubernetes) and service mesh technologies
  • Experience implementing observability at scale using tools like Prometheus, Grafana, OpenTelemetry, ELK/OpenSearch, Datadog, CloudWatch, or Azure Monitor
  • Demonstrated success in driving adoption of SLOs, SLIs, error budgets, and automated alerting frameworks
  • Experience with infrastructure as code (e.g., Terraform, Helm) and automated deployment pipelines
  • Leadership in setting engineering standards and mentoring team members
  • Strong analytical and communication skills
  • Meet the "U.S. persons on U.S. soil" requirement
  • Undergo a full background investigation and screening
  • Complete identity proofing at IAL3 (Identity Assurance Level 3)

Benefits

  • No benefits