
SENIOR SRE ENGINEER

Shield AI

Full-time

San Diego Metro Area

$129,467 - $194,201

Posted 5 months ago

Job Description

As a Site Reliability Engineer at Hivemind, you will ensure the performance, reliability, and scalability of cloud infrastructure by building and maintaining monitoring and alerting systems, defining incident response strategies, and automating operational processes.

Responsibilities

Design, implement, and maintain monitoring, logging, and alerting systems
Define incident response procedures and participate in on-call rotations
Identify and resolve reliability and performance issues across services
Develop automation tools to streamline operations
Collaborate with engineering teams to ensure new services are production-ready
Conduct root cause analyses and implement post-incident improvements
Champion a culture of reliability, observability, and operational excellence

Requirements

5+ years of experience in Site Reliability Engineering or related roles
Strong experience with AWS services
Deep understanding of Kubernetes and containerized deployments
Proficiency with monitoring and observability tools
Strong scripting or programming skills
Experience with infrastructure-as-code
Solid understanding of networking, Linux systems, and distributed architectures
Experience with service meshes
Familiarity with security best practices in cloud environments
Exposure to GitOps workflows and tools

Benefits

No benefits