SITE RELIABILITY ENGINEER II

GoFundMe

Full-time

San Diego, CA

$128,500 - $192,500

Posted 6 months ago

Job Description

GoFundMe is seeking a Site Reliability Engineer (SRE) to manage the full system lifecycle, including infrastructure provisioning, system configuration, monitoring, and incident response in production environments. The SRE will assess availability, latency, scalability, and efficiency, building reliability into systems and working closely with various teams to ensure high application performance and availability.

Responsibilities

Design and build cloud infrastructure
Participate in software and system performance analysis
Manage platform and application availability, scalability, security, and performance
Diagnose bottlenecks and provide recommendations
Implement enhancements to monitoring
Recommend and implement infrastructure changes
Improve SLO/SLI framework
Use data analysis to identify trends
Perform 24/7 on-call duties

Requirements

3+ years of experience in high-traffic SaaS environments
Deep expertise in delivering high availability
Skills to build a cloud orchestration framework on AWS
Experience running containerized infrastructure in production (Kubernetes on EKS, AWS ECS)
Experience implementing configuration management and automation solutions using Infrastructure as Code, CI/CD, and GitOps (Ansible, Terraform, ArgoCD, GitHub Actions)
Strong working knowledge of Linux
Solid scripting skills (e.g., Bash, Python)
Experience with performance diagnostics, tuning, capacity planning, and monitoring
BS in Computer Science or equivalent
Good verbal and written communication skills

Benefits

No benefits