SITE RELIABILITY ENGINEER II

GoFundMe
Full-time
San Diego, CA
$128,500 - $192,500
Posted 6 months ago

Job Description

GoFundMe is seeking a Site Reliability Engineer (SRE) to manage the full system lifecycle, including infrastructure provisioning, system configuration, monitoring, and incident response in production environments. The SRE will assess availability, latency, scalability, and efficiency, building reliability into systems and working closely with various teams to ensure high application performance and availability.

Responsibilities

  • Design and build cloud infrastructure
  • Participate in software and system performance analysis
  • Manage platform and application availability, scalability, security, and performance
  • Diagnose bottlenecks and provide recommendations
  • Implement enhancements to monitoring
  • Recommend and implement infrastructure changes
  • Improve SLO/SLI framework
  • Use data analysis to identify trends
  • Perform 24/7 on-call duties

Requirements

  • 3+ years of experience in high-traffic SaaS environments
  • Deep expertise in delivering high availability
  • Skills to build a cloud orchestration framework on AWS
  • Experience running containerized infrastructure in production (Kubernetes on EKS, AWS ECS)
  • Experience implementing configuration management and automation solutions using Infrastructure as Code, CI/CD, and GitOps (Ansible, Terraform, ArgoCD, GitHub Actions)
  • Strong working knowledge of Linux
  • Solid scripting skills (e.g. Bash, Python)
  • Experience with performance diagnostics, tuning, capacity planning, and monitoring
  • BS in Computer Science or equivalent
  • Good verbal and written communication skills

Benefits

  • No benefits