SENIOR PLATFORM ENGINEER I, ML DATA SYSTEMS (24 MONTHS FIXED-TERM)

Khan Academy
Full-time
Mountain View, CA
$137,871 - $172,339 USD / $186,306 - $232,883 CAD
Posted 5 months ago

Job Description

We’re looking for an ML Data Engineer to evolve our evaluation (eval) dataset tooling to meet the growing platform needs of AI-based tutoring at Khan Academy. You will gather internal requirements, design schemas, deploy and document an internal dataset management framework, and train colleagues to use it.

Responsibilities

  • Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets
  • Clean, normalize, and enrich data while preserving semantic meaning and consistency
  • Prepare and format datasets for human labeling and integrate the resulting labels into ML datasets
  • Develop and maintain scalable ETL pipelines using Airflow, dbt, Go, and Python running on GCP
  • Implement automated tests and validation to detect data drift or labeling inconsistencies
  • Collaborate with AI engineers, platform developers, and product teams to define data strategies
  • Contribute to shared tools and documentation for dataset management and AI evaluation
  • Help shape data governance strategies covering data retention, PII controls and scrubbing, and isolation of sensitive data

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
  • 5 years of software engineering experience, including 3+ years working with large ML datasets
  • Strong programming skills in Go, Python, and SQL, plus experience with at least one data pipeline framework
  • Experience with data versioning tools and cloud storage systems
  • Familiarity with machine learning workflows
  • Familiarity with the architecture and operation of large language models
  • Attention to detail and an obsession with data quality and reproducibility
  • Motivated by the Khan Academy mission
  • Proven cross-cultural competency

Benefits

  • No benefits