Liquid AI, an MIT spin-off, is seeking a highly skilled Member of Technical Staff to play a critical role in its foundation model development process, focusing on consolidating, gathering, and generating high-quality text data.
Responsibilities
Create and maintain data cleaning, filtering, and selection pipelines
Monitor Hugging Face and other platforms for newly released public datasets
Create crawlers to gather datasets from the web
Write and maintain synthetic data generation pipelines
Run ablations to assess new datasets and judging pipelines
Requirements
Expertise in data curation, cleaning, augmentation, and synthetic data generation techniques
Ability to write and debug models in popular ML frameworks, and experience working with LLMs