Data & Synthetic Data Engineer
Build data pipelines, curation systems, and synthetic data generation for training AI models.
70 Open Positions
Core Skills
Synthetic Data Generation · Data Curation Pipelines · Spark · Airflow · dbt · Data Quality · Annotation Pipelines · BigQuery
Active Positions (50)
Senior Machine Learning Engineer, GenAI Data · senior
Roblox · San Mateo, CA, United States
Synthetic Data Generation · Data Curation · Foundation Models · Diffusion Models · Apache Spark · MLOps
Senior Software Engineer, Model Lifecycle · senior
Waymo · Kirkland, Washington, USA
Data Curation · Data Quality · Foundation Models · MLOps · Annotation Pipelines · Evaluation Frameworks
Machine Learning Engineer, Distributed Data Systems - Robotics · mid
OpenAI · San Francisco
Distributed Training · Apache Kafka · Multimodal AI · MLOps · Data Curation · Apache Spark
Software Engineer, Collect · mid
Cohere · Toronto
Annotation Pipelines · Data Curation · LLM Integration
Data Annotation Specialist, Software Engineering · mid
Cohere · Canada
Annotation Pipelines · Agentic AI · AI-Assisted Code Generation · Evaluation Frameworks · Human-in-the-Loop Systems
Data Annotation Specialist, Data Science · mid
Cohere · Canada
Annotation Pipelines · Evaluation Frameworks · Human-in-the-Loop Systems · Agentic AI
Strategic Projects Lead, Generative AI · senior
Scale AI · India
Large Language Models (LLMs) · Data Quality · Annotation Pipelines · Evaluation Frameworks
Technical Program Manager, Gen AI Operations Planning · manager
Scale AI · San Francisco, CA; New York, NY
Large Language Models (LLMs) · Data Quality · Annotation Pipelines
Software Engineer, Sensor Simulation · mid
Anduril · Costa Mesa, California, United States
Synthetic Data Generation · Sim-to-Real · Sensor Fusion · Hardware-in-the-Loop (HIL) Testing
Member of Engineering (Pre-training / Data Research) · mid · Remote
Poolside AI · Remote (EMEA/East Coast)
Pre-Training · Synthetic Data Generation · Data Curation · Scaling Laws · Foundation Models
Engineering Manager I - AI Platform - Evaluation & Annotation · manager
Datadog · Paris, France
Evaluation Frameworks · Annotation Pipelines · Synthetic Data Generation · Model Monitoring & Observability · MLOps
Head of Forward Deployed Engineering · director
Snorkel AI · New York City, NY (Hybrid); Redwood City, CA (Hybrid); San Francisco, CA (Hybrid)
Synthetic Data Generation · Human-in-the-Loop Systems · Evaluation Frameworks · Annotation Pipelines · Data Quality · Large Language Models (LLMs)
Staff Software Engineer - EC Lifecycle · staff
Snorkel AI · Redwood City, CA (Hybrid); San Francisco, CA (Hybrid)
Human-in-the-Loop Systems · Annotation Pipelines · Data Quality · ETL Pipelines
Full-Stack Engineer, AI Data Platform · mid
Labelbox · San Francisco Bay Area
Human-in-the-Loop Systems · Annotation Pipelines · Large Language Models (LLMs) · Evaluation Frameworks · Agentic AI
Senior Staff Data Scientist, Perception · senior
Waymo · Mountain View, CA USA
Data Curation · Scaling Laws · Evaluation Frameworks · Perception Systems · Data Quality · Annotation Pipelines
Software Quality Operations Specialist, Domain Expansion · mid · Remote
Waymo · Remote, US
Autonomous Driving · Anomaly Detection · Data Curation · Evaluation Frameworks
Software Quality Operations Specialist, Safety Evaluation · mid
Waymo · Hyderabad, India
Autonomous Driving · Anomaly Detection · Evaluation Frameworks · Data Curation
Senior/Staff Software Engineer, Labeling Platform · senior
Nuro · Mountain View, California (HQ)
Annotation Pipelines · Data Curation · LiDAR Processing · Point Clouds · Sensor Fusion · Human-in-the-Loop Systems
Senior/Staff Software Engineer, ML Data Infrastructure · senior
Nuro · Mountain View, California (HQ)
ETL Pipelines · Annotation Pipelines · Data Curation · Apache Spark · Embeddings · Autonomous Driving
Software Engineer, ML Data Infrastructure · mid
Nuro · Mountain View, California (HQ)
ETL Pipelines · Annotation Pipelines · Data Curation · Embeddings · Apache Spark · Autonomous Driving
Technical Lead, Behavior & Triage Labeling · senior
Nuro · Mountain View, California (HQ)
Annotation Pipelines · Data Curation · LiDAR Processing · Human-in-the-Loop Systems · Autonomous Driving · MLOps
Software Engineer - Human Motion Data · mid
Apptronik · Austin, TX
Diffusion Models · Reinforcement Learning for Robotics · Synthetic Data Generation · Sim-to-Real · Annotation Pipelines · Self-Supervised Learning
Fullstack Software Engineer, Applied AI · mid
Mercor · San Francisco
Annotation Pipelines · Data Quality · Synthetic Data Generation · Evaluation Frameworks
Technical Program Manager - Human Data Annotation · manager
Mistral AI · Paris
Annotation Pipelines · Data Curation · Evaluation Frameworks · Synthetic Data Generation
Senior Software Engineer, Data Infrastructure · senior
Waymo · Mountain View, CA, USA
MLOps · Data Curation · Distillation · Data Quality · Model Serving · Annotation Pipelines
Technical Program Manager · manager
Labelbox · San Francisco Bay Area
Data Curation · Annotation Pipelines · Evaluation Frameworks
Manager, Technical Program Managers · manager
Labelbox · San Francisco Bay Area
Data Curation · Annotation Pipelines · Evaluation Frameworks
Senior Software Engineer, ML/Eval Data Platforms & Infrastructure · senior
Waymo · Mountain View, CA, USA; San Francisco, CA, USA
Data Curation · Data Quality · ETL Pipelines · Evaluation Frameworks · Annotation Pipelines · Feature Engineering
Senior Staff TLM, Data Mining and Sampling for ML and Evaluation · senior
Waymo · Mountain View, California, United States; San Francisco, California, United States; New York City, New York, United States
Data Curation · Anomaly Detection · Experiment Design · Evaluation Frameworks
Software Engineer, Applied AI · mid
Mercor · San Francisco
Synthetic Data Generation · Post-Training · Evaluation Frameworks · Data Curation · Model Serving
Software Engineer - Data · mid
xAI · Palo Alto, CA
Data Curation · Data Quality · ETL Pipelines · Apache Kafka · Distributed Training
Staff Machine Learning Engineer, Data Flywheel · staff
Waymo · Mountain View, CA USA; San Francisco, CA USA
Data Curation · Vision-Language Models (VLMs) · Annotation Pipelines · Synthetic Data Generation · MLOps · Model Monitoring & Observability
Tech Lead Manager, Data Infrastructure · senior
Cartesia · *HQ - San Francisco, CA
Data Curation · Synthetic Data Generation · Multimodal AI · Pre-Training · Post-Training · Data Quality
Senior Software Engineer, Perception ML Data · senior
Nuro · Mountain View, California (HQ)
Vision-Language Models (VLMs) · Synthetic Data Generation · Data Curation · Annotation Pipelines · Object Detection
Principal Software Engineer, ML Flywheel Technical Lead · senior
Waymo · Mountain View, CA, USA; San Francisco, CA, USA
Pre-Training · Post-Training · Foundation Models · Data Curation · MLOps · Annotation Pipelines
Software Engineer, Data Infrastructure · mid
Cartesia · *HQ - San Francisco, CA
Data Curation · Synthetic Data Generation · Pre-Training · Post-Training · Data Quality · Multimodal AI
Forward Deployed Engineering Manager · manager
Labelbox · San Francisco Bay Area
Data Curation · Annotation Pipelines · Evaluation Frameworks · Synthetic Data Generation
Member of Technical Staff, Pre-Training Data · staff
Cohere · Toronto
Pre-Training · Data Curation · Data Quality · Apache Spark · Foundation Models · Distributed Training
Data Engineer · mid
xAI · Palo Alto, CA
ETL Pipelines · Apache Spark · Apache Kafka · Data Curation · Annotation Pipelines · Apache Airflow
Staff Software Engineer, Behavior ML Data · staff
Nuro · Mountain View, California (HQ)
Data Curation · Annotation Pipelines · ETL Pipelines · Anomaly Detection · Autonomous Driving · Synthetic Data Generation
ML Data Engineer · mid
Recraft · London, UK
Data Curation · ETL Pipelines · Data Quality · Annotation Pipelines · Synthetic Data Generation
Research Engineer, Data · mid
Cartesia · *HQ - San Francisco, CA
Multilingual AI Capabilities · Data Curation · Synthetic Data Generation · Annotation Pipelines · Data Quality · Speech Recognition (ASR)
Software Engineer, ML Data Infrastructure · mid
Ideogram · Toronto
Data Curation · Distributed Training · ETL Pipelines · Apache Spark · Synthetic Data Generation
Software Engineer, AI Data & Evaluation · mid
Mercor · San Francisco
Synthetic Data Generation · Evaluation Frameworks · Data Curation · Annotation Pipelines · Post-Training
Data Engineer · mid · Remote
Tavus · Remote
Data Curation · Multimodal AI · Annotation Pipelines · Synthetic Data Generation · ETL Pipelines
Full-Stack Software Engineer, Reinforcement Learning · mid
Anthropic · San Francisco, CA | New York City, NY
Reinforcement Learning · RLHF · Annotation Pipelines · Human-in-the-Loop Systems · Synthetic Data Generation · Evaluation Frameworks
Software Engineer, RL Data · mid
Anthropic · London, UK; San Francisco, CA | New York City, NY
Reinforcement Learning from Human Feedback (RLHF) · Annotation Pipelines · Data Curation · Human-in-the-Loop Systems · Reward Modeling · Evaluation Frameworks
Technical Program Manager, Data Acquisition · manager
OpenAI · San Francisco
Data Curation · Annotation Pipelines · Evaluation Frameworks · Data Quality
Software Engineer, Research - Human Data · mid
OpenAI · San Francisco
RLHF · Annotation Pipelines · Evaluation Frameworks · Human-in-the-Loop Systems · Alignment · Data Curation
Senior Software Engineer - Expert Contributor Lifecycle · senior
Snorkel AI · Redwood City, CA (Hybrid); San Francisco, CA (Hybrid)
Human-in-the-Loop Systems · Annotation Pipelines · Data Quality · ETL Pipelines