AI Compiler & Kernel Engineer
Write CUDA kernels, ML compilers, and low-level optimizations for AI workloads.
39 Open Positions
Core Skills
CUDA Kernels, Triton, XLA, MLIR, C++, GPU Programming, Compiler Design, FlashAttention
Active Positions (39)
GPU Performance Engineer (mid)
Genmo · San Francisco HQ
CUDA Programming, Triton (GPU Kernels), GPU Optimization, Inference Optimization, Model Serving, KV-Cache Optimization
ASIC Architect (mid)
Cerebras · Sunnyvale, CA
GPU Optimization, Inference Optimization, Scaling Laws
ML Software Tool Development Engineer (mid)
Cerebras · Sunnyvale, CA or Toronto, Canada
Model Monitoring & Observability, Anomaly Detection, Compiler Design, Inference Optimization, Reliability Engineering
Software Engineer, Kernel Reliability (mid)
Cerebras · Sunnyvale, CA or Toronto, Canada
CUDA Kernels, Reliability Engineering, GPU Optimization, Model Monitoring & Observability, Inference Optimization
Staff Python / PyTorch Developer — Frontend Inference Compiler – Dubai (staff, Remote)
Cerebras · Europe; Remote, California, United States; UAE
PyTorch, Compiler Design, Inference Optimization, Large Language Models (LLMs), Quantization, Foundation Models
System Software Engineer (Embedded) (mid)
Cerebras · Sunnyvale, CA
Embedded Software, Embedded Linux, Real-time Systems, Inference Optimization, Edge AI, CUDA Programming
Software Engineer, Performance - New Grad (mid)
Nuro · Mountain View, California (HQ)
GPU Optimization, GPU Programming, Real-time Systems, Autonomous Driving, CUDA Programming
Software Engineer, Accelerators (mid)
OpenAI · San Francisco
CUDA Kernels, Distributed Training, GPU Optimization, Inference Optimization, PyTorch, Model Serving
Software Engineer, Hardware (mid)
OpenAI · San Francisco
CUDA Kernels, Compiler Design, Distributed Training, GPU Optimization, MLIR, Inference Optimization
ASIC Firmware Engineer, Modeling (mid)
OpenAI · San Francisco
Firmware Development, Embedded Software, RTOS, GPU Clusters
Software Engineer, Kernel Performance & AI Tooling (mid)
OpenAI · San Francisco
CUDA Kernels, GPU Optimization, Inference Optimization, AI-Assisted Code Generation, Compiler Design, Model Serving
Systems Research Engineer, GPU Programming (mid)
Together AI · San Francisco
CUDA Programming, Triton (GPU Kernels), GPU Optimization, GPU Programming, FlashAttention
Systems Research Engineer Intern - GPU Programming (Fall 2026) (intern)
Together AI · San Francisco
CUDA Programming, Triton (GPU Kernels), GPU Optimization, GPU Programming
Principal Compiler Engineer - ML Systems (staff)
SambaNova · San Jose, California, United States
Compiler Design, MLIR, PyTorch, XLA, GPU Optimization, TensorFlow
Software Engineer, ML Inference Performance (mid)
SambaNova · San Jose, California, United States
Compiler Design, Inference Optimization, PyTorch, TensorFlow, MLIR
Software Engineer - GPU Kernels (mid)
Baseten · San Francisco
CUDA Programming, CUDA Kernels, FlashAttention, Quantization, Mixture-of-Experts, GPU Optimization
ML Accelerator Architect (mid)
Waymo · Mountain View, CA, US; New York City, NY, US
GPU Optimization, Inference Optimization, Model Compression, CUDA Programming
ML Microarchitect (mid)
Waymo · Mountain View, California
GPU Programming, CUDA Kernels
Senior ML Compiler Engineer, Compute (senior)
Waymo · Bangalore, Karnataka, India
Compiler Design, MLIR, Inference Optimization, CUDA Programming, XLA, TensorRT
Software Engineer, GPU (mid)
Waymo · Mountain View, CA, USA; New York, NY, USA
GPU Programming, CUDA Programming, GPU Optimization, Embedded Linux, Embedded Software
Software Engineer, Machine Learning/AI Accelerator (mid)
Waymo · Taipei, Taiwan; Hsinchu, Taiwan
Firmware Development, RTOS, TensorFlow, JAX, Inference Optimization, Edge Inference
Senior Systems Software Engineer, GPU Compute (senior, Remote)
Nebius · Remote - United States
GPU Clusters, GPU Optimization, CUDA Programming, Distributed Training, GPU Programming
AI/ML Physical Design Flow Engineer (mid)
Tenstorrent · Austin, Texas, United States; Fort Collins, Colorado, United States; Santa Clara, California, United States
PyTorch, TensorFlow, Compiler Design, GPU Optimization, MLIR
Automotive and Robotics SOC Architect (mid)
Tenstorrent · United States
CUDA Programming, GPU Optimization, Autonomous Driving, Compiler Design
Design Verification Lead, AI Hardware (senior)
Tenstorrent · Toronto, Ontario, Canada
GPU Programming, Hardware Testing, CUDA Programming, GPU Optimization
Software Engineer, Acceleration Kernel Development (mid)
Tenstorrent · Toronto, Ontario, Canada
GPU Programming, Inference Optimization, GPU Optimization, CUDA Programming
Software Engineer, AI Compiler (mid)
Tenstorrent · Austin, Texas, United States
MLIR, Compiler Design, GPU Optimization, Inference Optimization, GPU Programming
Software Engineer, Kernel Development and Optimization (mid)
Tenstorrent · Gdańsk, Pomeranian Voivodeship, Poland; Warszawa, Masovian Voivodeship, Poland
GPU Programming, Inference Optimization, FlashAttention, GPU Optimization, CUDA Programming
Software Engineer, Metal Runtime (API & Abstractions) (mid)
Tenstorrent · Austin, Texas, United States; Santa Clara, California, United States; Toronto, Ontario, Canada
GPU Programming, Inference Optimization, Real-time Systems, GPU Optimization
Sr Engineer, AI Kernel (senior)
Tenstorrent · Austin, Texas, United States; Toronto, Ontario, Canada
GPU Programming, GPU Optimization, Embedded Software, Inference Optimization, CUDA Programming
Sr. Engineer, Performance Infrastructure (senior)
Tenstorrent · Austin, Texas, United States
Compiler Design, GPU Optimization, MLIR, Inference Optimization
Sr. Engineer, Software - AI Compiler (senior)
Tenstorrent · Austin, Texas, United States; Santa Clara, California, United States; Toronto, Ontario, Canada
MLIR, Compiler Design, PyTorch, JAX, Distributed Training
Kernel Driver Software Engineer (mid)
Etched · San Jose
Embedded Linux, GPU Programming, Inference Optimization
Systems Engineer, Kernel (mid)
CoreWeave · Livingston, NJ; New York, NY; Sunnyvale, CA; Bellevue, WA
GPU Programming, CUDA Programming, Grafana, Prometheus
Research Engineer, Performance RL (mid)
Anthropic · San Francisco, CA
CUDA Programming, Triton (GPU Kernels), JAX, PyTorch, GPU Optimization, Reinforcement Learning from Human Feedback (RLHF)
Staff Software Engineer - GenAI Performance and Kernel (staff)
Databricks · San Francisco, California
CUDA Programming, Triton (GPU Kernels), Quantization, FlashAttention, KV-Cache Optimization, Inference Optimization
Senior GenAI Research Engineer - Optimization and Kernels (senior)
Databricks · Mountain View, California; San Francisco, California
CUDA Kernels, Distributed Training, FlashAttention, PyTorch, Triton (GPU Kernels), Megatron
Performance Engineer, GPU (mid)
Anthropic · San Francisco, CA; New York City, NY; Seattle, WA
CUDA Kernels, Triton (GPU Kernels), FlashAttention, JAX, PyTorch, XLA
TPU Kernel Engineer (mid)
Anthropic · San Francisco, CA; New York City, NY; Seattle, WA
CUDA Programming, GPU Optimization, Inference Optimization, Distributed Training, Quantization, Transformer Architectures