Site Reliability Engineer (SRE) - AI/Defense

Ensure reliability of deployed AI systems and defense infrastructure

31 Open Positions

Active Positions (34)

Senior Site Reliability Engineer · senior
Anduril·Irvine, California, United States; Washington, District of Columbia, United States
Counter Intrusion systems · Air Defense systems · Lattice OS · sensor fusion · site reliability engineering · cloud deployments
Site Reliability Engineer, Discovery · mid
Anduril·Seattle, Washington, United States
site reliability engineering · mission autonomy · mesh networking · systems integration · robotics · networking
Senior Site Reliability Engineer · senior
Anduril·Seattle, Washington, United States
Staff Software Engineer, AI Reliability Engineering · staff
Anthropic·London, UK
AI Reliability Engineering (AIRE) · Service Level Objectives for LLM serving · monitoring across the token path · safeguard model serving
Senior Site Reliability Engineer · senior
Spotify·New York, NY
Backstage · AI-native workflows · agentic production systems · background coding agents · developer portals · agentic developer tooling
Senior Site Reliability Engineer - Deployed, Connected Warfare · senior
Anduril·Costa Mesa, California, United States
Connected Warfare · system deployment · hardware and software installation · network expansion · mission critical capabilities · customer demonstrations
DevOps Engineer, IPS · mid
Scale AI·Doha, Qatar
Infrastructure as Code (Terraform) · CloudFormation · CI/CD pipelines · containerized applications · VPCs · VPNs
SRE / Incident Manager Team Leader (x/f/m) · senior
Doctolib·Paris, Paris, France
Phone Assistant AI · AI-powered Voice Services · Node.js/TypeScript Backend
Site Reliability Engineer, Discovery · mid
Anduril·Washington, District of Columbia, United States
mission autonomy · mesh networking · systems integration for robotics · networking for autonomous systems · site reliability engineering for robotics · end-to-end system deployment
Engineering Manager - Observability & Reliability Engineering Obsession (x/f/m) · manager
Doctolib·Paris, Paris, France
bias evaluation in language models · disaggregated evaluation · dialogue system bias analysis · health NLP · clinical conversation analysis
Senior Site Reliability Engineer - Database (x/f/m) · senior
Doctolib·Nantes
LLM · VLM · RAG-based systems · AI Medical Companion · Vector Databases · Google Cloud Platform (GCP)
AI Deployment Manager · manager
Cursor·SF / NY
Cursor · AI-driven organizational change · enterprise rollout strategy · customer retention analytics · product adoption metrics · revenue expansion strategy
Engineering Manager SRE (x/f/m) · manager
Doctolib·Paris, Paris, France
CI/CD automation · Testing infrastructure · Contract testing · Visual regression testing · Ephemeral development environments · Developer productivity tooling
Senior Site Reliability Engineer - Observability (x/f/m) · senior
Doctolib·Berlin, Berlin, Germany; Paris, Paris, France
observability strategy · logging · metrics · tracing · alerting · incident detection
Senior Site Reliability Engineer, AI Research · senior · Remote
Algolia·Remote - Australia
Site Reliability Engineering · cloud-first · service-oriented architectures · Google Cloud Platform · SRE fundamentals · production services
Site Reliability Engineer Intern · intern
Dataiku·France, Paris
Dataiku Cloud · fully-managed offering · launchpad · SaaS portal · Cloud Engineering · SRE
Site Reliability Engineer - Cybersecurity · mid
xAI·Palo Alto, CA
Site Reliability Engineer · mid
Together AI·San Francisco
PagerDuty · Ansible · Terraform for Infrastructure as Code · Kubernetes for AI Workload Orchestration
Staff / Senior Software Engineer, AI Reliability · senior
Anthropic·San Francisco, CA | New York City, NY | Seattle, WA
AI Reliability Engineering (AIRE) · Large Language Model (LLM) serving systems · Service Level Objectives (SLOs) for AI · Safeguards
Senior System Administrator · senior
Anduril·Santa Ana, California, United States
DISA STIG · air-gapped classified IT systems · security hardening
Member of Technical Staff - Infrastructure Reliability · staff
xAI·Palo Alto, CA
Distributed GPU training superclusters · High-QPS production systems · On-prem and cloud-based infrastructure management · Infrastructure automation in Rust · Compute, storage, and networking infrastructure reliability
Site Reliability Engineer (SRE) · mid
xAI·London, UK
Buildkite · ArgoCD · Prometheus · Grafana · PagerDuty · Pulumi
Site Reliability Engineer - US Government · mid
xAI·Palo Alto, CA; Washington, D.C.
GPU Hardware · Classified Cloud · Bare Metal Infrastructure · Hybrid Cloud Architecture · AI Training Clusters · AI Inference Clusters
Site Reliability Engineer / DevOps · mid
Scale AI·Mexico City, MX
Robot station infrastructure · On-site AI hardware management · Technical Facility Coordination · Physical AI · Remote engineering support systems
Staff Site Reliability Engineer - Data Platform (x/f/m) · staff
Doctolib·Paris, Paris, France
Multi-region data infrastructure · DataOps best practices · Data Infrastructure Cost Analysis and Optimization · Data platform architecture for machine learning · 99.9% Reliability Engineering for Data Systems · Advanced monitoring and alerting systems for data platforms
Senior Site Reliability Engineer · senior
Algolia·Paris, France
Site Reliability Engineering · scalable architectures · reliability · cost efficiency · Search products availability · Fleet team
Senior Site Reliability Engineer - Developer, Connected Warfare · senior
Anduril·Costa Mesa, California, United States
Connected Warfare · site reliability engineering · deployment engineer tooling · scalable system delivery · fault tolerant systems · warfighter capabilities
Senior Site Reliability Engineer - Tactical Reconnaissance & Strike · senior
Anduril·Atlanta, Georgia, United States
Lattice OS · autonomous drones · solid rocket motors · Anduril Rocket Motor Systems (RMS) · high-volume production methods
Staff Software Engineer, AI Reliability Engineering · staff
Anthropic·Dublin, IE
AI Reliability Engineering (AIRE) · Large Language Model (LLM) serving systems · Service Level Objectives (SLOs) for AI · Safeguards
Site Reliability Engineer II · mid
Dataiku·United States, New York
pretraining · posttraining · science organization · technical operations · program management · execution engine
Deployment Site Reliability Engineer, Connected Warfare · mid
Anduril·Costa Mesa, California, United States
Connected Warfare · system deployment engineering · mission software integration · site reliability engineering for hardware
Senior Site Reliability Engineer · senior
Anduril·Sydney, New South Wales, Australia
Extra Large Autonomous Undersea Vehicle (XL-AUV) · Multi-Domain Autonomous Systems · Secure Software Delivery Toolchains · Multi-Classification Domain Systems · VMware ESXi
Senior Site Reliability Engineer, Production Engineering · senior
Anduril·Seattle, Washington, United States
Lattice OS · AI-powered operating system · sensor fusion · autonomous command and control · real-time 3D command center
Senior DevOps Engineer, Space · senior
Anduril·Costa Mesa, California, United States
Lattice OS · Space Domain Awareness (SDA) · Space Control · SDANet · Infrastructure pipeline hardening · Test and release pipeline development