gentic.news — AI News Intelligence Platform

AI Safety & Red Teaming

Ensure AI systems are safe, aligned, and robust. Red teaming, guardrails, interpretability, evaluation.

80 Open Positions

Core Skills

Safety Red-Teaming, Mechanistic Interpretability, RLHF, Evaluation Frameworks, Alignment, Guardrails, Adversarial Testing

Active Positions (50)

AI Engineer, Evaluation · mid
Distyl AI · San Francisco
Evaluation Frameworks, Adversarial Testing, Large Language Models (LLMs), Prompt Engineering, Agentic AI, Human-in-the-Loop Systems
Technical Program Manager – Adversarial Model Research · manager
OpenAI · San Francisco
Adversarial Testing, Evaluation Frameworks, Prompt Engineering, Safety Red-Teaming, Large Language Models (LLMs)
Data Science Manager, Integrity · manager
OpenAI · San Francisco
Anomaly Detection, A/B Testing, Adversarial ML, Experiment Design, Evaluation Frameworks
AI Emerging Risks Analyst · mid
OpenAI · San Francisco
AI Governance, Adversarial ML, AI Red-Teaming, Anomaly Detection, Evaluation Frameworks
Researcher, Misalignment Research · mid
OpenAI · San Francisco
AI Red-Teaming, Adversarial Testing, Evaluation Frameworks, AI Alignment, Reward Modeling, Agentic AI
Abuse Investigator (AI Self-Improvement Risk) · mid
OpenAI · San Francisco
Agentic AI, Adversarial Testing, AI Red-Teaming, Evaluation Frameworks, Tool-Use Agents
Researcher, Alignment Training · mid
OpenAI · San Francisco
Synthetic Data Generation, RLHF, Reward Modeling, Alignment, Post-Training, Pre-Training
Researcher, Alignment Oversight · mid
OpenAI · San Francisco
AI Alignment, Reinforcement Learning from Human Feedback (RLHF), Human-in-the-Loop Systems, Evaluation Frameworks, Adversarial Testing, Agentic AI
Data Scientist, Safety · mid
OpenAI · London, UK
Anomaly Detection, Evaluation Frameworks, Experiment Design, A/B Testing, Model Monitoring & Observability
Security Researcher, Agentic AI Threats · mid
OpenAI · San Francisco
Agentic AI, AI Red-Teaming, Adversarial Testing, Model Security, Threat Modeling
Software Engineer, Cyber Frontier · mid
OpenAI · San Francisco
Evaluation Frameworks, Adversarial ML, Model Fine-Tuning, Synthetic Data Generation, AI Red-Teaming, Model Security
Senior Technical Advisor, National Security Policy, Global Affairs · senior
OpenAI · Washington, DC
AI Governance, AI Alignment, Evaluation Frameworks, AI Governance Frameworks
Biosafety Red Teaming Specialist · mid
OpenAI · San Francisco
AI Red-Teaming, Adversarial Testing, Evaluation Frameworks
People Research Data Scientist, AI Fairness & Bias · mid
OpenAI · San Francisco
Evaluation Frameworks, Adversarial ML, Experiment Design, Human-in-the-Loop Systems, Agentic AI, A/B Testing
Research Engineer, Frontier Safety Risk Assessment · mid
Google DeepMind · London, UK; New York City, New York, US; San Francisco, California, US
AI Red-Teaming, Evaluation Frameworks, Mechanistic Interpretability, Alignment, Adversarial Testing, Foundation Models
Member of Technical Staff, Trust & Safety Engineer · staff · Remote
Runway · Remote
AI Red-Teaming, Adversarial Testing, Guardrails, Evaluation Frameworks, PyTorch, Anomaly Detection
Research Scientist, Safety Post Training · mid
Scale AI · San Francisco, CA; New York, NY
Post-Training, Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), GRPO (Group Relative Policy Optimization), Mechanistic Interpretability, Reward Modeling
Strategic Projects Lead, Red Team · senior
Scale AI · San Francisco, CA; New York, NY
AI Red-Teaming, Adversarial ML, Adversarial Testing, Prompt Injection Defense, Evaluation Frameworks, Large Language Models (LLMs)
Senior Product Operations Manager, Evaluation · senior
Harvey AI · San Francisco
Evaluation Frameworks, Annotation Pipelines, Human-in-the-Loop Systems
Senior Medical Lead - AI Health Companion (x/f/m) · senior
Doctolib · Paris, Paris, France
Evaluation Frameworks, Large Language Models (LLMs)
Security Engineer - Offensive Security · mid
Stripe · Ireland
Threat Modeling, Adversarial Testing, Detection Engineering, AI Red-Teaming
CyberSecurity Engineer, Offensive Security · mid
Mistral AI · Paris
AI Red-Teaming, Adversarial Testing, Adversarial ML, Model Security, Detection Engineering, Prompt Injection Defense
Pentester, Offensive Forward Deployment Engineer · mid
Mistral AI · Paris
AI Red-Teaming, Adversarial Testing, Adversarial ML, Model Security, Prompt Injection Defense
Senior Machine Learning Engineer - Policy & Safety · senior
Spotify · New York, NY
Multimodal AI, Large Language Models (LLMs), Evaluation Frameworks, PyTorch, Anomaly Detection
Senior Staff Machine Learning Engineer - Content Policy & Safety · senior
Spotify · London
PyTorch, Anomaly Detection, Model Monitoring & Observability, Evaluation Frameworks, Guardrails
Sr. Machine Learning Engineer, Responsible AI – Applied Research Science · senior · Remote
Pinterest · San Francisco, CA, US; Remote, CA, US
Multimodal AI, Adversarial ML, AI Governance, Evaluation Frameworks, Vision-Language Models (VLMs), Recommendation Systems
Sr. Staff Machine Learning Engineer, Content Quality · senior
Pinterest · San Francisco, CA, US
Vision-Language Models (VLMs), AI Alignment, Evaluation Frameworks, Recommendation Systems, Learning-to-Rank, Adversarial ML
Staff Product Manager, AI Safety · staff · Remote
Pinterest · San Francisco, CA, US; Remote, US
AI Red-Teaming, Evaluation Frameworks, Guardrails, Adversarial Testing, AI Governance, Threat Modeling
Principal Data Scientist - Safety · staff
Roblox · San Mateo, CA, United States
Anomaly Detection, Experiment Design, A/B Testing, Natural Language Processing (NLP), Synthetic Data Generation, Computer Vision
Principal Machine Learning Engineer, Alt Defense · staff
Roblox · San Mateo, CA, United States
Anomaly Detection, Adversarial ML, Natural Language Processing (NLP), Computer Vision, Large Language Models (LLMs), Evaluation Frameworks
Senior / Principal Software Engineer - Asset Safety · senior
Roblox · San Mateo, CA, United States
Computer Vision, Image Segmentation, Multimodal AI, Anomaly Detection
AI Research Engineer - AI Safety · mid
Helsing · Berlin; London; Munich
Evaluation Frameworks, Adversarial ML, Adversarial Testing, Anomaly Detection, Experiment Design, AI Alignment
AI Quality Assurance Intern · intern
Cresta · Toronto, Ontario
Evaluation Frameworks, Annotation Pipelines, Natural Language Processing (NLP)
Trust and Safety Support Specialist · mid
Lovable · Stockholm
Adversarial Testing, Guardrails
Data Scientist, Safeguards · mid
Anthropic · New York City, NY; San Francisco, CA; Seattle, WA
A/B Testing, Experiment Design, Anomaly Detection
Engineering Manager, Cloud Safety · manager
Anthropic · San Francisco, CA | Seattle, WA
Model Serving, Guardrails, Large Language Models (LLMs), Model Monitoring & Observability, Model Security
Engineering Manager, Safeguards Data Infrastructure · manager
Anthropic · New York City, NY
Data Curation, Differential Privacy, ETL Pipelines, Data Quality, Annotation Pipelines
Engineering Manager, Safeguards Review Tooling · manager
Anthropic · San Francisco, CA
Guardrails, Annotation Pipelines, Data Quality, Agentic AI, Human-in-the-Loop Systems
Research Engineer, Safeguards Labs · mid
Anthropic · San Francisco, CA | New York City, NY
Adversarial ML, AI Red-Teaming, Evaluation Frameworks, Anomaly Detection, Guardrails, Model Security
Software Engineer, Safeguards Evals · mid
Anthropic · San Francisco, CA | New York City, NY
Evaluation Frameworks, Adversarial Testing, Agentic AI, Reinforcement Learning, Annotation Pipelines, Synthetic Data Generation
Software Engineer, Safeguards Foundations (Internal Tooling) · intern
Anthropic · London, UK
Guardrails, Human-in-the-Loop Systems, Anomaly Detection, Model Monitoring & Observability, Evaluation Frameworks
TLM, Integrity · mid
OpenAI · San Francisco
Large Language Models (LLMs), Model Fine-Tuning, Adversarial ML, Evaluation Frameworks, Model Monitoring & Observability
Research Engineer, Privacy · mid
OpenAI · San Francisco
Differential Privacy, Federated Learning, Privacy-Preserving ML, PyTorch, JAX, Adversarial ML
Protection Scientist Engineer, Integrity · mid
OpenAI · San Francisco
Anomaly Detection, Adversarial ML, Model Monitoring & Observability, Evaluation Frameworks, Synthetic Data Generation
Researcher, Alignment Science · mid
OpenAI · San Francisco
Alignment, RLHF, Reward Modeling, Evaluation Frameworks, Reinforcement Learning, Scaling Laws
Researcher, Pretraining Safety · mid
OpenAI · San Francisco
Pre-Training, Scaling Laws, Data Curation, Evaluation Frameworks, Diffusion Models, PyTorch
Fullstack Engineer, Safety Engineering · mid
OpenAI · San Francisco
Guardrails, Evaluation Frameworks, Annotation Pipelines, AI Red-Teaming, Model Monitoring & Observability
Anthropic Fellows Program - AI Safety · mid · Remote
Anthropic · London, UK; Ontario, CAN; Remote-Friendly, United States; San Francisco, CA
AI Alignment, Safety Red-Teaming, Evaluation Frameworks, Adversarial ML, Adversarial Testing
Program Manager - Quality & Training (Safety Operations) · manager
xAI · Bastrop, TX
Evaluation Frameworks, AI Governance
Research Scientist, Frontier Risk Evaluations · mid
Scale AI · San Francisco, CA; New York, NY
Evaluation Frameworks, Adversarial Testing, AI Red-Teaming, Large Language Models (LLMs), Agentic AI, AI Governance Frameworks