ML/AI Platform Engineer

Tel Aviv · Full-time

About The Position

Dream is a pioneering AI cybersecurity company delivering revolutionary defense through artificial intelligence. Our proprietary AI platform creates a unified security system safeguarding assets against existing and emerging generative cyber threats. Dream's advanced AI automates asset discovery, calculates risk, detects threats in real time, and plans automated responses. With a core focus on the "unknowns," our AI transforms data into clear threat narratives and actionable defense strategies.

Dream's AI cybersecurity platform represents a paradigm shift in cyber defense, employing a novel, multi-layered approach across all organizational networks in real-time. At the core of our solution is Dream's proprietary Cyber Language Model, a groundbreaking innovation that provides real-time, contextualized intelligence for comprehensive, actionable insights into any cyber-related query or threat scenario.  

We’re seeking an experienced AI/ML Platform Engineer to join our Foundations team, the group behind our next-generation GenAI platform powering innovation across the company and beyond. This team is building scalable, high-performance AI systems for both internal users and external customers—designed to run seamlessly across cloud and on-premise environments using the latest advancements in hardware. In this role, you’ll lead efforts in distributed training, inference at large scale, resource optimization, and robust model lifecycle management using MLOps best practices. Your work will be critical to accelerating research, supporting production-grade AI infrastructure, and driving the development of our internal AI ecosystem. 

Responsibilities

  • Architect and build scalable ML infrastructure for training and inference workloads across heterogeneous compute environments (on-premise and cloud). 
  • Design and implement distributed systems to support model lifecycle management — from data ingestion and preprocessing, to training orchestration and deployment. 
  • Optimize performance and cost-efficiency of large-scale model training and serving pipelines using technologies like Ray, Kubernetes, Spark, and GPU schedulers. 
  • Collaborate with AI researchers, data scientists, and product teams to understand their workflows and translate them into reusable platform services and APIs. 
  • Drive adoption of best practices for CI/CD, observability, and reproducibility in ML systems. 
  • Contribute to the long-term vision and technical roadmap of the ML platform, ensuring it evolves to meet the growing demands of AI across the company. 

Skills

  • 5+ years of experience building large-scale distributed systems or platforms, preferably in ML or data-intensive environments 
  • Proficiency in Python with strong software engineering practices, familiarity with data structures and design patterns 
  • Deep understanding of orchestration systems (e.g., Kubernetes, Airflow, Argo) and distributed computing frameworks (e.g., Ray, Spark, Dask) 
  • Experience with GPU compute infrastructure, containerization (Docker), and cloud-native architectures 
  • Proven track record of delivering production-grade infrastructure or developer platforms 
  • Solid grasp of ML workflows, including model training, evaluation, and inference pipelines 


Apply for this position
