SRE
About The Position
Dream is a pioneering AI cybersecurity company delivering revolutionary defense through artificial intelligence. Our proprietary AI platform creates a unified security system safeguarding assets against existing and emerging generative cyber threats. Dream's advanced AI automates discovery, calculates risks, performs real-time threat detection, and plans an automated response. With a core focus on the "unknowns," our AI transforms data into clear threat narratives and actionable defense strategies.
Dream's AI cybersecurity platform represents a paradigm shift in cyber defense, employing a novel, multi-layered approach across all organizational networks in real time. At the core of our solution is Dream's proprietary Cyber Language Model, a groundbreaking innovation that provides real-time, contextualized intelligence for comprehensive, actionable insights into any cyber-related query or threat scenario.
We are seeking an experienced Senior Site Reliability Engineer to join our DevOps team as part of our Engineering group. This role involves taking ownership of monitoring, deploying, and ensuring the reliability of production-grade modern SaaS platforms across cloud and on-premises environments.
Responsibilities
- Lead initiatives to enhance product reliability and system readiness.
- Design and implement sophisticated monitoring solutions to ensure high availability and performance of our production platform.
- Oversee and refine the entire product reliability pipeline.
- Proactively troubleshoot and resolve issues across development, production, and testing environments.
- Champion an "Everything as Code" approach using a wide range of technologies including Ansible, Terraform, and Kubernetes.
- Develop and maintain advanced tools for automation, deployment, monitoring, and operations.
- Communicate clearly and collaborate effectively within the team and across departments.
- Promote best practices in reliability and system operations.
Skills
- Extensive experience configuring and automating monitoring tools.
- At least 4 years of experience as a DevOps or Site Reliability Engineer.
- Strong leadership experience, preferably as a Team Lead or Senior Engineer.
- In-depth knowledge of microservices architectures and technologies such as Kubernetes.
- Comprehensive understanding of cloud, on-premises, and hybrid environments.
- Proficiency with major cloud providers: AWS, GCP, Azure.
- Advanced experience with CI/CD technologies including Jenkins, GitHub Actions, and ArgoCD.
- Proficiency in coding and scripting with Python, Bash, or similar languages.
- Strong team player with proven ability to lead and inspire.
Advantages
- Background in working with AI components (training, inference, serving).
Tech Stack
AWS, Kubernetes, EKS, ECS, IaC, GitHub, Terraform, Python, Ansible, Docker + Compose, ArgoCD, MongoDB, RabbitMQ, Redis, Go, Neo4j, AI, and more.