Senior Data Engineer
About The Position
Dream is a pioneering AI cybersecurity company delivering revolutionary defense through artificial intelligence. Our proprietary AI platform creates a unified security system safeguarding assets against existing and emerging generative cyber threats. Dream’s advanced AI automates discovery, calculates risks, performs real-time threat detection, and plans an automated response. With a core focus on the “unknowns,” our AI transforms data into clear threat narratives and actionable defense strategies.
Dream’s AI cybersecurity platform represents a paradigm shift in cyber defense, employing a novel, multi-layered approach across all organizational networks in real time. At the core of our solution is Dream’s proprietary Cyber Language Model, a groundbreaking innovation that provides real-time, contextualized intelligence for comprehensive, actionable insights into any cyber-related query or threat scenario.
We are seeking an experienced Data Engineer to join our Platform and DevOps Engineering group. In this role, you will be pivotal in developing and maintaining Dream’s data lake, the backbone for data analytics and AI model training and a key component of our proprietary technology. This position is crucial as it involves handling vast amounts of sensitive data and ensuring its availability, accuracy, and security.
The ideal candidate is passionate about building scalable data infrastructure and has a keen interest in leveraging data to drive significant business impact. The role involves creating robust data solutions that support both batch and real-time data processing in cloud and on-premises environments.
Responsibilities
- Architect, build, and maintain a secure and scalable data lake that integrates smoothly with our data pipelines.
- Design and implement processes for data modeling, mining, and production.
- Work closely with data scientists and cyber researchers to ensure seamless data availability for AI model training.
- Develop and optimize data ingestion, storage, and retrieval processes to meet the needs of our high-throughput AI platforms.
- Build tools for data validation, cleansing, and automation to enhance data integrity and efficiency.
- Troubleshoot and resolve issues in our development, production, and testing environments related to data access and quality.
- Communicate clearly and work effectively as part of a dynamic team.
Skills
- Proven ability to work collaboratively in a team environment.
- 4-5 years of experience in data engineering, particularly with data lakes and large-scale data platforms.
- Proficient with AWS cloud services, especially those related to data storage and processing.
- Expertise in big data technologies such as Spark.
- Strong knowledge of SQL/NoSQL databases, data warehouse solutions, and data management tools.
- Solid background in data pipeline and workflow management tools like Airflow and NiFi.
- Skilled in Linux environments, with scripting experience in Python or Bash.
- Familiarity with CI/CD pipelines and version control using Git and GitHub.
Advantages
- Experience building data lake solutions and with data scraping.
- Experience in setting up real-time data feeds and stream-processing systems.
- Exposure to machine learning and understanding of data science workflows.
- Experience with vector databases.
Tech stack
- AWS, Google Cloud, Azure, Spark, Airflow, NiFi, Vector DBs, Jenkins, Docker, EKS, Lambda, ECS, Redshift, Terraform, Ansible, MongoDB, Neo4j, Python, Bash, GitHub, and more.