Senior Director, Engineering - MLOps

DataRobot

Sales & Business Development
Multiple locations
Posted on Oct 9, 2024

Job Description:

DataRobot is the leader in Value-Driven AI, a unique and collaborative approach to generative and predictive AI that combines an open platform, deep expertise and broad use-case experience to improve how organizations run, grow and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with an organization’s existing investments in data, applications and business processes, and can be deployed on prem or on any cloud environment. Global organizations, including 40% of the Fortune 50, rely on DataRobot to drive greater impact and value from AI.

The Senior Director of Engineering for Machine Learning Operations (ML Ops) will lead and oversee multiple software and ML teams that develop and scale our robust ML Ops platform. This individual will guide the strategy and execution of key ML Ops functions, such as model deployment, registry, tracking, governance, monitoring, and observability. They will ensure that our end-to-end AI/ML platform continues to excel in the industry with its capabilities for efficient deployment, management, and monitoring of models across all stages of their lifecycle. These capabilities span both predictive and generative offerings at DataRobot.

Key Responsibilities:

  • Provide architectural and technical guidance for building ML Ops infrastructure, including model deployment pipelines, model registries, monitoring systems, and governance frameworks.

  • Ensure adherence to best practices for software development, automation, and machine learning engineering.

  • Oversee the scalability, reliability, and security of the ML platform in production environments.

  • Lead and support different modes of model deployment, including real-time, batch, streaming, and edge deployments, ensuring models operate efficiently across diverse environments (e.g., cloud, on-premises, IoT devices).

  • Design flexible deployment strategies to accommodate varying use cases such as low-latency predictions, offline model inference, and multi-model management.

  • Ensure the platform supports both batch and real-time machine learning workflows at scale.

  • Lead the design and implementation of machine learning lifecycle management systems to handle model deployment, versioning, rollback mechanisms, and real-time monitoring.

  • Automate key processes in model deployment and operations to increase speed and reduce manual intervention, and drive business, product, and technical changes within the organization.

  • Collaborate with product management, data science, and infrastructure teams to align on business needs and ensure smooth integration of machine learning models into production systems. Act as the liaison between engineering teams to ensure the successful delivery of ML Ops solutions.

  • Develop governance policies and frameworks around model management, ensuring models are properly tracked, documented, and auditable for regulatory compliance.

  • Ensure production environments are monitored effectively, with appropriate observability, alerting, and response protocols in place for handling issues.

  • Lead efforts to improve system performance and reduce downtime, maintaining high availability and reliability standards for ML-driven products.

  • Champion continuous improvements to the ML Ops platform, focusing on reducing operational overhead and improving the efficiency of machine learning workflows.

Knowledge, Skills and Abilities:

  • Extensive experience in building and managing machine learning platforms, with deep expertise in ML Ops functions such as model registry, model deployment, tracking, and monitoring.

  • Proficiency in programming languages such as Python, Java, or similar, with experience in machine learning frameworks like TensorFlow, PyTorch, or similar.

  • Strong experience in working with cloud environments (AWS, Azure, GCP) and container orchestration technologies like Kubernetes and Docker.

  • Bachelor's degree in Computer Science, Software Engineering, Data Science, or a related field. Master's or PhD preferred.

  • 15+ years of experience in machine learning, data science, software engineering, or related fields, and 7+ years of experience in a leadership role scaling, managing, and mentoring technical teams.

Nice to have:

  • Experience with large-scale distributed systems and high-performance computing environments.

  • Familiarity with modern software, data & ML engineering tools and frameworks, as well as model monitoring and observability tools.

  • Knowledge of responsible AI practices, including model transparency, fairness, and governance.

The talent and dedication of our employees are at the core of DataRobot’s journey to be an iconic company. We strive to attract and retain the best talent by providing competitive pay and benefits with our employees’ well-being at the core. Here’s what your benefits package may include depending on your location and local legal requirements: Medical, Dental & Vision Insurance, Flexible Time Off Program, Paid Holidays, Paid Parental Leave, Global Employee Assistance Program (EAP) and more!

DataRobot Operating Principles:

  • Wow Our Customers
  • Set High Standards
  • Be Better Than Yesterday
  • Be Rigorous
  • Assume Positive Intent
  • Have the Tough Conversations
  • Be Better Together
  • Debate, Decide, Commit
  • Deliver Results
  • Overcommunicate


Research shows that many women only apply to jobs when they meet 100% of the qualifications while many men apply to jobs when they meet 60%. At DataRobot we encourage ALL candidates, especially women, people of color, LGBTQ+ identifying people, differently abled, and other people from marginalized groups to apply to our jobs, even if you do not check every box. We’d love to have a conversation with you and see if you might be a great fit.

DataRobot is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. DataRobot is committed to working with and providing reasonable accommodations to applicants with physical and mental disabilities. Please see the United States Department of Labor’s EEO poster and EEO poster supplement for additional information.

All applicant data submitted is handled in accordance with our Applicant Privacy Policy.