HPC Specialist

drweng · AI Research & Development
📍 Montreal

About this role

DRW is a diversified trading firm with over three decades of experience bringing sophisticated technology and exceptional people together to operate in markets around the world. We value autonomy and the ability to quickly pivot to capture opportunities, so we operate using our own capital and trade at our own risk.

Headquartered in Chicago with offices throughout the U.S., Canada, Europe, and Asia, we trade a variety of asset classes including Fixed Income, ETFs, Equities, FX, Commodities and Energy across all major global markets. We have also leveraged our expertise and technology to expand into three non-traditional strategies: real estate, venture capital and cryptoassets.

We operate with respect, curiosity and open minds. The people who thrive here share our belief that it's not just what we do that matters – it's how we do it. DRW is a place of high expectations, integrity, innovation and a willingness to challenge consensus.

We are looking for an HPC Specialist to join our AI and Multi Asset Systematic Strategies team. This team builds and operates GPU infrastructure that powers AI and ML workloads. You'll work on the infrastructure stack from bare metal to model serving, combining systems engineering, performance optimization, and infrastructure automation to solve complex problems at the intersection of hardware, networking, and distributed systems.

Responsibilities:

  • Deploy, maintain, and optimize GPU infrastructure for large-scale LLM inference workloads, including provisioning, configuration, and deployment of GPU server fleets.
  • Architect and implement distributed serving solutions for multi-node, multi-GPU model deployments.
  • Manage GPU-enabled Kubernetes clusters for LLM and ML workloads.
  • Configure network infrastructure including load balancers, firewalls, and inter-node communication for GPU clusters.
  • Implement and optimize storage solutions for model weights and inference caches.
  • Troubleshoot performance bottlenecks across the stack: hardware, drivers, networking, and application layer.
  • Research and evaluate emerging GPU technologies, model serving frameworks, and infrastructure optimizations.
  • Collaborate with ML engineers to profile model performance and implement inference acceleration techniques.
  • Drive reliability improvements through monitoring, alerting, capacity planning, and incident response.

Requirements:

  • Bachelor's or Master's degree in Computer Science, Systems Engineering, or related field.
  • 5+ years in DevOps, SRE, or infrastructure engineering roles.
  • Strong experience with GPU infrastructure, model serving frameworks (vLLM, SGLang), and GPU driver management.
  • Hands-on experience optimizing deep learning workloads (inference or training) on GPU clusters.
  • Deep Linux systems knowledge including network configuration, storage optimization, and Kubernetes orchestration.
  • Experience with infrastructure as code tools (Ansible, Terraform, or similar).
  • Strong understanding of distributed systems, networking protocols (TCP/IP, HTTP/2), and load balancing.
  • Proficiency in Python and Bash scripting for automation.
  • Experience with monitoring and observability tools (Prometheus, Grafana, or similar).

For more information about DRW's processing activities and our use of job applicants' data, please view our Privacy Notice at https://drw.com/privacy-notice.

California residents, please review the California Privacy Notice for information about certain legal rights at https://drw.com/california-privacy-notice.

Frequently Asked Questions

Is the salary disclosed for the HPC Specialist position at drweng?
The salary for this HPC Specialist role at drweng is not publicly listed. Click "Apply Now" to learn more about the compensation package on their official careers page.
Where is the HPC Specialist position at drweng located?
This HPC Specialist role at drweng is based in Montreal. The position is listed as on-site or hybrid. Check the full job description or apply directly to confirm the work arrangement.
Which team or department does the HPC Specialist at drweng belong to?
This HPC Specialist position is part of the AI Research & Development department at drweng. See the full job description for more information about the team structure and responsibilities.
How do I apply for the HPC Specialist position at drweng?
Click the "Apply Now" button on this page. You will be redirected to drweng's official application portal, hosted on Greenhouse, where you can submit your application directly.
When was the HPC Specialist job at drweng posted?
This HPC Specialist position at drweng was posted on Mar 23, 2026. Apply as soon as possible — early applications are often reviewed first.