Machine Learning Engineer (Distributed Training)

CloudWalk
São Paulo
Full time
6 days ago
Who we are:
CloudWalk is a fintech company reimagining the future of financial services. We are building intelligent infrastructure powered by AI, blockchain, and thoughtful design. Our products serve millions of entrepreneurs across Brazil and the US every day, helping them grow with tools that are fast, fair, and built for how business actually works. Learn more at cloudwalk.io.
Who We’re Looking For:
We’re looking for a Machine Learning Engineer to own and evolve our distributed training pipeline for large language models. You’ll work inside our GPU cluster to help researchers train and scale foundation models using frameworks like Hugging Face Transformers, Accelerate, DeepSpeed, FSDP, and others. Your focus will be distributed training: from designing sharding strategies and multi-node orchestration to optimizing throughput and managing checkpoints at scale.
This is not a research role: it's about building and scaling the systems that let researchers move fast and models grow big. You’ll work closely with MLOps, infra, and model developers to make our training runs efficient, resilient, and reproducible.

What You'll Do:

What We’re Looking For:

Bonus Points:

How We Hire:

If you’ve trained LLMs before, or helped others do it better, this role is for you. Even if you don’t check every box, if you’re confident working with distributed compute and real-world LLM workloads, we want to hear from you.