We are looking for a Data Engineer to build an AI-powered data-mapping recommendation platform that accelerates the integration and validation of complex datasets. The system will automate data extraction, mapping, and validation workflows that currently demand extensive manual effort because of inconsistencies in source data, reliance on domain-specific code mappings, and heuristic-based validation.
Responsibilities
- Build and maintain scalable data pipelines with Databricks, Spark, and PySpark.
- Manage data governance, security, and credentials using Unity Catalog and Secret Scopes.
- Develop and deploy ML models with MLflow; work with LLMs and embedding-based vector search.
- Apply ML/DL techniques (classification, regression, clustering, transformers) and evaluate using industry metrics.
- Design data models and warehouses leveraging dbt, Delta Lake, and Medallion architecture.
- Work with healthcare data standards and medical terminology mapping.
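The embedding-based retrieval work mentioned above can be sketched with a minimal cosine-similarity search over toy vectors. This is a pure-Python stand-in for a real vector database or the Databricks Vector Search service; the field names and embedding values are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, index, k=2):
    """Return the k index entries most similar to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Toy "embedding index": source-field name -> hypothetical embedding.
index = {
    "patient_dob":   [0.9, 0.1, 0.0],
    "date_of_birth": [0.8, 0.2, 0.1],
    "zip_code":      [0.0, 0.1, 0.9],
}

# A query embedding for an unmapped field; the nearest neighbours
# become candidate mapping recommendations.
matches = top_k([0.85, 0.15, 0.05], index)
```

In the actual platform the embeddings would come from an LLM or embedding model and the index would live in a managed vector store, but the ranking logic is the same.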
Mandatory Requirements
- Databricks Expertise - Candidates must demonstrate strong hands-on experience with the Databricks platform, including:
- Unity Catalog: Managing data governance, access control, and auditing across workspaces.
- Secret Scopes: Secure handling of credentials and sensitive configurations.
- Apache Spark / PySpark: Writing performant, scalable distributed data pipelines.
- MLflow: Managing ML lifecycle including experiment tracking, model registry, and deployment.
- Vector Search: Working with vector databases or search APIs to build embedding-based retrieval systems.
- LLMs (Large Language Models): Familiarity with using or fine-tuning LLMs in Databricks or similar environments.
- Data Engineering Skills - Experience designing and maintaining robust data pipelines, including:
- Data Modeling & Warehousing: Dimensional modeling, star/snowflake schemas, SCD (Slowly Changing Dimensions).
- Modern Data Stack: Familiarity with dbt, Delta Lake, and the Medallion architecture (Bronze, Silver, Gold layers).
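As a minimal illustration of the SCD requirement above, a Type 2 update (close the current row, append a new versioned row) can be sketched in plain Python. In practice this would be a Delta Lake `MERGE` or a dbt snapshot; the column names here are illustrative:

```python
from datetime import date

def scd2_upsert(dim_rows, key, new_attrs, as_of):
    """Type 2 slowly-changing-dimension upsert: expire the current row
    for `key` (if its attributes changed) and append a new current row."""
    for row in dim_rows:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return dim_rows  # no change, keep history as-is
            row["is_current"] = False
            row["valid_to"] = as_of
    dim_rows.append({
        "key": key,
        "attrs": new_attrs,
        "valid_from": as_of,
        "valid_to": None,
        "is_current": True,
    })
    return dim_rows

# One customer row, then an attribute change that triggers versioning.
dim = [{"key": "C1", "attrs": {"city": "Boston"},
        "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True}]
dim = scd2_upsert(dim, "C1", {"city": "Austin"}, date(2024, 6, 1))
```

The same close-and-append pattern maps directly onto a Silver-to-Gold step in a Medallion pipeline.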
Optional Requirements
- Machine Learning Knowledge (Nice to Have) - A strong foundation in machine learning is a plus, including:
- Traditional Machine Learning Techniques: Classification, regression, clustering, etc.
- Model Evaluation & Metrics: Precision, recall, F1-score, ROC-AUC, etc.
- Deep Learning (DL): Understanding of neural networks and relevant frameworks.
- Transformers & Attention Mechanisms: Knowledge of modern NLP architectures and their applications.
- Preferred Domain Knowledge (Nice to Have)
- Experience with healthcare data standards and medical code systems such as eCQM, VSAC, RxNorm, LOINC, and SNOMED.
- Understanding of medical terminology and how to map or normalize disparate coding systems.
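The terminology-normalization work described above can be sketched as a canonicalize-then-lookup step against a code crosswalk. This is a simplified illustration: the crosswalk entries below are placeholders, not real LOINC or RxNorm codes, and a production mapper would layer fuzzy or embedding-based matching on top of the exact lookup:

```python
def normalize_term(term):
    """Canonicalize a raw source term before lookup:
    lowercase, trim, and collapse internal whitespace."""
    return " ".join(term.lower().strip().split())

# Hypothetical crosswalk from local lab names to a standard code system;
# the codes below are placeholders, not real LOINC entries.
CROSSWALK = {
    "hemoglobin a1c": "LOINC:XXXX-0",
    "serum glucose":  "LOINC:YYYY-1",
}

def map_term(raw_term, crosswalk=CROSSWALK):
    """Return (code, matched) for a raw source term, or (None, False)."""
    code = crosswalk.get(normalize_term(raw_term))
    return (code, code is not None)

code, ok = map_term("  Hemoglobin  A1C ")   # messy source spelling
miss, found = map_term("unknown test")      # unmapped term
```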
Tech Stack
- Platforms & Tools: Databricks, Unity Catalog, Secret Scopes, MLflow
- Languages & Frameworks: Python, PySpark, Apache Spark
- Machine Learning & AI: Traditional ML techniques, Deep Learning, Transformers, Attention Mechanisms, LLMs
- Search & Retrieval: Vector databases, embedding-based vector search
- Data Engineering & Modeling: dbt, Delta Lake, Medallion architecture (Bronze/Silver/Gold), Dimensional modeling, Star/Snowflake schemas
- Domain (Optional): Healthcare data standards (eCQM, VSAC, RxNorm, LOINC, SNOMED)
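As a quick illustration of the evaluation metrics listed under the ML requirements, precision, recall, and F1 can be computed directly from prediction pairs. This is a minimal sketch for a single positive class; in practice scikit-learn or MLflow's evaluation tooling would be used:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels vs. predictions: 2 true positives, 1 false positive, 1 false negative.
p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```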