Description

I specialize in the hard problems — the ones that survive first-line support and require deep investigation across the hardware/software boundary. With 10+ years in datacenter-scale environments, I've spent my career at Nutanix diagnosing critical failures where compute, storage, networking, and hypervisor layers intersect.

My focus areas include GPU-accelerated infrastructure (NVIDIA driver debugging, VFIO/mdev passthrough, ECC analysis), Linux internals (kernel crash analysis, kdump, driver fault tracing), KVM/AHV virtualization, and NUMA/performance regression work on clustered systems. I've built diagnostic tooling used thousands of times across global customer fleets and worked directly with NVIDIA engineering to trace and resolve driver-level regressions.

I'm available for short or long-term engagements involving escalation support, infrastructure reliability investigations, diagnostic tooling, or performance analysis on complex Linux/GPU environments.

Certifications: RHCSA · Nutanix Certified Master (NCM MCI) · VCP6-DCV/NV

Domaines d’expertise

Langues

Anglais
Bilingue ou natif

Préférences en matière de lieu de travail

Accepte de travailler sur site

Rotterdam (jusqu’à 50 km)

Nutanix
-Sr. Staff Escalation & Systems Reliability Engineer
HIGH TECH
janvier 2019 - juillet 2025 (6 ans et 6 mois)
Amsterdam, Pays-Bas
• - Performed kernel-level crash-dump analysis on production clusters, isolating failures within NVIDIA's closed-source GPU driver modules; identified a driver regression that triggered firmware-level timeout conditions under specific workloads, and collaborated with Nutanix GPU Engineering and NVIDIA to validate fixes using pre-release driver builds
• - Troubleshot GPU passthrough (VFIO) and mediated-device (mdev) issues on AHV/KVM, including driver-binding problems, incomplete GPU reset behavior (FLR-related), and mdev provisioning failures due to host/guest driver mismatches
• - Designed and maintained automated live-boot ISO images embedding diagnostic payloads for GPU and storage nodes; scripts captured telemetry, logs, and performance signatures automatically on boot, reducing triage time from hours to minutes across global customer fleets.
• - Developed SQL correlation queries on large telemetry datasets to detect configuration dependent failure signatures across customer environments; integrated results into automated workflows that surfaced high-impact issues for engineering and support.
• - Investigated NUMA locality challenges in Nutanix AHV clusters where Controller VMs are pinned to host cores; validated BIOS configurations to maximize local I/O performance and avoid remote-node latency penalties.
• - Ran performance benchmarking using Cinebench and Phoronix Test Suite analyzing Nutanix hardware platforms and guest performance across hypervisors (AHV/ESXi), kernel versions, and CPU architectures to identify configuration-dependent regressions.
• - Created Python and Bash automation used to orchestrate log capture, correlate kernel events, and produce actionable diagnostic summaries for critical customer escalations.
VMWARE KVM Linux Data visualization Python
Nutanix
Systems Reliability Engineer
HIGH TECH
janvier 2016 - janvier 2019 (3 ans)
Amsterdam, Pays-Bas
• - Troubleshot cross-layer failures across compute, storage, and networking paths in distributed Nutanix clusters.
• - Provisioned and managed server hardware in the Support Lab; performed node bring-up, imaging, racking, and platform validation.
• - Supported engineering teams by validating experimental configurations and identifying systemic reliability issues.
• - Work directly supported production infrastructure operating at global datacenter scale.
System administration VMWARE Linux KVM Nutanix
VCE
vPlatform Support Engineer
HIGH TECH
juin 2015 - mars 2016 (9 mois)
Durham, États-Unis
Provided escalation support for VCE Vblock converged infrastructure, resolving complex customer issues across compute, networking, and storage layers. Authored and presented RCA documents for critical incidents, proactively identified at-risk customers based on problem trends, and mentored junior support engineers. Worked closely with critical accounts and emerging product lines.
VMWARE Cisco