I was recently helping a customer configure VMware HCX to migrate from their legacy vSphere environment to a new VxRail SDDC running VMware Cloud Foundation (VCF). We had carefully followed all the prerequisites for firewall rules, proxies, and routing. We could successfully initiate vMotions and Cold Migrations, but could not get a single Bulk Migration to succeed.
Several days of testing ports and connectivity between the HCX Interconnects, ESXi hosts, and Replication networks showed nothing unusual. vMotions and Cold Migrations kept succeeding, while Bulk Migrations consistently failed with the following error:
Migration failed
Failed to enable replication while exchanging thumbprints. Exchange thumbprint failed for serviceMeshIds [“servicemesh-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx”]
The issue came down to the configuration of VMkernel adapter vmk0. VxRail configures vmk0 as a host discovery network using only IPv6 addressing. VxRail enables the ESXi Management service on this VMkernel adapter, but it also provisions vmk2 as the main Management network, which is where you will most likely configure an IPv4 address and connect to your ESXi hosts.
HCX currently does not support IPv6 (as of R133). Even though vmk2 had an IPv4 address with the Management and Replication services enabled, HCX was still looking at vmk0, finding only an IPv6 address, and failing the Bulk Migrations.
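If you want to check your hosts for this condition before kicking off migrations, the snippet below is a minimal Python sketch that flags a VMkernel interface carrying no IPv4 address. It does not reproduce HCX's own interface selection logic; it only spots the "IPv6 only on vmk0" situation described above. The interface names and addresses in the example are hypothetical; on a real host you would take them from `esxcli network ip interface ipv4 get` and `esxcli network ip interface ipv6 address list`.

```python
import ipaddress


def has_ipv4(addresses: list[str]) -> bool:
    """Return True if any address (optionally with a /prefix) is IPv4."""
    for addr in addresses:
        # ip_interface accepts both "10.0.0.21" and "10.0.0.21/24" forms
        ip = ipaddress.ip_interface(addr).ip
        if ip.version == 4:
            return True
    return False


# Hypothetical VxRail-style layout: vmk0 IPv6-only, vmk2 carries IPv4 management
interfaces = {
    "vmk0": ["fd00::10/64"],
    "vmk2": ["10.0.0.21/24"],
}

for name, addrs in interfaces.items():
    print(name, "has IPv4" if has_ipv4(addrs) else "IPv6 only")
```

Run against the sample data, this prints `vmk0 IPv6 only` followed by `vmk2 has IPv4`, which is exactly the combination that tripped up HCX here.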
The resolution turns out to be a relatively easy hack. Adding a dummy IPv4 address to vmk0, alongside the existing IPv6 host discovery address, is enough to convince HCX to perform a Bulk Migration.
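One possible way to add that address from the command line is ESXi's `esxcli network ip interface ipv4 set` command, run from the host shell or over SSH on vmk2. This may not be how it was done in this environment. The Python sketch below only validates the inputs and builds that command line so it can be reviewed before running it. The address 192.0.2.10 (from the documentation-reserved TEST-NET-1 range) and the 255.255.255.0 netmask are hypothetical placeholders; choose an unused, non-routed address that won't collide with anything reachable from vmk0.

```python
import ipaddress
import shlex


def vmk_ipv4_set_command(vmk: str, address: str, netmask: str) -> list[str]:
    """Build an esxcli command that assigns a static IPv4 address to a VMkernel port."""
    # Raises a ValueError subclass if the address is not IPv4
    ipaddress.IPv4Address(address)
    # Raises ValueError if the dotted-quad netmask is not a valid (contiguous) mask
    ipaddress.IPv4Network(f"0.0.0.0/{netmask}")
    return [
        "esxcli", "network", "ip", "interface", "ipv4", "set",
        f"--interface-name={vmk}",
        f"--ipv4={address}",
        f"--netmask={netmask}",
        "--type=static",
    ]


# Hypothetical dummy address for vmk0; the IPv6 discovery address is left untouched
cmd = vmk_ipv4_set_command("vmk0", "192.0.2.10", "255.255.255.0")
print(shlex.join(cmd))
# → esxcli network ip interface ipv4 set --interface-name=vmk0 --ipv4=192.0.2.10 --netmask=255.255.255.0 --type=static
```

Building the command in a script rather than typing it by hand mainly helps when the same change has to be repeated across every host in a cluster with a different dummy address for each.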
With vmk0 now showing an IPv4 address as its default address instead of the IPv6 discovery address, Bulk Migrations complete successfully.
This is no doubt an edge case specific to VxRail and HCX, since most customers will be using IPv4 addressing on vmk0. But in the rare situation where you are using a VMkernel interface other than vmk0 for Management, or are using only IPv6 on vmk0, this will hopefully resolve the issue for HCX migrations, at least temporarily.