Do you run the ETL Spark and HDFS nodes on virtual machines or directly on the host OS?
We have both a bare metal cluster and a virtualized cluster.
Our production cluster runs on bare metal: ~30 data nodes, each a 128 GB RAM machine with a 12-core/24-thread CPU, running Debian.
Our test cluster is fully virtualized: ~10 data nodes, each a VM with 16 vCPUs and 64 GB RAM, running Ubuntu.
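For a rough sense of scale, here's a back-of-the-envelope sketch of aggregate capacity for the two clusters. It assumes exactly 30 and 10 nodes (the counts above are approximate) and treats one VM vCPU as roughly equivalent to one bare-metal hardware thread, which is only an approximation since vCPUs may share physical cores with other guests.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    nodes: int             # assumed exact count; the post says "~30" / "~10"
    threads_per_node: int  # hardware threads (bare metal) or vCPUs (VM)
    ram_gb_per_node: int

    def total_threads(self) -> int:
        # Aggregate hardware threads or vCPUs across all data nodes
        return self.nodes * self.threads_per_node

    def total_ram_gb(self) -> int:
        # Aggregate RAM across all data nodes, in GB
        return self.nodes * self.ram_gb_per_node


prod = Cluster("production (bare metal)", nodes=30, threads_per_node=24, ram_gb_per_node=128)
test = Cluster("test (virtualized)", nodes=10, threads_per_node=16, ram_gb_per_node=64)

for c in (prod, test):
    print(f"{c.name}: {c.total_threads()} threads/vCPUs, {c.total_ram_gb()} GB RAM")
```

Under those assumptions, production comes out to about 720 threads and 3,840 GB of RAM, versus about 160 vCPUs and 640 GB for the test cluster.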
Edit for an extra note:
Our virtualized cluster runs on its own dedicated VLAN, separate from the other VMs we run.