Do ETL services run on virtual machines or the host OS?


#1

Do you run the ETL Spark and HDFS nodes on virtual machines or on the host OS?


#2

We have both a bare metal cluster and a virtualized cluster.

Our production cluster uses bare metal: ~30 data nodes with 12-core/24-thread CPUs and 128 GB RAM, running Debian.

Our test cluster is completely virtualized: ~10 data nodes with 16 vCPUs and 64 GB RAM, running Ubuntu.

Edit for extra note:
Our virtualized cluster runs on its own dedicated VLAN, separate from the other VMs we run.