HDFS configuration?


#1

Hello,

Our ETL DOCUMENT step fails with an RPC timeout on a workflow of 38 projects. The failing task is:

Failed to execute task 'mutation-centric-document-task:mutation-centric'

The HDFS logs contain messages such as:
error processing WRITE_BLOCK operation
WARN Slow ReadProcessor read fields took 32320ms (threshold=30000ms);

I’ve reduced the replication factor to 1 and raised dfs.namenode.handler.count, which let me finish processing a set of 22 projects, but the DOCUMENT step still fails at 38 projects. What HDFS settings are you using? Here’s our Ansible setup in roles/hdfs/defaults/main.yml:

hdfs_namenode_properties:
  # need to add user submitting job to the "hadoop" group, or turn filesystem
  # security off.
  # https://hadoop.apache.org/docs/r1.2.1/hdfs_permissions_guide.html#Configuration+Parameters
  - { name: "dfs.permissions.superusergroup", value: "hadoop" }
  - { name: "dfs.namenode.name.dir", value: "/media/persistent0" }
  - { name: "dfs.replication", value: "1" }
  # set handler threads to min(20*log2(cluster size), 200)
  # https://community.hortonworks.com/questions/63511/namenode-handler-count.html
  - { name: "dfs.namenode.handler.count", value: 70 }
jdk_home: /usr/lib/jvm/java-8-oracle/
hdfs_datanode_properties:
  - { name: "dfs.permissions.superusergroup", value: "hadoop" }
  - { name: "dfs.datanode.data.dir", value: "{{ hdfs_disks | map(attribute='mount_point') | join(',') }}" }
  - { name: "dfs.datanode.max.transfer.threads", value: "5000" }

Thanks,
Brian K.
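
For reference, the handler-count rule of thumb quoted in the config comment above, min(20*log2(cluster size), 200), can be worked out as follows. This is only a sketch: the function name is made up for illustration, and truncating to an integer is an assumption, since the comment does not say how to round.

```python
import math


def namenode_handler_count(cluster_size: int, cap: int = 200) -> int:
    """Rule of thumb from the config comment: min(20 * log2(cluster size), cap).

    The function name is hypothetical, and truncating the result to an
    integer is an assumption; the original comment does not specify rounding.
    """
    if cluster_size < 2:
        # log2(1) is 0, which would give zero handler threads.
        raise ValueError("cluster_size must be at least 2")
    return min(int(20 * math.log2(cluster_size)), cap)


if __name__ == "__main__":
    # Print the suggested handler count for a few cluster sizes.
    for nodes in (4, 8, 16, 64, 1024):
        print(nodes, namenode_handler_count(nodes))
```

Under this rule, the value of 70 used above falls between the results for 11 nodes (69) and 12 nodes (71).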


#2

We are using a dfs.replication of 3 and a dfs.blocksize of 134217728 bytes (128 MiB).

Just curious, how are you building your HDFS cluster? Standalone, Hortonworks, or Cloudera?


#3

Dusan, thanks, we have the same blocksize. HDFS is installed by Ansible scripts from the Cloudera repository:

- name: Configure Cloudera APT key
  apt_key: url="http://archive.cloudera.com/cdh5/ubuntu/{{ ansible_distribution_release }}/amd64/cdh/archive.key"
           state=present

- name: Configure the Cloudera APT repositories
  apt_repository: repo="deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/{{ ansible_distribution_release }}/amd64/cdh {{ ansible_distribution_release }}-{{ hdfs_cloudera_distribution }} contrib"
                  state=present

- name: Install Hadoop DataNode and client
  apt: pkg={{ item }}
       state=present
  with_items:
    - hadoop-hdfs-datanode
    - hadoop-client