OutOfMemoryError when importing full index data

We have been able to import a single project's data from release.tar into the portal without any issues. However, so far we haven't been able to import the full index data set. When we try to import the full index data, it prints the following error messages and is very slow:

2016-09-13 15:45:02,465 [main] INFO  o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Successfully loaded bulk request '296'.
2016-09-13 15:45:11,700 [main] INFO  o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Sending bulk request '297' with 1 items (130.8 MB bytes)
2016-09-13 15:56:04,277 [main] WARN  o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Encountered exceptions during bulk load: failure in bulk execution:
[0]: index [icgc22-13], type [donor-centric], id [DO222880], message [IndexFailedEngineException[[icgc22-13][9] Index failed for [donor-centric#DO222880]]; nested: OutOfMemoryError[Java heap space]; ]
2016-09-13 15:56:04,277 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:04,286 [main] WARN  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 5 seconds...
2016-09-13 15:56:09,287 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:09,291 [main] WARN  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 7 seconds...
2016-09-13 15:56:16,291 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:16,296 [main] WARN  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 9 seconds...
2016-09-13 15:56:25,296 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:25,301 [main] WARN  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 12 seconds...
2016-09-13 15:56:37,301 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:37,307 [main] WARN  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 16 seconds...
2016-09-13 15:56:53,307 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:53,311 [main] INFO  o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Sending bulk request '298' with 1 items (130.8 MB bytes)
2016-09-13 16:02:59,531 [main] WARN  o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Encountered exceptions during bulk load: failure in bulk execution:
[0]: index [icgc22-13], type [donor-centric], id [DO222880], message [IndexFailedEngineException[[icgc22-13][9] Index failed for [donor-centric#DO222880]]; nested: OutOfMemoryError[Java heap space]; ]
2016-09-13 16:02:59,532 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 16:02:59,535 [main] WARN  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 5 seconds...
2016-09-13 16:03:04,536 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 16:03:04,540 [main] WARN  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 7 seconds...
2016-09-13 16:03:11,540 [main] INFO  o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.

Looking at the Elasticsearch cluster health status, it is in the green state:

 lxv-icgc-elastic01:~$ curl 'http://localhost:9200/_cluster/health?pretty=1'
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 10,
  "number_of_data_nodes" : 9,
  "active_primary_shards" : 33,
  "active_shards" : 58,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

Looking at how big the index is:

lxv-icgc-elastic01:~$ curl 'http://localhost:9200/_cat/indices?pretty=1'

green open icgc22-13                15 1 113498040 0   4.8gb   2.4gb
green open .marvel-2016.09.12        1 1     20705 0  77.5mb  38.7mb
green open icgc-repository-20160830 15 0    943761 0 578.4mb 578.4mb
green open .marvel-2016.09.13        1 1     61315 0 315.5mb 158.7mb
green open terms-lookup              1 8         0 0    963b    115b

The index is merely 4.8 GB. When I imported a single project, it completed in a couple of hours, and that index was 30 GB to 50 GB.

The import command I used is shown below. It was run on the varnish node (which has a lot of free memory). I have increased the Java heap space to 50 GB on the command line, but that doesn't help:

java -Xmx50g -jar dcc-download-import.jar -i release.tar -es es://lxv-icgc-elastic01:9300

My questions are:

  1. Are my CLI options correct?
  2. Is the OutOfMemoryError critical, or is it expected?
  3. How did ICGC DCC import the full index data into the production system, and how long did it take?

Brady

I am wondering if the error messages are from Elasticsearch rather than the client program. Should I increase the Elasticsearch server heap space? It looks like the Elasticsearch Java process has only 4 GB of heap space.
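One way to confirm where the heap limit actually sits is to ask the cluster itself via the node-info API (`/_nodes/jvm`). A sketch, with a trimmed sample response stubbed in so it runs standalone (the 4 GB value is assumed for illustration; the hostname is the one from this thread):

```shell
# Convert each node's reported heap_max_in_bytes to gigabytes.
to_gb() {
  grep heap_max_in_bytes | awk -F': ' '{ printf "%.1f GB\n", $2 / 1024 / 1024 / 1024 }'
}

# Trimmed sample of a /_nodes/jvm response; against the live cluster,
# pipe in the real thing instead of the here-doc:
#   curl -s 'http://lxv-icgc-elastic01:9200/_nodes/jvm?pretty' | to_gb
to_gb <<'EOF'
      "jvm" : {
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4260102144
        }
      }
EOF
# prints "4.0 GB"
```

If this prints ~4.0 GB per node while the cluster is ingesting 130 MB bulk requests, the server-side heap, not the client's `-Xmx50g`, is the bottleneck.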

Brady, the exception you are getting is thrown by the Elasticsearch cluster.
What is the Elasticsearch cluster configuration?
What is the Java heap configuration for the Elasticsearch nodes?
How much RAM do the Elasticsearch node servers have?

A recommended setup for an Elasticsearch node is a machine with 64 GB of RAM running an Elasticsearch node with 30 GB of Java heap.
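On a Debian-style package install, that heap is configured in `/etc/default/elasticsearch`. A minimal sketch of the change, assuming that file layout:

```shell
# /etc/default/elasticsearch -- sourced by the init script on Debian/Ubuntu.
# Stay at or below ~31g: past roughly 32 GB the JVM disables compressed
# object pointers, so a larger heap can actually hold fewer objects.
ES_HEAP_SIZE=30g
```

The ~30 GB ceiling is deliberate rather than arbitrary, for the compressed-pointers reason noted in the comment.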

4 GB for the Elasticsearch Java process is not enough for our production dataset. The minimum requirement is at least 16 GB.

Hi, Vitalii,

I checked the Elasticsearch config /etc/default/elasticsearch. ES_HEAP_SIZE is only 4 GB, which results in the Java option “-Xmx4g” for the Elasticsearch process.

We have 10 Elasticsearch nodes, each with 64 GB of memory. As you suggested, we should use a much larger heap size. I am going to try 30 GB of Java heap and see how it works. Thank you very much for the suggestion.
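For anyone following along, a sketch of rolling that change out node by node (so the cluster stays available) and confirming it took effect. The service name and the `_cat/nodes` `heap.max` column are standard for this Elasticsearch generation; the hostname is the one used elsewhere in this thread, and the sed assumes ES_HEAP_SIZE is already an uncommented line:

```shell
# On each node in turn: set the new heap, then restart that node only.
sudo sed -i 's/^ES_HEAP_SIZE=.*/ES_HEAP_SIZE=30g/' /etc/default/elasticsearch
sudo service elasticsearch restart

# Then, from any host, confirm every node picked up the new limit.
curl 'http://lxv-icgc-elastic01:9200/_cat/nodes?h=host,heap.max'
```

Restarting one node at a time and waiting for the cluster to return to green between restarts avoids losing shard copies mid-import.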

Brady

After increasing the Java heap size to 30 GB, we successfully imported the full index data set without incident!

I’m glad to hear that. Thanks for the feedback, Brady!