We have been able to import a single project's data from release.tar into the portal without any issues. However, so far we haven't been able to import the full index data set. When we try to import the full index data, the import is very slow and prints the following error messages:
2016-09-13 15:45:02,465 [main] INFO o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Successfully loaded bulk request '296'.
2016-09-13 15:45:11,700 [main] INFO o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Sending bulk request '297' with 1 items (130.8 MB bytes)
2016-09-13 15:56:04,277 [main] WARN o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Encountered exceptions during bulk load: failure in bulk execution:
[0]: index [icgc22-13], type [donor-centric], id [DO222880], message [IndexFailedEngineException[[icgc22-13][9] Index failed for [donor-centric#DO222880]]; nested: OutOfMemoryError[Java heap space]; ]
2016-09-13 15:56:04,277 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:04,286 [main] WARN o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 5 seconds...
2016-09-13 15:56:09,287 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:09,291 [main] WARN o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 7 seconds...
2016-09-13 15:56:16,291 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:16,296 [main] WARN o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 9 seconds...
2016-09-13 15:56:25,296 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:25,301 [main] WARN o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 12 seconds...
2016-09-13 15:56:37,301 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:37,307 [main] WARN o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 16 seconds...
2016-09-13 15:56:53,307 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 15:56:53,311 [main] INFO o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Sending bulk request '298' with 1 items (130.8 MB bytes)
2016-09-13 16:02:59,531 [main] WARN o.i.d.r.j.i.i.BulkProcessorListener - [1088516203] Encountered exceptions during bulk load: failure in bulk execution:
[0]: index [icgc22-13], type [donor-centric], id [DO222880], message [IndexFailedEngineException[[icgc22-13][9] Index failed for [donor-centric#DO222880]]; nested: OutOfMemoryError[Java heap space]; ]
2016-09-13 16:02:59,532 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 16:02:59,535 [main] WARN o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 5 seconds...
2016-09-13 16:03:04,536 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
2016-09-13 16:03:04,540 [main] WARN o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Cluster is 'YELLOW'. Sleeping for 7 seconds...
2016-09-13 16:03:11,540 [main] INFO o.i.d.r.j.i.i.ClusterStateVerifier - [1088516203] Checking for cluster state before loading.
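Between bulk attempts the importer waits while the cluster reports YELLOW, sleeping 5, 7, 9, 12 and 16 seconds. This looks like a multiplicative backoff. The sketch below is hypothetical: the class name and the factor of about 1.3 (with rounding) are assumptions fitted to these logged numbers, not taken from the importer's ClusterStateVerifier source.

```java
import java.util.ArrayList;
import java.util.List;

public class YellowBackoffSketch {

    // ASSUMPTION: factor fitted to the logged sleeps (5, 7, 9, 12, 16 s);
    // the real ClusterStateVerifier may compute its delays differently.
    static final double FACTOR = 1.3;

    // Returns the first n sleep intervals in seconds, each about 1.3x the previous, rounded.
    static List<Long> delays(long initialSeconds, int n) {
        List<Long> out = new ArrayList<>();
        long d = initialSeconds;
        for (int i = 0; i < n; i++) {
            out.add(d);
            d = Math.round(d * FACTOR);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(delays(5, 5));
    }
}
```

Under this assumption, each failed bulk request costs roughly the sum of these waits on top of the several minutes the request itself takes before failing.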
Looking at the Elasticsearch cluster health, the status is green:
lxv-icgc-elastic01:~$ curl 'http://localhost:9200/_cluster/health?pretty=1'
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 10,
"number_of_data_nodes" : 9,
"active_primary_shards" : 33,
"active_shards" : 58,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
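The same health check can be done from Java, which is roughly what a loader has to do before each bulk request. This is a minimal sketch using only the JDK; the class name is hypothetical, and the status is pulled out with a regular expression instead of a JSON library to keep it self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterHealthCheck {

    // Matches the top-level "status" field, e.g. "status" : "green".
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"(\\w+)\"");

    // Returns green, yellow or red from a _cluster/health body, or null if absent.
    static String parseStatus(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Performs an HTTP GET and returns the response body as a string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://localhost:9200/_cluster/health";
        System.out.println("status = " + parseStatus(fetch(url)));
    }
}
```

Running it repeatedly during an import would show whether the cluster drops to YELLOW only around the failed bulk requests, since the curl output above was taken at a single point in time.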
Here is how big the index is:
lxv-icgc-elastic01:~$ curl 'http://localhost:9200/_cat/indices?pretty=1'
green open icgc22-13 15 1 113498040 0 4.8gb 2.4gb
green open .marvel-2016.09.12 1 1 20705 0 77.5mb 38.7mb
green open icgc-repository-20160830 15 0 943761 0 578.4mb 578.4mb
green open .marvel-2016.09.13 1 1 61315 0 315.5mb 158.7mb
green open terms-lookup 1 8 0 0 963b 115b
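The _cat/indices lines above have no header row (adding the `v` parameter prints one). Assuming the default column order for this Elasticsearch version (health, status, index, pri, rep, docs.count, docs.deleted, store.size, pri.store.size), a small parser makes the numbers explicit: icgc22-13 has 15 primary shards with 1 replica each, about 113.5 million documents, 4.8 GB on disk in total and 2.4 GB for the primaries alone. The class name is hypothetical.

```java
public class CatIndicesRow {

    // Assumed column order of the default _cat/indices output:
    // health status index pri rep docs.count docs.deleted store.size pri.store.size
    final String health, status, index, storeSize, priStoreSize;
    final int primaries, replicas;
    final long docsCount, docsDeleted;

    CatIndicesRow(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 9) {
            throw new IllegalArgumentException("expected 9 columns, got " + f.length);
        }
        health = f[0];
        status = f[1];
        index = f[2];
        primaries = Integer.parseInt(f[3]);
        replicas = Integer.parseInt(f[4]);
        docsCount = Long.parseLong(f[5]);
        docsDeleted = Long.parseLong(f[6]);
        storeSize = f[7];
        priStoreSize = f[8];
    }

    // Total shard copies for this index: every primary plus its replicas.
    int totalShards() {
        return primaries * (1 + replicas);
    }

    public static void main(String[] args) {
        CatIndicesRow row = new CatIndicesRow("green open icgc22-13 15 1 113498040 0 4.8gb 2.4gb");
        System.out.println(row.index + ": " + row.docsCount + " docs, "
                + row.totalShards() + " shard copies, " + row.storeSize + " total / "
                + row.priStoreSize + " primary");
    }
}
```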
The index is only 4.8 GB. When I imported a single project, the import completed in a couple of hours and the index was 30 GB to 50 GB.
The import command I used is shown below. It was run on the varnish node (which has a lot of free memory). I increased the Java heap to 50 GB on the command line, but that doesn't help:
java -Xmx50g -jar dcc-download-import.jar -i release.tar -es es://lxv-icgc-elastic01:9300
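The -Xmx flag sizes only the heap of the JVM it is passed to, here the one running dcc-download-import.jar on the varnish node. The OutOfMemoryError in the log is nested inside an IndexFailedEngineException for shard [icgc22-13][9], which suggests (an inference from the message text, not confirmed) that the exhausted heap may belong to an Elasticsearch data node rather than the importer. To confirm what heap a client JVM actually received, Runtime.maxMemory() can be printed; the class name below is hypothetical.

```java
public class HeapCheck {

    // Maximum heap this JVM may use, in bytes; reflects -Xmx when it is set,
    // though some garbage collectors report slightly less than the flag value.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long mb = maxHeapBytes() / (1024 * 1024);
        // Started as "java -Xmx50g HeapCheck", this should print a value close to 51200 MB.
        System.out.println("max heap = " + mb + " MB");
    }
}
```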
My questions are:
- Is my CLI option correct?
- Is the OutOfMemoryError critical, or is it normal?
- How did ICGC DCC import the full index data into the production system? How long did it take?
Brady