Hi All,
On my portal, the “data repository” doesn’t show any data repositories. I realized I haven’t imported repository.tar.gz. My question is how I import the downloaded repository.tar.gz.
Thanks,
Brady
Hi Brady,
repository.tar.gz has to be imported with the Knapsack plugin; the import tool does not support it yet.
First, you need to install the plugin and restart the Elasticsearch node where the plugin is installed.
/usr/share/elasticsearch/bin/plugin -url http://bit.ly/29A1hsz -install knapsack
sudo service elasticsearch restart
Then use the following command to start the archive import:
curl -XPOST "http://elasticsearchnode:9200/_import?path=/repository.tar.gz"
where elasticsearchnode is the address of the node where the plugin is installed, and the path query parameter is the path to repository.tar.gz.
Check the logs of the Elasticsearch node to see when the import finishes; it usually takes around 2 minutes.
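Rather than watching the log by hand, the wait could be scripted. This is only a rough sketch: it assumes the plugin writes a KnapsackImportAction "end of import" line to the node's log when it stops, and the function names, the default timeout and the polling interval are made up for illustration. Note that this line only says the import ended, not that it succeeded, so the log should still be checked for ERROR entries.

```shell
# Hypothetical helper: succeeds once the Knapsack "end of import" line
# appears in the given Elasticsearch log file.
import_finished() {
  local log_file="$1"
  grep -q 'KnapsackImportAction.*end of import' "$log_file"
}

# Poll the log until the import ends or the timeout (in seconds) is reached.
# The defaults (600s timeout, 5s interval) are arbitrary assumptions.
wait_for_import() {
  local log_file="$1" timeout="${2:-600}" interval="${3:-5}" waited=0
  until import_finished "$log_file"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "import did not finish within ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "import finished"
}
```

It would be called with the path to the node's log file, for example wait_for_import /path/to/elasticsearch.log 300, where the path is whatever your installation uses.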
Hi Vitalii,
Thanks for the quick reply.
I have repository.tar.gz in the current working directory. I ran the command, but it failed:
curl -XPOST "http://lxv-icgc-elastic01:9200/_import?path=repository.tar.gz"
{"error":"InvalidIndexNameException[[_import] Invalid index name [_import], must not start with '_']","status":400}
I removed the “_” from “_import” and reran the command:
curl -XPOST "http://lxv-icgc-elastic01:9200/import?path=/repository.tar.gz"
This time it didn’t fail. In the Elasticsearch log, I see this message:
[2016-09-02 07:53:57,279][INFO ][cluster.metadata ] [Krystalin] [import] creating index, cause [api], shards [5]/[1], mappings []
There is no completion message. The following command shows the index:
curl lxv-icgc-elastic02:9200/_cat/indices
green open .marvel-2016.09.02 1 1 73111 0 289.6mb 144.1mb
green open .marvel-2016.08.31 1 1 102301 0 435.7mb 217.8mb
green open .marvel-2016.09.01 1 1 121452 0 488.4mb 244.2mb
green open .marvel-2016.08.29 1 1 10277 0 46mb 23mb
green open icgc22-13 15 1 55977215 0 28gb 14gb
green open .marvel-2016.08.30 1 1 86640 0 373.7mb 186.8mb
green open icgc21-0-0 1 1 0 0 230b 115b
green open import 5 1 0 0 970b 575b
green open terms-lookup 1 4 0 0 575b 115b
It looks like the “import” index is still empty after more than 10 minutes. I guess “import” is the wrong name for the index. What name should I use?
Answering my own question: “_import” is a command, so changing it to “import” was wrong.
The issue was probably caused by a non-functional Knapsack plugin. I used the following commands to remove it and add it back:
sudo /usr/share/elasticsearch/bin/plugin -remove knapsack
sudo /usr/share/elasticsearch/bin/plugin -url http://bit.ly/29A1hsz -install knapsack
sudo service elasticsearch restart
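After a reinstall like this, it may be worth confirming that the plugin is actually loaded before retrying, since the earlier InvalidIndexNameException is consistent with Elasticsearch not recognising “_import” as a plugin endpoint. A minimal sketch, assuming the cluster exposes the _cat/plugins endpoint; the function names are hypothetical:

```shell
# Reads `_cat/plugins` output on stdin and prints the lines that mention
# knapsack; returns non-zero when there are none.
knapsack_lines() {
  grep -i 'knapsack'
}

# Hypothetical wrapper: es_host is whichever node you normally query.
# The output lists plugins per node, so check that the node you will send
# the _import request to is among the printed lines.
check_knapsack() {
  local es_host="$1"
  if curl -s "http://${es_host}:9200/_cat/plugins" | knapsack_lines; then
    return 0
  fi
  echo "no node reports a knapsack plugin" >&2
  return 1
}
```

For example: check_knapsack lxv-icgc-elastic01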
Then I used this command to import:
curl -XPOST "http://lxv-icgc-elastic01:9200/_import?path=/tmp/repository.tar.gz"
The import started, but it appears to have hit an error:
[2016-09-02 17:25:50,872][INFO ][KnapsackImportAction ] resetting refresh rate for index icgc-repository-20160830
[2016-09-02 17:25:50,872][ERROR][KnapsackImportAction ] null
java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:542)
at java.lang.Integer.parseInt(Integer.java:615)
at org.xbib.elasticsearch.action.knapsack.imp.TransportKnapsackImportAction.performImport(TransportKnapsackImportAction.java:245)
at org.xbib.elasticsearch.action.knapsack.imp.TransportKnapsackImportAction$1.run(TransportKnapsackImportAction.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-09-02 17:26:01,598][INFO ][BulkNodeClient ] closing bulk processor...
[2016-09-02 17:26:01,599][INFO ][BulkNodeClient ] shutting down...
[2016-09-02 17:26:01,599][INFO ][BulkNodeClient ] shutting down completed
[2016-09-02 17:26:01,600][INFO ][KnapsackImportAction ] end of import: {"mode":"import","started":"2016-09-03T00:24:55.602Z","path":"file:///tmp/repository.tar.gz","node_name":"Typeface"}, count = 415529
[2016-09-02 17:26:01,622][INFO ][KnapsackService ] remove: plugin.knapsack.import.state -> [{"mode":"import","started":"2016-09-03T00:24:55.602Z","path":"file:///tmp/repository.tar.gz","node_name":"Typeface"}]
[2016-09-02 17:26:01,623][INFO ][KnapsackService ] update cluster settings: plugin.knapsack.import.state -> []
When I visit the “data repository” page in a browser, I get error messages like this on a lot of Elasticsearch nodes:
[2016-09-03 06:53:15,959][DEBUG][action.search.type ] [Tag] [icgc-repository][3], node[bzl_ovkhQuq3-67T0k87Ug], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@4e8e6974]
org.elasticsearch.transport.RemoteTransportException: [Jolt][inet[/10.103.131.26:9300]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.aggregations.AggregationExecutionException: [nested] nested path [file_copies] is not nested
at org.elasticsearch.search.aggregations.bucket.nested.NestedAggregator.<init>(NestedAggregator.java:71)
at org.elasticsearch.search.aggregations.bucket.nested.NestedAggregator$Factory.create(NestedAggregator.java:185)
at org.elasticsearch.search.aggregations.AggregatorFactories.createAndRegisterContextAware(AggregatorFactories.java:53)
at org.elasticsearch.search.aggregations.AggregatorFactories.createSubAggregators(AggregatorFactories.java:71)
at org.elasticsearch.search.aggregations.Aggregator.<init>(Aggregator.java:191)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:39)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:135)
at org.elasticsearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:37)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:73)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$2.create(TermsAggregatorFactory.java:60)
It appears the index was not imported correctly. Below are the indexes in Elasticsearch:
curl lxv-icgc-elastic02:9200/_cat/indices
green open .marvel-2016.09.02 1 1 125802 0 467.9mb 233.9mb
green open .marvel-2016.08.31 1 1 102301 0 435.7mb 217.8mb
green open .marvel-2016.09.01 1 1 121452 0 488.4mb 244.2mb
green open .marvel-2016.08.29 1 1 10277 0 46mb 23mb
green open icgc22-13 15 1 55977215 0 28gb 14gb
green open .marvel-2016.08.30 1 1 86640 0 373.7mb 186.8mb
green open .marvel-2016.09.03 1 1 91314 0 323.1mb 161.6mb
green open icgc21-0-0 1 1 0 0 230b 115b
green open icgc-repository 5 1 415524 0 984.8mb 492.4mb
green open terms-lookup 1 4 0 0 467b 115b
Please note that I renamed icgc-repository-2016-* to icgc-repository, which seems to be what the code looks for.
I am wondering whether this is related to the error message at the end of the import.
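One way to narrow down the “nested path [file_copies] is not nested” error might be to look at the mapping of the index the portal queries and see whether file_copies is declared as nested there. The sketch below is only an approximation: it greps the _mapping response instead of parsing the JSON, it assumes "type" is printed as the first key of the field definition, and the function names are hypothetical.

```shell
# Reads a `_mapping` response on stdin and succeeds if the named field is
# declared with "type":"nested".
# ASSUMPTION: "type" is the first key in the field's definition.
field_is_nested() {
  local field="$1"
  # Strip whitespace so pretty-printed and compact JSON look the same.
  tr -d '[:space:]' | grep -qF "\"${field}\":{\"type\":\"nested\""
}

# Hypothetical wrapper that runs the check against a live cluster.
check_nested() {
  local es_host="$1" index="$2" field="$3"
  if curl -s "http://${es_host}:9200/${index}/_mapping" | field_is_nested "$field"; then
    echo "${field} is mapped as nested in ${index}"
  else
    echo "${field} is NOT mapped as nested in ${index}" >&2
    return 1
  fi
}
```

For example: check_nested lxv-icgc-elastic02 icgc-repository file_copies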
Hi Brady,
You should import the repository index using the following steps:
Create an icgc-repository alias for the imported index. You can do this with the following command:
curl -XPOST 'http://localhost:9200/_aliases' -d'
{
"actions": [
{
"add": {
"index": "icgc-repository-20160908",
"alias": "icgc-repository"
}
}
]
}'
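Since the dated index name changes from release to release, the same request could be wrapped in a small function. This is a sketch rather than anything official: alias_body and add_alias are hypothetical names, and port 9200 is assumed as in the command above.

```shell
# Build the JSON body for an "add alias" action.
alias_body() {
  local index="$1" alias_name="$2"
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}\n' "$index" "$alias_name"
}

# Hypothetical wrapper: POST the body to the _aliases endpoint of es_host.
add_alias() {
  local es_host="$1" index="$2" alias_name="$3"
  curl -s -XPOST "http://${es_host}:9200/_aliases" \
    -d "$(alias_body "$index" "$alias_name")"
}
```

For example: add_alias localhost icgc-repository-20160908 icgc-repository. Afterwards, curl localhost:9200/_cat/aliases should list the new alias.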
P.S. We are going to add functionality to import this file with the import tool.
Hi Vitalii,
Thanks for the detailed steps. I ended up setting “repoIndexName” in the elasticsearch application.yml before seeing your reply. Using an alias is better than hard-coding the name in the elasticsearch application.yml. I will use it next time.
Brady
We recently added support for importing the repository.tar.gz Elasticsearch index archive with the dcc-download-import tool.
wget https://artifacts.oicr.on.ca/artifactory/dcc-release/org/icgc/dcc/dcc-download-import/[RELEASE]/dcc-download-import-[RELEASE].jar -O dcc-download-import.jar
java -jar dcc-download-import.jar -i repository.tar.gz -es es://localhost:9300