Nested type is not nested after data repository import

I used the following command to import repository.tar.gz:

curl -XPOST "http://lxv-icgc-elastic01:9200/_import?path=/home/bzuo/repository.tar.gz"

After the import, I don't see the data repository on the portal page. In the Elasticsearch log, I see the following error messages:

2016-09-06 11:39:26,382 [http-nio-8080-exec-1] ERROR o.i.d.p.s.j.m.ElasticSearchExceptionMapper - Error handling a request: 4d7071f5f4f36e13
org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query], all shards failed; shardFailures {[bzl_ovkhQuq3-67T0k87Ug][icgc-repository][0]: RemoteTransportException[[Jolt][inet[/10.103.131.26:9300]][indices:data/read/search[phase/query]]]; nested: AggregationExecutionException[[nested] nested path [file_copies] is not nested]; }{[bzl_ovkhQuq3-67T0k87Ug][icgc-repository][1]: RemoteTransportException[[Jolt][inet[/10.103.131.26:9300]][indices:data/read/search[phase/query]]]; nested: AggregationExecutionException[[nested] nested path [file_copies] is not nested]; }{[iP-9eLrhTDCIlqxFisEhBQ][icgc-repository][2]: RemoteTransportException[[Needle][inet[/10.103.131.23:9300]][indices:data/read/search[phase/query]]]; nested: AggregationExecutionException[[nested] nested path [file_copies] is not nested]; }{[PqHR5fM8TEe_4xfCTREecw][icgc-repository][3]: RemoteTransportException[[Madcap][inet[/10.103.131.25:9300]][indices:data/read/search[phase/query]]]; nested: AggregationExecutionException[[nested] nested path [file_copies] is not nested]; }{[CywSUMjCR9y4Rta41ucpDw][icgc-repository][4]: AggregationExecutionException[[nested] nested path [file_copies] is not nested]}
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:233)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onFailure(TransportSearchTypeAction.java:179)
        at org.elasticsearch.search.action.SearchServiceTransportAction$6.handleException(SearchServiceTransportAction.java:249)
        at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:185)

It appears file_copies is not a nested type, but the portal tries to use it as one. I used the following command to get the mapping information:

curl lxv-icgc-elastic02:9200/icgc-repository/_mapping | jq '.'

    "file-centric": {
        "properties": {
          "file_copies": {
            "properties": {
              "repo_type": {
                "type": "string"
              },
              "repo_code": {
                "type": "string"
              },
              "repo_base_url": {
                "type": "string"
              },
              "last_modified": {
                "type": "long"
              },

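It appears file_copies is mapped as a plain object instead of "nested": it has "properties" but no "type": "nested", and only its sub-fields carry types such as "string" and "long".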
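To make the check on the mapping excerpt above less error-prone than reading the JSON by eye, here is a minimal Python sketch. It assumes you have already parsed the type-level mapping (the object under "file-centric") into a dict; the function name classify_field and the trimmed sample dict are illustrative only and not part of the portal or of any Elasticsearch client.

```python
def classify_field(type_mapping, field):
    """Return the declared type of a top-level field, 'object' if the field
    only has sub-properties, or None if the field is absent."""
    props = type_mapping.get("properties", {})
    if field not in props:
        return None
    spec = props[field]
    declared = spec.get("type")
    if declared is not None:
        return declared
    # No explicit "type" but sub-properties present: Elasticsearch treats
    # such a field as a plain object, which nested aggregations reject.
    return "object" if "properties" in spec else None


# Trimmed copy of the live mapping shown above (only two sub-fields kept).
live = {
    "properties": {
        "file_copies": {
            "properties": {
                "repo_type": {"type": "string"},
                "last_modified": {"type": "long"},
            }
        }
    }
}

print(classify_field(live, "file_copies"))  # object
```

A result of "object" rather than "nested" matches the "nested path [file_copies] is not nested" error in the log.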

If we look at the mapping in the repository, it is of "nested" type:
bzuo@lxv-icgc-download01:/mnt/icgc-downloads/icgc-repository-20160830/file-centric$ cat _mapping | jq '.'
{
  "file-centric": {
   ...
    "properties": {
      "donors": {
        "type": "nested"
      },
      "file_copies": {
        "type": "nested"
      }
    },
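The comparison between the two mappings above can also be done mechanically. The following is a rough Python sketch, assuming both mappings have been loaded into dicts at the type level (the object under "file-centric"). The helper names nested_paths and lost_nested are hypothetical, and the sample dicts are trimmed illustrations that only cover file_copies, not the full mappings.

```python
def nested_paths(properties, prefix=""):
    """Collect the dotted path of every field declared with "type": "nested"."""
    found = set()
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "nested":
            found.add(path)
        # Recurse into sub-fields, if any.
        found |= nested_paths(spec.get("properties", {}), path + ".")
    return found


def lost_nested(expected_mapping, live_mapping):
    """Paths that are nested in the expected mapping but not in the live one."""
    expected = nested_paths(expected_mapping.get("properties", {}))
    actual = nested_paths(live_mapping.get("properties", {}))
    return sorted(expected - actual)


# Trimmed illustrations of the repository mapping and the live index mapping.
expected = {"properties": {"file_copies": {"type": "nested"}}}
live = {
    "properties": {
        "file_copies": {"properties": {"repo_type": {"type": "string"}}}
    }
}

print(lost_nested(expected, live))  # ['file_copies']
```

Any path printed here is one where the live index no longer matches the mapping shipped in the repository archive.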

My questions are:

  1. Did I miss some steps in importing data repository?
  2. Is there some problem with the import plugin I used?

Thanks,

Brady

Looks like this is a difficult one :slight_smile:

Out of desperation, I deleted all the garbage indexes, including test-index-1, which was added during initial testing (I am not sure what was in it). Then I reimported the index and added the following field to /srv/dcc-portal-server/application.yml on both portal servers:

elastic:
  ...
  repoIndexName: icgc-repository-20160830

I then restarted the portal servers and the Varnish server (the latter is probably unnecessary). Then I refreshed my browser. I still see the familiar error notification on the first visit, but the data repository is no longer empty!

I guess this was caused by some random things we had done (adding a fake index for testing, or something else), which makes it very difficult for developers to tell what is going on without access to the system. Hopefully this will not happen again.