I remember Vitalii helped me import project data. If I remember correctly, he used a command option to import a single project instead of multiple projects. Sorry for forgetting the command again; could Vitalii or someone else please resend it?
Currently my portal shows only one project, which is from China. It is hard to believe that this is accidental; it must be related to the choice we made earlier. Now I want to import all the projects. How do I do that? Do I need to reimport everything?
This information can be found on the tool’s home page.
To import a single project, the -p <project_code> option should be used, e.g.:
java -jar dcc-download-import-4.2.12.jar -i /tmp/release21.tar -es es://localhost:9300 -p ALL-US
If the option is omitted, all projects will be imported.
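For comparison, here is a small sketch of the two variants side by side. It reuses the jar name, input path and Elasticsearch URL from the example above, so adjust them to your setup. The helper name build_import_cmd is made up for illustration and is not part of the tool; it only prints the command so you can check it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the import tool): prints the import command.
# With no argument it omits -p, so all projects would be imported.
# With a project code it appends -p <project_code> for a single-project import.
build_import_cmd() {
    cmd="java -jar dcc-download-import-4.2.12.jar -i /tmp/release21.tar -es es://localhost:9300"
    if [ -n "${1:-}" ]; then
        cmd="$cmd -p $1"
    fi
    echo "$cmd"
}

build_import_cmd          # all projects
build_import_cmd ALL-US   # single project
```

Copy the printed line into your terminal once it looks right.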
Vitalii, thanks for answering my question again.
To import all projects, should I delete the one project that is already imported? Or can I just run this command to add the rest?
Should I remove hdfs:/icgc/input/release_21 and reimport? Can these two data sets be imported independently?
You should delete the existing Elasticsearch index as the tool does not import if an index with the same name already exists.
You should not delete the /icgc/input/release_21 directory on the HDFS cluster, as that is the Download server’s data. It’s not affected by the import.
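As for deleting the existing index, one way is Elasticsearch's REST API. The sketch below is only an illustration under assumptions: it assumes the HTTP API listens on localhost:9200 (the es://localhost:9300 address above is a different port), and the index name is a placeholder, since the real name depends on your earlier import. The script only prints the commands; run them by hand once you have confirmed the index name, because the delete cannot be undone.

```shell
#!/bin/sh
# ASSUMPTION: Elasticsearch HTTP API on localhost:9200; adjust if yours differs.
ES_HTTP="http://localhost:9200"
# ASSUMPTION: placeholder index name; replace with the name shown by the list command.
INDEX_NAME="my-release-index"

# Command that lists existing indices, to find the one created by the earlier import.
LIST_CMD="curl -s '$ES_HTTP/_cat/indices?v'"
# Command that deletes that index, so the import tool can create it again.
DELETE_CMD="curl -s -X DELETE '$ES_HTTP/$INDEX_NAME'"

# Print both commands for review instead of executing them.
echo "$LIST_CMD"
echo "$DELETE_CMD"
```

After the index is gone, rerun the import without -p to load all projects.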