API search metamodel


#1

Hi,

I’ve got a few questions regarding some API endpoints listed here: http://docs.icgc.org/portal/api-endpoints/
The most interesting one is about the search query syntax for the '/api/v1/download/submit' endpoint.
There’s an example in the docs:

$ http 'https://dcc.icgc.org/api/v1/download/submit' 'filters=={"donor":{"primarySite":{"is":["Brain"]}}}' 'info==[{"key":"ssm","value":"TSV"}]'
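
For anyone calling this from code rather than from HTTPie: in HTTPie, `name==value` adds a URL query parameter, so the example above amounts to a request URL carrying `filters` and `info` as JSON-encoded query parameters. Below is a minimal Python sketch that only builds that URL and does not send anything. The helper name `build_submit_url` is made up for illustration, and the HTTP method, headers, and response handling are not covered here.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://dcc.icgc.org/api/v1/download/submit"


def build_submit_url(filters, info, base_url=BASE_URL):
    """Return base_url with `filters` and `info` as JSON query parameters.

    Hypothetical helper: it mirrors the HTTPie example, nothing more.
    """
    query = urlencode({
        # Compact separators keep the JSON identical to the docs example.
        "filters": json.dumps(filters, separators=(",", ":")),
        "info": json.dumps(info, separators=(",", ":")),
    })
    return f"{base_url}?{query}"


# Same values as in the docs example.
filters = {"donor": {"primarySite": {"is": ["Brain"]}}}
info = [{"key": "ssm", "value": "TSV"}]

url = build_submit_url(filters, info)
print(url)

# Decode the query string again to check that nothing was lost in encoding.
params = parse_qs(urlsplit(url).query)
print(params["filters"][0])
```

The percent-encoding is done by `urlencode`, so the JSON never has to be escaped by hand.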

So the questions are:

  1. What is the full list of top-level (‘donor’ here) and second-level (‘primarySite’ here) attributes for search?
  2. Which operators are available apart from ‘is’?
  3. More generally: is there a metamodel available that describes all of the variations (keys and values, probably in hierarchical form) that can be used for searching via this endpoint?
    Also, what about the ‘info’ part of the query? I assume it has a key-value format, but where can I find the list of available keys (and the values for them)?

Thanks.


#2

Hi Dmytro,

To attempt to answer your questions in one go:

Unfortunately, we do not have a meta-model or schema in an easily consumable form such as JSON Schema.

We refer to the JSON filters that you see in the query as JQL. JQL is just a JS-friendly abstraction on top of the Portal Query Language (PQL). Every JQL filter passed to the search API is converted into a PQL query by our backend. PQL is described here: https://github.com/icgc-dcc/dcc-portal/blob/develop/dcc-portal-pql/PQL.md

The package responsible for this is: https://github.com/icgc-dcc/dcc-portal/tree/develop/dcc-portal-server/src/main/java/org/icgc/dcc/portal/server/pql/convert

This subpackage contains the “model” for JQL as described in code: https://github.com/icgc-dcc/dcc-portal/tree/develop/dcc-portal-server/src/main/java/org/icgc/dcc/portal/server/pql/convert/model

For example, you can see the possible operations here: https://github.com/icgc-dcc/dcc-portal/blob/develop/dcc-portal-server/src/main/java/org/icgc/dcc/portal/server/pql/convert/model/Operation.java#L34-L49

Since JQL is just an abstraction that sits on top of PQL, the PQL engine is responsible for defining which entity fields can be searched on. Each searchable entity has a TypeModel within PQL that describes the searchable fields and how to map from friendly field aliases to raw Elasticsearch fields. You can see those here: https://github.com/icgc-dcc/dcc-portal/tree/develop/dcc-portal-pql/src/main/java/org/dcc/portal/pql/meta
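
To make the alias idea concrete, here is a rough Python sketch of what "mapping friendly field aliases to raw fields" can look like. It is not the portal's code (that is Java, in the packages linked above). The raw field name in the table is invented purely for illustration, and `resolve_fields` is a hypothetical helper; the real aliases and raw names are whatever the TypeModel classes define.

```python
# Hypothetical alias table. The raw name on the right is INVENTED for this
# example; the real mappings live in the TypeModel classes of the PQL module.
FIELD_ALIASES = {
    "donor": {
        "primarySite": "raw_primary_site_field",
    },
}


def resolve_fields(jql_filter, aliases):
    """Flatten a JQL-style filter into (raw_field, operator, values) tuples.

    Raises KeyError for an entity or alias that is missing from the table,
    which is roughly how a non-searchable field would be rejected.
    """
    resolved = []
    for entity, fields in jql_filter.items():
        if entity not in aliases:
            raise KeyError(f"unknown entity: {entity}")
        for alias, condition in fields.items():
            if alias not in aliases[entity]:
                raise KeyError(f"unknown field for {entity}: {alias}")
            for operator, values in condition.items():
                resolved.append((aliases[entity][alias], operator, values))
    return resolved


jql = {"donor": {"primarySite": {"is": ["Brain"]}}}
print(resolve_fields(jql, FIELD_ALIASES))
```

The point is only that the set of keys accepted in a filter is determined by such a per-entity table, which is why the TypeModel classes are the place to look for the list of searchable fields.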

To summarize:
The meta-model for JQL is in the server/pql/convert package, and the model itself is described in the pql/meta package. We hope to one day expose an endpoint that would allow power users to query the portal directly with PQL, allowing for much more powerful and general queries.

Let me know if I failed to clarify anything.


#3

Hi Dusan,

Sounds good, thanks!

So is there perhaps a Java API for searching ICGC data, instead of calling the REST API? :)


#4

At the moment there is nothing published by us, but I know others have written their own libraries for interacting with our REST API.

I think the last time I searched GitHub I saw publicly available JavaScript and Python libraries, and from our logs we know people are using these languages, in addition to some others like R, to query our data directly.