Data submission configuration

mayfielg · February 17, 2017, 8:42pm

I want to run some sequencing data through the release pipeline that has been analyzed with both the MuTect and VarScan2 variant-calling algorithms. That is, I have two sets of MAF files (converted to SSM) for the same donor/sample set, one from each algorithm. I would like to include the files from both analyses as input to dcc-release, but I’m concerned that a mutation picked up by both algorithms would then be counted as two different instances of the same mutation in a single donor.

How have you handled this sort of scenario before? Does the release pipeline account for that sort of input data, or does it expect every instance of a mutation to only appear in the input data once?

andricDu · February 17, 2017, 9:37pm

Here you can see the business key of a mutation:

https://github.com/icgc-dcc/dcc-id/blob/develop/dcc-id-server/src/main/java/org/icgc/dcc/id/server/controller/MutationController.java#L51

/**
 * Dependencies
 */
@NonNull
private final MutationRepository repository;
@NonNull
private final ExportService exportService;

@IdCreatable
@Cacheable(value = "mutationIds", key = "{ #chromosome, #chromosomeStart, #chromosomeEnd, #mutation, #mutationType, #assemblyVersion }")
@RequestMapping(value = "/id", method = GET)
public String mutationId(
    // Required
    @RequestParam("chromosome") String chromosome,
    @RequestParam("chromosomeStart") String chromosomeStart,
    @RequestParam("chromosomeEnd") String chromosomeEnd,
    @RequestParam("mutation") String mutation,
    @RequestParam("mutationType") String mutationType,
    @RequestParam("assemblyVersion") String assemblyVersion,
    // Optional

If two variants have the same business key, they will be assigned the same mutation ID. However, the end result will show them as two distinct occurrences of that mutation, with different values for fields such as calling algorithm and experimental protocol.
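
To make that concrete, here is a minimal sketch of the idea, not the actual dcc-id or dcc-release code. It assumes the business key is just the six required parameters from the excerpt above; the optional parameters cut off in the excerpt are not modeled. `BusinessKeyDemo`, `Call`, `businessKey`, `groupByBusinessKey`, the `|` separator and the sample coordinates are all hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BusinessKeyDemo {

  /** Hypothetical variant call; the first six fields mirror the required request parameters. */
  static final class Call {
    final String chromosome;
    final String chromosomeStart;
    final String chromosomeEnd;
    final String mutation;
    final String mutationType;
    final String assemblyVersion;
    final String caller; // calling algorithm; NOT part of the business key

    Call(String chromosome, String chromosomeStart, String chromosomeEnd,
        String mutation, String mutationType, String assemblyVersion, String caller) {
      this.chromosome = chromosome;
      this.chromosomeStart = chromosomeStart;
      this.chromosomeEnd = chromosomeEnd;
      this.mutation = mutation;
      this.mutationType = mutationType;
      this.assemblyVersion = assemblyVersion;
      this.caller = caller;
    }

    /** Joins the six key fields; the separator is an arbitrary choice for this sketch. */
    String businessKey() {
      return String.join("|", chromosome, chromosomeStart, chromosomeEnd,
          mutation, mutationType, assemblyVersion);
    }
  }

  /** Groups calls by business key: one map entry per mutation, one list element per occurrence. */
  static Map<String, List<Call>> groupByBusinessKey(List<Call> calls) {
    Map<String, List<Call>> grouped = new LinkedHashMap<>();
    for (Call call : calls) {
      grouped.computeIfAbsent(call.businessKey(), k -> new ArrayList<>()).add(call);
    }
    return grouped;
  }

  public static void main(String[] args) {
    // Illustrative values only: the same variant reported by two callers, plus one unrelated variant.
    List<Call> calls = Arrays.asList(
        new Call("7", "140453136", "140453136", "A>T", "single base substitution", "GRCh37", "MuTect"),
        new Call("7", "140453136", "140453136", "A>T", "single base substitution", "GRCh37", "VarScan2"),
        new Call("12", "25398284", "25398284", "C>T", "single base substitution", "GRCh37", "MuTect"));

    groupByBusinessKey(calls).forEach((key, occurrences) ->
        System.out.println(key + " -> " + occurrences.size() + " occurrence(s)"));
  }
}
```

In this sketch each map entry stands for one mutation ID, and the length of its list is the number of occurrences that would be shown for it. The caller name never enters the key, which is why the two calls on the same position collapse to one mutation but remain two occurrences.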

Here you can see what the end result for a single occurrence of a mutation would look like in the data portal:

mayfielg · February 17, 2017, 9:46pm

Okay. Thanks! I’ll talk to the lab about what they’d prefer for their data.