Hey there,
We want to run OHSU data through the release pipeline. However, this dataset does not necessarily have matched tumor-normal samples, so it lacks the control genotype information that the MASK stage of the pipeline expects as input.
I’ve edited the MASK stage so that it essentially treats all mutations without control data as masked. The main edits are listed below, along with a link to the relevant pull request. With these edits, the release pipeline runs and builds an Elasticsearch index. My question is this: will these edits affect germline security?
GenerateMaskedRow.java
public Iterable<ObjectNode> call(ObjectNode row) throws Exception {
  // Create masked counterpart if sensitive (see
  // https://wiki.oicr.on.ca/display/DCCSOFT/Data+Normalizer+Component?focusedCommentId=53182773#comment-53182773)
  val rows = Lists.<ObjectNode> newArrayList();
  // rows.add(row);
  if (getMarkingState(row) == CONTROLLED) {
    val referenceGenomeAllele = row.get(SUBMISSION_OBSERVATION_REFERENCE_GENOME_ALLELE).textValue();
    log.debug("Creating mask for '{}'", row);
    val mask = mask(row.deepCopy(), referenceGenomeAllele);
    regenerateId(mask);
    log.debug("Resulting mask for '{}': '{}'", row, mask);
    rows.add(mask);
  }
  return rows;
}
MarkSensitiveRow.java
public ObjectNode call(ObjectNode row) throws Exception {
  val referenceGenomeAllele = row.get(SUBMISSION_OBSERVATION_REFERENCE_GENOME_ALLELE).textValue();
  val controlGenotype = row.get(SUBMISSION_OBSERVATION_CONTROL_GENOTYPE).textValue();
  val tumourGenotype = row.get(SUBMISSION_OBSERVATION_TUMOUR_GENOTYPE).textValue();
  val mutatedToAllele = row.get(SUBMISSION_OBSERVATION_MUTATED_TO_ALLELE).textValue();
  val mutatedFromAllele = row.get(SUBMISSION_OBSERVATION_MUTATED_FROM_ALLELE).textValue();
  // Mark if applicable
  final Marking masking;
  if (controlGenotype == null || tumourGenotype == null || mutatedFromAllele == null) {
    log.debug("Marking row without control data: '{}'", row);
    masking = CONTROLLED;
  } else if (!matchesAllControlAlleles(referenceGenomeAllele, controlGenotype)
      || !matchesAllTumourAllelesButTo(referenceGenomeAllele, tumourGenotype, mutatedToAllele)) {
    log.debug("Marking sensitive row: '{}'", row); // Should be rare enough
// ...
  for (val tumourAllele : getTumourAllelesMinusToAllele(tumourGenotype, mutatedToAllele)) {
    if (!referenceGenomeAllele.equals(tumourAllele)) {
      return false;
    }
  }
  return true;
}
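To make the question concrete, here is a minimal standalone sketch of the marking decision as I understand it after the edit. It is not the pipeline code. The class name MarkingSketch, the mark method, the '/'-separated genotype format, the way the mutated-to allele is removed from the tumour genotype, and the OPEN fallback are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the marking logic; not the real MarkSensitiveRow.
final class MarkingSketch {

  enum Marking { OPEN, CONTROLLED }

  // Assumption: a genotype is a '/'-separated list of alleles, e.g. "A/G".
  static List<String> alleles(String genotype) {
    return new ArrayList<>(Arrays.asList(genotype.split("/")));
  }

  // True when every control allele equals the reference allele.
  static boolean matchesAllControlAlleles(String reference, String controlGenotype) {
    for (String allele : alleles(controlGenotype)) {
      if (!reference.equals(allele)) {
        return false;
      }
    }
    return true;
  }

  // True when every tumour allele, other than the mutated-to allele, equals the reference.
  // Assumption: only the first occurrence of the mutated-to allele is removed.
  static boolean matchesAllTumourAllelesButTo(String reference, String tumourGenotype, String mutatedTo) {
    List<String> remaining = alleles(tumourGenotype);
    remaining.remove(mutatedTo);
    for (String allele : remaining) {
      if (!reference.equals(allele)) {
        return false;
      }
    }
    return true;
  }

  static Marking mark(String reference, String controlGenotype, String tumourGenotype,
      String mutatedFrom, String mutatedTo) {
    // The edit in question: rows without control data are always CONTROLLED.
    if (controlGenotype == null || tumourGenotype == null || mutatedFrom == null) {
      return Marking.CONTROLLED;
    }
    // Rows whose genotypes reveal a non-reference allele are sensitive.
    if (!matchesAllControlAlleles(reference, controlGenotype)
        || !matchesAllTumourAllelesButTo(reference, tumourGenotype, mutatedTo)) {
      return Marking.CONTROLLED;
    }
    // Assumption: everything else is treated as open access.
    return Marking.OPEN;
  }

  public static void main(String[] args) {
    System.out.println("no control data: " + mark("A", null, null, null, "G"));
    System.out.println("matched, reference-only control: " + mark("A", "A/A", "A/G", "A", "G"));
    System.out.println("matched, variant in control: " + mark("A", "A/T", "A/G", "A", "G"));
  }
}
```

In this sketch a row with missing control data can never come out as OPEN, which is the property I am relying on for the OHSU data.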