MASK stage germline edits

Hey there,

We want to run OHSU data through the release pipeline, but this dataset does not necessarily have matched tumor-normal samples. It therefore lacks the control genotype information that the MASK stage of the pipeline expects as input.

I’ve edited the MASK stage so that it essentially treats all mutations without control data as masked. The main edits are listed below, along with a link to the relevant pull request. With these edits, the release pipeline runs and builds an Elasticsearch index. My question is this: will these edits affect germline security?

GenerateMaskedRow.java

    public Iterable<ObjectNode> call(ObjectNode row) throws Exception {
      // Create masked counterpart if sensitive (see
      // https://wiki.oicr.on.ca/display/DCCSOFT/Data+Normalizer+Component?focusedCommentId=53182773#comment-53182773)

      val rows = Lists.<ObjectNode> newArrayList();
      // rows.add(row);

      if (getMarkingState(row) == CONTROLLED) {
        val referenceGenomeAllele = row.get(SUBMISSION_OBSERVATION_REFERENCE_GENOME_ALLELE).textValue();

        log.debug("Creating mask for '{}'", row);
        val mask = mask(row.deepCopy(), referenceGenomeAllele);
        regenerateId(mask);

        log.debug("Resulting mask for '{}': '{}'", row, mask);
        rows.add(mask);
      }

      return rows;
    }
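To make the effect of commenting out `rows.add(row)` concrete, here is a minimal standalone sketch of the same control flow. It is not the pipeline code: rows are plain maps instead of `ObjectNode`, the `"marking"` field name is hypothetical, and the `mask` function is a stand-in supplied by the caller. What it shows is that, as posted, a CONTROLLED row yields only its masked copy, and any row that is not CONTROLLED yields an empty list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class MaskedRowSketch {

  // Hypothetical stand-in for GenerateMaskedRow.call: rows are flat maps
  // with a "marking" field (an assumption, not the real row schema).
  static List<Map<String, String>> call(Map<String, String> row,
                                        UnaryOperator<Map<String, String>> mask) {
    List<Map<String, String>> rows = new ArrayList<>();

    // The original row is NOT added, mirroring the commented-out rows.add(row).
    if ("CONTROLLED".equals(row.get("marking"))) {
      // Copy before masking, in the spirit of row.deepCopy() in the excerpt.
      rows.add(mask.apply(new HashMap<>(row)));
    }
    return rows;
  }

  public static void main(String[] args) {
    // Illustrative mask only: the real mask(...) also rewrites genotype fields.
    UnaryOperator<Map<String, String>> mask = copy -> {
      copy.put("marking", "MASKED");
      return copy;
    };

    Map<String, String> controlled = new HashMap<>();
    controlled.put("marking", "CONTROLLED");
    Map<String, String> open = new HashMap<>();
    open.put("marking", "OPEN");

    System.out.println(call(controlled, mask)); // [{marking=MASKED}]
    System.out.println(call(open, mask));       // []
  }
}
```

If rows that carry control data and are not marked CONTROLLED still need to reach the index, this is the place to check that they are emitted somewhere else in the stage.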

MarkSensitiveRow.java

    public ObjectNode call(ObjectNode row) throws Exception {
      val referenceGenomeAllele = row.get(SUBMISSION_OBSERVATION_REFERENCE_GENOME_ALLELE).textValue();
      val controlGenotype = row.get(SUBMISSION_OBSERVATION_CONTROL_GENOTYPE).textValue();
      val tumourGenotype = row.get(SUBMISSION_OBSERVATION_TUMOUR_GENOTYPE).textValue();
      val mutatedToAllele = row.get(SUBMISSION_OBSERVATION_MUTATED_TO_ALLELE).textValue();
      val mutatedFromAllele = row.get(SUBMISSION_OBSERVATION_MUTATED_FROM_ALLELE).textValue();

      // Mark if applicable
      final Marking masking;
      if (controlGenotype == null || tumourGenotype == null || mutatedFromAllele == null) {

        log.debug("Marking row without control data: '{}'", row);
        masking = CONTROLLED;

      } else if (!matchesAllControlAlleles(referenceGenomeAllele, controlGenotype)
          || !matchesAllTumourAllelesButTo(referenceGenomeAllele, tumourGenotype, mutatedToAllele)) {

        log.debug("Marking sensitive row: '{}'", row); // Should be rare enough

      for (val tumourAllele : getTumourAllelesMinusToAllele(tumourGenotype, mutatedToAllele)) {
        if (!referenceGenomeAllele.equals(tumourAllele)) {
          return false;
        }
      }
      return true;
    }
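Since the excerpt above is cut off between the two hunks, here is a rough standalone sketch of the marking rule as I read it, so the intended behaviour is easier to review. Everything in it beyond the posted conditions is an assumption: genotypes are taken to be alleles joined by `/` (e.g. `A/G`), the reference allele is assumed non-null, every occurrence of the mutated-to allele is dropped from the tumour genotype before comparison, the fall-through case is assumed to be `OPEN`, and `MarkingSketch`, `alleles` and `mark` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class MarkingSketch {

  enum Marking { OPEN, CONTROLLED }

  // Assumption: a genotype is a '/'-separated list of alleles, e.g. "A/G".
  static List<String> alleles(String genotype) {
    return Arrays.asList(genotype.split("/"));
  }

  static Marking mark(String reference, String control, String tumour,
                      String mutatedTo, String mutatedFrom) {
    // New branch from the edit: no control data means the row is CONTROLLED.
    if (control == null || tumour == null || mutatedFrom == null) {
      return Marking.CONTROLLED;
    }

    // Every control allele must equal the reference allele.
    boolean controlMatches = alleles(control).stream().allMatch(reference::equals);

    // Every tumour allele except the mutated-to allele must equal the reference.
    boolean tumourMatches = alleles(tumour).stream()
        .filter(allele -> !allele.equals(mutatedTo))
        .allMatch(reference::equals);

    // Assumption: rows passing both checks are OPEN (that branch is not in the excerpt).
    return (controlMatches && tumourMatches) ? Marking.OPEN : Marking.CONTROLLED;
  }

  public static void main(String[] args) {
    System.out.println(mark("A", null, "A/G", "G", "A"));  // CONTROLLED (no control genotype)
    System.out.println(mark("A", "A/A", "A/G", "G", "A")); // OPEN
    System.out.println(mark("A", "A/T", "A/G", "G", "A")); // CONTROLLED (control differs from reference)
  }
}
```

The point for the germline question is the first branch: under this reading, a row with any of the three fields missing can never come out as `OPEN`.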

https://github.com/ohsu-comp-bio/dcc-release/pull/4

Hi @mayfielg,

Would you prefer to keep the comments here or on the PR itself? Just want to make sure it would be cool if I was commenting on your PR.

Sure, right on the PR is great. Thanks.

Hey @mayfielg, just wanted to let you know that I have not forgotten about the question and that I’m still waiting on a second opinion.