Why are there 1 bp CNVs in the CNSM data?


#1

Hey there,

I downloaded all the available CNSM data from ICGC (11,662 donors), and I was wondering why there are 1 bp CNVs, as well as a lot of CNVs smaller than 100 bp. What is the difference between these and simple mutations or indels?
For example:
icgc_donor_id project_code icgc_specimen_id icgc_sample_id matched_icgc_sample_id submitted_sample_id submitted_matched_sample_id mutation_type copy_number segment_mean segment_median chromosome chromosome_start chromosome_end assembly_version chromosome_start_range chromosome_end_range start_probe_id end_probe_id sequencing_strategy quality_score probability is_annotated verification_status verification_platform gene_affected transcript_affected gene_build_version platform experimental_protocol base_calling_algorithm alignment_algorithm variation_calling_algorithm other_analysis_algorithm seq_coverage raw_data_repository raw_data_accession
DO11034 GBM-US SP23901 SA140828 TCGA-12-1602-01A-01D-0591-01 undetermined NA -2.0069 NA 6 78470133 78470133 GRCh37 NA NA non-NGS NA NA not annotated not tested Affymetrix Genome-Wide Human SNP Array 6.0 Genome_Wide_SNP_6 https://www.affymetrix.com/ NA TCGA TCGA-12-1602-01A-01D-0591-01
DO23028 LIHC-US SP49551 SA269377 TCGA-CC-A1HT-01A-11D-A12Y-01 undetermined NA -1.6235 NA 10 1443180 1443180 GRCh37 NA NA non-NGS NA NA not annotated not tested Affymetrix Genome-Wide Human SNP Array 6.0 Genome_Wide_SNP_6 https://www.affymetrix.com/ NA TCGA TCGA-CC-A1HT-01A-11D-A12Y-01
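In both example rows `chromosome_start` equals `chromosome_end`, which is what makes them 1 bp segments. A minimal Python sketch of reading the length off those two columns follows; it assumes the coordinates are 1-based and inclusive (so start == end means one base), and the helper name `segment_length` is hypothetical. Only the relevant fields of the two rows above are reproduced.

```python
from typing import Mapping


def segment_length(row: Mapping[str, str]) -> int:
    """Length in bp of a CNSM segment, assuming 1-based inclusive coordinates."""
    start = int(row["chromosome_start"])
    end = int(row["chromosome_end"])
    if end < start:
        raise ValueError(f"end {end} precedes start {start}")
    return end - start + 1


# Relevant fields copied from the two example rows in the question.
examples = [
    {"icgc_donor_id": "DO11034", "chromosome": "6",
     "chromosome_start": "78470133", "chromosome_end": "78470133"},
    {"icgc_donor_id": "DO23028", "chromosome": "10",
     "chromosome_start": "1443180", "chromosome_end": "1443180"},
]

for row in examples:
    print(row["icgc_donor_id"], segment_length(row))
```

Both rows report a length of 1 under this assumption.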


#2

Dear Lou,

Thanks for reporting this. I took a look at the copy number data for DO11034, and there are indeed some very short segments.

The main reason is that the copy number results submitted to the ICGC DCC were generated by different ICGC member projects; there was no uniform analysis across projects. The criteria for calling CNVs were set by each member project.
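Since each member project applied its own calling criteria, one way to see the effect is to count short segments per `project_code`. This is a minimal Python sketch, assuming the CNSM file is tab-separated with the header shown in the question and 1-based inclusive coordinates. The function name `short_segments_by_project` is hypothetical, and the 100 bp cutoff is simply the figure from the question. In the sample input, the first two rows come from the question and the third is made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def short_segments_by_project(handle: TextIO, max_len: int = 100) -> Counter:
    """Count segments shorter than max_len bp for each project_code.

    Assumes a tab-separated CNSM file with a header row and 1-based
    inclusive chromosome_start/chromosome_end columns.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        length = int(row["chromosome_end"]) - int(row["chromosome_start"]) + 1
        if length < max_len:
            counts[row["project_code"]] += 1
    return counts


# Small sample containing only the columns the function reads.
sample = io.StringIO(
    "project_code\tchromosome_start\tchromosome_end\n"
    "GBM-US\t78470133\t78470133\n"
    "LIHC-US\t1443180\t1443180\n"
    "LIHC-US\t1000\t500000\n"  # hypothetical long segment, not from the data
)
print(short_segments_by_project(sample))
```

For a real download that is gzip-compressed, a text handle such as `gzip.open(path, "rt")` can be passed instead of the `StringIO` sample.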

With this in mind, the data should be interpreted with caution. It is closer to lower-level data, and further analysis would be necessary.
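As one example of such further analysis, very short segments could be set aside before downstream use. The Python sketch below keeps only segments of at least `min_len` bp. The 1,000 bp default is an arbitrary assumption for illustration, not an ICGC recommendation; a suitable threshold would depend on each project's platform and calling pipeline. The function name `filter_segments` and the donor `DO00000` in the sample are hypothetical, and 1-based inclusive coordinates are assumed.

```python
import csv
import io
from typing import Dict, List, TextIO


def filter_segments(handle: TextIO, min_len: int = 1000) -> List[Dict[str, str]]:
    """Return CNSM rows whose segment spans at least min_len bp.

    Assumes a tab-separated file with a header row and 1-based inclusive
    chromosome_start/chromosome_end coordinates. The default threshold is
    illustrative only.
    """
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        length = int(row["chromosome_end"]) - int(row["chromosome_start"]) + 1
        if length >= min_len:
            kept.append(row)
    return kept


sample = io.StringIO(
    "icgc_donor_id\tchromosome\tchromosome_start\tchromosome_end\tsegment_mean\n"
    "DO11034\t6\t78470133\t78470133\t-2.0069\n"  # 1 bp row from the question
    "DO00000\t1\t1000\t500000\t0.1\n"            # hypothetical long segment
)
for row in filter_segments(sample):
    print(row["icgc_donor_id"], row["chromosome"], row["segment_mean"])
```

Here the 1 bp segment is dropped and only the hypothetical long segment is kept.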

Hope this helps, please let us know if you have any further questions.

Best regards,
Junjun


#3

Dear Lou,

I just want to add that PCAWG, an ICGC collaborative study that aims to analyse whole genome sequencing data uniformly, has produced copy number data through the standard PCAWG variant calling workflows. The data can be found here: https://icgc.org/ZBZ.

Let us know if you need any further assistance accessing ICGC data.

Best regards,
Junjun