Linda Sundermann (affiliated, project started November 2014)
Cancer samples are often genetically heterogeneous, harboring subclonal populations (subpopulations) with different mutations such as copy number variations (CNVs) or simple somatic mutations (SSMs; i.e., single nucleotide variants and small insertions and deletions). Information about the mutations in these subpopulations can help to identify driver mutations or to choose targeted therapies. Sequencing of bulk tumor samples is the current standard practice because single-cell assays are not yet well established due to high cost and limited resolution.
Recently, several methods have been published that attempt to infer the genotypes of subpopulations using CNVs, SSMs, or both. Here, we present Onctopus, a new approach to jointly model and reconstruct the subclonal composition of a bulk tumor sample utilizing SSMs and CNVs.
Given variant counts of SSMs and heterozygous germline SNPs, as well as information about the position and read depth of segments affected by CNVs, Onctopus assigns a frequency, CNVs, and SSMs to each of N subclonal lineages (sublineages). Each lineage is defined by the CNVs and SSMs that arose in it. SNPs, which are needed to identify copy number changes in sublineages, are assigned to the normal lineage. If SSMs or SNPs are influenced by CNVs, our model can phase them relative to the chromosome copy on which the CNV occurs, independently of known haplotype blocks.
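To make the lineage representation more concrete, the following is a minimal Python sketch and not the Onctopus implementation. It assumes that a lineage's frequency is the fraction of cells carrying the mutations that arose in that lineage, that the normal copy number is 2, and that the SSM in question sits on a single chromosome copy whose count is not changed by any CNV. The names Lineage, average_copy_number, and expected_ssm_vaf, as well as the segment identifiers, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Lineage:
    """One lineage; by convention here, index 0 is the normal lineage."""
    # Assumption: fraction of cells carrying the mutations that arose here.
    frequency: float
    # Hypothetical encoding: segment id -> copy number change arising in this lineage.
    cn_changes: dict = field(default_factory=dict)


def average_copy_number(lineages, segment, normal_cn=2):
    """Expected average copy number of a segment in the bulk sample."""
    return normal_cn + sum(
        lin.frequency * lin.cn_changes.get(segment, 0) for lin in lineages
    )


def expected_ssm_vaf(lineages, k, segment):
    """Expected variant allele frequency of an SSM that arose in lineage k.

    Assumes the SSM is present on exactly one chromosome copy and that
    no CNV changes the number of copies carrying it.
    """
    return lineages[k].frequency / average_copy_number(lineages, segment)


# Toy example with made-up numbers (not data from the project).
lineages = [
    Lineage(frequency=1.0),                           # normal lineage
    Lineage(frequency=0.8, cn_changes={"seg1": 1}),   # gain of one copy of seg1
    Lineage(frequency=0.3),                           # sublineage with SSMs only
]

cn_seg1 = average_copy_number(lineages, "seg1")   # 2 + 0.8 * 1
vaf_seg2 = expected_ssm_vaf(lineages, 1, "seg2")  # 0.8 / 2
vaf_seg1 = expected_ssm_vaf(lineages, 2, "seg1")  # 0.3 / 2.8
```

In this toy setting, the same SSM frequency yields a lower expected variant allele frequency on a gained segment than on a copy-neutral one, which illustrates why SSMs and CNVs are modeled jointly.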
We build a joint likelihood model in which the tumor is a mixture of lineages, on which we infer a partial order. We model sublineages rather than subpopulations to avoid ambiguous solutions that can occur when copy numbers are determined for subpopulations. We developed a linear relaxation of our model as a mixed integer linear program that can be solved with state-of-the-art solvers.
Supervisors: Jens Stoye (Bielefeld University)
Project co-supervised by: Gunnar Rätsch (ETH Zurich), Quaid Morris (University of Toronto)