The Exomes of the NCI

socoolso 2016-06-24


Abstract

The NCI-60 cell lines are the most frequently studied human tumor cell lines in cancer research. This panel has generated the most extensive cancer pharmacology database worldwide. In addition, these cell lines have been intensely investigated, providing a unique platform for hypothesis-driven research focused on enhancing our understanding of tumor biology. Here, we report a comprehensive analysis of coding variants in the NCI-60 panel of cell lines identified by whole exome sequencing, providing a list of possible cancer specific variants for the community. Furthermore, we identify pharmacogenomic correlations between specific variants in genes such as TP53, BRAF, ERBBs, and ATAD5 and anticancer agents such as nutlin, vemurafenib, erlotinib, and bleomycin showing one of many ways the data could be used to validate and generate novel hypotheses for further investigation. As new cancer genes are identified through large-scale sequencing studies, the data presented here for the NCI-60 will be an invaluable resource for identifying cell lines with mutations in such genes for hypothesis-driven research. To enhance the utility of the data for the greater research community, the genomic variants are freely available in different formats and from multiple sources including the CellMiner and Ingenuity websites. Cancer Res; 73(14); 4372–82. ©2013 AACR.

Introduction

The NCI-60 human tumor cell line panel (1) is used by a broad range of cancer investigators and by the NCI Developmental Therapeutics Program (DTP) to discover novel anticancer drugs (2). This panel represents an invaluable and publicly accessible platform of pharmacological, genomic, metabolomic, biochemical, and molecular datasets (3–8). This study reports findings from whole exome sequencing (WES) of the NCI-60 panel of cell lines. In addition, pharmacogenomic analyses provide examples of a few of the many ways the variant data could be used to generate novel hypotheses. Our study complements two recently published large-scale cancer cell line sequencing studies, which sequenced a limited number of genes (9, 10), because our work provides the whole exome variants for the entire NCI-60 panel. The data are made available through the CellMiner, NCI DTP, and Ingenuity Systems websites (11).

Materials and Methods

Cell lines

The list of cell lines in the NCI-60 panel and their tissue origins are given in Supplementary Fig. S8. DNA was extracted from cells and fingerprinted as described before (12).

Exome capture and sequencing

Briefly, 38 Mb of coding region for each cell line was captured using the Agilent SureSelect All Exon v1.0 Kit (Agilent). Genomic DNA (3 μg) was sheared on a Covaris S2 ultra-sonicator (Covaris) with the settings duty cycle 10%, intensity 5%, cycle/burst 200, and time 60 s, which yielded a fragment size distribution with a mean of 200 bp. Libraries were generated using the standard Illumina library protocol (Illumina) followed by size selection using ChromaSpin TE200 spin columns (Clontech). Pre- and postcapture steps were conducted following the manufacturer's protocol (Agilent). The samples were sequenced as paired-end 80-mer reads on an Illumina Genome Analyzer IIx instrument (Illumina) following the manufacturer's protocol.

Data processing and variant calls

Fastq files were aligned against the reference human genome build 19 (hg19) using the Burrows-Wheeler Aligner (13). Alignment files were base quality score recalibrated and locally realigned around indels with GATK (14) and marked for duplicates using PICARD tools (picard.sourceforge.net). Alignment files and variant calls can be accessed from the links provided (11). Consensus genotype calls were generated using samtools mpileup (15) and annotated using the Annovar package (16). Variants were further filtered for the SureSelect bait region, a minimum read depth of 6, and a minimum quality score of 30 for single nucleotide variants (SNVs) and 60 for indels, producing the final variant calls.
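The final filtering step can be illustrated with a minimal sketch. The record layout (`Variant` with the fields `kind`, `depth`, `qual`, and `in_bait`) is hypothetical and chosen only for illustration; it is not the output format of samtools or Annovar. The thresholds are the ones stated above.

```python
from dataclasses import dataclass

# Thresholds taken from the text; all names below are illustrative assumptions.
MIN_DEPTH = 6
MIN_QUAL = {"snv": 30, "indel": 60}


@dataclass(frozen=True)
class Variant:
    kind: str      # "snv" or "indel"
    depth: int     # read depth at the site
    qual: float    # variant quality score
    in_bait: bool  # True if the site lies in the SureSelect bait region


def passes_filter(v: Variant) -> bool:
    """Return True if the call meets the bait, depth, and quality cut-offs."""
    return v.in_bait and v.depth >= MIN_DEPTH and v.qual >= MIN_QUAL[v.kind]


# Toy calls (made up for illustration only).
calls = [
    Variant("snv", depth=12, qual=45.0, in_bait=True),     # kept
    Variant("snv", depth=5, qual=80.0, in_bait=True),      # depth too low
    Variant("indel", depth=20, qual=45.0, in_bait=True),   # indel needs qual >= 60
    Variant("indel", depth=20, qual=75.0, in_bait=False),  # outside bait region
]
final_calls = [v for v in calls if passes_filter(v)]
```

Keeping the SNV and indel thresholds in one lookup table makes the stricter indel cut-off explicit and easy to change.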

Drug activity determination

Drug activity was determined by the DTP human cancer cell line screen (11). The concentration of agent required to cause 50% growth inhibition (GI₅₀) as measured at 48 hours by the sulphorhodamine B assay (17) was determined.

Gene expression and other NCI-60 molecular characterization

mRNA expression, miRNA expression, copy number, and protein measurements are publicly available from DTP or from CellMiner (excluding the protein data; ref. 11). The details pertaining to data acquisition and analysis were previously published (18).

Volcano plots

The x-axis of a volcano plot depicts the difference in mean log GI₅₀ between the cell lines containing a mutation in the specified gene and the cell lines not containing such a mutation. The y-axis depicts the statistical significance level for the comparison of log GI₅₀ for those 2 groups of cell lines with larger values indicating smaller P values. On a volcano plot for a gene, the points represent the compounds. On a volcano plot for a compound, the points represent the genes. For a volcano plot representing a gene, the false discovery rate can be limited to 0.2 or less by restricting attention to the 310 clinical and investigational compounds with P values no greater than 0.0005. When examining all of the screening compounds, the false discovery rate will be greater unless attention is restricted by a more stringent significance cut-off (e.g., 10⁻⁴) and an imposed cut-off on difference in log GI₅₀ between mutated and wild-type groups (e.g., ± 0.5). In general, however, the volcano plots are used either to confirm previously identified hypotheses or to generate hypotheses that require independent validation.
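As a rough sketch of how one point on such a plot could be computed, the following uses a Monte Carlo permutation test. The text does not state which statistical test was used, so the permutation test, the sign convention (mutant minus wild-type), and the function name `volcano_point` are all assumptions made for illustration.

```python
import math
import random
from statistics import mean


def volcano_point(mutant, wild_type, n_perm=2000, seed=0):
    """Return (x, y) for one compound-gene pair on a volcano plot.

    x: mean log GI50 of mutant lines minus mean log GI50 of wild-type lines.
    y: -log10 of a two-sided permutation P value (test choice is an assumption).
    """
    observed = mean(mutant) - mean(wild_type)
    pooled = list(mutant) + list(wild_type)
    k = len(mutant)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated P value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, -math.log10(p_value)


# Toy log GI50 values (made up for illustration only).
mutant_lines = [-5.1, -5.3, -4.9, -5.0, -5.2]
wild_type_lines = [-6.4, -6.8, -6.1, -6.6, -6.3, -6.7]
x, y = volcano_point(mutant_lines, wild_type_lines)
```

With this convention, a positive x means the mutant lines need a higher concentration for 50% growth inhibition, that is, they are less sensitive to the compound.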

Super Learner prediction models

Using GI₅₀ data on the NCI-60 for 103 U.S. Food and Drug Administration (FDA)-approved and 207 investigational oncology drugs and the 711 genes with at least 5 cell lines containing a type II variant in the gene, we estimated a predictor for each drug using the Super Learner algorithm (19). The predictor uses the gene-level mutation profile to predict the log GI₅₀ for each drug. The Super Learner is an ensemble-based prediction methodology that combines different machine-learning predictors into a single optimal predictor based on minimizing the cross-validated risk. The base algorithms for the Super Learner include elastic net regression, gradient-boosting regression, bagging, CART, random forests, neural networks, and support vector machines. In total, 35 prediction algorithms were combined for the Super Learner ensemble. We do not expect a single prediction algorithm (e.g., elastic net regression) to be optimal across all 310 drugs, and the Super Learner allows the final predictor to data-adaptively up-weight the best algorithms for the final predictor. Examining the weights for each algorithm across the 310 drugs (data not shown) shows great variability, indicating we should see a benefit with the Super Learner ensemble approach. Within a drug, the Super Learner predicts the log GI₅₀ based on the gene-level mutation profiles. To compare across the drugs with different potencies, the log GI₅₀ values need to be normalized. We define the normalized log GI₅₀ for a cell line as the log GI₅₀ minus the mean log GI₅₀ for that drug in all the other cell lines. For ROC analysis, we classified a cell line as sensitive to a drug if its true-normalized log GI₅₀ was less than −0.5, and insensitive if the value was greater than 0.5.
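The normalization and the sensitivity cut-offs defined above can be written out as a short sketch. The function names and the "unclassified" label for values between the two cut-offs are our own illustrative choices; the text does not say how such intermediate values were handled.

```python
from statistics import mean


def normalized_log_gi50(log_gi50):
    """Map cell line -> log GI50 minus the mean over all other cell lines."""
    out = {}
    for line, value in log_gi50.items():
        others = [v for name, v in log_gi50.items() if name != line]
        out[line] = value - mean(others)
    return out


def classify(norm_value, cutoff=0.5):
    """Label a cell line using the -0.5 / +0.5 cut-offs from the text."""
    if norm_value < -cutoff:
        return "sensitive"
    if norm_value > cutoff:
        return "insensitive"
    return "unclassified"  # between the cut-offs; label name is an assumption


# Toy log GI50 values for one drug (made up for illustration only).
drug = {"A": -7.0, "B": -5.0, "C": -5.2, "D": -5.8}
norm = normalized_log_gi50(drug)
labels = {line: classify(v) for line, v in norm.items()}
```

Excluding the cell line itself from the mean keeps a single very sensitive line from pulling its own reference value toward itself.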

Results

The variant calls were generated as described in Materials and Methods, where we filtered variants with a minimum quality of 30 (60 for small insertions/deletions) and a minimum depth of 6 with at least 3 alternate alleles over the targeted 38 Mb coding region. Because matched normals are not available for cell lines, we conducted a more stringent filtering to identify potential cancer-specific variants. Using this filtering, the variants were divided into 2 groups: type I variants corresponding to common (and possibly germline) variants and type II variants enriched for acquired cancer-specific variants (Supplementary Figs. S1 and S2). We obtained more than 1.2 million type I and 60,005 type II variants in the NCI-60 cell lines.

Although a limitation of cell line sequencing is the lack of available normal-matched tissue for comparison, the NCI-60 panel does allow comparisons between cell lines from 9 distinct tissues of origin. NCI-60 cell lines with known microsatellite instability (MSI; Supplementary Fig. S3) have very high type II variant counts (Fig. 1A). However, HCC2998, a colon cancer cell line not known to have MSI, has the highest number of type II variants. In contrast to the known MSI cell lines, more than 98% of HCC2998 type II variants are SNVs (Supplementary Fig. S4), suggesting that this hypermutator phenotype arises from a mechanism other than MSI. Of interest, HCC2998 carries a POLE exonuclease domain missense variant coding for a P286R mutation in POLε (Supplementary Fig. S5). Previous reports indicate that impaired POLε proofreading results in a high rate of single nucleotide substitutions and increased tumor formation (20), and POLE mutations in colorectal cancer have recently been reported (21). HCC2998 seems to exemplify this phenomenon, providing a reagent for further investigation and illustrating the utility of the NCI-60 WES data.

Figure 1.

Results of WES variant calling. A, variant counts for each cell line from each tumor type are plotted for types I and II fraction as green squares and red diamonds, respectively. Within each tumor type, the variant counts are sorted from lowest to highest, and a box plot is superimposed to show subgroup mean and spread. Microsatellite unstable cell lines are marked with a red asterisk. B, base ti/tv ratio is plotted for each tumor type in the NCI-60 panel for type II variants that are likely to be tumor specific. The y-axis represents the fraction of base conversions from a C:G or a T:A base pair to any other possible base pair change, which cumulatively equals 1. See also Supplementary Fig. S1 for additional details.

Given the diversity in the NCI-60 panel based on the tissue of origin, the WES data reveal important information about the etiology of each subgroup. As is evident from Fig. 1B, there is a wide range of transition-to-transversion ratios (ti/tv) among the NCI-60 panel. Melanoma cell lines have the highest ti/tv (3.93) with higher C:G to T:A transitions, which is the major mode of change for UV-induced DNA damage (22). In contrast, lung cancer cell lines have a ti/tv (0.67) indicative of tobacco smoke-induced DNA damage (23). Thus, the WES data support the prior notion that the NCI-60 panel retains disease etiology signatures (7).
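For readers less familiar with the ti/tv statistic, the sketch below shows how it is computed from a list of single nucleotide changes. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes (A↔G, C↔T); every other substitution is a transversion. The input format, a list of (reference, alternate) base pairs, and the function name are assumptions for illustration.

```python
# Transitions: A<->G (purines) and C<->T (pyrimidines).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for (ref, alt) base pairs.

    Assumes every pair is a valid SNV with ref != alt.
    """
    ti = sum(1 for pair in snvs if pair in TRANSITIONS)
    tv = len(snvs) - ti
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv


# Toy SNV list (made up): four transitions and two transversions.
snvs = [("C", "T"), ("G", "A"), ("C", "T"), ("A", "G"), ("C", "A"), ("G", "T")]
ratio = ti_tv_ratio(snvs)
```

On this scale, the melanoma value of 3.93 corresponds to roughly four transitions per transversion, and the lung value of 0.67 to about two transitions per three transversions.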

Figure 2A shows a map of the 10 most frequently mutated genes in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (24). We annotated the WES variant calls as those present in the COSMIC database (v59) and those that are absent in COSMIC but predicted to be deleterious by the Sorting Tolerant From Intolerant (SIFT; ref. 25) or PolyPhen (26) algorithms. TP53 is the most frequently mutated gene overall, whereas BRAF is the most frequently mutated gene among melanoma cell lines (Fig. 2A). Although most of the variants identified in these 10 genes are already annotated in COSMIC, novel variants in these 10 genes were also observed. Although the lack of normal tissue makes it almost impossible to validate these as somatic changes, these variants were not observed in either the 1,000 Genomes Project (27) or the 5,600 normal whole exomes available through the NHLBI Exome Sequencing Project (28). Besides the many well-defined cancer genes such as those in Fig. 2A, large-scale tumor sequencing efforts by others continue to lead to the discovery of novel cancer genes, such as the 16 genes listed in Fig. 2B. Because the NCI-60 cell lines are so well characterized and readily available, they are ideal tools for hypothesis-driven research of these novel cancer genes/mutations identified by large-scale sequencing efforts. Details for these particular mutations or for any other gene mutation can be downloaded from public sources, including CellMiner (Fig. 3) or the Ingenuity website (Supplementary Fig. S6).

Figure 2.

Mutation spectrum for the top 10 most frequently mutated genes and novel cancer-related genes in the NCI-60. A, the top 10 COSMIC census cancer genes (sorted by the number of occurrences in the NCI-60 panel) were scored for the presence of mutations in each cell line. Gray marks variants annotated in the COSMIC v59 database. Blue marks variants that are not in the COSMIC database but identified in this study and predicted to be deleterious (either SIFT score < 0.05 or PolyPhen2 score > 0.85). Magenta marks cases where a cell line harbors at least one COSMIC annotated and at least one novel variant in a particular gene (a gray and a blue mark). B, new cancer genes identified in recent large-scale sequencing studies such as: SETD2 (38), LRP1B (39, 40), PBRM1 (41), SPTA1 (42), DNMT3A (43), ARID1A (44), GRIN2A (45), TRRAP (45), STAG2 (46), EPHA3/5/7 (39), POLE (21), and SYNE1 (47). Blue boxes represent likely loss-of-function mutations (e.g., nonsense, splice site, initiation loss, and frame shift insertions or deletions), whereas magenta indicates missense mutations. Cases with co-occurrence of both types are labeled in gray.

Figure 3.

Snapshot from the CellMiner website. A, to access tabular data, first click on the “Query Genomic Data Sets” tab. Specify data you want by: (i) identifying the query type in step 1 (HUGO name is required); (ii) choosing whether you wish to type in your identifier, or upload your identifier(s) as a file in step 2; (iii) identifying the dataset being queried in step 3 (in this case exome sequencing); (iv) entering your e-mail address in step 6 and clicking “Get data.” B, the tabular data sent to you will include a full set of the data for all 60 cell lines (only 1 cell line is included for reasons of space). Within the output, (i) the probe ID denotes the chromosome number, start location, and the nucleotide change; (ii) AA is amino acid; (iii) dbSNP id; (iv) allele frequency in 1,000 genomes; (v) allele frequency in ESP5400; (vi) SIFT score; (vii) NCBI accession number; (viii) Polyphen2 score. C, to access graphical data, first click on the “NCI-60 Analysis Tools” tab. Choose the graphical output tool by (i) clicking “Graphical output for DNA:Exome sequencing” in step 1; (ii) choosing whether you wish to type in your identifier, or upload your identifier(s) as a file in step 2; (iii) identifying the gene being queried, also in step 2; (iv) entering your e-mail address in step 3 and clicking “Get data.” D, the graphical data will be sent as an html file, with accompanying pngs. The summary of all variants in BRAF is shown (individual cell lines are also included). The number of variants at each location is depicted by the vertical green, red, or brown lines.

To show the utility of this unique dataset and illustrate one of many ways to apply these data in hypothesis-driven research, we carried out an integrated pharmacogenomic investigation. The fact that the NCI-60 panel has been used to screen thousands of compounds provides a rich resource for testing the relationship of variants in genes to drug response. Among 43,225 compounds screened for activity against the NCI-60 cell lines (as of September 2012), 15,898 showed high dynamic range in their GI₅₀ estimates across all cell lines. For each gene with at least 5 cell lines containing a type II variant, we evaluated the association of log GI₅₀ to variants in genes for all of the screened compounds. TP53, the most frequently mutated gene in the NCI-60 panel, shows strong correlation with drug response. MDM2 inhibitors are effective agents in cell lines with wild-type p53 (Fig. 4A), where they can induce cell death. Of the 15,898 compounds and 310 FDA-approved or investigational oncology drugs, the activities of 2 clinically relevant MDM2 inhibitors show strong negative correlation with mutant p53 (Fig. 4B). Nutlin-3 gives the highest statistical significance score for its activity in p53 wild-type cell lines (Fig. 4B and C). MI-219, a known MDM2 inhibitor, exhibits a similar strong negative correlation with mutant p53 (Fig. 4B). In contrast, National Service Center (NSC)-670177 (Supplementary Table S1) shows significant selectivity for the p53 mutant cells. However, the proposed p53-specific compound reactivation of p53 and induction of tumor cell apoptosis (RITA; NSC-652287; ref. 29), initially identified as a DNA cross-linking agent (30), showed little evidence of selective activity for cell lines with p53 wild-type status and only limited correlation with nutlin-3 (Supplementary Fig. S7A), questioning the claim that RITA acts specifically as a p53-reactivating compound. For comparison, RITA displays far less selectivity for p53 wild-type cells than the classical DNA-targeted agent mithramycin. As expected, expression of the well-known components of the p53 pathway, MDM2 and miR-34a (31), correlates with p53 wild-type status (Fig. 4E and F). Additional pharmacogenomic correlations between TP53 mutational status, miRNAs, mRNA transcripts, or other agents are listed in Supplementary Fig. S7B. Integrating additional genomic datasets, such as gene and miRNA expression data (18), strengthens the value of all these comprehensive datasets for the NCI-60 panel.

Figure 4.

Correlation of TP53 wild-type cells with nutlin-3 and other p53 pathway modulators. A, schematic representation of the p53-MDM2 feedback loop with p53 acting as a positive transcription factor for MDM2 and miRNA-34a whereas nutlin-3 acts as an MDM2 antagonist (48), blocking MDM2-mediated p53 degradation and killing of wild-type p53 cell lines. B, the volcano plots show the difference in mean log GI₅₀ between the cell lines containing a type II variant in TP53 versus those cell lines not containing a variant along the x-axis and the −log₁₀ P value on the y-axis. Each red point represents one of the 15,898 compounds tested from the NCI screening data plus 310 approved and investigational drugs (green points). A magenta guideline is given at significant P-value 10⁻⁴. The NSC numbers or names for the statistically significant and for comparison some nonsignificant compounds are annotated on the plot. TP53-reactivating compounds from the literature are in red. C, antiproliferative activity of nutlin-3 across the NCI-60 cell lines, where the bar graph is color coded by tissues of origins. D, the TP53 wild-type cells are marked with horizontal bars, red tick marks, and red lettering. E, MDM2 expression is highest in the TP53 wild-type cells and those targeted by nutlin-3 (note mirror image profiles). F, the expression profile of miRNA 34a, an established p53 target. Abbreviations: BR, breast; CNS, central nervous system; CO, colorectal; LE, leukemia; ME, melanoma; LC, lung cancer; OV, ovarian; PR, prostate; and RE, renal. See also Supplementary Fig. S6 for additional correlations.

We further supplemented this work with cross-validated multivariate analyses. For each of the 310 FDA-approved or investigational oncology drugs, we developed a Super Learner ensemble machine-learning model predicting log GI₅₀ based on variants in genes. We included genes with type II variants in 5 or more cell lines across the NCI-60 panel. Leave-one-out cross-validation was used to evaluate the ability of such modeling to distinguish sensitive from insensitive cell lines for individual drugs and to select active drugs for individual cell lines. We developed these 310 models for each loop of a cross-validation in which one cell line was omitted and the remaining cell lines were used as a training set. Those models were then used to predict the log GI₅₀ values for all drugs for the omitted cell line thereby predicting the most active drugs (smallest normalized log GI₅₀; see Materials and Methods) against this cell line (Supplementary Table S2). Using these models, we generated cross-validated receiver operating characteristic (ROC) curves for each cell line (Supplementary Fig. S8). The ROC curve plots sensitivity versus one minus specificity for identifying active drugs. The area under the curve (AUC) between the ROC curve and the diagonal line is a measure of the predictive accuracy of the WES-based models. A large AUC value for a cell line indicates that the mutation spectrum of the cell line is informative for discriminating active from inactive drugs. The set of drugs analyzed, however, contains many cytotoxics, for which the predictive model based only on mutation spectrum was poorly informative. Our models included only mutation status and did not attempt to distinguish the confounding between mutation status and cell line lineage. Further studies with comprehensive models that include copy number, transcript abundance, and methylation status should yield more accurate predictions.
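To make the ROC summary concrete, the sketch below computes the area under the ROC curve for one held-out cell line from predicted normalized log GI₅₀ values and the true active or inactive labels. It uses the standard pairwise (Mann-Whitney) form of the AUC; the function name and toy values are illustrative assumptions, and the Super Learner models themselves are not reproduced here. The area between the ROC curve and the diagonal, as described above, would be this value minus 0.5.

```python
def roc_auc(scores, is_active):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted normalized log GI50 per drug (lower = more active).
    is_active: True for drugs truly active against the held-out cell line.
    """
    active = [s for s, a in zip(scores, is_active) if a]
    inactive = [s for s, a in zip(scores, is_active) if not a]
    if not active or not inactive:
        raise ValueError("need at least one active and one inactive drug")
    wins = 0.0
    for a in active:
        for i in inactive:
            if a < i:
                wins += 1.0   # active drug correctly ranked as more potent
            elif a == i:
                wins += 0.5   # ties count as half
    return wins / (len(active) * len(inactive))


# Toy predictions for six drugs on one cell line (made up for illustration).
predicted = [-1.2, -0.9, 0.1, 0.3, 0.8, 1.1]
truth = [True, True, False, True, False, False]
auc = roc_auc(predicted, truth)
excess_over_diagonal = auc - 0.5
```

An AUC of 0.5 corresponds to random ranking of drugs, so values well above 0.5 indicate that the mutation profile carries information about which drugs are active.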

The ROC curves provide valuable insight into cancer biology. For instance, among the NCI-60 melanoma cell lines, SK-MEL-2 has the lowest AUC value (Fig. 5A). This is particularly interesting because SK-MEL-2 is the only non-BRAF-V600E mutant melanoma cell line with an activating NRAS-Q61R mutation. As shown with the volcano plot in Fig. 5B, the 3 BRAF-V600E–specific inhibitors PLX-4720, vemurafenib (Fig. 5C) and SB-590885 stand out with extremely high significance and differential mean GI₅₀ in the BRAF-mutant cell lines. All the MEK inhibitors (blue font) including selumetinib (Fig. 5D) and hypothemycin (Fig. 5E) show highly significant selectivity and differential GI₅₀, indicating their therapeutic value in cancer cells with activated mitogen-activated protein kinase (MAPK) pathway. Notably, one compound, NSC-678518 showed extreme selectivity for the BRAF-mutated cells. NSC-678518, the anthrax lethal factor, was identified in a screen for agents with similar inhibitory profiles to another MAPK kinase inhibitor, PD098059, and shown to proteolytically inactivate such kinases (32).

Figure 5.

Correlation between MAPK pathway mutations and drug response to compounds that target this pathway in the NCI-60 panel. A, ROC for cross-validated drug predictors for melanoma cells. Cross-validated ROC curves are shown for each cell line. The inset reports the AUC for each cell line and the number of inactive drugs (n1) and active drugs (n2). B, same volcano plot as in Fig. 4B, for BRAF variants. A magenta guideline is given at significant P-value 10⁻⁴. The NSC numbers or names for the statistically significant and for comparison some nonsignificant compounds are annotated on the plot. Drug response for the BRAF V600E inhibitor vemurafenib (C), the MEK inhibitor selumetinib (D), and the MEK/ERK inhibitor hypothemycin (E). Cell lines with mutations are labeled in red for the gene(s) indicated to the right. F, heat map showing correlations between mutations in key signaling intermediates (PTEN, PIK3R1, PIK3CA, ERBB2, BRAF, and NRAS) versus drugs that target these pathways; MAPK pathway inhibitors (blue), PI3K pathway inhibitors (green), EGFR/ERBB inhibitors (magenta). Values for each drug represent the mean GI₅₀ for each cell line with the particular gene mutations, including previously published deletions and small mutations (49). The number of cell lines with the particular mutation is given in parentheses.

Parallel studies support the value of correlating genomics and targeted agents (2, 9, 10). Figure 5C to E exemplifies that mutations in protein kinase target genes are strong indicators of response to clinically relevant targeted drugs. In addition, such observations could be generalized to key signaling pathways. Ten distinct kinase inhibitors from 3 major target classes cluster separately depending on the mutations in 6 genes: BRAF, NRAS, PIK3CA, PIK3R1, PTEN, and ERBB2 (Fig. 5F). These effects can be viewed in the context of the MAPK and phosphoinositide 3 kinase (PI3K) pathways downstream of receptor tyrosine kinases (RTK).

One of the most clinically relevant RTKs is the epidermal growth factor receptor (EGFR). However, as shown by Garnett and colleagues (10), it is critical to integrate genomic mutation data with transcript levels to correlate and possibly predict drug responses. The NCI-60 provides a solid background for studying gene expression (see MDM2 example in Fig. 4E; ref. 18), and its large drug database offers unique opportunities to query drug response parameters. To test this possibility, we examined the EGFR inhibitor erlotinib, whose activity is highly correlated with gefitinib and lapatinib in the NCI-60 (see Fig. 6 in ref. 18). Overall, high expression of EGFR (ERBB1) and ERBB2 are determinants of cellular response to erlotinib (Fig. 6B). However, the colon and central nervous system (CNS) cell lines are generally insensitive to erlotinib in spite of high EGFR and ERBB2 expression. This can be rationalized by taking into account mutations in the MAPK or PI3K pathways, a common mechanism of resistance (33), which are present in all 7 colon and 4 of 6 CNS cell lines (Fig. 6B).

Figure 6.

Correlation between erlotinib response and EGFR pathway gene expression and RAS–RAF–PTEN mutations in the NCI-60 panel. A, schematic representation of the EGFR pathway with its 4 components: ERBB1 (EGFR), ERBB2, ERBB3, and ERBB4. Dimerization complexes are indicated as nodes on the double-ended arrows according to Kohn's MIM nomenclature convention (50). Activations are shown as green arrows. Activating mutations of RAS or RAF directly activate MEK and render cells resistant to erlotinib (33). Similarly, inactivation of PTEN confers resistance by direct activation of PI3 kinase. B, (left), antiproliferative activity of erlotinib across the NCI-60. The cell lines are color coded by tissues of origins; (center left) the RAS-RAF-PTEN wild-type (WT) cells are marked as full horizontal bars. Mutant cells (Mut) are shown as short bars; ERBB1 expression is highest in many of the cells targeted by erlotinib (center right; note mirror image profiles); ERBB2 expression profile (far right). The cell lines identified by arrows have focal amplification for ERBB1 (RE:SN12C) and ERBB2 (OV:SKOV3; unpublished data).

Additional examples of correlations between type II variants and the 16,208 compounds, including the 310 FDA-approved or investigational oncology drugs, are included in Supplementary Figs. S9 and S10. Supplementary Fig. S9 contains volcano plots for type II variants in 44 other genes of interest with the corresponding list of significant NSC numbers in Supplementary Table S3. Supplementary Fig. S10 shows volcano plots for 28 selected drugs that are in clinical use or clinical trials. Together, these data again show the potential value of the NCI-60 drug and genomic databases for systems pharmacology.

The power of WES, as opposed to focused sequencing of preselected genes as published (9, 10), was revealed when we coincidentally found a significant correlation between a germline in-frame deletion (delCAATGT) in ATAD5 (rs72427574) in certain cell lines and their increased sensitivity to the DNA-damaging agent bleomycin. In addition, zorbamycin (NSC-146208) and peplomycin (NSC-276382), which are both bleomycin analogues, show strong activity toward these cell lines. ATAD5, the human homolog of yeast ELG1, is essential for maintaining genome stability through its functions in deubiquitinating proliferating cell nuclear antigen and is known to be mutated in endometrial cancer (34–36). Genotype calls revealed 10 cell lines, of which 5 are heterozygous and 5 are homozygous for delCAATGT. Of the 10 cell lines, 3 are renal (ACHN, CAKI-1, RXF-393); earlier work suggests that dimethane sulfonate analogues, such as DMS612, are effective agents against renal cancer (37), and they are being investigated in phase I trials in renal cancer patients (#09-C-0111). Interestingly, there are additional germline variants in ATAD5 that are also present exclusively in the same set of 10 cell lines. When we looked for possible haplotypes in the HapMap database, we discovered a region of linkage disequilibrium spanning more than 300 kb (Fig. 7B). Therefore, this particular haplotype could be a response modifier during chemotherapy with DNA-damaging agents. These results illustrate the discovery potential of exonic variant data when integrated with previously available NCI-60 databases.

Figure 7.

ATAD5 locus as a response modifier for DNA-damaging agents. A, same volcano plot as in Fig. 4B for ATAD5 delCAATGG (rs72427574). A magenta guideline is given at significant P-value 10⁻⁴. The names for the statistically significant compounds are annotated on the plot. B, linkage disequilibrium plot characterizing haplotype blocks in the ATAD5 locus. The black bar marks the ATAD5 gene location. The haplotype blocks were created using HaploView program (51), version 4.2.

Discussion

In this study, we provide WES analysis of the widely used NCI-60 cell line panel. We show that the overall pattern of mutation is strikingly divergent between cell lines, ranging from 172 to 9,205 type II variants. As expected, higher variant rates are observed in MSI cell lines; but remarkably, the highest number of SNVs was observed in HCC2998, a colon cancer cell line in which we discovered a defect in the proofreading domain of POLε. The signature of specific carcinogens is readily discernible in lung cancer and melanoma, which show very low (0.67) and high (3.93) ti/tv ratios, respectively. Variants in established cancer genes are abundantly represented in the NCI-60, and numerous examples of variants in recently implicated cancer genes are also present.

In addition to the mutational data provided in this article, substantial drug sensitivity data for tens of thousands of compounds and multiple other types of biological data are available for the NCI-60. Using straightforward approaches (see Fig. 3) together with more sophisticated analyses, we were able to show the influence of specific variants for TP53, BRAF, KRAS, NRAS, PIK3CA, PTEN, and ERBBs on the response to clinically relevant targeted agents (nutlin, vemurafenib, selumetinib, hypothemycin, rapamycin, wortmannin, perifosine, erlotinib, afatinib, lapatinib, and neratinib) and to identify aspects of those results that may merit further study. For example, even though targeted inhibitors of activated BRAF-V600E have been widely studied, the comprehensive NCI-60 datasets offer a unique opportunity to identify additional mechanisms of resistance and possibly offer novel means to overcome acquired resistance. The power of the NCI-60 WES variants is apparent from the observation that common variants in the human population may have a profound effect on drug response. Of course, our observation regarding the ATAD5 gene locus requires further studies; however, it opens up a completely new perspective on common variants and their phenotypes in the context of DNA-damaging agents and the ongoing clinical trials with DMS612 (37).

In comparison to the 2 recent studies conducted with more cell lines (947 in ref. 9 and 639 in ref. 10), our study integrates far more drugs (approximately 20,000 vs. 24 in ref. 9 and 130 in ref. 10; see volcano plots in Figs. 4, 5, 7, and Supplementary Figures) and provides a comprehensive dataset of all exonic variants for the NCI-60 cell lines, whereas 1,600 genes were sequenced in ref. 9 and 64 cancer-related genes in ref. 10. Given the availability of extensive biological and pharmacological data and the vast number of NCI-60 variants identified in this study, such comprehensive analyses as performed by these 2 studies offer enormous opportunities. The WES data that we are providing for the NCI-60 also enables the vast compound activity database to be used as a resource for drug development to complement genomic studies conducted using larger cell line panels. That is, when one discovers a genomic variant as a molecular target using other cell line resources, using the WES data for the NCI-60 one can potentially identify screened compounds with selective activity for that target. We have limited our work to the exploration of certain aspects of this invaluable data, and made this dataset public for the greater community to use and analyze. This is critical for expanding our knowledge in understanding tumorigenesis and the genomic bases of drug sensitivity in years to come as many more cancer-related gene aberrations are discovered.

Importantly, the availability of these sequencing data will allow increased precision in the use of these common cell lines as experimental models and, as indicated above, expand the utility of other cell line panels for drug development. To enable this important step forward, the complete dataset is readily accessible in 2 forms: the easily searchable CellMiner database and a prefiltered, annotated Ingenuity Systems database. Through these portals, cancer investigators will be able to select precisely the cell line models most genetically suited to their research. The availability of the variant information allows the formulation and testing of hypotheses arising from the entire range of projects using the NCI-60 or its components. In conclusion, our datasets add substantial depth to the already extensive characterization of the NCI-60 tumor cell panel and provide an invaluable resource for ongoing investigations in cancer cell biology and pharmacology.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: O.D. Abaan, S. Davis, J.H. Doroshow, Y. Pommier, P.S. Meltzer

Development of methodology: O.D. Abaan, S. Davis, R. Walker, Y. Jiang, R.M. Simon, Y. Pommier, P.S. Meltzer

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): O.D. Abaan, M. Pineda

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): O.D. Abaan, E.C. Polley, S. Davis, Y.J. Zhu, S. Bilke, Y. Gindin, S.L. Holbeck, R.M. Simon, J.H. Doroshow, Y. Pommier, P.S. Meltzer

Writing, review, and/or revision of the manuscript: O.D. Abaan, E.C. Polley, S. Davis, S.L. Holbeck, R.M. Simon, J.H. Doroshow, Y. Pommier, P.S. Meltzer

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): O.D. Abaan, S. Davis, R. Walker, M. Pineda, W.C. Reinhold, J.H. Doroshow, P.S. Meltzer

Study supervision: S. Davis, J.H. Doroshow, P.S. Meltzer

Grant Support

This study was supported by the Division of Cancer Treatment and Diagnosis (DCTD), and the Center for Cancer Research (CCR) of the National Cancer Institute, NIH.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Acknowledgments

The authors thank B. Kopp, NCI-Frederick, for DNA purification and validation. The authors thank the NHLBI GO Exome Sequencing Project and its ongoing studies that produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing Project (HL-103010).

Footnotes

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres./).

Received August 24, 2012.
Revision received March 26, 2013.
Accepted April 26, 2013.

©2013 American Association for Cancer Research.

References

1. Shoemaker RH. The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 2006;6:813–23.
2. Weinstein JN. Drug discovery: cell lines battle cancer. Nature 2012;483:544–5.
3. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000;24:236–44.
4. Staunton JE, Slonim DK, Coller HA, Tamayo P, Angelo MJ, Park J, et al. Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci U S A 2001;98:10787–92.
5. Szakacs G, Annereau JP, Lababidi S, Shankavaram U, Arciello A, Bussey KJ, et al. Predicting drug sensitivity and resistance: profiling ABC transporter genes in cancer cells. Cancer Cell 2004;6:129–37.
6. Zoppoli G, Solier S, Reinhold WC, Liu H, Connelly JW Jr, Monks A, et al. CHEK2 genomic and proteomic analyses reveal genetic inactivation or endogenous activation across the 60 cell lines of the US National Cancer Institute. Oncogene 2012;31:403–18.
7. Liu H, D'Andrade P, Fulmer-Smentek S, Lorenzi P, Kohn KW, Weinstein JN, et al. mRNA and microRNA expression profiles of the NCI-60 integrated with drug activities. Mol Cancer Ther 2010;9:1080–91.
8. Weinstein JN, Pommier Y. Connecting genes, drugs and diseases. Nat Biotechnol 2006;24:1365–6.
9. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483:603–7.
10. Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012;483:570–5.
11. NCI60_WES_data_links. Available from: Cellminer: http://discover.nci./cellminer/ Ingenuity: http://www./NCI60_WES BAM_files: http://watson.nci./projects/nci60/wes/BAMS/ DTP_drug_screen: http://dtp.nci./branches/btb/ivclsp.html DTP_molecular_targets_screen: http://dtp.nci./mtargets/mt_index.html (Last accessed 5/28/13).
12. Lorenzi PL, Reinhold WC, Varma S, Hutchinson AA, Pommier Y, Chanock SJ, et al. DNA fingerprinting of the NCI-60 cell line panel. Mol Cancer Ther 2009;8:713–24.
13. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754–60.
14. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491–8.
15. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9.
16. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.
17. Rubinstein LV, Shoemaker RH, Paull KD, Simon RM, Tosini S, Skehan P, et al. Comparison of in vitro anticancer-drug-screening data generated with a tetrazolium assay versus a protein assay against a diverse panel of human tumor cell lines. J Natl Cancer Inst 1990;82:1113–8.
18. Reinhold WC, Sunshine M, Liu H, Varma S, Kohn KW, Morris J, et al. CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set. Cancer Res 2012;72:3499–511.
19. van der Laan MJ, Polley EC, Hubbard AE. Super Learner. Stat Appl Genet Mol Biol 2007;6:Article25.
20. Albertson TM, Ogawa M, Bugni JM, Hays LE, Chen Y, Wang Y, et al. DNA polymerase epsilon and delta proofreading suppress discrete mutator and cancer phenotypes in mice. Proc Natl Acad Sci U S A 2009;106:17101–4.
21. The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330–7.
22. Ikehata H, Ono T. The mechanisms of UV mutagenesis. J Radiat Res (Tokyo) 2011;52:115–25.
23. DeMarini DM. Genotoxicity of tobacco smoke and tobacco smoke condensate: a review. Mutat Res 2004;567:447–74.
24. Forbes S, Clements J, Dawson E, Bamford S, Webb T, Dogan A, et al. Cosmic 2005. Br J Cancer 2006;94:318–22.
25. Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res 2001;11:863–74.
26. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248–9.
27. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, et al. A map of human genome variation from population-scale sequencing. Nature 2010;467:1061–73.
28. Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 2013;493:216–20.
29. Issaeva N, Bozko P, Enge M, Protopopova M, Verhoef LG, Masucci M, et al. Small molecule RITA binds to p53, blocks p53-HDM-2 interaction and activates p53 function in tumors. Nat Med 2004;10:1321–8.
30. Nieves-Neira W, Rivera MI, Kohlhagen G, Hursey ML, Pourquier P, Sausville EA, et al. DNA protein cross-links produced by NSC 652287, a novel thiophene derivative active against human renal cancer cells. Mol Pharmacol 1999;56:478–84.
31. He L, He X, Lim LP, de Stanchina E, Xuan Z, Liang Y, et al. A microRNA component of the p53 tumour suppressor network. Nature 2007;447:1130–4.
32. Duesbery NS, Vande Woude GF. Anthrax lethal factor causes proteolytic inactivation of mitogen-activated protein kinase kinase. J Appl Microbiol 1999;87:289–93.
33. Wheeler DL, Dunn EF, Harari PM. Understanding resistance to EGFR inhibitors-impact on future treatment strategies. Nat Rev Clin Oncol 2010;7:493–507.
34. Bell DW, Sikdar N, Lee KY, Price JC, Chatterjee R, Park HD, et al. Predisposition to cancer caused by genetic and functional defects of mammalian Atad5. PLoS Genet 2011;7:e1002245.
35. Davidson MB, Katou Y, Keszthelyi A, Sing TL, Xia T, Ou J, et al. Endogenous DNA replication stress results in expansion of dNTP pools and a mutator phenotype. EMBO J 2012;31:895–907.
36. Fox JT, Lee KY, Myung K. Dynamic regulation of PCNA ubiquitylation/deubiquitylation. FEBS Lett 2011;585:2780–5.
37. Mertins SD, Myers TG, Holbeck SL, Medina-Perez W, Wang E, Kohlhagen G, et al. In vitro evaluation of dimethane sulfonate analogues with potential alkylating activity and selective renal cell carcinoma cytotoxicity. Mol Cancer Ther 2004;3:849–60.
38. Dalgliesh GL, Furge K, Greenman C, Chen L, Bignell G, Butler A, et al. Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature 2010;463:360–3.
39. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008;455:1069–75.
40. Lee W, Jiang Z, Liu J, Haverty PM, Guan Y, Stinson J, et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 2010;465:473–7.
41. Varela I, Tarpey P, Raine K, Huang D, Ong CK, Stephens P, et al. Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 2011;469:539–42.
42. Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, et al. The genomic complexity of primary human prostate cancer. Nature 2011;470:214–20.
43. Ley TJ, Ding L, Walter MJ, McLellan MD, Lamprecht T, Larson DE, et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med 2010;363:2424–33.
44. Jones S, Wang TL, Shih Ie M, Mao TL, Nakayama K, Roden R, et al. Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma. Science 2010;330:228–31.
45. Wei X, Walia V, Lin JC, Teer JK, Prickett TD, Gartner J, et al. Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat Genet 2011;43:442–6.
46. Solomon DA, Kim T, Diaz-Martinez LA, Fair J, Elkahloun AG, Harris BT, et al. Mutational inactivation of STAG2 causes aneuploidy in human cancer. Science 2011;333:1039–43.
47. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, et al. The consensus coding sequences of human breast and colorectal cancers. Science 2006;314:268–74.
48. Vassilev LT. MDM2 inhibitors for cancer therapy. Trends Mol Med 2007;13:23–31.
49. Ikediobi ON, Davies H, Bignell G, Edkins S, Stevens C, O'Meara S, et al. Mutation analysis of 24 known cancer genes in the NCI-60 cell line set. Mol Cancer Ther 2006;5:2606–12.
50. Kohn KW, Aladjem MI. Circuit diagrams for biological networks. Mol Syst Biol 2006;2:2006 0002.
51. Barrett JC, Fry B, Maller J, Daly MJ. HaploView: analysis and visualization of LD and haplotype maps. Bioinformatics 2005;21:263–5.