Author: Alan O’Callaghan (alan.b.ocallaghan@gmail.com)
Due to the size of the objects, this file is separated into three files. You can view this series at the following links:
This is file 2 in the series.
# Let's load the packages
library(heatmaply)
#> Loading required package: plotly
#> Loading required package: ggplot2
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
#> Loading required package: viridis
#> Loading required package: viridisLite
#>
#> ---------------------
#> Welcome to heatmaply version 0.11.1
#> Type ?heatmaply for the main documentation.
#> The github page is: https://github.com/talgalili/heatmaply/
#>
#> Suggestions and bug-reports can be submitted at: https://github.com/talgalili/heatmaply/issues
#> Or contact: <tal.galili@gmail.com>
#>
#> To suppress this message use: suppressPackageStartupMessages(library(heatmaply))
#> ---------------------
library(heatmaplyExamples)
Breast cancer has been studied extensively using gene expression profiling methods (see Breast cancer gene expression). This has led to the identification of a number of gene sets that stratify patients into molecular subgroups. One such gene set is commonly known as the PAM50 gene set. In this example, we will visualize gene expression patterns using heatmaply.
Centering data before performing clustering tends to result in more meaningful cluster assignment. This is particularly true when the measure of interest is the similarity in patterns across features, rather than the total distance between values. Furthermore, it is typically the difference between samples that is of interest, rather than the difference between measures. Non-centered data may show that all samples measure high for one variable and low for another, while centered data shows relative differences. Alternatively, one could use a distance measure that is invariant to shifts in overall level, such as correlation. Heatmaps of non-centered data are shown in another vignette within this package. It can be seen that the concordance between the centered and non-centered heatmaps is mediocre, and the clustering of triple negative samples is not as definite.
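The effect of centering can be illustrated on a small made-up matrix. The sketch below uses only base R; the matrix `toy` and its values are hypothetical and are not part of the TCGA data. It centres each gene (row) on its median and checks that the Pearson correlation between genes is unaffected by the shift.

```r
# Hypothetical toy data: 3 genes (rows) x 4 samples (columns)
toy <- matrix(
  c(10, 12, 11, 13,
     2,  1,  3,  2,
     5,  9,  5,  9),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

# Median of each gene across samples
gene_medians <- apply(toy, 1, median)

# Subtract each gene's median from its own row (margin 1 = rows)
centered <- sweep(toy, 1, gene_medians, FUN = "-")

# After centering, every gene has a median of zero
apply(centered, 1, median)

# Gene-gene Pearson correlation is the same before and after centering
all.equal(cor(t(toy)), cor(t(centered)))
```

`sweep()` makes the margin explicit. Subtracting the vector of row medians directly, as in `toy - gene_medians`, should give the same result for a matrix, because R recycles the shorter vector down the columns.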
In the heatmaps shown below, samples cluster loosely by PAM50 subtype, and more clearly than in the previous examples. Concordance with the assigned labels shown in the row annotation is not complete; however, this may be expected, given that a different clustering method was used here (hierarchical clustering, rather than k-medoids).
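For reference, hierarchical clustering of samples can be sketched in base R as follows. This is a minimal illustration on simulated data, assuming Euclidean distance and complete linkage (the defaults of `dist()` and `hclust()`); the matrix `sim_samples` and the two groups are hypothetical and unrelated to the PAM50 labels.

```r
set.seed(42)

# Simulate two hypothetical groups of 5 samples each, measured on 10 features.
# The groups differ in mean so that they are well separated.
group_a <- matrix(rnorm(5 * 10, mean = -3), nrow = 5)
group_b <- matrix(rnorm(5 * 10, mean = 3), nrow = 5)
sim_samples <- rbind(group_a, group_b)
rownames(sim_samples) <- paste0("sample", 1:10)

# Pairwise Euclidean distances between samples (rows), then complete linkage
hc <- hclust(dist(sim_samples), method = "complete")

# Cut the dendrogram into two clusters and compare with the simulated groups
clusters <- cutree(hc, k = 2)
table(cluster = clusters, group = rep(c("A", "B"), each = 5))
```

Unlike k-medoids, this approach builds a full dendrogram first and assigns clusters only when the tree is cut, so the two methods need not agree on every sample.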
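# Keep only the PAM50 genes that are present in the expression matrix
pam50_genes <- intersect(pam50_genes, rownames(raw_expression))
raw_pam50_expression <- raw_expression[pam50_genes, ]
voomed_pam50_expression <- voomed_expression[pam50_genes, ]
# Centre each gene (row) on its median across samples
center_raw_mat <- raw_pam50_expression -
  apply(raw_pam50_expression, 1, median)
# Symmetric limits, so that zero sits at the midpoint of the colour scale
raw_max <- max(abs(center_raw_mat), na.rm=TRUE)
raw_limits <- c(-raw_max, raw_max)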
heatmaply(t(center_raw_mat),
  row_side_colors = tcga_brca_clinical,
  showticklabels = c(FALSE, FALSE),
  fontsize_col = 7.5,
  col = cool_warm(100),
  main = 'Centred log2 read counts, PAM50 genes',
  limits = raw_limits,
  plot_method = 'plotly')
Note that heatmaply_cor is just like heatmaply, but with defaults that are better suited to a correlation matrix (limits from -1 to 1, and a cold-warm color scheme).
heatmaply_cor(cor(center_raw_mat),
  row_side_colors = tcga_brca_clinical,
  showticklabels = c(FALSE, FALSE),
  main = 'Sample-sample correlation based on centred, log2 PAM50 read counts',
  plot_method = 'plotly')
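The shape and range of the matrix passed to heatmaply_cor above can be checked on a small example. The sketch below uses base R only; `toy_expr` is a hypothetical genes-by-samples matrix of random values, not the PAM50 data. It shows that `cor()` on such a matrix returns a samples-by-samples matrix whose entries are bounded by -1 and 1, which is why fixed limits of -1 to 1 are a natural default.

```r
# Hypothetical expression matrix: 6 genes (rows) x 4 samples (columns)
set.seed(1)
toy_expr <- matrix(rnorm(6 * 4), nrow = 6,
                   dimnames = list(paste0("gene", 1:6),
                                   paste0("sample", 1:4)))

# cor() correlates columns, so a genes-by-samples input
# yields a samples-by-samples correlation matrix
sample_cor <- cor(toy_expr)

dim(sample_cor)          # one row and one column per sample
range(sample_cor)        # all values lie within [-1, 1]
isSymmetric(sample_cor)  # cor(a, b) equals cor(b, a)
```

Because the result has one row per sample, a per-sample annotation such as the clinical data used above lines up with its rows.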