R Data Visualization: Sankey Diagrams


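A Sankey diagram, also called a Sankey energy-split diagram or Sankey energy-balance diagram, lets you analyze data through the direction of its lines, the changes in their width, and comparisons between nodes.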

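A Sankey diagram is made up of edges, flow values, and nodes. Edges represent the data that flows, the flow value is the quantity carried by an edge (drawn as the edge's width), and nodes represent the categories. The diagram shows how one set of categories splits into another. The entities that data flows between are called nodes: the node a flow starts from is the source node, and the node where it ends is the target node.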
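To make the edge / flow value / node terminology concrete, here is a minimal base-R sketch (no extra packages; the variable names are illustrative) that turns the built-in UCBAdmissions table into a source-target-value edge list running from Gender nodes to Dept nodes. This is just one way to lay out Sankey data, not a format ggalluvial requires.

```r
# Collapse the Admit x Gender x Dept table to Gender x Dept counts
gender_dept <- margin.table(UCBAdmissions, c(2, 3))

# One row per edge: source node, target node, and flow value (edge width)
edges <- as.data.frame(gender_dept, responseName = "value")
names(edges)[1:2] <- c("source", "target")

# Nodes are the distinct categories appearing on either side of an edge
nodes <- unique(c(as.character(edges$source), as.character(edges$target)))

print(edges)
print(nodes)
```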

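This type of chart is well suited to visualizing data such as user traffic, and it is commonly used for data in energy, material composition, finance, and similar fields.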

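The most distinctive feature of a Sankey diagram is that the total branch width is the same at the start and at the end: the widths of all main branches sum to the widths of all branches that split off from them, so the "energy" stays balanced.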
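The balance property can be checked numerically. A minimal base-R sketch using the built-in HairEyeColor table (the data used in example 2 below): in a Hair → Eye → Sex diagram, the flow arriving at each Eye node from the Hair axis should equal the flow leaving it toward the Sex axis. Variable names are illustrative.

```r
# Flows between adjacent axes of a Hair -> Eye -> Sex diagram
hair_eye <- margin.table(HairEyeColor, c(1, 2))  # Hair x Eye
eye_sex  <- margin.table(HairEyeColor, c(2, 3))  # Eye x Sex

inflow  <- colSums(hair_eye)  # total flow entering each Eye node
outflow <- rowSums(eye_sex)   # total flow leaving each Eye node

print(rbind(inflow, outflow))

# Conservation at every middle node, and equal totals at both ends
stopifnot(all(inflow == outflow))
stopifnot(sum(hair_eye) == sum(eye_sex))
```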

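This article introduces ggalluvial, an R package built on ggplot2 for drawing Sankey (alluvial) diagrams. The package is powerful and produces attractive figures.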

Load the packages

# install.packages(c("ggalluvial", "ggfittext"))  # if not yet installed
library(ggplot2)
library(ggalluvial)

1. A basic Sankey diagram

ggplot(as.data.frame(UCBAdmissions),
       aes(y = Freq, axis1 = Gender, axis2 = Dept)) +
  geom_alluvium(aes(fill = Admit), width = 1/14) +                 # flows, colored by admission outcome
  geom_stratum(width = 1/8, fill = "green", color = "black") +      # node (stratum) fill and border color
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +  # node labels
  scale_x_discrete(limits = c("Gender", "Dept"),
                   expand = c(0, 0)) +
  scale_fill_brewer(type = "qual", palette = 6) +                  # qualitative ColorBrewer palette for the flows
  ggtitle("UC Berkeley admissions and rejections, by sex and department")
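In this wide ("alluvia") layout, each row of as.data.frame(UCBAdmissions) becomes one alluvium weighted by Freq. Because Admit is mapped to fill, each Gender-Dept pair is split into an admitted and a rejected alluvium. The height of each stratum on an axis is the total Freq of that category. A base-R sketch of those stratum heights, useful for sanity-checking the figure (variable names are illustrative):

```r
ucb <- as.data.frame(UCBAdmissions)  # columns: Admit, Gender, Dept, Freq
print(nrow(ucb))                     # one alluvium per row

# Stratum heights on each axis are the marginal totals of Freq
gender_heights <- aggregate(Freq ~ Gender, data = ucb, FUN = sum)
dept_heights   <- aggregate(Freq ~ Dept,   data = ucb, FUN = sum)

print(gender_heights)
print(dept_heights)
```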



2. Customizing colors and formatting

ggplot(as.data.frame(HairEyeColor),
       aes(y = Freq, axis1 = Hair, axis2 = Eye, axis3 = Sex)) +
  geom_alluvium(aes(fill = Eye),                 # color each alluvium by eye color
                curve_type = "sine") +           # sine-shaped flow curves
  scale_fill_manual(values = c(Brown = "#70493D", # one custom color per Eye level
                               Hazel = "#E2AC76",
                               Green = "#3F752B",
                               Blue  = "#81B0E4")) +
  guides(fill = "none") +                        # hide the fill legend
  geom_stratum(alpha = .2) +                     # semi-transparent nodes
  geom_text(stat = "stratum",
            size = 3,
            aes(label = after_stat(stratum)),
            reverse = TRUE) +                    # same stacking order as geom_stratum()'s default
  scale_x_continuous(breaks = 1:3,
                     expand = c(0, 0),
                     labels = c("Hair", "Eye", "Sex")) +
  ggtitle("Eye colors of 592 subjects, by sex and hair color")
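scale_fill_manual() matches a named `values` vector to the data's factor levels by name, and values with no match are drawn with the scale's na.value. A quick base-R check (the name `eye_pal` is illustrative) that the palette above covers every Eye level:

```r
eye_pal <- c(Brown = "#70493D", Hazel = "#E2AC76",
             Green = "#3F752B", Blue  = "#81B0E4")
eye_levels <- levels(as.data.frame(HairEyeColor)$Eye)

# Any level without a color would fall back to na.value in the plot
missing_levels <- setdiff(eye_levels, names(eye_pal))
if (length(missing_levels) > 0) {
  warning("No color for Eye level(s): ", paste(missing_levels, collapse = ", "))
}
print(eye_levels)
```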



3. Plotting long-format survey data with ggplot2

data(vaccinations)  # example data shipped with ggalluvial
# Reverse the order of the response levels to flip the stacking order
vaccinations <- transform(vaccinations,
                          response = factor(response, rev(levels(response))))
ggplot(vaccinations,
       aes(x = survey,          # one axis per survey
           stratum = response,  # node category on each axis
           alluvium = subject,  # links the same subjects across surveys
           y = freq,            # flow weight
           fill = response,
           label = response)) +
  scale_x_discrete(expand = c(0, 0)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")
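Examples 3 and 4 use data in "lodes" (long) form: one row per subject per axis, identified by x, stratum, and alluvium, instead of one column per axis as in examples 1 and 2. ggalluvial provides to_lodes_form() for this conversion; the sketch below does the equivalent reshaping in base R on a small hypothetical data frame (the columns wave1/wave2 and the responses are invented for illustration).

```r
# Hypothetical wide data: one row per subject, one column per survey wave
wide <- data.frame(subject = 1:3,
                   wave1 = c("Never", "Always", "Sometimes"),
                   wave2 = c("Never", "Always", "Always"))

# Reshape to long form: one row per subject per wave
long <- reshape(wide,
                direction = "long",
                varying   = c("wave1", "wave2"),
                v.names   = "response",
                timevar   = "survey",
                times     = c("wave1", "wave2"),
                idvar     = "subject")
long$freq <- 1  # each row carries the weight of one subject

print(long)
```

The result has the same roles as the vaccinations data: survey maps to x, response to stratum, subject to alluvium, and freq to y.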



4. Fitting node (stratum) labels with ggfittext

# Requires the ggfittext package; uses the reordered vaccinations data from example 3
ggplot(vaccinations,
       aes(x = survey,
           stratum = response,
           alluvium = subject,
           y = freq,
           fill = response,
           label = response)) +
  scale_x_discrete(expand = c(.1, 0)) +
  geom_flow(width = 1/4) +
  geom_stratum(alpha = .5, width = 1/4) +
  ggfittext::geom_fit_text(stat = "stratum",
                           width = 1/4,
                           min.size = 3) +  # shrink labels to fit each stratum; hide any that would go below 3 pt
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses", "labeled using `geom_fit_text()`")


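The ggalluvial package can also draw faceted Sankey diagrams, diagrams with more axes, and more, and Sankey diagrams can be embedded in a Shiny app for interactive display. Interested readers can consult the official documentation (Alluvial Plots in ggplot2 • ggalluvial (corybrunson.github.io)).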


References

corybrunson.github.io/g

Published 2023-04-05 10:45 · IP location: Henan