This post follows the previous article: 浅聊对比学习 (Contrastive Learning) 第一弹. This time I mainly want to record three classic contrastive learning papers I read recently:

● SimCLR: A Simple Framework for Contrastive Learning of Visual Representations https://arxiv.org/abs/2002.05709
● SimCLRv2: Big Self-Supervised Models are Strong Semi-Supervised Learners https://arxiv.org/abs/2006.10029
● MINE: Mutual Information Neural Estimation https://arxiv.org/abs/1801.04062

A side note: after finishing SimCLR and SimCLRv2, I feel that the "alchemy" road of computer vision is indeed long and winding.

My in-team paper reading notes for SimCLR: https://bytedance./docx/doxcn0hAWqSip1niZE2ZCpyO4yb (and let me plug the company's Feishu docs here, they are excellent!)

I also recently learned a new slang word: 「缝合怪」 (roughly, a "stitched-together monster", i.e. something built by gluing existing pieces together).

1.1 One Sentence Summary
1.3 Experiments

There are too many experiment figures to paste them all; if you want the details, you can read my paper reading notes directly: https://bytedance./docx/doxcn0hAWqSip1niZE2ZCpyO4yb. Here I only comment on the points I personally find interesting.

▲ Furthermore, even when nonlinear projection is used, the layer before the projection head, h, is still much better (>10%) than the layer after, z = g(h), which shows that the hidden layer before the projection head is a better representation than the layer after.
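To make the distinction between h and z = g(h) concrete, here is a minimal PyTorch-style sketch of the SimCLR setup (my own illustration, not the official code; names such as SimCLRSketch and nt_xent are hypothetical): an encoder f(·) produces the representation h, a two-layer MLP projection head g(·) maps it to z, the NT-Xent loss is computed on z only, and h is what you keep for downstream linear evaluation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SimCLRSketch(nn.Module):
    """Encoder f(.) + projection head g(.): h feeds downstream tasks, z only feeds the loss."""
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features          # 512 for ResNet-18
        backbone.fc = nn.Identity()                  # f(.): image -> h
        self.encoder = backbone
        self.projector = nn.Sequential(              # g(.): h -> z (2-layer MLP, as in the paper)
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x):
        h = self.encoder(x)                          # representation used for linear eval / transfer
        z = self.projector(h)                        # representation used only in the contrastive loss
        return h, z

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of N positive pairs (2N augmented views in total)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)              # (2N, d)
    sim = z @ z.t() / temperature                                    # cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                       # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # index of each view's positive
    return F.cross_entropy(sim, targets)

if __name__ == "__main__":
    model = SimCLRSketch()
    x1, x2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)  # two augmented views
    (h1, z1), (h2, z2) = model(x1), model(x2)
    loss = nt_xent(z1, z2)
    print(h1.shape, z1.shape, loss.item())           # h: (8, 512), z: (8, 128)
```

In other words, after pretraining the projection head is thrown away and a linear classifier (or a fine-tuned head) is trained on top of h, which is exactly the observation quoted above.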
My in-team paper reading notes for SimCLRv2: https://bytedance./docx/doxcn9P6oMZzuwZOrYAhYc5AUUe
I will not paste all the experiment figures; if you want the details, you can read my paper reading notes directly: https://bytedance./docx/doxcn9P6oMZzuwZOrYAhYc5AUUe. Here I only comment on the points I personally find interesting.

Take a look at the figure on the first page of the paper (as Mu Li puts it, the figure an author puts on the first page is always the most impressive one): with only 1% of the data and labels, it already matches supervised learning trained on 100% of the data and labels, and with 10% of the data and labels it already surpasses the SOTA. That is genuinely impressive.

● Distillation Using Unlabeled Data Improves Semi-Supervised Learning. The interesting point in this experiment is that both self-distillation and distilling into a smaller model outperform the teacher model. An explanation can be found in [1] and [5]; the gist is that distillation lets the student learn more "views" of the data, which improves performance. A minimal sketch of the distillation loss follows after this list.
● Bigger Models Are More Label-Efficient.
● Bigger/Deeper Projection Heads Improve Representation Learning.
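As a quick sketch of the distillation step discussed in the first bullet (my own paraphrase, not the authors' code; the model names in the comments are only examples): on unlabeled images, the teacher's temperature-scaled class probabilities serve as soft targets, and the student minimizes the cross-entropy against them.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft-label cross-entropy between the teacher's and the student's
    temperature-scaled class distributions on unlabeled images."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Toy usage: a big fine-tuned teacher and a smaller student (both hypothetical).
teacher_logits = torch.randn(16, 1000)                       # e.g. frozen ResNet-152 outputs
student_logits = torch.randn(16, 1000, requires_grad=True)   # e.g. ResNet-50 being trained
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```

Note that in self-distillation the teacher and student share the same architecture, yet the student still ends up better, which is what makes the multi-view explanation in [1]/[5] interesting.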
My in-team paper reading notes for MINE: https://bytedance./docx/doxcnMHzZBeWFV4HZV6NAU7W73o

I will not paste all the experiment figures; if you want the details, you can read my paper reading notes directly: https://bytedance./docx/doxcnMHzZBeWFV4HZV6NAU7W73o. Here I only comment on the points I personally find interesting.
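Since the post does not restate what MINE actually estimates, here is a short, self-contained sketch of its core idea (my own illustration; the network and function names are hypothetical): a statistics network T_θ(x, z) is trained by gradient ascent on the Donsker-Varadhan lower bound I(X; Z) ≥ E_P[T] − log E_{P_X ⊗ P_Z}[e^T], where the marginal samples are obtained by shuffling z within the batch. The paper additionally uses an exponential-moving-average correction for the gradient of the log term, which is omitted here.

```python
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """T_theta(x, z): a small MLP that scores (x, z) pairs."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1)).squeeze(-1)

def mine_lower_bound(T, x, z):
    """Donsker-Varadhan bound: E_P[T(x, z)] - log E_{P_x ⊗ P_z}[exp(T(x, z'))].
    Joint samples are the paired (x, z); marginal samples reuse z shuffled within the batch."""
    z_shuffled = z[torch.randperm(z.size(0))]
    joint = T(x, z).mean()
    marginal = torch.logsumexp(T(x, z_shuffled), dim=0) - math.log(x.size(0))
    return joint - marginal                     # maximize this to tighten the MI estimate

# Toy usage with correlated Gaussians (hypothetical data, just to show the training step).
x = torch.randn(256, 4)
z = x + 0.1 * torch.randn(256, 4)               # z is a noisy copy of x, so I(X; Z) is large
T = StatisticsNetwork(x_dim=4, z_dim=4)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for _ in range(5):                              # a few gradient steps; real training runs much longer
    opt.zero_grad()
    loss = -mine_lower_bound(T, x, z)           # gradient ascent on the bound
    loss.backward()
    opt.step()
print(mine_lower_bound(T, x, z).item())
```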
References

[1] 极市平台: 深度学习三大谜团: 集成、知识蒸馏和自蒸馏 (Three mysteries of deep learning: ensembles, knowledge distillation, and self-distillation).
[2] Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations[C]//International Conference on Machine Learning. PMLR, 2020: 1597-1607.
[3] Chen T, Kornblith S, Swersky K, et al. Big self-supervised models are strong semi-supervised learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 22243-22255.
[4] Belghazi M I, Baratin A, Rajeshwar S, et al. Mutual information neural estimation[C]//International Conference on Machine Learning. PMLR, 2018: 531-540.
[5] Allen-Zhu Z, Li Y. Towards understanding ensemble, knowledge distillation and self-distillation in deep learning[J]. arXiv preprint arXiv:2012.09816, 2020.