极市 editor's note: The ICCV 2021 results are out! Did your paper get accepted?

Neural Network Structure Design
Transformer
[3] Rethinking Spatial Dimensions of Vision Transformers
[2] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (Oral)
[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions (Oral)

Detection
2D Object Detection
[5] Active Learning for Deep Object Detection via Probabilistic Modeling
[4] Detecting Invisible People
[3] Conditional Variational Capsule Network for Open Set Recognition
[2] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding (Oral)
[1] DetCo: Unsupervised Contrastive Learning for Object Detection

Segmentation
Image Segmentation
[2] Labels4Free: Unsupervised Segmentation using StyleGAN
[1] Mining Latent Classes for Few-shot Segmentation (Oral)

Instance Segmentation
[2] Crossover Learning for Fast Online Video Instance Segmentation
[1] Instances as Queries

Semantic Segmentation
[1] Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

GAN/Generative/Adversarial
[2] Labels4Free: Unsupervised Segmentation using StyleGAN
[1] EigenGAN: Layer-Wise Eigen-Learning for GANs

Image Processing
[1] Equivariant Imaging: Learning Beyond the Range Space (Oral)

Super Resolution
[1] Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks

Style Transfer
[1] Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts (font generation)

Estimation
Human Pose Estimation
[1] HuMoR: 3D Human Motion Model for Robust Pose Estimation (Oral)

Image & Video Retrieval/Video Understanding
Re-Identification/Detection
[1] TransReID: Transformer-based Object Re-Identification

Visual Localization
[2] TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
[1] Boundary-sensitive Pre-training for Temporal Localization in Videos

Image Matching
[1] COTR: Correspondence Transformer for Matching Across Images

3D Vision
[1] MVTN: Multi-View Transformation Network for 3D Shape Recognition

Object Tracking
[1] Detecting Invisible People

Remote Sensing Image
[1] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

Scene Graph
Scene Graph Generation
[1] Unconstrained Scene Generation with Locally Conditioned Radiance Fields

Scene Graph Prediction
[1] Generative Compositional Augmentations for Scene Graph Prediction

Data Processing
Data Augmentation
[1] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

Anomaly Detection
[1] Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning

Representation Learning
[1] In-Place Scene Labelling and Understanding with Implicit Scene Representation (Oral)

Transfer Learning
[2] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data
[1] Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling

Metric Learning
[1] Learning with Memory-based Virtual Classes for Deep Metric Learning

Incremental Learning
[1] Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning

Contrastive Learning
[1] CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

Active Learning
[1] Active Learning for Deep Object Detection via Probabilistic Modeling

Visual Reasoning/VQA
[2] On the hidden treasure of dialog in video question answering
[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos (Oral)

Dataset
[1] 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface (4D reconstruction)

Other
Pathdreamer: A World Model for Indoor Navigation (visual navigation)
iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs