Motion Capture Group/Computer Vision/3D Human pose estimation in video with temporal convolutions and semi-supervised training: Difference between revisions

From NERCN

Revision as of 22:49, 18 January 2024 (Thursday)

Reason for recommendation

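A classic method for 3D human pose estimation from monocular video. The paper applies dilated temporal convolutions along the time axis to the 2D human keypoints of multiple frames, which yields a fairly accurate estimate of the pose in the current frame. It also uses semi-supervised learning to make effective use of unlabeled video data when training the model.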
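
The dilated temporal convolution described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `dilated_conv1d` and `receptive_field`, the kernel width of 3, and the dilation values are assumptions chosen for illustration, and the input is a single keypoint coordinate tracked over time rather than a full pose.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Unpadded 1D dilated convolution (cross-correlation form).

    signal   -- list of floats, one value per video frame
    kernel   -- list of filter weights (hypothetical values)
    dilation -- frame gap between consecutive kernel taps
    """
    span = (len(kernel) - 1) * dilation  # frames covered beyond the first tap
    return [
        sum(w * signal[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # One keypoint coordinate over 10 frames (toy data, not from the paper).
    x_coords = [float(t) for t in range(10)]
    # A width-3 kernel with dilation 3 reads frames t, t+3 and t+6.
    print(dilated_conv1d(x_coords, [1.0, 1.0, 1.0], dilation=3))
    # Assumed dilations growing by a factor of 3 per layer.
    print(receptive_field(3, [1, 3, 9, 27, 81]))  # 243
```

Stacking layers whose dilation grows geometrically lets the receptive field grow exponentially with depth while the number of weights grows only linearly, which is the usual motivation for using dilated convolutions on long sequences.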

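Paper overview
Venue CVPR 2019
Year of publication 2018
DOI 10.48550/arXiv.1811.11742
Type Research work
Field Computer vision
Citations 1005
Recommendation information
Recommended by 阿璐思
Reviewed by 王伟明 张琛
Recommending group Motion Capture Group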

Abstract

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
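
The two quantities named in the abstract, mean per-joint position error and the back-projection check against the input 2D keypoints, can be sketched as follows. This is a simplified illustration, not the released code: the function names are hypothetical, the camera is assumed to be an ideal pinhole with a single focal length and no lens distortion, and any additional loss terms used in the paper are not shown.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over joints.

    pred, target -- equal-length lists of joint coordinates (tuples of floats)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def project_pinhole(joints_3d, focal, center):
    """Project camera-space joints (x, y, z) to 2D image coordinates.

    Assumes an ideal pinhole camera; focal and center are hypothetical
    intrinsics, not values taken from the paper.
    """
    cx, cy = center
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in joints_3d]


def back_projection_error(joints_3d, keypoints_2d, focal, center):
    """Mean 2D distance between projected 3D joints and the input 2D keypoints."""
    return mpjpe(project_pinhole(joints_3d, focal, center), keypoints_2d)


if __name__ == "__main__":
    # Toy poses with two joints each (not real data).
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
    print(mpjpe(predicted, reference))  # 3.5

    # A 3D estimate that reprojects exactly onto its 2D keypoint gives zero error.
    print(back_projection_error([(1.0, 2.0, 2.0)], [(100.0, 150.0)], 100.0, (50.0, 50.0)))  # 0.0
```

Because the back-projection error needs only 2D keypoints and camera parameters, it can be evaluated on video that has no 3D labels, which is what makes it usable as a semi-supervised training signal.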

Subfield

Human pose estimation | 3D