3D Human pose estimation in video with temporal convolutions and semi-supervised training

Reason for recommendation

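A classic method for 3D human pose estimation from monocular video. The paper applies dilated temporal convolutions along the time axis to the 2D human keypoints of multiple frames, which lets it estimate the pose in the current frame fairly accurately. It also uses semi-supervised learning to make effective use of unlabeled video data when training the model.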
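To make the idea of dilated temporal convolutions over keypoint tracks more concrete, here is a minimal pure-Python sketch. It shows how stacking dilated 1D convolutions widens the window of frames that contributes to one output, and applies a single dilated filter to a toy coordinate track. The kernel width, the dilation schedule, and the function names `receptive_field` and `dilated_conv1d` are assumptions chosen for illustration; they are not taken from this page.

```python
def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of a stack of dilated 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' dilated 1D convolution (cross-correlation form) over a list of floats."""
    span = (len(kernel) - 1) * dilation  # frames covered beyond the first tap
    return [
        sum(w * signal[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


# Hypothetical configuration: kernel width 3, dilation tripling at each layer.
dilations = [1, 3, 9, 27, 81]
frames = receptive_field(3, dilations)

# Toy track of one keypoint coordinate over 10 frames.
# A real model would convolve the (x, y) channels of all joints at once.
track = [float(t) for t in range(10)]
smoothed = dilated_conv1d(track, [1 / 3, 1 / 3, 1 / 3], dilation=2)
```

Under these assumed settings, five layers already cover a window of a few hundred frames while each layer only has three taps, which is why dilation is attractive for long temporal context.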

Paper overview
Venue	CVPR 2019
Publication year	2018
DOI	10.48550/arXiv.1811.11742
Type	Research work
Field	Computer vision
Citations	{{{Citation_}}}

Recommendation info
Recommended by	阿璐思
Reviewed by	王伟明张琛
Recommending group	Motion capture group

Abstract

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
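The abstract mentions two quantities that are easy to illustrate: mean per-joint position error (MPJPE) and the back-projection step that compares reprojected 3D predictions with the input 2D keypoints. The following is a minimal sketch, not the authors' implementation. It assumes a distortion-free pinhole camera with joints given in camera space, and the names `mpjpe`, `project`, and `back_projection_loss` are hypothetical. The published method may differ in details such as the camera model and additional loss terms.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over joints."""
    assert len(pred) == len(target)
    return sum(math.dist(p, q) for p, q in zip(pred, target)) / len(pred)


def project(joints_3d, focal=1.0, center=(0.0, 0.0)):
    """Assumed pinhole projection of camera-space (x, y, z) joints to 2D, no lens distortion."""
    return [(focal * x / z + center[0], focal * y / z + center[1]) for x, y, z in joints_3d]


def back_projection_loss(pred_3d, input_2d, focal=1.0, center=(0.0, 0.0)):
    """Mean 2D distance between reprojected 3D predictions and the input 2D keypoints."""
    return mpjpe(project(pred_3d, focal, center), input_2d)


# Toy data: two joints, arbitrary units.
pred_3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
gt_3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.5)]
keypoints_2d = [(0.0, 0.0), (0.5, 0.0)]

supervised_error = mpjpe(pred_3d, gt_3d)                      # needs 3D labels
unsupervised_loss = back_projection_loss(pred_3d, keypoints_2d)  # needs only 2D keypoints
```

The point of the second loss is that it requires no 3D ground truth: any video with detected 2D keypoints can contribute a training signal.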

Subfields

Human pose estimation | 3D