3D Human pose estimation in video with temporal convolutions and semi-supervised training


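Reason for recommendation

A classic method for 3D human pose estimation from monocular video. The paper applies dilated temporal convolutions along the time axis to the 2D human keypoints of multiple frames, which yields a fairly accurate estimate of the pose in the current frame. It also uses a semi-supervised learning approach that makes effective use of unlabeled video data to train the model.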
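To make the idea of dilated temporal convolutions over 2D keypoints concrete, here is a minimal NumPy sketch. It is not the paper's architecture: the joint count (17), channel widths, dilation schedule (1, 3, 9), random weights, and the helper names `dilated_temporal_conv` and `receptive_field` are all assumptions chosen for illustration, and details such as normalization or residual connections are left out. It only shows how stacking dilated convolutions lets one output frame see a long window of input frames.

```python
import numpy as np


def dilated_temporal_conv(x, w, dilation):
    """Valid (unpadded) 1D convolution along the time axis.

    x: (T, C_in) sequence of per-frame features.
    w: (K, C_in, C_out) kernel with K temporal taps.
    Returns an array of shape (T - (K - 1) * dilation, C_out).
    """
    num_frames = x.shape[0]
    taps, _, c_out = w.shape
    span = (taps - 1) * dilation
    out_len = num_frames - span
    out = np.zeros((out_len, c_out))
    for k in range(taps):
        start = k * dilation
        # Each tap reads frames that are `dilation` steps apart.
        out += x[start:start + out_len] @ w[k]
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames that influence one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


rng = np.random.default_rng(0)
num_joints = 17          # assumption: skeleton with 17 joints
dilations = [1, 3, 9]    # assumption: dilation grows by a factor of 3
num_frames = receptive_field(3, dilations)

# Input: flattened (x, y) coordinates of every joint in every frame.
keypoints_2d = rng.standard_normal((num_frames, num_joints * 2))
channels = [num_joints * 2, 32, 32, num_joints * 3]

h = keypoints_2d
for i, d in enumerate(dilations):
    w = 0.1 * rng.standard_normal((3, channels[i], channels[i + 1]))
    h = dilated_temporal_conv(h, w, d)
    if i < len(dilations) - 1:
        h = np.maximum(h, 0.0)  # ReLU between layers

# One 3D pose is produced for the whole window of input frames.
pose_3d = h.reshape(-1, num_joints, 3)
print(num_frames, pose_3d.shape)
```

With kernel size 3 and these dilations, the temporal length shrinks by 2, 6, and 18 frames in turn, so a window of 27 frames collapses to a single output pose.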

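Article overview
Venue CVPR 2019
Year of publication 2018
DOI 10.48550/arXiv.1811.11742
Type Research work
Field Computer vision
Citations {{{Citation_}}}
Recommendation information
Recommended by 阿璐思
Reviewed by 王伟明, 张琛
Recommending group Motion capture group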

Abstract

In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
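The back-projection step described in the abstract (estimate a 3D pose from 2D keypoints, then project it back and compare with the input 2D keypoints) can be sketched as follows. This is an illustrative approximation rather than the authors' implementation: it assumes an ideal pinhole camera with no lens distortion and poses given in camera coordinates, and the names `project_to_2d` and `back_projection_loss` and the camera parameters are hypothetical.

```python
import numpy as np


def project_to_2d(pose_3d, focal, center):
    """Pinhole projection of camera-space joints (J, 3) to pixels (J, 2).

    Assumes every joint has positive depth z and no lens distortion.
    """
    xy_normalized = pose_3d[:, :2] / pose_3d[:, 2:3]
    return xy_normalized * focal + center


def back_projection_loss(pred_3d, input_2d, focal, center):
    """Mean Euclidean distance between reprojected and input 2D keypoints."""
    reprojected = project_to_2d(pred_3d, focal, center)
    return float(np.mean(np.linalg.norm(reprojected - input_2d, axis=-1)))


rng = np.random.default_rng(0)
num_joints = 17                        # assumption: 17-joint skeleton
focal = np.array([1000.0, 1000.0])     # hypothetical focal lengths in pixels
center = np.array([500.0, 500.0])      # hypothetical principal point

# A made-up pose roughly 4 m in front of the camera.
true_3d = rng.uniform(-1.0, 1.0, size=(num_joints, 3)) + np.array([0.0, 0.0, 4.0])
input_2d = project_to_2d(true_3d, focal, center)

# A prediction that matches the true pose reprojects exactly onto the input.
loss_exact = back_projection_loss(true_3d, input_2d, focal, center)

# A perturbed prediction is penalized even though no 3D label was used.
noisy_3d = true_3d + 0.05 * rng.standard_normal(true_3d.shape)
loss_noisy = back_projection_loss(noisy_3d, input_2d, focal, center)
print(loss_exact, loss_noisy)
```

The point of the sketch is that the loss needs only the 2D keypoints of the unlabeled video, so it can supervise the 3D estimator where no 3D ground truth exists.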

Subfields

Human pose estimation | 3D