3D Human pose estimation in video with temporal convolutions and semi-supervised training
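Reason for recommendation
A classic method for 3D human pose estimation from monocular video. The paper applies dilated temporal convolutions to the 2D human keypoints of multiple frames along the time axis, which yields a fairly accurate estimate of the pose in the current frame. It also uses semi-supervised learning to make effective use of unlabeled video data when training the model.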
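
To make the idea of dilated temporal convolution concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the kernel size, the dilation schedule, the averaging weights, and the function names `receptive_field` and `dilated_conv1d` are all assumptions chosen for illustration. It shows how stacking a few dilated layers lets one output frame depend on a long window of input frames.

```python
def receptive_field(kernel_size, dilations):
    """Number of input frames that influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(sequence, weights, dilation):
    """Unpadded 1D convolution over time, sampling every `dilation`-th frame."""
    span = (len(weights) - 1) * dilation
    return [
        sum(w * sequence[t + i * dilation] for i, w in enumerate(weights))
        for t in range(len(sequence) - span)
    ]


# Hypothetical settings (assumptions): kernel size 3, dilation tripling per layer.
dilations = [1, 3, 9, 27]
# Stand-in for a single 2D keypoint coordinate tracked over 100 frames.
frames = [float(t) for t in range(100)]

out = frames
for d in dilations:
    # Placeholder averaging weights; a trained model would learn these.
    out = dilated_conv1d(out, [1 / 3, 1 / 3, 1 / 3], d)

print(receptive_field(3, dilations))  # 81
print(len(out))                       # 20
```

With these assumed settings, four layers already cover 81 frames, so each output value summarizes a window of 81 input frames while each layer only touches three samples.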
Paper overview | |
---|---|
Venue | CVPR 2019 |
Year of publication | 2018 |
DOI | 10.48550/arXiv.1811.11742 |
Type | Research paper |
Field | Computer vision |
Citations | 1005 |
Recommendation details | |
---|---|
Recommended by | 阿璐思 |
Reviewed by | 王伟明, 张琛 |
Recommending group | Motion capture group |
Abstract
In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
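
The two quantities named in the abstract, mean per-joint position error and a back-projection consistency term, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the pinhole camera with a single focal length, the function names, and the toy poses are all hypothetical, and the paper's actual camera model and loss terms are not reproduced here.

```python
import math


def mean_joint_distance(predicted, target):
    """Average Euclidean distance between corresponding joints."""
    return sum(math.dist(p, q) for p, q in zip(predicted, target)) / len(target)


def mpjpe(predicted_3d, target_3d):
    """Mean per-joint position error between two 3D poses."""
    return mean_joint_distance(predicted_3d, target_3d)


def project(pose_3d, focal_length=1.0):
    """Simplified pinhole projection (assumption): (x, y, z) -> (f*x/z, f*y/z)."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in pose_3d]


def back_projection_loss(pose_3d, keypoints_2d, focal_length=1.0):
    """Distance between the reprojected 3D estimate and the input 2D keypoints.

    No 3D label is needed, which is why unlabeled video can supply this signal.
    """
    return mean_joint_distance(project(pose_3d, focal_length), keypoints_2d)


# Toy three-joint poses (made-up numbers, for illustration only).
estimated_3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0)]
ground_truth_3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 1.0)]
input_2d = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]

print(mpjpe(estimated_3d, ground_truth_3d))          # 1.0
print(back_projection_loss(estimated_3d, input_2d))  # about 0.0833
```

In the toy data the third joint has the wrong depth, so its reprojection lands at (0, 0.25) instead of (0, 0.5), and the consistency term is nonzero even though no 3D label was consulted.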
Subfield