Graphical Abstract Figure

Abstract

Computer vision (CV) algorithms require large annotated datasets that are labor-intensive and expensive to create. We propose AnnotateXR, an extended reality (XR) workflow that collects multiple modalities of high-fidelity data and auto-annotates them in a single demonstration. AnnotateXR allows users to align virtual models over physical objects tracked with six degrees-of-freedom (6DOF) sensors. It couples a hand-tracking-capable XR head-mounted display with 6DOF pose information and collision detection to algorithmically segment the distinct actions in a video through its digital twin. The virtual–physical mapping provides a tight bounding volume from which semantic segmentation masks are generated for the captured image data. Beyond object and action segmentation, AnnotateXR supports the other dimensions of annotation required by modern CV, such as human–object and object–object interactions and rich 3D recordings, all from a single demonstration. In our user study, AnnotateXR produced over 112,000 annotated data points in 67 min.
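The virtual–physical mapping described in the abstract reduces, at its core, to a standard projection step: once an object's 6DOF pose is known, its registered virtual model can be projected through the camera to produce a 2D annotation with no manual labeling. The sketch below illustrates that idea for the simplest case, a pinhole camera projecting a posed model into a COCO-style bounding box. All names (`project_points`, `bbox_from_pose`) and the unit-cube stand-in model are illustrative assumptions, not the paper's implementation, which produces full semantic segmentation masks rather than boxes.

```python
import numpy as np

def project_points(K, R, t, pts3d):
    """Project Nx3 world-frame points to pixel coordinates (pinhole model)."""
    cam = (R @ pts3d.T).T + t          # world frame -> camera frame
    uv = (K @ cam.T).T                 # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

def bbox_from_pose(K, R, t, model_pts):
    """Axis-aligned 2D bounding box [x, y, w, h] of a posed 3D model."""
    uv = project_points(K, R, t, model_pts)
    x0, y0 = uv.min(axis=0)
    x1, y1 = uv.max(axis=0)
    return [float(x0), float(y0), float(x1 - x0), float(y1 - y0)]

# Unit cube centered at the origin as a stand-in for a tracked CAD model.
cube = np.array([[x, y, z] for x in (-.5, .5)
                           for y in (-.5, .5)
                           for z in (-.5, .5)])
K = np.array([[500., 0., 320.],        # fx, skew, cx
              [0., 500., 240.],        # fy, cy
              [0., 0., 1.]])
R = np.eye(3)                          # identity rotation from the 6DOF tracker
t = np.array([0., 0., 4.])             # object 4 m in front of the camera

print(bbox_from_pose(K, R, t, cube))
```

In the actual workflow, `R` and `t` would come from the 6DOF sensor attached to the physical object at every frame, so this projection can be repeated per frame to annotate an entire video automatically.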
