Common hand interactions introduce many self-interactions and massive self-occlusions, says the research team. Even so, the team managed to create a robust algorithm by "constraining a vision-based tracking algorithm with a physically based deformable model". The model has been tested against some of the most complicated interactions of human hands, and you can see the tracking quality and accuracy in the embedded video.
While it looks impressive, the system is far from ready for consumer applications. The demonstration seen in the video below required a 124-camera system, six threads on an Intel E5-2698 Xeon 2.2GHz processor, and an NVIDIA Tesla V100 GPU to achieve accurate 30fps tracking.
Facebook tested various camera systems and concluded that three cameras were not enough to reliably track input data. With 18 cameras the results started to look plausible, and 43 cameras produced a result that was visually quite close to that of the 124-camera system.