Multi-Phone 3D Capture Rig
2024–2025 · Designer & developer – rig, calibration, and reconstruction
Key metric: 4–8 synchronized phones · small-scale volumetric capture
A flexible, low-cost smartphone rig for experimental multi-view 3D capture.
multi-view · volumetric-capture · smartphone · 3D-reconstruction
Problem & Motivation
High-quality volumetric capture typically requires expensive cameras and infrastructure. The Multi-Phone Rig explores how widely available smartphones can be synchronized and calibrated into an affordable multi-view capture platform for motion research, XR content creation, and embodied interaction studies. The aim is not to beat studio systems on raw accuracy but to provide a practical, cheap, and transportable rig suitable for prototyping and fieldwork.
Approach & System Overview
- Physical rig: A modular mount that holds 4–8 phones with consistent poses and line-of-sight to the capture volume.
- Sync mechanism: A Raspberry Pi sends a UDP countdown to companion apps on each phone; phones trigger capture at the appointed timestamp and stream frames to a local server.
- Calibration: ArUco markers provide initial extrinsic calibration of each phone camera (solvePnP → absolute camera poses).
- Processing pipeline: Per-frame 2D keypoints (MediaPipe / OpenPose) → temporal buffering → triangulation for 3D skeletons; optional COLMAP or custom pipeline for static mesh reconstruction.
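The sync mechanism above can be sketched as a small datagram format plus a delay computation on the phone side. The field layout and helper names here (`MAGIC`, `encode_trigger`, etc.) are illustrative assumptions, not the rig's actual wire protocol:

```python
import struct

# Hypothetical wire format for the countdown datagram: a magic byte,
# a sequence number, and the absolute trigger time in nanoseconds.
MAGIC = 0x5A
FMT = "!BIq"  # magic (u8), sequence (u32), trigger_time_ns (i64), big-endian

def encode_trigger(seq: int, trigger_time_ns: int) -> bytes:
    """Pack a start-of-capture trigger for broadcast over UDP."""
    return struct.pack(FMT, MAGIC, seq, trigger_time_ns)

def decode_trigger(datagram: bytes) -> tuple:
    """Unpack a trigger datagram; raises ValueError on a bad magic byte."""
    magic, seq, trigger_time_ns = struct.unpack(FMT, datagram)
    if magic != MAGIC:
        raise ValueError("not a trigger datagram")
    return seq, trigger_time_ns

def delay_until(trigger_time_ns: int, now_ns: int) -> float:
    """Seconds a phone should wait before firing the shutter."""
    return max(0.0, (trigger_time_ns - now_ns) / 1e9)
```

In practice the Raspberry Pi would `sendto` this datagram to every companion app; scheduling the trigger a few hundred milliseconds in the future gives each phone time to receive the message and absorbs network jitter.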
Implementation Notes
- The sync protocol prioritizes low-latency UDP control for start-of-capture with lightweight timestamp negotiation and small buffering to accommodate jitter.
- Calibration obtains per-camera intrinsic/extrinsic parameters using OpenCV utilities; calibration is stored and reused across sessions to reduce setup time.
- For dynamic capture, a short buffer allows minor alignment adjustments (frame-shift compensation) to reduce temporal misalignment effects.
Evaluation & Results
- Early tests captured walking and gesturing sequences with synchronization jitter under 50 ms; in practice, jitter ranged from roughly 20 to 50 ms depending on Wi-Fi conditions.
- Fusion of 2D keypoints into 3D skeletons produced visually plausible motion suitable for animation and downstream XR use. Mesh reconstructions for static poses were coarse but usable for rapid prototyping.
- The system is a pragmatic tool for rapid capture and exploratory HRI experiments; quantitative accuracy vs. optical mocap is left for future formal benchmarking.
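The fusion of 2D keypoints into 3D skeletons rests on multi-view triangulation. A minimal DLT (direct linear transform) sketch, assuming each camera's 3×4 projection matrix is known from calibration (this is a generic illustration, not the project's exact code):

```python
import numpy as np

def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one keypoint from N >= 2 views.
    projections: list of 3x4 camera matrices; pixels: list of (u, v)
    2D detections, one per camera, in the same order."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: u*(P row 2) - (P row 0), etc.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)   # least-squares null vector of A
    X = vt[-1]
    return X[:3] / X[3]           # dehomogenize to (x, y, z)
```

Running this per joint per frame (after the frame-shift alignment above) yields the 3D skeleton; residual reprojection error is a useful per-joint quality signal.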
My Contribution
- Designed the rig and the app/server sync architecture.
- Implemented the UDP synchronization protocol and the capture orchestration scripts.
- Built the calibration pipeline (ArUco-based) and the initial 3D fusion code.
Outcomes & Next Steps
- Immediate next steps: hardware trigger integration for sub-10 ms sync, increased camera counts, and systematic benchmarking against Mesquite MoCap and an optical system to quantify spatial/temporal error.