Multi-Phone 3D Capture Rig
2024–2025 · Designer & developer – rig, calibration, and reconstruction
Key metric: 4–8 synchronized phones · small-scale volumetric capture
A flexible, low-cost smartphone rig for experimental multi-view 3D capture.
multi-view · volumetric-capture · smartphone · 3D-reconstruction
Problem & Motivation
High-quality volumetric capture typically requires expensive cameras and infrastructure. The Multi-Phone Rig explores how widely available smartphones can be synchronized and calibrated into an affordable multi-view capture platform for motion research, XR content creation, and embodied interaction studies. The aim is not to beat studio systems on raw accuracy but to provide a practical, cheap, and transportable rig suitable for prototyping and fieldwork.
Approach & System Overview
- Physical rig: A modular mount that holds 4–8 phones with consistent poses and line-of-sight to the capture volume.
- Sync mechanism: A Raspberry Pi sends a UDP countdown to companion apps on each phone; phones trigger capture at the appointed timestamp and stream frames to a local server.
- Calibration: ArUco markers provide initial extrinsic calibration of each phone camera (solvePnP → absolute camera poses).
- Processing pipeline: Per-frame 2D keypoints (MediaPipe / OpenPose) → temporal buffering → triangulation for 3D skeletons; optional COLMAP or custom pipeline for static mesh reconstruction.
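The sync mechanism above can be sketched as a small datagram format plus a delay computation on the phone side. The field layout and helper names here (`MAGIC`, `encode_trigger`, etc.) are illustrative assumptions, not the rig's actual wire protocol:

```python
import struct

# Hypothetical wire format for the countdown datagram: a magic byte,
# a sequence number, and the absolute trigger time in nanoseconds.
MAGIC = 0x5A
FMT = "!BIq"  # magic (u8), sequence (u32), trigger_time_ns (i64), big-endian

def encode_trigger(seq: int, trigger_time_ns: int) -> bytes:
    """Pack a start-of-capture trigger for broadcast over UDP."""
    return struct.pack(FMT, MAGIC, seq, trigger_time_ns)

def decode_trigger(datagram: bytes) -> tuple:
    """Unpack a trigger datagram; raises ValueError on a bad magic byte."""
    magic, seq, trigger_time_ns = struct.unpack(FMT, datagram)
    if magic != MAGIC:
        raise ValueError("not a trigger datagram")
    return seq, trigger_time_ns

def delay_until(trigger_time_ns: int, now_ns: int) -> float:
    """Seconds a phone should wait before firing the shutter."""
    return max(0.0, (trigger_time_ns - now_ns) / 1e9)
```

In practice the Raspberry Pi would `sendto` this datagram to every companion app; scheduling the trigger a few hundred milliseconds in the future gives each phone time to receive the message and absorbs network jitter.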
Implementation Notes
- The sync protocol prioritizes low-latency UDP control for start-of-capture with lightweight timestamp negotiation and small buffering to accommodate jitter.
- Calibration obtains per-camera intrinsic/extrinsic parameters using OpenCV utilities; calibration is stored and reused across sessions to reduce setup time.
- For dynamic capture, a short buffer allows minor alignment adjustments (frame-shift compensation) to reduce temporal misalignment effects.
Evaluation & Results
- Early tests captured walking and gesturing sequences with synchronization jitter under 50 ms; in practice, jitter ranged from roughly 20 to 50 ms depending on Wi-Fi conditions.
- Fusion of 2D keypoints into 3D skeletons produced visually plausible motion suitable for animation and downstream XR use. Mesh reconstructions for static poses were coarse but usable for rapid prototyping.
- The system is a pragmatic tool for rapid capture and exploratory HRI experiments; quantitative accuracy vs. optical mocap is left for future formal benchmarking.
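The fusion of 2D keypoints into 3D skeletons rests on multi-view triangulation. A minimal DLT (direct linear transform) sketch, assuming each camera's 3×4 projection matrix is known from calibration (this is a generic illustration, not the project's exact code):

```python
import numpy as np

def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one keypoint from N >= 2 views.
    projections: list of 3x4 camera matrices; pixels: list of (u, v)
    2D detections, one per camera, in the same order."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: u*(P row 2) - (P row 0), etc.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)   # least-squares null vector of A
    X = vt[-1]
    return X[:3] / X[3]           # dehomogenize to (x, y, z)
```

Running this per joint per frame (after the frame-shift alignment above) yields the 3D skeleton; residual reprojection error is a useful per-joint quality signal.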
My Contribution
- Designed the rig and the app/server sync architecture.
- Implemented the UDP synchronization protocol and the capture orchestration scripts.
- Built the calibration pipeline (ArUco-based) and the initial 3D fusion code.
Outcomes & Next Steps
- Immediate next steps: hardware trigger integration for sub-10 ms sync, increased camera counts, and systematic benchmarking against Mesquite MoCap and an optical system to quantify spatial/temporal error.