Understanding objects through multiple sensory modalities is fundamental to human perception, enabling cross-sensory integration and richer comprehension. For AI and robotic systems to replicate this ability, access to diverse, high-quality multi-sensory data is critical. Existing datasets are often limited by their focus on controlled environments, simulated objects, or restricted modality pairings. We introduce X-Capture, an open-source, portable, and cost-effective device for real-world multi-sensory data collection, capable of capturing correlated RGBD images, tactile readings, and impact audio. With a build cost under $1,000, X-Capture democratizes the creation of multi-sensory datasets, requiring only consumer-grade tools for assembly. Using X-Capture, we curate a sample dataset of 3,000 total points on 500 everyday objects from diverse, real-world environments, offering both richness and variety. Our experiments demonstrate the value of both the quantity and the sensory breadth of our data for both pretraining and fine-tuning multi-modal representations for object-centric tasks such as cross-sensory retrieval and reconstruction. X-Capture lays the groundwork for advancing human-like sensory representations in AI, emphasizing scalability, accessibility, and real-world applicability.
A full release of our dataset is coming soon (contact Samuel Clarke for more information). We include some samples below.
We develop custom hardware to fit vision, touch, and audio sensing into a handheld package. The video below breaks down the device hardware and shows it in action. A full release of our hardware designs is coming soon (contact Samuel Clarke for more information).
@misc{clarke2025xcapture,
  title={X-Capture: An Open-Source Portable Device for Multi-Sensory Learning},
  author={Samuel Clarke and Suzannah Wistreich and Yanjie Ze and Jiajun Wu},
  year={2025},
  eprint={2504.02318},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.02318},
}