CRV 2026

IRIS

Learning-Driven Cinema Robot Arm
for Visuomotor Motion Control

Qilong Cheng  ·  Matthew Mackay  ·  Ali Bereyhi

Abstract

A cinema robot that learns from demonstration.

IRIS is a low-cost, 3D-printed 6-DOF cinema robot arm that learns cinematic camera motions from human demonstrations via goal-conditioned visuomotor imitation learning. At ~$992 in materials, it achieves 97% of expert visual alignment and 6× smoother motion than its human teachers. All hardware designs, simulation, and training code are fully open-sourced.

Open-Source Hardware

Sub-$1K 6-DOF arm with quasi-direct-drive actuators. STEP files, BOM, and wiring docs released.

High-Fidelity Simulation

MuJoCo physics twin with analytical FK/IK, RRT* and potential-field planners, and cinema shot modes.

Visuomotor Imitation Learning

Goal-conditioned CVAE transformer trained from kinesthetic demonstrations. 9 ablation variants evaluated.

Full ROS Stack

200 Hz RS-485 driver, joint calibration, sim-to-real live mirroring, teach-and-repeat workflow.
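
One way to picture the teach-and-repeat step is as resampling a recorded demonstration onto the driver's 200 Hz grid before replay. The sketch below is a minimal illustration under that assumption; `resample_demo`, its arguments, and the use of plain linear interpolation are hypothetical and are not taken from the released ROS stack.

```python
from bisect import bisect_right

CONTROL_RATE_HZ = 200.0  # driver rate quoted on this page


def resample_demo(times, joints, rate_hz=CONTROL_RATE_HZ):
    """Linearly interpolate a recorded joint trajectory onto a uniform grid.

    times  : strictly increasing timestamps in seconds (hypothetical log format)
    joints : one list of joint angles per timestamp, e.g. 6 values for 6 DOF
    """
    if len(times) != len(joints) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    dt = 1.0 / rate_hz
    # Number of uniform samples that fit inside the recording (small epsilon
    # guards against floating-point round-off at the final sample).
    n_steps = int((times[-1] - times[0]) * rate_hz + 1e-9) + 1
    out = []
    for k in range(n_steps):
        t = times[0] + k * dt
        # Index of the recorded segment [i, i + 1] that contains t.
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        alpha = (t - times[i]) / (times[i + 1] - times[i])
        out.append([a + alpha * (b - a) for a, b in zip(joints[i], joints[i + 1])])
    return out
```

A real replay loop would also need velocity and acceleration limits; those are omitted here to keep the sketch short.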

System

Built from the ground up.

Degrees of Freedom: 6
Bill of Materials: $992
Payload Capacity: 1.5 kg
IL Ablation Variants: 9

Figure: IRIS System Overview

Open Source

Everything is released.

Demo

See IRIS in action.

IRIS — Cinema Shot Execution

Crane shot

Crane — Vertical rise while tracking a fixed subject

Dolly shot

Dolly — Linear push-in / pull-out

Pan shot

Pan — Lateral arc sweep

Data collection

Kinesthetic teaching — operator physically guides the arm

Policy deployment

CVAE Full policy deployed on real hardware — 46.2% task success
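
The crane, dolly, and pan shots listed above can each be described as a simple geometric camera path. The following is a rough sketch of how such waypoints might be generated, assuming positions in meters in a fixed world frame. The function names, parameters, and the look-at convention are illustrative assumptions, not the shot modes implemented in the IRIS simulator.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def _fractions(n):
    """n evenly spaced path fractions from 0 to 1 inclusive."""
    if n < 2:
        raise ValueError("need at least two waypoints")
    return [k / (n - 1) for k in range(n)]


def crane_shot(start, rise, subject, n=50):
    """Vertical rise from `start`; each waypoint looks at a fixed `subject`."""
    poses = []
    for s in _fractions(n):
        pos = (start[0], start[1], start[2] + s * rise)
        look = _unit(tuple(t - p for t, p in zip(subject, pos)))
        poses.append((pos, look))
    return poses


def dolly_shot(start, direction, distance, n=50):
    """Straight-line push-in (positive distance) or pull-out (negative)."""
    d = _unit(direction)
    return [tuple(p + s * distance * c for p, c in zip(start, d))
            for s in _fractions(n)]


def pan_shot(center, radius, angle_start, angle_end, n=50):
    """Horizontal arc of the given radius around `center` (angles in radians)."""
    pts = []
    for s in _fractions(n):
        a = angle_start + s * (angle_end - angle_start)
        pts.append((center[0] + radius * math.cos(a),
                    center[1] + radius * math.sin(a),
                    center[2]))
    return pts
```

Each path would still have to pass through inverse kinematics and joint-limit checks before running on an arm; this sketch covers only the Cartesian geometry.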

Results

97% of expert alignment. 6× smoother.

Method        Success   Vis. Align.   Jerk
Expert        90.0%     0.874         3.64
CVAE Full     90.0%     0.847         0.61
Incremental    0.0%     0.636         0.83
RGB Only       0.0%     0.584         1.65
Visual         0.0%     0.536         1.59
RRT*          10.0%     0.636         0.22

Visual Alignment = ResNet18 cosine similarity to goal image. Jerk in m/s³.
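
To make the two metrics concrete, here is a minimal sketch of how they could be computed. It assumes the ResNet18 features have already been extracted into plain vectors (feature extraction is not shown) and that jerk is reported as the mean magnitude of the third finite difference of camera position. The page does not state how jerk is aggregated, so the mean is an assumption, and both function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-norm feature vector")
    return dot / (norm_u * norm_v)


def mean_jerk(positions, dt):
    """Mean jerk magnitude (m/s^3) of uniformly sampled 3D positions.

    Uses the third forward difference p[k+3] - 3 p[k+2] + 3 p[k+1] - p[k],
    divided by dt^3. Averaging the magnitudes is an assumption of this sketch.
    """
    if len(positions) < 4:
        raise ValueError("need at least four samples")
    mags = []
    for k in range(len(positions) - 3):
        p0, p1, p2, p3 = positions[k:k + 4]
        jerk = [(d - 3 * c + 3 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        mags.append(math.sqrt(sum(j * j for j in jerk)))
    return sum(mags) / len(mags)
```

The headline figures follow directly from the table: 0.847 / 0.874 ≈ 0.97 for visual alignment, and 3.64 / 0.61 ≈ 6.0 for the jerk reduction of CVAE Full relative to the expert.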

Citation

Cite this work.

@inproceedings{cheng2026iris,
  title     = {{IRIS}: Learning-Driven Task-Specific Cinema Robot Arm
               for Visuomotor Motion Control},
  author    = {Qilong Cheng and Matthew Mackay and Ali Bereyhi},
  booktitle = {23rd Conference on Robots and Vision},
  year      = {2026},
  url       = {https://arxiv.org/abs/2602.17537}
}