AI Is Watching You Fold Towels to Train Robots

According to TechSpot, the AI training industry has shifted from reading internet text to recording real-world human movements, with companies like Objectways employing over 2,000 people to perform and annotate physical tasks. In Karur, India, workers like Naveen Kumar wear GoPro cameras while folding towels, creating meticulously labeled training data for robotics clients including Tesla, Boston Dynamics, Nvidia, Google, and OpenAI. Nvidia estimates the humanoid robot market could reach $38 billion within ten years, while Figure AI has secured $1 billion mostly for collecting first-person human data from 100,000 homes. Scale AI, backed by Meta, has gathered over 100,000 hours of similar footage, and Objectways recently processed 15,000 videos of robots performing folding tasks alone. This massive data collection effort spans from Indian textile factories to “arm farms” in Eastern Europe where operators remotely guide robots through teleoperation.


The physical AI revolution

Here’s the thing: we’ve been so focused on AI that writes text and generates images that we missed the bigger shift. These systems are learning to act in the physical world by watching us do the most mundane tasks. It’s not about coding complex algorithms anymore; it’s about capturing the subtle pressure of fingers on fabric, the exact sequence of folding motions, the way we naturally move through space.
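
To make that concrete, here is a rough sketch of what one of those labeled recordings might look like once an annotator is finished with it. The field names, task name, and gesture labels below are illustrative assumptions, not Objectways’ actual annotation format.

```python
from dataclasses import dataclass, field

# Hypothetical schema for an annotated slice of a first-person demonstration
# video; every field name and label here is illustrative, not a vendor format.
@dataclass
class AnnotatedSegment:
    video_id: str                  # source recording, e.g. one GoPro clip
    start_s: float                 # segment start time in seconds
    end_s: float                   # segment end time in seconds
    gesture: str                   # e.g. "pinch_grasp", "half_fold"
    objects: list[str] = field(default_factory=list)  # e.g. ["towel", "table"]

@dataclass
class Demonstration:
    task: str                                          # e.g. "towel_folding"
    segments: list[AnnotatedSegment] = field(default_factory=list)

# A single towel fold broken into labeled steps.
demo = Demonstration(
    task="towel_folding",
    segments=[
        AnnotatedSegment("clip_0001", 0.0, 2.4, "pinch_grasp", ["towel"]),
        AnnotatedSegment("clip_0001", 2.4, 5.1, "half_fold", ["towel", "table"]),
        AnnotatedSegment("clip_0001", 5.1, 7.0, "smooth_flat", ["towel"]),
    ],
)
print(f"{len(demo.segments)} labeled steps")
```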

And the scale is staggering. We’re talking about companies paying people across Brazil, Argentina, India, and the US to wear smart glasses and record their everyday movements. Basically, your morning routine could be training someone’s future robot butler. And processing all of that footage takes serious computing power on its own.

The teleoperation reality

But here’s where it gets really interesting, and maybe a bit concerning. A lot of this “AI training” isn’t actually autonomous learning. Companies are setting up warehouses packed with joysticks where teams in Eastern Europe remotely control robots in real time. The movement data gets streamed and analyzed for successes and mistakes. Critics point out the obvious problem: these robots might perform beautifully when a human is pulling the strings remotely, but can they actually function independently?
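
For a sense of what that looks like in practice, here is a minimal sketch of how one teleoperated episode could be logged for later training, assuming a simple JSON-lines format. The function names, record fields, and 20 Hz sampling rate are assumptions for illustration, not details from the article.

```python
import json
import time

# Minimal sketch: operator joystick commands and robot state are recorded
# together, then the whole episode is tagged with a success/failure outcome.
# All names here are hypothetical, not any specific company's pipeline.

def log_teleop_session(read_joystick, read_robot_state, done, out_path):
    """Record one remotely guided episode to a JSON-lines file."""
    with open(out_path, "w") as f:
        while not done():
            record = {
                "t": time.time(),
                "command": read_joystick(),   # e.g. {"dx": 0.01, "grip": 1}
                "state": read_robot_state(),  # e.g. joint angles, gripper pose
            }
            f.write(json.dumps(record) + "\n")
            time.sleep(0.05)                  # roughly 20 Hz sampling

def label_outcome(session_path, success):
    """Append a terminal label so the episode can be sorted for training."""
    with open(session_path, "a") as f:
        f.write(json.dumps({"outcome": "success" if success else "failure"}) + "\n")
```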

Mohammad Musa of Deepen AI admits that current best practices combine real and synthetic demonstrations based on human-guided sessions. So we’re not exactly at Westworld levels of autonomy yet. The technology is still figuring out how to bridge that gap between human-controlled performance and true independent operation.
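
The blending step itself is the easy part. The sketch below mixes real and simulated demonstrations at a fixed ratio when building a training batch; the 70/30 split and the function name are assumptions for illustration, not anything Musa or Deepen AI specified.

```python
import random

def sample_training_batch(real_demos, synthetic_demos, batch_size=32, real_fraction=0.7):
    """Draw a batch that blends real teleoperated demos with simulated ones."""
    n_real = int(batch_size * real_fraction)
    n_synth = batch_size - n_real
    batch = random.sample(real_demos, min(n_real, len(real_demos)))
    batch += random.sample(synthetic_demos, min(n_synth, len(synthetic_demos)))
    random.shuffle(batch)  # avoid ordering effects between the two sources
    return batch
```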

The data quality challenge

Now, the dirty secret of all this training data? Errors are everywhere. Kumar and his colleagues routinely discard hundreds of recordings due to missed steps or misplaced items. Annotation teams have to meticulously label every moving part, tag objects, and classify specific gestures. In one recent batch of 15,000 videos, they had to account for robots tossing garments aside instead of carefully folding them.
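
The kind of check Kumar’s team performs by hand can be approximated in code: reject any recording where a required step is missing or out of order. The step names below are assumptions based on towel folding, not Objectways’ actual checklist.

```python
# Hedged sketch of a quality gate for recorded demonstrations; the required
# step list is an illustrative assumption, not a real annotation spec.

REQUIRED_STEPS = ["pinch_grasp", "half_fold", "quarter_fold", "smooth_flat"]

def is_usable(recording_steps):
    """Keep a recording only if every required step appears, in order."""
    it = iter(recording_steps)
    return all(step in it for step in REQUIRED_STEPS)

def filter_batch(batch):
    """Split a batch of recordings into kept and discarded video IDs."""
    kept, discarded = [], []
    for video_id, steps in batch.items():
        (kept if is_usable(steps) else discarded).append(video_id)
    return kept, discarded

# Example: the second recording skips the quarter fold and gets discarded.
batch = {
    "clip_0001": ["pinch_grasp", "half_fold", "quarter_fold", "smooth_flat"],
    "clip_0002": ["pinch_grasp", "half_fold", "smooth_flat"],
}
kept, discarded = filter_batch(batch)
print(kept, discarded)  # ['clip_0001'] ['clip_0002']
```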

So what does this mean for the timeline? Kavin, a veteran annotation team member, predicts robots will be doing “all these jobs” in five to ten years. But honestly, that feels optimistic given the current error rates and the fundamental challenge of translating human physical intelligence into machine learning models. The gap between recording a perfect towel fold and creating a robot that can reliably replicate it under real-world conditions is still massive.
