MIT researchers have developed a technique that teaches AI to capture actions shared between video and audio.

“The main challenge here is, how can a machine align those different modalities? … But for machine learning, it is not that straightforward.”

The model then maps those data points in a grid, known as an embedding space.
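To make the idea concrete, here is a minimal sketch (not the researchers' actual code) of how features from two modalities can be projected into one shared embedding space, where matching audio and video clips should land close together. The dimensions, projection matrices, and function names below are illustrative placeholders.

```python
# Hypothetical sketch: project video and audio features into one shared
# embedding space, then compare them with cosine similarity.
import numpy as np

rng = np.random.default_rng(0)

# Toy modality-specific features (stand-ins for real encoder outputs).
video_feat = rng.normal(size=512)   # e.g., output of a video backbone
audio_feat = rng.normal(size=128)   # e.g., output of an audio backbone

# Linear projections into a shared 64-dimensional embedding space.
W_video = rng.normal(size=(64, 512))
W_audio = rng.normal(size=(64, 128))

def embed(features, projection):
    """Map modality-specific features to a unit vector in the shared space."""
    z = projection @ features
    return z / np.linalg.norm(z)

video_emb = embed(video_feat, W_video)
audio_emb = embed(audio_feat, W_audio)

# Cosine similarity: a higher score means the two clips are treated as
# depicting the same event. Training would pull matching pairs together.
print("similarity:", float(video_emb @ audio_emb))
```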

The researchers designed the model so it can use only 1,000 words to label the vectors.

The model chooses the words it thinks best represent the data.
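That selection step can be pictured as a nearest-neighbor lookup against a fixed vocabulary of label vectors. The sketch below assumes a simple Euclidean-distance lookup over 1,000 hypothetical word vectors; the model's actual mechanism may differ.

```python
# Hypothetical sketch: restrict labeling to a fixed 1,000-entry vocabulary
# by snapping each embedding to its closest "word" vector.
import numpy as np

rng = np.random.default_rng(1)

VOCAB_SIZE, DIM = 1000, 64
vocabulary = rng.normal(size=(VOCAB_SIZE, DIM))  # one vector per word label

def label(embedding):
    """Return the index of the vocabulary word closest to the embedding."""
    distances = np.linalg.norm(vocabulary - embedding, axis=1)
    return int(np.argmin(distances))

embedding = rng.normal(size=DIM)  # stand-in for a video/audio embedding
print("chosen word index:", label(embedding))
```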

[Image: MIT AI model identifying where a certain action is taking place in a video and labeling it. Credit: MIT News]

Beszedes suggested that the data industry view AI systems from a manufacturing process perspective and treat AI bias as a quality problem.

“From a consumer perspective, mislabeled data makes e.g. online search for specific images/videos more difficult,” Beszedes added.

The MIT researchers say their new technique outperforms many similar models, but it still has some limitations.