There has been a lot of criticism over how AI training data is sourced, specifically accusations that AI companies use copyrighted data without the creators' consent. With the startup Vana, users will be able to offer their data for use themselves, and they will be compensated for it.
Vana Helps You Sell Your Data for AI
The startup was founded in 2021 by Anna Kazlauskas and Art Abal, who met at the MIT Media Lab. Kazlauskas studied computer science and economics at MIT, while Abal is a corporate lawyer by training and an associate at The Cadmus Group.
The two built a platform where people can collectively store their data such as chats, audio, and photos, and compile them into datasets that can be used as training data for generative AI. "Vana's infrastructure in effect creates a user-owned data treasury," Kazlauskas said.
"It does this by allowing users to aggregate their personal data in a non-custodial way," as reported by TechCrunch. This means that the users who own the data will still have control over it even when AI companies use it to train LLMs.
Anyone can participate in the practice of selling data. All they have to do is create an account with Vana, and after confirming their email, they can attach data to a digital avatar and explore the apps that use Vana's platform and datasets.
In more good news, the company has already addressed the possibility of mishandling the payments provided by the companies that will use the datasets. Kazlauskas explained that Vana users are given the option to self-host their data instead of storing it on the company's servers.
Since the service will be subscription-based, starting at $3.99 plus a data transaction fee, the developers will gain nothing if they exploit users and the data they have to offer. For now, users who intend to sell their data can start with their Reddit posts.
The company has already launched the Reddit Data DAO, which stands for Decentralized Autonomous Organization. It pools multiple users' data and allows those users to decide how it will be used. This will affect how the combined data will be licensed for profit.
Unfortunately, Reddit is not exactly pumped about the idea. The company has already banned the subreddit dedicated to discussing the DAO, and a Reddit spokesperson accused Vana of exploiting its data export system.
Why This Could Be a Good Start
Right now, the way licensing for training data works still needs a little bit of work. Mostly, it's because there's no definite framework for the practice, so there is no fixed system for deciding what kind of data should be used and how much the compensation should be.
With companies like Vana, users will be able to earn as AI models use their data, which AI companies allegedly do already anyway. It also skips the complicated steps, as Vana will act as the middleman that takes care of processing all the transactions.