Data Collectives Are the Next Frontier of Labor Relations

This article was published in Harvard Business Review, a credible and valuable source in the field of business.
Authors: José Parra-Moyano and Amit Joshi
The impact of AI, and generative AI in particular, is being felt across industries. But while executives are excited about this technology’s potential, white-collar workers are often wary about what it may mean for them, their jobs, and their futures. These differing perceptions are creating new tensions and presenting new challenges for both groups.
Already, workers have taken action to exert their influence on the agenda that will determine what organizations will (or will not) do with AI. Most notably, there’s the 148-day-long strike organized in 2023 by the Writers Guild of America (WGA), a union that represents 11,500 screenwriters. The strike ground the entertainment industry to a halt for months before finally ending with an agreement that AI can’t write or rewrite literary material, that AI-generated material will not be considered source material (meaning that AI-generated material can’t be used to undermine a writer’s credit or separated rights), that the exploitation of writers’ material to train AI is prohibited, and that writers can choose to use AI when performing writing services, but the company can’t require the writer to use AI software when performing writing services.
This outcome was heralded as a win for the WGA, but it also illustrates the difficulties of these kinds of negotiations. First, the agreement expires in three years, meaning that the writers will be forced to renegotiate very soon. Second, it is not clear how the agreement will be enforced, since there is no guaranteed way to check whether or not data has been ingested by an LLM. Third, the agreement says nothing about outsiders entering the industry. OpenAI has started a campaign to promote Sora—its AI-based video generation tool—among filmmakers and studios in Hollywood. In principle, nothing prevents studios and filmmakers from starting to generate scripts and videos with OpenAI’s tool, something that would significantly diminish not only the influence and power of writers, but of actors too.
While the WGA strike is the biggest example of employee action, it’s not the only one. Employers need to start preparing for the possibility that anxiety over AI will spur white-collar unionization efforts and collective action. The AI Bill Project launched by the Trades Union Congress—a federation of trade unions representing blue- and white-collar unionized workers in England and Wales—shows evidence of how unions are rapidly evolving to increase the say of workers in how technology is used at work.
All of the problems above are rooted—directly or indirectly—in access to high-quality, contextual data. Data is the input that is needed for machine learning algorithms to perform, and AI is made of such algorithms. Hence, those having access to high-quality data can train high-quality AI. Currently, the WGA does not govern the generation and use of data created by a workforce, and the latest data protection regulations like the EU AI Act focus on preventing employee monitoring but not on how content generated by the employees can be used to train models. Consequently, there is tremendous scope for conflict. And unless workers and organizations can address this data issue in a meaningful, lasting way, conflict will continue to be litigated through standoffs and piecemeal negotiations.
Data cooperatives offer one meaningful path forward. They are an organizational model that enables individuals to pool their data with the purpose of gaining bargaining power with the companies analyzing their data. Cooperatives like Swash, datum, MIDATA, Gener8, SAOS, GISC, and the Data Workers’ Union provide avenues for individuals to monetize and manage their data, transforming its role, and theirs, in the digital economy. These cooperatives emphasize individual control, ethical use, and fair treatment and allow users to retain ownership and agency over their digital footprint. In many cases they also monetize their members’ data for profit, enabling third parties to draw insights from it in a responsible and privacy-preserving way. Data cooperatives can thus satisfy the demand for data-based insights and AI training data while providing organizations with regularly updated, high-quality data that is highly relevant to their context. Throughout, members retain control over their data and can give specific companies permission to use it responsibly.
Data as a New Factor of Production
The impact of technology on employees and their employers is usually analyzed as a zero-sum game between both parties, in which automation increases profits for employers while it reduces jobs for employees. However, AI’s dependency on fresh data upends this view for three reasons:
Generative AI’s content-producing abilities can affect roles such as advertising and marketing, drug design and innovation, and product and process design and development. Indeed, the impact of gen AI on programming jobs is already being felt.
Because AI models are extremely data hungry, they can be trained not just on the finished products, but also on data collected during the creation process. Any such data generated is typically owned by the organization, which makes it even easier to train future internal AI models.
Most organizations are already sitting on large gold mines of past data, a lot of which can be successfully used to pretrain or fine-tune AI models.
Consider, for example, a team of engineers working on the drivetrain for a new model year of a car. Such a team may comprise several dozen individuals with a variety of skills and experiences. At the same time, all automakers have access to copious amounts of data from previous drivetrain creation including their performance, costs, and the design processes. It is therefore foreseeable that companies may look to shift more of the initial work to AI, thereby impacting several white-collar jobs in the area.
However, AI needs newer, cleaner, and larger datasets to learn from and improve its performance. Organizations relying on the current capabilities of the AI may sooner or later find out that the algorithms they use start performing worse than they used to through “model decay”—the degradation of the AI’s performance over time. The reason is that the data with which they were trained is no longer representative of the reality in which the AI has to navigate.
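The model decay described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a description of any real system: the "model" is just an average preference score, and the drift rate, noise level, and function name `simulate_decay` are all hypothetical choices made for illustration.

```python
import random


def simulate_decay(months=12, drift_per_month=0.5, retrain_every=None, seed=0):
    """Return the model's mean absolute error for each simulated month.

    Assumptions (illustrative only): human preferences are a single score
    that shifts by `drift_per_month` every month, and the "model" is the
    average score seen in the data it was last trained on.
    """
    rng = random.Random(seed)
    true_preference = 5.0
    model_estimate = true_preference  # trained on data from month 0
    errors = []
    for month in range(1, months + 1):
        true_preference += drift_per_month  # tastes keep moving
        # Fresh human feedback: noisy samples around the current preference.
        feedback = [true_preference + rng.gauss(0, 0.2) for _ in range(50)]
        error = sum(abs(x - model_estimate) for x in feedback) / len(feedback)
        errors.append(error)
        # Retraining is only possible if fresh human data is available.
        if retrain_every and month % retrain_every == 0:
            model_estimate = sum(feedback) / len(feedback)
    return errors


stale = simulate_decay()                 # never retrained on new data
fresh = simulate_decay(retrain_every=1)  # retrained monthly on human feedback
print(f"error after 12 months without fresh data: {stale[-1]:.2f}")
print(f"error after 12 months with fresh data:    {fresh[-1]:.2f}")
```

In this sketch the model that never sees new data drifts further from reality each month, while the one retrained on recent human feedback stays close to current preferences.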
Our preferences, our context, and our sense of humor, taste, and fashion change constantly. And we humans are needed to reveal our preferences, to laugh and engage when we find something humorous, and to wear a product that we find fashionable. The most precious data will be about us humans, and it is our interaction with digital systems that produces that data. Eliminating the human means eliminating the data about the person that the AI is designed to generate content for. Only we can determine whether we find that joke funny, whether that subtle change in color is fashionable, or whether something new is now needed, because just generating what we liked three months ago is not good enough anymore.
This very fact, which emerges only from the nature of AI and not from the nature of other technologies that have in the past affected the labor market, can surprisingly help align the needs of employers and employees, as human employees are fundamental for generating newer data that can eventually drive AI.
Data Cooperatives Are to Data what Labor Unions Are to Labor
In the context of work, it is the employees’ on-the-ground knowledge, problem-solving abilities, and evolving skill sets that change with each new challenge and innovation. Thus, the most valuable data for the AI that can take over the tasks of the workers will come directly from the tasks workers perform, as their hands-on interaction with tools, systems, and customers generates the precise data that AI requires to stay accurate and useful.
It is here where data cooperatives offer a new approach to data governance by enabling workers to co-own and manage their collective data through a more or less decentralized decision-making process. This happens because the members of a data cooperative can pull data currently siloed in different sources into one bundle. The aggregation of data that members generate from diverse sources and situations can lead to the development of new products, services, and business models that would be unattainable through individual contributions alone or through the utilization of data in single sources (like one company, one platform, one service provider, etc.).
Furthermore, once workers start seeing monetization of their data from cooperatives, they are much more likely to reorganize their own processes to facilitate better data collection, which in turn can generate better AI outcomes for the organization and higher incomes from data for workers.
Unlike labor unions, which have traditionally sought a one-way route to improve workers’ conditions, data cooperatives operate as providers of training data and follow the rules of the market to meet an unsatisfied demand: training AI systems with up-to-date data. This can be done in privacy-preserving ways, so that the members of the cooperative keep their private insights private. This model of organization empowers workers by allowing them to establish shared rules for data use, potentially enhancing their bargaining power with employers. In practice, this requires that data cooperatives enable workers to individually collect their data from different sources and decide, on a case-by-case basis, which third parties (their employers, for example) can train an algorithm on that data. This enables workers to monetize their data and keep it under a single umbrella. There are a number of models for how to do this: Some use blockchain, others use third-party apps, while others store data locally and simply give users more control over who has access to it and how it can be used.
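The case-by-case permission model described above can be sketched in a few lines. This is a hypothetical illustration, not how any of the cooperatives named in this article actually work: the class names `Member` and `DataCooperative`, the method names, and the sample records are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Member:
    """A worker who pools data but keeps the final say over its use."""
    name: str
    records: list[str]
    # Third parties this member has approved, decided case by case.
    approved: set[str] = field(default_factory=set)

    def grant(self, third_party: str) -> None:
        self.approved.add(third_party)

    def revoke(self, third_party: str) -> None:
        self.approved.discard(third_party)


@dataclass
class DataCooperative:
    """Pools members' data and releases only what each member has approved."""
    members: list[Member] = field(default_factory=list)

    def training_data_for(self, third_party: str) -> list[str]:
        return [
            record
            for member in self.members
            if third_party in member.approved
            for record in member.records
        ]


ana = Member("Ana", ["design note A", "design note B"])
ben = Member("Ben", ["test log C"])
coop = DataCooperative([ana, ben])

ana.grant("Employer Inc")            # Ana opts in for her employer only
print(coop.training_data_for("Employer Inc"))
print(coop.training_data_for("Outside Vendor"))
```

The design choice worth noting is that permission lives with each member rather than with the cooperative as a whole, so revoking access for one third party does not affect what other members have agreed to share.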
Now that data is increasingly being used as a factor of production (i.e., as an input) in the economy, it seems natural that data cooperatives should emerge as a solution. This is the same logic that led to the development of labor unions in the past: When ownership of a factor of production (be it data or labor) is fragmented, no individual has meaningful bargaining power with the employer.
Data and labor, both critical factors of production, do share the challenge of fragmentation. However, they differ in the fact that data mainly creates value in large volumes. Hence, its aggregation is required to unleash its value via insights. It is because of that required aggregation that we speak of “big data.”
In the context of data cooperatives, the collective ownership model of data does more than just enhance bargaining power; it fosters the creation of value by leveraging the volume and diversity of its members’ data. This illustrates how distributed ownership structures can drive broader economic and social benefits beyond mere negotiation leverage.
What This Means for the Future of Employee-Employer Relationships
For this economic value to be unleashed—and for both employers and employees to thrive in the data economy—everyone involved needs to raise their level of data literacy. Both workers and employers should understand how data creates value, which is by being aggregated and analyzed in order to reveal insights and patterns that can improve decision-making and train AI models. Further, workers should understand how they contribute to creating and managing such data while they are working and use the collective influence that cooperatives grant them in order to have the right discussion about the future of work.
To take full advantage of the data people produce, while ensuring that workers control which data they offer and are fairly compensated for it, employers and employees should take three steps to shape the future of the digital economy, and thus the future of work:
1. Assess the organization’s internal data resources and potential
Employers and employees should begin by conducting a thorough audit of the data that the organization already collects and generates, focusing on identifying data that could be valuable for AI training or analytics. In that exercise, identifying the employee-generated data can serve as a basis to quantify the contribution of the employees to the data-based value generated by the organization. The “Workers’ Algorithm Observatory” is an initiative born at Princeton University that enables workers to crowdsource data and investigate the algorithmic systems behind the platforms that determine pay, schedule, and more in the platform economy and beyond. Employers and employees can use this initiative for inspiration and support.
2. Engage employees in data literacy programs
Organizations can implement training programs to enhance their employees’ data literacy. These programs should teach employees how their daily activities generate valuable data, why this data matters for AI development, and how they can participate in data cooperatives. This knowledge enables employees to generate more value from the organization’s data and, at the same time, gives all of the relevant stakeholders in the organization a common understanding and language about the value that data cooperatives can create. This benefits everyone: Employees better understand how they create value and are compensated for it, and employers get higher-quality data in return. The pharmaceutical company Roche is pioneering this area.
3. Incorporate data cooperative clauses in employment contracts
By updating employment contracts to include clauses that recognize employees’ rights to their data and outline how their data can be used in cooperation with data cooperatives, organizations can position themselves as pioneers in this area. This would ensure transparency and provide a legal framework for data usage and can help with the attraction of new talent. These types of clauses are being discussed in initiatives like the Data Workers Union, which advocates for workers’ rights over the data they generate. Companies can take inspiration from these discussions to craft their own policies and stay ahead of what promises to become a major labor issue, thus leading and shaping the discussion about this topic. By being proactive now, companies can develop a mutually beneficial system and head off a point of contention later on, at which point they might have to sacrifice more.
Given the dynamic nature of society, AI will (at least in the foreseeable future) need the data and guidance of humans to generate real value. Therefore, workers will play a fundamental role in keeping AI relevant. It is the combination of labor, capital, and AI that will, if anything, generate the value that’s being projected by consultancies such as McKinsey. Workers should therefore focus on understanding the interplay between capital, labor, and data and start guiding the discussion about the future of work with the right understanding of the nature of AI.
Business leaders should equally understand the dependencies between capital, labor, and data in order to proactively propose collaborative frameworks that let organizations—and thus society—flourish in the age of AI.
TAKEAWAYS
As AI continues to transform industries, tensions between employers and white-collar workers are rising. Workers are increasingly pushing back against AI’s impact, with data ownership and usage becoming central to labor negotiations. Data cooperatives offer a promising solution by allowing workers to control, monetize, and leverage their data for better bargaining power.
Data ownership is critical, and AI needs fresh data. Without continuous human input, AI models degrade, making worker-generated data indispensable. Workers generate valuable data, yet organizations typically own and use it without compensation.
The role of data cooperatives. These cooperatives can empower workers to control and monetize their data, much like labor unions have in the past.
Employer-employee collaboration. Companies should improve data transparency, implement data literacy programs, and integrate data rights into employment contracts.
A new labor paradigm. Aligning AI, labor, and capital can create sustainable value for businesses and workers alike.