More and more major media outlets are restricting the ability of search engines and AI systems to crawl their content, raising serious questions about technological progress and access to information. The move is understandable: media companies are seeking to protect their products and, indirectly, their revenues. At the same time, it is generating debate about how the restrictions will affect the development of AI and the information available to users. Áron Nagy Kovács discussed these issues with AI expert Levente Szabados.
AI's data hunger and the role of quality content
Szabados points out that AI systems, especially large language models such as ChatGPT, require a huge amount and variety of data to develop: the more quality text an AI can be trained on, the more sophisticated it becomes. Public discussion-forum content, for example, is readily available but does not necessarily represent high quality. Genuinely valuable content, such as carefully produced news and analysis by journalists, is typically owned by media companies that invest significant resources in producing it.
The issue of data restrictions is not a new one: legal access to the data needed to develop AI systems has long been controversial. Until now, developers have used this data largely unchallenged, but media companies, fearing for their economic interests, are seeking to change that.
The differences between search engines and AI
Another aspect Szabados highlights is how already trained AI systems access content. Google's search engine drives traffic to websites under a long-standing bargain between content producers and search engines, generating advertising revenue for the sites. AI systems, by contrast, provide information by reading and summarising content without directing the user to the source. For content producers this can mean lost revenue, since both direct traffic and the advertising income it brings fall away.
Szabados stresses that the tension between evolving AI models and the media's revenue structures calls for new solutions. A win-win outcome requires an economic model that makes AI's use of content fair for the people who produce it.
The economic model dilemma: How do we pay for information?
The media companies' previous business model, based on print newspaper sales and advertising revenues, clearly worked. With the rise of online content consumption, however, that system has changed, and the question of how journalists are paid for their work has become increasingly pressing. Even before the advent of artificial intelligence, whether readers would pay for digital content was a major open question. Now, with AI able to process information automatically, the situation has become even more complex.
Weighing up the pros and cons
When large media outlets such as the New York Times decide to block AI crawlers, they have to weigh the consequences on both sides. Denying AI systems access to their content may slow technological progress, but it lets the media preserve the value of work that users pay for, directly or indirectly.
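In practice, this kind of blocking is usually implemented through the Robots Exclusion Protocol: the publisher's robots.txt file names the crawlers it wants to turn away. The minimal Python sketch below, using the standard library's urllib.robotparser, shows how a compliant crawler would check such a file; the GPTBot token matches OpenAI's published crawler name, while the domain and article URL are illustrative placeholders.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt of the kind publishers now serve: it turns away
# an AI crawler (GPTBot is OpenAI's published crawler name) while leaving
# a search crawler free to index the site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical article URL; the domain is a placeholder.
article = "https://news.example.com/analysis/some-article"

print(parser.can_fetch("GPTBot", article))     # False: the AI crawler is blocked
print(parser.can_fetch("Googlebot", article))  # True: search indexing continues
```

It is worth noting that compliance is voluntary: robots.txt is a convention, not an enforcement mechanism, which is one reason the debate has moved toward legal and economic solutions.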
The new situation requires media companies, AI developers and regulators to work out a compromise model that preserves the value of journalism while allowing AI technologies to evolve.