Yahoo is set to release 13.5TB of data gathered from an estimated 20 million users of its website. Users need not fret about the data dump, however: the company says that the data will be anonymized and will be given only to academic researchers who wish to contribute to the advancement of machine learning and artificial intelligence.
The data will reveal basic information such as age, gender and general location. The most important part of the data, however, is the information on how these users interact with the company's news services, such as Yahoo! News and Yahoo! Finance. This includes what devices they used to visit the site, what articles they read and how long they stayed on each article, SlashGear reports.
The move is part of Yahoo's effort to attract talented minds to its push to stay ahead in artificial intelligence research.
With the large-scale data that Yahoo is going to provide, researchers can look for patterns in what kinds of headlines or design features attract a specific group of people. Identifying these patterns can contribute greatly to machine learning, The Wall Street Journal reports.
While Internet companies rarely release data like this outside the company because of what it could reveal about their business, this is not the first time that Yahoo has done so. There have been 56 previous releases in the Yahoo Labs Webscope program, which encompasses advertising, image, social and ratings data, among other categories, according to VentureBeat.
What sets this data dump apart is its size: it is reportedly the biggest ever made, with the previous largest known release at only 1TB, VentureBeat added.