Researchers have developed an AI-based system called ‘SpoilerNet’ that detects spoilers in online reviews of books & TV shows & warns users about them. The system could be used to build a browser extension that shields people from spoilers. To train & test it, the researchers collected more than 1.3 million book reviews annotated with spoiler tags by the reviewers themselves.
“Spoilers are everywhere on the Internet & are very common on social media. As Internet users, we understand the pain of spoilers, & how they can ruin one’s experience,” said one of the paper’s senior authors, Ndapa Nakashole.
Some websites allow people to manually flag their posts with tags that serve as ‘spoiler ahead’ warning signs. But this doesn’t always happen. So the researchers, who presented the study at the Association for Computational Linguistics, wanted to develop an artificial intelligence tool powered by neural networks to detect spoilers automatically. They named the tool SpoilerNet.
On a theoretical level, researchers want to better understand how people write spoilers & what kind of linguistic patterns & common knowledge mark a sentence as a spoiler.
The tool the researchers developed could be used to build a browser extension to shield people from spoilers. To train & test SpoilerNet, researchers went looking for large datasets of sentences containing spoilers. Spoiler alert! They found none. So they created their own by collecting more than 1.3 million book reviews annotated with spoiler tags by book reviewers.
The tags encompass sentences that include spoilers & hide them behind a “view spoiler” link in the text. The reviews were collected from Goodreads, a social networking site that allows people to track what they read, & share thoughts & reviews with other readers.
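As a rough illustration of how reviewer-supplied tags can become sentence-level training labels, here is a minimal sketch. It assumes spoiler text is wrapped in a `<spoiler>...</spoiler>` marker, which is a hypothetical stand-in for whatever markup the collected reviews actually use, & it relies on a naive sentence splitter. The function names are made up for this example & are not taken from the researchers' code.

```python
import re

# Hypothetical markup: assume spoiler text is wrapped in <spoiler>...</spoiler>.
SPOILER_SPAN = re.compile(r"<spoiler>(.*?)</spoiler>", re.DOTALL)
# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences with a simple punctuation rule."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def label_sentences(review):
    """Return (sentence, is_spoiler) pairs in their original order."""
    labeled = []
    cursor = 0
    for match in SPOILER_SPAN.finditer(review):
        # Text before the tagged span is treated as spoiler-free.
        for s in split_sentences(review[cursor:match.start()]):
            labeled.append((s, False))
        # Every sentence inside the tagged span is labeled a spoiler.
        for s in split_sentences(match.group(1)):
            labeled.append((s, True))
        cursor = match.end()
    # Remaining text after the last tagged span.
    for s in split_sentences(review[cursor:]):
        labeled.append((s, False))
    return labeled


review = (
    "I loved the pacing. <spoiler>The butler did it. "
    "He confesses at the end.</spoiler> Highly recommended!"
)
for sentence, is_spoiler in label_sentences(review):
    print(is_spoiler, sentence)
```

A real pipeline would need a sturdier sentence splitter, since abbreviations & ellipses break the simple rule used here.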
“To our knowledge, this is the first dataset with spoiler annotations at this scale & at such a fine-grained granularity,” said the paper’s first author, Mengting Wan.
Researchers found that spoiler sentences tend to clump together in the latter part of reviews. But they also found that different users had different standards to tag spoilers, & neural networks needed to be carefully calibrated to take this into account.
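One simple way to check a pattern like this on labeled data is to measure where each spoiler sentence falls within its review. The sketch below assumes each review is already a list of per-sentence True/False spoiler labels; the toy labels are invented for illustration & the helper names are hypothetical, not part of the study.

```python
def spoiler_positions(labels):
    """Relative position of each spoiler sentence: 0.0 is the first sentence, 1.0 the last."""
    n = len(labels)
    if n < 2:
        # A one-sentence review has no meaningful "early" or "late".
        return [0.0 for flag in labels if flag]
    return [i / (n - 1) for i, flag in enumerate(labels) if flag]


def share_in_second_half(reviews):
    """Fraction of all spoiler sentences that fall past the midpoint of their review."""
    positions = [p for labels in reviews for p in spoiler_positions(labels)]
    if not positions:
        return 0.0
    return sum(p > 0.5 for p in positions) / len(positions)


# Invented toy data: one list of sentence labels per review.
toy_reviews = [
    [False, False, True, True],
    [False, True, False, False, True],
]
print(share_in_second_half(toy_reviews))
```

A per-reviewer version of the same count would be one way to look at the differing tagging standards mentioned above.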
In addition, the same word may have different meanings in different contexts. For example, ‘green’ is just a color in one book review, but it can be the name of an important character & a signal for spoilers in another. Identifying & understanding these differences is challenging, Wan said.
Researchers trained SpoilerNet on 80% of the collected Goodreads reviews, running the text through several layers of neural networks. The system could detect spoilers with 89 to 92% accuracy.
They also ran SpoilerNet on a dataset of more than 16,000 single-sentence reviews of about 880 TV shows. There, the tool detected spoilers with 74 to 80% accuracy.
Most of the errors came from the system getting distracted by words that are usually loaded & revelatory, such as ‘murder’ or ‘killed’.
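To see why loaded words are an unreliable cue on their own, consider a deliberately crude keyword detector. This is not SpoilerNet, which is a neural network; it is a toy baseline with an invented word list & invented example sentences, meant only to show how surface words can mislead in both directions.

```python
import re

# Illustrative trigger list, not taken from the paper.
TRIGGER_WORDS = {"murder", "killed", "dies"}


def keyword_flag(sentence):
    """Flag a sentence as a spoiler if it contains any trigger word."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(tokens & TRIGGER_WORDS)


def accuracy(examples):
    """Share of (sentence, label) pairs where the keyword flag matches the label."""
    correct = sum(keyword_flag(s) == label for s, label in examples)
    return correct / len(examples)


# Invented sentences with hand-assigned labels.
examples = [
    ("A classic murder mystery set on a train.", False),    # premise, not a spoiler
    ("The narrator is the one who killed the doctor.", True),
    ("The detective turns out to be the heir.", True),      # spoiler, no trigger word
    ("The prose is dense but rewarding.", False),
]

for sentence, label in examples:
    print(keyword_flag(sentence), label, sentence)
print("accuracy:", accuracy(examples))
```

The first sentence is flagged even though it only describes the premise, while the third slips through because it reveals a twist without any loaded word.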
Looking forward, the Goodreads dataset can be used as a powerful tool to train algorithms to detect spoilers in other types of content, such as tweets.