A content protection network (also called content protection system or web content protection) is an anti-web-scraping service delivered through cloud infrastructure. A content protection network is claimed to protect websites from unwanted web scraping, web harvesting, blog scraping, data harvesting, and other forms of unwanted access to data published on the World Wide Web. Such a network uses various algorithms, checks, and validations to distinguish desirable search engine web crawlers and human visitors on the one hand from Internet bots and automated agents that perform unwanted access on the other.
A few web application firewalls have begun to implement limited bot detection capabilities.
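As a rough illustration of the kind of checks described above, the following minimal sketch classifies requests using two simple heuristics: a user-agent allowlist for search engine crawlers and a per-client request-rate limit. The class name, thresholds, and crawler tokens are all hypothetical and are not taken from any particular product. A real service would combine many more signals, and would verify a crawler's claimed identity rather than trust the user-agent string.

```python
from collections import defaultdict, deque

# Hypothetical values chosen only for illustration.
WINDOW_SECONDS = 10.0
MAX_REQUESTS_PER_WINDOW = 20
KNOWN_CRAWLER_TOKENS = ("Googlebot", "bingbot")  # assumed allowlist


class RequestClassifier:
    """Toy classifier: 'crawler', 'human', or 'suspect' per request."""

    def __init__(self):
        # Timestamps of recent requests, keyed by client IP address.
        self._recent = defaultdict(deque)

    def classify(self, client_ip, user_agent, now):
        # Assumption: the user-agent claim is taken at face value here.
        # It is trivially spoofed, so a real check would verify it.
        if any(token in user_agent for token in KNOWN_CRAWLER_TOKENS):
            return "crawler"

        # Sliding window: record this request, drop timestamps that
        # fell out of the window, then count what remains.
        window = self._recent[client_ip]
        window.append(now)
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()

        # A missing user agent or an unusually high rate is flagged.
        if not user_agent or len(window) > MAX_REQUESTS_PER_WINDOW:
            return "suspect"
        return "human"
```

Under these assumed thresholds, a client that sends more than 20 requests within 10 seconds is flagged as suspect, while an occasional visitor with a browser-like user agent is treated as human.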
The protection of copyrighted content has a long tradition, but technical tricks and mechanisms are more recent developments. For example, maps have sometimes been drawn with deliberate mistakes so that the authors can prove copyright infringement if someone copies the map without permission.[1] In 1998, a system called SiteShield was introduced to ease the fears of theft and illicit re-use expressed by content providers who publish copyright-protected images on their websites.[2] An IBM research report published in November 2000 was one of the first to document a working system for web content protection, called WebGuard.[3]
Around 2002, several companies in the music recording industry issued non-standard compact discs with deliberate errors burned into them as a copy protection measure.[4] Google also notably installed an automated system to help detect and block YouTube video uploads whose content entails copyright infringement.[5]
However, as individuals and enterprises engaged in computer crime have become more skilled and sophisticated, they have eroded the effectiveness of established perimeter-based security controls. One response has been more pervasive use of data encryption technologies.[6] Forrester Research asserted in 2011 that there was an industry-wide "drive toward consolidated content security platforms", and predicted in 2012 that "proliferating malware threats will require better threat intelligence".[7] Forrester also asserted that content protection networks (especially in the form of software as a service, or SaaS) enable companies to protect against both e-mail and web-borne theft of content.[8] In some web applications, security is defined by URL patterns that identify protected content. For example, in a Java web application's web.xml deployment descriptor, a security-constraint element can pair URL patterns with a transport-guarantee value of NONE, INTEGRAL, or CONFIDENTIAL to describe the necessary transport guarantee.[9]
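To make the web.xml example concrete, the sketch below embeds a small deployment descriptor fragment and reads out which transport guarantee applies to each URL pattern. The resource name and the /articles/* pattern are hypothetical, the fragment omits the XML namespace that a real web.xml normally declares, and treating a missing user-data-constraint as NONE is a simplifying assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the resource name and URL pattern are
# hypothetical, and the usual namespace declaration is left out.
WEB_XML = """
<web-app>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Protected articles</web-resource-name>
      <url-pattern>/articles/*</url-pattern>
    </web-resource-collection>
    <user-data-constraint>
      <transport-guarantee>CONFIDENTIAL</transport-guarantee>
    </user-data-constraint>
  </security-constraint>
</web-app>
"""


def transport_guarantees(web_xml):
    """Map each url-pattern to its constraint's transport guarantee."""
    root = ET.fromstring(web_xml.strip())
    result = {}
    for constraint in root.iter("security-constraint"):
        # Assumption: no user-data-constraint is treated as NONE.
        guarantee = constraint.findtext(
            "user-data-constraint/transport-guarantee", default="NONE"
        )
        for pattern in constraint.iter("url-pattern"):
            result[pattern.text.strip()] = guarantee.strip()
    return result


if __name__ == "__main__":
    print(transport_guarantees(WEB_XML))
```

In this fragment, CONFIDENTIAL asks the container to serve /articles/* only over a transport that prevents others from observing the content, which in practice usually means HTTPS.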