Categorization
Although filtering and reporting are the "visible" part of Internet resource management, the most critical function is a backend service generally referred to as categorization. A category must be assigned to each URL, Internet application, or protocol. Once that ruling is made, the URL or service can be allowed or denied, and logged.
Categorizing Internet applications and protocols is not time critical: new applications, new releases of existing applications, and new protocols do not appear daily or even weekly. For example, a truly new Instant Messaging application may appear only once a year. At this rate, these items are easily categorized.
URLs are a different matter, however. Estimates vary widely, but we see about 500,000 new URLs per day. Those are added to the 8 billion or so sites that are already out there. At that rate of growth, categorizing new URLs quickly and consistently is critical.
How does Netsweeper categorize URLs? Glad you asked. We are rather proud of our Categorization Service.
When the Categorization Service receives a URL to categorize, the first thing it does is check its local cache. The surfing patterns of users are not evenly distributed: while some sites are universally popular, other sites are trendy for relatively short periods of time. Keeping a cache of the most recently categorized sites provides a considerable speed enhancement.
If the URL is not in the local cache, a master database is checked. This database allows mastering of entire domains (for example, all of www.sex.com can be considered pornography and all of www.microsoft.com can be considered technology) and of individual sites that have been specifically created to be deceptive. These two checks take about a hundredth of a second - no time at all - yet they return the majority of the categorization rulings.
For a site that is truly new, the URL is handed off to the Categorization Engines. There are about 700 engines, each an artificial intelligence application that retrieves and "looks at" each unique URL and, using behavior models, assigns a category (or multiple categories) to the URL.
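The lookup order described above (local cache, then master database, then a categorization engine) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Netsweeper's implementation: the class name, the dictionary-based master database, the least-recently-used eviction policy, and the `engine` callable are all hypothetical stand-ins.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


class CategorizationService:
    """Illustrative three-tier lookup: recent cache, master database, engine."""

    def __init__(self, master_domains, master_urls, engine, cache_size=1000):
        self.cache = OrderedDict()            # most recently categorized URLs
        self.cache_size = cache_size          # assumed bound on the cache
        self.master_domains = master_domains  # host -> category for whole domains
        self.master_urls = master_urls        # exact URL -> category for single sites
        self.engine = engine                  # hypothetical callable(url) -> category

    def categorize(self, url):
        # Tier 1: local cache of recent rulings.
        if url in self.cache:
            self.cache.move_to_end(url)
            return self.cache[url], "cache"

        # Tier 2: master database, individual URLs first, then whole domains.
        host = urlsplit(url).hostname or ""
        if url in self.master_urls:
            category, source = self.master_urls[url], "master"
        elif host in self.master_domains:
            category, source = self.master_domains[host], "master"
        else:
            # Tier 3: hand the truly new URL to a categorization engine.
            category, source = self.engine(url), "engine"

        self._remember(url, category)
        return category, source

    def _remember(self, url, category):
        # Store the ruling and evict the least recently used entry if full.
        self.cache[url] = category
        self.cache.move_to_end(url)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
```

A caller might build the service with `CategorizationService({"www.microsoft.com": "technology"}, {}, my_engine)`, where `my_engine` is whatever function stands in for the engines. The first request for a page on that domain is answered from the master database and repeat requests from the cache.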
The ruling is then stored for future reference. The next time the URL is requested, the previous categorization is used. Categorization rulings are dated, however: depending on the category assigned to a URL, the URL is automatically recategorized to ensure that its content has not changed. Categories assigned to URLs are not just set once and never considered again.
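One way to picture dated rulings is a per-category maximum age after which a stored ruling is due for recategorization. The sketch below assumes exactly that. The interval values, the `Ruling` record, and the function name are invented for illustration, and the source does not say how the recategorization schedule is actually determined.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical recheck intervals per category (assumed values, not from the source).
RECHECK_AFTER = {
    "technology": timedelta(days=30),
    "pornography": timedelta(days=7),
}
DEFAULT_RECHECK = timedelta(days=14)  # assumed fallback for other categories


@dataclass
class Ruling:
    url: str
    category: str
    dated: datetime  # when the ruling was made


def needs_recategorization(ruling, now):
    """Return True once a ruling is older than the interval for its category."""
    max_age = RECHECK_AFTER.get(ruling.category, DEFAULT_RECHECK)
    return now - ruling.dated >= max_age
```

A stored ruling for which `needs_recategorization` returns True would be sent back through categorization rather than served from storage.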
By using machine technology to categorize, Netsweeper is able to offer the fastest categorization of new URLs. Machine categorization by its very nature is more consistent and scalable than human categorization.
Consider how quickly a new computer virus can spread. Now think about your Internet users. Moments after one user discovers a new (inappropriate) web site, that path can be shared with hundreds of users. A new URL can be categorized in about 1 second by the Netsweeper Categorization Service. Netsweeper will stop the proliferation of traffic to an inappropriate web site before it starts.
Our categorization engines can be trained if you have a unique definition of sites you want categorized. We also provide a simple method to have categorization rulings appealed - because no one is perfect, not even a categorization engine that has done over a billion categorizations.