Categorization for 37 languages

Netsweeper has been working to support categorization in as many languages as possible. We have implemented categorization support for 37 languages and continue to optimize and improve it. During 2017, we aim to extend support to every language that accounts for as little as 0.1% of the content we detect.

Currently, Netsweeper supports categorization in the following languages.

Arabic, Bangla, Bulgarian, Catalan, Croatian, Danish, Dutch, English, Estonian, French, German, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Korean, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Simplified Chinese, Somali, Spanish, Swedish, Tagalog, Thai, Traditional Chinese, Turkish, Urdu, Vietnamese


In the first few quarters of 2017, we expect to release support for the languages listed below. Although these languages make up only a small percentage of the content we categorize, each of them can receive between 1,000 and 20,000 page requests per day. Because up to 1% of the content in each of these languages can be Adult-related, scanning them could add up to 10,000 new Adult website detections per day. Compared with the total Adult content category, this could represent an improvement of up to 0.02%.

Afrikaans, Breton, Czech, Finnish, Greek, Haitian, Nepali, Slovak, Tamil, Welsh

Every four to six months, we will assess the uncategorized languages we see in our production environment. Based on both current and historical data, we will continue training categorization for additional languages, and we will keep our customers informed of our progress and categorization updates. If you feel a specific language should be supported, please let us know; we are always eager to hear from our customers and to respond to their requirements.