Categorization for 37 languages
Netsweeper has been working hard to support as many languages as possible for categorization. We have implemented categorization support for 37 languages and are continually optimizing and improving it. During 2017, we hope to expand our support to include any language that represents at least 0.1% of the content we detect.
Currently, Netsweeper supports categorization in the following languages.
Arabic | German | Norwegian | Swedish |
Bangla | Hindi | Persian | Tagalog |
Bulgarian | Hungarian | Polish | Thai |
Catalan | Icelandic | Portuguese | Traditional Chinese |
Croatian | Indonesian | Romanian | Turkish |
Danish | Irish | Russian | Urdu |
Dutch | Italian | Serbian | Vietnamese |
English | Japanese | Simplified Chinese | |
Estonian | Korean | Somali | |
French | Malay | Spanish | |
In the first few quarters of 2017, we expect to release support for the languages listed below. Although these languages account for only a small percentage of the content we categorize, each of them receives between 1,000 and 20,000 page requests per day. Because up to 1% of the content in each of these languages can be Adult content, scanning them could add up to 10,000 new Adult website detections per day. Relative to the total Adult content category, this represents an improvement of up to 0.02%.
Slovak | Greek |
Breton | Finnish |
Afrikaans | Tamil |
Nepali | Welsh |
Czech | Haitian |
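The detection estimate for these upcoming languages is essentially a product of a few quantities. The Python sketch below illustrates it under a simple assumed model: daily Adult pages ≈ (number of new languages) × (pages per day per language) × (share of Adult content), and percentage improvement = new detections ÷ current category size × 100. The function names are hypothetical and are not part of any Netsweeper API.

```python
def estimated_daily_adult_pages(num_languages: int,
                                pages_per_day: int,
                                adult_share: float) -> float:
    """Upper-bound estimate of Adult pages seen per day across newly
    supported languages. Hypothetical model: a simple product."""
    if not 0.0 <= adult_share <= 1.0:
        raise ValueError("adult_share must be a fraction between 0 and 1")
    return num_languages * pages_per_day * adult_share


def improvement_percent(new_detections: float, category_total: float) -> float:
    """New detections expressed as a percentage of the existing category size."""
    if category_total <= 0:
        raise ValueError("category_total must be positive")
    return 100.0 * new_detections / category_total


if __name__ == "__main__":
    # Upper-bound figures quoted in the text: 10 languages,
    # 20,000 pages per day each, up to 1% Adult content.
    print(estimated_daily_adult_pages(10, 20_000, 0.01))
```

With the upper bounds quoted above (10 languages, 20,000 pages per day each, 1% Adult content), this simple model gives 2,000 Adult pages per day. It counts pages rather than distinct websites, and each quantity is a parameter so the estimate can be recomputed as measured figures change.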
Every 4 to 6 months, we will assess which uncategorized languages appear in our production environment. Based on this information, both past and present, we will continue training categorization for additional languages. We will keep our customers up to date on our progress and categorization updates. If you feel a specific language should be trained, please let us know; we are always eager to hear from our customers and respond to their requirements.