FAQ

How does the confidence score work?

The confidence score is a floating-point number between 0 and 1, where 1 is the highest confidence. Its value depends on the length of the text and on how many candidate languages are selected for detection. Generally, the longer the text and the fewer the candidate languages, the more reliable the confidence score.
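As an illustration of how a client might act on the score, here is a minimal Python sketch. The function name accept_detection and the 0.8 cutoff are assumptions made for this example; they are not part of the What-Lang API, and a suitable threshold depends on your texts.

```python
from typing import Optional


def accept_detection(language: str, confidence: float,
                     min_confidence: float = 0.8) -> Optional[str]:
    """Return the language only if the confidence meets the cutoff.

    Hypothetical helper: 0.8 is an arbitrary example threshold,
    not a value recommended by the API.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return language if confidence >= min_confidence else None
```

For short texts, it may make sense to lower the cutoff or to restrict the set of candidate languages instead.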

Why is What-Lang API more accurate than other language detection APIs?

Most language detection APIs use a probabilistic n-gram model. They usually rely on trigrams (n-grams of size 3), whereas What-Lang uses n-grams of up to size 5, so our API can detect the language more accurately with fewer words.
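To make the idea of n-grams up to size 5 concrete, the following Python sketch counts n-grams in a text. It assumes character-level n-grams with space padding at the word boundaries, which is a common choice but not a confirmed detail of What-Lang; the function name char_ngrams is made up for this example.

```python
from collections import Counter


def char_ngrams(text: str, max_n: int = 5) -> Counter:
    """Count character n-grams of every size from 1 to max_n.

    Assumption: the text is lowercased and padded with one space on
    each side so that word boundaries show up in the n-grams.
    """
    padded = f" {text.lower().strip()} "
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the padded text.
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts
```

A trigram-only model corresponds to looking only at the size-3 entries. The longer 4-grams and 5-grams capture more distinctive letter sequences, which is one plausible reason they help on short inputs.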

In addition to the n-gram model, What-Lang uses a rule-based model. When certain words appear in the text, the rules can identify the language quickly, so the API can return sooner and stay as responsive as possible.
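The early-return behavior described above can be sketched in Python as follows. The marker-word table, the detect function, and the fallback parameter are all hypothetical; the actual rules and their interaction with the n-gram model are not documented here.

```python
from typing import Callable

# Hypothetical marker words; a real rule set would be far larger
# and would need to handle words shared between languages.
MARKER_WORDS = {"the": "en", "und": "de", "les": "fr"}


def detect(text: str, fallback: Callable[[str], str]) -> str:
    """Return early on a marker word, else defer to the fallback."""
    for word in text.lower().split():
        lang = MARKER_WORDS.get(word)
        if lang is not None:
            # Rule matched: skip the slower statistical model.
            return lang
    # No rule matched: use the fallback (for example, an n-gram model).
    return fallback(text)
```

For example, detect("und so weiter", ngram_model) would return "de" without ever calling ngram_model, where ngram_model stands for any function that maps a text to a language code.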