Language Considerations
Text Analytics Toolbox™ supports the languages English, Japanese, German, and Korean. Most Text Analytics Toolbox functions also work with text in other languages. This table summarizes how to use Text Analytics Toolbox features for other languages.
Feature | Language Consideration | Workaround |
---|---|---|
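Tokenization | The tokenizedDocument function has built-in rules for English, Japanese, German, and Korean only. For English and German text, the 'unicode' tokenization method detects tokens using rules based on Unicode Standard Annex #29 [1] and the ICU tokenizer [2], modified to better detect complex tokens such as hashtags and URLs. For Japanese and Korean text, the 'mecab' tokenization method detects tokens using rules based on the MeCab tokenizer [3]. | For other languages, you can still try using tokenizedDocument. If it does not produce useful results, tokenize the text manually and create a tokenizedDocument array from the tokens by setting the 'TokenizeMethod' option to 'none'. For more information, see tokenizedDocument. |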
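Stop word removal | The stopWords and removeStopWords functions support only English, Japanese, German, and Korean stop words. | To remove stop words from other languages, use removeWords and specify your own stop words to remove. |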
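Sentence detection | The addSentenceDetails function detects sentence boundaries based on punctuation characters and line number information. For English and German text, the function also uses a list of abbreviations passed to the function. | For other languages, you might need to specify your own list of abbreviations for sentence detection. To do this, use the 'Abbreviations' option of addSentenceDetails. For more information, see addSentenceDetails. |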
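Word clouds | For string input, the wordcloud function uses English, Japanese, German, and Korean tokenization, stop word removal, and word normalization. | For other languages, you might need to manually preprocess your text data and specify unique words and corresponding sizes in wordcloud. To specify word sizes in wordcloud, input your data as a table or as arrays containing the unique words and corresponding sizes. For more information, see wordcloud. |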
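Word embeddings | File input to the trainWordEmbedding function requires words separated by whitespace. | For files containing non-English text, you might need to input a tokenizedDocument array to trainWordEmbedding. To create a tokenizedDocument array from pretokenized text, use the tokenizedDocument function and set the 'TokenizeMethod' option to 'none'. For more information, see trainWordEmbedding. |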
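Keyword extraction | The rakeKeywords function supports English, Japanese, German, and Korean text only. It identifies candidate keywords using a delimiter-based approach and, by default, uses punctuation characters and the stop words given by stopWords as delimiters. | For other languages, specify an appropriate set of delimiters using the 'Delimiters' and 'MergingDelimiters' options. For more information, see rakeKeywords. |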
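Keyword extraction | The textrankKeywords function supports English, Japanese, German, and Korean text only. It identifies candidate keywords based on their part-of-speech tags, which it takes from the addPartOfSpeechDetails function. That function also supports English, Japanese, German, and Korean text only. | For other languages, try using rakeKeywords instead and specify an appropriate set of delimiters using the 'Delimiters' and 'MergingDelimiters' options. For more information, see rakeKeywords. |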
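Several workarounds from the table can be combined into one short preprocessing pipeline. The following is a minimal sketch, not a prescribed workflow. It assumes the text is Spanish, which has no built-in rules, and that the text is already split into tokens. The stop word list and abbreviation list are hypothetical placeholders, not complete lists.

```matlab
% Assumption: illustrative Spanish text, already split into tokens.
% Each cell holds the tokens of one document.
tokens = {
    ["El" "Sr." "García" "vive" "en" "la" "ciudad" "."]
    ["La" "ciudad" "tiene" "un" "parque" "grande" "y" "un" "río" "."]};

% Tokenization: keep the tokens exactly as given instead of applying
% the built-in tokenization rules.
documents = tokenizedDocument(tokens,'TokenizeMethod','none');

% Sentence detection: supply language-specific abbreviations so that the
% period in these tokens is not treated as a sentence boundary.
% Hypothetical list; check the addSentenceDetails documentation for the
% exact abbreviation format it expects.
customAbbreviations = ["Sr." "Sra." "Dr."];
documents = addSentenceDetails(documents,'Abbreviations',customAbbreviations);

% Stop word removal: stopWords has no Spanish list, so pass your own
% words to removeWords. Hypothetical, deliberately short list.
customStopWords = ["el" "la" "en" "un" "y"];
cleanDocuments = removeWords(lower(documents),customStopWords);
cleanDocuments = erasePunctuation(cleanDocuments);

% Keyword extraction: replace the default delimiters (punctuation and
% stopWords) with delimiters suited to the language.
keywords = rakeKeywords(lower(documents),'Delimiters',[customStopWords "."]);

% Word clouds: count the preprocessed words yourself, then pass the
% unique words and their sizes directly to wordcloud.
bag = bagOfWords(cleanDocuments);
tbl = topkwords(bag,10);
figure
wordcloud(tbl.Word,tbl.Count);
```

Because the word cloud receives explicit words and sizes, it skips the built-in English, Japanese, German, and Korean preprocessing that applies to string input.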
Language-Independent Features
Word and N-Gram Counting
The bagOfWords and bagOfNgrams functions support tokenizedDocument input regardless of language. If you have a tokenizedDocument array containing your data, then you can use these functions.
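The following minimal sketch illustrates this language independence by counting words and bigrams in a small pretokenized corpus. The Spanish tokens are an illustrative assumption. Any tokenizedDocument array is handled the same way.

```matlab
% Assumption: illustrative Spanish tokens. The counting functions only
% see tokens, so the language of the text does not matter.
tokens = {
    ["el" "perro" "come" "pan"]
    ["el" "gato" "come" "pescado"]};
documents = tokenizedDocument(tokens,'TokenizeMethod','none');

% Bag-of-words model: counts of each unique token per document.
bag = bagOfWords(documents);

% Bag-of-n-grams model: counts bigrams (n = 2) by default.
bagNgrams = bagOfNgrams(documents);

disp(bag.Vocabulary)
disp(bagNgrams.Ngrams)
```

This corpus has six unique words and six unique bigrams, such as "el perro" and "come pescado".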
Modeling and Prediction
The fitlda and fitlsa functions support bagOfWords and bagOfNgrams input regardless of language. If you have a bagOfWords or bagOfNgrams object containing your data, then you can use these functions.
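The following minimal sketch assumes the illustrative pretokenized corpus below. The topic models receive only a bag-of-words model, so the original language of the text plays no role. The choice of two topics and two components is arbitrary for such a tiny example.

```matlab
% Assumption: illustrative Spanish tokens, already split into words.
tokens = {
    ["perro" "come" "carne" "perro" "ladra"]
    ["gato" "come" "pescado" "gato" "duerme"]
    ["perro" "gato" "juegan" "jardín"]};
documents = tokenizedDocument(tokens,'TokenizeMethod','none');
bag = bagOfWords(documents);

% LDA topic model. fitlda uses random initialization, so fix the
% random seed for reproducible results.
rng('default')
ldaMdl = fitlda(bag,2,'Verbose',0);

% LSA model with two components.
lsaMdl = fitlsa(bag,2);

% Per-document topic mixtures (LDA) and document scores (LSA).
topicMixtures = transform(ldaMdl,bag);
lsaScores = transform(lsaMdl,bag);
```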
The trainWordEmbedding function supports tokenizedDocument or file input regardless of language. If you have a tokenizedDocument array or a file containing your data in the correct format, then you can use this function.
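The following minimal sketch trains an embedding on non-English text from a tokenizedDocument array. This avoids relying on whitespace-separated file input. The corpus is an illustrative assumption. The option values ('Dimension', 'MinCount', 'NumEpochs') are arbitrary choices for a tiny example, and a corpus this small does not produce meaningful embeddings.

```matlab
% Assumption: illustrative Spanish tokens, already split into words.
tokens = {
    ["el" "gato" "duerme" "en" "la" "casa"]
    ["el" "perro" "duerme" "en" "el" "jardín"]
    ["la" "casa" "tiene" "un" "jardín"]};
documents = tokenizedDocument(tokens,'TokenizeMethod','none');

% Train a small embedding. MinCount = 1 keeps every word, since no
% word in this tiny corpus reaches the default minimum count.
emb = trainWordEmbedding(documents, ...
    'Dimension',10, ...
    'MinCount',1, ...
    'NumEpochs',10, ...
    'Verbose',0);

% Look up the vector for one word in the vocabulary.
vec = word2vec(emb,"gato");
```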
References
[1] Unicode Text Segmentation. https://www.unicode.org/reports/tr29/
[2] Boundary Analysis. https://unicode-org.github.io/icu/userguide/boundaryanalysis/
[3] MeCab: Yet Another Part-of-Speech and Morphological Analyzer. https://taku910.github.io/mecab/
See Also
stopWords | removeWords | normalizeWords | bagOfWords | bagOfNgrams | tokenizedDocument | fitlda | fitlsa | wordcloud | addSentenceDetails | addLanguageDetails