Text Analytics Toolbox
Analyze and model text data
Text Analytics Toolbox™ provides algorithms and visualizations for preprocessing, analyzing, and modeling text data. Models created with the toolbox can be used in applications such as sentiment analysis, predictive maintenance, and topic modeling.
Text Analytics Toolbox includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. You can extract text from popular file formats, preprocess raw text, extract individual words, convert text into numerical representations, and build statistical models.
Using machine learning techniques such as LSA, LDA, and word embeddings, you can find clusters and create features from high-dimensional text datasets. Features created with Text Analytics Toolbox can be combined with features from other data sources to build machine learning models that take advantage of textual, numeric, and other types of data.
Get Started:
Extract Text Data
Import text data into MATLAB® from single files or large collections of files, including PDF, HTML, and Microsoft® Word® and Excel® files.
Language Support
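Text Analytics Toolbox provides language-specific preprocessing capabilities for English, Japanese, German, and Korean. Most functions also work with text in other languages.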
Clean Text Data
Apply high-level filtering functions to remove extraneous content such as URLs, HTML tags, and punctuation, and to correct spelling.
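As a rough illustration of this kind of cleanup, here is a minimal Python sketch that uses only the standard library. It is not the toolbox's MATLAB API; the function name `clean_text` and the regular expressions are assumptions made for this example, and spelling correction is left out.

```python
import html
import re
import string

# Illustrative patterns (assumptions, not the toolbox's rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_text(raw: str) -> str:
    """Strip HTML tags, HTML entities, URLs, and ASCII punctuation."""
    text = TAG_RE.sub(" ", raw)        # drop tags first so URLs end at whitespace
    text = html.unescape(text)         # turn entities such as &amp; into characters
    text = URL_RE.sub(" ", text)       # drop URLs
    text = text.translate(PUNCT_TABLE) # drop punctuation
    return " ".join(text.split())      # collapse runs of whitespace

sample = "<p>Pump failed &amp; restarted; see https://example.com/log.</p>"
print(clean_text(sample))  # Pump failed restarted see
```

The order of the steps matters in this sketch: removing tags before URLs keeps a closing tag from being swallowed by the URL pattern.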
Filter Stop Words and Normalize Words to Root Form
Prioritize meaningful text data in your analysis by filtering out common words, words that appear too frequently or infrequently, and very long or very short words. Reduce the vocabulary and focus on the broader sense or sentiment of a document by stemming words to their root form or lemmatizing them to their dictionary form.
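The filtering and stemming steps described above can be sketched in a few lines of Python. This is a conceptual example only: the stop-word list, the length limits, and the suffix-stripping `naive_stem` function are all assumptions chosen for illustration, and a production stemmer or lemmatizer handles far more cases.

```python
# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "of", "to", "in", "on"}

def filter_tokens(tokens, min_len=3, max_len=15):
    """Drop stop words and words that are very short or very long."""
    return [t for t in tokens
            if t not in STOP_WORDS and min_len <= len(t) <= max_len]

def naive_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

tokens = "the pump was leaking and the valves leaked on restart".split()
kept = filter_tokens(tokens)
stems = [naive_stem(t) for t in kept]
print(stems)  # ['pump', 'leak', 'valve', 'leak', 'restart']
```

After stemming, "leaking" and "leaked" map to the same entry, which is how normalization shrinks the vocabulary.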
Identify Tokens, Sentences, and Parts-of-Speech
Automatically split raw text into a collection of words using a tokenization algorithm. Add sentence boundaries, part-of-speech details, and other relevant information for context.
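To make the idea of tokenization concrete, the following Python sketch splits text into sentences and then into word and punctuation tokens using regular expressions. It is a simplified stand-in rather than the toolbox's algorithm: the function names and patterns are assumptions, abbreviations such as "a.m." would be split incorrectly, and part-of-speech tagging is not covered.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Return words (keeping contractions such as "didn't") and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

text = "The valve didn't open. Operator reset it!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(tokens)
# [['The', 'valve', "didn't", 'open', '.'], ['Operator', 'reset', 'it', '!']]
```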
Word and N-Gram Counting
Calculate word frequency statistics to represent text data numerically.
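A minimal Python sketch of word and n-gram counting follows. It builds a bag-of-words count matrix (one row per document, one column per vocabulary word) and a bigram tally. The sample documents and the helper name `ngrams` are assumptions for illustration; this is not the toolbox's own implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Two already-tokenized sample documents (made up for this example).
docs = [
    "motor overheated during startup".split(),
    "motor stalled during startup".split(),
]

# Bag-of-words: a sorted vocabulary and a per-document count row.
vocab = sorted({t for d in docs for t in d})
counts = [[Counter(d)[w] for w in vocab] for d in docs]

# Bigram counts across the whole collection.
bigram_counts = Counter(g for d in docs for g in ngrams(d, 2))

print(vocab)   # ['during', 'motor', 'overheated', 'stalled', 'startup']
print(counts)  # [[1, 1, 1, 0, 1], [1, 1, 0, 1, 1]]
print(bigram_counts[("during", "startup")])  # 2
```

Each row of `counts` is a numeric vector that downstream statistical or machine learning models can consume.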
Word Embedding and Encoding
Train word-embedding models such as word2vec continuous bag-of-words (CBOW) and skip-gram models. Import pretrained models including fastText and GloVe.
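Topic Modeling
Discover and visualize underlying patterns, trends, and complex relationships in large sets of text data using machine learning algorithms such as latent Dirichlet allocation (LDA) and latent semantic analysis (LSA).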
Document Summarization and Keyword Extraction
Extract summaries and relevant keywords from one or more documents automatically, and evaluate the similarity and importance of documents.
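One common way to score keywords and compare documents is TF-IDF weighting with cosine similarity. The Python sketch below shows that general technique; whether the toolbox uses it for these features is not stated here, so treat it as an assumption. The sample documents, the function names, and the "+ 1" IDF variant are likewise illustrative choices.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {word: tf * idf} dictionary."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))      # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}   # one common IDF variant
    return [{w: c * idf[w] for w, c in Counter(d).items()} for d in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_keywords(vec, k=2):
    """Return the k highest-weighted words, breaking ties alphabetically."""
    return [w for w, _ in sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

docs = [
    "pump leak detected near pump seal".split(),
    "pump seal replaced after leak".split(),
    "quarterly budget review meeting".split(),
]
vecs = tfidf_vectors(docs)

print(top_keywords(vecs[0], k=1))                             # ['pump']
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))    # True
```

The two maintenance reports share vocabulary and score as similar, while the unrelated third document has zero overlap with the first.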
Deep Learning with Text Data
Perform sentiment analysis and classification with deep learning networks such as long short-term memory networks (LSTMs).
Text Classification
Classify text descriptions using word embeddings that can identify categories of text through deep learning.